Skip to content

GUACAMOLE-2265: Fix worker process leak after client disconnect.#677

Open
escra wants to merge 1 commit into
apache:mainfrom
ESCRA-GmbH:fix/worker-process-leak-main
Open

GUACAMOLE-2265: Fix worker process leak after client disconnect.#677
escra wants to merge 1 commit into
apache:mainfrom
ESCRA-GmbH:fix/worker-process-leak-main

Conversation

@escra

@escra escra commented Jun 10, 2026

Copy link
Copy Markdown

This is the main-targeted forward-port of #662 (GUACAMOLE-2265). The original PR targets the frozen staging/1.6.1 branch; this applies the same fix on top of current main (HEAD e775052).

Problem

A guacd worker process could leak after client disconnect: the blocking guacd_recv_fd() loop in guacd_exec_proc() never returns when the connection drops without a clean FD close, leaving the child alive in the process map.

Fix

  • Replace the blocking receive loop with a poll()-based loop with idle timeout, plus reconnect-grace and max-idle handling (GUACD_PROC_IDLE_TIMEOUT_MS, GUACD_PROC_RECONNECT_GRACE_MS, GUACD_PROC_MAX_IDLE_MS).
  • Handle POLLERR/POLLHUP/POLLNVAL.
  • Add pthread_create() failure handling in guacd_proc_add_user() (close fd, free params, stop proc).

Verification

Builds cleanly on main; guacd links successfully. Substance is identical to #662 (verified diff parity).

Worker processes are never terminated after a client disconnects when
the normal exit path is blocked, leaking one guacd child per affected
connection until the container is restarted.

The worker's main loop blocked indefinitely in recvmsg() waiting for new
user file descriptors on fd_socket and only woke up when a new user
joined, so it could not notice that the client had stopped and no further
users would ever arrive. Replace the blocking receive with a poll() that
wakes on a fixed idle interval (GUACD_PROC_IDLE_TIMEOUT_MS) and decides
whether to keep waiting or exit based on client state and connected user
count. After the last user disconnects a short reconnect grace period
(GUACD_PROC_RECONNECT_GRACE_MS) is honoured so a browser refresh or tab
re-open can rejoin the existing session, and an absolute safety net
(GUACD_PROC_MAX_IDLE_MS) force-exits a worker whose teardown is wedged
(e.g. guac_argv_await stuck in the FreeRDP authenticate callback).

The absolute safety net is gated on connected_users == 0 and idle_cycles
is reset while users are connected. The poll loop only observes new user
FDs on fd_socket; an already-connected single-user session generates no
new FDs, so without this gating a healthy active session accumulated idle
cycles and was force-terminated at the GUACD_PROC_MAX_IDLE_MS mark. The
safety net now measures true post-disconnect idle time and can never fire
on a live session, while still catching the blocked-teardown case.

If pthread_create() for a user thread fails, the connection is closed and
the worker is stopped so a failed thread launch cannot leak the worker.

Verified with a k6 soak (143s) and an interactive session (285s) that an
actively connected session survives well past GUACD_PROC_MAX_IDLE_MS with
no premature disconnect and no 'exceeded maximum idle time' warning, and
with a deterministic reproducer that holds one active single-user session
past the safety-net threshold.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant