fix(systemd): prevent PID 1 generator-phase userdb self-deadlock #17791
bfjelds wants to merge 1 commit into
Pull request overview
This PR adds a downstream patch to the systemd component to fix a PID 1 self-deadlock that can freeze early boot on Azure Linux 4. During the synchronous generator phase, PID 1's event loop is paused, so it cannot answer its own `io.systemd.DynamicUser` userdb Varlink service. A generator that performs an nss-systemd group/user lookup (forced by the AZL4 nsswitch config `group: files [SUCCESS=merge] systemd`) therefore blocks until the 90s generator timeout, after which PID 1 freezes. The fix sets `SYSTEMD_NSS_DYNAMIC_BYPASS=1` in the generator environment (`build_generator_environment()`), reusing the same bypass systemd already applies to the system D-Bus broker and `DynamicUser=` units.
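A minimal, standalone sketch of what the environment change amounts to, assuming the generator environment is an ordinary NULL-terminated `KEY=VALUE` array. It does not reproduce the patch or systemd's internal strv helpers: `env_with()`, `generator_env_sketch()` and `env_free()` are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array of heap-allocated strings. */
void env_free(char **env) {
    if (!env)
        return;
    for (size_t i = 0; env[i]; i++)
        free(env[i]);
    free(env);
}

/* Return a heap copy of the NULL-terminated array `env` with `entry`
 * ("KEY=VALUE") appended; any existing "KEY=" entry is dropped first.
 * Returns NULL on allocation failure. */
char **env_with(char *const *env, const char *entry) {
    const char *eq = strchr(entry, '=');
    assert(eq != NULL);
    size_t keylen = (size_t)(eq - entry) + 1;   /* "KEY=" including '=' */
    size_t n = 0;
    while (env && env[n])
        n++;
    char **out = calloc(n + 2, sizeof *out);    /* room for entry + NULL */
    if (!out)
        return NULL;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (strncmp(env[i], entry, keylen) == 0)
            continue;                           /* drop old value of KEY */
        if (!(out[j] = strdup(env[i]))) {
            env_free(out);                      /* out[j] is NULL, stops here */
            return NULL;
        }
        j++;
    }
    if (!(out[j] = strdup(entry))) {
        env_free(out);
        return NULL;
    }
    return out;
}

/* Hypothetical generator-phase environment: the base environment plus the
 * bypass that keeps nss-systemd away from PID 1's own userdb socket. */
char **generator_env_sketch(char *const *base) {
    return env_with(base, "SYSTEMD_NSS_DYNAMIC_BYPASS=1");
}
```

Replacing an existing `SYSTEMD_NSS_DYNAMIC_BYPASS=` entry instead of appending a second one keeps the result unambiguous, since `getenv()` typically returns the first match it finds.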
Changes:
- Adds a vendored patch `0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch` that exports `SYSTEMD_NSS_DYNAMIC_BYPASS=1` for generators.
- Wires the patch into `systemd.spec` with an unnumbered `Patch:` directive inside the `%if %{without upstream}` downstream-only block.
Key concern: The change hand-edits the rendered spec (`specs/s/systemd/systemd.spec`, which is generated by `azldev comp render` and carries a "Do not edit manually" banner) and drops the patch file directly into the rendered tree, instead of staging it via `base/comps/systemd/systemd.comp.toml` overlays (`file-add` + `spec-add-tag`). As committed, this will not survive a re-render and will fail the Check Rendered Specs CI check. The patch's technical analysis and C code are sound, and the `Patch:` style/placement and `%autosetup -C -p1` application are consistent with the surrounding spec; the issue is purely that the change was made in the wrong place. The PR is also marked Draft, with build/E2E validation still pending.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `specs/s/systemd/systemd.spec` | Registers the new `Patch:` directive, but edits the generated rendered spec directly instead of via a comp.toml overlay. |
| `specs/s/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch` | New vendored patch, but placed directly in the rendered specs tree instead of `base/comps/systemd/` + `file-add` overlay. |
Azure Linux 4 systems can freeze during early boot, immediately after
switch-root, with PID 1 logging:
Failed to fork off sandboxing environment for executing generators: Protocol error
Freezing execution
This was observed reliably as a boot hang in Trident A/B update rollback
tests on AZL4, but the trigger is general and not specific to Trident.
Root cause:
PID 1 runs system generators synchronously - manager_startup() ->
manager_run_generators() -> manager_execute_generators() ->
execute_directories() forks a generator executor and blocks in
pidref_wait_for_terminate_and_check() until the batch finishes or the 90s
DEFAULT_TIMEOUT_USEC timeout fires. While blocked there PID 1's event loop
is not running, so PID 1 cannot answer its own Varlink IPC - including the
io.systemd.DynamicUser userdb service it exposes on
/run/systemd/userdb/io.systemd.DynamicUser.
If a generator performs an NSS group/user lookup routed through
nss-systemd, the lookup connects to that socket and blocks waiting for a
reply that cannot arrive until generators complete: a self-deadlock broken
only by the generator timeout, after which PID 1 freezes.
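The blocking pattern above can be reproduced in miniature with plain POSIX calls. This is a toy model, not systemd code: a parent standing in for PID 1 forks a child standing in for a generator and then blocks in `waitpid()`; the child sends a request over a socketpair (a stand-in for the `io.systemd.DynamicUser` socket) and waits for a reply that only the blocked parent could send. The names `run_blocking_demo()` and `generator_lookup()` are hypothetical, and the child's `poll()` timeout plays the role of the 90s `DEFAULT_TIMEOUT_USEC`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* "Generator" side: send a request over fd, then wait up to timeout_ms
 * for a reply. Returns 0 if a reply arrived, 1 on timeout, -1 on error. */
static int generator_lookup(int fd, int timeout_ms) {
    static const char req[] = "GetMemberships";
    if (write(fd, req, sizeof req) < 0)
        return -1;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);
    if (r < 0)
        return -1;
    return r == 0 ? 1 : 0;
}

/* "Manager" side: fork the generator and block in waitpid() without ever
 * reading its own end of the socket, much as PID 1 does not run its event
 * loop while generators execute. Returns the generator's result. */
int run_blocking_demo(int timeout_ms) {
    assert(timeout_ms >= 0);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: the "generator" */
        close(sv[0]);
        int r = generator_lookup(sv[1], timeout_ms);
        _exit(r < 0 ? 2 : r);
    }
    close(sv[1]);                       /* parent: the "manager" */
    int status;
    pid_t w = waitpid(pid, &status, 0); /* nothing services sv[0] meanwhile */
    close(sv[0]);
    if (w < 0 || !WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

With a 100 ms timeout, `run_blocking_demo(100)` returns 1: the lookup ends only because its own timer expires. If the parent instead read the request and wrote a reply before calling `waitpid()`, the child would see data and return 0, which is roughly what an event-loop-driven PID 1 would normally provide.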
AZL4 hits this because its authselect-managed nsswitch.conf group line is
"files [SUCCESS=merge] systemd". [SUCCESS=merge] forces nss-systemd to be
consulted to merge memberships even when the group already exists in
/etc/group, so even a lookup of a static group such as systemd-network
results in a Varlink call into the busy PID 1. AZL3 is unaffected because
its active nsswitch group line is plain "files" (no systemd), so the lookup
reads /etc/group directly without any Varlink round-trip.
Evidence (strace of the netplan system generator at the hang):
connect(4, {AF_UNIX, "/run/systemd/userdb/io.systemd.DynamicUser"}, 45)
sendto(4, {"method":"io.systemd.UserDatabase.GetMemberships",
"parameters":{"groupName":"systemd-network",
"service":"io.systemd.DynamicUser"}, ...})
epoll_wait(5, ...) <- blocks until SIGALRM at the 90s timeout
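The `sendto()` payload in the trace is a Varlink call: a JSON object terminated by a NUL byte on an AF_UNIX stream socket. The sketch below formats the request only as far as the trace shows it (the elided remainder of the call is not reproduced). `format_get_memberships()` is a hypothetical name, and it does no JSON escaping, so it assumes plain group and service names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the GetMemberships call seen in the trace into buf. Returns the
 * number of bytes to send (JSON text plus the NUL that terminates a
 * Varlink message), or 0 if buf is too small or formatting failed. */
size_t format_get_memberships(char *buf, size_t size,
                              const char *group, const char *service) {
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size,
                     "{\"method\":\"io.systemd.UserDatabase.GetMemberships\","
                     "\"parameters\":{\"groupName\":\"%s\",\"service\":\"%s\"}}",
                     group, service);
    if (n < 0 || (size_t)n + 1 > size)
        return 0;          /* encoding error or truncated output */
    return (size_t)n + 1;  /* snprintf already wrote the NUL terminator */
}
```

After writing such a message, the client waits for a NUL-terminated reply; in the hang above that wait is the `epoll_wait()` that only `SIGALRM` interrupts.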
The bug is present in current upstream systemd (verified against main /
262~devel); the relevant code paths are unchanged from 258.
Fix:
systemd already exposes a $SYSTEMD_NSS_DYNAMIC_BYPASS environment variable
that nss-systemd honours (nss_glue_userdb_flags() adds
USERDB_EXCLUDE_DYNAMIC_USER, so the userdb client skips connecting to the
dynamic-user socket served by PID 1). It is already used inside PID 1 to
break exactly this class of deadlock: service.c sets EXEC_NSS_DYNAMIC_BYPASS
for the system D-Bus unit ("System D-Bus needs nss-systemd disabled, so that
we don't deadlock") and exec-invoke.c exports the same variable for
DynamicUser= units to avoid an nss-systemd feedback loop.
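The effect of the variable on the userdb client can be approximated as follows. In systemd, `nss_glue_userdb_flags()` adds `USERDB_EXCLUDE_DYNAMIC_USER` when the variable is set; this sketch uses a stand-in flag (`SKETCH_EXCLUDE_DYNAMIC_USER`), hypothetical functions `userdb_flags_sketch()` and `should_query()`, and a simplified boolean parser, so it illustrates the behaviour rather than reproducing the real code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Stand-in for systemd's USERDB_EXCLUDE_DYNAMIC_USER flag. */
#define SKETCH_EXCLUDE_DYNAMIC_USER (1u << 0)

/* Simplified boolean parsing of an environment variable; systemd's own
 * parser accepts more spellings than these. */
static bool env_is_true(const char *name) {
    const char *v = getenv(name);
    if (!v)
        return false;
    return strcmp(v, "1") == 0 || strcasecmp(v, "yes") == 0 ||
           strcasecmp(v, "true") == 0 || strcasecmp(v, "on") == 0;
}

/* Rough analogue of nss_glue_userdb_flags(): derive lookup flags from the
 * calling process's environment. */
unsigned userdb_flags_sketch(void) {
    unsigned flags = 0;
    if (env_is_true("SYSTEMD_NSS_DYNAMIC_BYPASS"))
        flags |= SKETCH_EXCLUDE_DYNAMIC_USER;
    return flags;
}

/* Would the client connect to the socket for `service`? Only the
 * dynamic-user service served by PID 1 is skipped when bypassing. */
bool should_query(const char *service, unsigned flags) {
    assert(service != NULL);
    if ((flags & SKETCH_EXCLUDE_DYNAMIC_USER) &&
        strcmp(service, "io.systemd.DynamicUser") == 0)
        return false;
    return true;
}
```

Because only one service is filtered out, other userdb sources keep answering, which matches the claim that static `files` records and already-running userdb services keep resolving.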
The patch applies the same, established mechanism to generators by setting
SYSTEMD_NSS_DYNAMIC_BYPASS=1 in the environment PID 1 builds for them
(build_generator_environment() in src/core/manager.c). Static "files"
records and any already-running userdb services keep resolving; only the
dynamic-user source is excluded, and only for the generator phase. This is
the same patch we intend to upstream; it is carried here as a downstream
patch until it lands in a systemd release.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fixes a PID 1 self-deadlock during the systemd generator phase that can freeze early boot on Azure Linux 4.
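Symptom (logged by PID 1 immediately after switch-root): `Failed to fork off sandboxing environment for executing generators: Protocol error`, followed by `Freezing execution`.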
This was observed reliably as a boot hang in Trident A/B update rollback testing on AZL4, but the trigger is general and not specific to Trident.
Root cause
PID 1 runs system generators synchronously:
`manager_startup()` -> `manager_run_generators()` -> `manager_execute_generators()` -> `execute_directories()` forks a generator executor and blocks until the batch finishes or the 90s `DEFAULT_TIMEOUT_USEC` timeout fires. While blocked there, PID 1's event loop is not running, so it cannot answer its own Varlink IPC, including the `io.systemd.DynamicUser` userdb service it exposes on `/run/systemd/userdb/io.systemd.DynamicUser`.
If a generator performs an NSS group/user lookup routed through `nss-systemd`, the lookup connects to that socket and blocks waiting for a reply that cannot arrive until generators complete: a self-deadlock broken only by the generator timeout, after which PID 1 freezes.
AZL4 hits this because its authselect-managed `nsswitch.conf` group line is `group: files [SUCCESS=merge] systemd`. `[SUCCESS=merge]` forces `nss-systemd` to be consulted (to merge memberships) even when the group already exists in `/etc/group`, so even a lookup of a static group such as `systemd-network` results in a Varlink call into the busy PID 1. AZL3 is unaffected because its active `nsswitch.conf` group line is plain `files` (no `systemd`), so the lookup reads `/etc/group` directly with no Varlink round-trip.
Evidence (strace of the netplan system generator at the hang): the generator connects to `/run/systemd/userdb/io.systemd.DynamicUser`, sends an `io.systemd.UserDatabase.GetMemberships` call for `groupName` `systemd-network`, and then blocks in `epoll_wait()` until `SIGALRM` at the 90s timeout.
The bug is present in current upstream systemd (verified against `main` / 262~devel); the relevant code paths are unchanged from 258.
Fix
systemd already exposes a `$SYSTEMD_NSS_DYNAMIC_BYPASS` environment variable that `nss-systemd` honours (`nss_glue_userdb_flags()` adds `USERDB_EXCLUDE_DYNAMIC_USER`, so the userdb client skips connecting to the dynamic-user socket served by PID 1). It is already used inside PID 1 to break exactly this class of deadlock: `service.c` sets `EXEC_NSS_DYNAMIC_BYPASS` for the system D-Bus unit ("System D-Bus needs nss-systemd disabled, so that we don't deadlock"), and `exec-invoke.c` turns that flag into `SYSTEMD_NSS_DYNAMIC_BYPASS=1` in the unit's environment. `exec-invoke.c` also exports the same variable for `DynamicUser=` units, to avoid an `nss-systemd` feedback loop.
This change applies the same established mechanism to generators, setting `SYSTEMD_NSS_DYNAMIC_BYPASS=1` in the environment PID 1 builds for them (`build_generator_environment()` in `src/core/manager.c`). Static `files` records and any already-running userdb services keep resolving; only the dynamic-user source is excluded, and only for the generator phase.
The patch is vendored as `specs/s/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch` and wired into `systemd.spec`. It is intended to be upstreamed; it is carried here as a downstream patch until it lands in a systemd release.
Testing