From cab88297858eb67c3c0863cc14717ed6f811669e Mon Sep 17 00:00:00 2001
From: Brian Fjeldstad
Date: Wed, 24 Jun 2026 01:50:03 +0000
Subject: [PATCH] fix(systemd): prevent PID 1 generator-phase userdb self-deadlock

Azure Linux 4 systems can freeze during early boot, immediately after
switch-root, with PID 1 logging:

    Failed to fork off sandboxing environment for executing generators: Protocol error
    Freezing execution

This was observed reliably as a boot hang in Trident A/B update rollback
tests on AZL4, but the trigger is general and not specific to Trident.

Root cause: PID 1 runs system generators synchronously -
manager_startup() -> manager_run_generators() ->
manager_execute_generators() -> execute_directories() forks a generator
executor and blocks in pidref_wait_for_terminate_and_check() until the
batch finishes or the 90s DEFAULT_TIMEOUT_USEC timeout fires. While
blocked there PID 1's event loop is not running, so PID 1 cannot answer
its own Varlink IPC - including the io.systemd.DynamicUser userdb service
it exposes on /run/systemd/userdb/io.systemd.DynamicUser. If a generator
performs an NSS group/user lookup routed through nss-systemd, the lookup
connects to that socket and blocks waiting for a reply that cannot arrive
until generators complete: a self-deadlock broken only by the generator
timeout, after which PID 1 freezes.

AZL4 hits this because its authselect-managed nsswitch.conf group line is
"files [SUCCESS=merge] systemd". [SUCCESS=merge] forces nss-systemd to be
consulted to merge memberships even when the group already exists in
/etc/group, so even a lookup of a static group such as systemd-network
results in a Varlink call into the busy PID 1. AZL3 is unaffected because
its active nsswitch group line is plain "files" (no systemd), so the
lookup reads /etc/group directly without any Varlink round-trip.
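
For illustration, the kind of lookup that triggers the hang can be
sketched in C as follows. This is a minimal sketch, not the netplan
generator's actual code; sketch_lookup_gid() is a hypothetical helper
name. It calls getgrnam_r(), which glibc resolves through the sources on
the nsswitch.conf group line, so with "files [SUCCESS=merge] systemd" the
call also reaches nss-systemd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not systemd or netplan code): resolve a group name
 * to its GID through NSS. Returns true and stores the GID when the group
 * is found, false when it is not found or the lookup fails. */
static bool sketch_lookup_gid(const char *name, gid_t *ret_gid) {
        struct group grp, *result = NULL;
        long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
        size_t size = hint > 0 ? (size_t) hint : 4096;
        char *buf = NULL;
        bool found;
        int r;

        assert(name);
        assert(ret_gid);

        for (;;) {
                char *nbuf = realloc(buf, size);
                if (!nbuf) {
                        free(buf);
                        return false;
                }
                buf = nbuf;

                /* This is the call that walks the nsswitch.conf sources. */
                r = getgrnam_r(name, &grp, buf, size, &result);
                if (r != ERANGE)
                        break;

                size *= 2; /* Buffer too small for this record, retry. */
        }

        found = r == 0 && result;
        if (found)
                *ret_gid = result->gr_gid;

        free(buf);
        return found;
}
```

A generator calling sketch_lookup_gid("systemd-network", &gid) during
the generator phase is the pattern described above: the call itself is
ordinary, and whether it blocks depends only on the nsswitch.conf group
line and on whether PID 1 can service its userdb socket at that moment.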

Evidence (strace of the netplan system generator at the hang):

    connect(4, {AF_UNIX, "/run/systemd/userdb/io.systemd.DynamicUser"}, 45)
    sendto(4, {"method":"io.systemd.UserDatabase.GetMemberships",
               "parameters":{"groupName":"systemd-network",
               "service":"io.systemd.DynamicUser"}, ...})
    epoll_wait(5, ...) <- blocks until SIGALRM at the 90s timeout

The bug is present in current upstream systemd (verified against main /
262~devel); the relevant code paths are unchanged from 258.

Fix: systemd already exposes a $SYSTEMD_NSS_DYNAMIC_BYPASS environment
variable that nss-systemd honours (nss_glue_userdb_flags() adds
USERDB_EXCLUDE_DYNAMIC_USER, so the userdb client skips connecting to the
dynamic-user socket served by PID 1). It is already used inside PID 1 to
break exactly this class of deadlock: service.c sets
EXEC_NSS_DYNAMIC_BYPASS for the system D-Bus unit ("System D-Bus needs
nss-systemd disabled, so that we don't deadlock") and exec-invoke.c
exports the same variable for DynamicUser= units to avoid an nss-systemd
feedback loop. The patch applies the same, established mechanism to
generators by setting SYSTEMD_NSS_DYNAMIC_BYPASS=1 in the environment
PID 1 builds for them (build_generator_environment() in
src/core/manager.c). Static "files" records and any already-running
userdb services keep resolving; only the dynamic-user source is excluded,
and only for the generator phase.

This is the same patch we intend to upstream; it is carried here as a
downstream patch until it lands in a systemd release.
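
The client-side effect of the variable can be modelled roughly as below.
This is a simplified sketch under stated assumptions, not the nss-systemd
source: every sketch_* name is hypothetical, SKETCH_EXCLUDE_DYNAMIC_USER
merely stands in for USERDB_EXCLUDE_DYNAMIC_USER, and the accepted
boolean spellings are an assumption made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical stand-in for USERDB_EXCLUDE_DYNAMIC_USER; value is illustrative. */
enum { SKETCH_EXCLUDE_DYNAMIC_USER = 1 << 0 };

/* Assumption: treat a few common spellings as "true". */
static bool sketch_is_true(const char *s) {
        static const char *const yes[] = { "1", "yes", "true", "on" };

        assert(s);

        for (size_t i = 0; i < sizeof(yes) / sizeof(yes[0]); i++)
                if (strcasecmp(s, yes[i]) == 0)
                        return true;
        return false;
}

/* Model of the flag computation: the bypass variable in the caller's
 * environment turns into an "exclude the dynamic-user source" flag. */
static unsigned sketch_userdb_flags(void) {
        const char *e = getenv("SYSTEMD_NSS_DYNAMIC_BYPASS");
        unsigned flags = 0;

        if (e && sketch_is_true(e))
                flags |= SKETCH_EXCLUDE_DYNAMIC_USER;
        return flags;
}

/* With the flag set, the lookup never connects to the socket served by
 * PID 1, so it cannot block on a PID 1 that is waiting for generators. */
static bool sketch_should_query_pid1(void) {
        return !(sketch_userdb_flags() & SKETCH_EXCLUDE_DYNAMIC_USER);
}
```

Because the decision is made from the environment of the process doing
the lookup, exporting the variable only in the generator environment
leaves lookups in every other process unchanged.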

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 ...-generators-deadlock-on-PID-1-userdb.patch | 118 ++++++++++++++++++
 base/comps/systemd/systemd.comp.toml          |   5 +
 locks/systemd.lock                            |   2 +-
 ...-generators-deadlock-on-PID-1-userdb.patch | 118 ++++++++++++++++++
 specs/s/systemd/systemd.spec                  |   6 +-
 5 files changed, 247 insertions(+), 2 deletions(-)
 create mode 100644 base/comps/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch
 create mode 100644 specs/s/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch

diff --git a/base/comps/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch b/base/comps/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch
new file mode 100644
index 00000000000..759f3a8d0de
--- /dev/null
+++ b/base/comps/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch
@@ -0,0 +1,118 @@
+From 150e75217f151e1b4d3ef01f089cef23c79e8228 Mon Sep 17 00:00:00 2001
+From: Brian Fjeldstad
+Date: Wed, 24 Jun 2026 01:39:29 +0000
+Subject: [PATCH] core: don't let generators deadlock on PID 1's own userdb
+ service
+
+When PID 1 runs system generators it does so synchronously, from
+manager_startup() -> manager_run_generators() -> manager_execute_generators()
+-> execute_directories(), which forks a generator executor and blocks in
+pidref_wait_for_terminate_and_check() until the whole batch finishes or the
+DEFAULT_TIMEOUT_USEC (90s) timeout fires. While blocked there PID 1's event
+loop does not run, so PID 1 cannot answer any of its own Varlink IPC -
+including the io.systemd.DynamicUser userdb service it listens on at
+/run/systemd/userdb/io.systemd.DynamicUser (see src/core/varlink.c).
+
+If a generator performs an NSS user/group lookup that is routed through
+nss-systemd, the lookup connects to that socket and waits for a reply that
+cannot arrive until generators complete. The result is a self-deadlock: the
+generator blocks for the full timeout, the executor is then killed, and
+PID 1 logs
+
+    Failed to fork off sandboxing environment for executing generators: Protocol error
+    Freezing execution
+
+immediately after switch-root.
+
+This is easy to hit with the shipped factory nsswitch.conf, whose group line
+is
+
+    group: files [SUCCESS=merge] systemd
+
+The [SUCCESS=merge] action causes nss-systemd to be consulted (to merge
+group memberships) even when the group already exists in /etc/group, so even
+a lookup of a purely static group such as systemd-network results in a
+GetGroupRecord/GetMemberships Varlink call into the busy PID 1.
+
+Reproducer:
+ - Boot a system whose nsswitch.conf group line is
+   "files [SUCCESS=merge] systemd" (the systemd factory default).
+ - Have a system generator resolve a group during early boot, e.g. the
+   netplan generator (/usr/lib/systemd/system-generators/netplan), which
+   looks up the "systemd-network" group.
+ - On a sufficiently slow machine the generator's lookup connects to
+   /run/systemd/userdb/io.systemd.DynamicUser and blocks; at the generator
+   timeout the manager freezes.
+
+strace of the wrapped generator shows the final syscalls:
+
+    connect(4, {AF_UNIX, "/run/systemd/userdb/io.systemd.DynamicUser"}, 45)
+    sendto(4, {"method":"io.systemd.UserDatabase.GetMemberships",
+               "parameters":{"groupName":"systemd-network",
+               "service":"io.systemd.DynamicUser"}, ...})
+    epoll_wait(5, ...) <- blocks until SIGALRM
+
+We already have machinery for exactly this situation. nss-systemd honours a
+$SYSTEMD_NSS_DYNAMIC_BYPASS environment variable: when set,
+nss_glue_userdb_flags() adds USERDB_EXCLUDE_DYNAMIC_USER so the userdb client
+skips connecting to /run/systemd/userdb/io.systemd.DynamicUser entirely. It
+is already used in two places to avoid deadlocking against / looping back
+into PID 1:
+
+ - For the system D-Bus broker: service.c sets EXEC_NSS_DYNAMIC_BYPASS for
+   the dbus unit ("System D-Bus needs nss-systemd disabled, so that we don't
+   deadlock") and exec-invoke.c then exports SYSTEMD_NSS_DYNAMIC_BYPASS=1 in
+   its environment, because nss-systemd relies on blocking Varlink calls back
+   to a PID 1 that is itself waiting on D-Bus.
+ - For DynamicUser= units: exec-invoke.c exports the same variable while
+   PID 1 sets the unit up, to avoid an nss-systemd feedback loop with
+   ourselves.
+
+Generators are the same class of problem - code that PID 1 must wait on,
+performing NSS lookups that route back to a PID 1 which cannot answer - so
+apply the same bypass for them: set $SYSTEMD_NSS_DYNAMIC_BYPASS=1 in the
+environment built by build_generator_environment(). Static records (nss
+"files") and any userdb services that happen to already be running continue
+to resolve; only the dynamic-user source provided by the still-busy PID 1 is
+skipped, and only for the duration of the generator phase.
+
+env-generators run even earlier and share the same exposure, but do not
+perform NSS lookups in practice, so they are left unchanged here.
+---
+ src/core/manager.c | 20 ++++++++++++++++++++
+ 1 file changed, 20 insertions(+)
+
+diff --git a/src/core/manager.c b/src/core/manager.c
+index 015f575ac1..ffa56cc05c 100644
+--- a/src/core/manager.c
++++ b/src/core/manager.c
+@@ -4362,6 +4362,26 @@ static int build_generator_environment(Manager *m, char ***ret) {
+         if (r < 0)
+                 return r;
+ 
++        /* PID 1 runs generators synchronously: while blocked inside manager_run_generators() its event loop
++         * is not running, so it cannot answer its own Varlink IPC - including the io.systemd.DynamicUser
++         * userdb service it exposes on /run/systemd/userdb/. If a generator performs an NSS lookup that
++         * resolves through nss-systemd (e.g. getgrnam() for a group with the "[SUCCESS=merge] systemd"
++         * nsswitch.conf default), the lookup connects to that socket and blocks waiting for a reply that
++         * cannot arrive until the generators finish - a self-deadlock broken only by the generator timeout
++         * (DEFAULT_TIMEOUT_USEC), after which the manager freezes.
++         *
++         * This is the same class of deadlock we already break for the system D-Bus broker, which nss-systemd
++         * depends on yet PID 1 must wait for: service.c sets EXEC_NSS_DYNAMIC_BYPASS for the dbus unit
++         * ("System D-Bus needs nss-systemd disabled, so that we don't deadlock") and exec-invoke.c then
++         * exports SYSTEMD_NSS_DYNAMIC_BYPASS=1 in its environment; the same variable is also set in
++         * exec-invoke.c when PID 1 sets up a DynamicUser= unit, to avoid an nss-systemd feedback loop. Apply
++         * the same bypass for generators. nss-systemd honours the variable (nss_glue_userdb_flags() ->
++         * USERDB_EXCLUDE_DYNAMIC_USER) and skips the dynamic-user source served by the busy PID 1; static
++         * "files" records and any already-running userdb services still resolve. */
++        r = strv_env_assign(&nl, "SYSTEMD_NSS_DYNAMIC_BYPASS", "1");
++        if (r < 0)
++                return r;
++
+         *ret = TAKE_PTR(nl);
+         return 0;
+ }
+-- 
+2.49.0
+
diff --git a/base/comps/systemd/systemd.comp.toml b/base/comps/systemd/systemd.comp.toml
index 25259e34c6d..bb549cc10ca 100644
--- a/base/comps/systemd/systemd.comp.toml
+++ b/base/comps/systemd/systemd.comp.toml
@@ -52,3 +52,8 @@ description = "Remove build params reducing default timeout to 45s; leaving upst
 type = "spec-search-replace"
 regex = '-Ddefault-user-timeout-sec=45'
 replacement = "#-Ddefault-user-timeout-sec=45"
+
+[[components.systemd.overlays]]
+description = "AzureLinux: avoid PID 1 self-deadlock during the generator phase (nss-systemd userdb lookup). Seen as a boot freeze on AZL4 during Trident A/B rollback. Submitted upstream."
+type = "patch-add"
+source = "0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch"
diff --git a/locks/systemd.lock b/locks/systemd.lock
index 8ced0771134..1cfde462504 100644
--- a/locks/systemd.lock
+++ b/locks/systemd.lock
@@ -2,5 +2,5 @@ version = 1
 
 import-commit = '5218dd0c26aa860bf163e326fa9733c03e8b381f'
 upstream-commit = '5218dd0c26aa860bf163e326fa9733c03e8b381f'
-input-fingerprint = 'sha256:4b25a8183b9190b878b7d06e43d90f95eb92df4c849ef6a107ba5403a51f7c23'
+input-fingerprint = 'sha256:23e03a08db64d0457b033c07148a2a71cf98e2b3af4b4b23cedd9455595772f6'
 resolution-input-hash = 'sha256:466421704711c4fd3c71f0b2ed715a0e61d49e3e26f3a2637fee755795849c8e'
diff --git a/specs/s/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch b/specs/s/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch
new file mode 100644
index 00000000000..759f3a8d0de
--- /dev/null
+++ b/specs/s/systemd/0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch
@@ -0,0 +1,118 @@
+From 150e75217f151e1b4d3ef01f089cef23c79e8228 Mon Sep 17 00:00:00 2001
+From: Brian Fjeldstad
+Date: Wed, 24 Jun 2026 01:39:29 +0000
+Subject: [PATCH] core: don't let generators deadlock on PID 1's own userdb
+ service
+
+When PID 1 runs system generators it does so synchronously, from
+manager_startup() -> manager_run_generators() -> manager_execute_generators()
+-> execute_directories(), which forks a generator executor and blocks in
+pidref_wait_for_terminate_and_check() until the whole batch finishes or the
+DEFAULT_TIMEOUT_USEC (90s) timeout fires. While blocked there PID 1's event
+loop does not run, so PID 1 cannot answer any of its own Varlink IPC -
+including the io.systemd.DynamicUser userdb service it listens on at
+/run/systemd/userdb/io.systemd.DynamicUser (see src/core/varlink.c).
+
+If a generator performs an NSS user/group lookup that is routed through
+nss-systemd, the lookup connects to that socket and waits for a reply that
+cannot arrive until generators complete. The result is a self-deadlock: the
+generator blocks for the full timeout, the executor is then killed, and
+PID 1 logs
+
+    Failed to fork off sandboxing environment for executing generators: Protocol error
+    Freezing execution
+
+immediately after switch-root.
+
+This is easy to hit with the shipped factory nsswitch.conf, whose group line
+is
+
+    group: files [SUCCESS=merge] systemd
+
+The [SUCCESS=merge] action causes nss-systemd to be consulted (to merge
+group memberships) even when the group already exists in /etc/group, so even
+a lookup of a purely static group such as systemd-network results in a
+GetGroupRecord/GetMemberships Varlink call into the busy PID 1.
+
+Reproducer:
+ - Boot a system whose nsswitch.conf group line is
+   "files [SUCCESS=merge] systemd" (the systemd factory default).
+ - Have a system generator resolve a group during early boot, e.g. the
+   netplan generator (/usr/lib/systemd/system-generators/netplan), which
+   looks up the "systemd-network" group.
+ - On a sufficiently slow machine the generator's lookup connects to
+   /run/systemd/userdb/io.systemd.DynamicUser and blocks; at the generator
+   timeout the manager freezes.
+
+strace of the wrapped generator shows the final syscalls:
+
+    connect(4, {AF_UNIX, "/run/systemd/userdb/io.systemd.DynamicUser"}, 45)
+    sendto(4, {"method":"io.systemd.UserDatabase.GetMemberships",
+               "parameters":{"groupName":"systemd-network",
+               "service":"io.systemd.DynamicUser"}, ...})
+    epoll_wait(5, ...) <- blocks until SIGALRM
+
+We already have machinery for exactly this situation. nss-systemd honours a
+$SYSTEMD_NSS_DYNAMIC_BYPASS environment variable: when set,
+nss_glue_userdb_flags() adds USERDB_EXCLUDE_DYNAMIC_USER so the userdb client
+skips connecting to /run/systemd/userdb/io.systemd.DynamicUser entirely. It
+is already used in two places to avoid deadlocking against / looping back
+into PID 1:
+
+ - For the system D-Bus broker: service.c sets EXEC_NSS_DYNAMIC_BYPASS for
+   the dbus unit ("System D-Bus needs nss-systemd disabled, so that we don't
+   deadlock") and exec-invoke.c then exports SYSTEMD_NSS_DYNAMIC_BYPASS=1 in
+   its environment, because nss-systemd relies on blocking Varlink calls back
+   to a PID 1 that is itself waiting on D-Bus.
+ - For DynamicUser= units: exec-invoke.c exports the same variable while
+   PID 1 sets the unit up, to avoid an nss-systemd feedback loop with
+   ourselves.
+
+Generators are the same class of problem - code that PID 1 must wait on,
+performing NSS lookups that route back to a PID 1 which cannot answer - so
+apply the same bypass for them: set $SYSTEMD_NSS_DYNAMIC_BYPASS=1 in the
+environment built by build_generator_environment(). Static records (nss
+"files") and any userdb services that happen to already be running continue
+to resolve; only the dynamic-user source provided by the still-busy PID 1 is
+skipped, and only for the duration of the generator phase.
+
+env-generators run even earlier and share the same exposure, but do not
+perform NSS lookups in practice, so they are left unchanged here.
+---
+ src/core/manager.c | 20 ++++++++++++++++++++
+ 1 file changed, 20 insertions(+)
+
+diff --git a/src/core/manager.c b/src/core/manager.c
+index 015f575ac1..ffa56cc05c 100644
+--- a/src/core/manager.c
++++ b/src/core/manager.c
+@@ -4362,6 +4362,26 @@ static int build_generator_environment(Manager *m, char ***ret) {
+         if (r < 0)
+                 return r;
+ 
++        /* PID 1 runs generators synchronously: while blocked inside manager_run_generators() its event loop
++         * is not running, so it cannot answer its own Varlink IPC - including the io.systemd.DynamicUser
++         * userdb service it exposes on /run/systemd/userdb/. If a generator performs an NSS lookup that
++         * resolves through nss-systemd (e.g. getgrnam() for a group with the "[SUCCESS=merge] systemd"
++         * nsswitch.conf default), the lookup connects to that socket and blocks waiting for a reply that
++         * cannot arrive until the generators finish - a self-deadlock broken only by the generator timeout
++         * (DEFAULT_TIMEOUT_USEC), after which the manager freezes.
++         *
++         * This is the same class of deadlock we already break for the system D-Bus broker, which nss-systemd
++         * depends on yet PID 1 must wait for: service.c sets EXEC_NSS_DYNAMIC_BYPASS for the dbus unit
++         * ("System D-Bus needs nss-systemd disabled, so that we don't deadlock") and exec-invoke.c then
++         * exports SYSTEMD_NSS_DYNAMIC_BYPASS=1 in its environment; the same variable is also set in
++         * exec-invoke.c when PID 1 sets up a DynamicUser= unit, to avoid an nss-systemd feedback loop. Apply
++         * the same bypass for generators. nss-systemd honours the variable (nss_glue_userdb_flags() ->
++         * USERDB_EXCLUDE_DYNAMIC_USER) and skips the dynamic-user source served by the busy PID 1; static
++         * "files" records and any already-running userdb services still resolve. */
++        r = strv_env_assign(&nl, "SYSTEMD_NSS_DYNAMIC_BYPASS", "1");
++        if (r < 0)
++                return r;
++
+         *ret = TAKE_PTR(nl);
+         return 0;
+ }
+-- 
+2.49.0
+
diff --git a/specs/s/systemd/systemd.spec b/specs/s/systemd/systemd.spec
index 31b54a1ad55..eac806f7402 100644
--- a/specs/s/systemd/systemd.spec
+++ b/specs/s/systemd/systemd.spec
@@ -2,7 +2,7 @@
 ## (rpmautospec version 0.8.3)
 ## RPMAUTOSPEC: autorelease, autochangelog
 %define autorelease(e:s:pb:n) %{?-p:0.}%{lua:
-    release_number = 4;
+    release_number = 5;
     base_release_number = tonumber(rpm.expand("%{?-b*}%{!?-b:1}"));
     print(release_number + base_release_number - 1);
 }%{?-e:.%{-e*}}%{?-s:.%{-s*}}%{!?-n:%{?dist}}
@@ -390,6 +390,7 @@ Recommends: libkmod.so.2(LIBKMOD_5)%{?elf_bits}
 Recommends: libarchive.so.13%{?elf_suffix}
 
 
+Patch7: 0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch
 %description
 systemd is a system and service manager that runs as PID 1 and starts the rest
 of the system. It provides aggressive parallelization capabilities, uses socket
@@ -1577,6 +1578,9 @@ rm -rf \
 
 %changelog
 ## START: Generated by rpmautospec
+* Wed Jun 24 2026 Brian Fjeldstad - 258.4-5
+- fix(systemd): prevent PID 1 generator-phase userdb self-deadlock
+
 * Tue May 12 2026 Dan Streetman - 258.4-4
 - fix(systemd): restore default service/device timeout to upstream default of 90s