@@ -0,0 +1,118 @@
From 150e75217f151e1b4d3ef01f089cef23c79e8228 Mon Sep 17 00:00:00 2001
From: Brian Fjeldstad <bfjelds@microsoft.com>
Date: Wed, 24 Jun 2026 01:39:29 +0000
Subject: [PATCH] core: don't let generators deadlock on PID 1's own userdb
service

When PID 1 runs system generators it does so synchronously, from
manager_startup() -> manager_run_generators() -> manager_execute_generators()
-> execute_directories(), which forks a generator executor and blocks in
pidref_wait_for_terminate_and_check() until the whole batch finishes or the
DEFAULT_TIMEOUT_USEC (90s) timeout fires. While blocked there PID 1's event
loop does not run, so PID 1 cannot answer any of its own Varlink IPC -
including the io.systemd.DynamicUser userdb service it listens on at
/run/systemd/userdb/io.systemd.DynamicUser (see src/core/varlink.c).

If a generator performs an NSS user/group lookup that is routed through
nss-systemd, the lookup connects to that socket and waits for a reply that
cannot arrive until generators complete. The result is a self-deadlock: the
generator blocks for the full timeout, the executor is then killed, and
PID 1 logs

    Failed to fork off sandboxing environment for executing generators: Protocol error
    Freezing execution

immediately after switch-root.

This is easy to hit with the shipped factory nsswitch.conf, whose group line
is

    group: files [SUCCESS=merge] systemd

The [SUCCESS=merge] action causes nss-systemd to be consulted (to merge
group memberships) even when the group already exists in /etc/group, so even
a lookup of a purely static group such as systemd-network results in a
GetGroupRecord/GetMemberships Varlink call into the busy PID 1.
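
For illustration only, the kind of lookup that triggers this path can be sketched in plain C. This is not taken from any real generator: `lookup_gid()` is a hypothetical helper, and the assumption is simply that a generator resolves a group name through the standard NSS entry point getgrnam_r(). With the nsswitch.conf line above, that call is where nss-systemd gets consulted.

```c
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not systemd or netplan code): resolve a group name to
 * its GID through NSS. Returns 0 and stores the GID on success, 1 if the
 * group does not exist, or a negative errno-style value on failure. */
static int lookup_gid(const char *name, gid_t *ret) {
        assert(name && ret);

        size_t len = 1024;
        char *buf = NULL;

        for (;;) {
                char *nbuf = realloc(buf, len);
                if (!nbuf) {
                        free(buf);
                        return -ENOMEM;
                }
                buf = nbuf;

                struct group grp, *result = NULL;
                /* With "files [SUCCESS=merge] systemd" this call may reach
                 * nss-systemd even if the group is found in /etc/group. */
                int r = getgrnam_r(name, &grp, buf, len, &result);
                if (r == ERANGE) {
                        /* Buffer too small for the record: retry with more room. */
                        len *= 2;
                        continue;
                }

                int status;
                if (result) {
                        *ret = grp.gr_gid;
                        status = 0;
                } else if (r == 0 || r == ENOENT || r == ESRCH)
                        status = 1;
                else
                        status = -r;

                free(buf);
                return status;
        }
}
```

A caller would use it as `lookup_gid("systemd-network", &gid)`; during the generator phase described here, that call is the one that would block inside the NSS module.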

Reproducer:
- Boot a system whose nsswitch.conf group line is
  "files [SUCCESS=merge] systemd" (the systemd factory default).
- Have a system generator resolve a group during early boot, e.g. the
  netplan generator (/usr/lib/systemd/system-generators/netplan), which
  looks up the "systemd-network" group.
- On a sufficiently slow machine the generator's lookup connects to
  /run/systemd/userdb/io.systemd.DynamicUser and blocks; at the generator
  timeout the manager freezes.

strace of the wrapped generator shows the final syscalls:

    connect(4, {AF_UNIX, "/run/systemd/userdb/io.systemd.DynamicUser"}, 45)
    sendto(4, {"method":"io.systemd.UserDatabase.GetMemberships",
               "parameters":{"groupName":"systemd-network",
                             "service":"io.systemd.DynamicUser"}, ...})
    epoll_wait(5, ...)   <- blocks until SIGALRM

We already have machinery for exactly this situation. nss-systemd honours a
$SYSTEMD_NSS_DYNAMIC_BYPASS environment variable: when set,
nss_glue_userdb_flags() adds USERDB_EXCLUDE_DYNAMIC_USER so the userdb client
skips connecting to /run/systemd/userdb/io.systemd.DynamicUser entirely. It
is already used in two places to avoid deadlocking against / looping back
into PID 1:

- For the system D-Bus broker: service.c sets EXEC_NSS_DYNAMIC_BYPASS for
  the dbus unit ("System D-Bus needs nss-systemd disabled, so that we don't
  deadlock") and exec-invoke.c then exports SYSTEMD_NSS_DYNAMIC_BYPASS=1 in
  its environment, because nss-systemd relies on blocking Varlink calls back
  to a PID 1 that is itself waiting on D-Bus.
- For DynamicUser= units: exec-invoke.c exports the same variable while
  PID 1 sets the unit up, to avoid an nss-systemd feedback loop with
  ourselves.

Generators are the same class of problem - code that PID 1 must wait on,
performing NSS lookups that route back to a PID 1 which cannot answer - so
apply the same bypass for them: set $SYSTEMD_NSS_DYNAMIC_BYPASS=1 in the
environment built by build_generator_environment(). Static records (nss
"files") and any userdb services that happen to already be running continue
to resolve; only the dynamic-user source provided by the still-busy PID 1 is
skipped, and only for the duration of the generator phase.
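
As a rough illustration of the assignment step (not systemd's actual implementation), the sketch below sets NAME=VALUE in a NULL-terminated environment vector in plain C. `env_assign()` and `env_get()` are hypothetical stand-ins written for this example; the patch itself calls systemd's internal strv_env_assign().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strv_env_assign(): set NAME=VALUE in a
 * NULL-terminated, heap-allocated environment vector, replacing an existing
 * entry for NAME if there is one. Returns 0 on success, -ENOMEM on failure. */
static int env_assign(char ***env, const char *name, const char *value) {
        assert(env && name && value);

        size_t nlen = strlen(name), size = nlen + strlen(value) + 2, n = 0;
        char *entry = malloc(size);
        if (!entry)
                return -ENOMEM;
        snprintf(entry, size, "%s=%s", name, value);

        /* Replace in place if NAME is already present; count entries as we go. */
        for (char **p = *env; p && *p; p++, n++)
                if (strncmp(*p, name, nlen) == 0 && (*p)[nlen] == '=') {
                        free(*p);
                        *p = entry;
                        return 0;
                }

        /* Otherwise append, keeping the vector NULL-terminated. */
        char **grown = realloc(*env, (n + 2) * sizeof(char *));
        if (!grown) {
                free(entry);
                return -ENOMEM;
        }
        grown[n] = entry;
        grown[n + 1] = NULL;
        *env = grown;
        return 0;
}

/* Return the value stored for NAME, or NULL if NAME is not set. */
static const char *env_get(char **env, const char *name) {
        size_t nlen = strlen(name);

        for (char **p = env; p && *p; p++)
                if (strncmp(*p, name, nlen) == 0 && (*p)[nlen] == '=')
                        return *p + nlen + 1;
        return NULL;
}
```

Starting from an empty vector (`char **env = NULL;`), calling `env_assign(&env, "SYSTEMD_NSS_DYNAMIC_BYPASS", "1")` yields a vector that could be handed to a child process as its environment. Assigning the same name again replaces the entry rather than duplicating it.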

env-generators run even earlier and share the same exposure, but do not
perform NSS lookups in practice, so they are left unchanged here.
---
src/core/manager.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/src/core/manager.c b/src/core/manager.c
index 015f575ac1..ffa56cc05c 100644
--- a/src/core/manager.c
+++ b/src/core/manager.c
@@ -4362,6 +4362,26 @@ static int build_generator_environment(Manager *m, char ***ret) {
         if (r < 0)
                 return r;

+        /* PID 1 runs generators synchronously: while blocked inside manager_run_generators() its event loop
+         * is not running, so it cannot answer its own Varlink IPC - including the io.systemd.DynamicUser
+         * userdb service it exposes on /run/systemd/userdb/. If a generator performs an NSS lookup that
+         * resolves through nss-systemd (e.g. getgrnam() for a group with the "[SUCCESS=merge] systemd"
+         * nsswitch.conf default), the lookup connects to that socket and blocks waiting for a reply that
+         * cannot arrive until the generators finish - a self-deadlock broken only by the generator timeout
+         * (DEFAULT_TIMEOUT_USEC), after which the manager freezes.
+         *
+         * This is the same class of deadlock we already break for the system D-Bus broker, which nss-systemd
+         * depends on yet PID 1 must wait for: service.c sets EXEC_NSS_DYNAMIC_BYPASS for the dbus unit
+         * ("System D-Bus needs nss-systemd disabled, so that we don't deadlock") and exec-invoke.c then
+         * exports SYSTEMD_NSS_DYNAMIC_BYPASS=1 in its environment; the same variable is also set in
+         * exec-invoke.c when PID 1 sets up a DynamicUser= unit, to avoid an nss-systemd feedback loop. Apply
+         * the same bypass for generators. nss-systemd honours the variable (nss_glue_userdb_flags() ->
+         * USERDB_EXCLUDE_DYNAMIC_USER) and skips the dynamic-user source served by the busy PID 1; static
+         * "files" records and any already-running userdb services still resolve. */
+        r = strv_env_assign(&nl, "SYSTEMD_NSS_DYNAMIC_BYPASS", "1");
+        if (r < 0)
+                return r;
+
         *ret = TAKE_PTR(nl);
         return 0;
 }
--
2.49.0

5 changes: 5 additions & 0 deletions base/comps/systemd/systemd.comp.toml
@@ -52,3 +52,8 @@ description = "Remove build params reducing default timeout to 45s; leaving upst
type = "spec-search-replace"
regex = '-Ddefault-user-timeout-sec=45'
replacement = "#-Ddefault-user-timeout-sec=45"

+[[components.systemd.overlays]]
+description = "AzureLinux: avoid PID 1 self-deadlock during the generator phase (nss-systemd userdb lookup). Seen as a boot freeze on AZL4 during Trident A/B rollback. Submitted upstream."
+type = "patch-add"
+source = "0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch"
2 changes: 1 addition & 1 deletion locks/systemd.lock
@@ -2,5 +2,5 @@
version = 1
import-commit = '5218dd0c26aa860bf163e326fa9733c03e8b381f'
upstream-commit = '5218dd0c26aa860bf163e326fa9733c03e8b381f'
-input-fingerprint = 'sha256:4b25a8183b9190b878b7d06e43d90f95eb92df4c849ef6a107ba5403a51f7c23'
+input-fingerprint = 'sha256:23e03a08db64d0457b033c07148a2a71cf98e2b3af4b4b23cedd9455595772f6'
resolution-input-hash = 'sha256:466421704711c4fd3c71f0b2ed715a0e61d49e3e26f3a2637fee755795849c8e'
6 changes: 5 additions & 1 deletion specs/s/systemd/systemd.spec
@@ -2,7 +2,7 @@
## (rpmautospec version 0.8.3)
## RPMAUTOSPEC: autorelease, autochangelog
%define autorelease(e:s:pb:n) %{?-p:0.}%{lua:
-release_number = 4;
+release_number = 5;
base_release_number = tonumber(rpm.expand("%{?-b*}%{!?-b:1}"));
print(release_number + base_release_number - 1);
}%{?-e:.%{-e*}}%{?-s:.%{-s*}}%{!?-n:%{?dist}}
@@ -390,6 +390,7 @@ Recommends: libkmod.so.2(LIBKMOD_5)%{?elf_bits}

Recommends: libarchive.so.13%{?elf_suffix}

+Patch7: 0001-core-do-not-let-generators-deadlock-on-PID-1-userdb.patch
%description
systemd is a system and service manager that runs as PID 1 and starts the rest
of the system. It provides aggressive parallelization capabilities, uses socket
@@ -1577,6 +1578,9 @@ rm -rf \

%changelog
## START: Generated by rpmautospec
+* Wed Jun 24 2026 Brian Fjeldstad <bfjelds@microsoft.com> - 258.4-5
+- fix(systemd): prevent PID 1 generator-phase userdb self-deadlock
+
* Tue May 12 2026 Dan Streetman <ddstreet@ieee.org> - 258.4-4
- fix(systemd): restore default service/device timeout to upstream default
of 90s