feat(guestos): build patched kernel with folio_split() race fix #9953

Draft
basvandijk wants to merge 6 commits into master from bas/guestos-kernel-folio-split-fix

Conversation

@basvandijk
Collaborator

@basvandijk basvandijk commented Apr 20, 2026

Builds a custom linux-hwe-6.17 kernel in the GuestOS base image that includes the upstream fix for a pagecache folio_split() / folio_try_get() race that has been observed on IC nodes.

Background

  • Upstream commit: 577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0 ("mm/huge_memory: fix a folio_split() race condition with folio_try_get()") by Zi Yan, merged into mainline on 2026-03-04; first tagged release v7.0-rc4.
  • Backport to linux-6.18.y: already applied (2026-03-25).
  • Ubuntu linux-hwe-6.17 on noble (24.04): does NOT yet contain the fix — verified against both 6.17.0-22.22~24.04.1 (currently installed) and the proposed 6.17.0-24.24~24.04.1 via a grep of their upload diff. The fix does not appear in Ubuntu's stable-queue either.
  • Reproducer: https://github.com/dfinity/thp-madv-remove-test.

The bug was introduced by commit 00527733d0dc in kernel 6.14 (folio_split() feature), so it affects the HWE kernel we currently ship.

Changes

  • New kernel-build stage in ic-os/guestos/context/Dockerfile.base:
    • Runs apt-get source linux-hwe-6.17, then applies every *.patch file under kernel-patches/ in lexicographic order.
    • Bumps the Debian changelog with a +dfinity local suffix so the resulting kernel is identifiable via uname -r.
    • Builds only the amd64 generic flavor .debs with skipdbg=true skipretpoline=true to cut build time.
  • Main stage installs the locally built linux-image-unsigned-*-generic, linux-modules-*-generic and linux-modules-extra-*-generic .debs instead of pulling linux-image-virtual-hwe-24.04 from apt, keeping the kernel and extra modules in sync by construction.
  • New directory ic-os/guestos/context/kernel-patches/ with:
    • 0001-mm-huge_memory-fix-folio_split-race-condition.patch — the patch Zi Yan confirmed applies cleanly to 6.17.13.
    • README.md documenting conventions and how to drop custom patches when they land upstream.

Removing this customization later

Once Ubuntu ships the fix (either as a new linux-hwe-6.17 SRU or via a future linux-hwe-6.18 / -7.0 HWE track on noble), delete the patch file(s) under kernel-patches/. With no patches, the kernel-build stage still runs and produces a repackaged stock kernel; if we prefer to remove the custom build entirely at that point, revert this commit.

Testing

  • CI will exercise the kernel-build stage as part of the base image build.
  • Local smoke tests (optional):
    • Build only the kernel-build stage: docker build --target kernel-build -f ic-os/guestos/context/Dockerfile.base ic-os/guestos/context.
    • Inspect /debs inside the resulting image for three +dfinity1 .debs.
    • Boot a GuestOS dev image and check uname -r ends in +dfinity1.
  • Functional test against the reporter's reproducer at https://github.com/dfinity/thp-madv-remove-test.
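
The "three +dfinity1 .debs" check from the smoke tests can be scripted. The following is a minimal sketch, not part of the PR: the helper name count_dfinity_debs is hypothetical, and it assumes the +dfinity1 suffix appears in the .deb file names, as it would when the suffix is part of the package version.

```shell
# Hypothetical helper: count the .deb files in a directory whose file name
# carries the +dfinity1 version suffix.
count_dfinity_debs() {
    dir=$1
    n=0
    for f in "$dir"/*+dfinity1*.deb; do
        # When nothing matches, sh leaves the pattern as-is; skip it.
        [ -e "$f" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}

# Possible usage inside the kernel-build image:
#   [ "$(count_dfinity_debs /debs)" -eq 3 ] || echo "unexpected number of kernel .debs"
```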

Notes for reviewers

  • The kernel build adds ~20–40 min to the deploy-guest-os-baseimg job.
  • The kernel-build stage requires network access during the Bazel action (for apt-get source / build-dep). If the sandbox blocks egress, the rule will need appropriate tags (for example, requires-network).
  • GuestOS signs kernels as part of the IC image build (see ic-os/guestos/docs/DiskLayout.adoc), so we deliberately consume the linux-image-unsigned-* flavor.

Build a custom linux-hwe-6.17 kernel in the GuestOS base image that
includes upstream commit 577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0
("mm/huge_memory: fix a folio_split() race condition with
folio_try_get()"). The fix is present in mainline v7.0-rc4+ and in
linux-6.18.y but has not yet reached Ubuntu's linux-hwe-6.17 package
on noble (24.04).

Changes:

* Add a new 'kernel-build' stage to Dockerfile.base that apt-get
  sources linux-hwe-6.17, applies every *.patch file under
  kernel-patches/ in lexicographic order, bumps the Debian changelog
  with a +dfinity local version, and builds only the amd64 'generic'
  flavor .deb packages (skipdbg=true skipretpoline=true).

* The main stage no longer installs the kernel via
  linux-image-virtual-hwe-24.04 + apt-cache depends; it copies the
  locally built .debs and installs them directly, keeping
  linux-image / linux-modules / linux-modules-extra in sync.

* Add kernel-patches/0001-mm-huge_memory-fix-folio_split-race-condition.patch
  carrying the fix, and a README describing conventions for adding
  and removing patches.

To drop the patch once Ubuntu ships the fix, remove the patch file(s)
under kernel-patches/; no Dockerfile change is then required. The
kernel-build stage keeps running and repackages the stock kernel.
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a custom GuestOS kernel build pipeline to ship an Ubuntu linux-hwe-6.17 kernel with an upstream backported fix for a folio_split() / folio_try_get() race, and installs the resulting locally built kernel .debs into the GuestOS base image.

Changes:

  • Adds a new kernel-build stage in Dockerfile.base to fetch, patch, and rebuild the linux-hwe-6.17 source package.
  • Installs the locally built linux-image-unsigned, linux-modules, and linux-modules-extra .debs in the final base image stage instead of an apt meta-package.
  • Introduces kernel-patches/ with a documented convention and the backport patch for the folio split race fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| ic-os/guestos/context/Dockerfile.base | Adds the kernel build stage, applies local patches, builds kernel .debs, and installs them into the final image. |
| ic-os/guestos/context/kernel-patches/README.md | Documents patch naming/apply conventions and when to remove local patches. |
| ic-os/guestos/context/kernel-patches/0001-mm-huge_memory-fix-folio_split-race-condition.patch | Adds the upstream backport patch to fix the folio split race. |

Comment thread ic-os/guestos/context/Dockerfile.base Outdated
Comment on lines +90 to +92
# Apply all *.patch files in lexicographic order. Tolerate a missing/empty
# directory so that removing the patches (once upstream ships the fix) only
# requires deleting files under kernel-patches/.
Copilot AI Apr 20, 2026

The comment says a missing kernel-patches/ directory is tolerated, but the preceding COPY kernel-patches /tmp/kernel-patches will fail if the directory is removed. Either adjust the comment to only claim empty-directory tolerance, or ensure the directory always exists in the repo (even when no patches are carried).

Suggested change
- # Apply all *.patch files in lexicographic order. Tolerate a missing/empty
- # directory so that removing the patches (once upstream ships the fix) only
- # requires deleting files under kernel-patches/.
+ # Apply all *.patch files in lexicographic order. Tolerate an empty
+ # directory so that removing the patches (once upstream ships the fix) only
+ # requires deleting files under kernel-patches/ while keeping the directory.

- linux-modules-extra-$(apt-cache depends ${_KERNEL_PACKAGE} | sed -n -e 's/ Depends: linux-image-\(.*\)-generic/\1/p')-generic && \
- rm /tmp/packages.*
+ /tmp/kernel-debs/linux-image-unsigned-*-generic_*_amd64.deb \
+ /tmp/kernel-debs/linux-modules-*-generic_*_amd64.deb \
Copilot AI Apr 20, 2026

linux-modules-*-generic_*_amd64.deb will also match linux-modules-extra-*-generic_*_amd64.deb, so this glob can expand to both modules and modules-extra, and then modules-extra is also included again on the next line. Tighten the glob so it only matches the non-"extra" package (or enumerate the exact files) to avoid duplicate/accidental installs.

Suggested change
- /tmp/kernel-debs/linux-modules-*-generic_*_amd64.deb \
+ /tmp/kernel-debs/linux-modules-[0-9]*-generic_*_amd64.deb \
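
The difference between the two globs can be reproduced in a scratch directory. This is an illustrative sketch only: the file names and version strings are made up, and count_matches is a hypothetical helper.

```shell
# Scratch directory with two illustrative package file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/linux-modules-6.17.0-22-generic_1_amd64.deb" \
      "$demo_dir/linux-modules-extra-6.17.0-22-generic_1_amd64.deb"

# Count how many paths a glob pattern expands to inside demo_dir.
count_matches() {
    # $1 is deliberately unquoted so that the shell expands the pattern.
    set -- "$demo_dir"/$1
    [ -e "$1" ] || { echo 0; return; }
    echo "$#"
}

loose=$(count_matches 'linux-modules-*-generic_*_amd64.deb')
tight=$(count_matches 'linux-modules-[0-9]*-generic_*_amd64.deb')
echo "loose=$loose tight=$tight"   # prints: loose=2 tight=1
```

The loose pattern also picks up the -extra- package because `*` happily matches "extra-6.17.0-22"; requiring a digit right after "linux-modules-" excludes it.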

The base image's /bin/sh is dash, which does not support 'shopt -s
nullglob'. Replace it with an explicit '[ -e "$p" ] || continue'
guard so the loop tolerates an empty kernel-patches/ directory without
relying on bash.
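
A minimal sketch of that guard, runnable under dash: list_patches is a hypothetical helper that only prints the patch names, since the exact patch invocation used in Dockerfile.base is not shown here.

```shell
# Print the patch files in a directory in glob (lexicographic) order,
# tolerating an empty directory without relying on bash's nullglob.
list_patches() {
    for p in "$1"/*.patch; do
        # With no matches, sh keeps the literal pattern; skip that entry.
        [ -e "$p" ] || continue
        echo "${p##*/}"
    done
}
```

In the real stage the echo would be replaced by whatever command applies the patch to the kernel source tree.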

Also remove inline '#' comments from inside the RUN command. Because
Docker's backslash continuation collapses the RUN body into a single
shell line, those '#' comments were silently commenting out everything
that followed, including the 'debchange' invocation.
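
The effect is easy to reproduce outside Docker. In this sketch the string joined stands for what the shell receives once the continued lines of a RUN body have been joined; the commands themselves are placeholders.

```shell
# An inline comment after the first command, followed by what was meant
# to be a second command on the next (continued) line.
joined='echo first;  # explain the next step echo second'

# Everything after the '#' is treated as a comment, so only the first
# command runs.
output=$(sh -c "$joined")
echo "$output"   # prints: first
```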

The upstream patch (Zi Yan) was generated against upstream stable 6.17.13,
which already contains the __split_unmapped_folio() refactor with
SPLIT_TYPE_UNIFORM / folio_split_supported(). Ubuntu's linux-hwe-6.17
6.17.0-22.22~24.04.1 predates that refactor and still uses the
'bool uniform_split' signature, so the original hunks did not apply.

Regenerate the hunks against Ubuntu's actual mm/huge_memory.c. The
semantic change is identical: introduce 'origin_folio = folio' at the
top of __split_unmapped_folio() and pass it to xas_try_split() so that
a concurrent folio_try_get() waits on the original folio until the
xarray has been fully updated with the after-split folios.

The previous commit accidentally embedded a shell-integration OSC 633
escape sequence at the start of the patch file (captured from the
terminal when a heredoc was redirected to the file). GNU patch happens
to ignore leading junk before the first 'From ' / 'diff --git' marker,
but the file was not a clean git-format-patch output.

Replace the garbled first line with a proper 'From <sha> Mon Sep 17 ...'
header so the file is a valid mailbox-style patch.
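
A check along these lines could catch such leading junk before a patch is committed. It is a sketch only: has_mbox_header is a hypothetical helper, and the pattern assumes the usual git-format-patch first line of the form 'From <sha> Mon Sep 17 ...'.

```shell
# Succeed only if the first line of the file starts with a
# git-format-patch style mbox header.
has_mbox_header() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        "From "[0-9a-f]*" Mon Sep 17 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage:
#   for p in kernel-patches/*.patch; do
#       has_mbox_header "$p" || echo "suspicious first line in $p"
#   done
```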

Set DEB_BUILD_OPTIONS=parallel=$(nproc) so Ubuntu's debian/rules fans
the kernel build out across all available CPUs. Without this the build
runs serially and on a 2-vCPU GitHub-hosted runner exceeds the
container-base-images job timeout.
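
As a sketch, the variable can be set like this before the package build is started; the build command itself is omitted because it is not shown in this conversation.

```shell
# Request one build job per available CPU. nproc comes from GNU coreutils.
DEB_BUILD_OPTIONS="parallel=$(nproc)"
export DEB_BUILD_OPTIONS
echo "$DEB_BUILD_OPTIONS"
```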