
feat: use content fetch instead of image pull#7573

Merged
awesomenix merged 6 commits into main from nishp/noinstall/fetch
Mar 10, 2026

Conversation

@awesomenix
Contributor

What this PR does / why we need it:

We should fetch the content instead of performing a pull, so that only the compressed images remain on disk; when a container is run, all containerd has to do is unpack and run it. The runtime difference is negligible, at the cost of a much smaller initial image footprint.

here is the difference

Before

21G	/var/lib/containerd

After

6.0G	/var/lib/containerd

Which issue(s) this PR fixes:

Fixes #

Requirements:

  • uses conventional commit messages
  • includes documentation
  • adds unit tests
  • tested upgrade from previous version
  • commits are GPG signed and GitHub marks them as verified

Special notes for your reviewer:

Release note:

none

@djsly
Collaborator

djsly commented Dec 19, 2025

If this works, it will be a game changer:

ctr images pull
What it does:

  • Resolves the image reference (tag/digest)
  • Fetches: image manifest, config object, all layer blobs
  • Registers the image in containerd's image store
  • Makes it available to: ctr run, nerdctl run, Kubernetes (via CRI)

ctr content fetch
What it does:

  • Fetches content by descriptor or ref
  • Stores blobs only in the content store
  • Does not: create an image, register it for execution, or resolve dependencies automatically (unless explicitly referenced)

Why would anyone use ctr content fetch?

  1. Air-gapped or mirrored registries

  • Pre-fetch blobs
  • Assemble images later
  • Control exact digests
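
Assembling images later from pre-fetched blobs works because the content store is content-addressed: every blob is keyed by its sha256 digest, the same sha256:<hex> strings that ctr content ls and image manifests show. A minimal standalone illustration of that addressing (plain Go, no containerd dependency; the sample blob is made up):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// digestOf computes the content-store key for a blob: the
// "sha256:<hex>" form used by OCI descriptors and ctr content ls.
func digestOf(blob []byte) string {
	sum := sha256.Sum256(blob)
	return fmt.Sprintf("sha256:%x", sum)
}

func main() {
	// A made-up blob standing in for a fetched manifest.
	blob := []byte(`{"schemaVersion":2}`)
	fmt.Println(digestOf(blob))
}
```

Because the key is derived purely from the bytes, a blob fetched into the store in an air-gapped mirror is byte-for-byte identical to one pulled directly, and an image record can be created against it later by digest alone.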

@djsly
Collaborator

djsly commented Dec 19, 2025

ctr content fetch <ref|descriptor>

Downloads raw OCI blobs (manifest, config, layer tarballs) into the content store.
Does not create an image record in the image metadata store.
Does not unpack layers into the snapshotter (unless you later do it).
Result: you have the building blocks locally, but no runnable image is registered.

ctr images pull

Resolves the reference, downloads the same blobs into the content store, and creates the image record.
Optionally unpacks layers (--unpack) to the snapshotter for faster first start.
Result: runnable image appears in ctr images ls and is usable by ctr run, CRI, etc.

@djsly
Collaborator

djsly commented Dec 19, 2025

1) Fetch (blobs land in content store)

ctr content fetch docker.io/library/nginx:latest

2) Identify the manifest digest you fetched

ctr content ls | grep manifest

e.g. sha256:MANIFEST_DIGEST

3) Create the image record (register name → manifest)

ctr images create docker.io/library/nginx:latest sha256:MANIFEST_DIGEST

4) (Optional) Unpack now to avoid first-run penalty

ctr images unpack docker.io/library/nginx:latest

5) Run

ctr run --rm docker.io/library/nginx:latest nginx

If you skip step 3, the image won’t show up in ctr images ls, and ctr run won’t find it.
If you skip step 4, the first run will pay the unpack cost.

@awesomenix
Contributor Author

A first-run penalty is expected, but the tradeoff is being able to cache many more images: we gain 15 GB+ of disk space versus the smaller set of images we can hold today with everything unpacked.

After a lot of testing, there is an overall penalty of 5–7 s of additional startup time to unpack and register the image with the snapshotter (overlayfs). But this penalty is amortized across all containers started in parallel.

with content fetch

ctr content fetch mcr.microsoft.com/aks/aks-gpu-cuda:580.95.05-20251021155213
mcr.microsoft.com/aks/aks-gpu-cuda:580.95.05-20251021155213: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:0030c917ff77ddcf9178b308183f3206e87a09caa469919091c56f636b2f085d: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:ceb88ac8edb5d5784e67f3ab2ecdf5326bbe17c5d358b3113b155f3864c4b48d: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:12fe821b53d7d08fa94921798357863e98d370b8d4b1b84901710630e406176e: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:35e54a72f0aaddd65cf5651ed7ee3b5ce7ae8d6e988c4e56b4db2899013ab7ed: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:7abab46f17154bd8a84006f1d346930e52dc7eaa1369e92f2871d8e1a68d29f9: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:18b8431d08c9cca89bdd05716d3bf66d2766fa6bf47b85fd0129aad25bffdb5a: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:6081dea4d21051c04e278b83406025041126e67fff28f8f0fb9283bbefcd89e0: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4b3ffd8ccb5201a0fc03585952effb4ed2d1ea5e704d2e7330212fb8b16c86a3: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:db710fc699676318869ea835688cf75c00aa98f3ac2d252de9d87ce40f528fc5: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:9ee85a502d0b32e2a344216a072a7f0462e752efb373348773201da82665e389: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8f60597c1f34e69884127c7177870f0c93b47c4be8ed44574d00a40a510f35b8: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:0e8e2e177580d419b05b8ce672464ba949ed8203db09689fd4712b0145b7429d: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:e76791a520b21300630a3e4b891203f07c09a6f5243d12c7dd6f74c673c98877: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 8.0 s total: 676.5 (84.6 MiB/s)

time ctr containers create mcr.microsoft.com/aks/aks-gpu-cuda:580.95.05-20251021155213 debug-box

real 0m7.905s
user 0m0.019s
sys 0m0.025s

with images pull

time ctr images pull mcr.microsoft.com/aks/aks-gpu-cuda:580.95.05-20251021155213
mcr.microsoft.com/aks/aks-gpu-cuda:580.95.05-20251021155213: saved
└──index (0030c917ff77) complete |++++++++++++++++++++++++++++++++++++++|
├──manifest (35e54a72f0aa) complete |++++++++++++++++++++++++++++++++++++++|
│ ├──config (6081dea4d210) complete |++++++++++++++++++++++++++++++++++++++|
│ ├──layer (4f4fb700ef54) complete |++++++++++++++++++++++++++++++++++++++|
│ ├──layer (4b3ffd8ccb52) complete |++++++++++++++++++++++++++++++++++++++|
│ ├──layer (9ee85a502d0b) complete |++++++++++++++++++++++++++++++++++++++|
│ ├──layer (db710fc69967) complete |++++++++++++++++++++++++++++++++++++++|
│ └──layer (8f60597c1f34) complete |++++++++++++++++++++++++++++++++++++++|
├──manifest (ceb88ac8edb5) complete |++++++++++++++++++++++++++++++++++++++|
│ └──config (e76791a520b2) complete |++++++++++++++++++++++++++++++++++++++|
├──manifest (12fe821b53d7) complete |++++++++++++++++++++++++++++++++++++++|
│ └──config (18b8431d08c9) complete |++++++++++++++++++++++++++++++++++++++|
└──manifest (7abab46f1715) complete |++++++++++++++++++++++++++++++++++++++|
└──config (0e8e2e177580) complete |++++++++++++++++++++++++++++++++++++++|
application/vnd.oci.image.index.v1+json sha256:0030c917ff77ddcf9178b308183f3206e87a09caa469919091c56f636b2f085d
Pulling from OCI Registry (mcr.microsoft.com/aks/aks-gpu-cuda:580.95.05-20251021155213) elapsed: 16.4s total: 676.5 (41.4 MiB/s)

real 0m16.372s
user 0m0.010s
sys 0m0.018s

time ctr containers create mcr.microsoft.com/aks/aks-gpu-cuda:580.95.05-20251021155213 debug-box

real 0m0.054s
user 0m0.015s
sys 0m0.012s

@awesomenix
Contributor Author

Obviously we can use this option strategically, e.g. use images pull for critical containers that run by default on first boot, like pause, coredns, etc.

I feel that 5–7 s of additional startup time is not much compared to the savings we get in other areas, like disk space and additional caching.

But I'm open to discussing the pros and cons as a team.
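
That strategy can be sketched as a small policy function. This is a hypothetical illustration, not code from this PR; the critical-image list is illustrative and the 150 MiB threshold is borrowed from the discussion above:

```go
package main

import "fmt"

// pullSizeThreshold matches the 150 MiB cutoff discussed in the PR:
// below it, unpacking is cheap enough that a full pull is fine.
const pullSizeThreshold = 150 * 1024 * 1024

// criticalImages is a hypothetical allowlist of repos that must be
// runnable immediately on first boot (pause, coredns, etc.).
var criticalImages = map[string]bool{
	"mcr.microsoft.com/oss/kubernetes/pause":   true,
	"mcr.microsoft.com/oss/kubernetes/coredns": true,
}

// shouldPull reports whether an image should be fully pulled and
// unpacked rather than fetched only (compressed blobs on disk).
func shouldPull(repo string, compressedSize int64) bool {
	return criticalImages[repo] || compressedSize < pullSizeThreshold
}

func main() {
	fmt.Println(shouldPull("mcr.microsoft.com/oss/kubernetes/pause", 1<<30)) // critical: pull regardless of size
	fmt.Println(shouldPull("mcr.microsoft.com/aks/aks-gpu-cuda", 700<<20))   // large, non-critical: fetch only
}
```

The design point is that the penalty is only paid where it matters: boot-critical images start instantly, while large optional images stay compressed until first use.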

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 55 out of 91 changed files in this pull request and generated no new comments.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 56 out of 92 changed files in this pull request and generated 3 comments.

Comment on lines +1 to +126
package main

import (
	"context"
	"fmt"
	"os"
	"runtime"

	containerd "github.com/containerd/containerd/v2/client"
	"github.com/containerd/containerd/v2/pkg/namespaces"
	"github.com/containerd/platforms"
)

const (
	defaultSocket = "/run/containerd/containerd.sock"
	defaultNS     = "k8s.io"
	// Images with compressed content size below this threshold are
	// unpacked after fetch, effectively turning the operation into a
	// full pull (~150 MiB compressed ≈ ~300 MiB unpacked).
	pullSizeThreshold = 150 * 1024 * 1024 // 150 MiB
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "Usage: %s <image-ref> [image-ref...]\n", os.Args[0])
		fmt.Fprintf(os.Stderr, "Example: %s mcr.microsoft.com/oss/kubernetes/pause:3.9\n", os.Args[0])
		os.Exit(1)
	}

	socket := os.Getenv("CONTAINERD_SOCKET")
	if socket == "" {
		socket = defaultSocket
	}
	ns := os.Getenv("CONTAINERD_NAMESPACE")
	if ns == "" {
		ns = defaultNS
	}

	client, err := containerd.New(socket)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Failed to connect to containerd at %s: %v\n", socket, err)
		os.Exit(1)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), ns)

	failed := 0
	for _, ref := range os.Args[1:] {
		if err := fetchImage(ctx, client, ref); err != nil {
			fmt.Fprintf(os.Stderr, "FAIL %s: %v\n", ref, err)
			failed++
		}
	}

	if failed > 0 {
		os.Exit(1)
	}
}

// fetchImage uses client.Fetch() which:
//   - downloads all blobs (manifest, config, layers) into the content store
//   - creates an image record in the metadata database
//   - does NOT unpack layers into the snapshotter
//
// If the total image content size is below pullSizeThreshold (150 MiB),
// client.Pull() is called to additionally unpack the layers. Pull reuses
// already-fetched content from the store and handles snapshotter resolution
// internally (namespace label → platform default).
func fetchImage(ctx context.Context, client *containerd.Client, ref string) error {
	fmt.Printf("Fetching %s ...\n", ref)

	platform := fmt.Sprintf("linux/%s", runtime.GOARCH)
	p, err := platforms.Parse(platform)
	if err != nil {
		return fmt.Errorf("parse platform %s: %w", platform, err)
	}
	platformMatcher := platforms.OnlyStrict(p)

	imageMeta, err := client.Fetch(ctx, ref,
		containerd.WithPlatformMatcher(platformMatcher),
	)
	if err != nil {
		return fmt.Errorf("fetch failed: %w", err)
	}

	image := containerd.NewImage(client, imageMeta)

	size, err := image.Size(ctx)
	if err != nil {
		fmt.Fprintf(os.Stderr, "WARN %s: could not determine image size, skipping unpack: %v\n", ref, err)
		fmt.Printf("OK %s -> %s (fetched)\n", imageMeta.Name, imageMeta.Target.Digest)
		return nil
	}

	if size < pullSizeThreshold {
		// We use Pull here instead of Unpack because some runtimes
		// (e.g. containerd-shim-runsc-v1) require a pull to trigger
		// unpacking into the correct snapshotter for the image's platform.
		if _, err := client.Pull(ctx, ref,
			containerd.WithPlatformMatcher(platformMatcher),
			containerd.WithPullUnpack,
		); err != nil {
			return fmt.Errorf("pull failed: %w", err)
		}
		fmt.Printf("OK %s -> %s (pulled, %s)\n", imageMeta.Name, imageMeta.Target.Digest, formatSize(size))
	} else {
		fmt.Printf("OK %s -> %s (fetched, %s)\n", imageMeta.Name, imageMeta.Target.Digest, formatSize(size))
	}

	return nil
}

func formatSize(bytes int64) string {
	const (
		mib = 1024 * 1024
		gib = 1024 * 1024 * 1024
	)
	switch {
	case bytes >= gib:
		return fmt.Sprintf("%.2f GiB", float64(bytes)/float64(gib))
	case bytes >= mib:
		return fmt.Sprintf("%.2f MiB", float64(bytes)/float64(mib))
	default:
		return fmt.Sprintf("%d bytes", bytes)
	}
}
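
The review note about missing unit tests is fair; helpers like formatSize are easy to pin down. A minimal standalone check (the helper is reproduced here so the snippet is self-contained; this is an illustration, not a test file from the PR):

```go
package main

import "fmt"

// formatSize is reproduced from the image-fetcher above so this
// snippet compiles on its own.
func formatSize(bytes int64) string {
	const (
		mib = 1024 * 1024
		gib = 1024 * 1024 * 1024
	)
	switch {
	case bytes >= gib:
		return fmt.Sprintf("%.2f GiB", float64(bytes)/float64(gib))
	case bytes >= mib:
		return fmt.Sprintf("%.2f MiB", float64(bytes)/float64(mib))
	default:
		return fmt.Sprintf("%d bytes", bytes)
	}
}

func main() {
	fmt.Println(formatSize(512))       // 512 bytes
	fmt.Println(formatSize(150 << 20)) // 150.00 MiB (the fetch/pull threshold)
	fmt.Println(formatSize(21 << 30))  // 21.00 GiB
}
```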

Copilot AI Mar 9, 2026


The image-fetcher binary is a new Go module but has no unit tests. Other similar standalone Go binaries in this repo (e.g., vhdbuilder/lister) also lack tests, so this is consistent with the codebase convention. However, the image-fetcher has non-trivial logic (size-based fetch vs. pull threshold) that would benefit from test coverage, especially since incorrect behavior could silently produce VHDs with images in the wrong state (fetched-only vs. fully pulled).

Comment on lines +89 to +96
	size, err := image.Size(ctx)
	if err != nil {
		fmt.Fprintf(os.Stderr, "WARN %s: could not determine image size, skipping unpack: %v\n", ref, err)
		fmt.Printf("OK %s -> %s (fetched)\n", imageMeta.Name, imageMeta.Target.Digest)
		return nil
	}

	if size < pullSizeThreshold {

Copilot AI Mar 9, 2026


The image.Size() method returns the total size of content in the content store (compressed layer blobs + config + manifests). The comparison size < pullSizeThreshold where pullSizeThreshold is 150 MiB is used to decide fetch-only vs. full pull. However, the comment on line 17-20 says "images with compressed content size below this threshold are unpacked after fetch, effectively turning the operation into a full pull (~150 MiB compressed ≈ ~300 MiB unpacked)."

The image.Size() documentation states it returns the total size of the image's content (sum of all blob sizes in the content store). For a multi-platform image that was fetched with a platform matcher, this should be the platform-specific size. Please verify this returns the compressed content size as intended, not the unpacked/decompressed size, as the threshold logic depends on this distinction.

@@ -609,7 +609,7 @@ if [ $OS = $UBUNTU_OS_NAME ] && [ "$(isARM64)" -ne 1 ]; then # No ARM64 SKU wit

mkdir -p /opt/{actions,gpu}

ctr -n k8s.io image pull "$NVIDIA_DRIVER_IMAGE:$NVIDIA_DRIVER_IMAGE_TAG"
/opt/azure/containers/image-fetcher "$NVIDIA_DRIVER_IMAGE:$NVIDIA_DRIVER_IMAGE_TAG"

Copilot AI Mar 9, 2026


The NVIDIA driver image fetch on line 612 was previously using ctr -n k8s.io image pull, which both downloads blobs AND creates an image record AND unpacks layers. The new image-fetcher binary uses client.Fetch() for images above 150 MiB (which the NVIDIA driver image likely is), which only downloads blobs and creates an image record but does NOT unpack layers.

However, looking at the GPU install flow: after the image is fetched during VHD build, CTR_GPU_INSTALL_CMD (defined in cse_config.sh) later runs ctr run to extract drivers. ctr run requires layers to be unpacked into a snapshotter — a fetch-only image (no unpack) will fail at ctr run time with an error like "no snapshot available".

Please verify that the NVIDIA GPU driver extraction workflow (which uses ctr run) works correctly when the image is only fetched (not pulled/unpacked). If the NVIDIA driver image is above 150 MiB compressed (which it likely is at ~1+ GiB), image-fetcher will only fetch it without unpacking, and ctr run will fail.

bootType: efi

disks:
- partitionTableType: gpt
Contributor


should we get sign-off from @hbeberman or @miz060 on these changes?

Contributor Author


Yes will check with them

Contributor


I'm alright with the change but I have one functional concern.

OSGuard has some systemd-repart config files baked in that balance the size of root-a and root-b when applied to an Azure disk.

After making this change do you observe the root-a partition growing to fill the provisioned Azure disk?

Contributor Author


You are right, the change here https://github.com/microsoft/azurelinux/blob/25bde1f99877f485a18f9edd996101c0fd393db6/toolkit/imageconfigs/files/osguard/repart.d/14-root-a.conf conflicts with the change I made. The issue is that:

The storage: block is the problem. You're re-customizing a base image that was already built with the upstream template's partition layout (root-a: 12G). Your config re-specifies storage but with root-a: 20G — this conflicts with the base image's existing layout.

Root cause: You have reinitialize-verity (which says "I'm re-customizing an existing verity image") and a storage: block (which says "re-partition the disk"). These contradict each other. The Image Customizer likely tries to repartition the already-partitioned base image, corrupting the boot chain → OSProvisioningTimedOut.

I reverted my change so as not to break the OSGuard image; instead I fall back to fetch-only for OSGuard, will make the change upstream, and then remove fetch-only.

Copilot AI review requested due to automatic review settings March 9, 2026 23:21
@awesomenix awesomenix force-pushed the nishp/noinstall/fetch branch from 1e9d51f to e0f5893 Compare March 9, 2026 23:21
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 54 out of 90 changed files in this pull request and generated 2 comments.

awesomenix and others added 6 commits March 9, 2026 22:33
Use platforms.OnlyStrict via WithPlatformMatcher instead of WithPlatform
to bypass Pull's internal platforms.Only() which expands to match
sub-platforms (e.g. 386 for amd64). Also fix stale 300 MiB comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When IMAGE_FETCH_ONLY=true is set, image-fetcher skips unpacking
for all images regardless of size. This saves disk space on
OSGuard's constrained root partition.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@awesomenix awesomenix merged commit 59f54bf into main Mar 10, 2026
33 of 39 checks passed
