feat(git): content-addressed immutable snapshots#325
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 693104d727
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| // The blob is immutable, so it is safe to cache aggressively downstream. | ||
| w.Header().Set("Cache-Control", "public, max-age=31536000, immutable") | ||
| w.Header().Set("Accept-Ranges", "bytes") |
There was a problem hiding this comment.
in the cold path where the snapshot is not pulled to the local disk cache from s3 yet, this should not advertise range request capability. maybe thats splitting hairs and wont actually matter, i just remember that edge case of the cold start from when i prototyped a much less thorough version of multipart range snapshot restore
There was a problem hiding this comment.
good catch, updated
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 693104d727
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
693104d to
1c9372b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c9372bf0a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
1c9372b to
d411afe
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d411afebcb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Snapshots were written to a single mutable key (upstream+".snapshot"), which made range requests unsafe: a regeneration could swap the bytes mid-download, so parallel/ranged fetches needed bespoke pinning. Write each snapshot once under a commit-addressed key and keep a tiny mutable pointer to the current commit. The stable snapshot URL resolves the pointer and advertises the immutable URL via Content-Location; a new /git/<repo>/snapshot/<commit>.tar.zst route serves the blob with standard HTTP Range support (zero-copy sendfile on the disk tier). In-flight downloads survive regenerations until the old blob ages out via TTL. Co-authored-by: Amp <amp@ampcode.com> Amp-Thread-ID: https://ampcode.com/threads/T-019eae3d-a2fd-70ca-80fe-a7536ec6748c
d411afe to
216a22a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 216a22a1c5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| extraHeaders.Set("X-Cachew-Snapshot-Commit", headSHA) | ||
|
|
||
| return snapshot.Create(ctx, s.cache, cacheKey, workDir, 0, nil, s.config.ZstdThreads, extraHeaders) | ||
| return snapshot.Create(ctx, s.cache, commitKey, workDir, 0, nil, s.config.ZstdThreads, extraHeaders) |
There was a problem hiding this comment.
Avoid racing immutable snapshot writes across replicas
In a multi-pod deployment, two replicas generating the same new HEAD can both observe Stat as missing and then both call snapshot.Create for the same supposedly immutable key. The cache Create implementations replace the object on close, and these archives are not byte-stable across independent clones because of .git metadata/mtimes, so once the first replica has published the pointer and clients start parallel range requests, the second replica finishing later can swap the bytes under the same URL and reintroduce the mixed-range corruption this content-addressed route is meant to prevent. This needs cache-wide create-if-absent/CAS semantics or deterministic archive bytes before advertising the key as immutable.
Useful? React with 👍 / 👎.
stuartwdouglas
left a comment
There was a problem hiding this comment.
Snapshots are commit-addressed, not content-addressed — bytes diverge across pods
The premise (snapshot.go doc comment: "any replica resolves the same key to byte-identical bytes") doesn't hold. The key is keyed by commit SHA, but the blob bytes are not a pure function of the commit, so two pods generating a snapshot for the same commit produce different archives. That reintroduces exactly the parallel-range corruption this PR aims to prevent.
Proof
Two clones of the same fixed commit, archived with the exact tar invocation in client/archive.go (tar -cpf - -C dir -- .):
cloneA/root.txt mtime=1781478813 cloneB/root.txt mtime=1781478815
sha256(cloneA archive) = 9925e87f…
sha256(cloneB archive) = 2406b9ca… ← different bytes
Why it diverges (worst first)
- File mtimes —
cloneForSnapshotdoes a plaingit clone; git sets working-tree mtimes to wall-clock checkout time, not commit time, and tar records them per entry. The write-once doc comment already acknowledges "a re-run … would otherwise overwrite the blob with a byte-different archive (file mtimes, .git metadata)" — but the skip-if-exists guard only prevents overwrite on a single node; it does nothing cross-node. .gitcontents —Createarchives".", so the clone-built packs (their object ordering + mtimes) and config are included and vary per clone.- Tar entry ordering —
tar … -- .emits entries in readdir order (filesystem-dependent, unsorted). No--sort=name. - pzstd thread count —
pzstd -p<threads>frame boundaries depend on thread count; consistent only if every pod is configured identically (a0/all-cores setting on heterogeneous pods diverges even with identical tar input).
Why it breaks correctness
The tiered cache resolves Open/Stat local-tier-first, and generation writes to all tiers including local disk. The Stat skip only dedupes when a prior blob is already visible. So:
- Concurrent generation: periodic jobs and cold-start spool background uploads run independently per pod → both
Stat→miss→both generate divergent bytes→both write. Pod A keepsbytesAlocally, pod B keepsbytesB. - TTL expiry + regen on another pod produces the same split.
A client then does a parallel/ranged download of /git/<repo>/snapshot/<commit>.tar.zst; the LB spreads ranges across pods, pod A serves from bytesA and pod B from bytesB, and the stitched archive is corrupt — the exact race this route was built to eliminate.
TestSnapshotWriteOnceForUnchangedHead only verifies one cache instance doesn't overwrite its own key; it doesn't test cross-node byte-identity, which is the property that's actually false.
Possible fixes
Make generation deterministic, e.g.:
- Build the archive from
git archive <commit>(ordered, uses commit time, excludes.git) — likely cleanest; or - Normalize tar:
--sort=name,--mtime=@<committer-date>,--owner=0 --group=0 --numeric-owner, pinned--format, and exclude/normalize.git. - Pin zstd threads (avoid
0); prefer single-streamzstdoverpzstdfor the cached blob since pzstd output is thread-count-dependent.
Alternatively/additionally, close the divergence path: serve immutable range requests from the shared remote tier only, and make remote Create write-once (first writer wins).
|
I'm pretty -1 on this change - we can solve the same problem with ETag matching and not blow out our storage usage. |
Snapshots were written to one mutable key, so range requests could stitch bytes from two different snapshots — the root cause the pinning prototypes worked around.
Now each snapshot is written once under a commit-addressed key with a tiny mutable pointer to the current commit. The stable
/snapshot.tar.zstURL resolves the pointer and advertises the immutable URL; a new/git/<repo>/snapshot/<commit>.tar.zstroute serves the blob with standard HTTPRange(sendfile on the disk tier), enabling safe parallel downloads with no pin protocol. In-flight downloads survive regenerations until the old blob ages out via TTL.Cold/S3-only serves fall back to a full
200, so clients must check 206-vs-200. Unknown/aged-out commits return 404 to trigger re-resolution via the stable URL.No infra changes: reuses existing caches and TTLs; multiple versions per repo coexist until they age out.