[WS1] KV-cache path consistency (prefill & decode) #152

Part of WS1 — Full Batch-Invariant Forward Chain (epic: #<WS1 tracking issue>)

## Why

Rollout generates token by token through the decode path; training re-runs the same sequence through prefill. Floating-point addition is not associative, so if the two paths reduce in different orders, the same token gets different logprobs in rollout and in training. This is a classic, high-impact source of rollout-vs-training drift. WS1 invariance work tends to focus on chunked prefill; the decode stage must be covered explicitly.
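To make the reduction-order point concrete, here is a minimal sketch in plain Python. It emulates float32 accumulation with the standard `struct` module; the helper names `f32` and `sum_f32` are illustrative only and do not come from the WS1 codebase.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate left to right, rounding to float32 after every add."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc

terms = [1e8, 1.0, -1e8, 1.0]
reordered = [1e8, -1e8, 1.0, 1.0]  # same values, different order

print(sum_f32(terms))      # 1.0 (the first 1.0 is absorbed by 1e8)
print(sum_f32(reordered))  # 2.0
```

The same values summed in two orders give two different float32 results, which is the effect that turns a prefill/decode ordering mismatch into a logprob mismatch.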

## Scope

Ensure the prefill and decode paths produce the same reductions for the same effective context.

- Verify that attention over a cached context (decode: one query against N cached KV) reduces in the same fixed order as the equivalent prefill over the full sequence.
- Cover the decode-stage path explicitly in tests, not only chunked-prefill.
- Confirm cache writes/reads (layout, dtype of stored KV) do not introduce a precision difference between the path that wrote the cache and the path that consumes it.
- Validate "generate then re-score" equivalence against the #108 harness.
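The parity check described in the list above can be sketched as a toy, pure-Python model. This is an illustration under stated assumptions, not the project's kernels: `attend`, `attend_split`, `prefill`, and `decode` are hypothetical names, attention is single-head and causal, and `attend_split` stands in for a split-KV style merge of the kind decode kernels sometimes use.

```python
import math
import random

def dot(a, b):
    acc = 0.0
    for x, y in zip(a, b):  # explicit loop keeps the order visible
        acc += x * y
    return acc

def attend(q, keys, values):
    """One query against N keys/values, reduced in position order 0..N-1."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, k) for k in keys]
    m = max(scores)
    denom, out = 0.0, [0.0] * len(values[0])
    for s, v in zip(scores, values):
        w = math.exp(s - m)
        denom += w
        out = [o + w * vi for o, vi in zip(out, v)]
    return [o / denom for o in out]

def attend_split(q, keys, values, chunk):
    """Same math, but reduced per chunk and then merged (different order)."""
    parts = []
    for start in range(0, len(keys), chunk):
        ks, vs = keys[start:start + chunk], values[start:start + chunk]
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * dot(q, k) for k in ks]
        m = max(scores)
        denom, out = 0.0, [0.0] * len(vs[0])
        for s, v in zip(scores, vs):
            w = math.exp(s - m)
            denom += w
            out = [o + w * vi for o, vi in zip(out, v)]
        parts.append((m, denom, out))
    m_all = max(m for m, _, _ in parts)
    denom, out = 0.0, [0.0] * len(values[0])
    for m, d, o in parts:
        r = math.exp(m - m_all)  # rescale each partial to the global max
        denom += r * d
        out = [a + r * b for a, b in zip(out, o)]
    return [o / denom for o in out]

def prefill(qs, ks, vs):
    # Causal: position t attends to positions 0..t of the full sequence.
    return [attend(qs[t], ks[:t + 1], vs[:t + 1]) for t in range(len(qs))]

def decode(qs, ks, vs, attend_fn=attend):
    # Token by token: append to the cache, then one query vs the cache.
    cache_k, cache_v, outs = [], [], []
    for q, k, v in zip(qs, ks, vs):
        cache_k.append(k)
        cache_v.append(v)
        outs.append(attend_fn(q, cache_k, cache_v))
    return outs

rng = random.Random(0)
n, d = 16, 4
qs, ks, vs = ([[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
              for _ in range(3))

pre = prefill(qs, ks, vs)
dec = decode(qs, ks, vs)
dec_split = decode(qs, ks, vs, lambda q, k, v: attend_split(q, k, v, chunk=4))

print(pre == dec)  # True: both paths share one reduction order
max_diff = max(abs(a - b) for ra, rb in zip(pre, dec_split)
               for a, b in zip(ra, rb))
print(max_diff)    # small, but not guaranteed to be exactly 0.0
```

When both paths go through the same fixed-order reduction, the outputs match exactly. The split variant is mathematically equivalent but only numerically close, which is the kind of gap this issue is meant to detect.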

## Out of scope

- The attention kernel's internal accumulation design (covered by the attention issue; this issue checks prefill/decode parity on top of it).
- Paged-attention / cache-eviction policy beyond reduction-order correctness.
- Multi-GPU KV sharding (WS2).
- FP8 KV cache.

## Acceptance criteria

- For a fixed sequence, decode-path logprobs equal prefill-path logprobs within the #108 tolerance (ideally bitwise). This is the core rollout-vs-training check.
- The decode stage is exercised directly in the test sweep, across batch sizes 1 and N and across padding layouts.
- Stored-KV dtype / layout is shown not to add drift between writer and reader paths.
- Tests cover short, long, variable-length, and padded sequences, and pass through the #108 shared test helper.
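As an illustration of the stored-KV criterion above, the following sketch shows how a lower-precision cache can create writer-vs-reader drift, and how a bitwise comparison exposes it. It assumes a float16 cache purely for the example; the actual cache dtype, the #108 tolerance, and the helper names (`to_f16`, `bitwise_equal`, `max_abs_diff`) are not taken from the project.

```python
import struct

def to_f16(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]

def bitwise_equal(a, b):
    """Compare two float lists by their binary64 bit patterns."""
    return len(a) == len(b) and all(
        struct.pack("d", x) == struct.pack("d", y) for x, y in zip(a, b))

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def dot(q, k):
    acc = 0.0
    for qi, ki in zip(q, k):
        acc += qi * ki
    return acc

q = [0.5, 0.25, 0.125]
k_fresh = [0.1, 0.2, 0.3]                # K as the writer computed it
k_cached = [to_f16(x) for x in k_fresh]  # K as the reader loads it (assumed f16 cache)

# Drift: one path uses the fresh K, the other reads the rounded cache.
score_fresh = [dot(q, k_fresh)]
score_cached = [dot(q, k_cached)]
print(bitwise_equal(score_fresh, score_cached))  # False
print(max_abs_diff(score_fresh, score_cached))

# No drift: both paths consume K after the same rounding step.
score_consistent = [dot(q, [to_f16(x) for x in k_fresh])]
print(bitwise_equal(score_cached, score_consistent))  # True
```

The design point is that the writer path must consume K and V after the same rounding the cache applies; otherwise the reader sees different inputs even when the reduction order is identical.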

## Notes

- Depends on #108 and is tightly coupled with the attention issue: both should share one reduction-order contract across prefill and decode.
- This is the single most training-relevant consistency check in WS1; weight it accordingly.

## Planned PRs

- [ ] Full-prefill reference vs chunked-prefill consistency test
- [ ] Decode-stage path test (one query vs N cached KV) with reduction order matching prefill
- [ ] Stored-KV layout/dtype: show no writer-vs-reader precision drift
- [ ] "Generate then re-score" equivalence vs the #108 harness
- [ ] CI-friendly decode-path smoke test (short / long / varlen / padded)
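For the smoke-test item, one possible shape of the padded and variable-length sweep is sketched below. It is a toy model with hypothetical names (`masked_attend`, `sweep`), not the #108 helper: padded slots are masked to a score of negative infinity, so their softmax weight is exactly 0.0 and a fixed-order reduction should return the same values with or without padding.

```python
import math
import random

def masked_attend(q, keys, values, mask):
    """One query vs N slots; slots with mask=False are padding."""
    scale = 1.0 / math.sqrt(len(q))
    scores = []
    for k, keep in zip(keys, mask):
        acc = 0.0
        for qi, ki in zip(q, k):
            acc += qi * ki
        scores.append(scale * acc if keep else -math.inf)
    m = max(scores)
    denom, out = 0.0, [0.0] * len(values[0])
    for s, v in zip(scores, values):
        w = math.exp(s - m)  # exactly 0.0 for masked slots
        denom += w
        out = [o + w * vi for o, vi in zip(out, v)]
    return [o / denom for o in out]

def sweep(lengths=(1, 5, 33), pads=(0, 3, 31), d=4, seed=0):
    """Check padded vs unpadded equality over a small case grid."""
    rng = random.Random(seed)
    results = []
    for n in lengths:
        q = [rng.uniform(-1, 1) for _ in range(d)]
        ks = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
        vs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
        ref = masked_attend(q, ks, vs, [True] * n)
        for pad in pads:
            junk = [[123.0] * d for _ in range(pad)]  # garbage in pad slots
            for side in ("left", "right"):
                if side == "left":
                    pk, pv = junk + ks, junk + vs
                    mask = [False] * pad + [True] * n
                else:
                    pk, pv = ks + junk, vs + junk
                    mask = [True] * n + [False] * pad
                got = masked_attend(q, pk, pv, mask)
                results.append((n, pad, side, got == ref))
    return results

results = sweep()
print(all(ok for *_, ok in results))  # True for this toy model
```

A real smoke test would replace `masked_attend` with the actual decode path and add the batch dimension; the grid structure (length, pad amount, pad side) is the part intended to carry over.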
