[WS1] KV-cache path consistency (prefill & decode) #152
Open
Labels
component: testing, feature, platform: cuda, priority: high, sprint-0615
Part of WS1 — Full Batch-Invariant Forward Chain (epic: #)
Why
Rollout generates token by token through the decode path; training re-runs the same sequence through prefill. Floating-point addition is not associative, so if the two paths accumulate their reductions in different orders, the same token gets different logprobs in rollout and in training. This is a classic, high-impact source of rollout-vs-training drift. WS1 invariance work tends to focus on chunked prefill; the decode stage must be covered explicitly.
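To make the reduction-order concern concrete, here is a minimal pure-Python sketch. It is illustrative only and not taken from this repo's kernels: `sequential_sum` and `split_sum` are hypothetical helpers standing in for a single-pass reduction and a split reduction (as a split-KV decode kernel might do), applied to the same values.

```python
def sequential_sum(xs):
    """Accumulate left to right, like a single-pass reduction."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc


def split_sum(xs, num_splits):
    """Reduce each chunk separately, then combine the partial sums."""
    chunk = -(-len(xs) // num_splits)  # ceiling division
    partials = [
        sequential_sum(xs[i:i + chunk]) for i in range(0, len(xs), chunk)
    ]
    return sequential_sum(partials)


# Same inputs, two reduction orders. The values are chosen so that the
# rounding difference is large enough to see without inspecting bits.
values = [1e16, 1.0, -1e16, 1.0]
one_pass = sequential_sum(values)
two_split = split_sum(values, 2)
print(one_pass, two_split, one_pass == two_split)
```

In real attention kernels the discrepancy is usually in the last few bits rather than this large, but it is enough to change a logprob.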
Scope
Ensure the prefill and decode paths produce the same reductions for the same effective context.
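One possible shape for a consistency check is sketched below, assuming a toy single-head attention in pure Python. All names (`attend`, `prefill`, `decode`, `count_bit_mismatches`) are hypothetical and not this repo's API. Here both paths share one reduction routine, so the check passes by construction; in a real test the two sides would be the actual prefill and decode kernels.

```python
import math
import random
import struct


def attend(q, keys, values):
    """Attention output for one query, reducing over keys left to right."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    denom = 0.0
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        denom += w
        for d in range(len(out)):
            out[d] += w * v[d]
    return [o / denom for o in out]


def prefill(qs, ks, vs):
    """Causal attention over the whole sequence at once."""
    return [attend(qs[t], ks[:t + 1], vs[:t + 1]) for t in range(len(qs))]


def decode(qs, ks, vs):
    """Token-by-token attention that appends to a KV cache each step."""
    k_cache, v_cache, outs = [], [], []
    for q, k, v in zip(qs, ks, vs):
        k_cache.append(k)
        v_cache.append(v)
        outs.append(attend(q, k_cache, v_cache))
    return outs


def count_bit_mismatches(a, b):
    """Count elements whose float64 bit patterns differ."""
    return sum(
        struct.pack("<d", x) != struct.pack("<d", y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


rng = random.Random(0)
seq_len, dim = 8, 4


def rand_matrix():
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(seq_len)]


qs, ks, vs = rand_matrix(), rand_matrix(), rand_matrix()
mismatches = count_bit_mismatches(prefill(qs, ks, vs), decode(qs, ks, vs))
print("bitwise mismatches:", mismatches)
```

Comparing bit patterns rather than using a tolerance is a deliberate choice in this sketch: a tolerance-based comparison would hide exactly the reduction-order differences this issue targets.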
Out of scope
Acceptance criteria
Notes
Planned PRs