[WS1][kernels] Batch-invariant matmul / GEMM #146
Open
Labels
component: kernels (Tasks involving the development of CUDA and Triton underlying operators)
feature
platform: cuda (Specific optimizations or bugs in NVIDIA graphics cards, such as FlashInfer, TMA optimizations)
priority: high (Severe congestion issues require the highest priority for resolution.)
sprint-0615
Part of WS1 — Full Batch-Invariant Forward Chain (epic: #)
Why
This is the highest-technical-risk op in WS1. cuBLAS selects kernels by heuristic based on problem shape, and split-K decompositions change the reduction order of the K dimension — both break batch-invariance the moment batch size or sequence length shifts the chosen kernel. Matmul is also the most frequent op in the network (QKV, MLP, LM head), so drift here dominates everything downstream.
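To make the reduction-order point concrete, here is a minimal pure-Python sketch (not the project's kernel, and not how cuBLAS is implemented). The helper `dot_split_k` and the input values are hypothetical, chosen so that float64 rounding makes the effect visible: the same dot product over K returns different results depending only on how many chunks K is split into.

```python
def dot_split_k(a, b, splits):
    """Dot product over K, reduced in `splits` contiguous chunks.

    Hypothetical helper for illustration: each chunk is accumulated
    sequentially, then the partial sums are added in chunk order.
    """
    k_dim = len(a)
    chunk = (k_dim + splits - 1) // splits  # ceil(k_dim / splits)
    partials = []
    for start in range(0, k_dim, chunk):
        acc = 0.0
        for k in range(start, min(start + chunk, k_dim)):
            acc += a[k] * b[k]
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


# Values chosen so that 1e16 + 1.0 rounds back to 1e16 in float64.
a = [1e16, 1.0, -1e16, 1.0]
b = [1.0, 1.0, 1.0, 1.0]

no_split = dot_split_k(a, b, splits=1)  # 1.0
two_way = dot_split_k(a, b, splits=2)   # 0.0
print(no_split, two_way)
```

The mathematical inputs are identical in both calls; only the grouping of the additions changes. A kernel heuristic that switches between such groupings as the problem shape changes would produce this kind of drift.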
Scope
Provide a deterministic, batch-invariant GEMM the forward chain can route through.
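One way to read "batch-invariant" is that each output row must be bitwise identical whether its input row is multiplied alone or as part of a larger batch. The sketch below illustrates that reading in pure Python. Every name in it (`matmul_fixed_order`, `matmul_shape_heuristic`, `is_batch_invariant`) is hypothetical and is not the shared WS1 check. The shape heuristic is a toy stand-in for a library that picks a different reduction strategy for small M.

```python
import struct


def dot(row, w, n, k_start, k_stop):
    """Sequentially accumulate row[k] * w[k][n] for k in [k_start, k_stop)."""
    acc = 0.0
    for k in range(k_start, k_stop):
        acc += row[k] * w[k][n]
    return acc


def matmul_fixed_order(x, w):
    """Reference GEMM (x: M x K, w: K x N) with one fixed K order for every M."""
    k_dim, n_dim = len(w), len(w[0])
    return [[dot(row, w, n, 0, k_dim) for n in range(n_dim)] for row in x]


def matmul_shape_heuristic(x, w):
    """Toy stand-in for shape-driven kernel choice: 2-way split-K when M == 1."""
    k_dim, n_dim = len(w), len(w[0])
    if len(x) > 1:
        return matmul_fixed_order(x, w)
    half = k_dim // 2
    return [
        [dot(row, w, n, 0, half) + dot(row, w, n, half, k_dim) for n in range(n_dim)]
        for row in x
    ]


def bits(value):
    """Raw float64 bytes, so the comparison is bitwise rather than approximate."""
    return struct.pack("<d", value)


def is_batch_invariant(matmul, x, w):
    """True if every row gives bit-identical output alone and inside the batch."""
    full = matmul(x, w)
    for i, row in enumerate(x):
        alone = matmul([row], w)[0]
        if [bits(v) for v in alone] != [bits(v) for v in full[i]]:
            return False
    return True


x = [[1e16, 1.0, -1e16, 1.0], [1.0, 2.0, 3.0, 4.0]]
w = [[1.0], [1.0], [1.0], [1.0]]

fixed_ok = is_batch_invariant(matmul_fixed_order, x, w)
heuristic_ok = is_batch_invariant(matmul_shape_heuristic, x, w)
print(fixed_ok, heuristic_ok)
```

The comparison is deliberately bitwise instead of tolerance-based, since a tolerance would hide exactly the drift this issue is meant to eliminate.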
Possible implementation routes:
Out of scope
Acceptance criteria
dX and dW) pass the shared gradient-invariance check from the WS1 backward-consistency issue.
Notes
Planned PRs