Update to cute dsl 4.6.0.dev0 #94
Code Review
This pull request updates various SM100 operations to adapt to the new nvidia-cutlass-dsl version (bumped to >=4.6.0.dev0), including direct imports of OperandMajorMode, intrinsic cleanups, and passing separate operand data types to make_trivial_tiled_mma. The review feedback highlights several instances in the fully fused KDA, lightning attention, and linear attention modules where the newly added second data type argument was incorrectly duplicated (e.g., passing the same type twice) instead of correctly specifying the distinct data types for both operands (such as q_dtype, k_dtype, v_dtype, or io_dtype).
self.q_dtype,
self.q_dtype,

self.io_dtype,
self.io_dtype,

self.io_dtype,
self.io_dtype,

self.k_dtype,
self.k_dtype,

self.q_dtype,
self.q_dtype,

self.k_dtype,
self.k_dtype,

self.q_dtype,
self.q_dtype,

self.k_dtype,
self.k_dtype,

self.io_dtype,
self.io_dtype,

self.k_dtype,
self.k_dtype,
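The duplicated-argument pattern flagged in this review can be searched for mechanically. The following is a minimal sketch, not part of the PR: it assumes the two operand dtypes are the first two positional arguments of `make_trivial_tiled_mma` (the real signature is not shown in this conversation), and the helper name `find_duplicated_dtype_args` and the `SAMPLE` snippet are hypothetical. It only parses source text, so it does not need nvidia-cutlass-dsl installed.

```python
import ast


def find_duplicated_dtype_args(source, func_name="make_trivial_tiled_mma"):
    """Return (line, arg_text) for calls whose first two positional args are identical.

    Assumption: the two operand dtypes are the first two positional arguments.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both `make_trivial_tiled_mma(...)` and `sm100_utils.make_trivial_tiled_mma(...)`.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != func_name or len(node.args) < 2:
            continue
        first, second = node.args[0], node.args[1]
        # ast.dump ignores line/column info by default, so this compares structure only.
        if ast.dump(first) == ast.dump(second):
            hits.append((node.lineno, ast.unparse(first)))
    return sorted(hits)


# Illustrative input only; `a_major_mode` is a placeholder name.
SAMPLE = '''
tiled_mma = sm100_utils.make_trivial_tiled_mma(
    self.q_dtype,
    self.q_dtype,
    a_major_mode,
)
other_mma = sm100_utils.make_trivial_tiled_mma(
    self.q_dtype,
    self.k_dtype,
    a_major_mode,
)
'''

if __name__ == "__main__":
    for line, arg in find_duplicated_dtype_args(SAMPLE):
        print(f"line {line}: first two arguments are both {arg}")
```

A hit is only a prompt for manual review: passing the same dtype twice is correct whenever both operands really share a type, so each flagged call still has to be checked against the operands it multiplies.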
Pull request overview
Updates cuLA to be compatible with nvidia-cutlass-dsl 4.6.0.dev0, primarily by adapting SM100 (Blackwell) CuteDSL kernel code to API changes in operand major-mode enums, MMA helper signatures, and NVVM tcgen05 MLIR op bindings.
Changes:
- Bump `nvidia-cutlass-dsl` dependency to `>=4.6.0.dev0`.
- Update multiple SM100 ops to use `OperandMajorMode` (instead of `tcgen05.OperandMajorMode`) and pass explicit operand dtypes into `sm100_utils.make_trivial_tiled_mma(...)`.
- Adjust SM100 NVVM tcgen05 wrapper calls to match updated MLIR op argument names/signatures (e.g., `val=` for stores, drop `num=` where no longer accepted).
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/conftest.py | Minor collection logic formatting; maintains existing skip behavior. |
| pyproject.toml | Bumps Cutlass DSL dependency to >=4.6.0.dev0. |
| cula/ops/linear_attn_sm100.py | Updates major-mode enum usage and MMA helper argument list for Cutlass DSL 4.6.0. |
| cula/ops/lightning_attn_sm100.py | Same Cutlass DSL 4.6.0 compatibility adjustments (enum + MMA helper signature). |
| cula/ops/kda_fully_fused_sm100_wip.py | Same Cutlass DSL 4.6.0 compatibility adjustments across KDA fused path. |
| cula/ops/intrinsics_sm100.py | Updates NVVM tcgen05 wrapper bindings to new MLIR op APIs (val=, vector extract changes, etc.). |
| cula/ops/fwd_o_sm100.py | Updates MMA setup to new major-mode enum + MMA helper signature. |
| cula/ops/cp/pre_scan.py | Updates MMA setup to new major-mode enum + MMA helper signature. |
| cula/ops/chunk_wy_dqkg_sm100.py | Updates multiple MMA setups to new major-mode enum + MMA helper signature. |
| cula/ops/chunk_delta_h_sm100.py | Updates MMA setup to new major-mode enum + MMA helper signature. |
 dependencies = [
-    "nvidia-cutlass-dsl>=4.4.2",
+    "nvidia-cutlass-dsl>=4.6.0.dev0",
     "apache-tvm-ffi>=0.1.9",
 ]
icavan left a comment
LGTM, will merge this PR once flashinfer has cutedsl 4.6 enabled.
📌 Description
Fixes compatibility issues with cute dsl 4.6.0.dev0.
This change is not compatible with versions below 4.6.0.
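Because the change drops support for older releases, a guard that fails early with a clear message may be useful. The following is a minimal sketch under stated assumptions, not code from this PR: the helper names are hypothetical, and the version comparison is a simplified stand-in for full PEP 440 parsing that compares only the numeric release segment (it ignores epochs). For the `>=4.6.0.dev0` pin this is sufficient, since `.dev0` sorts before every other 4.6.0 variant.

```python
import re
from importlib import metadata

MIN_RELEASE = (4, 6, 0)  # release segment of the ">=4.6.0.dev0" pin


def release_tuple(version, width=3):
    """Return the leading numeric release segment, zero-padded to `width` parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts[:width])


def is_supported(version):
    """True if `version` satisfies the >=4.6.0.dev0 requirement."""
    return release_tuple(version) >= MIN_RELEASE


def installed_dsl_is_supported(dist_name="nvidia-cutlass-dsl"):
    """Check the installed distribution; returns False if it is not installed."""
    try:
        return is_supported(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    for candidate in ("4.4.2", "4.6.0.dev0", "4.6.0"):
        print(candidate, is_supported(candidate))
```

If the `packaging` library is already a dependency, `packaging.version.Version` would be the more robust choice for the comparison; the regex version above only avoids the extra import.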
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to cuLA! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- Installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran the hooks with `pre-commit run --all-files` and fixed any reported issues.
🧪 Tests
⚡ Performance
Reviewer Notes