Update to cute dsl 4.6.0.dev0 #94

Open
anakinxc wants to merge 2 commits into inclusionAI:main from anakinxc:main

@anakinxc

📌 Description

Fixes compatibility issues with CuTe DSL (nvidia-cutlass-dsl) 4.6.0.dev0.

This change is not compatible with CuTe DSL versions below 4.6.0.
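
One way to surface this incompatibility early is a version guard at import time. The sketch below is an assumption, not code from this PR: `parse_version` and `meets_minimum` are hypothetical helpers that handle only plain `X.Y.Z` and `X.Y.Z.devN` strings. That covers the versions mentioned here but is not full PEP 440 parsing.

```python
import re

MINIMUM = "4.6.0.dev0"  # lower bound taken from the updated dependency pin


def parse_version(text):
    """Turn 'X.Y.Z' or 'X.Y.Z.devN' into a sortable tuple (simplified, not full PEP 440)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.dev(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = [int(part) for part in match.group(1).split(".")]
    release += [0] * (4 - len(release))  # pad so 4.6 and 4.6.0 compare equal
    if match.group(2) is not None:
        # A dev release sorts before the final release with the same number.
        return tuple(release), 0, int(match.group(2))
    return tuple(release), 1, 0


def meets_minimum(installed, minimum=MINIMUM):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


for candidate in ("4.4.2", "4.6.0.dev0", "4.6.0"):
    print(candidate, meets_minimum(candidate))
```

In a real package the installed string would typically come from `importlib.metadata.version("nvidia-cutlass-dsl")`, and `packaging.version.Version` is the more robust comparison when that dependency is available.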

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to cuLA! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing.

⚡ Performance

Reviewer Notes

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates various SM100 operations to adapt to the new nvidia-cutlass-dsl version (bumped to >=4.6.0.dev0), including direct imports of OperandMajorMode, intrinsic cleanups, and passing separate operand data types to make_trivial_tiled_mma. The review feedback highlights several instances in the fully fused KDA, lightning attention, and linear attention modules where the newly added second data type argument was incorrectly duplicated (e.g., passing the same type twice) instead of correctly specifying the distinct data types for both operands (such as q_dtype, k_dtype, v_dtype, or io_dtype).
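
The pattern flagged in this review can be illustrated without the real library. In this minimal sketch, `make_mma_stub`, `MmaStub`, and `AttentionConfig` are hypothetical stand-ins and do not reproduce the `make_trivial_tiled_mma` signature from nvidia-cutlass-dsl. The stub only records which operand dtypes it was given. The dtype strings are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MmaStub:
    """Records the operand dtypes a tiled MMA was built with (illustration only)."""

    a_dtype: str
    b_dtype: str


def make_mma_stub(a_dtype, b_dtype):
    # Hypothetical stand-in for an MMA factory that takes one dtype per operand.
    return MmaStub(a_dtype, b_dtype)


@dataclass(frozen=True)
class AttentionConfig:
    q_dtype: str
    k_dtype: str

    def qk_mma_duplicated(self):
        # Pattern flagged in the review: the Q dtype is passed for both operands.
        return make_mma_stub(self.q_dtype, self.q_dtype)

    def qk_mma_distinct(self):
        # Suggested pattern: each operand gets its own dtype.
        return make_mma_stub(self.q_dtype, self.k_dtype)


same = AttentionConfig(q_dtype="bf16", k_dtype="bf16")
mixed = AttentionConfig(q_dtype="bf16", k_dtype="fp8")

# The two patterns agree only while Q and K share a dtype.
print(same.qk_mma_duplicated() == same.qk_mma_distinct())
print(mixed.qk_mma_duplicated() == mixed.qk_mma_distinct())
```

Duplicating one dtype is harmless while both operands share a type, which is likely why it passes existing tests. The two patterns diverge as soon as the operand types differ.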

Comment on lines +495 to 496
self.q_dtype,
self.q_dtype,

high

For qk_tiled_mma, the operands are Q and K. The first two arguments should be self.q_dtype and self.k_dtype respectively, rather than passing self.q_dtype twice.

Suggested change
- self.q_dtype,
- self.q_dtype,
+ self.q_dtype,
+ self.k_dtype,

Comment on lines +524 to 525
self.io_dtype,
self.io_dtype,

high

For sq_tiled_mma, the operands are State and Q. The first two arguments should be self.io_dtype and self.q_dtype respectively.

Suggested change
- self.io_dtype,
- self.io_dtype,
+ self.io_dtype,
+ self.q_dtype,

Comment on lines +535 to 536
self.io_dtype,
self.io_dtype,

high

For ks_tiled_mma, the operands are State and K. The first two arguments should be self.io_dtype and self.k_dtype respectively.

Suggested change
- self.io_dtype,
- self.io_dtype,
+ self.io_dtype,
+ self.k_dtype,

Comment on lines +1468 to 1469
self.k_dtype,
self.k_dtype,

high

For fake_kv_tiled_mma_acc32, the operands are V and K. The first two arguments should be self.v_dtype and self.k_dtype respectively.

Suggested change
- self.k_dtype,
- self.k_dtype,
+ self.v_dtype,
+ self.k_dtype,

Comment on lines +452 to 453
self.q_dtype,
self.q_dtype,

high

For qk_tiled_mma, the operands are Q and K. The first two arguments should be self.q_dtype and self.k_dtype respectively.

Suggested change
- self.q_dtype,
- self.q_dtype,
+ self.q_dtype,
+ self.k_dtype,

Comment on lines +1107 to 1108
self.k_dtype,
self.k_dtype,

high

For fake_kv_tiled_mma_acc32, the operands are V and K. The first two arguments should be self.v_dtype and self.k_dtype respectively.

Suggested change
- self.k_dtype,
- self.k_dtype,
+ self.v_dtype,
+ self.k_dtype,

Comment on lines +367 to 368
self.q_dtype,
self.q_dtype,

high

For qk_tiled_mma, the operands are Q and K. The first two arguments should be self.q_dtype and self.k_dtype respectively.

Suggested change
- self.q_dtype,
- self.q_dtype,
+ self.q_dtype,
+ self.k_dtype,

Comment on lines +377 to 378
self.k_dtype,
self.k_dtype,

high

For kv_tiled_mma, the operands are V and K. The first two arguments should be self.v_dtype and self.k_dtype respectively.

Suggested change
- self.k_dtype,
- self.k_dtype,
+ self.v_dtype,
+ self.k_dtype,

Comment on lines +387 to 388
self.io_dtype,
self.io_dtype,

high

For sq_tiled_mma, the operands are State and Q. The first two arguments should be self.io_dtype and self.q_dtype respectively.

Suggested change
- self.io_dtype,
- self.io_dtype,
+ self.io_dtype,
+ self.q_dtype,

Comment on lines +934 to 935
self.k_dtype,
self.k_dtype,

high

For fake_kv_tiled_mma_acc32, the operands are V and K. The first two arguments should be self.v_dtype and self.k_dtype respectively.

Suggested change
- self.k_dtype,
- self.k_dtype,
+ self.v_dtype,
+ self.k_dtype,

Copilot AI left a comment

Pull request overview

Updates cuLA to be compatible with nvidia-cutlass-dsl 4.6.0.dev0, primarily by adapting SM100 (Blackwell) CuteDSL kernel code to API changes in operand major-mode enums, MMA helper signatures, and NVVM tcgen05 MLIR op bindings.

Changes:

  • Bump nvidia-cutlass-dsl dependency to >=4.6.0.dev0.
  • Update multiple SM100 ops to use OperandMajorMode (instead of tcgen05.OperandMajorMode) and pass explicit operand dtypes into sm100_utils.make_trivial_tiled_mma(...).
  • Adjust SM100 NVVM tcgen05 wrapper calls to match updated MLIR op argument names/signatures (e.g., val= for stores, drop num= where no longer accepted).
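
The PR adopts the new call shapes outright and drops support for older releases. If a wrapper ever needed to tolerate both shapes, one generic option is to filter keyword arguments against the callee's signature. The sketch below assumes nothing about the real MLIR bindings: `old_style_op` and `new_style_op` are hypothetical stand-ins, and `call_with_supported_kwargs` is an illustrative helper, not part of cuLA or nvidia-cutlass-dsl.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments that its signature does not accept."""
    params = inspect.signature(func).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_any:
        kwargs = {name: value for name, value in kwargs.items() if name in params}
    return func(*args, **kwargs)


def old_style_op(addr, *, num=1):
    # Hypothetical older binding that still takes num=.
    return ("old", addr, num)


def new_style_op(addr):
    # Hypothetical newer binding where num= is no longer accepted.
    return ("new", addr)


print(call_with_supported_kwargs(old_style_op, 16, num=4))
print(call_with_supported_kwargs(new_style_op, 16, num=4))
```

Silently dropping arguments can hide genuine mistakes, and `inspect.signature` may raise `ValueError` for some extension-module callables, so a hard version pin like the one in this PR is the simpler choice when backward compatibility is not required.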

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Summary per file

| File | Description |
| --- | --- |
| tests/conftest.py | Minor collection logic formatting; maintains existing skip behavior. |
| pyproject.toml | Bumps Cutlass DSL dependency to >=4.6.0.dev0. |
| cula/ops/linear_attn_sm100.py | Updates major-mode enum usage and MMA helper argument list for Cutlass DSL 4.6.0. |
| cula/ops/lightning_attn_sm100.py | Same Cutlass DSL 4.6.0 compatibility adjustments (enum + MMA helper signature). |
| cula/ops/kda_fully_fused_sm100_wip.py | Same Cutlass DSL 4.6.0 compatibility adjustments across KDA fused path. |
| cula/ops/intrinsics_sm100.py | Updates NVVM tcgen05 wrapper bindings to new MLIR op APIs (val=, vector extract changes, etc.). |
| cula/ops/fwd_o_sm100.py | Updates MMA setup to new major-mode enum + MMA helper signature. |
| cula/ops/cp/pre_scan.py | Updates MMA setup to new major-mode enum + MMA helper signature. |
| cula/ops/chunk_wy_dqkg_sm100.py | Updates multiple MMA setups to new major-mode enum + MMA helper signature. |
| cula/ops/chunk_delta_h_sm100.py | Updates MMA setup to new major-mode enum + MMA helper signature. |

Comment thread pyproject.toml
Comment on lines 13 to 16
  dependencies = [
-     "nvidia-cutlass-dsl>=4.4.2",
+     "nvidia-cutlass-dsl>=4.6.0.dev0",
      "apache-tvm-ffi>=0.1.9",
  ]
@icavan icavan requested review from icavan and tongke6 June 14, 2026 04:30

@icavan icavan left a comment

Collaborator

LGTM, will merge this PR once flashinfer has cutedsl 4.6 enabled.

3 participants