
[gfx1250][tdm] Add K-loop hoist descriptor helpers and modify build script#484

Merged
coderfeli merged 3 commits into main from gfx1250_tdm_desc on May 9, 2026



@XingerZhu
Collaborator

…t tweaks

TDM (python/flydsl/expr/rocdl/tdm_ops.py)

  • Add K-loop hoist helpers that let MoE-style K-reduction loops patch only dgroup0 lane 2 (the per-K-tile global addr_lo) of an otherwise hoisted base descriptor, instead of rebuilding the full 4xi32 descriptor every iteration:
    • update_tensor_descriptor_2d_addr_lo
    • update_tensor_gather_descriptor_addr_lo (carry-unsafe fast path; lo-32 add only)
    • update_tensor_descriptor_2d_addr_lo_hi
    • update_tensor_gather_descriptor_addr_lo_hi
    • update_tensor_descriptor_2d_addr64
    • update_tensor_gather_descriptor_addr64 (carry-safe variants via the new add_addr_with_carry helper)
  • The carry-safe variants are needed when per-CTA base + cumulative K-tile delta can cross a 4 GiB boundary in lo-32-bit arithmetic. Without the carry, the descriptor silently aliases into the wrong 4 GiB page and the GPU deadlocks in amdgpu_mes_reg_write_reg_wait with no recoverable signal.
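The 4 GiB aliasing described above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not the code in tdm_ops.py: it works on Python ints rather than IR values, and the function names `add_lo_only`, `add_with_carry`, and `join` are hypothetical stand-ins chosen for illustration. It assumes a non-negative byte delta.

```python
MASK32 = 0xFFFFFFFF


def add_lo_only(addr_lo, addr_hi, delta):
    """Carry-unsafe fast path: add to the low word, leave the high word alone."""
    return (addr_lo + delta) & MASK32, addr_hi


def add_with_carry(addr_lo, addr_hi, delta):
    """Carry-safe path: propagate overflow of the low word into the high word."""
    total = addr_lo + (delta & MASK32)
    new_lo = total & MASK32
    carry = total >> 32  # 0 or 1, since both operands fit in 32 bits
    new_hi = (addr_hi + (delta >> 32) + carry) & MASK32
    return new_lo, new_hi


def join(lo, hi):
    """Reassemble a 64-bit address from its two 32-bit halves."""
    return (hi << 32) | lo


# Base address 4 KiB below a 4 GiB boundary, advanced by an 8 KiB K-tile delta.
base_lo, base_hi = 0xFFFFF000, 0x1
delta = 0x2000

fast = join(*add_lo_only(base_lo, base_hi, delta))
safe = join(*add_with_carry(base_lo, base_hi, delta))

print(hex(fast))  # low word wrapped, high word stale: 4 GiB below the target
print(hex(safe))  # carry propagated into the high word
```

The fast path stays correct only while the base plus the cumulative delta remains inside one 4 GiB window, which is why it is offered as a separate, explicitly carry-unsafe entry point.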

Build scripts

  • scripts/build.sh: forward HIP_PLATFORM (default amd) to cmake as a cache var, because the shipped /opt/rocm/lib/cmake/hip/hip-config.cmake has if("OFF") ... set(hip_HIPCONFIG_EXECUTABLE) ... and cannot auto-detect the platform.
  • scripts/build_llvm.sh: enable lld in LLVM_ENABLE_PROJECTS, turn on MLIR_ENABLE_ROCM_RUNNER, and pass HIP_PLATFORM=amd for the same reason.
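The default-and-forward behaviour described for HIP_PLATFORM can be sketched as follows. The real scripts are shell; this is a Python rendering of the same idea, and the function name `hip_platform_cmake_args` is hypothetical. Treating an empty variable the same as an unset one is an assumption about how the scripts apply the default.

```python
import os


def hip_platform_cmake_args(env=None):
    """Return the cmake cache argument that forwards HIP_PLATFORM.

    Falls back to "amd" when the variable is unset or empty (assumed behaviour).
    """
    env = os.environ if env is None else env
    platform = env.get("HIP_PLATFORM") or "amd"
    return [f"-DHIP_PLATFORM={platform}"]


# With nothing set in the environment, the default is forwarded.
print(hip_platform_cmake_args({}))
```

Passing the value as a cache variable means the configure step does not depend on hip-config.cmake detecting the platform on its own.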



Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings May 8, 2026 09:36
Contributor

Copilot AI left a comment


Pull request overview

Adds new Python-side TDM (gfx1250) descriptor “K-loop hoist” helpers to cheaply patch the per-iteration global address portion of an otherwise-hoisted descriptor (including a carry-safe 64-bit variant), and updates build scripts to better interoperate with ROCm HIP CMake configuration.

Changes:

  • Introduce descriptor patching helpers in tdm_ops.py to update dgroup0 lane 2 (addr_lo) and optionally lane 3 (addr_hi with type bits preserved), plus an add_addr_with_carry utility for carry-safe address updates.
  • Update scripts/build.sh to forward HIP_PLATFORM into the FlyDSL CMake configure.
  • Update scripts/build_llvm.sh to enable lld, turn on MLIR_ENABLE_ROCM_RUNNER, and pass HIP_PLATFORM.
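The lane-patching idea in the first bullet can be illustrated on a plain list of four 32-bit words. This is a sketch under stated assumptions, not the helpers from tdm_ops.py: the names `patch_addr_lo` and `patch_addr_lo_hi` are hypothetical, and `ADDR_HI_BITS` is an assumed width for the address field in lane 3 (the actual split between address bits and type bits is not given here).

```python
MASK32 = 0xFFFFFFFF
ADDR_HI_BITS = 25  # hypothetical width of the address field in lane 3
ADDR_HI_MASK = (1 << ADDR_HI_BITS) - 1


def patch_addr_lo(dgroup0, addr_lo):
    """Return a copy of the 4-word descriptor with only lane 2 replaced."""
    patched = list(dgroup0)
    patched[2] = addr_lo & MASK32
    return patched


def patch_addr_lo_hi(dgroup0, addr_lo, addr_hi):
    """Replace lane 2 and the address field of lane 3, keeping the type bits."""
    patched = patch_addr_lo(dgroup0, addr_lo)
    type_bits = patched[3] & ~ADDR_HI_MASK & MASK32  # bits above the address field
    patched[3] = type_bits | (addr_hi & ADDR_HI_MASK)
    return patched


# Hoisted base descriptor: lanes 0 and 1 are loop-invariant, lane 3 carries
# type bits in its upper bits (the values here are made up for the example).
base = [0x11, 0x22, 0xFFFFF000, 0x80000001]
per_tile = patch_addr_lo_hi(base, addr_lo=0x00001000, addr_hi=0x2)
print([hex(w) for w in per_tile])
```

Returning a fresh copy keeps the hoisted base descriptor intact, so each K iteration can derive its own descriptor from the same loop-invariant value.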

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| scripts/build.sh | Prints/defaults HIP_PLATFORM and forwards it to FlyDSL’s CMake configure. |
| scripts/build_llvm.sh | Enables additional LLVM/MLIR components and sets HIP_PLATFORM for the LLVM/MLIR build. |
| python/flydsl/expr/rocdl/tdm_ops.py | Adds K-loop descriptor hoist/patch helpers, including a carry-safe addr_hi update path. |


Comment thread scripts/build_llvm.sh
@coderfeli coderfeli merged commit 631f684 into main May 9, 2026
10 of 11 checks passed
@coderfeli coderfeli deleted the gfx1250_tdm_desc branch May 9, 2026 03:37