add gfx950 16x16x64 I8 MFMA support to MoE 2-stage GEMM #461

Draft
yadaish wants to merge 2 commits into main from dev/moe_gemm_16x16x64

Conversation

yadaish (Collaborator) commented Apr 30, 2026

Use rocdl.mfma_i32_16x16x64_i8 (when available) for int8 on gfx950, halving the number of MFMA instructions per K-tile by processing K=64 in a single op instead of two K=32 ops.
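
To make the instruction-count claim concrete, here is a minimal pure-Python sketch. It is not the PR's kernel code. The helper names (`mfma_ops_per_k_tile`, `mfma_i8_reference`, `select_mfma_k`) and the `ops` namespace are hypothetical stand-ins; only the op name `mfma_i32_16x16x64_i8` comes from the description above. The sketch checks that one K=64 multiply-accumulate gives the same 16x16 result as two chained K=32 steps, and shows one possible "when available" fallback.

```python
import random
from types import SimpleNamespace


def mfma_ops_per_k_tile(tile_k: int, mfma_k: int) -> int:
    """MFMA issues needed to cover tile_k along K for one 16x16 output block."""
    if tile_k % mfma_k != 0:
        raise ValueError("tile_k must be a multiple of mfma_k")
    return tile_k // mfma_k


def mfma_i8_reference(a, b, acc, k):
    """Reference 16x16xk int8 multiply-accumulate (hypothetical helper).

    a is 16 x k, b is k x 16, acc is 16 x 16. Python ints stand in for the
    int32 accumulator; values here stay far below the int32 range.
    """
    return [
        [acc[i][j] + sum(a[i][kk] * b[kk][j] for kk in range(k)) for j in range(16)]
        for i in range(16)
    ]


def select_mfma_k(ops) -> int:
    """Pick K=64 when the stand-in namespace exposes the wide op, else K=32."""
    return 64 if hasattr(ops, "mfma_i32_16x16x64_i8") else 32


if __name__ == "__main__":
    rng = random.Random(0)
    a = [[rng.randint(-128, 127) for _ in range(64)] for _ in range(16)]
    b = [[rng.randint(-128, 127) for _ in range(16)] for _ in range(64)]
    zero = [[0] * 16 for _ in range(16)]

    # One wide step over K=64.
    wide = mfma_i8_reference(a, b, zero, 64)

    # Two narrow steps over K=32, chaining the accumulator.
    lo = mfma_i8_reference([r[:32] for r in a], b[:32], zero, 32)
    narrow = mfma_i8_reference([r[32:] for r in a], b[32:], lo, 32)

    print("results match:", wide == narrow)
    print("ops per K=64 tile:", mfma_ops_per_k_tile(64, 32), "vs", mfma_ops_per_k_tile(64, 64))

    # Hypothetical stand-in for a dialect module that has the wide op.
    ops = SimpleNamespace(mfma_i32_16x16x64_i8=object())
    print("selected MFMA K:", select_mfma_k(ops))
```

Because integer accumulation is exact, splitting K changes only the number of instructions issued, not the result. That is why the K=32 path can remain as a drop-in fallback on targets without the wider op.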

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

yadaish and others added 2 commits April 30, 2026 02:28
Use rocdl.mfma_i32_16x16x64_i8 (when available) for int8 on gfx950,
halving the number of MFMA instructions per K-tile by processing K=64
in a single op instead of two K=32 ops.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>