Conversation

@eeezio (Contributor) commented Jan 19, 2026

Proposed changes

New feature: support dwordx3 sync/async load/store, and support MX-FP6 flatmm

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@eeezio eeezio requested a review from Snektron as a code owner January 19, 2026 03:56
@afagaj afagaj requested a review from Copilot January 19, 2026 17:38
Copilot AI left a comment
Pull request overview

This PR adds support for the MX-FP6 data type in the flatmm operations, including the dwordx3 (12-byte) synchronous and asynchronous load/store operations this data type requires.

Changes:

  • Added pk_fp6x16_t type support throughout the codebase with corresponding type traits and conversion utilities
  • Implemented dwordx3 (12-byte) load/store operations for AMD buffer addressing
  • Extended flatmm pipeline to handle FP6 data type with specialized tile distributions and memory layouts

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Summary per file:

  • test/ck_tile/memory_copy/test_copy.hpp - Added dwordx3 copy configuration and implementation for 12-byte data transfers
  • test/ck_tile/memory_copy/test_copy.cpp - Added test cases for pk_fp6x16_t memory copy operations
  • include/ck_tile/ops/gemm/warp/warp_gemm_attribute_mfma_impl.hpp - Added FP6 case to MFMA warp gemm attributes
  • include/ck_tile/ops/flatmm/pipeline/mx_flatmm_pipeline_agmem_bgmem_creg_v1_policy.hpp - Extended pipeline policy to support FP6 with dwordx3 operations and specialized tile distributions
  • include/ck_tile/ops/flatmm/pipeline/mx_flatmm_pipeline_agmem_bgmem_creg_v1.hpp - Updated pipeline implementation to handle FP6 data type with appropriate memory operations
  • include/ck_tile/ops/common/utils.hpp - Added DataTypeTraits for pk_fp6x16_t
  • include/ck_tile/host/reference/reference_gemm.hpp - Added FP6 support to reference GEMM implementation
  • include/ck_tile/host/check_err.hpp - Added error checking specialization for pk_fp6x16_t
  • include/ck_tile/core/tensor/buffer_view.hpp - Added support for 12-element int8_t buffer loads/stores in LDS
  • include/ck_tile/core/numeric/vector_type.hpp - Added int32x3_tt and int32x6_tt types for FP6 vector operations
  • include/ck_tile/core/numeric/type_convert.hpp - Added include for pk_fp6.hpp
  • include/ck_tile/core/numeric/pk_fp6.hpp - New file defining pk_fp6_t type with pack/unpack operations
  • include/ck_tile/core/arch/amd_buffer_addressing_builtins.hpp - Added dwordx3 load/store builtins and extended buffer operations
  • include/ck_tile/core/arch/amd_buffer_addressing.hpp - Added dwordx3 load support
  • include/ck_tile/core.hpp - Added include for pk_fp6.hpp
  • include/ck/utility/amd_xdlops.hpp - Updated comment for FP6 MFMA operation
  • example/ck_tile/18_flatmm/mxgemm/run_mx_flatmm.inc - Added FP6-specific initialization logic
  • example/ck_tile/18_flatmm/mxgemm/mx_flatmm_instance.cpp.in - Added FP6 type alias
  • example/ck_tile/18_flatmm/mxgemm/mx_flatmm_instance.cmake - Added FP6xFP6 configuration
  • example/ck_tile/18_flatmm/mxgemm/mx_flatmm.hpp - Added MXfp6_FlatmmConfig16 configuration
  • example/ck_tile/18_flatmm/mxgemm/mx_flatmm.cpp - Added FP6 case to example runner
  • example/ck_tile/18_flatmm/CMakeLists.txt - Removed blank line


{
static constexpr index_t N = 1;
using value_type = int32x3_tt;
using type = int32x3_tt; // this is danguous
Copilot AI Jan 19, 2026

Corrected spelling of 'danguous' to 'dangerous'.
{
static constexpr index_t N = 2;
using value_type = int32x6_tt;
using type = int32x6_tt; // this is danguous
Copilot AI Jan 19, 2026

Corrected spelling of 'danguous' to 'dangerous'.
index_t src_wave_addr_offset)
{
static_assert(N == 1 || N == 2 || N == 4 || N == 8 || N == 16 || N == 32 || N == 64,
static_assert(N == 1 || N == 2 || N == 4 || N == 8 || N == 12 | N == 16 || N == 32 || N == 64,
Copilot AI Jan 19, 2026
Bitwise OR operator | used where logical OR || was intended. Since == binds tighter than |, the expression still parses as (N == 12) | (N == 16) and happens to yield the intended truth value, but the logical operator is the correct choice: N == 12 || N == 16.

Suggested change
static_assert(N == 1 || N == 2 || N == 4 || N == 8 || N == 12 | N == 16 || N == 32 || N == 64,
static_assert(N == 1 || N == 2 || N == 4 || N == 8 || N == 12 || N == 16 || N == 32 || N == 64,

}
if(err_count > 0)
{
report_error_stats(err_count, numeric<pk_fp4_t>::max(), ref.size());
Copilot AI Jan 19, 2026
Incorrect type used in error reporting. This function checks pk_fp6x16_t ranges but reports statistics using pk_fp4_t::max(). Should use a type appropriate for pk_fp6x16_t or a generic float max value.

Suggested change
report_error_stats(err_count, numeric<pk_fp4_t>::max(), ref.size());
report_error_stats(err_count, std::numeric_limits<float>::max(), ref.size());
