Add gather_qqmm by zcbenz · Pull Request #3757 · ml-explore/mlx

zcbenz · 2026-06-24T00:56:59Z

Add a simple gather_qqmm implementation by adding global scale support to the qmm kernels and using them as a fallback.

The implementation is the bare minimum: on CUDA it is rerouted to qmm_naive, and on Metal to qmv. We will add global scale support to more qmm kernels in follow-up PRs and add specialized fast kernels in the future. This PR mostly serves as a way for users to use qqmm before the native kernels are ready.

The CPU backend is not supported at the moment.
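
For readers unfamiliar with the terms, here is a rough NumPy reference for what a gathered matmul against quantized weights with a global scale computes. It is only a sketch under assumptions, not the code in this PR: the names `quantize`, `dequantize`, `gather_matmul_reference` and `rhs_indices` are hypothetical, the quantization is a toy symmetric integer scheme (not the formats MLX uses), and only the weights are quantized here.

```python
import numpy as np


def quantize(w, group_size=4, bits=4):
    """Toy symmetric quantization (assumed scheme, for illustration only).

    Returns integer codes, per-group scales, and one global scale per tensor.
    """
    qmax = 2 ** (bits - 1) - 1
    # Global scale: a single factor that normalizes the whole tensor.
    global_scale = float(np.abs(w).max()) or 1.0
    g = (w / global_scale).reshape(*w.shape[:-1], -1, group_size)
    # Per-group scale: maps each group of values into [-qmax, qmax].
    scales = np.abs(g).max(axis=-1, keepdims=True) / qmax
    scales[scales == 0] = 1.0
    codes = np.round(g / scales).astype(np.int8)
    return codes, scales, global_scale


def dequantize(codes, scales, global_scale):
    """Undo quantize(): codes * per-group scale * global scale."""
    w = codes.astype(np.float32) * scales * global_scale
    return w.reshape(*w.shape[:-2], -1)


def gather_matmul_reference(x, codes, scales, global_scale, rhs_indices):
    """Row i of x is multiplied by the weight matrix selected by rhs_indices[i].

    x: (M, K); dequantized weights: (E, N, K); result: (M, N).
    """
    w = dequantize(codes, scales, global_scale)
    return np.einsum("ik,ink->in", x, w[rhs_indices])
```

A fused kernel would apply the scales inside the matmul rather than materializing the dequantized weights; the reference above only pins down the expected result, which is the role a naive fallback plays until faster kernels exist.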

zcbenz force-pushed the qmm-global-scale branch 2 times, most recently from f8a6d7d to ca36a2d Compare June 24, 2026 01:01

zcbenz added 2 commits June 30, 2026 14:15

[CUDA] Make qmm_naive support global scale

06b0fa5

[CUDA] Add gather_qqmm

596b8b3

zcbenz force-pushed the qmm-global-scale branch from ca36a2d to 8180ce5 Compare June 30, 2026 07:39

zcbenz changed the title ~~[CUDA] Add gather_qqmm~~ Add gather_qqmm Jun 30, 2026

zcbenz force-pushed the qmm-global-scale branch 2 times, most recently from 0cb1b35 to b396742 Compare June 30, 2026 11:39

[Metal] Add gather_qqmm

6a43d36

zcbenz force-pushed the qmm-global-scale branch from b396742 to 6a43d36 Compare July 1, 2026 01:19

zcbenz wants to merge 3 commits into ml-explore:main from zcbenz:qmm-global-scale
