
Add gather_qqmm #3757

Open
zcbenz wants to merge 3 commits into ml-explore:main from zcbenz:qmm-global-scale


zcbenz commented Jun 24, 2026

Collaborator

Add a simple gather_qqmm implementation by adding global scale support to the qmm kernels and using them as the fallback.

The implementation is the bare minimum: on CUDA it is rerouted to qmm_naive, and on Metal to qmv. Follow-up PRs will add global scale support to more qmm kernels, and specialized fast kernels will come later. This PR mostly serves as a way for users to use qqmm before the native kernels are ready.

The CPU backend is not supported at the moment.
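
For readers unfamiliar with the terms, here is a rough NumPy reference of what a gathered quantized matmul with a global scale computes. It is only a sketch under stated assumptions, not the code in this PR: it assumes "global scale" means one per-tensor factor applied on top of the per-group scales, and that "gather" means picking one weight matrix per batch entry by index. The symmetric integer quantizer is a stand-in for whatever format the real kernels use, and all names (`quantize_groups`, `dequantize_groups`, `gather_qmm_reference`, `rhs_indices`) are hypothetical.

```python
import numpy as np


def quantize_groups(w, group_size=4, bits=4):
    """Hypothetical quantizer: per-group scales plus one global (per-tensor) scale."""
    qmax = 2 ** (bits - 1) - 1
    # Global scale normalizes the whole tensor into [-1, 1] (assumed meaning).
    global_scale = float(np.abs(w).max()) or 1.0
    groups = (w / global_scale).reshape(*w.shape[:-1], -1, group_size)
    # One scale per group of `group_size` consecutive elements along K.
    scales = np.abs(groups).max(axis=-1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)
    q = np.clip(np.round(groups / scales), -qmax, qmax).astype(np.int8)
    return q, scales, global_scale


def dequantize_groups(q, scales, global_scale):
    """Undo quantize_groups: value = q * group_scale * global_scale."""
    w = q.astype(np.float32) * scales * global_scale
    return w.reshape(*w.shape[:-2], -1)


def gather_qmm_reference(x, q, scales, global_scale, rhs_indices):
    """x: (B, M, K); q: (E, N, K // g, g); rhs_indices: (B,). Returns (B, M, N)."""
    out = []
    for b, e in enumerate(rhs_indices):
        w = dequantize_groups(q[e], scales[e], global_scale)  # (N, K)
        out.append(x[b] @ w.T)
    return np.stack(out)


# Small usage example: 3 candidate weight matrices, 2 batch entries.
rng = np.random.default_rng(0)
weights = rng.standard_normal((3, 4, 8)).astype(np.float32)  # (E, N, K)
inputs = rng.standard_normal((2, 2, 8)).astype(np.float32)   # (B, M, K)
q, scales, global_scale = quantize_groups(weights)
result = gather_qmm_reference(inputs, q, scales, global_scale, [2, 0])
```

A fallback kernel of the kind described above would fuse the dequantize and matmul steps instead of materializing `w`; the loop here only pins down the expected result.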

zcbenz force-pushed the qmm-global-scale branch 2 times, most recently from f8a6d7d to ca36a2d, June 24, 2026 01:01
zcbenz force-pushed the qmm-global-scale branch from ca36a2d to 8180ce5, June 30, 2026 07:39
zcbenz changed the title from "[CUDA] Add gather_qqmm" to "Add gather_qqmm", Jun 30, 2026
zcbenz force-pushed the qmm-global-scale branch 2 times, most recently from 0cb1b35 to b396742, June 30, 2026 11:39
zcbenz force-pushed the qmm-global-scale branch from b396742 to 6a43d36, July 1, 2026 01:19