
Fix Python finfo.min port in quantized attention masking #369

Open
ronaldmannak wants to merge 9 commits into ml-explore:main from PicoMLX:greatestFiniteMagnitude


@ronaldmannak

Contributor

Proposed changes

Fixes a likely mistranslation from Python mlx-lm in quantizedScaledDotProductAttention.

The Python reference masks boolean/causal logits with mx.finfo(scores.dtype).min, a large negative finite value that suppresses masked positions before softmax. The Swift port used Float.leastNormalMagnitude, which is the smallest positive normal Float (about 1.18e-38). A masked position therefore received a logit of essentially 0 and could stay competitive, or even dominate, whenever the valid scores were negative.

This updates the Swift quantized attention mask fill value to match the intended Python semantics and adds regression coverage for masked quantized attention.
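
To make the failure mode concrete, here is a minimal sketch using plain Swift arrays rather than MLXArray. The helpers softmax and applyMask are hypothetical names written only for this illustration and are not part of the PR. It assumes float32 scores, where mx.finfo(dtype).min corresponds to -Float.greatestFiniteMagnitude; other dtypes have a different minimum.

```swift
import Foundation

/// Numerically stable softmax over a plain Float array (illustrative helper).
func softmax(_ x: [Float]) -> [Float] {
    let m = x.max() ?? 0
    let e = x.map { exp($0 - m) }
    let s = e.reduce(0, +)
    return e.map { $0 / s }
}

/// Replace every position where `keep` is false with `fill` (illustrative helper).
func applyMask(_ scores: [Float], keep: [Bool], fill: Float) -> [Float] {
    zip(scores, keep).map { $1 ? $0 : fill }
}

// All valid scores are negative; the last position is masked out.
let scores: [Float] = [-2, -3, -1]
let keep = [true, true, false]

// Old fill value: a tiny positive number, so the masked logit is effectively 0,
// which is larger than every valid score here.
let buggy = softmax(applyMask(scores, keep: keep, fill: Float.leastNormalMagnitude))

// Intended fill value for float32: the most negative finite Float.
let fixed = softmax(applyMask(scores, keep: keep, fill: -Float.greatestFiniteMagnitude))

print("buggy weights:", buggy)  // the masked position gets the largest weight
print("fixed weights:", fixed)  // the masked position gets weight 0
```

With the old fill value the masked position ends up with most of the attention weight, while with the large negative fill its weight underflows to zero and the valid positions share the full probability mass.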

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

ronaldmannak marked this pull request as draft June 24, 2026 17:55
ronaldmannak marked this pull request as ready for review June 24, 2026 21:41
@ronaldmannak

Contributor Author

That turned out to be a bit more complex than I thought, but it should be good to go now. Also in this PR: a related CQA fix in the same KVCache method.

@ronaldmannak

Contributor Author

One open question: I added extensions to MLXArray and DType to this repo. You could argue those belong in MLX-Swift. Happy to open a separate PR on MLX-Swift for that.

@davidkoski

Collaborator

> One open question: I added extensions to MLXArray and DType to this repo. You could argue those belong in MLX-Swift. Happy to open a separate PR on MLX-Swift for that

Yeah, that would probably be the best approach. There is already a very thin finfo in there.

@ronaldmannak

Contributor Author


2 participants