Fix Python finfo.min port in quantized attention masking by ronaldmannak · Pull Request #369 · ml-explore/mlx-swift-lm

ronaldmannak · 2026-06-24T17:21:47Z

Proposed changes

Fixes a likely mistranslation from Python mlx-lm in quantizedScaledDotProductAttention.

The Python reference masks boolean/causal logits with mx.finfo(scores.dtype).min, a large negative finite value that suppresses masked positions before softmax. The Swift port used Float.leastNormalMagnitude, which is the smallest positive normal Float (about 1.18e-38), effectively zero. Masked positions therefore kept a logit of roughly 0 and could remain competitive, or even dominate, whenever the valid scores are negative.

This updates the Swift quantized attention mask fill value to match the intended Python semantics and adds regression coverage for masked quantized attention.
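To make the failure mode concrete, here is a minimal sketch in plain Swift, with no MLX dependency. The helpers `softmax` and `applyMask` are hypothetical names written for this illustration, and the sketch assumes the mask is applied by replacing masked logits with a fill value on plain Float arrays. It is not the code from quantizedScaledDotProductAttention; it only compares the two fill values discussed above.

```swift
import Foundation

/// Numerically stable softmax over a small array of logits.
/// Hypothetical helper for illustration only.
func softmax(_ logits: [Float]) -> [Float] {
    let maxLogit = logits.max() ?? 0
    let exps = logits.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

/// Replaces logits at masked-out positions (keep == false) with `fill`.
/// Assumption: masking is a select, not an additive bias.
func applyMask(_ logits: [Float], keep: [Bool], fill: Float) -> [Float] {
    zip(logits, keep).map { logit, isKept in isKept ? logit : fill }
}

// All valid scores are negative; the last position is masked out.
let scores: [Float] = [-4.0, -6.0, -5.0]
let keep = [true, true, false]

// Old fill value: a tiny positive number, so the masked logit is about 0
// and ends up larger than every valid (negative) score.
let buggy = softmax(applyMask(scores, keep: keep, fill: Float.leastNormalMagnitude))

// New fill value: the most negative finite Float, the Float32 counterpart
// of Python's finfo.min.
let fixed = softmax(applyMask(scores, keep: keep, fill: -Float.greatestFiniteMagnitude))

print("buggy:", buggy)  // the masked position receives most of the weight
print("fixed:", fixed)  // the masked position receives zero weight
```

With the old fill value the masked position wins the softmax because 0 is greater than every valid score; with the new one its weight underflows to exactly zero. The sketch uses Float only, whereas the Python reference picks the minimum for the dtype of the scores.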

Checklist

Put an x in the boxes that apply.

I have read the CONTRIBUTING document
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
I have added tests that prove my fix is effective or that my feature works
I have updated the necessary documentation (if needed)

ronaldmannak · 2026-06-24T21:42:58Z

That turned out to be a little more complex than I thought, but it should be good to go now. Also in this PR: a related CQA fix in the same KVCache method.

ronaldmannak · 2026-06-24T21:55:52Z

One open question: I added extensions to MLXArray and DType to this repo. You could argue those belong in MLX-Swift. Happy to open a separate PR on MLX-Swift for that.

davidkoski · 2026-06-24T22:12:12Z

> One open question: I added extensions to MLXArray and DType to this repo. You could argue those belong in MLX-Swift. Happy to open a separate PR on MLX-Swift for that

Yeah, that would probably be the best approach. There is already a very thin finfo in there.

ronaldmannak · 2026-06-24T22:28:08Z

@davidkoski Done: ml-explore/mlx-swift#429

ronaldmannak added 4 commits June 24, 2026 10:01

Apply -Float.greatestFiniteMagnitude

4ca52e6

Add unit test

38d6a26

Specify dtype

acf9803

Swift lint

6bf9997

ronaldmannak marked this pull request as draft June 24, 2026 17:55

ronaldmannak added 4 commits June 24, 2026 13:59

Create MLXArray and DType extensions

cd06115

Add preserve DType test

b7f38c1

Fix Gemma

2e050d9

Fix overflow

be4fe1f

ronaldmannak marked this pull request as ready for review June 24, 2026 21:41

ronaldmannak mentioned this pull request Jun 24, 2026

Add MLXArray and DType extensions ml-explore/mlx-swift#429

Open

4 tasks

Move MLXArray and DType extensions to MLX-Swift

167f193

ronaldmannak wants to merge 9 commits into ml-explore:main from PicoMLX:greatestFiniteMagnitude
