
chore: sync upstream 2026-03-27#1

Open
hanzo-dev wants to merge 21 commits into main from upstream-sync-20260328

Conversation

@hanzo-dev
Member

20 commits from mistral.rs

EricLBuehler and others added 21 commits February 25, 2026 13:36
…ricLBuehler#1943)

* fix(server-core): terminate SSE streams when response channel closes (EricLBuehler#1940)

Map Poll::Ready(None) to stream termination instead of Poll::Pending,
preventing indefinite hangs when all senders are dropped before a
terminal response is sent.
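The mapping can be illustrated with a minimal std-only sketch (toy function, not the actual server-core stream type): a closed channel (`None` from all-senders-dropped) must end the stream rather than leave it pending.

```rust
use std::task::Poll;

// Toy sketch of the fix: turn the channel's poll result into the SSE
// stream's poll result. `recv_result` is what the response channel
// reported when polled with a value ready or closed.
fn next_sse_event<T>(recv_result: Option<T>) -> Poll<Option<T>> {
    match recv_result {
        // A response arrived: forward it as the next stream item.
        Some(item) => Poll::Ready(Some(item)),
        // All senders dropped before a terminal response: previously this
        // was mapped to Poll::Pending (hanging the client forever); the
        // fix ends the stream instead.
        None => Poll::Ready(None),
    }
}
```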

* docs: never include test plan in PR descriptions
**Description**
This PR fixes an issue where the `docs` GitHub Actions workflow fails on forks because GitHub Pages is not configured or enabled by default. 
By adding the `if: github.repository == 'EricLBuehler/mistral.rs'` condition to the `deploy` job, we ensure that the documentation is only built and deployed on the main repository, preventing unnecessary CI failures for contributors.
**Fixes**
Resolves the `Get Pages site failed` error occurring in forks during the `actions/configure-pages` step.
…er#1916)

`try_for_each` requires ALL architectures to match, causing GGUF
models with `llama` architecture to fail when the loader accepts
both `llama` and `mistral3`. Replace with `any()` check so that
matching any one of the expected architectures is sufficient.
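A toy sketch of the difference (hypothetical `arch_matches` helper, not the loader's actual code):

```rust
// Check a GGUF file's architecture against the set a loader accepts.
fn arch_matches(expected: &[&str], found: &str) -> bool {
    // Buggy version: `try_for_each` short-circuits with Err as soon as
    // ONE expected architecture differs, so a loader accepting
    // ["llama", "mistral3"] rejected a plain "llama" file.
    //
    // expected.iter()
    //     .try_for_each(|a| if *a == found { Ok(()) } else { Err(()) })
    //     .is_ok()

    // Fixed version: matching any one expected architecture suffices.
    expected.iter().any(|a| *a == found)
}
```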

Co-authored-by: Olivier ESTEVE <olivier@hdds.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…EricLBuehler#1941)

Converts Python-style descending ranges `range(expr, -1, -1)` into 
`range(expr)|reverse` prior to rendering. This provides a workaround 
for `minijinja` which does not natively support negative range steps, 
fixing the repeating lines inference issue on Qwen 2.5 templates.
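A naive sketch of the rewrite (hypothetical helper; the real pass presumably handles more general expressions). It covers only the literal `… - 1, -1, -1)` spelling common in chat templates, where `range(n - 1, -1, -1)` enumerates exactly `range(n)` in reverse:

```rust
// Rewrite Python-style descending ranges into a form minijinja can
// render, since minijinja has no negative range step.
fn rewrite_descending_ranges(template: &str) -> String {
    // Handles only the exact `range(x - 1, -1, -1)` spelling:
    // drop the descending-step arguments and apply |reverse instead.
    template.replace(" - 1, -1, -1)", ")|reverse")
}
```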
… model token decoding (EricLBuehler#1950)

* fix(vision): correct Qwen VL multi-turn image processing

Three issues fixed across Qwen3-VL, Qwen2-VL, and Qwen2.5-VL:

1. resize_exact argument swap: image::DynamicImage::resize_exact takes
   (width, height) but was called with (height, width).

2. Per-image preprocessing: all images in a batch were resized to the
   max height/width, distorting smaller images in multi-turn
   conversations. Now each image is processed at its own resolution
   using Tensor::cat instead of Tensor::stack.

3. get_rope_index grid alignment (Qwen3-VL/MoE only): with per-image
   grids, prefix-cache-trimmed input_ids caused vision spans to
   misalign with grid_thw entries. Now always uses input_ids_full for
   MRoPE computation and narrows the resulting position_ids.

Also updates prefix cache pixel_values trimming to narrow by patch
count (from grid_thw) instead of image count.
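The per-image change can be illustrated with a toy concatenation (plain-Rust stand-in, not candle code): stacking requires every image to share one shape, forcing smaller images up to the batch max and distorting them, while concatenating along the patch axis lets each image keep its own patch count.

```rust
// Toy stand-in for Tensor::cat along the patch dimension: each image
// contributes a patch sequence of its own length, and the sequences are
// simply concatenated (no resize-to-max, no padding).
fn cat_patches(per_image: &[Vec<f32>]) -> Vec<f32> {
    per_image.iter().flatten().copied().collect()
}
```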

* fix(sampling): include special tokens for thinking models

Enable include_special during token decoding when think tag mode is
active, so <think>/</think> delimiters appear in the output. Previously
only tool-calling sequences decoded special tokens.
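A toy decoder sketch (hypothetical, not the real tokenizer API) showing the effect of the flag: with `include_special` off, the think delimiters are silently dropped from the output.

```rust
// Toy decode: treat any <...> token as "special" and keep it only when
// include_special is set, as in think-tag mode.
fn decode(tokens: &[&str], include_special: bool) -> String {
    tokens
        .iter()
        .filter(|t| include_special || !(t.starts_with('<') && t.ends_with('>')))
        .copied()
        .collect::<Vec<_>>()
        .join("")
}
```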
* Add qwen3.5 moe

* Fix rmsnorm

* Fix special token case

* Fixes for qwen3.5

* Fix display and counting

* Fix par loading

* Tweak logs

* Loadtime merging

* Fix prefix cache

* feat(core): add hybrid paged-prefix recurrent parity and safety fallbacks

* Fixes for parity

* Fixes for parity

* Some docs

* Add docs and examples

* Add qwen3.5 dense and docs

* Add gdn prompt kernels

* Run fixes

* fix(core): harden qwen3.5 hybrid cache and device-map sizing

* docs: sync Python and vision docs with qwen3.5/qwen3next APIs

* fix(ci): resolve CUDA clippy, Metal compile errors, and typos check

* fix(ci): resolve remaining clippy, rustfmt, and metal compile errors

* fix(metal): pass buffer references to set_buffer calls

* fix(qwen3.5): handle empty seqlens and precompute deepstack indices

Avoid panic on empty seqlens by using proper error handling instead of
unwrap. Precompute deepstack index tensors before the layer loop to
eliminate repeated CPU-GPU syncs from Tensor::from_vec in hot path.
… command (EricLBuehler#1994)

* Add uqff gen selection for repo, base repo

* Add flags

* Format
…hler#1997)

* fix(isq): standardize bits format for numerical isq setting

* Fix error

* Format
…rt (EricLBuehler#2010)

The paged-attn crate was compiling Metal shaders with -std=metal3.0,
which does not define __HAVE_BFLOAT__. This caused the bfloat16
PagedAttention kernels to use an emulated _MLX_BFloat16 struct instead
of the native bfloat type, leading to runtime "was not found in the
library" errors on some Metal compiler/runtime combinations.

The quant crate already uses Metal 3.1 (see EricLBuehler#1844). This commit brings
paged-attn in line by:

1. Upgrading build.rs from metal3.0 to metal3.1 for precompiled
   metallibs
2. Setting MTLLanguageVersion::Version3_1 in the runtime compilation
   fallback path (compile_kernels_at_runtime) to match
…EricLBuehler#1979)

* load params.json before config.json if present

Currently, we never attempt to load params.json because Voxtral's repo also contains a config.json. Fixed a small logic error here to make sure we load params.json if present and fall back to config.json if not.
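The corrected precedence might look like this minimal sketch (hypothetical `pick_config` helper, not the actual loader code):

```rust
use std::path::{Path, PathBuf};

// Prefer params.json when it exists, falling back to config.json, so
// repos that ship both files (like Voxtral's) load the params format.
fn pick_config(dir: &Path) -> PathBuf {
    let params = dir.join("params.json");
    if params.exists() {
        params
    } else {
        dir.join("config.json")
    }
}
```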

* fix non-embeddings too

oops, haha
* fix(core): dummy out MoE expert tensors in Qwen2Loader for UQFF

* chore: ignore layrnorm typo

* fix(typos): ignore layrnorm globally

---------

Co-authored-by: Eric Buehler <65165915+EricLBuehler@users.noreply.github.com>
…ricLBuehler#1974) (EricLBuehler#2015)

When an iOS app goes to background, Metal rejects GPU command buffer
submissions with kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted.
Instead of failing the request, detect this specific error, reset cache state,
sleep 1s, and let the engine loop retry. Sequences remain in the scheduler
in Running state and are re-scheduled automatically when the app returns
to foreground.
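The detection step can be sketched as a substring check (toy helper; the real code inspects the error returned by the Metal command buffer):

```rust
// Treat the background-execution rejection as transient: the caller
// resets cache state, sleeps, and lets the engine loop retry instead of
// failing the request.
fn should_retry(err: &str) -> bool {
    err.contains("BackgroundExecutionNotPermitted")
}
```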
…fusion pipelines (EricLBuehler#2016)

- Add `use_ring()` helper that checks both the `ring` feature flag and `RING_CONFIG` env var
- Fix `get_global_tp_size_from_devices()` CUDA+Ring branch to use `RingConfig::world_size` instead of local GPU count
- Fix `is_daemon()` to guard `RingConfig::load()` behind `use_ring()`, preventing panics when `RING_CONFIG` is unset
- Add `|| use_ring()` checks in vision, embedding, speech, and diffusion pipelines to match the pattern already used in normal.rs

Fixes EricLBuehler#2005
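A minimal sketch of the guard (signature and exact semantics assumed from the description above): ring mode is considered active only when the `ring` feature is compiled in and `RING_CONFIG` is set, so `RingConfig::load()` is never reached when the env var is absent.

```rust
// Hypothetical helper mirroring the described use_ring() check: both the
// compile-time feature flag and the runtime env var must be present.
fn use_ring() -> bool {
    cfg!(feature = "ring") && std::env::var("RING_CONFIG").is_ok()
}
```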
…r#2018)

* feat(quant): add MXFP4 ISQ with optimized decode kernels

* Improve quantize parallelization

* Format
…60328

# Conflicts:
#	CLAUDE.md
#	README.md
#	docs/VISION_MODELS.md
@github-actions

Code Metrics Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language              Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 C Header                  5          305          210           52           43
 CSS                       2         1181         1036           34          111
 CUDA                     45        12720         9685         1375         1660
 Dockerfile                1           53           29           10           14
 JavaScript               16         3546         2676          482          388
 Jinja2                    7          694          656            5           33
 JSON                     21          409          406            0            3
 Makefile                  1            6            5            0            1
 Metal Shading Lang       29        10790         8395          953         1442
 PowerShell                1          300          227           30           43
 Python                  116         8035         6561          402         1072
 Shell                     4          883          708          102           73
 Plain Text                3         3723            0         2413         1310
 TOML                     29         1403         1205           51          147
 YAML                      4           41           39            2            0
─────────────────────────────────────────────────────────────────────────────────
 HTML                      3          873          767           45           61
 |- CSS                    1          578          544           14           20
 |- JavaScript             1           24           23            0            1
 (Total)                             1475         1334           59           82
─────────────────────────────────────────────────────────────────────────────────
 Jupyter Notebooks         3          122           83           23           16
 |- Markdown               1           60           30           22            8
 |- Python                 1          122          113            1            8
 (Total)                              304          226           46           32
─────────────────────────────────────────────────────────────────────────────────
 Markdown                 97        10730            0         7823         2907
 |- BASH                  48          887          640          151           96
 |- JSON                  17          674          674            0            0
 |- PowerShell             2            2            2            0            0
 |- Python                19          812          628           84          100
 |- Rust                  45         1764         1468           52          244
 |- TOML                   8          227          180            1           46
 |- YAML                   3          175          173            1            1
 (Total)                            15271         3765         8112         3394
─────────────────────────────────────────────────────────────────────────────────
 Rust                    516       216130       189984         6057        20089
 |- Markdown             335         8303          451         6812         1040
 (Total)                           224433       190435        12869        21129
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Total                   903       285572       227598        26997        30977
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━



9 participants