
chore: sync upstream 2026-03-27#1

Open
hanzo-dev wants to merge 21 commits into main from upstream-sync-20260328

Conversation

@hanzo-dev
Member

20 commits from mistral.rs

EricLBuehler and others added 21 commits February 25, 2026 13:36
…ricLBuehler#1943)

* fix(server-core): terminate SSE streams when response channel closes (EricLBuehler#1940)

Map Poll::Ready(None) to stream termination instead of Poll::Pending,
preventing indefinite hangs when all senders are dropped before a
terminal response is sent.
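The mapping can be illustrated with a minimal std-only sketch (toy function, not the actual server-core stream type): a closed channel (`None` from all-senders-dropped) must end the stream rather than leave it pending.

```rust
use std::task::Poll;

// Toy sketch of the fix: turn the channel's poll result into the SSE
// stream's poll result. `recv_result` is what the response channel
// reported when polled with a value ready or closed.
fn next_sse_event<T>(recv_result: Option<T>) -> Poll<Option<T>> {
    match recv_result {
        // A response arrived: forward it as the next stream item.
        Some(item) => Poll::Ready(Some(item)),
        // All senders dropped before a terminal response: previously this
        // was mapped to Poll::Pending (hanging the client forever); the
        // fix ends the stream instead.
        None => Poll::Ready(None),
    }
}
```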

* docs: never include test plan in PR descriptions
**Description**
This PR fixes an issue where the `docs` GitHub Actions workflow fails on forks because GitHub Pages is not configured or enabled by default. 
By adding the `if: github.repository == 'EricLBuehler/mistral.rs'` condition to the `deploy` job, we ensure that the documentation is only built and deployed on the main repository, preventing unnecessary CI failures for contributors.
**Fixes**
Resolves the `Get Pages site failed` error occurring in forks during the `actions/configure-pages` step.
…er#1916)

`try_for_each` requires ALL architectures to match, causing GGUF
models with `llama` architecture to fail when the loader accepts
both `llama` and `mistral3`. Replace with `any()` check so that
matching any one of the expected architectures is sufficient.
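A toy sketch of the difference (hypothetical `arch_matches` helper, not the loader's actual code):

```rust
// Check a GGUF file's architecture against the set a loader accepts.
fn arch_matches(expected: &[&str], found: &str) -> bool {
    // Buggy version: `try_for_each` short-circuits with Err as soon as
    // ONE expected architecture differs, so a loader accepting
    // ["llama", "mistral3"] rejected a plain "llama" file.
    //
    // expected.iter()
    //     .try_for_each(|a| if *a == found { Ok(()) } else { Err(()) })
    //     .is_ok()

    // Fixed version: matching any one expected architecture suffices.
    expected.iter().any(|a| *a == found)
}
```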

Co-authored-by: Olivier ESTEVE <olivier@hdds.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…EricLBuehler#1941)

Converts Python-style descending ranges `range(expr, -1, -1)` into 
`range(expr)|reverse` prior to rendering. This provides a workaround 
for `minijinja` which does not natively support negative range steps, 
fixing the repeating lines inference issue on Qwen 2.5 templates.
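A naive sketch of the rewrite (hypothetical helper; the real pass presumably handles more general expressions). It covers only the literal `… - 1, -1, -1)` spelling common in chat templates, where `range(n - 1, -1, -1)` enumerates exactly `range(n)` in reverse:

```rust
// Rewrite Python-style descending ranges into a form minijinja can
// render, since minijinja has no negative range step.
fn rewrite_descending_ranges(template: &str) -> String {
    // Handles only the exact `range(x - 1, -1, -1)` spelling:
    // drop the descending-step arguments and apply |reverse instead.
    template.replace(" - 1, -1, -1)", ")|reverse")
}
```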
… model token decoding (EricLBuehler#1950)

* fix(vision): correct Qwen VL multi-turn image processing

Three issues fixed across Qwen3-VL, Qwen2-VL, and Qwen2.5-VL:

1. resize_exact argument swap: image::DynamicImage::resize_exact takes
   (width, height) but was called with (height, width).

2. Per-image preprocessing: all images in a batch were resized to the
   max height/width, distorting smaller images in multi-turn
   conversations. Now each image is processed at its own resolution
   using Tensor::cat instead of Tensor::stack.

3. get_rope_index grid alignment (Qwen3-VL/MoE only): with per-image
   grids, prefix-cache-trimmed input_ids caused vision spans to
   misalign with grid_thw entries. Now always uses input_ids_full for
   MRoPE computation and narrows the resulting position_ids.

Also updates prefix cache pixel_values trimming to narrow by patch
count (from grid_thw) instead of image count.
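The per-image change can be illustrated with a toy concatenation (plain-Rust stand-in, not candle code): stacking requires every image to share one shape, forcing smaller images up to the batch max and distorting them, while concatenating along the patch axis lets each image keep its own patch count.

```rust
// Toy stand-in for Tensor::cat along the patch dimension: each image
// contributes a patch sequence of its own length, and the sequences are
// simply concatenated (no resize-to-max, no padding).
fn cat_patches(per_image: &[Vec<f32>]) -> Vec<f32> {
    per_image.iter().flatten().copied().collect()
}
```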

* fix(sampling): include special tokens for thinking models

Enable include_special during token decoding when think tag mode is
active, so <think>/</think> delimiters appear in the output. Previously
only tool-calling sequences decoded special tokens.
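A toy decoder sketch (hypothetical, not the real tokenizer API) showing the effect of the flag: with `include_special` off, the think delimiters are silently dropped from the output.

```rust
// Toy decode: treat any <...> token as "special" and keep it only when
// include_special is set, as in think-tag mode.
fn decode(tokens: &[&str], include_special: bool) -> String {
    tokens
        .iter()
        .filter(|t| include_special || !(t.starts_with('<') && t.ends_with('>')))
        .copied()
        .collect::<Vec<_>>()
        .join("")
}
```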
* Add qwen3.5 moe

* Fix rmsnorm

* Fix special token case

* Fixes for qwen3.5

* Fix display and counting

* Fix par loading

* Tweak logs

* Loadtime merging

* Fix prefix cache

* feat(core): add hybrid paged-prefix recurrent parity and safety fallbacks

* Fixes for parity

* Fixes for parity

* Some docs

* Add docs and examples

* Add qwen3.5 dense and docs

* Add gdn prompt kernels

* Run fixes

* fix(core): harden qwen3.5 hybrid cache and device-map sizing

* docs: sync Python and vision docs with qwen3.5/qwen3next APIs

* fix(ci): resolve CUDA clippy, Metal compile errors, and typos check

* fix(ci): resolve remaining clippy, rustfmt, and metal compile errors

* fix(metal): pass buffer references to set_buffer calls

* fix(qwen3.5): handle empty seqlens and precompute deepstack indices

Avoid panic on empty seqlens by using proper error handling instead of
unwrap. Precompute deepstack index tensors before the layer loop to
eliminate repeated CPU-GPU syncs from Tensor::from_vec in hot path.
… command (EricLBuehler#1994)

* Add uqff gen selection for repo, base repo

* Add flags

* Format
…hler#1997)

* fix(isq): standardize bits format for numerical isq setting

* Fix error

* Format
…rt (EricLBuehler#2010)

The paged-attn crate was compiling Metal shaders with -std=metal3.0,
which does not define __HAVE_BFLOAT__. This caused the bfloat16
PagedAttention kernels to use an emulated _MLX_BFloat16 struct instead
of the native bfloat type, leading to runtime "was not found in the
library" errors on some Metal compiler/runtime combinations.

The quant crate already uses Metal 3.1 (see EricLBuehler#1844). This commit brings
paged-attn in line by:

1. Upgrading build.rs from metal3.0 to metal3.1 for precompiled
   metallibs
2. Setting MTLLanguageVersion::Version3_1 in the runtime compilation
   fallback path (compile_kernels_at_runtime) to match
…EricLBuehler#1979)

* load params.json before config.json if present

Currently, we never attempt to load params.json because Voxtral's repo also contains a config.json. Fixed a small logic error here to make sure we load params.json if present and fall back to config.json if not.
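The corrected precedence might look like this minimal sketch (hypothetical `pick_config` helper, not the actual loader code):

```rust
use std::path::{Path, PathBuf};

// Prefer params.json when it exists, falling back to config.json, so
// repos that ship both files (like Voxtral's) load the params format.
fn pick_config(dir: &Path) -> PathBuf {
    let params = dir.join("params.json");
    if params.exists() {
        params
    } else {
        dir.join("config.json")
    }
}
```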

* fix non-embeddings too

oops, haha
* fix(core): dummy out MoE expert tensors in Qwen2Loader for UQFF

* chore: ignore layrnorm typo

* fix(typos): ignore layrnorm globally

---------

Co-authored-by: Eric Buehler <65165915+EricLBuehler@users.noreply.github.com>
…ricLBuehler#1974) (EricLBuehler#2015)

When an iOS app goes to background, Metal rejects GPU command buffer
submissions with kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted.
Instead of failing the request, detect this specific error, reset cache state,
sleep 1s, and let the engine loop retry. Sequences remain in the scheduler
in Running state and are re-scheduled automatically when the app returns
to foreground.
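The detection step can be sketched as a substring check (toy helper; the real code inspects the error returned by the Metal command buffer):

```rust
// Treat the background-execution rejection as transient: the caller
// resets cache state, sleeps, and lets the engine loop retry instead of
// failing the request.
fn should_retry(err: &str) -> bool {
    err.contains("BackgroundExecutionNotPermitted")
}
```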
…fusion pipelines (EricLBuehler#2016)

- Add `use_ring()` helper that checks both the `ring` feature flag and `RING_CONFIG` env var
- Fix `get_global_tp_size_from_devices()` CUDA+Ring branch to use `RingConfig::world_size` instead of local GPU count
- Fix `is_daemon()` to guard `RingConfig::load()` behind `use_ring()`, preventing panics when `RING_CONFIG` is unset
- Add `|| use_ring()` checks in vision, embedding, speech, and diffusion pipelines to match the pattern already used in normal.rs

Fixes EricLBuehler#2005
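A minimal sketch of the guard (signature and exact semantics assumed from the description above): ring mode is considered active only when the `ring` feature is compiled in and `RING_CONFIG` is set, so `RingConfig::load()` is never reached when the env var is absent.

```rust
// Hypothetical helper mirroring the described use_ring() check: both the
// compile-time feature flag and the runtime env var must be present.
fn use_ring() -> bool {
    cfg!(feature = "ring") && std::env::var("RING_CONFIG").is_ok()
}
```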
…r#2018)

* feat(quant): add MXFP4 ISQ with optimized decode kernels

* Improve quantize parallelization

* Format
…60328

# Conflicts:
#	CLAUDE.md
#	README.md
#	docs/VISION_MODELS.md
@github-actions

Code Metrics Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language              Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 C Header                  5          305          210           52           43
 CSS                       2         1181         1036           34          111
 CUDA                     45        12720         9685         1375         1660
 Dockerfile                1           53           29           10           14
 JavaScript               16         3546         2676          482          388
 Jinja2                    7          694          656            5           33
 JSON                     21          409          406            0            3
 Makefile                  1            6            5            0            1
 Metal Shading Lang       29        10790         8395          953         1442
 PowerShell                1          300          227           30           43
 Python                  116         8035         6561          402         1072
 Shell                     4          883          708          102           73
 Plain Text                3         3723            0         2413         1310
 TOML                     29         1403         1205           51          147
 YAML                      4           41           39            2            0
─────────────────────────────────────────────────────────────────────────────────
 HTML                      3          873          767           45           61
 |- CSS                    1          578          544           14           20
 |- JavaScript             1           24           23            0            1
 (Total)                             1475         1334           59           82
─────────────────────────────────────────────────────────────────────────────────
 Jupyter Notebooks         3          122           83           23           16
 |- Markdown               1           60           30           22            8
 |- Python                 1          122          113            1            8
 (Total)                              304          226           46           32
─────────────────────────────────────────────────────────────────────────────────
 Markdown                 97        10730            0         7823         2907
 |- BASH                  48          887          640          151           96
 |- JSON                  17          674          674            0            0
 |- PowerShell             2            2            2            0            0
 |- Python                19          812          628           84          100
 |- Rust                  45         1764         1468           52          244
 |- TOML                   8          227          180            1           46
 |- YAML                   3          175          173            1            1
 (Total)                            15271         3765         8112         3394
─────────────────────────────────────────────────────────────────────────────────
 Rust                    516       216130       189984         6057        20089
 |- Markdown             335         8303          451         6812         1040
 (Total)                           224433       190435        12869        21129
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Total                   903       285572       227598        26997        30977
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━



9 participants