…ricLBuehler#1943)

* fix(server-core): terminate SSE streams when response channel closes (EricLBuehler#1940)

  Map `Poll::Ready(None)` to stream termination instead of `Poll::Pending`, preventing indefinite hangs when all senders are dropped before a terminal response is sent.

* docs: never include test plan in PR descriptions
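The channel-close behaviour can be illustrated with a minimal sketch. A std `mpsc` channel stands in for the async response channel, and the three-way return value mirrors the poll states; the helper name is hypothetical, not the PR's actual API:

```rust
use std::sync::mpsc::{channel, TryRecvError};

// When every sender has been dropped, the channel reports disconnection; the
// stream must yield termination (the fix) rather than pretending more data may
// arrive (the old Poll::Pending behaviour, which hung the SSE connection).
fn next_event(rx: &std::sync::mpsc::Receiver<String>) -> Option<Option<String>> {
    match rx.try_recv() {
        Ok(ev) => Some(Some(ev)),                      // event ready
        Err(TryRecvError::Empty) => None,              // analogous to Poll::Pending
        Err(TryRecvError::Disconnected) => Some(None), // terminate: Poll::Ready(None)
    }
}

fn main() {
    let (tx, rx) = channel();
    tx.send("data".to_string()).unwrap();
    drop(tx); // all senders dropped before a terminal response
    assert_eq!(next_event(&rx), Some(Some("data".to_string())));
    // Previously this case hung forever; now it ends the stream.
    assert_eq!(next_event(&rx), Some(None));
}
```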
**Description**

This PR fixes an issue where the `docs` GitHub Actions workflow fails on forks because GitHub Pages is not configured or enabled by default. By adding the `if: github.repository == 'EricLBuehler/mistral.rs'` condition to the `deploy` job, we ensure that the documentation is only built and deployed on the main repository, preventing unnecessary CI failures for contributors.

**Fixes**

Resolves the `Get Pages site failed` error occurring in forks during the `actions/configure-pages` step.
…er#1916)

`try_for_each` requires ALL architectures to match, causing GGUF models with `llama` architecture to fail when the loader accepts both `llama` and `mistral3`. Replace with an `any()` check so that matching any one of the expected architectures is sufficient.

Co-authored-by: Olivier ESTEVE <olivier@hdds.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
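The all-vs-any distinction can be sketched in a few lines; the function name and slice layout are illustrative, not the loader's actual signature:

```rust
// `expected` are the architectures a loader accepts; `actual` is what the
// GGUF metadata reports.
fn arch_matches(expected: &[&str], actual: &str) -> bool {
    // The fix: any one accepted architecture is sufficient.
    expected.iter().any(|a| *a == actual)
}

fn main() {
    let expected = ["llama", "mistral3"];
    assert!(arch_matches(&expected, "llama"));
    assert!(arch_matches(&expected, "mistral3"));
    assert!(!arch_matches(&expected, "gpt2"));
    // The buggy all()-style check rejected a plain `llama` file, because
    // "mistral3" can never equal "llama":
    assert!(!expected.iter().all(|a| *a == "llama"));
}
```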
…EricLBuehler#1941) Converts Python-style descending ranges `range(expr, -1, -1)` into `range(expr)|reverse` prior to rendering. This provides a workaround for `minijinja` which does not natively support negative range steps, fixing the repeating lines inference issue on Qwen 2.5 templates.
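A naive sketch of the described preprocessing pass, as a plain string rewrite. This toy version assumes the range expression contains no nested parentheses or commas, and it only mirrors the textual transform the commit describes; the real implementation's parsing is certainly more robust:

```rust
// Rewrite Python-style descending ranges `range(expr, -1, -1)` into
// `range(expr)|reverse`, which minijinja can evaluate.
fn rewrite_descending_ranges(template: &str) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("range(") {
        let after = &rest[start + "range(".len()..];
        if let Some(end) = after.find(')') {
            let args: Vec<&str> = after[..end].split(',').map(str::trim).collect();
            if args.len() == 3 && args[1] == "-1" && args[2] == "-1" {
                out.push_str(&rest[..start]);
                out.push_str(&format!("range({})|reverse", args[0]));
                rest = &after[end + 1..];
                continue;
            }
        }
        // Not a descending range: keep scanning past this `range(`.
        out.push_str(&rest[..start + "range(".len()]);
        rest = after;
    }
    out.push_str(rest);
    out
}

fn main() {
    let s = "{% for i in range(n - 1, -1, -1) %}";
    assert_eq!(rewrite_descending_ranges(s), "{% for i in range(n - 1)|reverse %}");
    // Ascending ranges are left untouched:
    assert_eq!(rewrite_descending_ranges("range(5)"), "range(5)");
}
```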
… model token decoding (EricLBuehler#1950)

* fix(vision): correct Qwen VL multi-turn image processing

  Three issues fixed across Qwen3-VL, Qwen2-VL, and Qwen2.5-VL:

  1. `resize_exact` argument swap: `image::DynamicImage::resize_exact` takes (width, height) but was called with (height, width).
  2. Per-image preprocessing: all images in a batch were resized to the max height/width, distorting smaller images in multi-turn conversations. Now each image is processed at its own resolution using `Tensor::cat` instead of `Tensor::stack`.
  3. `get_rope_index` grid alignment (Qwen3-VL/MoE only): with per-image grids, prefix-cache-trimmed input_ids caused vision spans to misalign with grid_thw entries. Now always uses `input_ids_full` for MRoPE computation and narrows the resulting position_ids.

  Also updates prefix cache `pixel_values` trimming to narrow by patch count (from grid_thw) instead of image count.

* fix(sampling): include special tokens for thinking models

  Enable `include_special` during token decoding when think-tag mode is active, so `<think>`/`</think>` delimiters appear in the output. Previously only tool-calling sequences decoded special tokens.
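The patch-count arithmetic behind the trimming change can be shown with a small sketch (names hypothetical): with per-image grids, each `grid_thw` entry (t, h, w) contributes t·h·w patch rows to `pixel_values`, so trimming must skip summed patch counts, not image counts:

```rust
// Number of patch rows contributed by the first `images_to_skip` images.
fn patch_offset(grid_thw: &[(usize, usize, usize)], images_to_skip: usize) -> usize {
    grid_thw[..images_to_skip]
        .iter()
        .map(|&(t, h, w)| t * h * w)
        .sum()
}

fn main() {
    // Two images of different sizes: trimming "one image" must skip
    // 1*16*16 = 256 patch rows, not 1 row.
    let grids = [(1, 16, 16), (1, 8, 12)];
    assert_eq!(patch_offset(&grids, 1), 256);
    assert_eq!(patch_offset(&grids, 2), 256 + 96);
}
```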
* Add qwen3.5 moe
* Fix rmsnorm
* Fix special token case
* Fixes for qwen3.5
* Fix display and counting
* Fix par loading
* Tweak logs
* Loadtime merging
* Fix prefix cache
* feat(core): add hybrid paged-prefix recurrent parity and safety fallbacks
* Fixes for parity
* Some docs
* Add docs and examples
* Add qwen3.5 dense and docs
* Add gdn prompt kernels
* Run fixes
* fix(core): harden qwen3.5 hybrid cache and device-map sizing
* docs: sync Python and vision docs with qwen3.5/qwen3next APIs
* fix(ci): resolve CUDA clippy, Metal compile errors, and typos check
* fix(ci): resolve remaining clippy, rustfmt, and metal compile errors
* fix(metal): pass buffer references to set_buffer calls
* fix(qwen3.5): handle empty seqlens and precompute deepstack indices

  Avoid panic on empty seqlens by using proper error handling instead of unwrap. Precompute deepstack index tensors before the layer loop to eliminate repeated CPU-GPU syncs from `Tensor::from_vec` in the hot path.
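The deepstack precompute in the last bullet is a hoist-invariant-work-out-of-the-hot-loop pattern. A generic stdlib sketch (all names hypothetical; the real code builds index tensors, where each in-loop `Tensor::from_vec` forced a host-to-device upload and sync):

```rust
// Build the per-layer lookup once, before the layer loop, instead of
// re-deriving it from host data inside every iteration.
fn run_layers(num_layers: usize, deepstack_layers: &[usize]) -> usize {
    // Precomputed once, outside the hot loop.
    let is_deepstack: Vec<bool> = (0..num_layers)
        .map(|i| deepstack_layers.contains(&i))
        .collect();

    let mut injections = 0;
    for layer in 0..num_layers {
        if is_deepstack[layer] {
            injections += 1; // the model would merge deepstack visual features here
        }
    }
    injections
}

fn main() {
    assert_eq!(run_layers(8, &[0, 1, 2]), 3);
    assert_eq!(run_layers(4, &[]), 0);
}
```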
… command (EricLBuehler#1994)

* Add uqff gen selection for repo, base repo
* Add flags
* Format
…hler#1997)

* fix(isq): standardize bits format for numerical isq setting
* Fix error
* Format
…rt (EricLBuehler#2010)

The paged-attn crate was compiling Metal shaders with `-std=metal3.0`, which does not define `__HAVE_BFLOAT__`. This caused the bfloat16 PagedAttention kernels to use an emulated `_MLX_BFloat16` struct instead of the native `bfloat` type, leading to runtime "was not found in the library" errors on some Metal compiler/runtime combinations.

The quant crate already uses Metal 3.1 (see EricLBuehler#1844). This commit brings paged-attn in line by:

1. Upgrading build.rs from metal3.0 to metal3.1 for precompiled metallibs
2. Setting `MTLLanguageVersion::Version3_1` in the runtime compilation fallback path (`compile_kernels_at_runtime`) to match
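A tiny sketch of the first change site (the helper is hypothetical; the flag string is from the commit): both the precompiled-metallib path and the runtime-compile fallback must agree on Metal 3.1 so `__HAVE_BFLOAT__` is defined consistently:

```rust
// Format the language-standard flag passed to the Metal shader compiler.
fn shader_std_flag(major: u32, minor: u32) -> String {
    format!("-std=metal{major}.{minor}")
}

fn main() {
    // build.rs change: metal3.0 (no __HAVE_BFLOAT__) -> metal3.1
    assert_eq!(shader_std_flag(3, 0), "-std=metal3.0");
    assert_eq!(shader_std_flag(3, 1), "-std=metal3.1");
    // The runtime fallback sets MTLLanguageVersion::Version3_1 to match.
}
```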
…EricLBuehler#1979)

* load params.json before config.json if present

  Currently, we never attempt to load params.json because Voxtral's repo also contains a config.json. Fixed a small logic error to make sure we load params.json if present and fall back to config.json if not.

* fix non-embeddings too
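The selection logic reduces to a presence check with the right precedence. A stdlib sketch (function name hypothetical):

```rust
use std::path::{Path, PathBuf};

// Prefer params.json when the repo ships one, otherwise fall back to
// config.json.
fn pick_config(repo_dir: &Path) -> PathBuf {
    let params = repo_dir.join("params.json");
    if params.is_file() {
        params
    } else {
        repo_dir.join("config.json")
    }
}

fn main() {
    let dir = std::env::temp_dir().join(format!("pick_config_demo_{}", std::process::id()));
    let _ = std::fs::remove_dir_all(&dir); // start from a clean slate
    std::fs::create_dir_all(&dir).unwrap();

    std::fs::write(dir.join("config.json"), "{}").unwrap();
    assert!(pick_config(&dir).ends_with("config.json"));

    // Once params.json exists, it wins even though config.json is present
    // (the Voxtral case above).
    std::fs::write(dir.join("params.json"), "{}").unwrap();
    assert!(pick_config(&dir).ends_with("params.json"));

    std::fs::remove_dir_all(&dir).unwrap();
}
```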
* fix(core): dummy out MoE expert tensors in Qwen2Loader for UQFF
* chore: ignore layrnorm typo
* fix(typos): ignore layrnorm globally

---------

Co-authored-by: Eric Buehler <65165915+EricLBuehler@users.noreply.github.com>
…ricLBuehler#1974) (EricLBuehler#2015)

When an iOS app goes to the background, Metal rejects GPU command buffer submissions with `kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted`. Instead of failing the request, detect this specific error, reset cache state, sleep 1s, and let the engine loop retry. Sequences remain in the scheduler in the Running state and are re-scheduled automatically when the app returns to the foreground.
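The retry policy can be sketched with a closure standing in for command-buffer submission. The error string is from the commit message; the helper names are hypothetical, and the sleep is shortened here for demonstration:

```rust
use std::{thread, time::Duration};

const BG_ERR: &str = "kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted";

// Retry only on the background-execution error; any other error still fails
// the request immediately.
fn submit_with_retry(
    mut submit: impl FnMut() -> Result<(), String>,
    max_tries: u32,
) -> Result<(), String> {
    for _ in 0..max_tries {
        match submit() {
            Ok(()) => return Ok(()),
            Err(e) if e.contains(BG_ERR) => {
                // The real engine also resets cache state here and sleeps 1s.
                thread::sleep(Duration::from_millis(10));
            }
            Err(e) => return Err(e),
        }
    }
    Err("gave up".into())
}

fn main() {
    let mut calls = 0;
    let result = submit_with_retry(
        || {
            calls += 1;
            // Simulate two rejections while "backgrounded", then success.
            if calls < 3 { Err(BG_ERR.to_string()) } else { Ok(()) }
        },
        10,
    );
    assert!(result.is_ok());
    assert_eq!(calls, 3);
}
```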
…fusion pipelines (EricLBuehler#2016)

- Add `use_ring()` helper that checks both the `ring` feature flag and the `RING_CONFIG` env var
- Fix `get_global_tp_size_from_devices()` CUDA+Ring branch to use `RingConfig::world_size` instead of the local GPU count
- Fix `is_daemon()` to guard `RingConfig::load()` behind `use_ring()`, preventing panics when `RING_CONFIG` is unset
- Add `|| use_ring()` checks in vision, embedding, speech, and diffusion pipelines to match the pattern already used in normal.rs

Fixes EricLBuehler#2005
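A parameterized sketch of the guard, assuming (from the bullets above) that both conditions must hold before any config load is attempted; in the real code the inputs would be `cfg!(feature = "ring")` and `std::env::var("RING_CONFIG")`, and the names below are illustrative:

```rust
// Ring backend is considered active only when the feature is compiled in AND
// the RING_CONFIG env var is set, so RingConfig::load() is never reached with
// the env var missing.
fn use_ring(ring_feature: bool, ring_config: Option<&str>) -> bool {
    ring_feature && ring_config.is_some()
}

fn is_daemon(ring_feature: bool, ring_config: Option<&str>) -> bool {
    if use_ring(ring_feature, ring_config) {
        // Safe to load the ring config here; placeholder result.
        true
    } else {
        false
    }
}

fn main() {
    // Feature off: never ring, regardless of the env var.
    assert!(!use_ring(false, Some("ring.json")));
    // Feature on but env var unset: guard short-circuits, no load, no panic.
    assert!(!use_ring(true, None));
    assert!(!is_daemon(true, None));
    // Both present: ring path taken.
    assert!(use_ring(true, Some("ring.json")));
}
```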
…r#2018)

* feat(quant): add MXFP4 ISQ with optimized decode kernels
* Improve quantize parallelization
* Format
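For context, a scalar decode sketch of the MXFP4 layout per the OCP Microscaling (MX) specification, which this format name refers to: each 32-element block stores 4-bit E2M1 values plus one shared E8M0 power-of-two scale. This is an illustration of the format only, not the PR's optimized kernels:

```rust
// Decode one FP4 E2M1 nibble: 1 sign bit, 2 exponent bits, 1 mantissa bit.
// Magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}.
fn e2m1_to_f32(nibble: u8) -> f32 {
    let sign = if nibble & 0x8 != 0 { -1.0 } else { 1.0 };
    let exp = (nibble >> 1) & 0x3;
    let man = (nibble & 0x1) as f32;
    let mag = if exp == 0 {
        man * 0.5 // subnormal range: 0 or 0.5
    } else {
        (1.0 + man * 0.5) * 2f32.powi(exp as i32 - 1)
    };
    sign * mag
}

// Apply the block's shared E8M0 scale, 2^(scale - 127).
fn decode_block(scale_e8m0: u8, nibbles: &[u8]) -> Vec<f32> {
    let scale = 2f32.powi(scale_e8m0 as i32 - 127);
    nibbles.iter().map(|&n| e2m1_to_f32(n) * scale).collect()
}

fn main() {
    // Scale 2^0; nibbles 0..=7 encode 0, 0.5, 1, 1.5, 2, 3, 4, 6.
    let vals = decode_block(127, &[0, 1, 2, 3, 4, 5, 6, 7]);
    assert_eq!(vals, vec![0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]);
    // Sign bit set (-6) with a scale of 2^1:
    assert_eq!(decode_block(128, &[0b1111]), vec![-12.0]);
}
```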
…60328

# Conflicts:
# CLAUDE.md
# README.md
# docs/VISION_MODELS.md
**Code Metrics Report**

| Language | Files | Lines | Code | Comments | Blanks |
|---|---:|---:|---:|---:|---:|
| C Header | 5 | 305 | 210 | 52 | 43 |
| CSS | 2 | 1181 | 1036 | 34 | 111 |
| CUDA | 45 | 12720 | 9685 | 1375 | 1660 |
| Dockerfile | 1 | 53 | 29 | 10 | 14 |
| JavaScript | 16 | 3546 | 2676 | 482 | 388 |
| Jinja2 | 7 | 694 | 656 | 5 | 33 |
| JSON | 21 | 409 | 406 | 0 | 3 |
| Makefile | 1 | 6 | 5 | 0 | 1 |
| Metal Shading Language | 29 | 10790 | 8395 | 953 | 1442 |
| PowerShell | 1 | 300 | 227 | 30 | 43 |
| Python | 116 | 8035 | 6561 | 402 | 1072 |
| Shell | 4 | 883 | 708 | 102 | 73 |
| Plain Text | 3 | 3723 | 0 | 2413 | 1310 |
| TOML | 29 | 1403 | 1205 | 51 | 147 |
| YAML | 4 | 41 | 39 | 2 | 0 |
| HTML | 3 | 873 | 767 | 45 | 61 |
| · embedded CSS | 1 | 578 | 544 | 14 | 20 |
| · embedded JavaScript | 1 | 24 | 23 | 0 | 1 |
| · HTML (total) | | 1475 | 1334 | 59 | 82 |
| Jupyter Notebooks | 3 | 122 | 83 | 23 | 16 |
| · embedded Markdown | 1 | 60 | 30 | 22 | 8 |
| · embedded Python | 1 | 122 | 113 | 1 | 8 |
| · Jupyter (total) | | 304 | 226 | 46 | 32 |
| Markdown | 97 | 10730 | 0 | 7823 | 2907 |
| · embedded BASH | 48 | 887 | 640 | 151 | 96 |
| · embedded JSON | 17 | 674 | 674 | 0 | 0 |
| · embedded PowerShell | 2 | 2 | 2 | 0 | 0 |
| · embedded Python | 19 | 812 | 628 | 84 | 100 |
| · embedded Rust | 45 | 1764 | 1468 | 52 | 244 |
| · embedded TOML | 8 | 227 | 180 | 1 | 46 |
| · embedded YAML | 3 | 175 | 173 | 1 | 1 |
| · Markdown (total) | | 15271 | 3765 | 8112 | 3394 |
| Rust | 516 | 216130 | 189984 | 6057 | 20089 |
| · embedded Markdown | 335 | 8303 | 451 | 6812 | 1040 |
| · Rust (total) | | 224433 | 190435 | 12869 | 21129 |
| **Total** | **903** | **285572** | **227598** | **26997** | **30977** |
20 commits from mistral.rs