diff --git a/Documentation/Models.md b/Documentation/Models.md index cd48a43d4..8826fc597 100644 --- a/Documentation/Models.md +++ b/Documentation/Models.md @@ -53,6 +53,14 @@ TDT models process audio in chunks (~15s with overlap) as batch operations. | **Kokoro ANE (7-stage)** | Same Kokoro 82M weights split into 7 CoreML stages so the ANE-friendly layers (Albert / PostAlbert / Alignment / Vocoder) stay resident on the Neural Engine while Prosody / Noise / Tail run on CPU+GPU. 3-11× RTFx vs. the single-graph Kokoro. Single voice (`af_heart`), ≤510 IPA phonemes per call, no chunker / SSML / custom lexicon. | ANE-optimized variant derived from [laishere/kokoro-coreml](https://github.com/laishere/kokoro-coreml) | | **PocketTTS** | Second TTS backend (~155M params). Autoregressive frame-by-frame generation with dynamic audio chunking. No phoneme stage, works directly on text tokens. | Supports streaming, minimal RAM usage, excellent quality |
+## Not Production Ready
+
+Models that are functionally complete and shipped but **not yet recommended for production use** because of RTFx or WER limitations that still need work. PRs, issue reports, and perf investigations are welcome.
+
+| Model | Status |
+|-------|--------|
+| **Magpie TTS Multilingual** ([FluidAudio#541](https://github.com/FluidInference/FluidAudio/pull/541), [mobius#44](https://github.com/FluidInference/mobius/pull/44), [HF](https://huggingface.co/FluidInference/magpie-tts-multilingual-357m-coreml)) | NVIDIA NeMo Magpie TTS Multilingual 357M, 8 languages (en/es/de/fr/it/vi/zh/hi), 5 built-in speakers. 4-model CoreML pipeline (text_encoder + decoder_prefill + decoder_step + nanocodec_decoder) + pure-Swift Local Transformer (Accelerate + BNNS). Custom IPA override via `\|...\|` segments.
**Quite slow on Apple Silicon — RTFx ≈ 0.04 (~25× slower than realtime), ~30 s of cold-start model load + ANE compile before the first synth, ~96 s warm for an 8-word English sentence on M-series.** Audio is ASR-clean on 4/5 speakers; spk0 has a single trailing-word artifact attributable to fp16 sampler-trajectory drift. Throughput investigation, MLX-backed LocalTransformer, CFG perf, and Japanese support (OpenJTalk + MeCab) are pending. For real-time TTS use Kokoro or PocketTTS instead. |
+ ## Evaluated Models (Not Supported) Models we converted and tested but are not supported: too large for on-device deployment, limitations or superseded by better approaches. @@ -83,4 +91,5 @@ Models we converted and tested but are not supported: too large for on-device de | Kokoro TTS | [FluidInference/kokoro-82m-coreml](https://huggingface.co/FluidInference/kokoro-82m-coreml) | | Kokoro ANE (7-stage) | [FluidInference/kokoro-82m-coreml/tree/main/ANE](https://huggingface.co/FluidInference/kokoro-82m-coreml/tree/main/ANE) | | PocketTTS | [FluidInference/pocket-tts-coreml](https://huggingface.co/FluidInference/pocket-tts-coreml) |
+| Magpie TTS Multilingual | [FluidInference/magpie-tts-multilingual-357m-coreml](https://huggingface.co/FluidInference/magpie-tts-multilingual-357m-coreml) |
| Nemotron Streaming | [FluidInference/nemotron-speech-streaming-en-0.6b-coreml](https://huggingface.co/FluidInference/nemotron-speech-streaming-en-0.6b-coreml) | diff --git a/Documentation/TTS/Magpie.md b/Documentation/TTS/Magpie.md new file mode 100644 index 000000000..9124b59e0 --- /dev/null +++ b/Documentation/TTS/Magpie.md @@ -0,0 +1,154 @@
+# Magpie TTS Multilingual (Swift Port)
+
+Swift port of NVIDIA NeMo Magpie TTS Multilingual 357M, exported to CoreML.
+Lives under `Sources/FluidAudio/TTS/Magpie/`.
+
+## Status
+
+Functional but **quite slow — needs significant perf work, not for real-time
+or latency-sensitive use.** First synth on a fresh process is dominated by
+CoreML model load + first-call ANE compile (~30 s); warm synths run at
+~96 s wall for an 8-word English sentence on M-series, i.e. RTFx ≈ **0.04**
+(~25× slower than realtime). Whether the throughput ceiling is a model
+characteristic, a CoreML conversion limitation, or both is still under
+investigation; throughput is expected to improve in subsequent iterations. For
+real-time use prefer Kokoro (~20× RTFx) or PocketTTS (~1.5–2× RTFx);
+Magpie's value prop is multilingual coverage and the 5 built-in speaker
+contexts, not throughput.
+
+Audio quality is perceptually clean across all 5 speakers and ASR-clean on
+4/5; speaker 0 has a single trailing-word artifact ("…and") attributable
+to fp16 sampler-trajectory drift, not a structural bug.
+
+Not yet covered: Japanese (deferred — needs OpenJTalk XCFramework + MeCab
+dict), CFG performance optimization, MLX-backed LocalTransformer,
+throughput investigation (the headline gap).
+
+## Architecture
+
+```
+text → MagpieTokenizer (per-language) → text_encoder.mlmodelc
+ ↓
+speaker_N.npy (110×768) → decoder_prefill.mlmodelc (1 batched call) ──┐
+ ↓
+ ┌──── KV cache (12 layers × [2,1,512,12,64] fp16)
+ ↓
+ AR loop (decoder_step.mlmodelc, ≤500 steps):
+ ├─ LocalTransformer (Swift, Accelerate+BNNS)
+ ├─ Sampler (top-k=80, temp=0.6, forbidden mask)
+ ├─ embed sampled (8) codes → next decoder_step input
+ └─ stop on audio_eos_id (2017) or maxSteps
+ ↓
+ nanocodec_decoder.mlmodelc → 22 050 Hz Float32 PCM
+```
+
+## Compute placement (verified end-to-end)
+
+| Model | Compute units | Reasoning |
+| ------------------ | ------------------------ | ------------------------------------------------------------------------------------------------------------ |
+| `text_encoder` | `.cpuAndNeuralEngine` | Runs on ANE; ~3.5× vs CPU.
| `decoder_prefill` | `.cpuAndNeuralEngine` | Runs on ANE; ~3.2× vs CPU. One batched call replaces 110 sequential `decoder_step` calls. |
+| `decoder_step` | **`.cpuAndGPU`** | Pinned. ANE compile fails (`MILCompilerForANE: ANECCompile() FAILED`) due to rank-4 split-K/V scatter; on `.cpuAndNeuralEngine` it falls back to CPU at ~hundreds-of-ms cost per call. GPU (Metal MPS) is fastest. Verified: 96 s warm vs 103 s warm on `.cpuAndNeuralEngine`. |
+| `nanocodec_decoder`| **`.cpuOnly`** | Pinned. ANE compile fails on its conv stack, and GPU contends with `decoder_step` for the Metal queue, so CPU is fastest (see `MagpieModelStore.swift`). |
+
+The pins are implemented in `MagpieModelStore.swift:60` — caller-supplied
+`computeUnits` is honored for all models *except* `decoder_step` (forced to
+`.cpuAndGPU`, or `.cpuOnly` if the caller asked for `.cpuOnly`) and `nanocodec_decoder` (forced to `.cpuOnly`).
+
+## Performance journey
+
+Three optimizations landed during the port; numbers are warm-avg wall time on
+M-series for an 8-word English sentence.
+
+| Stage | Wall (warm) | Speedup |
+| ------------------------------------------------------- | ----------- | ------- |
+| Baseline: 110-step prefill loop, ANE on decoder_step | ~420 s | 1.0× |
+| **Wire `decoder_prefill.mlmodelc` (1 batched call)** | ~110 s | 3.8× |
+| **Pin decoder_step to `.cpuAndGPU`** | ~96 s | 4.4× |
+
+The `decoder_prefill` asset was already on HF (`FluidInference/magpie-tts-multilingual-357m-coreml`)
+and downloaded by `MagpieResourceDownloader`, just unused. `prefillFast`
+(`MagpiePrefill.swift:23`) replaces 110 sequential `decoder_step` calls with
+one `decoder_prefill` call whose 12 stacked-K/V outputs (`var_208`, `var_374`,
+… `var_1958`, each `[2, 1, 512, 12, 64]` fp16) are sliced via two `memcpy`s
+per layer into the KV cache (`MagpieKvCache.seedFromPrefillOutputs`).
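The prefill-to-cache handoff is simple enough to sketch outside Swift. The NumPy snippet below is illustrative only (`seed_cache` is a hypothetical name, not the FluidAudio API): each stacked fp16 output carries K and V along its leading axis, so seeding the cache is two contiguous copies per layer, mirroring the two `memcpy`s described above.

```python
import numpy as np

# Stacked-K/V prefill output shape: [2 (K/V), 1, 512 (cache len), 12 (heads), 64 (head dim)]
SHAPE = (2, 1, 512, 12, 64)

def seed_cache(prefill_outputs):
    """Split each stacked-K/V tensor into separate K and V cache planes,
    one pair of contiguous copies per layer (illustrative sketch)."""
    k_cache, v_cache = [], []
    for layer_out in prefill_outputs:  # 12 layers
        assert layer_out.shape == SHAPE
        k_cache.append(np.ascontiguousarray(layer_out[0]))  # first copy: K half
        v_cache.append(np.ascontiguousarray(layer_out[1]))  # second copy: V half
    return k_cache, v_cache

outs = [np.zeros(SHAPE, dtype=np.float16) for _ in range(12)]
k, v = seed_cache(outs)
assert len(k) == 12 and k[0].shape == SHAPE[1:]
```

Because K and V are adjacent along the leading axis, each half is already contiguous in memory, which is what makes the per-layer copy cheap.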
+ +## Public API + +```swift +let manager = try await MagpieTtsManager.downloadAndCreate( + languages: [.english], + cacheDirectory: nil, + computeUnits: .cpuAndNeuralEngine, // decoder_step pinned to GPU internally + progressHandler: nil +) + +let result = try await manager.synthesize( + text: "Hello world.", + speaker: .john, + language: .english, + options: .default +) +// result.samples : [Float] (22 050 Hz) +// result.codeCount : Int +// result.durationSeconds : Double +``` + +## CLI + +```bash +# Download all assets eagerly +swift run fluidaudiocli magpie download + +# Synth +swift run fluidaudiocli magpie text "Hello world." --speaker 0 --output hello.wav +``` + +Parity, probe, and compute-plan tooling live upstream in `mobius` (Python) — +they exercise the export pipeline and are out of scope for the Swift runtime. + +## Known issues + +1. **spk0 trailing-word drift.** ASR shows a stray word at the end (e.g. + "…seashore, and"). Stage-by-stage parity probe (in `mobius`) localizes it + to fp16 sampler-trajectory non-determinism between Python+CoreML reference + and Swift+CoreML host: prefill SNR degrades L0=64 dB → L11=44 dB through + the 12-layer cache, then compounds in the AR loop. CoreML itself is + consistent between languages; the drift is host-floating-point + RNG/sampler + ordering. Not user-perceptible on speakers 1–4. + +2. **`decoder_step` ANE compile failure is real.** Earlier benchmark with + zeroed `position` scalars showed a 3× ANE speedup; that was misleading — + with real incrementing positions the ANEF compile fails at runtime per + call. Keep the `.cpuAndGPU` pin. 
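The sampler in the AR loop above (top-k = 80, temperature = 0.6, forbidden-token mask) can be sketched in NumPy; `sample_token` below is a hypothetical name, not the Swift `MagpieSampler` API. The sketch also illustrates why the fp16 drift in issue 1 compounds: any perturbation that reorders two logits near the top-k cutoff changes the draw, and every later step inherits that change.

```python
import numpy as np

def sample_token(logits, forbidden, top_k=80, temperature=0.6, rng=None):
    """Top-k + temperature sampling with a forbidden-token mask
    (illustrative sketch of the per-codebook sampling scheme)."""
    if rng is None:
        rng = np.random.default_rng(0)
    masked = logits.astype(np.float64)
    masked[forbidden] = -np.inf                     # forbidden ids can never win
    top = np.argpartition(masked, -top_k)[-top_k:]  # k largest logits (unordered)
    scaled = masked[top] / temperature              # temperature < 1 sharpens
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(top[rng.choice(top_k, p=probs)])

logits = np.random.default_rng(42).normal(size=2018).astype(np.float32)
tok = sample_token(logits, forbidden=np.array([2017]))  # e.g. mask a control id
assert 0 <= tok < 2018 and tok != 2017
```

With a fixed RNG the draw is deterministic, which is the property the MT19937-seeded parity runs rely on.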
+ +## File map + +``` +Sources/FluidAudio/TTS/Magpie/ +├── MagpieTtsManager.swift # public actor +├── MagpieConstants.swift # shapes, ids, file names, HF repo id +├── MagpieError.swift +├── MagpieTypes.swift +├── Assets/ +│ ├── MagpieModelStore.swift # actor; loads 4 mlmodelcs, per-model compute units +│ ├── MagpieResourceDownloader.swift # HF download via DownloadUtils +│ ├── MagpieConstantsStore.swift +│ └── MagpieLocalTransformerWeights.swift +├── LocalTransformer/ +│ ├── MagpieLocalTransformer.swift # 1-layer transformer (attention + FFN) via Accelerate (cblas_sgemm) + BNNS (GELU) +│ └── MagpieSampler.swift # top-k + temp + forbidden mask + CFG merge +├── Pipeline/ +│ ├── Preprocess/ # per-language tokenizers + IPA override +│ └── Synthesize/ +│ ├── MagpieSynthesizer.swift # orchestrates encode → prefill → AR → nanocodec +│ ├── MagpieKvCache.swift # 12 layers × (cache, position); seedFromPrefillOutputs +│ ├── MagpiePrefill.swift # prefillFast (batched) + prefill (110-step fallback) +│ └── MagpieNanocodec.swift +└── Shared/ + ├── NpyReader.swift # .npy v1 (fp32/fp16/int) + └── MagpieMT19937.swift # deterministic RNG matching Python reference + +Sources/FluidAudioCLI/Commands/ +└── MagpieCommand.swift # dispatch (download / text) +``` diff --git a/README.md b/README.md index 002c046c5..4e293854e 100644 --- a/README.md +++ b/README.md @@ -37,7 +37,7 @@ Want to convert your own model? Check [möbius](https://github.com/FluidInferenc - **Automatic Speech Recognition (ASR)**: [Parakeet TDT v3](Documentation/Models.md#batch-transcription-near-real-time) (0.6b) and other TDT/CTC models for batch transcription supporting 25 European languages, Japanese, and Chinese; [Parakeet EOU](Documentation/Models.md#streaming-transcription-true-real-time) (120m) for streaming ASR with end-of-utterance detection (English only). See all [ASR models](Documentation/Models.md#asr-models). 
- **Inverse Text Normalization (ITN)**: Post-process ASR output to convert spoken-form to written-form ("two hundred" → "200"). See [text-processing-rs](https://github.com/FluidInference/text-processing-rs)
-- **Text-to-Speech (TTS)**: Kokoro (82m) for parallel synthesis with SSML and pronunciation control across 9 languages (EN, ES, FR, HI, IT, JA, PT, ZH); PocketTTS for streaming TTS with voice cloning support (EN, DE, ES, FR, IT, PT — 6L and 24L variants)
+- **Text-to-Speech (TTS)**: Kokoro (82m) for parallel synthesis with SSML and pronunciation control across 9 languages (EN, ES, FR, HI, IT, JA, PT, ZH); PocketTTS for streaming TTS with voice cloning support (EN, DE, ES, FR, IT, PT — 6L and 24L variants); **Magpie (357m, experimental)** for autoregressive multilingual TTS with 5 speakers, `|…|` IPA override, and 8-language coverage (EN, ES, DE, FR, IT, VI, ZH, HI) — note: quite slow (~0.04 RTFx on Apple Silicon, ~25× slower than realtime) and needs further perf work; see [Magpie docs](Documentation/TTS/Magpie.md) before adopting
- **Speaker Diarization (Online + Offline)**: Speaker separation and identification across audio streams. Streaming pipeline for real-time processing and offline batch pipeline with advanced clustering. - **Speaker Embedding Extraction**: Generate speaker embeddings for voice comparison and clustering, you can use this for speaker identification - **Voice Activity Detection (VAD)**: Voice activity detection with Silero models @@ -607,6 +607,60 @@ swift run fluidaudiocli tts "Hello from FluidAudio." --auto-download --output ou Dictionary and model assets are cached under `~/.cache/fluidaudio/Models/kokoro`.
+### Magpie (Multilingual) — experimental
+
+> ⚠️ **Quite slow on Apple Silicon — needs significant perf work; not for
+> real-time / latency-sensitive use.** First synth on a fresh process is
+> dominated by CoreML model load + first-call ANE compile (~30 s).
Warm
+> synths run at **~96 s wall for an 8-word English sentence** on M-series
+> (RTFx ≈ **0.04**, i.e. ~25× slower than realtime). Output is
+> perceptually clean across all 5 speakers and ASR-clean on 4 of 5;
+> speaker 0 has a single trailing-word artifact attributable to fp16
+> sampler-trajectory drift (not a structural bug). Whether the throughput
+> ceiling is a model characteristic, a CoreML conversion limitation, or
+> both is still under investigation; throughput is expected to improve in
+> subsequent iterations. **For real-time use, prefer Kokoro (~20× RTFx)
+> or PocketTTS (~1.5–2× RTFx).** Magpie ships for multilingual
+> coverage and the 5 speaker contexts, not throughput.
+
+Magpie TTS Multilingual (357M) is NVIDIA's autoregressive encoder-decoder TTS with 8-codebook NanoCodec vocoder output at 22.05 kHz. It exposes 5 built-in speakers and supports 8 languages (English, Spanish, German, French, Italian, Vietnamese, Mandarin, Hindi) with a `|…|` IPA override that routes inline phoneme sequences directly to the tokenizer. Japanese is deferred pending OpenJTalk integration.
+
+```swift
+import FluidAudio
+
+Task {
+    let manager = try await MagpieTtsManager.downloadAndCreate(
+        languages: [.english, .spanish]
+    )
+    let result = try await manager.synthesize(
+        text: "Hello | ˈ n ɛ m o ʊ | from FluidAudio.",
+        speaker: .john,
+        language: .english
+    )
+    let wav = AudioWAV.data(from: result.samples, sampleRate: result.sampleRate)
+    try wav.write(to: URL(fileURLWithPath: "hello.wav"))
+}
+```
+
+```bash
+# Pre-download assets for selected languages
+swift run fluidaudiocli magpie download --languages en,es
+
+# Synthesize with IPA override enabled (default)
+swift run fluidaudiocli magpie text --text "Hello | ˈ n ɛ m o ʊ |." \
+  --speaker 0 --language en --output hello.wav
+
+# Classifier-free guidance and sampling controls
+swift run fluidaudiocli magpie text --text "Bonjour." \
--language fr \ + --cfg 2.5 --temperature 0.6 --topk 80 --seed 42 --output bonjour.wav +``` + +Parity / probe / compute-plan tooling lives upstream in `mobius` (Python). + +Assets (4 CoreML models + `constants/` + per-language tokenizer files) are fetched from [`FluidInference/magpie-tts-multilingual-357m-coreml`](https://huggingface.co/FluidInference/magpie-tts-multilingual-357m-coreml) on first use. The 1-layer local transformer (256d, top-k + temperature sampling, forbidden-token mask) runs on CPU via Accelerate/BNNS; the 12-layer decoder KV cache is rolled stateful across steps. + +When `--seed N` is supplied, sampling is driven by a NumPy-compatible MT19937 RNG so the Swift output is bit-reproducible against the Python reference seeded with `np.random.seed(N)`. + ## Continuous Integration - `tests.yml`: Default build matrix covering SwiftPM tests and an iOS archive smoke test. diff --git a/Sources/FluidAudio/ModelNames.swift b/Sources/FluidAudio/ModelNames.swift index 437b0422f..f40860abe 100644 --- a/Sources/FluidAudio/ModelNames.swift +++ b/Sources/FluidAudio/ModelNames.swift @@ -31,6 +31,7 @@ public enum Repo: String, CaseIterable, Sendable { case parakeetTdtCtc110m = "FluidInference/parakeet-tdt-ctc-110m-coreml" case cosyvoice3 = "FluidInference/CosyVoice3-0.5B-coreml" case cohereTranscribeCoreml = "FluidInference/cohere-transcribe-03-2026-coreml/q8" + case magpieTts = "FluidInference/magpie-tts-multilingual-357m-coreml" /// Repository slug (without owner) public var name: String { @@ -87,6 +88,8 @@ public enum Repo: String, CaseIterable, Sendable { return "CosyVoice3-0.5B-coreml" case .cohereTranscribeCoreml: return "cohere-transcribe-03-2026-coreml/q8" + case .magpieTts: + return "magpie-tts-multilingual-357m-coreml" } } @@ -185,6 +188,8 @@ public enum Repo: String, CaseIterable, Sendable { return "cosyvoice3" case .cohereTranscribeCoreml: return "cohere-transcribe/q8" + case .magpieTts: + return "magpie-tts" default: return name.replacingOccurrences(of: 
"-coreml", with: "") } } @@ -642,6 +647,35 @@ public enum ModelNames { } }
+    /// Magpie TTS Multilingual 357M model names.
+    ///
+    /// Four CoreML models + a `constants/` directory + a `tokenizer/` directory of
+    /// per-language lookup data. The `decoder_prefill` model is optional; when
+    /// absent the prefill runs step-by-step through `decoder_step`.
+    public enum Magpie {
+        public static let textEncoder = "text_encoder"
+        public static let decoderPrefill = "decoder_prefill"
+        public static let decoderStep = "decoder_step"
+        public static let nanocodecDecoder = "nanocodec_decoder"
+
+        public static let textEncoderFile = textEncoder + ".mlmodelc"
+        public static let decoderPrefillFile = decoderPrefill + ".mlmodelc"
+        public static let decoderStepFile = decoderStep + ".mlmodelc"
+        public static let nanocodecDecoderFile = nanocodecDecoder + ".mlmodelc"
+
+        public static let constantsDir = "constants"
+        public static let tokenizerDir = "tokenizer"
+
+        /// Files required for English synthesis. Other languages append their own
+        /// lookup files on top (see `MagpieResourceDownloader`).
+        public static let requiredModels: Set<String> = [
+            textEncoderFile,
+            decoderStepFile,
+            nanocodecDecoderFile,
+            constantsDir,
+        ]
+    }
+
/// Multilingual G2P (CharsiuG2P ByT5) model names public enum MultilingualG2P { public static let encoder = "MultilingualG2PEncoder" @@ -848,6 +882,8 @@ public enum ModelNames { return ModelNames.CosyVoice3.requiredModels case .cohereTranscribeCoreml: return ModelNames.CohereTranscribe.requiredModels
+        case .magpieTts:
+            return ModelNames.Magpie.requiredModels
} } } diff --git a/Sources/FluidAudio/TTS/Magpie/Assets/MagpieConstantsStore.swift b/Sources/FluidAudio/TTS/Magpie/Assets/MagpieConstantsStore.swift new file mode 100644 index 000000000..1d53c7b65 --- /dev/null +++ b/Sources/FluidAudio/TTS/Magpie/Assets/MagpieConstantsStore.swift @@ -0,0 +1,178 @@
+import Foundation
+
+/// Decoded shape / hyperparameter metadata from `constants/constants.json`.
+/// +/// The field names mirror the Python exporter +/// (`mobius/.../export_constants.py`). Unknown keys are ignored so the exporter +/// can add fields without breaking Swift. All fields have safe defaults matching +/// the published 357M checkpoint so the Swift port remains usable if a key is +/// dropped in a future rebuild. +public struct MagpieModelConfig: Sendable, Decodable { + public let dModel: Int + public let numDecoderLayers: Int + public let numHeads: Int + public let headDim: Int + public let numCodebooks: Int + public let numCodesPerCodebook: Int + public let maxCacheLength: Int + public let maxTextLength: Int + public let audioBosId: Int32 + public let audioEosId: Int32 + public let speakerContextLength: Int + + enum CodingKeys: String, CodingKey { + case dModel = "d_model" + case numDecoderLayers = "num_decoder_layers" + case numHeads = "num_heads" + case headDim = "head_dim" + case numCodebooks = "num_codebooks" + case numCodesPerCodebook = "num_codes_per_codebook" + case maxCacheLength = "max_cache_length" + case maxTextLength = "max_text_length" + case audioBosId = "audio_bos_id" + case audioEosId = "audio_eos_id" + case speakerContextLength = "speaker_context_length" + } + + public init(from decoder: Decoder) throws { + let c = try decoder.container(keyedBy: CodingKeys.self) + dModel = (try? c.decode(Int.self, forKey: .dModel)) ?? MagpieConstants.dModel + numDecoderLayers = + (try? c.decode(Int.self, forKey: .numDecoderLayers)) ?? MagpieConstants.numDecoderLayers + numHeads = (try? c.decode(Int.self, forKey: .numHeads)) ?? MagpieConstants.numHeads + headDim = (try? c.decode(Int.self, forKey: .headDim)) ?? MagpieConstants.headDim + numCodebooks = + (try? c.decode(Int.self, forKey: .numCodebooks)) ?? MagpieConstants.numCodebooks + numCodesPerCodebook = + (try? c.decode(Int.self, forKey: .numCodesPerCodebook)) + ?? MagpieConstants.numCodesPerCodebook + maxCacheLength = + (try? c.decode(Int.self, forKey: .maxCacheLength)) ?? 
MagpieConstants.maxCacheLength + maxTextLength = + (try? c.decode(Int.self, forKey: .maxTextLength)) ?? MagpieConstants.maxTextLength + audioBosId = (try? c.decode(Int32.self, forKey: .audioBosId)) ?? MagpieConstants.audioBosId + audioEosId = (try? c.decode(Int32.self, forKey: .audioEosId)) ?? MagpieConstants.audioEosId + speakerContextLength = + (try? c.decode(Int.self, forKey: .speakerContextLength)) + ?? MagpieConstants.speakerContextLength + } + + public init( + dModel: Int = MagpieConstants.dModel, + numDecoderLayers: Int = MagpieConstants.numDecoderLayers, + numHeads: Int = MagpieConstants.numHeads, + headDim: Int = MagpieConstants.headDim, + numCodebooks: Int = MagpieConstants.numCodebooks, + numCodesPerCodebook: Int = MagpieConstants.numCodesPerCodebook, + maxCacheLength: Int = MagpieConstants.maxCacheLength, + maxTextLength: Int = MagpieConstants.maxTextLength, + audioBosId: Int32 = MagpieConstants.audioBosId, + audioEosId: Int32 = MagpieConstants.audioEosId, + speakerContextLength: Int = MagpieConstants.speakerContextLength + ) { + self.dModel = dModel + self.numDecoderLayers = numDecoderLayers + self.numHeads = numHeads + self.headDim = headDim + self.numCodebooks = numCodebooks + self.numCodesPerCodebook = numCodesPerCodebook + self.maxCacheLength = maxCacheLength + self.maxTextLength = maxTextLength + self.audioBosId = audioBosId + self.audioEosId = audioEosId + self.speakerContextLength = speakerContextLength + } +} + +/// Loaded constants: config, per-speaker embeddings (fp32), per-codebook +/// audio embeddings (fp32). All arrays are stored row-major. +public struct MagpieConstantsBundle: Sendable { + public let config: MagpieModelConfig + /// Shape: [numSpeakers][contextLength × dModel]. Row-major. + public let speakerEmbeddings: [[Float]] + /// Shape: [numCodebooks][numCodesPerCodebook × dModel]. Row-major. + public let audioEmbeddings: [[Float]] + /// Text tokenizer EOS id (from `tokenizer_metadata.json`; 0 if absent). 
+ public let textEosId: Int32 +} + +/// Loads Magpie constants from a directory (typically `/constants/`). +public enum MagpieConstantsLoader { + + private static let logger = AppLogger(category: "MagpieConstantsLoader") + + public static func load(from constantsDir: URL) throws -> MagpieConstantsBundle { + let config = try loadConfig(from: constantsDir) + + var speakerEmbeddings: [[Float]] = [] + speakerEmbeddings.reserveCapacity(MagpieConstants.numSpeakers) + for idx in 0.. Int32 { + let url = dir.appendingPathComponent(MagpieConstants.Files.tokenizerMetadataJson) + guard FileManager.default.fileExists(atPath: url.path), + let data = try? Data(contentsOf: url), + let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any] + else { + return 0 + } + if let eos = json["eos_token_id"] as? Int { + return Int32(eos) + } + if let eos = json["text_eos_id"] as? Int { + return Int32(eos) + } + return 0 + } + + private static func loadConfig(from dir: URL) throws -> MagpieModelConfig { + let url = dir.appendingPathComponent(MagpieConstants.Files.constantsJson) + guard FileManager.default.fileExists(atPath: url.path) else { + logger.warning("constants.json missing; falling back to built-in defaults") + return MagpieModelConfig() + } + do { + let data = try Data(contentsOf: url) + return try JSONDecoder().decode(MagpieModelConfig.self, from: data) + } catch { + throw MagpieError.invalidConstants("constants.json: \(error)") + } + } + +} diff --git a/Sources/FluidAudio/TTS/Magpie/Assets/MagpieLocalTransformerWeights.swift b/Sources/FluidAudio/TTS/Magpie/Assets/MagpieLocalTransformerWeights.swift new file mode 100644 index 000000000..f5cc371a5 --- /dev/null +++ b/Sources/FluidAudio/TTS/Magpie/Assets/MagpieLocalTransformerWeights.swift @@ -0,0 +1,162 @@ +import Foundation + +/// Weights for the Swift-side 1-layer Local Transformer that samples the 8 +/// codebook tokens per frame. 
+/// +/// Shapes match the NumPy reference in `mobius/models/tts/magpie/coreml/generate_coreml.py` +/// (fn `local_transformer_forward`). All arrays are kept row-major fp32 so the +/// Accelerate + BNNS forward pass can consume them directly. +public struct MagpieLocalTransformerWeights: Sendable { + // Input projection: (localDim, dModel) weight + (localDim,) bias. + public let inProjWeight: [Float] + public let inProjBias: [Float] + /// Positional embedding slots: (maxPositions, localDim). + public let posEmbedding: [Float] + /// RMSNorm / LayerNorm weights: (localDim,) each. + public let norm1Weight: [Float] + public let norm2Weight: [Float] + /// Self-attention QKV weight: (3*localDim, localDim). + public let saQkvWeight: [Float] + /// Self-attention output weight: (localDim, localDim). + public let saOWeight: [Float] + /// FFN conv kernel=1: (ffnDim, localDim) then (localDim, ffnDim). + public let ffnConv1Weight: [Float] + public let ffnConv2Weight: [Float] + /// Per-codebook output heads: 8× (numCodesPerCodebook, localDim) + (numCodesPerCodebook,). + public let outProjWeights: [[Float]] + public let outProjBiases: [[Float]] + + // Cached dimensions for convenience. + public let localDim: Int + public let dModel: Int + public let ffnDim: Int + public let maxPositions: Int + public let numCodebooks: Int + public let numCodesPerCodebook: Int +} + +public enum MagpieLocalTransformerLoader { + + private static let logger = AppLogger(category: "MagpieLocalTransformerLoader") + + /// Loads all `local_transformer/*.npy` files from `constantsDir`. 
+ public static func load( + from constantsDir: URL, + config: MagpieModelConfig + ) throws -> MagpieLocalTransformerWeights { + let ltDir = constantsDir.appendingPathComponent(MagpieConstants.Files.localTransformerDir) + guard FileManager.default.fileExists(atPath: ltDir.path) else { + throw MagpieError.modelFileNotFound(MagpieConstants.Files.localTransformerDir) + } + + let localDim = MagpieConstants.localTransformerDim + let ffnDim = MagpieConstants.localTransformerFfnDim + let maxPositions = MagpieConstants.localTransformerMaxPositions + let dModel = config.dModel + let numCodebooks = config.numCodebooks + let numCodesPerCodebook = config.numCodesPerCodebook + + func loadNpy(_ name: String, expecting shape: [Int]) throws -> [Float] { + let url = ltDir.appendingPathComponent(name) + guard FileManager.default.fileExists(atPath: url.path) else { + throw MagpieError.modelFileNotFound("\(MagpieConstants.Files.localTransformerDir)/\(name)") + } + let array = try NpyReader.read(from: url) + try array.assertShape(shape, label: name) + return array.data + } + + let inProjWeight = try loadNpy( + MagpieConstants.Files.LocalTransformer.inProjWeight, + expecting: [localDim, dModel]) + let inProjBias = try loadNpy( + MagpieConstants.Files.LocalTransformer.inProjBias, + expecting: [localDim]) + let posEmbedding = try loadNpy( + MagpieConstants.Files.LocalTransformer.posEmb, + expecting: [maxPositions, localDim]) + let norm1Weight = try loadNpy( + MagpieConstants.Files.LocalTransformer.norm1Weight, + expecting: [localDim]) + let norm2Weight = try loadNpy( + MagpieConstants.Files.LocalTransformer.norm2Weight, + expecting: [localDim]) + let saQkvWeight = try loadNpy( + MagpieConstants.Files.LocalTransformer.saQkvWeight, + expecting: [3 * localDim, localDim]) + let saOWeight = try loadNpy( + MagpieConstants.Files.LocalTransformer.saOWeight, + expecting: [localDim, localDim]) + // Conv1d kernel=1 is effectively (out, in) matmul; the exporter keeps + // the trailing kernel dim so 
we accept either [out, in] or [out, in, 1]. + let ffnConv1Weight = try loadFlexible( + name: MagpieConstants.Files.LocalTransformer.ffnConv1Weight, + directory: ltDir, + primary: [ffnDim, localDim], + alternate: [ffnDim, localDim, 1]) + let ffnConv2Weight = try loadFlexible( + name: MagpieConstants.Files.LocalTransformer.ffnConv2Weight, + directory: ltDir, + primary: [localDim, ffnDim], + alternate: [localDim, ffnDim, 1]) + + var outProjWeights: [[Float]] = [] + var outProjBiases: [[Float]] = [] + outProjWeights.reserveCapacity(numCodebooks) + outProjBiases.reserveCapacity(numCodebooks) + for cb in 0.. [Float] { + let url = directory.appendingPathComponent(name) + guard FileManager.default.fileExists(atPath: url.path) else { + throw MagpieError.modelFileNotFound( + "\(MagpieConstants.Files.localTransformerDir)/\(name)") + } + let array = try NpyReader.read(from: url) + if array.shape == primary || array.shape == alternate { + return array.data + } + throw MagpieError.invalidNpyFile( + path: name, + reason: "expected shape \(primary) or \(alternate), got \(array.shape)") + } +} diff --git a/Sources/FluidAudio/TTS/Magpie/Assets/MagpieModelStore.swift b/Sources/FluidAudio/TTS/Magpie/Assets/MagpieModelStore.swift new file mode 100644 index 000000000..d92e7c719 --- /dev/null +++ b/Sources/FluidAudio/TTS/Magpie/Assets/MagpieModelStore.swift @@ -0,0 +1,216 @@ +@preconcurrency import CoreML +import Foundation + +/// Actor-based store for Magpie CoreML models + constants + LocalTransformer weights. +/// +/// Manages loading of 3 required models (text_encoder, decoder_step, nanocodec_decoder) +/// and 1 optional model (decoder_prefill). Also holds the pre-loaded +/// `MagpieConstantsBundle` and `MagpieLocalTransformerWeights` so the synthesizer +/// can hit all assets from a single entry point. +public actor MagpieModelStore { + + private let logger = AppLogger(category: "MagpieModelStore") + + private var textEncoderModel: MLModel? 
+ private var decoderPrefillModel: MLModel? // optional fast path + private var decoderStepModel: MLModel? + private var nanocodecDecoderModel: MLModel? + + private var constantsBundle: MagpieConstantsBundle? + private var localTransformerWeights: MagpieLocalTransformerWeights? + + private var repoDirectory: URL? + + private let directory: URL? + private let computeUnits: MLComputeUnits + private let preferredLanguages: Set + + /// - Parameters: + /// - directory: Optional override for the base cache directory. + /// - computeUnits: CoreML compute preference for all models. + /// - preferredLanguages: Set of languages whose tokenizer data should be fetched. + public init( + directory: URL? = nil, + computeUnits: MLComputeUnits = .cpuAndNeuralEngine, + preferredLanguages: Set = [.english] + ) { + self.directory = directory + self.computeUnits = computeUnits + self.preferredLanguages = preferredLanguages + } + + /// Download (if missing) and load all Magpie CoreML models + constants. + public func loadIfNeeded() async throws { + if textEncoderModel != nil { + return + } + + let repoDir = try await MagpieResourceDownloader.ensureAssets( + languages: preferredLanguages, + directory: directory, + includePrefill: true + ) + self.repoDirectory = repoDir + + logger.info("Loading Magpie CoreML models from \(repoDir.path)…") + + let config = MLModelConfiguration() + config.computeUnits = computeUnits + + // `decoder_step.mlmodelc` reliably fails ANE compilation + // (`MILCompilerForANE error: ANECCompile() FAILED`) due to its rank-4 + // split-K/V scatter layout, then falls back to CPU at the cost of one + // failed ANE compile attempt per call (~hundreds of ms each). Pin it + // to `.cpuAndGPU` so CoreML skips the ANE attempt entirely and runs + // on Metal MPS — verified end-to-end as the fastest path + // (96s warm vs 103s warm on `.cpuAndNeuralEngine`). + let gpuConfig = MLModelConfiguration() + gpuConfig.computeUnits = + computeUnits == .cpuOnly ? 
.cpuOnly : .cpuAndGPU + + // `nanocodec_decoder.mlmodelc` is fastest on **CPU only**. The model's + // upsample stack (5 transposed convs + 96 sin/pow per-frame embedding + // ops + 86 LeakyReLU) doesn't map well onto Metal MPS, and ANE compile + // fails on its conv stack. Empirically (M-series, single fwd of 256 + // frames): + // .cpuOnly ~2.87 s + // .cpuAndGPU ~3.86 s + // .cpuAndNeuralEngine ~10.12 s (ANE compile fail → CPU fallback dance) + // .all ~2.95 s + // Putting it on `.cpuAndGPU` also makes `decoder_step` ~40 ms/step + // because both contend for the same Metal queue. Pinning nanocodec to + // CPU keeps Metal exclusive for decoder_step (25 ms/step) and saves a + // full second on the nanocodec call → ~1.03x RTFx vs ~0.91x before. + let cpuConfig = MLModelConfiguration() + cpuConfig.computeUnits = .cpuOnly + + let loadStart = Date() + + textEncoderModel = try loadModel( + repoDir: repoDir, + fileName: ModelNames.Magpie.textEncoderFile, + config: config, + required: true) + + decoderStepModel = try loadModel( + repoDir: repoDir, + fileName: ModelNames.Magpie.decoderStepFile, + config: gpuConfig, + required: true) + + nanocodecDecoderModel = try loadModel( + repoDir: repoDir, + fileName: ModelNames.Magpie.nanocodecDecoderFile, + config: cpuConfig, + required: true) + + decoderPrefillModel = try loadModel( + repoDir: repoDir, + fileName: ModelNames.Magpie.decoderPrefillFile, + config: config, + required: false) + + let elapsed = Date().timeIntervalSince(loadStart) + logger.info( + "Magpie models loaded in \(String(format: "%.2f", elapsed))s (prefill \(decoderPrefillModel == nil ? "absent" : "present"))" + ) + + // Load constants + local transformer weights. 
+ let constantsDir = MagpieResourceDownloader.constantsDirectory(in: repoDir) + let bundle = try MagpieConstantsLoader.load(from: constantsDir) + constantsBundle = bundle + localTransformerWeights = try MagpieLocalTransformerLoader.load( + from: constantsDir, config: bundle.config) + } + + public func textEncoder() throws -> MLModel { + guard let model = textEncoderModel else { + throw MagpieError.notInitialized + } + return model + } + + public func decoderStep() throws -> MLModel { + guard let model = decoderStepModel else { + throw MagpieError.notInitialized + } + return model + } + + public func nanocodecDecoder() throws -> MLModel { + guard let model = nanocodecDecoderModel else { + throw MagpieError.notInitialized + } + return model + } + + public func decoderPrefill() throws -> MLModel { + guard let model = decoderPrefillModel else { + throw MagpieError.notInitialized + } + return model + } + + public func hasDecoderPrefill() -> Bool { + decoderPrefillModel != nil + } + + public func constants() throws -> MagpieConstantsBundle { + guard let bundle = constantsBundle else { + throw MagpieError.notInitialized + } + return bundle + } + + public func localTransformer() throws -> MagpieLocalTransformerWeights { + guard let weights = localTransformerWeights else { + throw MagpieError.notInitialized + } + return weights + } + + public func repoDir() throws -> URL { + guard let dir = repoDirectory else { + throw MagpieError.notInitialized + } + return dir + } + + /// Release all loaded models + constants. Resource downloads on disk are kept. + public func unload() { + textEncoderModel = nil + decoderPrefillModel = nil + decoderStepModel = nil + nanocodecDecoderModel = nil + constantsBundle = nil + localTransformerWeights = nil + } + + // MARK: - Helpers + + private func loadModel( + repoDir: URL, fileName: String, config: MLModelConfiguration, required: Bool + ) throws -> MLModel? 
{
+        let modelURL = repoDir.appendingPathComponent(fileName)
+        guard FileManager.default.fileExists(atPath: modelURL.path) else {
+            if required {
+                throw MagpieError.modelFileNotFound(fileName)
+            } else {
+                logger.notice("Optional model \(fileName) not present; skipping")
+                return nil
+            }
+        }
+        do {
+            let model = try MLModel(contentsOf: modelURL, configuration: config)
+            logger.info("Loaded \(fileName)")
+            return model
+        } catch {
+            if required {
+                throw MagpieError.corruptedModel(fileName, underlying: "\(error)")
+            } else {
+                logger.warning("Failed to load optional \(fileName): \(error)")
+                return nil
+            }
+        }
+    }
+}
diff --git a/Sources/FluidAudio/TTS/Magpie/Assets/MagpieResourceDownloader.swift b/Sources/FluidAudio/TTS/Magpie/Assets/MagpieResourceDownloader.swift
new file mode 100644
index 000000000..c331353b1
--- /dev/null
+++ b/Sources/FluidAudio/TTS/Magpie/Assets/MagpieResourceDownloader.swift
@@ -0,0 +1,188 @@
+import Foundation
+
+/// Downloads Magpie TTS models, constants, and per-language tokenizer data from HuggingFace.
+///
+/// The HF repo (`FluidInference/magpie-tts-multilingual-357m-coreml`) ships:
+/// - 3 required CoreML models + 1 optional prefill model at the repo root
+/// - `constants/` with model config, speaker embeddings, audio codebook tables, and
+///   the local-transformer weights (downloaded as one subtree)
+/// - `tokenizer/` with per-language lookup data (lazy per language)
+public enum MagpieResourceDownloader {
+
+    private static let logger = AppLogger(category: "MagpieResourceDownloader")
+
+    /// Ensure the CoreML models + `constants/` directory are present locally, and
+    /// ensure tokenizer data for each requested language is present. Returns the
+    /// resolved repo directory (i.e. the root containing the `.mlmodelc` files).
+    public static func ensureAssets(
+        languages: Set<MagpieLanguage> = [.english],
+        directory: URL? = nil,
+        includePrefill: Bool = true,
+        progressHandler: DownloadUtils.ProgressHandler?
= nil
+    ) async throws -> URL {
+        let modelsRoot = try directory ?? defaultCacheRoot()
+        let repoDir = modelsRoot.appendingPathComponent(Repo.magpieTts.folderName)
+
+        let rootModelsPresent = ModelNames.Magpie.requiredModels.allSatisfy { entry in
+            FileManager.default.fileExists(atPath: repoDir.appendingPathComponent(entry).path)
+        }
+
+        if !rootModelsPresent {
+            logger.info("Downloading Magpie TTS models from HuggingFace…")
+            try await DownloadUtils.downloadRepo(
+                .magpieTts, to: modelsRoot, progressHandler: progressHandler)
+        } else {
+            logger.info("Magpie TTS models found in cache")
+        }
+
+        if includePrefill {
+            let prefillURL = repoDir.appendingPathComponent(ModelNames.Magpie.decoderPrefillFile)
+            if !FileManager.default.fileExists(atPath: prefillURL.path) {
+                logger.info("Fetching optional decoder_prefill model")
+                do {
+                    try await DownloadUtils.downloadSubdirectory(
+                        .magpieTts,
+                        subdirectory: ModelNames.Magpie.decoderPrefillFile,
+                        to: repoDir
+                    )
+                } catch {
+                    logger.warning(
+                        "decoder_prefill unavailable; falling back to step-by-step prefill: \(error)"
+                    )
+                }
+            }
+        }
+
+        for language in languages {
+            try await ensureTokenizer(for: language, repoDirectory: repoDir)
+        }
+
+        return repoDir
+    }
+
+    /// Ensure tokenizer data for `language` exists locally, downloading any files
+    /// that are missing. Returns immediately for languages whose tokenizer file
+    /// list (per `MagpieTokenizerFiles.files(for:)`) is empty.
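+    ///
+    /// Typical usage, sketched from the signatures in this file (`repoDir` is the
+    /// directory returned by `ensureAssets`; language choices are illustrative):
+    ///
+    /// ```swift
+    /// let repoDir = try await MagpieResourceDownloader.ensureAssets(languages: [.english])
+    /// // Later, lazily fetch German tokenizer data into the same repo directory:
+    /// try await MagpieResourceDownloader.ensureTokenizer(for: .german, repoDirectory: repoDir)
+    /// ```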
+ public static func ensureTokenizer( + for language: MagpieLanguage, repoDirectory: URL + ) async throws { + let files = MagpieTokenizerFiles.files(for: language) + if files.isEmpty { return } + + let tokenizerDir = repoDirectory.appendingPathComponent(ModelNames.Magpie.tokenizerDir) + if !FileManager.default.fileExists(atPath: tokenizerDir.path) { + try FileManager.default.createDirectory( + at: tokenizerDir, withIntermediateDirectories: true) + } + + for file in files { + let localURL = tokenizerDir.appendingPathComponent(file) + if FileManager.default.fileExists(atPath: localURL.path) { continue } + + let remotePath = "\(ModelNames.Magpie.tokenizerDir)/\(file)" + logger.info("Downloading Magpie tokenizer file: \(remotePath)") + let remoteURL: URL + do { + remoteURL = try ModelRegistry.resolveModel(Repo.magpieTts.remotePath, remotePath) + } catch { + throw MagpieError.downloadFailed( + "failed to resolve HF URL for \(remotePath): \(error)") + } + + do { + let data = try await AssetDownloader.fetchData( + from: remoteURL, + description: "magpie tokenizer \(file)", + logger: logger + ) + try data.write(to: localURL, options: [.atomic]) + } catch { + throw MagpieError.tokenizerDataMissing( + language: language.rawValue, file: file) + } + } + } + + /// Return the directory that holds constants (JSON + npy + local_transformer/). + public static func constantsDirectory(in repoDirectory: URL) -> URL { + repoDirectory.appendingPathComponent(ModelNames.Magpie.constantsDir) + } + + /// Return the directory that holds per-language tokenizer lookups. 
+ public static func tokenizerDirectory(in repoDirectory: URL) -> URL { + repoDirectory.appendingPathComponent(ModelNames.Magpie.tokenizerDir) + } + + private static func defaultCacheRoot() throws -> URL { + let base: URL + #if os(macOS) + base = FileManager.default.homeDirectoryForCurrentUser + .appendingPathComponent(".cache") + #else + guard + let first = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first + else { + throw MagpieError.downloadFailed("failed to locate caches directory") + } + base = first + #endif + let root = base.appendingPathComponent("fluidaudio").appendingPathComponent("Models") + if !FileManager.default.fileExists(atPath: root.path) { + try FileManager.default.createDirectory(at: root, withIntermediateDirectories: true) + } + return root + } +} + +/// Authoritative list of per-language tokenizer files. The emitters in +/// `mobius/models/tts/magpie/export_tokenizers.py` produce these names; the Swift +/// tokenizers consume them. +public enum MagpieTokenizerFiles { + /// Tokenizer filenames emitted by + /// `mobius/models/tts/magpie/coreml/export_tokenizers.py`. The naming convention + /// is `{tokenizer_name}_{suffix}.json` where `tokenizer_name` follows the NeMo + /// AggregatedTTSTokenizer names (e.g. `english_phoneme`, `french_chartokenizer`). + public static func files(for language: MagpieLanguage) -> [String] { + let base = tokenizerName(for: language) + switch language { + case .english, .spanish, .italian, .vietnamese: + // IPA G2P: token2id + phoneme_dict. + return ["\(base)_token2id.json", "\(base)_phoneme_dict.json"] + case .german: + // IPA G2P with heteronym fallback. + return [ + "\(base)_token2id.json", + "\(base)_phoneme_dict.json", + "\(base)_heteronyms.json", + ] + case .french, .hindi: + // Char-based tokenizers: only token2id lookup. + return ["\(base)_token2id.json"] + case .mandarin: + // pypinyin (phrase + char) + tone / letter / token2id maps. 
+ return [ + "\(base)_token2id.json", + "\(base)_pinyin_dict.json", + "\(base)_tone_dict.json", + "\(base)_ascii_letter_dict.json", + "mandarin_pypinyin_char_dict.json", + "mandarin_pypinyin_phrase_dict.json", + "mandarin_jieba_dict.json", + ] + } + } + + /// NeMo tokenizer name for the given language (matches the Python map in + /// `generate_coreml._tokenize_text`). + public static func tokenizerName(for language: MagpieLanguage) -> String { + switch language { + case .english: return "english_phoneme" + case .spanish: return "spanish_phoneme" + case .german: return "german_phoneme" + case .italian: return "italian_phoneme" + case .vietnamese: return "vietnamese_phoneme" + case .mandarin: return "mandarin_phoneme" + case .french: return "french_chartokenizer" + case .hindi: return "hindi_chartokenizer" + } + } +} diff --git a/Sources/FluidAudio/TTS/Magpie/LocalTransformer/MagpieLocalTransformer.swift b/Sources/FluidAudio/TTS/Magpie/LocalTransformer/MagpieLocalTransformer.swift new file mode 100644 index 000000000..a9c9ae6cd --- /dev/null +++ b/Sources/FluidAudio/TTS/Magpie/LocalTransformer/MagpieLocalTransformer.swift @@ -0,0 +1,349 @@ +import Accelerate +import Foundation + +/// Swift-side 1-layer Local Transformer forward pass. +/// +/// Mirrors `local_transformer_forward` in +/// `mobius/models/tts/magpie/coreml/generate_coreml.py` (lines 108–155): +/// pre-norm causal self-attention → pre-norm FFN with tanh-GELU. Single attention +/// head, localDim=256. Uses BLAS (`cblas_sgemm`) for every matmul so the AR loop +/// stays cache-resident. +/// +/// The transformer is stateless across frames — each call to +/// `MagpieLocalTransformerSampler.sample(...)` rebuilds the sequence from the +/// current decoder hidden state and the 8 tokens sampled so far. 
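+///
+/// Usage sketch: `weights` comes from `MagpieLocalTransformerLoader.load`, and
+/// `seq` / `T` are hypothetical stand-ins for a row-major `[T * localDim]` fp32
+/// buffer (positional embeddings not yet added — `forward` adds them):
+///
+/// ```swift
+/// let lt = MagpieLocalTransformer(weights: weights)
+/// let out = lt.forward(sequence: seq, length: T)  // [T * localDim], row-major
+/// let lastHidden = Swift.Array(out.suffix(weights.localDim))
+/// let logits = lt.codebookLogits(lastHidden: lastHidden, codebook: 0)
+/// ```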
+public struct MagpieLocalTransformer: Sendable {
+
+    public let weights: MagpieLocalTransformerWeights
+
+    public init(weights: MagpieLocalTransformerWeights) {
+        self.weights = weights
+    }
+
+    /// Forward pass for a sequence of length `T` (T ≤ numCodebooks+2).
+    ///
+    /// - Parameter sequence: `[T * localDim]` row-major fp32 input sequence,
+    ///   without positional embeddings — this routine adds them.
+    ///   Caller must supply `T` explicitly to avoid ambiguity on partial buffers.
+    /// - Returns: `[T * localDim]` row-major output.
+    public func forward(sequence: [Float], length T: Int) -> [Float] {
+        precondition(sequence.count >= T * weights.localDim, "sequence buffer too small")
+        precondition(T <= weights.maxPositions, "sequence length exceeds maxPositions")
+
+        let D = weights.localDim
+        let ffnD = weights.ffnDim
+
+        // x = sequence[:T*D] + posEmbedding[:T*D]
+        var x = Swift.Array(sequence.prefix(T * D))
+        addPositional(into: &x, length: T)
+
+        // ── Pre-norm causal self-attention ──
+        var xNorm = layerNorm(x, length: T, weight: weights.norm1Weight)
+
+        // QKV = xNorm @ sa_qkv_weight.T  (T,D) × (3D,D)ᵀ → (T, 3D)
+        var qkv = Swift.Array<Float>(repeating: 0, count: T * 3 * D)
+        matmulTransB(
+            a: xNorm, aRows: T, aCols: D,
+            b: weights.saQkvWeight, bRows: 3 * D, bCols: D,
+            out: &qkv)
+
+        // Split QKV into Q, K, V (each T × D). Direct memcpy from packed (T, 3D)
+        // buffer; no intermediate Swift sub-array allocations per row.
+        var q = Swift.Array<Float>(repeating: 0, count: T * D)
+        var k = Swift.Array<Float>(repeating: 0, count: T * D)
+        var v = Swift.Array<Float>(repeating: 0, count: T * D)
+        let bytesPerRow = D * MemoryLayout<Float>.size
+        qkv.withUnsafeBufferPointer { srcPtr in
+            q.withUnsafeMutableBufferPointer { qPtr in
+                k.withUnsafeMutableBufferPointer { kPtr in
+                    v.withUnsafeMutableBufferPointer { vPtr in
+                        guard let src = srcPtr.baseAddress,
+                            let qb = qPtr.baseAddress,
+                            let kb = kPtr.baseAddress,
+                            let vb = vPtr.baseAddress
+                        else { return }
+                        for t in 0..<T {
+                            memcpy(qb + t * D, src + t * 3 * D, bytesPerRow)
+                            memcpy(kb + t * D, src + t * 3 * D + D, bytesPerRow)
+                            memcpy(vb + t * D, src + t * 3 * D + 2 * D, bytesPerRow)
+                        }
+                    }
+                }
+            }
+        }
+
+        // attn = Q @ Kᵀ  (T, D) × (T, D)ᵀ → (T, T)
+        var attn = Swift.Array<Float>(repeating: 0, count: T * T)
+        matmulTransB(
+            a: q, aRows: T, aCols: D,
+            b: k, bRows: T, bCols: D,
+            out: &attn)
+        let scale = Float(1.0 / sqrt(Double(D)))
+        var scaleVar = scale
+        vDSP_vsmul(attn, 1, &scaleVar, &attn, 1, vDSP_Length(T * T))
+
+        // Causal mask + softmax
+        for t in 0..<T {
+            // Zero out j > t (future). Then softmax over [0, t].
+            for j in (t + 1)..<T {
+                attn[t * T + j] = 0
+            }
+            var maxVal: Float = -.infinity
+            for j in 0...t {
+                if attn[t * T + j] > maxVal { maxVal = attn[t * T + j] }
+            }
+            var denom: Float = 0
+            for j in 0...t {
+                attn[t * T + j] = exp(attn[t * T + j] - maxVal)
+                denom += attn[t * T + j]
+            }
+            if denom > 0 {
+                let invDenom = 1.0 / denom
+                for j in 0...t {
+                    attn[t * T + j] *= invDenom
+                }
+            }
+        }
+
+        // saOut = attn @ V  (T × T) × (T × D) → (T × D)
+        var saOut = Swift.Array<Float>(repeating: 0, count: T * D)
+        matmul(
+            a: attn, aRows: T, aCols: T,
+            b: v, bRows: T, bCols: D,
+            out: &saOut)
+
+        // saOut = saOut @ sa_o_weight.T  (T, D) × (D, D)ᵀ → (T, D)
+        var saProj = Swift.Array<Float>(repeating: 0, count: T * D)
+        matmulTransB(
+            a: saOut, aRows: T, aCols: D,
+            b: weights.saOWeight, bRows: D, bCols: D,
+            out: &saProj)
+
+        // x += saProj
+        vDSP_vadd(x, 1, saProj, 1, &x, 1, vDSP_Length(T * D))
+
+        // ── Pre-norm FFN ──
+        xNorm = layerNorm(x, length: T, weight: weights.norm2Weight)
+
+        // h = gelu(xNorm @ ffn_conv1_weight.T) → (T, ffnD)
+        var h = Swift.Array<Float>(repeating: 0, count: T * ffnD)
+        matmulTransB(
+            a: xNorm, aRows: T, aCols: D,
+            b: weights.ffnConv1Weight, bRows: ffnD, bCols: D,
+            out: &h)
+        applyGeluTanh(into: &h)
+
+        // x += h @ ffn_conv2_weight.T → (T, D)
+        var ffnOut = Swift.Array<Float>(repeating: 0, count: T * D)
+        matmulTransB(
+            a: h, aRows: T, aCols: ffnD,
+            b: weights.ffnConv2Weight, bRows: D, bCols: ffnD,
+            out: &ffnOut)
+        vDSP_vadd(x, 1, ffnOut, 1, &x, 1, vDSP_Length(T * D))
+
+        return x
+    }
+
+    /// Project a (dModel,) decoder hidden state through the input projection
+    /// → (localDim,). Used by the sampler to seed the LT sequence.
+    public func projectInput(hidden: [Float]) -> [Float] {
+        precondition(hidden.count == weights.dModel)
+        var out = weights.inProjBias  // copy bias
+        // out += inProjWeight @ hidden  (localDim, dModel) × (dModel,) → (localDim,)
+        inProjWeightApply(hidden: hidden, accumulate: &out)
+        return out
+    }
+
+    /// Compute logits for codebook `codebook`: last-timestep out_proj head.
+    public func codebookLogits(lastHidden: [Float], codebook: Int) -> [Float] {
+        precondition(lastHidden.count == weights.localDim)
+        let numCodes = weights.numCodesPerCodebook
+        var logits = weights.outProjBiases[codebook]  // copy bias (numCodes,)
+        // logits += outProjWeights[codebook] @ lastHidden  (numCodes, localDim) × (localDim,)
+        let w = weights.outProjWeights[codebook]
+        w.withUnsafeBufferPointer { wPtr in
+            lastHidden.withUnsafeBufferPointer { hPtr in
+                logits.withUnsafeMutableBufferPointer { outPtr in
+                    cblas_sgemv(
+                        CblasRowMajor, CblasNoTrans,
+                        Int32(numCodes), Int32(weights.localDim),
+                        1.0,
+                        wPtr.baseAddress, Int32(weights.localDim),
+                        hPtr.baseAddress, 1,
+                        1.0,
+                        outPtr.baseAddress, 1)
+                }
+            }
+        }
+        return logits
+    }
+
+    // MARK: - Private helpers
+
+    private func addPositional(into buffer: inout [Float], length T: Int) {
+        let D = weights.localDim
+        let count = T * D
+        var tmp = buffer
+        weights.posEmbedding.withUnsafeBufferPointer { posPtr in
+            tmp.withUnsafeMutableBufferPointer { dstPtr in
+                // Only use first T rows of posEmbedding.
+                vDSP_vadd(
+                    dstPtr.baseAddress!, 1,
+                    posPtr.baseAddress!, 1,
+                    dstPtr.baseAddress!, 1,
+                    vDSP_Length(count))
+            }
+        }
+        buffer = tmp
+    }
+
+    private func layerNorm(_ x: [Float], length T: Int, weight: [Float]) -> [Float] {
+        let D = weights.localDim
+        var out = Swift.Array<Float>(repeating: 0, count: T * D)
+        let eps: Float = 1e-5
+        x.withUnsafeBufferPointer { xPtr in
+            weight.withUnsafeBufferPointer { wPtr in
+                out.withUnsafeMutableBufferPointer { outPtr in
+                    guard let xBase = xPtr.baseAddress,
+                        let wBase = wPtr.baseAddress,
+                        let outBase = outPtr.baseAddress
+                    else { return }
+                    for t in 0..<T {