feat: add Magpie TTS CoreML conversion pipeline #24

Open
Alex-Wengg wants to merge 1 commit into main from feat/magpie-tts

@Alex-Wengg Alex-Wengg commented Mar 13, 2026

Summary

  • Add NVIDIA Magpie TTS Multilingual (357M) CoreML conversion pipeline as a submodule
  • Complete 4-model pipeline: text encoder, decoder prefill, decoder step (AR), NanoCodec vocoder
  • 9 languages (en, es, de, fr, it, vi, zh, hi, ja), 5 built-in speakers, float16 CoreML
  • Includes export scripts for embeddings, tokenizers, local transformer weights, and pypinyin/OpenJTalk dictionaries
  • Pure CoreML inference script (generate_coreml.py) and PyTorch reference (generate_pytorch.py)

Pipeline

| Model | Purpose |
| --- | --- |
| text_encoder | Text → conditioning vectors |
| decoder_prefill | Batch speaker context into KV cache |
| decoder_step | Single AR step with KV cache (~50-200x per utterance) |
| nanocodec_decoder | Codec tokens → 22kHz audio |
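
As a rough illustration of how the four models listed above could chain at inference time, here is a minimal sketch that treats each model as a plain Python callable. The function name `synthesize`, the argument order, the single-integer code per step, the `eos_token` stop condition, and the `max_steps` cap are all assumptions made for illustration; they are not taken from generate_coreml.py.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def synthesize(
    text_encoder: Callable[[Sequence[int]], Any],
    decoder_prefill: Callable[[Sequence[Any]], Any],
    decoder_step: Callable[[Optional[int], Any, Any], Tuple[int, Any]],
    nanocodec_decoder: Callable[[List[int]], Any],
    token_ids: Sequence[int],
    speaker_context: Sequence[Any],
    eos_token: int,
    max_steps: int = 200,  # assumed cap, loosely matching "~50-200x per utterance"
) -> Any:
    """Hypothetical orchestration: encode, prefill, loop decoder_step, vocode."""
    # 1. Text -> conditioning vectors (run once).
    cond = text_encoder(token_ids)
    # 2. Speaker context -> initial KV cache (run once).
    kv_cache = decoder_prefill(speaker_context)
    # 3. Autoregressive loop: one decoder_step call per generated code.
    codes: List[int] = []
    prev: Optional[int] = None
    for _ in range(max_steps):
        code, kv_cache = decoder_step(prev, cond, kv_cache)
        if code == eos_token:  # assumed stop condition
            break
        codes.append(code)
        prev = code
    # 4. Codec tokens -> audio (run once).
    return nanocodec_decoder(codes)
```

The point of the sketch is only the call pattern: three models run once per utterance, while decoder_step runs once per generated code and threads the KV cache through every call.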

Source


@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

Comment on lines +93 to +94
os.makedirs(os.path.dirname(output_path), exist_ok=True)
mlmodel.save(output_path)

@devin-ai-integration devin-ai-integration bot Mar 13, 2026

🟡 Duplicate model save due to copy-paste error

Lines 90-94 in convert_decoder_step.py contain a duplicated block that calls os.makedirs and mlmodel.save twice in succession. This is clearly a copy-paste artifact — the model is saved, then immediately saved again. While not functionally incorrect (the second write overwrites the first), it wastes significant I/O time since CoreML .mlpackage bundles can be hundreds of megabytes.
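
One way to address this finding is to route the save through a small helper so the directory creation and the write each happen exactly once. The helper name `save_model_once` is hypothetical; it only assumes the model object exposes a `save(path)` method, as the `mlmodel.save(output_path)` call in the reviewed lines does.

```python
import os


def save_model_once(mlmodel, output_path):
    """Create the parent directory if needed, then write the model a single time."""
    parent = os.path.dirname(output_path)
    if parent:
        # dirname() is "" for a bare filename, and os.makedirs("") would raise.
        os.makedirs(parent, exist_ok=True)
    mlmodel.save(output_path)
    return output_path
```

Guarding against an empty parent is an extra not mentioned in the review; it keeps the helper from failing when `output_path` has no directory component.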

NVIDIA Magpie TTS Multilingual (357M) conversion to CoreML.

Pipeline (4 models):
- text_encoder: text tokenization and encoding
- decoder_prefill: batch speaker context into KV cache
- decoder_step: single AR step with KV cache
- nanocodec_decoder: codec tokens to 22kHz audio

9 languages (en, es, de, fr, it, vi, zh, hi, ja), 5 speakers.

Includes conversion scripts, traceable wrappers, export scripts
for embeddings/tokenizers/weights, and CoreML inference script.

Source: nvidia/magpie_tts_multilingual_357m

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

d_head = d_model // sa_n_heads

# Read T_ctx from speaker_info if not specified
constants_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "constants")

🟡 Wrong parent directory calculation causes constants path to resolve to incorrect location

The constants_dir at line 41 uses os.path.dirname(os.path.dirname(os.path.abspath(__file__))) which traverses up two directory levels. This was written assuming the script lives in a convert/ subdirectory (as documented in the README's python convert/convert_decoder_prefill.py), but the script is actually placed directly in coreml/. As a result, constants_dir resolves to models/tts/magpie/constants/ instead of the correct models/tts/magpie/coreml/constants/ where export_constants.py writes speaker_info.json. The os.path.exists(si_path) check silently returns False, causing the script to always use the hardcoded default t_ctx=110 instead of reading the value from the exported speaker info. The same off-by-one dirname pattern appears in the sys.path.insert at line 18, though that doesn't cause a runtime failure because Python's default sys.path[0] (the script's own directory) already contains the traceable/ package.

Suggested change
- constants_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "constants")
+ constants_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "constants")
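
To make the difference between the two expressions concrete, the sketch below evaluates both against an example script location and then shows a fallback that warns instead of silently using the default. The absolute path `/repo/...`, the helper name `read_t_ctx`, and the JSON key `"t_ctx"` are assumptions for illustration; the review does not state the actual key used in speaker_info.json.

```python
import json
import os
import posixpath
import warnings

# Hypothetical absolute location of the script, per the layout described in the review.
SCRIPT = "/repo/models/tts/magpie/coreml/convert_decoder_prefill.py"

# Two dirname() calls climb past coreml/ and land in the model directory.
two_up = posixpath.join(posixpath.dirname(posixpath.dirname(SCRIPT)), "constants")
# One dirname() call stays in coreml/, where export_constants.py writes its output.
one_up = posixpath.join(posixpath.dirname(SCRIPT), "constants")


def read_t_ctx(constants_dir, default=110, key="t_ctx"):
    """Read T_ctx from speaker_info.json, warning loudly when the file is missing."""
    si_path = os.path.join(constants_dir, "speaker_info.json")
    if not os.path.exists(si_path):
        warnings.warn(f"{si_path} not found; falling back to t_ctx={default}")
        return default
    with open(si_path) as f:
        # "key" is an assumed field name; adjust to the real schema.
        return json.load(f).get(key, default)
```

Here `two_up` evaluates to `/repo/models/tts/magpie/constants` and `one_up` to `/repo/models/tts/magpie/coreml/constants`. Emitting a warning on the fallback path would have surfaced the wrong directory immediately rather than leaving the default of 110 in place unnoticed.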

Comment on lines +1 to +41
[project]
name = "magpie-tts-coreml"
requires-python = ">= 3.10,<3.13"
description = "NVIDIA Magpie TTS 357M CoreML conversion"
version = "0.1.0"
dependencies = [
"numpy>=1.24",
"torch>=2.5.0",
"coremltools>=8.0",
"soundfile>=0.12.0",
"scipy>=1.5.0",
"huggingface_hub>=0.10",
]

[project.optional-dependencies]
nemo = [
"nemo_toolkit[tts]",
"hydra-core>=1.3",
"omegaconf>=2.3",
"lightning>=2.0",
]

[tool.uv.sources]
torch = [
{ index = "pytorch-cpu" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.hatch.build.targets.wheel]
packages = ["."]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.uv]
python-preference = "only-managed"

🔴 pyproject.toml placed in model directory instead of target directory, violating AGENTS.md structure rules

AGENTS.md mandates: "Code lives under models/{class}/{model}/{target}; follow the existing vad/silero-vad/coreml pattern" and "Each target directory bundles its own pyproject.toml, uv.lock". The existing reference pattern places these files at models/vad/silero-vad/coreml/pyproject.toml. This PR places pyproject.toml at models/tts/magpie/pyproject.toml (one level above the coreml/ target directory) instead of the required models/tts/magpie/coreml/pyproject.toml. This breaks the self-contained target directory convention and means uv sync run from the coreml/ directory (as instructed by AGENTS.md) won't find the project file.

Prompt for agents
Move models/tts/magpie/pyproject.toml to models/tts/magpie/coreml/pyproject.toml and move models/tts/magpie/uv.lock to models/tts/magpie/coreml/uv.lock. This matches the established pattern in models/vad/silero-vad/coreml/ where each target directory is self-contained with its own pyproject.toml and uv.lock. After moving, verify that uv sync works correctly when run from the coreml/ directory.
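
A check like the following could catch this kind of misplacement before review. It is a minimal sketch that assumes the `models/{class}/{model}/{target}` layout quoted from AGENTS.md above and treats every directory at that depth as a target; the helper name `missing_project_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Files each target directory is expected to bundle, per the AGENTS.md rule quoted above.
REQUIRED = ("pyproject.toml", "uv.lock")


def missing_project_files(repo_root):
    """Return repo-relative paths of required files absent from any target directory."""
    root = Path(repo_root)
    missing = []
    # models/{class}/{model}/{target}: three levels below models/.
    for target in sorted(root.glob("models/*/*/*")):
        if not target.is_dir():
            continue  # skip stray files sitting at the model level
        for name in REQUIRED:
            candidate = target / name
            if not candidate.is_file():
                missing.append(candidate.relative_to(root).as_posix())
    return missing
```

Run against the layout in this PR, such a check would be expected to report both `models/tts/magpie/coreml/pyproject.toml` and `models/tts/magpie/coreml/uv.lock` as missing until the files are moved.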