Metal backend: unsupported op 'PAD' crash on Apple Silicon #3

@jasagiri

Environment

  • Device: Apple M4 Pro (macOS 15.5, Darwin 24.6.0, arm64)
  • Model: CosyVoice3-2512_Q4_K_S.gguf from Lourdle/Fun-CosyVoice3-0.5B-2512-GGUF
  • Frontend: speech_tokenizer_v3.onnx + campplus.onnx
  • GGML version: 0.9.11 (bundled)
  • Build: cmake -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON with SIMDe patch for ARM64 AVX2→NEON translation

Build

Build succeeds with the SIMDe patch (PR #2). libcosyvoice.dylib and cosyvoice-cli are produced without errors.

Issue

Metal backend — crashes with unsupported op

$ cosyvoice-cli --model CosyVoice3-2512_Q4_K_S.gguf \
    --speech-tokenizer speech_tokenizer_v3.onnx \
    --campplus campplus.onnx \
    --prompt-audio ref.wav --prompt-text "テスト" \
    --text "こんにちは、テストです。" \
    --output out.wav --mode zero-shot

ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_op_encode_impl: error: unsupported op 'PAD'

The crash occurs during token2wav (the Flow + HiFT stage). Backtrace, innermost frame first:

ggml_metal_op_encode + 2500
ggml_metal_graph_compute + 588
ggml_backend_sched_graph_compute + 2244
cosyvoice_model_3::token2wav + 948
cosyvoice_tts + 168

CPU backend — "Too many stop tokens"

With GGML_METAL=OFF (CPU-only build), generation starts but produces only 0.48–0.96 seconds of audio before aborting:

Too many stop tokens sampled, something might be wrong with the model or the sampling parameters.
Error: TTS generation failed.

This is consistent with the README Known Issues ("CPU / Vulkan backends: Produced noisy output in tests").

Analysis

The PAD operation is used in the CosyVoice Flow/HiFT graph, but the bundled GGML Metal backend rejects it as unsupported. This is likely an upstream GGML issue: the PAD op exists in the CPU backend but appears not to have been ported to the Metal shaders.
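To make the gap concrete, here is a toy model of per-op backend selection. It is only a sketch: the support sets, `assign_backends`, and `count_backend_switches` are hypothetical and do not reflect the real ggml_backend_sched algorithm or the actual op coverage of any backend. It shows how a single op missing from the preferred backend forces a detour through CPU.

```python
# Toy model only: the support sets below are invented for illustration and
# do not describe what the real Metal or CPU backends support.
SUPPORTED_OPS = {
    "metal": {"MUL_MAT", "ADD", "CONCAT"},
    "cpu": {"MUL_MAT", "ADD", "CONCAT", "PAD"},
}


def assign_backends(ops, priority=("metal", "cpu")):
    """Give each op the first backend, in priority order, that supports it."""
    assignment = []
    for op in ops:
        for backend in priority:
            if op in SUPPORTED_OPS[backend]:
                assignment.append((op, backend))
                break
        else:
            # No backend in the priority list can run this op.
            raise ValueError(f"no backend supports op {op!r}")
    return assignment


def count_backend_switches(assignment):
    """Count transitions between backends along the op sequence."""
    backends = [backend for _, backend in assignment]
    return sum(1 for prev, cur in zip(backends, backends[1:]) if prev != cur)


graph = ["MUL_MAT", "PAD", "MUL_MAT", "ADD"]
plan = assign_backends(graph)
switches = count_backend_switches(plan)
```

In this toy graph the lone PAD node lands on CPU and the plan switches backends twice. In a real scheduler each switch would typically mean a synchronization point and possibly a tensor copy, which is the cost to weigh when considering a CPU fallback for PAD.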

Possible solutions:

  1. Upstream GGML: Implement PAD in ggml-metal.metal (preferred long-term fix)
  2. Workaround: Force PAD ops to CPU via ggml_backend_sched offloading while keeping other ops on Metal
  3. Graph rewrite: Replace PAD with equivalent ops that are already Metal-supported (e.g., concatenation with zero tensors)
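The graph-rewrite idea in option 3 rests on zero-padding being the same as concatenating with zero tensors. The sketch below checks that equivalence on plain nested lists standing in for 2-D tensors. All helper names (`zeros`, `pad_last_dim`, `concat_last_dim`, `pad_via_concat`) are hypothetical and are not ggml or CosyVoice APIs; it assumes constant zero padding along a single dimension, which may not cover every PAD variant the model uses.

```python
# Illustrative only: nested lists stand in for tensors, and none of these
# helpers correspond to real ggml functions.

def zeros(rows, cols):
    """Build a rows x cols array of zeros."""
    return [[0.0] * cols for _ in range(rows)]


def pad_last_dim(x, left, right):
    """Reference zero-padding along the last dimension."""
    return [[0.0] * left + list(row) + [0.0] * right for row in x]


def concat_last_dim(a, b):
    """Concatenate two 2-D arrays with equal row counts along the last dimension."""
    if len(a) != len(b):
        raise ValueError("row counts differ")
    return [list(ra) + list(rb) for ra, rb in zip(a, b)]


def pad_via_concat(x, left, right):
    """Produce the same result as pad_last_dim using only zeros + concat."""
    out = [list(row) for row in x]
    if left > 0:
        # Left padding: the zero block goes first.
        out = concat_last_dim(zeros(len(x), left), out)
    if right > 0:
        # Right padding: the zero block goes last.
        out = concat_last_dim(out, zeros(len(x), right))
    return out


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
padded = pad_via_concat(x, left=1, right=2)
```

Padding on both sides costs two concatenations and two extra zero tensors per PAD node, so this rewrite trades a missing kernel for additional graph nodes and memory.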

Labels: bug, help wanted