Metal backend: unsupported op 'PAD' crash on Apple Silicon

## Environment

- **Device**: Apple M4 Pro (macOS 15.5, Darwin 24.6.0, arm64)
- **Model**: `CosyVoice3-2512_Q4_K_S.gguf` from [Lourdle/Fun-CosyVoice3-0.5B-2512-GGUF](https://huggingface.co/Lourdle/Fun-CosyVoice3-0.5B-2512-GGUF)
- **Frontend**: `speech_tokenizer_v3.onnx` + `campplus.onnx`
- **GGML version**: 0.9.11 (bundled)
- **Build**: `cmake -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON` with [SIMDe patch](https://github.com/Lourdle/cosyvoice.cpp/pull/2) for ARM64 AVX2→NEON translation

## Build

Build succeeds with the SIMDe patch (PR #2). `libcosyvoice.dylib` and `cosyvoice-cli` are produced without errors.

## Issue

### Metal backend — crashes with unsupported op

```
$ cosyvoice-cli --model CosyVoice3-2512_Q4_K_S.gguf \
    --speech-tokenizer speech_tokenizer_v3.onnx \
    --campplus campplus.onnx \
    --prompt-audio ref.wav --prompt-text "テスト" \
    --text "こんにちは、テストです。" \
    --output out.wav --mode zero-shot

ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_op_encode_impl: error: unsupported op 'PAD'
```

The crash occurs during `token2wav` (Flow + HiFT stage), specifically at:

```
ggml_metal_op_encode + 2500
ggml_metal_graph_compute + 588
ggml_backend_sched_graph_compute + 2244
cosyvoice_model_3::token2wav + 948
cosyvoice_tts + 168
```

### CPU backend — "Too many stop tokens"

With `GGML_METAL=OFF` (CPU-only build), generation starts but produces only 0.48–0.96 seconds of audio before aborting:

```
Too many stop tokens sampled, something might be wrong with the model or the sampling parameters.
Error: TTS generation failed.
```

This may be related to the README's Known Issues entry ("CPU / Vulkan backends: Produced noisy output in tests"), although the symptom here is an early abort rather than noisy audio.

## Analysis

The CosyVoice Flow/HiFT graph uses the `PAD` operation, which the Metal backend of the bundled GGML rejects as unsupported (see the log above). This is likely an upstream GGML gap: `PAD` exists in the CPU backend but does not appear to have been ported to the Metal shaders.

Possible solutions:
1. **Upstream GGML**: Implement `PAD` in `ggml-metal.metal` (preferred long-term fix)
2. **Workaround**: Force `PAD` ops to CPU via `ggml_backend_sched` offloading while keeping other ops on Metal
3. **Graph rewrite**: Replace `PAD` with equivalent ops that are already Metal-supported (e.g., concatenation with zero tensors)
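
To illustrate option 3, here is a minimal, self-contained sketch of why zero-padding can be expressed as concatenation with a zero block. It is plain C++ rather than ggml code: the `Matrix` type and the three helper functions are hypothetical names invented for this example, and it only covers padding at the end of one axis. In ggml terms the same idea would presumably replace a `ggml_pad` call with a zero-filled tensor passed to `ggml_concat`, but whether `CONCAT` is actually supported on Metal in the bundled version still needs to be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for illustration; not part of ggml or cosyvoice.cpp.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data; // row-major, rows * cols elements
};

// Reference behaviour: append `pad` zero columns to the right of every row.
Matrix pad_right(const Matrix &m, std::size_t pad) {
    assert(m.data.size() == m.rows * m.cols);
    Matrix out{m.rows, m.cols + pad,
               std::vector<float>(m.rows * (m.cols + pad), 0.0f)};
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t c = 0; c < m.cols; ++c) {
            out.data[r * out.cols + c] = m.data[r * m.cols + c];
        }
    }
    return out;
}

// Concatenate two matrices with the same row count along the column axis.
Matrix concat_cols(const Matrix &a, const Matrix &b) {
    assert(a.rows == b.rows);
    assert(a.data.size() == a.rows * a.cols);
    assert(b.data.size() == b.rows * b.cols);
    Matrix out{a.rows, a.cols + b.cols, {}};
    out.data.reserve(out.rows * out.cols);
    for (std::size_t r = 0; r < a.rows; ++r) {
        // Copy row r of `a`, then row r of `b`.
        out.data.insert(out.data.end(),
                        a.data.begin() + r * a.cols,
                        a.data.begin() + (r + 1) * a.cols);
        out.data.insert(out.data.end(),
                        b.data.begin() + r * b.cols,
                        b.data.begin() + (r + 1) * b.cols);
    }
    return out;
}

// The rewrite: build an all-zero block and concatenate it instead of padding.
Matrix pad_right_via_concat(const Matrix &m, std::size_t pad) {
    Matrix zeros{m.rows, pad, std::vector<float>(m.rows * pad, 0.0f)};
    return concat_cols(m, zeros);
}
```

Both functions produce the same buffer for the same input, so the rewrite preserves results at the cost of allocating an extra zero tensor per padded axis. Padding at the start of an axis would need the zero block as the first concatenation operand instead.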

## Related

- Build PR: #2 (SIMDe patch for ARM64 compilation)
- GGML Metal op support: the bundled ggml 0.9.11 does not list `PAD` in its Metal shader implementations
