feat: add TTS voice response messages via Mistral Voxtral API by vkavun · Pull Request #167 · RichardAtCT/claude-code-telegram

vkavun · 2026-03-28T12:16:08Z

Summary

Add text-to-speech capability so the bot can send Claude's responses as Telegram voice messages using Mistral's Voxtral TTS API
Per-user /voice on|off toggle persisted in SQLite, gated behind admin-level ENABLE_VOICE_RESPONSES env var
Short responses: sent as a voice message with a brief label. Long responses (above the length threshold): Claude summarizes them for spoken delivery, and the bot sends the summary audio plus the full text.
Graceful fallback to text with "(Audio unavailable, sent as text)" note on TTS failure

Changes

Config: 5 new env vars (ENABLE_VOICE_RESPONSES, VOICE_RESPONSE_MODEL, VOICE_RESPONSE_VOICE, VOICE_RESPONSE_FORMAT, VOICE_RESPONSE_MAX_LENGTH)
Feature flag: voice_responses_enabled in FeatureFlags
Storage: Migration 5 adds voice_responses_enabled column to users table + repository get/set methods
VoiceHandler: New synthesize_speech() method calling client.audio.speech.complete_async()
Orchestrator: /voice command handler + _maybe_send_voice_response() wired into agentic_text() flow
CLAUDE.md: Updated with new command and settings docs

Test plan

533 tests pass, 0 failures
Set ENABLE_VOICE_RESPONSES=true in the production env
Verify /voice on persists preference and /voice off clears it
Send a short message and confirm voice message is received
Send a long message (>2000 chars) and confirm summary audio + full text
Verify TTS failure gracefully falls back to text with note

🤖 Generated with Claude Code

Add five new Pydantic Settings fields for text-to-speech voice responses: enable_voice_responses, voice_response_model, voice_response_voice, voice_response_format, and voice_response_max_length. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
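For reference, a minimal sketch of how those five settings might be read. It deliberately swaps the project's Pydantic Settings class for a plain dataclass plus `os.environ` so it runs without extra dependencies. The model name and the Paul Neutral voice_id are the values given in the later httpx commit; the `"mp3"` format default and the 2000-character max length (borrowed from the test plan) are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class VoiceResponseSettings:
    """Stand-in for the five TTS fields on the real Pydantic Settings class."""

    enable_voice_responses: bool = False
    voice_response_model: str = "voxtral-mini-tts-2603"
    # Voxtral expects a voice_id UUID; this one is "Paul Neutral" per the PR.
    voice_response_voice: str = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"
    voice_response_format: str = "mp3"  # assumption: default not stated in the PR
    voice_response_max_length: int = 2000  # assumption: mirrors the >2000-char test case

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "VoiceResponseSettings":
        """Build settings from environment variables, falling back to the defaults."""
        d = cls()
        flag = env.get("ENABLE_VOICE_RESPONSES", "false").strip().lower()
        return cls(
            enable_voice_responses=flag in _TRUTHY,
            voice_response_model=env.get("VOICE_RESPONSE_MODEL", d.voice_response_model),
            voice_response_voice=env.get("VOICE_RESPONSE_VOICE", d.voice_response_voice),
            voice_response_format=env.get("VOICE_RESPONSE_FORMAT", d.voice_response_format),
            voice_response_max_length=int(
                env.get("VOICE_RESPONSE_MAX_LENGTH", d.voice_response_max_length)
            ),
        )
```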

Add voice_responses_enabled property to FeatureFlags that gates TTS on both enable_voice_responses setting and mistral_api_key being set. Register it in is_feature_enabled() and get_enabled_features(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
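A sketch of the gating rule described above, assuming a hypothetical minimal `Settings` object that carries only the two relevant fields. The `"voice_responses"` feature name used by `is_feature_enabled()` is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Settings:
    """Hypothetical minimal settings object; the real one has many more fields."""

    enable_voice_responses: bool = False
    mistral_api_key: Optional[str] = None


class FeatureFlags:
    def __init__(self, settings: Settings) -> None:
        self.settings = settings

    @property
    def voice_responses_enabled(self) -> bool:
        # TTS needs both the admin opt-in and a Mistral key to call Voxtral.
        return bool(self.settings.enable_voice_responses and self.settings.mistral_api_key)

    def is_feature_enabled(self, name: str) -> bool:
        # Hypothetical name-to-flag mapping; only the voice entry is shown.
        flags = {"voice_responses": self.voice_responses_enabled}
        return flags.get(name, False)

    def get_enabled_features(self) -> List[str]:
        return [name for name in ("voice_responses",) if self.is_feature_enabled(name)]
```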

Adds migration 5 to extend the users table with a voice_responses_enabled boolean column, updates UserModel with the new field, and adds get/set repository methods to UserRepository with full test coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
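A sketch of what migration 5 and the repository accessors could look like with the standard-library `sqlite3` module. The `users` table shape, the `user_id` column, and the method names `get_voice_responses_enabled`/`set_voice_responses_enabled` are hypothetical; the PR only states that a boolean column and get/set methods were added.

```python
import sqlite3

# Hypothetical shape of migration 5; the real migration list is not shown in the PR.
MIGRATION_5 = (
    "ALTER TABLE users ADD COLUMN voice_responses_enabled BOOLEAN NOT NULL DEFAULT 0"
)


def apply_migration_5(conn: sqlite3.Connection) -> None:
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if "voice_responses_enabled" not in cols:  # keep the migration idempotent
        conn.execute(MIGRATION_5)
        conn.commit()


class UserRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def get_voice_responses_enabled(self, user_id: int) -> bool:
        row = self.conn.execute(
            "SELECT voice_responses_enabled FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        return bool(row[0]) if row else False  # unknown users default to off

    def set_voice_responses_enabled(self, user_id: int, enabled: bool) -> None:
        self.conn.execute(
            "UPDATE users SET voice_responses_enabled = ? WHERE user_id = ?",
            (int(enabled), user_id),
        )
        self.conn.commit()
```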

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements a TDD-verified method that checks the feature flag and the user's toggle, synthesizes speech via voice_handler, and falls back to text on failure. Short responses get a label; long responses are summarized via Claude and also sent as full text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
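The decision flow described above can be sketched as a free function with its collaborators injected as async callables. This is not the orchestrator's actual method signature: the parameter names, the "return False means send text as usual" contract, and sending the fallback note before the text are assumptions. Short replies are audio-only here, matching the later commit that removed the label.

```python
from typing import Awaitable, Callable

FALLBACK_NOTE = "(Audio unavailable, sent as text)"


async def maybe_send_voice_response(
    text: str,
    *,
    feature_enabled: bool,
    user_opted_in: bool,
    max_length: int,
    synthesize: Callable[[str], Awaitable[bytes]],
    summarize: Callable[[str], Awaitable[str]],
    send_voice: Callable[[bytes], Awaitable[None]],
    send_text: Callable[[str], Awaitable[None]],
) -> bool:
    """Return True if the reply was delivered here; False means send text as usual."""
    if not (feature_enabled and user_opted_in):
        return False
    is_long = len(text) > max_length
    try:
        # Long replies are condensed for listening; short ones are spoken verbatim.
        spoken = await summarize(text) if is_long else text
        await send_voice(await synthesize(spoken))
    except Exception:
        # Any failure while summarizing or synthesizing degrades to text, with a note.
        await send_text(FALLBACK_NOTE)
        return False
    if is_long:
        await send_text(text)  # long replies still get the full text
    return True
```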

Insert _maybe_send_voice_response() call between image-caption logic and text-sending loop; skip text messages when voice is successfully sent. Initialize response_content=None before try block to prevent UnboundLocalError on error paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
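A minimal illustration of the UnboundLocalError fix, using a hypothetical `handle_text()` helper in place of `agentic_text()`: binding `response_content` before the `try` block means code after it can always read the name, even when the Claude call raised.

```python
from typing import Callable, List, Optional


def handle_text(call_claude: Callable[[], str], outbox: List[str]) -> None:
    # Bind the name before the try block so every later path can read it.
    response_content: Optional[str] = None
    try:
        response_content = call_claude()
    except Exception as exc:
        outbox.append(f"Error: {exc}")
    # Without the initialization above, this check would raise
    # UnboundLocalError whenever call_claude() failed.
    if response_content is not None:
        outbox.append(response_content)
```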

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add test for long-response summarization path (Task 8) and update TTS failure test to assert fallback note; send "(Audio unavailable, sent as text)" message in the except block when TTS fails (Task 9). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The TTS voice response was only wired into agentic_text() but voice messages go through agentic_voice() -> _handle_agentic_media_message() which had its own separate response-sending path without TTS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- synthesize_speech() now uses httpx directly to call /v1/audio/speech
- Uses voice_id (UUID) instead of voice name (no preset names exist)
- Decodes base64 audio_data from response
- Correct model: voxtral-mini-tts-2603 (not voxtral-4b-tts-2603)
- Default voice: Paul Neutral (c69964a6-ab8b-4f8a-9465-ec0925096ec8)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
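A sketch of the request and response handling described in this commit, split into pure helpers so it can be checked without network access. The endpoint path, model, voice_id, and base64 `audio_data` field come from the commit; the `api.mistral.ai` host, the Bearer auth header, and the `"input"`/`"response_format"` request field names are assumptions to verify against Mistral's API reference.

```python
import base64
from typing import Any, Dict

SPEECH_URL = "https://api.mistral.ai/v1/audio/speech"  # host is an assumption
DEFAULT_MODEL = "voxtral-mini-tts-2603"
DEFAULT_VOICE_ID = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # "Paul Neutral"


def build_speech_request(text: str, api_key: str, fmt: str = "mp3") -> Dict[str, Any]:
    """Return keyword arguments for httpx.AsyncClient.post(**kwargs)."""
    return {
        "url": SPEECH_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {
            "model": DEFAULT_MODEL,
            "input": text,  # assumed field name
            "voice_id": DEFAULT_VOICE_ID,  # a UUID, not a preset voice name
            "response_format": fmt,  # assumed field name
        },
    }


def decode_speech_response(payload: Dict[str, Any]) -> bytes:
    """Extract the base64 'audio_data' field and return raw audio bytes."""
    audio_b64 = payload.get("audio_data")
    if not audio_b64:
        raise ValueError("TTS response has no audio_data")
    return base64.b64decode(audio_b64)
```

With httpx, the helpers would be used roughly as `resp = await client.post(**build_speech_request(text, api_key))`, then `resp.raise_for_status()` and `decode_speech_response(resp.json())`.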

When ENABLE_VOICE_RESPONSES is true, the bot appends instructions to Claude's system prompt so it knows about TTS capabilities and stops telling users it cannot send voice messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
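A sketch of the conditional system-prompt addendum. The addendum wording and the `build_system_prompt()` helper are hypothetical; the PR does not quote the actual instructions.

```python
# Hypothetical wording; the PR does not include the real prompt text.
VOICE_PROMPT_ADDENDUM = (
    "Text-to-speech is enabled: when the user has turned on /voice, the bot "
    "converts your replies into Telegram voice messages automatically. Do not "
    "tell the user that you cannot send voice messages."
)


def build_system_prompt(base_prompt: str, enable_voice_responses: bool) -> str:
    """Append the TTS note only when the admin-level flag is on."""
    if not enable_voice_responses:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{VOICE_PROMPT_ADDENDUM}"
```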

RichardAtCT · 2026-03-30T06:58:34Z

Thanks for this comprehensive TTS implementation! A few things needed before we can merge:

Size concern: At 2K+ lines, this is a large PR. Consider whether any parts can be split out (e.g., the database migration as a separate PR).
Test coverage: Please add automated tests for the core TTS logic (Mistral API client, summarization for long responses, fallback behavior).
Coordination with #158 (Add make run-watch for auto-restart during development): that PR, which we're merging, also modifies voice-related code (whisper.cpp support). You'll likely need to rebase after #158 lands.
orchestrator.py conflicts: Several other PRs touching orchestrator.py are being merged — please rebase once the current batch completes.

The feature design is solid — looking forward to getting this in after the above items are addressed.

Handles the check_match callback from escalation messages by running claude -p with web search to evaluate current match state, then editing the original message in-place with a verdict (winning/losing/won/lost). Button remains available for repeated checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Handles the new [🔎 Investigate] button on trade fill notifications. Sends a placeholder reply, runs claude -p with DB queries/log analysis/web search, then edits the placeholder with structured investigation results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The check_match_button markup only included the Check Match button, so clicking it would replace both buttons with just the one. Now both buttons are preserved after the message edit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
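A sketch of the fix in raw Bot API form (plain dicts rather than python-telegram-bot's `InlineKeyboardMarkup`): rebuild the markup with both buttons whenever the message is edited, so the edit does not drop the Investigate button. The `trade_buttons()` helper and its callback_data format are hypothetical; Telegram limits callback_data to 64 bytes.

```python
from typing import Dict, List


def trade_buttons(trade_id: str) -> Dict[str, List[List[Dict[str, str]]]]:
    """Build reply_markup with BOTH buttons in one row (hypothetical callback_data)."""
    return {
        "inline_keyboard": [
            [
                {"text": "Check Match", "callback_data": f"check_match:{trade_id}"},
                {"text": "🔎 Investigate", "callback_data": f"investigate:{trade_id}"},
            ]
        ]
    }
```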

Instead of spawning Claude with WebSearch/WebFetch tools to find match scores (slow, expensive, leaks source URLs), now:
1. Look up sofa_id from poly_dashboard DB by player names
2. Fetch live/final score directly from SofaScore API (~1s)
3. Pass structured score data to Claude with no tools for assessment
Result: faster response, cleaner 3-line output (status/score/reason), no web search sources in the verdict. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Rewrite prompt to demand exactly 3 lines with no reasoning/analysis
- Parse stdout to extract only STATUS/Score/Reason lines, strip preamble
- Add --max-turns 1 to prevent tool loop overhead

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
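A sketch of the two changes in this commit: the non-interactive Claude Code invocation with `--max-turns 1`, and filtering stdout down to the verdict lines. The exact `KEY: value` line format and the helper names are assumptions.

```python
import re
from typing import List, Optional

# Hypothetical line format; the PR only says the prompt asks for STATUS/Score/Reason.
_WANTED = re.compile(r"^(STATUS|Score|Reason)\s*:", re.IGNORECASE)


def build_claude_argv(prompt: str) -> List[str]:
    # -p runs Claude Code non-interactively; --max-turns 1 stops tool loops.
    return ["claude", "-p", prompt, "--max-turns", "1"]


def extract_verdict(stdout: str) -> Optional[str]:
    """Keep only the STATUS/Score/Reason lines, dropping any preamble."""
    lines = [ln.strip() for ln in stdout.splitlines() if _WANTED.match(ln.strip())]
    return "\n".join(lines[:3]) if lines else None
```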

SofaScore API returns 403 from the prod server IP. Instead of calling the API directly (which would need proxy routing), read the latest match snapshot from the poly_dashboard DB — the collector already stores live scores there via proxied SofaScore polling. Also adds current set games to the score summary for Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of showing "unexpected error" when users click buttons from before a bot restart, catch the Telegram "query too old" error and continue processing the action normally. Also prevent these benign errors from being logged as security violations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
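A synchronous sketch of the stale-button handling (the bot's real handlers are async). Detecting the case by matching the Telegram error text is an assumption, as are the helper names and the `report_error` hook.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Telegram rejects answers to callback queries that are too old, e.g. buttons
# pressed on messages sent before a bot restart. Text matching is an assumption.
STALE_QUERY_MARKERS = ("query is too old", "query id is invalid")


def is_stale_callback_error(exc: BaseException) -> bool:
    text = str(exc).lower()
    return any(marker in text for marker in STALE_QUERY_MARKERS)


def process_callback(
    answer: Callable[[], None],
    handle: Callable[[], T],
    report_error: Callable[[str], None],
) -> T:
    try:
        answer()  # acknowledge the button press first
    except Exception as exc:
        if not is_stale_callback_error(exc):
            report_error(str(exc))  # hypothetical hook; stale queries never reach it
            raise
        # Stale query: benign, so skip logging and carry on with the action.
    return handle()
```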

vkavun and others added 16 commits March 28, 2026 13:33

docs: add design spec for audio response messages feature

54522d7

TTS capability using Mistral Voxtral API to send Claude responses as Telegram voice messages, with user toggle and graceful fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add implementation plan for audio response messages

38679a5

10-task TDD plan covering config, feature flag, storage migration, TTS synthesis, /voice command, and orchestrator wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add synthesize_speech() TTS method to VoiceHandler

24f3a33

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add /voice on|off toggle command

d5cec76

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
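A sketch of the argument handling for `/voice on|off`, assuming the handler receives the words after the command as a list (as python-telegram-bot's `context.args` does). The helper names and reply texts are hypothetical; persisting the choice would go through the repository get/set methods from the storage commit.

```python
from typing import Optional, Sequence


def parse_voice_args(args: Sequence[str]) -> Optional[bool]:
    """Map `/voice on` to True and `/voice off` to False; anything else gives None."""
    if len(args) != 1:
        return None
    return {"on": True, "off": False}.get(args[0].strip().lower())


def voice_command_reply(args: Sequence[str], current: bool) -> str:
    choice = parse_voice_args(args)
    if choice is None:
        # No valid argument: report the current state and show usage.
        state = "on" if current else "off"
        return f"Voice responses are {state}. Usage: /voice on|off"
    return f"Voice responses turned {'on' if choice else 'off'}."
```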

test: update orchestrator tests for /voice command (7 commands)

f31da0d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: update CLAUDE.md with /voice command and TTS settings

c27851e

Also remove unused imports in test_voice_command.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: remove redundant 'Voice response' text label for short responses

cd91f62

Audio-only for short responses; long responses still send full text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ubuntu and others added 8 commits April 4, 2026 12:32

fix: player name regex matching across newlines

2e1cbbd

Use literal space [ ] instead of \s in capture groups to prevent matching across line boundaries (e.g. "Trade Filled\nPanna Udvardy" instead of just "Panna Udvardy"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
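A small demonstration of the bug this commit fixes. The pattern shape is hypothetical (the PR does not show the real regex) and the " vs Jane Doe" suffix is a placeholder; the point is that `\s` also matches a newline, while a literal space `[ ]` keeps the capture on one line.

```python
import re

TEXT = "Trade Filled\nPanna Udvardy vs Jane Doe"  # "Jane Doe" is a placeholder

# Hypothetical pattern shape: capitalized words followed by " vs ".
# \s also matches "\n", so the capture can start on the previous line.
LOOSE = re.compile(r"([A-Z]\w+(?:\s[A-Z]\w+)*) vs ")
# A literal space ([ ]) confines the capture to a single line.
STRICT = re.compile(r"([A-Z]\w+(?:[ ][A-Z]\w+)*) vs ")

loose_name = LOOSE.search(TEXT).group(1)
strict_name = STRICT.search(TEXT).group(1)
```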

vkavun wants to merge 24 commits into RichardAtCT:main from vkavun:feature/audio-response-messages
