feat: add TTS voice response messages via Mistral Voxtral API #167

Open
vkavun wants to merge 24 commits into RichardAtCT:main from vkavun:feature/audio-response-messages

Conversation

vkavun commented Mar 28, 2026

Summary

  • Add text-to-speech capability so the bot can send Claude's responses as Telegram voice messages using Mistral's Voxtral TTS API
  • Per-user /voice on|off toggle persisted in SQLite, gated behind admin-level ENABLE_VOICE_RESPONSES env var
  • Short responses are sent as a voice message plus a brief label; for long responses (>threshold), Claude writes a summary for spoken delivery, and the bot sends audio of the summary plus the full text
  • Graceful fallback to text with "(Audio unavailable, sent as text)" note on TTS failure
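
The delivery rules in the summary above can be sketched as a small planning function. This is only an illustration, not the PR's code: `plan_voice_delivery`, the injected `synthesize` and `summarize` callables, the default label text, and the order of the fallback messages are all assumptions. Only the fallback note string is taken from the description.

```python
FALLBACK_NOTE = "(Audio unavailable, sent as text)"  # wording from the PR summary


def plan_voice_delivery(text, max_length, synthesize, summarize,
                        label="Voice response"):
    """Return a list of (kind, payload) messages to send for one response.

    synthesize(text) -> bytes and summarize(text) -> str are injected so the
    sketch stays independent of any real TTS or Claude client (hypothetical).
    """
    try:
        if len(text) <= max_length:
            # Short response: voice message plus a brief label.
            return [("voice", synthesize(text)), ("text", label)]
        # Long response: speak a summary, then send the full text.
        summary = summarize(text)
        return [("voice", synthesize(summary)), ("text", text)]
    except Exception:
        # Any TTS or summarization failure falls back to plain text plus a note.
        return [("text", text), ("text", FALLBACK_NOTE)]
```

Keeping the decision separate from the Telegram calls makes the three paths (short, long, fallback) easy to unit test without network access.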

Changes

  • Config: 5 new env vars (ENABLE_VOICE_RESPONSES, VOICE_RESPONSE_MODEL, VOICE_RESPONSE_VOICE, VOICE_RESPONSE_FORMAT, VOICE_RESPONSE_MAX_LENGTH)
  • Feature flag: voice_responses_enabled in FeatureFlags
  • Storage: Migration 5 adds voice_responses_enabled column to users table + repository get/set methods
  • VoiceHandler: New synthesize_speech() method calling client.audio.speech.complete_async()
  • Orchestrator: /voice command handler + _maybe_send_voice_response() wired into agentic_text() flow
  • CLAUDE.md: Updated with new command and settings docs

Test plan

  • 533 tests pass, 0 failures
  • Enable ENABLE_VOICE_RESPONSES=true in production env
  • Verify /voice on persists preference and /voice off clears it
  • Send a short message and confirm voice message is received
  • Send a long message (>2000 chars) and confirm summary audio + full text
  • Verify TTS failure gracefully falls back to text with note

🤖 Generated with Claude Code

vkavun and others added 16 commits March 28, 2026 13:33
TTS capability using Mistral Voxtral API to send Claude responses
as Telegram voice messages, with user toggle and graceful fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10-task TDD plan covering config, feature flag, storage migration,
TTS synthesis, /voice command, and orchestrator wiring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add five new Pydantic Settings fields for text-to-speech voice responses:
enable_voice_responses, voice_response_model, voice_response_voice,
voice_response_format, and voice_response_max_length.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add voice_responses_enabled property to FeatureFlags that gates TTS on
both enable_voice_responses setting and mistral_api_key being set.
Register it in is_feature_enabled() and get_enabled_features().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
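
A minimal sketch of the gating described in the commit above, assuming a plain dataclass as a stand-in for the project's Pydantic Settings class. The class shapes here are hypothetical; only the two setting names and the property name come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    # Stand-in for the real Pydantic Settings model (assumption).
    enable_voice_responses: bool = False
    mistral_api_key: Optional[str] = None


class FeatureFlags:
    def __init__(self, settings: Settings) -> None:
        self.settings = settings

    @property
    def voice_responses_enabled(self) -> bool:
        # TTS is on only when the admin flag is set AND an API key exists.
        return bool(
            self.settings.enable_voice_responses
            and self.settings.mistral_api_key
        )
```

Requiring both conditions means enabling the flag without a key leaves the feature off rather than failing on the first TTS call.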
Adds migration 5 to extend the users table with a voice_responses_enabled
boolean column, updates UserModel with the new field, and adds
get/set repository methods to UserRepository with full test coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
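
For illustration, the migration and repository methods described above might look roughly like this with the standard `sqlite3` module. The function names, the `user_id` key column, and the synchronous connection are assumptions; the real project may use an async driver and a different schema.

```python
import sqlite3


def migrate_add_voice_column(conn: sqlite3.Connection) -> None:
    """Add users.voice_responses_enabled if it is not already present."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if "voice_responses_enabled" not in columns:
        conn.execute(
            "ALTER TABLE users "
            "ADD COLUMN voice_responses_enabled BOOLEAN NOT NULL DEFAULT 0"
        )
    conn.commit()


def set_voice_responses_enabled(conn, user_id: int, enabled: bool) -> None:
    conn.execute(
        "UPDATE users SET voice_responses_enabled = ? WHERE user_id = ?",
        (int(enabled), user_id),
    )
    conn.commit()


def get_voice_responses_enabled(conn, user_id: int) -> bool:
    row = conn.execute(
        "SELECT voice_responses_enabled FROM users WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    # Unknown users are treated as having voice responses off.
    return bool(row[0]) if row else False
```

The column check makes the migration safe to run twice, and the `DEFAULT 0` keeps existing users opted out until they run `/voice on`.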
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements TDD-verified method that checks feature flag, user toggle,
synthesizes speech via voice_handler, and falls back to text on failure.
Short responses get a label; long responses get summarized via Claude + full text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Insert _maybe_send_voice_response() call between image-caption logic and
text-sending loop; skip text messages when voice is successfully sent.
Initialize response_content=None before try block to prevent UnboundLocalError
on error paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
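
The `UnboundLocalError` fix mentioned above follows a common Python pattern. The two functions below are hypothetical reductions of it, not the orchestrator code: `run` stands in for whatever call produces the response.

```python
def send_reply_unsafe(run):
    try:
        response_content = run()
    except RuntimeError:
        pass  # error path: response_content was never assigned
    # Raises UnboundLocalError here if run() failed above.
    return response_content


def send_reply_safe(run):
    response_content = None  # initialized before the try block
    try:
        response_content = run()
    except RuntimeError:
        pass
    # Always defined; None signals that no response was produced.
    return response_content
```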
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test for long-response summarization path (Task 8) and update TTS
failure test to assert fallback note; send "(Audio unavailable, sent as
text)" message in the except block when TTS fails (Task 9).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also remove unused imports in test_voice_command.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The TTS voice response was only wired into agentic_text() but voice
messages go through agentic_voice() -> _handle_agentic_media_message()
which had its own separate response-sending path without TTS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- synthesize_speech() now uses httpx directly to call /v1/audio/speech
- Uses voice_id (UUID) instead of voice name (no preset names exist)
- Decodes base64 audio_data from response
- Correct model: voxtral-mini-tts-2603 (not voxtral-4b-tts-2603)
- Default voice: Paul Neutral (c69964a6-ab8b-4f8a-9465-ec0925096ec8)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
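
As a rough sketch of the request and response handling this commit describes: the model name, the default voice UUID, the `voice_id` field, and the base64 `audio_data` field come from the commit message. The `input` and `response_format` field names, the `"opus"` value, and both helper names are assumptions and should be checked against the API documentation. The HTTP call itself (made with httpx in the PR) is left out so the example runs offline.

```python
import base64

SPEECH_PATH = "/v1/audio/speech"  # endpoint path named in the commit
DEFAULT_MODEL = "voxtral-mini-tts-2603"
DEFAULT_VOICE_ID = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # "Paul Neutral"


def build_speech_request(text, model=DEFAULT_MODEL, voice_id=DEFAULT_VOICE_ID,
                         response_format="opus"):
    """Build the JSON body for a POST to SPEECH_PATH.

    "input" and "response_format" are assumed field names (hypothetical).
    """
    return {
        "model": model,
        "input": text,
        "voice_id": voice_id,  # a UUID, not a preset voice name
        "response_format": response_format,
    }


def decode_speech_response(body: dict) -> bytes:
    """Decode the base64-encoded audio_data field into raw audio bytes."""
    return base64.b64decode(body["audio_data"])
```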
Audio-only for short responses; long responses still send full text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When ENABLE_VOICE_RESPONSES is true, the bot appends instructions to
Claude's system prompt so it knows about TTS capabilities and stops
telling users it cannot send voice messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@RichardAtCT
Owner

Thanks for this comprehensive TTS implementation! A few things needed before we can merge:

  1. Size concern: At 2K+ lines, this is a large PR. Consider whether any parts can be split out (e.g., the database migration as a separate PR).
  2. Test coverage: Please add automated tests for the core TTS logic (Mistral API client, summarization for long responses, fallback behavior).
  3. Coordination with #158: PR #158 (Add make run-watch for auto-restart during development), which we're merging, also modifies voice-related code (whisper.cpp support). You'll likely need to rebase after #158 lands.
  4. orchestrator.py conflicts: Several other PRs touching orchestrator.py are being merged — please rebase once the current batch completes.

The feature design is solid — looking forward to getting this in after the above items are addressed.

Ubuntu and others added 8 commits April 4, 2026 12:32
Handles the check_match callback from escalation messages by running
claude -p with web search to evaluate current match state, then editing
the original message in-place with a verdict (winning/losing/won/lost).
Button remains available for repeated checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Handles the new [🔎 Investigate] button on trade fill notifications.
Sends a placeholder reply, runs claude -p with DB queries/log analysis/
web search, then edits the placeholder with structured investigation results.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The check_match_button markup only included the Check Match button,
so clicking it would replace both buttons with just the one.
Now both buttons are preserved after the message edit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of spawning Claude with WebSearch/WebFetch tools to find
match scores (slow, expensive, leaks source URLs), now:
1. Look up sofa_id from poly_dashboard DB by player names
2. Fetch live/final score directly from SofaScore API (~1s)
3. Pass structured score data to Claude with no tools for assessment

Result: faster response, cleaner 3-line output (status/score/reason),
no web search sources in the verdict.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rewrite prompt to demand exactly 3 lines with no reasoning/analysis
- Parse stdout to extract only STATUS/Score/Reason lines, strip preamble
- Add --max-turns 1 to prevent tool loop overhead

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
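
One way the stdout filtering described above could work, assuming each verdict line has the form `Label: value`. The exact line format and the helper name `extract_verdict` are assumptions; the commit only names the three labels.

```python
import re

# Matches "STATUS: ...", "Score: ..." or "Reason: ..." on a single line.
_VERDICT_LINE = re.compile(r"^(STATUS|Score|Reason):[ ]*(.+)$")
_ORDER = ("STATUS", "Score", "Reason")


def extract_verdict(stdout: str) -> list:
    """Keep only the first STATUS/Score/Reason lines, dropping any preamble."""
    found = {}
    for line in stdout.splitlines():
        match = _VERDICT_LINE.match(line.strip())
        if match and match.group(1) not in found:
            found[match.group(1)] = match.group(2).strip()
    return [f"{key}: {found[key]}" for key in _ORDER if key in found]
```

Filtering the output in code means stray reasoning text never reaches the Telegram message, even when the prompt alone does not fully suppress it.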
Use literal space [ ] instead of \s in capture groups to prevent
matching across line boundaries (e.g. "Trade Filled\nPanna Udvardy"
instead of just "Panna Udvardy").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
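
The difference between `\s` and a literal space is easy to reproduce. The patterns and the sample message below are illustrative only (the PR's actual regex is not shown in this thread, and "Player Two" is a placeholder name). The point is that `\s` also matches `\n`, so a name capture can swallow the preceding line.

```python
import re

message = "Trade Filled\nPanna Udvardy vs Player Two"

# \s matches newlines, so the first capture spills across the line break.
loose = re.compile(r"(\w+(?:\s\w+)*) vs (\w+(?:\s\w+)*)")

# [ ] matches only a literal space, so captures stay on one line.
strict = re.compile(r"(\w+(?:[ ]\w+)*) vs (\w+(?:[ ]\w+)*)")

loose_first = loose.search(message).group(1)
strict_first = strict.search(message).group(1)
```

Here `loose_first` is `"Trade Filled\nPanna Udvardy"`, while `strict_first` is `"Panna Udvardy"`.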
SofaScore API returns 403 from the prod server IP. Instead of calling
the API directly (which would need proxy routing), read the latest
match snapshot from the poly_dashboard DB — the collector already
stores live scores there via proxied SofaScore polling.

Also adds current set games to the score summary for Claude.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of showing "unexpected error" when users click buttons from
before a bot restart, catch the Telegram "query too old" error and
continue processing the action normally. Also prevent these benign
errors from being logged as security violations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
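
A hedged sketch of the stale-callback handling described above. `BadRequest` is a local stand-in for the Telegram library's exception, the matched message fragments are assumptions about the error text, and the helpers are synchronous for brevity where the real handlers would be async.

```python
class BadRequest(Exception):
    """Stand-in for the Telegram client's bad-request error (assumption)."""


def is_stale_callback_error(exc: Exception) -> bool:
    # Assumed fragments of the "query too old" error text.
    message = str(exc).lower()
    return "query is too old" in message or "query id is invalid" in message


def answer_then_process(answer, process):
    """Acknowledge a button press, tolerating stale queries, then act on it."""
    try:
        answer()
    except BadRequest as exc:
        if not is_stale_callback_error(exc):
            raise  # unrelated failures still propagate
        # Stale query (e.g. button from before a restart): benign, continue.
    return process()
```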

2 participants