feat: add TTS voice response messages via Mistral Voxtral API #167
Open
vkavun wants to merge 24 commits into RichardAtCT:main
TTS capability using Mistral Voxtral API to send Claude responses as Telegram voice messages, with user toggle and graceful fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10-task TDD plan covering config, feature flag, storage migration, TTS synthesis, /voice command, and orchestrator wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add five new Pydantic Settings fields for text-to-speech voice responses: enable_voice_responses, voice_response_model, voice_response_voice, voice_response_format, and voice_response_max_length. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add voice_responses_enabled property to FeatureFlags that gates TTS on both enable_voice_responses setting and mistral_api_key being set. Register it in is_feature_enabled() and get_enabled_features(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
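The gating described in this commit can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `Settings` dataclass is a stand-in for the project's Pydantic settings class, and only the two field names mentioned in the commit messages are taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Hypothetical stand-in for the project's Pydantic settings class."""

    enable_voice_responses: bool = False
    mistral_api_key: Optional[str] = None


class FeatureFlags:
    """Sketch of a flag holder; the real class has more flags."""

    def __init__(self, settings: Settings) -> None:
        self.settings = settings

    @property
    def voice_responses_enabled(self) -> bool:
        # TTS is on only when the admin switch is set AND an API key exists.
        return bool(
            self.settings.enable_voice_responses
            and self.settings.mistral_api_key
        )
```

With this shape, enabling the setting without a key leaves the feature off, so the bot never attempts a TTS call it cannot authenticate.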
Adds migration 5 to extend the users table with a voice_responses_enabled boolean column, updates UserModel with the new field, and adds get/set repository methods to UserRepository with full test coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
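A rough sketch of what such a migration and its get/set helpers might look like with the standard `sqlite3` module. The function names, the `user_id` key column, and the idempotency check are assumptions for illustration; the PR's real migration and `UserRepository` methods may differ.

```python
import sqlite3


def migrate_add_voice_column(conn: sqlite3.Connection) -> None:
    """Add the per-user toggle column if it is not already present."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if "voice_responses_enabled" not in columns:
        conn.execute(
            "ALTER TABLE users "
            "ADD COLUMN voice_responses_enabled BOOLEAN NOT NULL DEFAULT 0"
        )
    conn.commit()


def set_voice_responses(conn: sqlite3.Connection, user_id: int, enabled: bool) -> None:
    """Persist the user's /voice on|off preference."""
    conn.execute(
        "UPDATE users SET voice_responses_enabled = ? WHERE user_id = ?",
        (int(enabled), user_id),
    )
    conn.commit()


def get_voice_responses(conn: sqlite3.Connection, user_id: int) -> bool:
    """Return the stored preference; unknown users default to off."""
    row = conn.execute(
        "SELECT voice_responses_enabled FROM users WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return bool(row[0]) if row else False
```

Defaulting the column to 0 keeps existing users on text-only replies until they opt in.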
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements TDD-verified method that checks feature flag, user toggle, synthesizes speech via voice_handler, and falls back to text on failure. Short responses get a label; long responses get summarized via Claude + full text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Insert _maybe_send_voice_response() call between image-caption logic and text-sending loop; skip text messages when voice is successfully sent. Initialize response_content=None before try block to prevent UnboundLocalError on error paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
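The `UnboundLocalError` mentioned here is easy to reproduce in isolation. The two functions below are hypothetical and far simpler than the orchestrator code; they only show why assigning `response_content = None` before the `try` block matters.

```python
from typing import Callable, Optional


def reply_without_init(run: Callable[[], str]) -> Optional[str]:
    """Buggy shape: the name is bound only if run() succeeds."""
    try:
        response_content = run()
    except RuntimeError:
        pass  # error path leaves response_content unbound
    return response_content  # raises UnboundLocalError after a failure


def reply_with_init(run: Callable[[], str]) -> Optional[str]:
    """Fixed shape: the name is always bound before the try block."""
    response_content: Optional[str] = None
    try:
        response_content = run()
    except RuntimeError:
        pass  # error path keeps the None default
    return response_content
```

Later code can then test `response_content is None` to decide whether there is anything to synthesize or send.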
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test for long-response summarization path (Task 8) and update TTS failure test to assert fallback note; send "(Audio unavailable, sent as text)" message in the except block when TTS fails (Task 9). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also remove unused imports in test_voice_command.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The TTS voice response was only wired into agentic_text() but voice messages go through agentic_voice() -> _handle_agentic_media_message() which had its own separate response-sending path without TTS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- synthesize_speech() now uses httpx directly to call /v1/audio/speech
- Uses voice_id (UUID) instead of voice name (no preset names exist)
- Decodes base64 audio_data from response
- Correct model: voxtral-mini-tts-2603 (not voxtral-4b-tts-2603)
- Default voice: Paul Neutral (c69964a6-ab8b-4f8a-9465-ec0925096ec8)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
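The request and response handling described in this commit might be sketched as below, without any network call. Only the model name, the use of `voice_id`, and the base64 `audio_data` response field come from the commit message; the other payload keys (`input`, `response_format`) and the `"mp3"` default are assumptions, so check the Mistral API reference before relying on them.

```python
import base64
from typing import Any, Dict

DEFAULT_MODEL = "voxtral-mini-tts-2603"  # model name from the commit message
DEFAULT_VOICE_ID = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # "Paul Neutral"


def build_tts_payload(
    text: str,
    voice_id: str = DEFAULT_VOICE_ID,
    model: str = DEFAULT_MODEL,
    audio_format: str = "mp3",  # assumed default, not stated in the PR
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/audio/speech (key names assumed)."""
    return {
        "model": model,
        "input": text,
        "voice_id": voice_id,
        "response_format": audio_format,
    }


def decode_audio(response_json: Dict[str, Any]) -> bytes:
    """Decode the base64 audio_data field into raw audio bytes."""
    audio_b64 = response_json.get("audio_data")
    if not audio_b64:
        raise ValueError("response has no audio_data")
    return base64.b64decode(audio_b64)
```

Raising on a missing `audio_data` field lets the caller fall back to a text reply instead of sending an empty voice message.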
Audio-only for short responses; long responses still send full text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When ENABLE_VOICE_RESPONSES is true, the bot appends instructions to Claude's system prompt so it knows about TTS capabilities and stops telling users it cannot send voice messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner
Thanks for this comprehensive TTS implementation! A few things needed before we can merge:
The feature design is solid, and I'm looking forward to getting this in after the above items are addressed.
Handles the check_match callback from escalation messages by running claude -p with web search to evaluate current match state, then editing the original message in-place with a verdict (winning/losing/won/lost). Button remains available for repeated checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Handles the new [🔎 Investigate] button on trade fill notifications. Sends a placeholder reply, runs claude -p with DB queries, log analysis, and web search, then edits the placeholder with structured investigation results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The check_match_button markup only included the Check Match button, so clicking it would replace both buttons with just the one. Now both buttons are preserved after the message edit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of spawning Claude with WebSearch/WebFetch tools to find match scores (slow, expensive, leaks source URLs), now:
1. Look up sofa_id from poly_dashboard DB by player names
2. Fetch live/final score directly from SofaScore API (~1s)
3. Pass structured score data to Claude with no tools for assessment
Result: faster response, cleaner 3-line output (status/score/reason), no web search sources in the verdict.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rewrite prompt to demand exactly 3 lines with no reasoning/analysis
- Parse stdout to extract only STATUS/Score/Reason lines, strip preamble
- Add --max-turns 1 to prevent tool loop overhead
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
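The stdout filtering described here could look roughly like this. The `STATUS`/`Score`/`Reason` labels come from the commit message; the `Label: value` line format, the case-insensitive match, and the function name are assumptions made for the sketch.

```python
from typing import Dict

VERDICT_KEYS = ("STATUS", "Score", "Reason")


def extract_verdict(stdout: str) -> str:
    """Keep only the first STATUS/Score/Reason line each; drop everything else."""
    found: Dict[str, str] = {}
    for line in stdout.splitlines():
        stripped = line.strip()
        for key in VERDICT_KEYS:
            # Assumed format: "<Label>: <value>", matched case-insensitively.
            if key not in found and stripped.lower().startswith(key.lower() + ":"):
                found[key] = stripped
    # Emit in a fixed order regardless of the order the model printed them.
    return "\n".join(found[key] for key in VERDICT_KEYS if key in found)
```

Any preamble or trailing commentary from the model is discarded, so the edited Telegram message carries at most three lines.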
Use literal space [ ] instead of \s in capture groups to prevent matching across line boundaries (e.g. "Trade Filled\nPanna Udvardy" instead of just "Panna Udvardy"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
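The effect of this change can be reproduced with a small example. The pattern and the message text below are hypothetical (the PR's real regex is not shown); they only illustrate how `\s` inside a character class lets a capture group run across a newline, while a literal space does not.

```python
import re
from typing import Optional

# Hypothetical notification text: a header line followed by the player line.
MESSAGE = "Trade Filled\nPanna Udvardy to win"

# \s also matches "\n", so the group can start on the header line.
LOOSE = re.compile(r"([A-Z][\w\s]+?) to win")

# A literal space cannot cross the line break, so the match starts later.
STRICT = re.compile(r"([A-Z][\w ]+?) to win")


def extract_player(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the captured player name, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(repr(extract_player(LOOSE, MESSAGE)))   # 'Trade Filled\nPanna Udvardy'
print(repr(extract_player(STRICT, MESSAGE)))  # 'Panna Udvardy'
```

Because `re.search` returns the leftmost match, the loose pattern succeeds from the first capital letter in the header, which is the bug the commit describes.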
SofaScore API returns 403 from the prod server IP. Instead of calling the API directly (which would need proxy routing), read the latest match snapshot from the poly_dashboard DB — the collector already stores live scores there via proxied SofaScore polling. Also adds current set games to the score summary for Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of showing "unexpected error" when users click buttons from before a bot restart, catch the Telegram "query too old" error and continue processing the action normally. Also prevent these benign errors from being logged as security violations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
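One way to express this handling, sketched with a stand-in exception so the example runs without the Telegram library. `BadRequest` here is a local placeholder, and both the `"too old"` substring check and the function name are assumptions about how the PR detects the error.

```python
import asyncio
from typing import Awaitable, Callable


class BadRequest(Exception):
    """Local stand-in for the Telegram library's bad-request error."""


async def answer_callback_safely(answer: Callable[[], Awaitable[None]]) -> bool:
    """Acknowledge a button press; tolerate stale queries from before a restart.

    Returns True if the query was answered, False if it was too old.
    Any other BadRequest is re-raised for normal error handling.
    """
    try:
        await answer()
        return True
    except BadRequest as exc:
        if "too old" in str(exc).lower():
            # Benign: the button predates the restart. Skip the acknowledgement
            # and let the caller continue processing the action.
            return False
        raise
```

The caller would run the button's action regardless of the return value, and would avoid recording the stale-query case as a security violation.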
Summary
- /voice on|off toggle persisted in SQLite, gated behind admin-level ENABLE_VOICE_RESPONSES env var

Changes
- Five new settings (ENABLE_VOICE_RESPONSES, VOICE_RESPONSE_MODEL, VOICE_RESPONSE_VOICE, VOICE_RESPONSE_FORMAT, VOICE_RESPONSE_MAX_LENGTH)
- voice_responses_enabled in FeatureFlags
- voice_responses_enabled column to users table + repository get/set methods
- synthesize_speech() method calling client.audio.speech.complete_async()
- /voice command handler + _maybe_send_voice_response() wired into agentic_text() flow

Test plan
- ENABLE_VOICE_RESPONSES=true in production env
- /voice on persists preference and /voice off clears it

🤖 Generated with Claude Code