Commits (22)
- `9ac86ea` - LLM Chain: Add foundation for chain execution with database schema (vprashrex, Feb 20, 2026)
- `6451bb0` - LLM Chain: Add documentation and update endpoint description for chai… (vprashrex, Feb 21, 2026)
- `c9f94e2` - LLM Chain: Move guardrails into execute_llm_call for per-block suppor… (vprashrex, Feb 21, 2026)
- `fb18356` - Merge branch 'main' into feature/llm-chain-setup (vprashrex, Feb 27, 2026)
- `baaac95` - prettify format (vprashrex, Mar 1, 2026)
- `5177bfb` - refactor: update STTLLMParams to allow optional instructions and impr… (vprashrex, Mar 1, 2026)
- `2fb81b1` - feat: add metadata to BlockResult and update job execution to use res… (vprashrex, Mar 1, 2026)
- `113488a` - feat: add tests for LLM chain execution and job handling (vprashrex, Mar 2, 2026)
- `a62c433` - Merge branch 'main' into feature/llm-chain-setup (vprashrex, Mar 2, 2026)
- `6421465` - fix: correct variable name from job_id to job_uuid in execute_job fun… (vprashrex, Mar 2, 2026)
- `50acc8c` - Merge branch 'main' into feature/llm-chain-setup (vprashrex, Mar 5, 2026)
- `19d6f58` - refactor: streamline LLM chain execution and enhance callback handling (vprashrex, Mar 5, 2026)
- `e04c374` - Merge branch 'main' into feature/llm-chain-setup (vprashrex, Mar 5, 2026)
- `9cc5cf8` - docs: enhance llm_chain.md with detailed input specifications and gua… (vprashrex, Mar 6, 2026)
- `f7797d1` - refactor: remove unused timestamps from LlmChain model and update rel… (vprashrex, Mar 6, 2026)
- `4624f55` - Merge branch 'main' into feature/llm-chain-setup (vprashrex, Mar 6, 2026)
- `5b9a4e9` - feat: basic speech-to-speech impl on top of llm_chain (Prajna1999, Mar 5, 2026)
- `c1807df` - feat: add s2s blocks (Prajna1999, Mar 5, 2026)
- `e7de797` - Merge branch 'main' into feature/speech-to-speech (Prajna1999, Mar 6, 2026)
- `9eeb999` - Merge branch 'main' into feature/speech-to-speech (Prajna1999, Mar 9, 2026)
- `56920b1` - feat: detected lang in the webhook rsponse, context passing across li… (Prajna1999, Mar 9, 2026)
- `96ea78e` - chore: docs (Prajna1999, Mar 9, 2026)
228 changes: 228 additions & 0 deletions backend/app/api/docs/llm/speech_to_speech.md
@@ -0,0 +1,228 @@
# Speech-to-Speech (STS) with RAG

Execute a complete speech-to-speech workflow with knowledge base retrieval.

## Endpoint

```
POST /llm/sts
```

## Flow

```
Voice Input → STT (auto language) → RAG (Knowledge Base) → TTS → Voice Output
```
Comment on lines +7 to +15

⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks.

The fenced code blocks at lines 7-9 and 13-15 are missing language specifiers per MD040. Use `http` for the endpoint block and `text` for the flow diagram.

Proposed fix:

     ## Endpoint

    -```
    +```http
     POST /llm/sts

     ## Flow

    -```
    +```text
     Voice Input → STT (auto language) → RAG (Knowledge Base) → TTS → Voice Output


## Input

- **Voice note**: WhatsApp-compatible audio format (required)
- **Knowledge base IDs**: One or more knowledge bases for RAG (required)
- **Languages**: Input and output languages (optional, defaults to Hindi)
- **Models**: STT, LLM, and TTS model selection (optional, defaults to Sarvam)

## Output

You will receive **3 callbacks** to your webhook URL:

1. **STT Callback** (Intermediate): Transcribed text from audio
2. **LLM Callback** (Intermediate): RAG-enhanced response text
3. **TTS Callback** (Final): Audio output + response text

Each callback includes:
- Output from that step
- Token usage
- Latency information (check timestamps)
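A webhook receiver can tell the three callbacks apart from the `block_index` and output-type fields shown in the example payloads further down. A minimal sketch (the function name and field access are illustrative, derived only from the example callbacks in this document):

```python
def classify_callback(payload: dict) -> str:
    """Label an STS callback as "stt", "llm", or "tts" (final).

    Based on the example payloads: intermediate callbacks carry
    data.block_index / data.total_blocks, and the final TTS callback
    is the one whose output type is "audio".
    """
    data = payload.get("data", {})
    output = data.get("response", {}).get("output", {})
    if output.get("type") == "audio":
        return "tts"  # final callback: audio output plus response text
    if data.get("block_index") == 1:
        return "stt"  # transcription of the input voice note
    return "llm"  # RAG-enhanced response text
```

Dispatching on payload shape rather than arrival order keeps the receiver robust if callbacks are retried or delivered out of order.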

## Supported Languages

### Primary Indian Languages
- English, Hindi, Hinglish (code-switching)
- Bengali, Kannada, Malayalam, Marathi
- Odia, Punjabi, Tamil, Telugu, Gujarati

### Additional Languages (Sarvam Saaras V3)
- Assamese, Urdu, Nepali
- Konkani, Kashmiri, Sindhi
- Sanskrit, Santali, Manipuri
- Bodo, Maithili, Dogri

**Total: 25 languages** with automatic language detection

## Available Models

### STT (Speech-to-Text)
- `saaras:v3` - Sarvam Saaras V3 (**default**, fast, auto language detection, optimized for Indian languages)
- `gemini-2.5-pro` - Google Gemini 2.5 Pro

**Note:** Sarvam STT uses automatic language detection. No need to specify input language.

### LLM (RAG)
Collaborator: we support more than just these two models for RAG

- `gpt-4o` - OpenAI GPT-4o (**default**, best quality)
- `gpt-4o-mini` - OpenAI GPT-4o Mini (faster, lower cost)

### TTS (Text-to-Speech)
- `bulbul:v3` - Sarvam Bulbul V3 (**default**, natural Indian voices, MP3 output)
- `gemini-2.5-pro-preview-tts` - Google Gemini 2.5 Pro (OGG OPUS output)

## Edge Cases & Error Handling

### Empty STT Output
If speech-to-text returns empty/blank:
- Chain fails immediately
- Error message: "STT returned no transcription"
- No subsequent blocks are executed

### Audio Size Limit
WhatsApp limit: 16MB
- TTS providers may fail if output exceeds limit
- Error is caught and reported in callback
- Consider using shorter responses or compression
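Because the payload carries base64-encoded audio, the decoded size can be checked against the 16 MB cap before forwarding to WhatsApp. A small sketch (the helper name is illustrative):

```python
import base64

WHATSAPP_AUDIO_LIMIT_BYTES = 16 * 1024 * 1024  # WhatsApp's 16 MB cap

def fits_whatsapp_limit(b64_audio: str) -> bool:
    """Return True if the decoded audio is at or under the 16 MB cap.

    Base64 inflates size by roughly 33%, so always compare the decoded
    length, not the length of the encoded string.
    """
    return len(base64.b64decode(b64_audio)) <= WHATSAPP_AUDIO_LIMIT_BYTES
```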

### Invalid Audio Format
If input audio format is unsupported:
- STT provider fails with format error
- Error reported in callback
- Supported: MP3, WAV, OGG, OPUS, M4A

### Provider Failures
Each block has independent error handling:
- STT fails → Chain stops, STT error reported
- LLM fails → Chain stops, RAG error reported
- TTS fails → Chain stops, TTS error reported

## Example Request

```bash
curl -X POST https://api.kaapi.ai/llm/sts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "query": {
    "type": "audio",
    "content": {
      "format": "base64",
      "value": "base64_encoded_audio_data",
      "mime_type": "audio/ogg"
    }
  },
  "knowledge_base_ids": ["kb_abc123"],
  "input_language": "hindi",
  "output_language": "english",
  "callback_url": "https://your-app.com/webhook"
}
EOF
```

**Note:** `stt_model`, `llm_model`, and `tts_model` are optional and will use defaults if not specified.
Collaborator: add "specifying" before "stt_model", etc.


## Example Callbacks

### Callback 1: STT Output (Intermediate)
```json
{
  "success": true,
  "data": {
    "block_index": 1,
    "total_blocks": 3,
    "response": {
      "provider_response_id": "stt_xyz789",
      "provider": "sarvamai-native",
      "model": "saarika:v1",
      "output": {
        "type": "text",
        "content": {
          "value": "नमस्ते, मुझे अपने अकाउंट के बारे में जानकारी चाहिए"
        }
      }
    },
    "usage": {
      "input_tokens": 0,
      "output_tokens": 12,
      "total_tokens": 12
    }
  },
  "metadata": {
    "speech_to_speech": true,
    "input_language": "hi-IN"
  }
}
```
Comment on lines +122 to +151

⚠️ Potential issue | 🟡 Minor

STT callback example shows incorrect model name.

Line 132 shows "model": "saarika:v1" but the documented default STT model is "saaras:v3" (line 55). Consider updating the example to use the correct model name for consistency.

Proposed fix:

         "response": {
           "provider_response_id": "stt_xyz789",
           "provider": "sarvamai-native",
    -      "model": "saarika:v1",
    +      "model": "saaras:v3",
           "output": {


### Callback 2: LLM Output (Intermediate)
```json
{
  "success": true,
  "data": {
    "block_index": 2,
    "total_blocks": 3,
    "response": {
      "provider_response_id": "chatcmpl_abc123",
      "provider": "openai",
      "model": "gpt-4o",
      "output": {
        "type": "text",
        "content": {
          "value": "आपके अकाउंट में कुल बैलेंस ₹5,000 है। पिछले महीने में 3 ट्रांजैक्शन हुए हैं।"
        }
      }
    },
    "usage": {
      "input_tokens": 150,
      "output_tokens": 45,
      "total_tokens": 195
    }
  },
  "metadata": {
    "speech_to_speech": true
  }
}
```

### Callback 3: TTS Output (Final)
```json
{
  "success": true,
  "data": {
    "response": {
      "provider_response_id": "tts_def456",
      "provider": "sarvamai-native",
      "model": "bulbul:v1",
      "output": {
        "type": "audio",
        "content": {
          "format": "base64",
          "value": "base64_encoded_audio_output",
          "mime_type": "audio/ogg"
        }
      }
    },
    "usage": {
      "input_tokens": 15,
      "output_tokens": 0,
      "total_tokens": 15
    }
  },
  "metadata": {
    "speech_to_speech": true,
    "output_language": "hi-IN"
  }
}
```
Comment on lines +183 to +212

⚠️ Potential issue | 🟡 Minor

TTS callback example shows incorrect model version.

Line 191 shows "model": "bulbul:v1" but the documented default TTS model is "bulbul:v3" (line 65). Update the example for consistency.

Proposed fix:

         "response": {
           "provider_response_id": "tts_def456",
           "provider": "sarvamai-native",
    -      "model": "bulbul:v1",
    +      "model": "bulbul:v3",
           "output": {


## Latency Tracking

Calculate latency from callback timestamps:
- **STT latency**: Time from request to first callback
- **LLM latency**: Time between first and second callback
- **TTS latency**: Time between second and third callback
- **Total latency**: Time from request to final callback
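Given the timestamps your receiver records for the request and for each callback, the per-stage latencies above fall out as simple differences. A sketch (the helper name and ISO-8601 timestamp format are illustrative):

```python
from datetime import datetime

def sts_latencies(request_at: str, stt_at: str, llm_at: str, tts_at: str) -> dict:
    """Per-stage latencies in seconds, from ISO-8601 timestamps."""
    t0, t1, t2, t3 = (
        datetime.fromisoformat(t) for t in (request_at, stt_at, llm_at, tts_at)
    )
    return {
        "stt": (t1 - t0).total_seconds(),    # request -> first callback
        "llm": (t2 - t1).total_seconds(),    # first -> second callback
        "tts": (t3 - t2).total_seconds(),    # second -> third callback
        "total": (t3 - t0).total_seconds(),  # request -> final callback
    }
```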

## Best Practices

1. **Language Consistency**: If not translating, keep input_language = output_language
2. **Model Selection**: Use Sarvam models for Indian languages (faster, better quality)
3. **Knowledge Base**: Ensure KB is properly indexed and relevant to expected queries
4. **Error Handling**: Implement retry logic for transient provider failures
5. **Webhook Security**: Validate webhook signatures and use HTTPS
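For practice 5, a common pattern is an HMAC-SHA256 check over the raw request body. This document does not specify the platform's signing scheme or header name, so the sketch below is a generic pattern to adapt, not the actual mechanism:

```python
import hashlib
import hmac

def verify_webhook_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison of an HMAC-SHA256 hex digest.

    `secret` is the value shared with the webhook sender; `body` must be
    the raw (unparsed) request bytes. compare_digest avoids timing leaks.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```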
3 changes: 3 additions & 0 deletions backend/app/api/main.py
@@ -7,10 +7,12 @@
    config,
Collaborator: Imports should be sorted alphabetically

    doc_transformation_job,
    documents,
    llm_sts,
    login,
    languages,
    llm,
    llm_chain,
    llm_sts,
Comment on lines +10 to +15

coderabbitai bot, Mar 9, 2026

⚠️ Potential issue | 🟠 Major

Remove the duplicate llm_sts import.

llm_sts is imported twice in the same tuple here, which trips Ruff F811 and can block the lint job.

Ruff (0.15.4): [error] F811 Redefinition of unused llm_sts from line 10.

Collaborator: @Prajna1999 check this duplicate

coderabbitai bot: Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

    organization,
    openai_conversation,
    project,
@@ -43,6 +45,7 @@
api_router.include_router(languages.router)
api_router.include_router(llm.router)
api_router.include_router(llm_chain.router)
api_router.include_router(llm_sts.router)
api_router.include_router(login.router)
api_router.include_router(onboarding.router)
api_router.include_router(openai_conversation.router)
144 changes: 144 additions & 0 deletions backend/app/api/routes/llm_sts.py
@@ -0,0 +1,144 @@
"""Speech-to-Speech (STS) API endpoint with RAG."""

import logging

from fastapi import APIRouter, Depends, HTTPException
Collaborator: HTTPException is not needed


from app.api.deps import AuthContextDep, SessionDep
from app.api.permissions import Permission, require_permission
from app.models import Message
from app.models.llm.request import (
    LLMChainRequest,
    QueryParams,
    SpeechToSpeechRequest,
)
from app.services.llm.chain.utils import (
    SUPPORTED_LANGUAGE_CODES,
    build_rag_block,
    build_stt_block,
    build_tts_block,
)
from app.services.llm.jobs import start_chain_job
from app.utils import APIResponse, load_description, validate_callback_url

logger = logging.getLogger(__name__)

router = APIRouter(tags=["LLM"])


@router.post(
    "/llm/sts",
Collaborator (author): convert into llm/chain/sts

    description=load_description("llm/speech_to_speech.md"),
    response_model=APIResponse[Message],
    dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
)
def speech_to_speech(
Collaborator: type annotation missing

    _current_user: AuthContextDep,
    _session: SessionDep,
Collaborator: should _session start with an underscore if we later pass it to a different function on line 129?

    request: SpeechToSpeechRequest,
):
Comment on lines +35 to +39

🛠️ Refactor suggestion | 🟠 Major

Add return type hint to function signature.

The function is missing a return type hint. As per coding guidelines, all function parameters and return values should have type hints.

Proposed fix:

     def speech_to_speech(
         _current_user: AuthContextDep,
         _session: SessionDep,
         request: SpeechToSpeechRequest,
    -):
    +) -> APIResponse[Message]:

    """
    Speech-to-speech (STS) endpoint with RAG.

    Executes a 3-block chain:
    1. STT (Speech-to-Text) - Transcribes audio to text (auto-detects language for Sarvam)
    2. RAG (Retrieval-Augmented Generation) - Processes text with knowledge base
    3. TTS (Text-to-Speech) - Converts response back to audio

    Input: Voice note (WhatsApp compatible)
    Output 1: Voice note
    Output 2: text (via intermediate callback)
    """
    project_id = _current_user.project_.id
    organization_id = _current_user.organization_.id

    # Validate callback URL
    if request.callback_url:
        validate_callback_url(str(request.callback_url))

    # Validate BCP-47 language codes
    if (
        request.input_language
        and request.input_language not in SUPPORTED_LANGUAGE_CODES
    ):
        return APIResponse.failure_response(
            error=f"Unsupported input language code: {request.input_language}. Supported: {', '.join(sorted(SUPPORTED_LANGUAGE_CODES))}",
Collaborator: do we need it sorted? Why not keep the list sorted already?

            metadata={"status_code": 400},
        )

    if (
        request.output_language
        and request.output_language not in SUPPORTED_LANGUAGE_CODES
    ):
        return APIResponse.failure_response(
            error=f"Unsupported output language code: {request.output_language}. Supported: {', '.join(sorted(SUPPORTED_LANGUAGE_CODES))}",
            metadata={"status_code": 400},
        )

    # Determine language codes (already BCP-47, no conversion needed)
    input_lang_code = request.input_language or "auto"

    # If output_language not set, default to input_language
    # If input is "auto", use "{{detected}}" marker to signal TTS to use detected language
    if request.output_language:
        output_lang_code = request.output_language
    elif input_lang_code == "auto":
        output_lang_code = "{{detected}}"  # Marker to use detected language from STT
    else:
        output_lang_code = input_lang_code
Comment on lines +70 to +89

⚠️ Potential issue | 🟠 Major

Reject output_language="auto" / "unknown" before building TTS.

This branch currently reuses SUPPORTED_LANGUAGE_CODES, which includes the STT-only sentinels "auto" and "unknown". If a client explicitly sends either value, build_tts_block() forwards it into the TTS config unchanged, and execute_llm_call() in backend/app/services/llm/jobs.py never rewrites it because only {{detected}} is substituted. That turns into an invalid provider request instead of using the detected language.

Suggested fix:

    -    if (
    -        request.output_language
    -        and request.output_language not in SUPPORTED_LANGUAGE_CODES
    -    ):
    +    if request.output_language in {"auto", "unknown"}:
    +        return APIResponse.failure_response(
    +            error=(
    +                "Unsupported output language code: output_language must be a "
    +                "concrete BCP-47 code. Omit it to use the detected language."
    +            ),
    +            metadata={"status_code": 400},
    +        )
    +
    +    if (
    +        request.output_language
    +        and request.output_language not in SUPPORTED_LANGUAGE_CODES
    +    ):
             return APIResponse.failure_response(
                 error=f"Unsupported output language code: {request.output_language}. Supported: {', '.join(sorted(SUPPORTED_LANGUAGE_CODES))}",
                 metadata={"status_code": 400},
             )


logger.info(
f"[speech_to_speech] Starting STS chain | "
f"project_id={project_id}, "
f"input_lang={input_lang_code}, "
f"output_lang={output_lang_code}, "
f"stt_model={request.stt_model.value}, "
f"llm_model={request.llm_model.value}, "
f"tts_model={request.tts_model.value}"
)

# Build 3-block chain: STT → RAG → TTS
blocks = [
build_stt_block(request.stt_model, input_lang_code),
build_rag_block(request.llm_model, request.knowledge_base_ids),
build_tts_block(request.tts_model, output_lang_code),
]

metadata = request.request_metadata or {}
metadata.update(
{
"speech_to_speech": True,
"input_language": input_lang_code,
"output_language": output_lang_code,
"stt_model": request.stt_model.value,
"llm_model": request.llm_model.value,
"tts_model": request.tts_model.value,
}
)

# Create chain request
chain_request = LLMChainRequest(
query=QueryParams(input=request.query),
blocks=blocks,
callback_url=request.callback_url,
request_metadata=metadata,
)

# Start async chain job
start_chain_job(
db=_session,
request=chain_request,
project_id=project_id,
organization_id=organization_id,
)

return APIResponse.success_response(
data=Message(
message=(
"Speech-to-speech processing initiated. "
"You will receive intermediate callbacks for STT and LLM outputs, "
"followed by the final callback with audio and text."
)
)
)