update inference models to match the latest #4597

Merged
davidzhao merged 3 commits into main from match-inference-models
Jan 27, 2026

Conversation

@davidzhao
Member

@davidzhao davidzhao commented Jan 23, 2026

Summary by CodeRabbit

  • New Features

    • Added Elevenlabs STT support (VAD, timestamps, silence handling); added Deepgram and new Cartesia/Inworld/Rime TTS variants; expanded STT models including Deepgram Flux and AssemblyAI multilingual.
    • Expanded LLM catalog with additional OpenAI GPT-4o/GPT-5 variants, Google Gemini-3 entries, and DeepSeek v3.2.
  • Removed

    • Dropped Qwen model support.
  • Tests

    • Updated TTS provider/model parsing expectation to use the newer Rime version.


@davidzhao davidzhao requested a review from a team January 23, 2026 07:32
@coderabbitai
Contributor

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Updated typed model literals and public option TypedDicts across inference modules: LLM model sets rebalanced (OpenAI/Google/DeepSeek/Qwen/LLM union), STT adds Elevenlabs and adjusts Deepgram/AssemblyAI entries, and TTS adds Deepgram plus Cartesia/Inworld/Rime model and option updates.

Changes

Cohort / File(s) Summary
LLM Model Updates
livekit-agents/livekit/agents/inference/llm.py
Reworked OpenAIModels (added gpt-4o variants and multiple gpt-5.* variants), added Google gemini-3-pro/gemini-3-flash, removed QwenModels, expanded DeepSeekModels to include deepseek-v3.2, and updated LLMModels union to reflect these changes.
STT Provider & Model Expansion
livekit-agents/livekit/agents/inference/stt.py
Added ElevenlabsModels literal and ElevenlabsOptions TypedDict; updated DeepgramModels (added flux-general, flux-general-en; removed nova-3-general/nova-2-general); extended AssemblyAIModels; included Elevenlabs in STTModels, constructor overloads, extra_kwargs typing, and fallback model aggregation.
TTS Provider & Model Expansion
livekit-agents/livekit/agents/inference/tts.py
Introduced DeepgramModels and DeepgramOptions; expanded CartesiaModels (added sonic-3), adjusted CartesiaOptions (replace duration with emotion and volume), removed rime/mist (keep rime/mistv2), expanded InworldModels with 1.5/1-max variants, and updated TTSModels union plus __init__ overloads to include Deepgram.
Tests
tests/test_inference_tts_fallback.py
Updated parameterized test to expect rime/mistv2:narrator instead of rime/mist:narrator to match updated Rime model literals.
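
The catalogs described above are plain Literal unions. A minimal sketch of the shape (only a few ids from the summary are shown; the authoritative lists live in the inference modules):

```python
from typing import Literal, Union, get_args

# A few ids taken from the PR summary above; the full catalogs live in
# livekit-agents/livekit/agents/inference/llm.py
OpenAIModels = Literal["openai/gpt-4o", "openai/gpt-5-nano"]
GoogleModels = Literal["google/gemini-3-pro", "google/gemini-3-flash"]
DeepSeekModels = Literal["deepseek-ai/deepseek-v3.2"]

# The public union simply aggregates the provider-scoped literals
LLMModels = Union[OpenAIModels, GoogleModels, DeepSeekModels]

# Literal members are available at runtime via get_args, which is how a
# fallback aggregation step can enumerate the whole catalog
all_models = [
    m
    for lit in (OpenAIModels, GoogleModels, DeepSeekModels)
    for m in get_args(lit)
]
```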

Sequence Diagram(s)

(Skipped — changes are type/enum/typing updates rather than new multi-component control flows.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • remove shortcut inference STT model name #4594: Modifies STT model declarations (Deepgram/AssemblyAI-related entries) in livekit-agents/livekit/agents/inference/stt.py and likely overlaps with the Deepgram/AssemblyAI adjustments here.

Suggested reviewers

  • tinalenguyen

Poem

🐰 A rabbit hops through code tonight,
New models sparkle in the moonlight.
Geminis, GPTs, Deepgram’s hum—
Elevenlabs joins our chorus drum.
Typing carrots, soft and bright.

🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title Check (❓ Inconclusive): the title is vague and generic, using the non-descriptive phrase "match the latest" without clarifying which models were updated or why. Resolution: revise to be more specific, e.g., "Add Deepgram and Elevenlabs support; update Gemini, DeepSeek, and Cartesia models".
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.



🧹 Recent nitpick comments
tests/test_inference_tts_fallback.py (1)

116-128: Consider updating remaining "rime/mist" references for consistency.

Line 53 was updated to use "rime/mistv2", but "rime/mist" still appears here and in test_list_with_extra_kwargs_preserved (lines 151, 157). While the parsing logic tests work with any string, updating these to "rime/mistv2" would maintain consistency with the model catalog changes.

Suggested diff
     def test_mixed_list_strings_and_dicts(self):
         """Mixed list of strings and FallbackModel dicts."""
         fallbacks = [
             "cartesia/sonic:voice1",
             FallbackModel(model="elevenlabs/eleven_flash_v2", voice="custom"),
-            "rime/mist",
+            "rime/mistv2",
         ]
         result = _normalize_fallback(fallbacks)
         assert result == [
             {"model": "cartesia/sonic", "voice": "voice1"},
             {"model": "elevenlabs/eleven_flash_v2", "voice": "custom"},
-            {"model": "rime/mist", "voice": ""},
+            {"model": "rime/mistv2", "voice": ""},
         ]
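
For context, the behavior these tests pin down can be sketched like this (a simplified stand-in for _normalize_fallback; the real helper lives in livekit-agents and may differ in detail):

```python
def normalize_fallback(fallbacks):
    """Normalize "provider/model[:voice]" strings and FallbackModel-style
    dicts into {"model": ..., "voice": ...} dicts (simplified sketch)."""
    result = []
    for entry in fallbacks:
        if isinstance(entry, str):
            # Split on the first ":"; voice defaults to "" when absent
            model, _, voice = entry.partition(":")
            result.append({"model": model, "voice": voice})
        else:
            result.append(dict(entry))
    return result
```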
📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1863c7f and 417b7a2.

📒 Files selected for processing (2)
  • livekit-agents/livekit/agents/inference/stt.py
  • tests/test_inference_tts_fallback.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • tests/test_inference_tts_fallback.py
  • livekit-agents/livekit/agents/inference/stt.py
🧠 Learnings (2)
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns

Applied to files:

  • livekit-agents/livekit/agents/inference/stt.py
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.

Applied to files:

  • livekit-agents/livekit/agents/inference/stt.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: livekit-plugins-deepgram
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.9)
🔇 Additional comments (6)
tests/test_inference_tts_fallback.py (1)

53-53: LGTM!

The test parameter update from "rime/mist:narrator" to "rime/mistv2:narrator" correctly reflects the TTS model catalog change.

livekit-agents/livekit/agents/inference/stt.py (5)

40-44: LGTM!

The AssemblyAI and ElevenLabs model identifiers are correctly defined and align with their respective provider APIs as verified in past review research.


74-82: LGTM!

The ElevenlabsOptions TypedDict follows the established pattern and provides appropriate configuration fields for the ElevenLabs Scribe v2 realtime API, including VAD and timing controls.


128-134: LGTM!

The STTModels union correctly includes ElevenlabsModels alongside existing providers.


209-225: LGTM!

The ElevenLabs __init__ overload follows the established pattern consistently with other provider-specific overloads, ensuring proper type inference for extra_kwargs.
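
The overload pattern the comment describes, one @overload per provider literal tying the model type to its options TypedDict, can be sketched as follows (class and field names here are illustrative, not the actual livekit-agents signatures):

```python
from typing import Literal, TypedDict, Union, overload

# Illustrative provider literals and option TypedDicts (not the real catalog)
ElevenlabsModels = Literal["elevenlabs/scribe_v2_realtime"]
DeepgramModels = Literal["deepgram/flux-general-en"]

class ElevenlabsOptions(TypedDict, total=False):
    include_timestamps: bool

class DeepgramOptions(TypedDict, total=False):
    language: str

class STT:
    # Each overload pairs a provider's model literal with its options type,
    # so mypy narrows extra_kwargs based on the model argument.
    @overload
    def __init__(
        self, model: ElevenlabsModels, *, extra_kwargs: ElevenlabsOptions = ...
    ) -> None: ...
    @overload
    def __init__(
        self, model: DeepgramModels, *, extra_kwargs: DeepgramOptions = ...
    ) -> None: ...
    def __init__(
        self,
        model: str,
        *,
        extra_kwargs: Union[ElevenlabsOptions, DeepgramOptions, None] = None,
    ) -> None:
        self.model = model
        self.extra_kwargs = dict(extra_kwargs or {})
```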


254-260: LGTM!

The extra_kwargs union type correctly includes ElevenlabsOptions to support the new provider.




@coderabbitai coderabbitai bot left a comment (Contributor)

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/inference/llm.py`:
- Around line 64-69: The DeepSeekModels literal includes an unsupported
identifier; update the DeepSeekModels union to remove or replace
"deepseek-ai/deepseek-v3.2" — change it to the supported
"deepseek-ai/deepseek-v3.1" (or remove that entry entirely) so LLMModels (which
unions OpenAIModels, GoogleModels, KimiModels, DeepSeekModels) only references
valid DeepSeek model IDs; edit the DeepSeekModels definition accordingly.

In `@livekit-agents/livekit/agents/inference/stt.py`:
- Around line 29-43: Update the Deepgram model identifier in the DeepgramModels
Literal to match Deepgram's API by replacing "deepgram/flux-general" with
"deepgram/flux-general-en"; specifically edit the DeepgramModels definition in
livekit/agents/inference/stt.py so it aligns with the plugin's V2Models and
official docs (refer to the DeepgramModels symbol to locate the change).

In `@livekit-agents/livekit/agents/inference/tts.py`:
- Around line 108-112: The CartesiaOptions TypedDict currently mismatches the
Cartesia plugin: change the speed field from Literal["slow","normal","fast"] to
a numeric type (float) because the plugin checks `isinstance(self._opts.speed,
float)` for sonic-3 models, and change emotion from `str` to a sequence type
(e.g., `Sequence[str]` or `List[str]`) because the plugin accesses
`opts.emotion[0]`; keep volume as `float`. Update the CartesiaOptions definition
accordingly so it aligns with the plugin's expectations.
🧹 Nitpick comments (1)
livekit-agents/livekit/agents/inference/tts.py (1)

114-115: DeepgramOptions should enumerate supported parameters instead of remaining empty.

Deepgram TTS API supports multiple query parameters including encoding, container, sample_rate, bit_rate, tag, callback, callback_method, and mip_opt_out. The empty TypedDict currently blocks type-safe access to these options.

♻️ Optional: widen DeepgramOptions until keys are enumerated
-class DeepgramOptions(TypedDict, total=False):
-    pass
+DeepgramOptions = dict[str, Any]

Also applies to: 173-190, 280-282

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe642d and 396be09.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • livekit-agents/livekit/agents/inference/llm.py
  • livekit-agents/livekit/agents/inference/stt.py
  • livekit-agents/livekit/agents/inference/tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-agents/livekit/agents/inference/stt.py
  • livekit-agents/livekit/agents/inference/tts.py
  • livekit-agents/livekit/agents/inference/llm.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns

Applied to files:

  • livekit-agents/livekit/agents/inference/stt.py
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.

Applied to files:

  • livekit-agents/livekit/agents/inference/stt.py
  • livekit-agents/livekit/agents/inference/tts.py
  • livekit-agents/livekit/agents/inference/llm.py
📚 Learning: 2026-01-18T01:09:01.847Z
Learnt from: davidzhao
Repo: livekit/agents PR: 4548
File: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/responses/llm.py:77-81
Timestamp: 2026-01-18T01:09:01.847Z
Learning: In the OpenAI responses LLM (`livekit-plugins-openai/livekit/plugins/openai/responses/llm.py`), reasoning effort defaults are intentionally set lower than OpenAI's API defaults for voice interactions: "none" for gpt-5.1/gpt-5.2 and "minimal" for other reasoning-capable models like gpt-5, to avoid enabling reasoning by default in voice contexts.

Applied to files:

  • livekit-agents/livekit/agents/inference/llm.py
🧬 Code graph analysis (2)
livekit-agents/livekit/agents/inference/stt.py (2)
livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/stt.py (1)
  • model (113-114)
livekit-agents/livekit/agents/stt/stt.py (1)
  • model (115-124)
livekit-agents/livekit/agents/inference/tts.py (1)
livekit-agents/livekit/agents/inference/stt.py (2)
  • DeepgramOptions (51-62)
  • model (343-344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: livekit-plugins-inworld
  • GitHub Check: livekit-plugins-openai
  • GitHub Check: livekit-plugins-deepgram
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.9)
  • GitHub Check: type-check (3.13)
🔇 Additional comments (4)
livekit-agents/livekit/agents/inference/stt.py (2)

73-80: Elevenlabs option keys/units verified as correct.
All field names and units in ElevenlabsOptions match the official Elevenlabs scribe_v2_realtime API documentation: commit_strategy (enum), include_timestamps (bool), vad_silence_threshold_secs (double in seconds), vad_threshold (double), min_speech_duration_ms (int in milliseconds), and min_silence_duration_ms (int in milliseconds). These will be correctly recognized by the Elevenlabs provider.
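
As a rough sketch of what the verified fields imply (types inferred from the units given above; commit_strategy is left as str because its exact enum members are not listed here):

```python
from typing import TypedDict

class ElevenlabsOptions(TypedDict, total=False):
    commit_strategy: str               # enum per the ElevenLabs docs; members not listed here
    include_timestamps: bool
    vad_silence_threshold_secs: float  # seconds
    vad_threshold: float
    min_speech_duration_ms: int        # milliseconds
    min_silence_duration_ms: int       # milliseconds
```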


127-133: Gateway indeed uses settings.extra (SDK's extra_kwargs) for Elevenlabs options—verify code type definitions match.

The LiveKit gateway does pass Elevenlabs STT options through settings.extra (exposed as extra_kwargs in the SDK) as a passthrough to the ElevenLabs API. Valid options include language_code, audio_format, commit_strategy, include_timestamps, vad_silence_threshold_secs, and similar session config parameters for scribe_v2_realtime.

However, manual code inspection is needed to verify that the union definitions (lines 127–133), overloads (lines 208–223), and extra_kwargs typing (lines 253–255) correctly reflect this passthrough pattern and match the expected payload structure.

livekit-agents/livekit/agents/inference/tts.py (1)

21-55: All TTS model identifiers are supported by their respective providers and will not cause invalid-model errors. Verification confirms: Cartesia (sonic-3, sonic-2, sonic-turbo, sonic), Deepgram (aura-2, aura), Rime (arcana, mistv2), Inworld (inworld-tts-1.5-max, inworld-tts-1.5-mini, inworld-tts-1-max, inworld-tts-1), and ElevenLabs (eleven_flash_v2, eleven_flash_v2_5, eleven_turbo_v2, eleven_turbo_v2_5, eleven_multilingual_v2) are all officially supported. Code also complies with the 100-character line length requirement.

livekit-agents/livekit/agents/inference/llm.py (1)

36-60: No issues found. All model identifiers in the OpenAIModels and GoogleModels literals are confirmed against the LiveKit Inference Gateway contract and match the official supported models exactly. The code is correct.


Comment on lines 108 to 112

 class CartesiaOptions(TypedDict, total=False):
-    duration: float  # max duration of audio in seconds
-    speed: Literal["slow", "normal", "fast"]  # default: not specified
+    emotion: str
+    speed: Literal["slow", "normal", "fast"]
+    volume: float

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file context to understand the CartesiaOptions usage
cat -n livekit-agents/livekit/agents/inference/tts.py | head -130 | tail -40

Repository: livekit/agents

Length of output: 172


🏁 Script executed:

# Also check if there are any other references to CartesiaOptions in the codebase
rg "CartesiaOptions" --type py -B 2 -A 5

Repository: livekit/agents

Length of output: 4238


🌐 Web query:

Cartesia TTS API documentation options parameters

💡 Result:

Below are the main Cartesia TTS API request options/parameters (from the official docs).

Common (all TTS endpoints)

  • Header: Cartesia-Version (required). Supported versions shown in the API reference include 2024-06-10, 2024-11-13, 2025-04-16. [1][2][3]
  • Auth:
    • /tts/bytes, /tts/sse: Authorization: Bearer <access_token> (per examples). [2][3]
    • WebSocket: API key can be provided via header (X-API-Key) or via query param api_key (useful in browsers). [4]

POST /tts/bytes (stream audio bytes)

Body fields:

  • model_id (required) [2]
  • transcript (required) [2]
  • voice (required): { "mode": "id", "id": "<voice_id>" } [2]
  • output_format (required) [2]
  • language (optional enum) [2]
  • generation_config (optional; only affects sonic-3): includes volume, speed, emotion [2]
  • save (optional bool; default false): when true, response includes Cartesia-File-ID header [2]
  • pronunciation_dict_id (optional; supported by sonic-3 and newer) [2]
  • speed (deprecated enum: slow|normal|fast; prefer generation_config.speed for sonic-3) [2]

POST /tts/sse (Server-Sent Events streaming)

Includes the same core fields as /tts/bytes (model_id, transcript, voice, output_format, language, generation_config, pronunciation_dict_id) plus streaming/timestamps controls: [3]

  • add_timestamps (bool; word timestamps) [3]
  • add_phoneme_timestamps (bool) [3]
  • use_normalized_timestamps (bool) [3]
  • context_id (optional string) [3]
  • speed (deprecated; same as above) [3]

GET wss://api.cartesia.ai/tts/websocket (bidirectional streaming)

Handshake query params (useful when headers aren’t available):

  • cartesia_version (alternative to Cartesia-Version) [4]
  • api_key (alternative to X-API-Key) [4]

Generation request message example fields include: model_id, transcript, voice, language, context_id, output_format, add_timestamps, continue (for continuation on the same context_id). [4]

output_format options (containers/encodings/sample rates)

The docs show these common options for output_format: [5]

  • container: raw | wav | mp3
  • sample_rate: 8000, 16000, 22050, 24000, 44100, 48000
  • encoding (required for raw and wav): pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw
  • bit_rate (required for mp3)

Sources: [1] API conventions/versioning, [2] /tts/bytes, [3] /tts/sse, [4] /tts/websocket, [5] output format enums (shown on Cartesia API reference pages).


🏁 Script executed:

# Search for how CartesiaOptions is used/sent to the Cartesia API
rg "generation_config|CartesiaOptions" --type py -B 3 -A 3

Repository: livekit/agents

Length of output: 10488


🏁 Script executed:

# Look for Cartesia API client calls or serialization of CartesiaOptions
rg "cartesia|emotion|tts" livekit-agents/livekit/agents/inference/tts.py --type py -i -A 2

Repository: livekit/agents

Length of output: 6387


Speed and emotion types don't match Cartesia plugin implementation.

The CartesiaOptions TypedDict defines speed as Literal["slow", "normal", "fast"], but the Cartesia plugin expects floats for sonic-3 models (if not isinstance(self._opts.speed, float)). Additionally, emotion is typed as str but accessed as opts.emotion[0] in the plugin, suggesting indexing that doesn't align with a plain string type. Volume as float is correct. Update the TypedDict to match actual plugin expectations.

🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/inference/tts.py` around lines 108 - 112, The
CartesiaOptions TypedDict currently mismatches the Cartesia plugin: change the
speed field from Literal["slow","normal","fast"] to a numeric type (float)
because the plugin checks `isinstance(self._opts.speed, float)` for sonic-3
models, and change emotion from `str` to a sequence type (e.g., `Sequence[str]`
or `List[str]`) because the plugin accesses `opts.emotion[0]`; keep volume as
`float`. Update the CartesiaOptions definition accordingly so it aligns with the
plugin's expectations.
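
Applying the suggested fix would yield something like the following (a sketch, assuming the plugin behavior described above; Sequence[str] is chosen for emotion to satisfy the opts.emotion[0] access):

```python
from typing import Sequence, TypedDict

class CartesiaOptions(TypedDict, total=False):
    emotion: Sequence[str]  # plugin reads opts.emotion[0], so a sequence, not a bare str
    speed: float            # sonic-3 path checks isinstance(self._opts.speed, float)
    volume: float
```

A payload such as {"speed": 1.2, "emotion": ["positivity"], "volume": 0.8} would then type-check against the sonic-3 generation_config fields.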

@coderabbitai coderabbitai bot left a comment (Contributor)

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/inference/tts.py`:
- Around line 41-45: RimeModels Literal no longer includes "rime/mist", causing
mypy failures where tests expect that value; restore backward compatibility by
adding "rime/mist" back into the RimeModels Literal in
livekit/agents/inference/tts.py (the RimeModels symbol) so tests referencing
"rime/mist" pass, or alternatively update all test occurrences in
tests/test_inference_tts_fallback.py to use "rime/mistv2" if you intend to
remove the old alias—prefer restoring "rime/mist" in RimeModels for minimal
change.
♻️ Duplicate comments (2)
livekit-agents/livekit/agents/inference/tts.py (1)

108-112: CartesiaOptions types may not align with Cartesia plugin expectations.

This was previously flagged. The Cartesia plugin may expect speed as a float for sonic-3 models (via generation_config.speed) rather than the deprecated enum, and emotion may need to be a sequence type based on plugin usage patterns. Verify alignment with how the LiveKit gateway processes these options.

livekit-agents/livekit/agents/inference/stt.py (1)

29-37: Deepgram model identifier "flux-general" should be "flux-general-en".

This issue was previously flagged. Deepgram's Flux model is documented as flux-general-en (with language suffix). The identifier "deepgram/flux-general" will likely fail API calls. Align with Deepgram's official API documentation and the plugin's V2Models = Literal["flux-general-en"] definition.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 396be09 and 1863c7f.

📒 Files selected for processing (2)
  • livekit-agents/livekit/agents/inference/stt.py
  • livekit-agents/livekit/agents/inference/tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-agents/livekit/agents/inference/stt.py
  • livekit-agents/livekit/agents/inference/tts.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns

Applied to files:

  • livekit-agents/livekit/agents/inference/stt.py
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.

Applied to files:

  • livekit-agents/livekit/agents/inference/stt.py
  • livekit-agents/livekit/agents/inference/tts.py
🧬 Code graph analysis (1)
livekit-agents/livekit/agents/inference/tts.py (1)
livekit-agents/livekit/agents/types.py (1)
  • APIConnectOptions (54-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: type-check (3.13)
  • GitHub Check: livekit-plugins-deepgram
  • GitHub Check: unit-tests
🔇 Additional comments (7)
livekit-agents/livekit/agents/inference/stt.py (3)

73-80: LGTM!

The ElevenlabsOptions TypedDict fields align well with ElevenLabs Scribe v2 Realtime API parameters for VAD configuration and timestamp control.


208-224: LGTM!

The new ElevenlabsModels overload follows the established pattern for provider-specific type hints, correctly associating ElevenlabsOptions with the model type.


253-259: LGTM!

The extra_kwargs union properly includes ElevenlabsOptions, maintaining type consistency with the overloads.

livekit-agents/livekit/agents/inference/tts.py (4)

114-115: LGTM!

Empty DeepgramOptions TypedDict is acceptable as a placeholder. It can be extended as provider-specific options are identified.


173-191: LGTM!

The new DeepgramModels overload follows the established pattern for provider-specific type hints.


46-52: LGTM!

The expanded InworldModels with new TTS variants follows the consistent naming pattern and properly extends provider support.


280-287: LGTM!

The extra_kwargs union properly includes DeepgramOptions, maintaining type consistency with the overloads.


@chenghao-mou chenghao-mou left a comment (Member)

LGTM. We should probably update the tests that still mention rime/mist, following CodeRabbit's comment.

When testing openai/gpt-5-nano, I got a crazy TTFT:

{"model_name": "openai/gpt-5-nano", "model_provider": "livekit", "ttft": 41.44, "prompt_tokens": 427, "prompt_cached_tokens": 0, "completion_tokens": 1907, "tokens_per_second": 38.54, "room": "console"}


GoogleModels = Literal[
"google/gemini-3-pro",
"google/gemini-3-flash",

Should we clarify they are preview versions?

@davidzhao (Member Author)

I think our docs page should be the authority here.

@davidzhao davidzhao merged commit b79f39e into main Jan 27, 2026
19 checks passed
@davidzhao davidzhao deleted the match-inference-models branch January 27, 2026 00:09
zach-iee pushed a commit to sendbird/livekit-agents that referenced this pull request Feb 2, 2026
