Update STT metrics to include token usage and enhance logging for tra…#5029

Draft
bml1g12 wants to merge 6 commits into livekit:main from bml1g12:add_gpt_realtime_transcription_metrics

Conversation

Contributor

@bml1g12 bml1g12 commented Mar 6, 2026

Summary

The OpenAI Realtime API's conversation.item.input_audio_transcription.completed event carries a usage field with ASR token counts (whisper-1 / gpt-4o-transcribe), billed separately from the realtime model. LiveKit currently ignores this field, so users cannot track transcription costs via on_metrics_collected.

Per OpenAI's Realtime costs documentation, input transcription is billed at the ASR model's rate (e.g. $0.006 / 1M tokens for whisper-1), separately from the realtime model's audio tokens. I have confirmed with OpenAI support that when using gpt-realtime, the Whisper ASR model is billed per token, not per minute, so for cost-tracking purposes we need to track at least the audio token counts.
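To make the cost implication concrete, here is a minimal sketch of the arithmetic, using the whisper-1 rate quoted above (verify against OpenAI's current pricing page before relying on it; the token count is made up):

```python
# Rate quoted in this PR description: $0.006 per 1M tokens for whisper-1.
# Double-check against OpenAI's published Realtime costs before use.
PRICE_PER_TOKEN_USD = 0.006 / 1_000_000

# Hypothetical audio token count reported by the transcription usage field.
audio_tokens = 14_000

cost_usd = audio_tokens * PRICE_PER_TOKEN_USD
```

Without the usage field surfaced in metrics, `audio_tokens` is simply unavailable to client code, which is the gap this PR closes.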

This PR surfaces those token counts as STTMetrics events, which is the appropriate metric type since the transcription runs on a separate ASR model (not the realtime model itself). The Metadata.model_name field identifies which transcription model produced the metrics (e.g. whisper-1, gpt-4o-transcribe).
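For reference, the token-based `usage` payload carried by the completed event looks roughly like this (field names follow the OpenAI Realtime API; the item id and numbers are illustrative):

```python
# Illustrative shape of a conversation.item.input_audio_transcription.completed
# event with the token-based usage variant. Values are made up.
event = {
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_abc123",  # hypothetical item id
    "transcript": "Hello there.",
    "usage": {
        "type": "tokens",
        "input_tokens": 14,
        "input_token_details": {"text_tokens": 0, "audio_tokens": 14},
        "output_tokens": 4,
        "total_tokens": 18,
    },
}

usage = event["usage"]
if usage["type"] == "tokens":
    # The audio token count is what matters for whisper-1 cost tracking.
    input_audio_tokens = usage["input_token_details"]["audio_tokens"]
```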

Note that I have not emitted these metrics as OTEL traces, since it seems we currently do not emit STT traces in general, and because for Langfuse to track the cost of these I think I would need platform-specific attributes (e.g. `langfuse.observation.type: "generation"`), as the OTEL specification does not have a standard attribute for STT token counting. I would be happy to add this as a further improvement if there is interest from the livekit team, but otherwise will just implement it in our own client code.

Changes

  • STTMetrics (metrics/base.py): Add optional input_tokens, output_tokens, total_tokens, and input_audio_tokens fields. All default to None so existing STT plugins are unaffected.
  • OpenAI realtime plugin (realtime_model.py): Extract usage from conversation.item.input_audio_transcription.completed events and emit STTMetrics via the existing metrics_collected event. Handles both the token-based (UsageTranscriptTextUsageTokens) and duration-based (UsageTranscriptTextUsageDuration) usage variants from the OpenAI SDK.
  • log_metrics (metrics/utils.py): Log token fields for STT metrics when present.
  • UsageCollector (metrics/usage_collector.py): Aggregate stt_input_tokens, stt_output_tokens, and stt_input_audio_tokens in UsageSummary.
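The first two changes can be sketched roughly as follows. This is a simplified standalone model, not the actual livekit-agents classes: the real `STTMetrics` carries more fields, and `metrics_from_usage` is a hypothetical helper standing in for the extraction logic inside the realtime plugin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class STTMetrics:
    """Simplified stand-in for livekit's STTMetrics.

    The four token fields are the ones proposed in this PR; they default to
    None so existing STT plugins that never set them are unaffected.
    """

    audio_duration: float = 0.0
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None
    input_audio_tokens: Optional[int] = None


def metrics_from_usage(usage: dict) -> STTMetrics:
    """Hypothetical helper: build STTMetrics from either usage variant."""
    if usage.get("type") == "tokens":
        # Token-based variant (UsageTranscriptTextUsageTokens in the SDK).
        details = usage.get("input_token_details") or {}
        return STTMetrics(
            input_tokens=usage.get("input_tokens"),
            output_tokens=usage.get("output_tokens"),
            total_tokens=usage.get("total_tokens"),
            input_audio_tokens=details.get("audio_tokens"),
        )
    # Duration-based variant (UsageTranscriptTextUsageDuration): only the
    # audio duration is available, token fields stay None.
    return STTMetrics(audio_duration=usage.get("seconds", 0.0))
```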

Design decisions

  • STTMetrics rather than RealtimeModelMetrics: The transcription runs on a separate model (whisper/gpt-4o-transcribe) with its own billing rate, so it belongs in STTMetrics with the model identified via Metadata.
  • No output_token_details: The API provides no output breakdown -- output is always text tokens from transcription.

Test plan

  • Verify existing STT plugins (deepgram, openai STT, etc.) are unaffected -- all new fields default to None
  • Confirm STTMetrics with token fields is emitted when conversation.item.input_audio_transcription.completed fires with token-based usage
  • Confirm STTMetrics with audio_duration (no tokens) is emitted for duration-based usage variant
  • Verify Metadata.model_name correctly reflects the configured transcription model (e.g. whisper-1)
  • Verify log_metrics includes token fields in log output when present
  • Verify UsageCollector aggregates STT token counts in UsageSummary
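The UsageCollector behaviour being tested can be sketched like this (a minimal model with invented method names; the real collector consumes metric events and tracks many more counters):

```python
from dataclasses import dataclass


@dataclass
class UsageSummary:
    """Simplified stand-in for livekit's UsageSummary with the new counters."""

    stt_input_tokens: int = 0
    stt_output_tokens: int = 0
    stt_input_audio_tokens: int = 0


class UsageCollector:
    """Minimal sketch of the aggregation; `collect_stt` is hypothetical."""

    def __init__(self) -> None:
        self._summary = UsageSummary()

    def collect_stt(self, input_tokens, output_tokens, input_audio_tokens):
        # None means the plugin did not report tokens (e.g. the
        # duration-based usage variant); treat that as zero.
        self._summary.stt_input_tokens += input_tokens or 0
        self._summary.stt_output_tokens += output_tokens or 0
        self._summary.stt_input_audio_tokens += input_audio_tokens or 0

    def get_summary(self) -> UsageSummary:
        return self._summary
```

The key property checked in the test plan is that metrics from plugins that never set the token fields contribute nothing, while token-bearing STTMetrics accumulate.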
