feat(stt): streaming transcription proxy over websocket #455

Merged
piorpua merged 15 commits into main from feat/stt-streaming on Jun 12, 2026

lornestack (Contributor) commented on Jun 11, 2026

Summary

Adds GET /api/stt/stream — a short-lived WebSocket endpoint that proxies PCM16 microphone audio to the provider's native streaming API and pushes live transcripts back to the client. This is the backend half of AionUi's streaming voice input (phase 2 of the voice-input optimization; AionUi client follows in a separate PR).
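The binary frames carry raw PCM16LE audio at the rate announced in `start`. As a rough client-side illustration (not code from this PR or from the AionUi client), the sketch below converts normalized `f32` microphone samples to PCM16LE bytes and computes a chunk size, assuming the 24 kHz mono values from the `start` example. `chunk_bytes` and `f32_to_pcm16le` are hypothetical helper names.

```rust
// Assumed stream format, matching the `start` frame example: 24 kHz, mono, PCM16.
const SAMPLE_RATE: u32 = 24_000;
const CHANNELS: u32 = 1;
const BYTES_PER_SAMPLE: u32 = 2; // 16-bit signed samples

/// Bytes in one binary frame carrying `ms` milliseconds of audio (hypothetical helper).
fn chunk_bytes(ms: u32) -> usize {
    (SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE * ms / 1000) as usize
}

/// Converts normalized f32 samples in [-1.0, 1.0] to little-endian PCM16 bytes.
fn f32_to_pcm16le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        // Clamp to [-1.0, 1.0] so louder-than-full-scale input saturates at +/-32767.
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn main() {
    // 20 ms at 24 kHz mono PCM16: 480 samples * 2 bytes = 960 bytes per frame.
    println!("20 ms chunk = {} bytes", chunk_bytes(20));
    // 0.0 -> 0, 1.0 -> 32767 (0x7FFF), -1.0 -> -32767 (0x8001), each little-endian.
    println!("{:?}", f32_to_pcm16le(&[0.0, 1.0, -1.0]));
}
```

Sending fixed-duration chunks (20 ms here, an arbitrary choice) keeps latency predictable; the protocol itself, as described, does not mandate a chunk size.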

Wire protocol (client ↔ AionCore)

| Direction | Frame | Meaning |
|---|---|---|
| C→S | text `{"type":"start","format":"pcm16","sampleRate":24000,"channels":1,"languageHint":"zh"?}` | begin session |
| C→S | binary | raw PCM16LE audio chunks |
| C→S | text `{"type":"stop"}` | end of audio |
| S→C | `{"type":"ready"}` | upstream connected, send audio |
| S→C | `{"type":"partial","text":...}` / `{"type":"final","text":...}` | live transcript |
| S→C | `{"type":"done"}` | all finals delivered, server closes |
| S→C | `{"type":"error","code":"STT_*","msg":...}` | uniform error frames (reuses existing STT_* codes; new: STT_STREAM_UNSUPPORTED, STT_STREAM_PROTOCOL) |
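One way to read the client side of this table is as a small ordering contract: `start` first, any number of binary audio frames, then `stop`. The sketch below is an illustration, not the PR's implementation: it enforces that order and renders violations as `STT_STREAM_PROTOCOL` error frames. `ClientFrame`, `Phase`, `ProtocolGuard`, and the `msg` wording are hypothetical, the `start` payload fields are omitted, and the JSON is built by hand only to stay dependency-free.

```rust
/// Client-to-server frames after WebSocket decoding (hypothetical type).
enum ClientFrame {
    Start,          // text {"type":"start",...}; format fields omitted in this sketch
    Audio(Vec<u8>), // binary frame of raw PCM16LE
    Stop,           // text {"type":"stop"}
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    AwaitingStart,
    Streaming,
    Stopped,
}

fn frame_name(frame: &ClientFrame) -> &'static str {
    match frame {
        ClientFrame::Start => "start",
        ClientFrame::Audio(_) => "audio",
        ClientFrame::Stop => "stop",
    }
}

/// Renders an error frame; escaping covers only `\` and `"` (enough for these messages).
fn error_frame(code: &str, msg: &str) -> String {
    let msg = msg.replace('\\', "\\\\").replace('"', "\\\"");
    format!(r#"{{"type":"error","code":"{}","msg":"{}"}}"#, code, msg)
}

/// Enforces start -> audio* -> stop and counts the audio bytes received.
struct ProtocolGuard {
    phase: Phase,
    audio_bytes: usize,
}

impl ProtocolGuard {
    fn new() -> Self {
        Self { phase: Phase::AwaitingStart, audio_bytes: 0 }
    }

    /// Ok(()) if `frame` is legal now; otherwise the error frame to send before closing.
    fn accept(&mut self, frame: &ClientFrame) -> Result<(), String> {
        match (self.phase, frame) {
            (Phase::AwaitingStart, ClientFrame::Start) => self.phase = Phase::Streaming,
            (Phase::Streaming, ClientFrame::Audio(pcm)) => self.audio_bytes += pcm.len(),
            (Phase::Streaming, ClientFrame::Stop) => self.phase = Phase::Stopped,
            (phase, frame) => {
                let msg = format!("{} frame not allowed in {:?}", frame_name(frame), phase);
                return Err(error_frame("STT_STREAM_PROTOCOL", &msg));
            }
        }
        Ok(())
    }
}

fn main() {
    let mut guard = ProtocolGuard::new();
    for frame in [ClientFrame::Start, ClientFrame::Audio(vec![0; 960]), ClientFrame::Stop] {
        guard.accept(&frame).expect("legal sequence");
    }
    println!("{:?}, {} bytes", guard.phase, guard.audio_bytes);

    // Audio before start is a protocol violation.
    let mut fresh = ProtocolGuard::new();
    println!("{}", fresh.accept(&ClientFrame::Audio(vec![0; 2])).unwrap_err());
}
```

A rejected frame leaves the phase unchanged, so the caller decides whether to close; the table suggests the server sends the error frame and then terminates the session.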

Architecture

  • aionui-shell/stt_stream.rs: transport-agnostic session state machine (mpsc channels, mock-tested) with a documented cancel-safety contract on the upstream trait.
  • Upstream adapters: Deepgram live (/v1/listen, linear16 passthrough, interim_results) and OpenAI Realtime transcription (GA protocol: session.update type=transcription, base64 input_audio_buffer.append, delta/completed events) — whisper-1 is refused with STT_STREAM_UNSUPPORTED so clients fall back to the existing POST /api/stt file path.
  • Route layer is a thin frame adapter inside shell_routes, inheriting the exact same auth middleware as POST /api/stt; config loading is shared (load_stt_config).
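To illustrate what a transport-agnostic, channel-driven session can look like, here is a synchronous sketch that uses `std::sync::mpsc` and a thread in place of whatever async runtime the real code uses. The `Upstream` trait, `Msg`, `Outbound`, `run_session`, and `MockUpstream` are hypothetical; the PR's actual trait and its cancel-safety contract are not modeled. Client audio and provider events share one channel so partials can be relayed while audio is still arriving.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Everything the session reacts to, funneled through one channel (hypothetical).
enum Msg {
    Audio(Vec<u8>),  // from the client socket
    Stop,            // from the client socket
    Partial(String), // from the upstream adapter
    Final(String),   // from the upstream adapter
    UpstreamClosed,  // adapter has delivered every final
}

/// Frames the route layer would serialize back to the client.
#[derive(Debug, PartialEq)]
enum Outbound {
    Ready,
    Partial(String),
    Final(String),
    Done,
}

/// Hypothetical provider-adapter interface.
trait Upstream {
    fn send_audio(&mut self, pcm: &[u8]);
    fn finish(&mut self); // flush/commit remaining audio after the client's stop
}

fn run_session<U: Upstream>(mut upstream: U, rx: Receiver<Msg>, out: Sender<Outbound>) {
    let _ = out.send(Outbound::Ready);
    let mut stopped = false;
    for msg in rx.iter() {
        match msg {
            Msg::Audio(pcm) if !stopped => upstream.send_audio(&pcm),
            Msg::Audio(_) => {} // audio after stop is ignored in this sketch
            Msg::Stop if !stopped => {
                stopped = true;
                upstream.finish();
            }
            Msg::Stop => {}
            Msg::Partial(text) => { let _ = out.send(Outbound::Partial(text)); }
            Msg::Final(text) => { let _ = out.send(Outbound::Final(text)); }
            Msg::UpstreamClosed => break,
        }
    }
    let _ = out.send(Outbound::Done);
}

/// Mock adapter: reports a partial per chunk and one final on finish.
struct MockUpstream {
    events: Sender<Msg>,
    bytes: usize,
}

impl Upstream for MockUpstream {
    fn send_audio(&mut self, pcm: &[u8]) {
        self.bytes += pcm.len();
        let _ = self.events.send(Msg::Partial(format!("{} bytes", self.bytes)));
    }
    fn finish(&mut self) {
        let _ = self.events.send(Msg::Final(format!("{} bytes", self.bytes)));
        let _ = self.events.send(Msg::UpstreamClosed);
    }
}

fn main() {
    let (tx, rx) = channel();
    let (out_tx, out_rx) = channel();
    let mock = MockUpstream { events: tx.clone(), bytes: 0 };
    let session = thread::spawn(move || run_session(mock, rx, out_tx));

    tx.send(Msg::Audio(vec![0; 960])).unwrap();
    tx.send(Msg::Audio(vec![0; 960])).unwrap();
    tx.send(Msg::Stop).unwrap();
    session.join().unwrap();

    for frame in out_rx.iter() {
        println!("{:?}", frame);
    }
}
```

Because the mock emits all of its events from the session thread through a single sender, the output order is deterministic: `Ready`, two partials, one final, then `Done`. Keeping the session free of socket types is what makes this kind of mock test possible.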

Found by real-API smoke testing (manual runner in examples/stt_stream_smoke.rs)

  • fix(shell): rustls had no process-level CryptoProvider installed, so real wss:// connections panicked (the mock tests use plain ws://, so they never exercised this path). Fixed with an explicit connector, following the lark/dingtalk plugin pattern.
  • fix(shell): tail loss. With server VAD producing multiple items, the adapter closed after the first final, dropping transcripts still in flight for the tail. It now tracks per-item state and closes only when the commit is acknowledged and no item still owes a transcript. Verified against the live OpenAI API (two-sentence clip, both finals delivered).
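The tail-loss fix boils down to a closing condition. Below is a minimal sketch of that bookkeeping, assuming items are identified by string IDs and that the provider reports item creation, per-item transcript completion, and a commit acknowledgement naming the committed item. `TailTracker` and its methods are hypothetical, not the adapter's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical per-item bookkeeping for a server-VAD session.
#[derive(Default)]
struct TailTracker {
    owes_transcript: HashMap<String, bool>, // item id -> still waiting for its final?
    commit_acked: bool,
}

impl TailTracker {
    /// Server VAD opened a new speech item; it will owe a transcript.
    fn on_item_created(&mut self, id: &str) {
        self.owes_transcript.entry(id.to_string()).or_insert(true);
    }

    /// A completed (final) transcript arrived for `id`.
    fn on_transcript_completed(&mut self, id: &str) {
        self.owes_transcript.insert(id.to_string(), false);
    }

    /// The provider acknowledged the commit sent after the client's stop.
    /// The committed item is registered too, in case its creation was not seen yet.
    fn on_commit_acked(&mut self, id: &str) {
        self.commit_acked = true;
        self.on_item_created(id);
    }

    /// Send `done` and close only when the commit is acked and nothing is still owed.
    fn can_close(&self) -> bool {
        self.commit_acked && !self.owes_transcript.values().any(|&owed| owed)
    }
}

fn main() {
    let mut t = TailTracker::default();
    t.on_item_created("item_1");
    t.on_transcript_completed("item_1");
    // Closing on the first final would happen here; the tail is still in flight.
    println!("after first final: can_close = {}", t.can_close()); // false: no commit ack yet
    t.on_commit_acked("item_2");
    println!("after commit ack:  can_close = {}", t.can_close()); // false: item_2 owes its tail
    t.on_transcript_completed("item_2");
    println!("after tail final:  can_close = {}", t.can_close()); // true
}
```

Using `or_insert` on creation means a completion that arrives before its creation event is not reset to "owed", so event reordering cannot block the close forever in this sketch.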

Test plan

  • 175 aionui-shell tests (118 unit + 57 integration incl. 26 WS-mock adapter tests) + 5 real-server E2E (stt_stream_e2e: unauth 401, STT_DISABLED, full mock-Deepgram flow, protocol violation)
  • cargo test --workspace, cargo clippy --workspace --all-targets -- -D warnings, cargo fmt --check all green
  • Real-API smoke (OpenAI gpt-4o-mini-transcribe): live partials/finals for both sentences, clean close — STT_SMOKE_API_KEY=... cargo run -p aionui-shell --example stt_stream_smoke -- clip.wav
  • Deepgram real-API smoke (nova-3): live interim partials during recording, 3 finals covering full clip incl. tail after CloseStream, clean close

piorpua merged commit 1c19a8b into main on Jun 12, 2026
6 checks passed
piorpua deleted the feat/stt-streaming branch on June 12, 2026 02:08
piorpua pushed a commit that referenced this pull request on Jun 12, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.29](v0.1.28...v0.1.29) (2026-06-12)


### Features

* converge team mode runtime architecture ([#464](#464)) ([abeb9a1](abeb9a1))
* **stt:** streaming transcription proxy over websocket ([#455](#455)) ([1c19a8b](1c19a8b))


### Bug Fixes

* **agent:** validate managed ACP platform binaries ([#462](#462)) ([651c79f](651c79f))
* **cron:** retry busy jobs from runtime state ([#459](#459)) ([9918058](9918058))
* isolate ACP cancel turn completion ([#461](#461)) ([ea01ee6](ea01ee6))
* **office:** probe star-office preferred_url host as given ([#456](#456)) ([3c2149c](3c2149c))


### Code Refactoring

* **assistant:** finalize unified governance storage ([#449](#449)) ([aba2d2a](aba2d2a))


### Documentation

* clarify production logging guidance ([#460](#460)) ([118ed03](118ed03))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>