
feat(amd): port tunable params and postpone-termination tool from python #1368

Open
toubatbrian wants to merge 6 commits into main from claude/quirky-galileo-51AGi

Conversation

@toubatbrian
Contributor

Summary

Automated port of livekit/agents#5584 (fix(amd): amd improvement (AGT-2777)) into agents-js.

Note

This is an automated Claude Code Routine created by @toubatbrian. It is currently in an experimentation stage.

cc @toubatbrian @livekit/agent-devs for review.

Ported features

Everything listed below lands in agents/src/voice/amd.ts and is wired through the existing two-gate (verdict + silence) AMD architecture.

1. Expose all tunable parameters

New optional fields on AMDOptions:

| Option | Default | Notes |
| --- | --- | --- |
| `humanSpeechThresholdMs` | `2_500` | Speech longer than this is treated as machine-like and skips the short-greeting heuristic. |
| `humanSilenceThresholdMs` | `500` | Silence after a short greeting before settling on HUMAN. |
| `machineSilenceThresholdMs` | `1_500` | Silence after machine-like speech before opening the silence gate. |
| `prompt` | bundled `AMD_PROMPT` | Overrides the AMD classification system prompt. |
| `participantIdentity` | `undefined` | Currently informational (used for span attribution / logs). |
| `suppressCompatibilityWarning` | `false` | Silences the "model not evaluated" warning. |

noSpeechTimeoutMs, detectionTimeoutMs, and maxTranscriptTurns were already exposed.
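
For orientation, a hypothetical options object, assuming millisecond units throughout; the field names and defaults come from the table above, while the variable name and the `participantIdentity` value are illustrative:

```ts
// Illustrative only; all durations are in milliseconds.
const amdOptions = {
  humanSpeechThresholdMs: 2_500, // speech longer than this => machine-like
  humanSilenceThresholdMs: 500, // silence after a short greeting before HUMAN
  machineSilenceThresholdMs: 1_500, // silence before opening the silence gate
  participantIdentity: 'sip-caller', // informational (span attribution / logs)
  suppressCompatibilityWarning: false,
  noSpeechTimeoutMs: 10_000, // previously exposed
  detectionTimeoutMs: 20_000, // previously exposed
};
```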

2. Use LLM when a transcript is available

Mirrors the python change in classifier.py::on_user_speech_ended. If the user just spoke for ≤ humanSpeechThresholdMs and a transcript is already on the record, AMD now waits machineSilenceThresholdMs (instead of the shorter humanSilenceThresholdMs plus an automatic HUMAN verdict) so the LLM gets the final word.
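
A minimal sketch of that branch; the helper and parameter names are hypothetical, and only the threshold fields come from this PR:

```ts
// Hypothetical helper showing which silence window gets armed when the
// user stops speaking; thresholds are the AMDOptions fields above.
interface Thresholds {
  humanSpeechThresholdMs: number;
  humanSilenceThresholdMs: number;
  machineSilenceThresholdMs: number;
}

function silenceWindowMs(speechMs: number, hasTranscript: boolean, t: Thresholds): number {
  if (speechMs > t.humanSpeechThresholdMs) {
    // machine-like speech: the longer window, silence-gate path
    return t.machineSilenceThresholdMs;
  }
  // short greeting: with a transcript on record, wait the longer window so
  // the LLM gets the final word instead of auto-committing HUMAN
  return hasTranscript ? t.machineSilenceThresholdMs : t.humanSilenceThresholdMs;
}
```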

3. save_prediction + postpone_termination tools

detect() now exposes two tools to the LLM via toolCtx and toolChoice: 'required':

  • save_prediction({ label }) — commits the verdict (mirrors python save_prediction).
  • postpone_termination({ seconds }) — extends the silence window, capped at MAX_EXTENSIONS = 3 extensions of up to MAX_EXTENSION_MS = 10_000 ms each. On expiration, it opens the silence gate and re-runs classification with the latest transcript; once extensions are exhausted, the tool is no longer offered, forcing the LLM to commit.

If the LLM doesn't emit tool calls (for example, the in-tree StaticLLM test mock or providers that ignore toolChoice='required'), AMD falls back to the previous JSON-content parsing path so the existing 4 unit tests remain green.
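
A rough shape sketch of the two tools' arguments (zod usage and the `label` value set are assumptions; the actual registration goes through detect()'s toolCtx with toolChoice: 'required'):

```ts
import { z } from 'zod';

const savePredictionArgs = z.object({
  // assumed label set; execute() re-normalizes it through parseCategory
  label: z.enum(['HUMAN', 'MACHINE', 'UNCERTAIN']),
});

const postponeTerminationArgs = z.object({
  // clamped in execute(): up to MAX_EXTENSIONS (3) extensions, each bounded
  // by MAX_EXTENSION_MS (10_000 ms)
  seconds: z.number(),
});
```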

4. Compatibility warning for evaluated LLM models

EVALUATED_LLM_MODELS (the same 12 inference IDs as in python) is checked against LLM.model once at construction; a warning is logged when the resolved model isn't on the list, suppressible via suppressCompatibilityWarning: true.
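
A sketch of that check; warnIfNotEvaluated and EVALUATED_LLM_MODELS are names from this PR, while the prefix-match approach (which also covers date-suffixed IDs, per the later fix commit) and the log wording are illustrative:

```ts
declare const EVALUATED_LLM_MODELS: string[]; // the 12 inference IDs ported from python

function warnIfNotEvaluated(model: string, suppress: boolean): void {
  if (suppress) return;
  // prefix match so 'openai/gpt-4.1-mini-2025-04-14' still counts as evaluated
  if (!EVALUATED_LLM_MODELS.some((id) => model.startsWith(id))) {
    console.warn(`AMD has not been evaluated against LLM model "${model}"`);
  }
}
```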

What was intentionally not ported

These pieces of agents#5584 are tightly coupled to the python AudioRecognition/RoomIO pipeline and don't have direct counterparts in agents-js today. Skipping them avoids a much larger architectural change and keeps the JS AMD compatible with its current session-event model.

| Python change | JS status | Reason |
| --- | --- | --- |
| Dedicated `stt` parameter on AMD | Skipped | The JS AMD listens to `AgentSession` `UserInputTranscribed` events; it has no audio-frame channel comparable to python's `audio_recognition.push_audio` → AMD path. Adding a parallel STT pipeline is a larger redesign and out of scope for this porting PR. |
| `wait_for_track_publication(wait_for_subscription=True)` + `start()` / `start_timers()` split | Skipped | The JS AMD starts timers inside `execute()`, which the user calls after `session.start({ agent, room })`. The "start before the SIP participant joins, then wait for subscription" pattern requires async lifecycle changes to AMD that have no analogue in JS. |
| `EVALUATED_STT_MODELS` warning | Skipped | Paired with the dedicated STT pipeline above. |
| Python-only `examples/telephony/amd.py` rewrite | Adapted | `examples/src/telephony_amd.ts` gets a comment block showing the new tunable options; the SIP-participant-creation choreography is not duplicated since JS doesn't have the same `room_io.set_participant` API surface. |
| `NO_SPEECH_THRESHOLD = 10.0` / `TIMEOUT = 20.0` defaults | Already matched | JS already defaulted to 10_000 / 20_000 ms. |

If a follow-up needs the dedicated AMD STT pipeline or the participant-track lifecycle, that can be tracked as a separate issue — please flag in review if you'd like me to file one.

Implementation nuances

  • Python signals "more audio expected" by sending "" into a channel; JS uses scheduleLLMClassification() to re-trigger classification with the joined transcript.
  • toolChoice: 'required' mirrors python's tool_choice='required'; if a provider ignores it, the JSON-content fallback in parseDetection() keeps behavior reasonable.
  • Time units follow CLAUDE.md: all new fields are milliseconds. MAX_EXTENSION_MS is the JS analogue of MAX_EXTENSION_SECS (10s → 10_000 ms).
  • All ported sections carry // Ref: python <path> - <line range> comments per CLAUDE.md guidance.
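
For concreteness, the constants and annotation format those bullets describe (the Ref path and line range stay placeholders, per CLAUDE.md):

```ts
// Ref: python <path> - <line range>  <- annotation carried on ported sections
const MAX_EXTENSIONS = 3; // extension budget offered to the LLM
const MAX_EXTENSION_MS = 10_000; // JS analogue of python's MAX_EXTENSION_SECS (10 s)
```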

Test plan

  • pnpm --filter @livekit/agents build — passes
  • pnpm --filter @livekit/agents lint — amd.ts / amd.test.ts clean (0 errors, 0 warnings)
  • pnpm exec prettier --check on changed files — passes
  • pnpm exec vitest run agents/src/voice/amd.test.ts — 6/6 pass (4 existing + 2 new)
  • Manual smoke test against a real SIP call (left to reviewer with phone-number infra)

Changeset

patch for @livekit/agents (per the routine's standing instructions).


Generated by Claude Code

Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`,
  `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is
  already available after a short greeting.
- Add `postpone_termination` LLM tool (capped at 3 extensions × 10s)
  alongside `save_prediction`; fall back to JSON-content parsing when
  the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.

Skipped (architectural divergence — see PR description): dedicated AMD
STT pipeline, track-subscription wait, and the `start()` /
`start_timers()` lifecycle split.
@changeset-bot

changeset-bot Bot commented May 1, 2026

🦋 Changeset detected

Latest commit: 76de2fe

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages
| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-liveavatar | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-minimax | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-mistralai | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-runway | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |


@CLAassistant

CLAassistant commented May 1, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

chatgpt-codex-connector[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

claude and others added 2 commits May 4, 2026 09:37
- Gate `save_prediction` and `postpone_termination` tool side effects on
  the current `detectGeneration`. Stale in-flight classifications now
  no-op instead of mutating timers, budget, or capturing a verdict that
  belongs to a superseded transcript window.
- Normalize `save_prediction`'s `label` argument through `parseCategory`
  before storing, so an off-enum value from a misbehaving LLM (or our
  manual JSON path that bypasses Zod) is treated as UNCERTAIN rather
  than producing an `AMDResult` with an invalid category string.
- Fix `warnIfNotEvaluated` substring check to also handle date-suffixed
  model names (e.g. `openai/gpt-4.1-mini-2025-04-14`).
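
A behavioral sketch of that generation gate; the class shape and method names are hypothetical, and only `detectGeneration` comes from the commit message:

```ts
// Each detect() run captures the generation it belongs to and re-checks it
// before applying any tool side effect, so stale runs no-op.
class AmdStateSketch {
  private detectGeneration = 0;

  startNewDetection(): number {
    return ++this.detectGeneration; // invalidates in-flight classifications
  }

  onToolCall(generation: number, apply: () => void): void {
    if (generation !== this.detectGeneration) return; // stale: no-op
    apply(); // safe to mutate timers / budget / verdict
  }
}
```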
devin-ai-integration[bot]

This comment was marked as resolved.

Without this, a postpone_termination tool call resolved after aclose()
would still see isStale() === false (settled was never flipped) and
install a fresh silenceTimer that survives cleanup, eventually firing
scheduleLLMClassification + tryEmitResult and potentially triggering
session.interrupt on a closed AMD.
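
A sketch of the fix, assuming the `settled` / `silenceTimer` member names from the commit message and an otherwise hypothetical class shape:

```ts
// aclose() flips `settled`, so isStale() becomes true and a late
// postpone_termination resolution cannot install a fresh silenceTimer.
class CloseGuardSketch {
  private settled = false;
  private silenceTimer?: ReturnType<typeof setTimeout>;

  isStale(): boolean {
    return this.settled;
  }

  async aclose(): Promise<void> {
    this.settled = true; // late tool resolutions now no-op
    if (this.silenceTimer) clearTimeout(this.silenceTimer);
  }

  onPostponeResolved(ms: number, fire: () => void): void {
    if (this.isStale()) return; // guard added by this commit
    this.silenceTimer = setTimeout(fire, ms);
  }
}
```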
devin-ai-integration[bot]

This comment was marked as resolved.

Without a lower bound and NaN guard, a misbehaving LLM passing a
negative or non-numeric `seconds` argument would compute a clampedMs
of NaN or a negative number, which setTimeout treats as 0 and fires
immediately. The manual tool-execution path here bypasses the Zod
schema, so this defense lives in execute().
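
A sketch of that guard; MAX_EXTENSION_MS is from the PR, while the helper name and return convention are illustrative:

```ts
const MAX_EXTENSION_MS = 10_000;

// Converts the LLM-supplied `seconds` argument to a bounded millisecond
// delay; NaN and non-positive inputs are rejected rather than reaching
// setTimeout (which would treat them as 0 and fire immediately).
function clampExtensionMs(seconds: unknown): number | null {
  const ms = Number(seconds) * 1000;
  if (!Number.isFinite(ms) || ms <= 0) return null; // invalid: skip the extension
  return Math.min(ms, MAX_EXTENSION_MS);
}
```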
