Added Parakeet as a STT model by dustinwloring1988 · Pull Request #766 · jamiepine/voicebox

dustinwloring1988 · 2026-06-20T00:39:59Z

Summary

Adds support for NVIDIA Parakeet speech-to-text integration in Voicebox.

What's Included

Parakeet v2
Parakeet v3

Notes

Had to update the transformers version

Summary by CodeRabbit

Release Notes

New Features
- Added support for NVIDIA Parakeet speech-to-text models alongside Whisper
- Reorganized transcription model selection with grouped Whisper and Parakeet options
- Updated model identifiers for consistency (e.g., whisper-turbo, parakeet-tdt-0.6b-v2)
Bug Fixes
- Updated default STT model from turbo to whisper-turbo
- Improved Docker build to avoid repository modifications
Localization
- Updated translations in English, Japanese, Chinese (Simplified and Traditional)

coderabbitai · 2026-06-20T00:40:13Z

Walkthrough

Adds NVIDIA Parakeet TDT 0.6B (v2/v3) as a second STT engine alongside Whisper. A new canonical SttModelId type and STT_MODEL_PATTERN regex replace legacy bare-size identifiers throughout backend models, routes, services, ML backends (MLX and PyTorch), MCP tools, and the frontend API/UI. A DB startup migration rewrites stored bare Whisper sizes to whisper-<size>. The Dockerfile frontend build stage is independently refactored to generate a minimal temporary package.json instead of mutating the repo's existing one.
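To make the naming change concrete, here is a minimal sketch of how legacy bare Whisper sizes could be mapped to canonical ids. The function names come from the walkthrough, but the bodies are assumptions; the PR's actual implementation is not shown on this page. The id lists are taken from the `STT_MODEL_PATTERN` quoted later in the review.

```python
# Hypothetical sketch: the names mirror the helpers listed in the walkthrough,
# but the bodies are illustrative and not copied from the PR.
WHISPER_SIZES = ("base", "small", "medium", "large", "turbo")
PARAKEET_IDS = ("parakeet-tdt-0.6b-v2", "parakeet-tdt-0.6b-v3")
CANONICAL_IDS = frozenset(f"whisper-{size}" for size in WHISPER_SIZES) | frozenset(PARAKEET_IDS)


def normalize_stt_model_name(name: str) -> str:
    """Return the canonical id, upgrading a bare Whisper size to whisper-<size>."""
    if name in WHISPER_SIZES:
        return f"whisper-{name}"
    if name in CANONICAL_IDS:
        return name
    raise ValueError(f"unknown STT model: {name!r}")


def is_parakeet_model_name(name: str) -> bool:
    """True when the canonical form of the name belongs to the Parakeet family."""
    return normalize_stt_model_name(name).startswith("parakeet-")
```

With this shape, `normalize_stt_model_name("turbo")` yields `whisper-turbo`, and a malformed id such as `parakeet-tdt-0x6b-v2` is rejected instead of reaching a registry lookup.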

Changes

Parakeet STT Engine Support

Layer / File(s)	Summary
STT type contracts and backend registry `app/src/lib/api/types.ts`, `app/src/lib/api/client.ts`, `app/src/lib/hooks/useTranscription.ts`, `backend/models.py`, `backend/backends/__init__.py`	Introduces `SttModelId` union type and `STT_MODEL_PATTERN` regex covering both `whisper-` and `parakeet-` identifiers. Adds `PARAKEET_HF_REPOS`, `STT_HF_REPOS`, `STT_ENGINES`, Parakeet `ModelConfig` entries, and helpers `normalize_stt_model_name`, `stt_model_name_to_repo`, `is_parakeet_model_name`. Frontend API client method signatures switch from `WhisperModelSize` to `SttModelId`.
STT service abstraction and DB migration `backend/services/transcribe.py`, `backend/database/models.py`, `backend/database/migrations.py`	Adds `get_stt_model()` and `unload_stt_model()` as canonical STT entrypoints; retains `get_whisper_model()`/`unload_whisper_model()` as deprecated aliases. Changes `CaptureSettings.stt_model` column default to `whisper-turbo` and adds a startup migration that rewrites legacy bare Whisper sizes to `whisper-<size>` in `capture_settings`.
MLX and PyTorch STT backends updated for Parakeet `backend/backends/mlx_backend.py`, `backend/backends/pytorch_backend.py`	Both backends refactored from Whisper-only to family-aware: constructors take `model_name` registry key; loading branches on `is_parakeet_model_name`; PyTorch `_load_model_sync` selects `AutoProcessor`+`AutoModelForTDT` vs Whisper loaders; language hints applied only for Whisper; Parakeet generation extracts `output.sequences`.
Backend registry lifecycle hooks for Parakeet `backend/backends/__init__.py`	`unload_model_by_config`, `check_model_loaded`, and `get_model_load_func` updated so both `whisper` and `parakeet` engines route through `transcribe.get_stt_model()`, matching on `config.model_name` instead of the prior Whisper-specific `config.model_size`.
Routes and services updated for STT model names `backend/routes/transcription.py`, `backend/routes/captures.py`, `backend/services/captures.py`, `backend/mcp_server/tools.py`	Transcription route resolves STT model from explicit field → DB saved setting → default, validates against `STT_HF_REPOS`, triggers background download when uncached, and calls `stt.transcribe(..., model_name)`. Captures readiness normalizes `stt_model` via `normalize_stt_model_name`. Capture service and MCP tool switch from `get_whisper_model`/`model_size` to `get_stt_model`/`model_name`.
Frontend UI: grouped STT dropdown and display name helper `app/src/components/ServerTab/CapturesPage.tsx`, `app/src/components/CapturesTab/CapturesTab.tsx`, `app/src/components/ServerSettings/ModelManagement.tsx`, `app/src/components/VoiceProfiles/ProfileForm.tsx`, `app/src/components/VoiceProfiles/SampleUpload.tsx`	Transcription model dropdown rebuilt with `SelectGroup`/`SelectLabel` grouping Whisper and Parakeet options; `CapturesTab` adds `formatSttModelName` for display labels; `ModelManagement` adds Parakeet descriptions and includes `parakeet*` in STT filter; `ProfileForm` and `SampleUpload` read and forward `captureSettings.stt_model` to transcription requests.
i18n strings and build artifact updates `app/src/i18n/locales/*/translation.json`, `backend/build_binary.py`, `backend/voicebox-server.spec`, `backend/requirements.txt`	Translation files (en, ja, zh-CN, zh-TW) rename Whisper option keys to `whisper-*`, add Parakeet model entries with group labels, and extend `tail` qualifiers. PyInstaller spec and `build_binary.py` add `transformers.models.parakeet` to hidden imports; `requirements.txt` raises `transformers` minimum to `>=5.6.0` and switches `Zipvoice` to PyPI.
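The startup migration described in the table can be sketched as a single idempotent UPDATE. This is an illustration only: it assumes a SQLite database accessed through the standard `sqlite3` module, and `migrate_legacy_stt_models` is a hypothetical name. Only the `capture_settings` table and `stt_model` column come from the summary above.

```python
import sqlite3

# Bare sizes that the summary says are rewritten to whisper-<size>.
LEGACY_WHISPER_SIZES = ("base", "small", "medium", "large", "turbo")


def migrate_legacy_stt_models(conn: sqlite3.Connection) -> int:
    """Prefix legacy bare Whisper sizes with 'whisper-'; return the number of rows changed.

    Hypothetical helper. Rows that already hold a canonical id are not matched
    by the WHERE clause, so running this on every startup is safe.
    """
    placeholders = ", ".join("?" for _ in LEGACY_WHISPER_SIZES)
    cursor = conn.execute(
        "UPDATE capture_settings "
        "SET stt_model = 'whisper-' || stt_model "
        f"WHERE stt_model IN ({placeholders})",
        LEGACY_WHISPER_SIZES,
    )
    conn.commit()
    return cursor.rowcount
```

Because the WHERE clause only matches the bare sizes, a second run changes nothing, which is the property a migration executed at every startup needs.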

Dockerfile Frontend Build Refactor

Layer / File(s)	Summary
Generate minimal `package.json` for web build `Dockerfile`	Frontend Bun build stage now generates a temporary minimal `package.json` with only `app`/`web` workspaces and required dependencies via `echo`, replacing the prior `sed`-based stripping of the repo's root `package.json`. Build command changes from `bunx --bun vite build` to `bunx vite build`.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant TranscriptionRoute as /transcribe endpoint
  participant DB as capture_settings DB
  participant STT_Registry as STT_HF_REPOS / normalize
  participant TaskManager
  participant STTBackend as get_stt_model() backend

  Client->>TranscriptionRoute: POST /transcribe (file, language?, model?)
  TranscriptionRoute->>DB: fetch saved stt_model (if model not provided)
  DB-->>TranscriptionRoute: saved stt_model or default
  TranscriptionRoute->>STT_Registry: normalize_stt_model_name(model)
  STT_Registry-->>TranscriptionRoute: canonical model_name
  TranscriptionRoute->>STTBackend: is_loaded() && model_name matches?
  alt model not cached
    TranscriptionRoute->>TaskManager: start background download task
    TaskManager-->>Client: HTTP 202 (download in progress)
  else model ready
    TranscriptionRoute->>STTBackend: transcribe(audio_path, language, model_name)
    STTBackend-->>TranscriptionRoute: transcription text
    TranscriptionRoute-->>Client: 200 OK (text)
  end
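The resolution order in the diagram (explicit parameter, then saved setting, then default) and the 202/200 branch can be sketched as plain functions. `resolve_stt_model`, `plan_transcription`, and the `cached_models` set are hypothetical names used only for illustration. The real route works with the database, the task manager, and the backend objects.

```python
from typing import Optional

DEFAULT_STT_MODEL = "whisper-turbo"  # default named in the release notes
LEGACY_WHISPER_SIZES = {"base", "small", "medium", "large", "turbo"}
KNOWN_STT_MODELS = {f"whisper-{s}" for s in LEGACY_WHISPER_SIZES} | {
    "parakeet-tdt-0.6b-v2",
    "parakeet-tdt-0.6b-v3",
}


def resolve_stt_model(explicit: Optional[str], saved: Optional[str]) -> str:
    """Pick explicit > saved > default, then normalize and validate (illustrative)."""
    candidate = explicit or saved or DEFAULT_STT_MODEL
    if candidate in LEGACY_WHISPER_SIZES:
        candidate = f"whisper-{candidate}"
    if candidate not in KNOWN_STT_MODELS:
        raise ValueError(f"unknown STT model: {candidate!r}")
    return candidate


def plan_transcription(
    explicit: Optional[str], saved: Optional[str], cached_models: set[str]
) -> tuple[int, str]:
    """Return (HTTP status, model): 202 if a download is needed, 200 if ready."""
    model = resolve_stt_model(explicit, saved)
    if model not in cached_models:
        return 202, model  # the caller would start the background download here
    return 200, model
```

For example, `plan_transcription(None, "small", {"whisper-small"})` resolves the saved legacy value to `whisper-small` and reports it as ready.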

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

jamiepine/voicebox#238: Both PRs touch STT backend model-loading and HF repo resolution in mlx_backend.py and pytorch_backend.py; this PR extends that Whisper-specific HF mapping to a unified STT_HF_REPOS registry covering Parakeet.
jamiepine/voicebox#295: Both PRs modify the /transcription route and ApiClient.transcribeAudio model parameter; this PR further rewires that path to use canonical SttModelId/model_name normalization instead of bare Whisper sizes.
jamiepine/voicebox#544: Both PRs touch backend/mcp_server/tools.py's _transcribe_file flow; #544 introduced the initial Whisper-only selection and this PR extends it to use normalized STT model names covering both Whisper and Parakeet.

Poem

🐇 A rabbit hops through model land,
Where Whisper ruled with steady hand.
But Parakeet joins the feathered choir,
With TDT v2 and v3 to admire!
whisper-turbo defaults now reign,
And migrations sweep the old names plain. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 69.81% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main feature addition in the changeset: support for NVIDIA Parakeet as a new STT model alongside Whisper.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/mcp_server/tools.py (1)

324-324: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Undefined variable model_size will cause NameError.

The return statement references model_size but the function defines model_name. This will crash at runtime.

🐛 Proposed fix

     return {
         "text": text,
         "duration": duration,
         "language": language,
-        "model": model_size,
+        "model": model_name,
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/mcp_server/tools.py` at line 324, In the return statement at the
specified location, the dictionary key "model" is assigned the undefined
variable model_size, but the function actually defines model_name. Replace
model_size with model_name in the "model" key assignment to fix the NameError
that will occur at runtime.

🧹 Nitpick comments (2)

backend/services/captures.py (1)

121-122: 💤 Low value

Consider renaming the variable for clarity.

The variable whisper now holds a generic STT backend that could be Parakeet. Consider renaming to stt or stt_backend for clarity.

♻️ Suggested rename

-        whisper = get_stt_model()
-        resolved_stt = normalize_stt_model_name(stt_model or whisper.model_name)
-        transcript = await whisper.transcribe(str(audio_path), language, resolved_stt)
+        stt = get_stt_model()
+        resolved_stt = normalize_stt_model_name(stt_model or stt.model_name)
+        transcript = await stt.transcribe(str(audio_path), language, resolved_stt)

Same change applies to retranscribe_capture at lines 223-225.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/captures.py` around lines 121 - 122, The variable name
`whisper` is misleading because the result of `get_stt_model()` now represents a
generic STT backend that could be Parakeet or other models, not specifically
Whisper. Rename the variable `whisper` to `stt` or `stt_backend` throughout the
code where it is assigned from `get_stt_model()` and update all its references
including the usage in the `normalize_stt_model_name()` call. Apply this same
variable rename change to the `retranscribe_capture` function at lines 223-225
where the identical pattern exists.

app/src/components/CapturesTab/CapturesTab.tsx (1)

136-147: ⚡ Quick win

Tighten formatSttModelName input type to SttModelId.

Line 136 currently accepts string, which bypasses compile-time guarantees for canonical STT ids and makes regressions easier to miss.

Proposed refactor

+import type { SttModelId } from '@/lib/api/types';
...
-function formatSttModelName(modelName: string): string {
+function formatSttModelName(modelName: SttModelId): string {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/CapturesTab/CapturesTab.tsx` around lines 136 - 147, The
formatSttModelName function currently accepts a generic string type for the
modelName parameter, which lacks compile-time validation for canonical STT model
IDs. Change the function signature to accept SttModelId type instead of string
as the parameter type for modelName. This ensures that only valid STT model
identifiers can be passed to the function and prevents potential regressions.
The function implementation logic remains the same, only the input parameter
type needs to be updated.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/mcp_server/tools.py`:
- Around line 294-299: The database session in the block where `not model` is
checked is not properly closed because `next(get_db())` bypasses the generator's
cleanup mechanism. Replace the import of `get_db` with `SessionLocal` from the
database module, and wrap the database session creation and usage with a context
manager (with statement) to ensure the session is properly closed after calling
`settings_service.get_capture_settings(db)` to retrieve the stt_model.

In `@backend/models.py`:
- Around line 19-23: The decimal points in the Parakeet model IDs within the
STT_MODEL_PATTERN regex are unescaped, which allows invalid model names like
parakeet-tdt-0x6b-v2 to pass validation since unescaped dots match any
character. Escape the decimal points in both parakeet-tdt-0.6b-v2 and
parakeet-tdt-0.6b-v3 entries by prefixing each dot with a backslash to ensure
only literal dots are matched and prevent invalid model IDs from being persisted
through the CaptureSettingsUpdate model.

In `@backend/requirements.txt`:
- Around line 12-16: The transformers requirement on line 12
(transformers>=5.6.0) conflicts with the documented qwen-tts pin of
transformers==4.57.3, and since qwen-tts is actively used in the backend
(imported in pytorch_backend.py and qwen_custom_voice_backend.py), this version
mismatch must be resolved. Either verify the actual transformers version
constraint required by qwen-tts and update line 12 to be compatible with that
version, or update the qwen-tts installation method to use a version compatible
with transformers>=5.6.0, then update the comment on line 15 to accurately
reflect the pinned transformers version if environments are intentionally split.

In `@backend/routes/transcription.py`:
- Around line 44-49: The database session created with next(get_db()) is not
properly closed, causing connection leaks on every request where model is not
provided. Instead of using next(get_db()), refactor this block to either pass
the database session as a dependency parameter to this function or use a proper
context manager pattern with get_db() that ensures the generator's cleanup logic
(the finally block) executes to close the connection. Consider modifying the
function signature to accept a db parameter, or if that's not feasible, wrap the
get_db() call with a context manager that guarantees the session closure after
settings_service.get_capture_settings(db) completes and model is retrieved from
saved.stt_model.

In `@Dockerfile`:
- Around line 17-40: The Dockerfile uses heredoc syntax with the EOFPKG
delimiter to create a package.json file, but parsers without heredoc support
cannot interpret it. To fix this, add the syntax directive
`# syntax=docker/dockerfile:1.4` (or a later version) as the first line of the
Dockerfile before any other instructions, which will enable support for heredoc
syntax. Note that `1.4+` is not a valid image tag, so the directive must name a
concrete version. Alternatively, if you prefer to avoid adding the directive,
refactor the RUN command that creates the package-temp.json file to use printf
or echo with escaped newlines instead of heredoc syntax, which will work with
the standard Dockerfile parser without requiring the directive.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b41097d-8dff-4605-bd18-4431cce2be3c

📥 Commits

Reviewing files that changed from the base of the PR and between b35b909 and 13280fa.

📒 Files selected for processing (27)

Dockerfile
app/src/components/CapturesTab/CapturesTab.tsx
app/src/components/ServerSettings/ModelManagement.tsx
app/src/components/ServerTab/CapturesPage.tsx
app/src/components/VoiceProfiles/ProfileForm.tsx
app/src/components/VoiceProfiles/SampleUpload.tsx
app/src/i18n/locales/en/translation.json
app/src/i18n/locales/ja/translation.json
app/src/i18n/locales/zh-CN/translation.json
app/src/i18n/locales/zh-TW/translation.json
app/src/lib/api/client.ts
app/src/lib/api/types.ts
app/src/lib/hooks/useTranscription.ts
backend/backends/__init__.py
backend/backends/mlx_backend.py
backend/backends/pytorch_backend.py
backend/build_binary.py
backend/database/migrations.py
backend/database/models.py
backend/mcp_server/tools.py
backend/models.py
backend/requirements.txt
backend/routes/captures.py
backend/routes/transcription.py
backend/services/captures.py
backend/services/transcribe.py
backend/voicebox-server.spec

coderabbitai · 2026-06-20T00:46:46Z

+    if not model:
+        from ..services import settings as settings_service
+        from ..database import get_db
+        db = next(get_db())
+        saved = settings_service.get_capture_settings(db)
+        model = saved.stt_model if saved else None


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Database session is not properly closed.

Same issue as in transcription.py: using next(get_db()) bypasses the generator's cleanup. Consider using SessionLocal() directly with a context manager.

🐛 Proposed fix

     if not model:
         from ..services import settings as settings_service
-        from ..database import get_db
-        db = next(get_db())
-        saved = settings_service.get_capture_settings(db)
-        model = saved.stt_model if saved else None
+        from ..database import SessionLocal
+        with SessionLocal() as db:
+            saved = settings_service.get_capture_settings(db)
+            model = saved.stt_model if saved else None


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/mcp_server/tools.py` around lines 294 - 299, The database session in the block where `not model` is checked is not properly closed because `next(get_db())` bypasses the generator's cleanup mechanism. Replace the import of `get_db` with `SessionLocal` from the database module, and wrap the database session creation and usage with a context manager (with statement) to ensure the session is properly closed after calling `settings_service.get_capture_settings(db)` to retrieve the stt_model.

coderabbitai · 2026-06-20T00:46:47Z

+STT_MODEL_PATTERN = (
+    r"^("
+    r"whisper-base|whisper-small|whisper-medium|whisper-large|whisper-turbo"
+    r"|parakeet-tdt-0.6b-v2|parakeet-tdt-0.6b-v3"
+    r"|base|small|medium|large|turbo"  # legacy bare Whisper sizes


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import re
pattern = r"^(whisper-base|whisper-small|whisper-medium|whisper-large|whisper-turbo|parakeet-tdt-0.6b-v2|parakeet-tdt-0.6b-v3|base|small|medium|large|turbo)$"
tests = ["parakeet-tdt-0.6b-v2", "parakeet-tdt-0x6b-v2"]
for t in tests:
    print(t, bool(re.fullmatch(pattern, t)))
PY

Repository: jamiepine/voicebox

Length of output: 114

🏁 Script executed:

# Check the actual code in backend/models.py at the relevant lines
head -30 backend/models.py | tail -15

Repository: jamiepine/voicebox

Length of output: 598

🏁 Script executed:

# Verify the pattern is indeed used at the mentioned line numbers
grep -n "STT_MODEL_PATTERN" backend/models.py

Repository: jamiepine/voicebox

Length of output: 392

🏁 Script executed:

# Check the context around Line 288 to confirm it's a settings update path
sed -n '280,295p' backend/models.py

Repository: jamiepine/voicebox

Length of output: 637

Escape the decimal points in Parakeet model IDs in the shared regex.

Line 22 uses 0.6b with unescaped dots, allowing invalid values like parakeet-tdt-0x6b-v2 to pass validation. These can be persisted via the CaptureSettingsUpdate model at line 288, causing failures during later registry lookups.

Proposed fix

 STT_MODEL_PATTERN = (
     r"^("
     r"whisper-base|whisper-small|whisper-medium|whisper-large|whisper-turbo"
-    r"|parakeet-tdt-0.6b-v2|parakeet-tdt-0.6b-v3"
+    r"|parakeet-tdt-0\.6b-v2|parakeet-tdt-0\.6b-v3"
     r"|base|small|medium|large|turbo"  # legacy bare Whisper sizes
     r")$"
 )
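The effect of the unescaped dot is easy to reproduce with the standard `re` module. The sketch below compares the two Parakeet alternatives before and after escaping, and also shows one way to avoid hand-escaping altogether by building the pattern from a list with `re.escape`. The `MODEL_IDS` list and the `GENERATED` pattern are illustrative and not part of the PR.

```python
import re

# The Parakeet alternatives as written in the PR, and with the dots escaped.
UNESCAPED = r"^(parakeet-tdt-0.6b-v2|parakeet-tdt-0.6b-v3)$"
ESCAPED = r"^(parakeet-tdt-0\.6b-v2|parakeet-tdt-0\.6b-v3)$"

# Illustrative alternative: generate the pattern so every literal is escaped.
MODEL_IDS = [
    "whisper-base", "whisper-small", "whisper-medium", "whisper-large", "whisper-turbo",
    "parakeet-tdt-0.6b-v2", "parakeet-tdt-0.6b-v3",
]
GENERATED = "^(" + "|".join(re.escape(model_id) for model_id in MODEL_IDS) + ")$"

for candidate in ("parakeet-tdt-0.6b-v2", "parakeet-tdt-0x6b-v2"):
    print(
        candidate,
        "unescaped:", bool(re.match(UNESCAPED, candidate)),
        "escaped:", bool(re.match(ESCAPED, candidate)),
        "generated:", bool(re.match(GENERATED, candidate)),
    )
```

The unescaped pattern accepts `parakeet-tdt-0x6b-v2` because `.` matches any character; the escaped and generated patterns accept only the literal dot.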


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/models.py` around lines 19 - 23, The decimal points in the Parakeet model IDs within the STT_MODEL_PATTERN regex are unescaped, which allows invalid model names like parakeet-tdt-0x6b-v2 to pass validation since unescaped dots match any character. Escape the decimal points in both parakeet-tdt-0.6b-v2 and parakeet-tdt-0.6b-v3 entries by prefixing each dot with a backslash to ensure only literal dots are matched and prevent invalid model IDs from being persisted through the CaptureSettingsUpdate model.

coderabbitai · 2026-06-20T00:46:47Z

+transformers>=5.6.0
 accelerate>=0.26.0
 huggingface_hub>=0.20.0
-qwen-tts>=0.0.5
+# qwen-tts: installed separately with --no-deps (pins transformers==4.57.3)



⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Dependency pins =="
rg -n --no-heading -C2 '(^|\s)(transformers|qwen-tts|qwen_tts)(==|>=|<=|~=|@|:| )|--no-deps' \
  requirements.txt backend/requirements.txt Dockerfile README.md || true

echo
echo "== Runtime usage of qwen_tts =="
rg -n --no-heading -C2 '\bqwen_tts\b|qwen-tts' backend || true

echo
echo "== Build/bundle references =="
rg -n --no-heading -C2 'qwen_tts|qwen-tts|transformers' backend/build_binary.py backend/voicebox-server.spec || true

echo
echo "== Specifier overlap sanity check =="
python - <<'PY'
from packaging.specifiers import SpecifierSet
from packaging.version import Version

s_main = SpecifierSet(">=5.6.0")
s_qwen = SpecifierSet("==4.57.3")
v = Version("4.57.3")
print("4.57.3 satisfies main spec:", v in s_main)
print("4.57.3 satisfies qwen spec:", v in s_qwen)
PY

Repository: jamiepine/voicebox

Length of output: 18323

Resolve the transformers version conflict between direct dependency and qwen-tts pin.

Line 12 specifies transformers>=5.6.0, but Line 15 documents qwen-tts as pinning transformers==4.57.3. Version 4.57.3 does not satisfy the >=5.6.0 requirement, and since qwen_tts is actively used in the backend (imported in pytorch_backend.py and qwen_custom_voice_backend.py) and bundled in the binary build, this creates a version conflict. Clarify whether environments are intentionally split, document the required version constraints, or adjust the floor to be compatible.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/requirements.txt` around lines 12 - 16, The transformers requirement on line 12 (transformers>=5.6.0) conflicts with the documented qwen-tts pin of transformers==4.57.3, and since qwen-tts is actively used in the backend (imported in pytorch_backend.py and qwen_custom_voice_backend.py), this version mismatch must be resolved. Either verify the actual transformers version constraint required by qwen-tts and update line 12 to be compatible with that version, or update the qwen-tts installation method to use a version compatible with transformers>=5.6.0, then update the comment on line 15 to accurately reflect the pinned transformers version if environments are intentionally split.

coderabbitai · 2026-06-20T00:46:47Z

+        if not model:
+            from ..services import settings as settings_service
+            from ..database import get_db
+            db = next(get_db())
+            saved = settings_service.get_capture_settings(db)
+            model = saved.stt_model if saved else None


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Database session is not properly closed.

Using next(get_db()) bypasses the generator's finally block that closes the session. This leaks database connections on every request where model is not provided.

🐛 Proposed fix using context manager pattern

         # Resolve model: explicit param > user's saved setting > backend default
         if not model:
             from ..services import settings as settings_service
             from ..database import get_db
-            db = next(get_db())
-            saved = settings_service.get_capture_settings(db)
-            model = saved.stt_model if saved else None
+            db_gen = get_db()
+            db = next(db_gen)
+            try:
+                saved = settings_service.get_capture_settings(db)
+                model = saved.stt_model if saved else None
+            finally:
+                try:
+                    next(db_gen)
+                except StopIteration:
+                    pass

Or more cleanly, use contextlib:

from contextlib import contextmanager

# In the function:
if not model:
    from ..services import settings as settings_service
    from ..database import get_db
    from contextlib import closing
    with closing(next(iter_db := get_db())) as db:
        # Note: This still won't work correctly. Better approach:
        pass

The cleanest fix is to extract a helper or pass db as a dependency:

+from ..database import SessionLocal
+
 # In the function:
 if not model:
     from ..services import settings as settings_service
-    from ..database import get_db
-    db = next(get_db())
-    saved = settings_service.get_capture_settings(db)
-    model = saved.stt_model if saved else None
+    with SessionLocal() as db:
+        saved = settings_service.get_capture_settings(db)
+        model = saved.stt_model if saved else None
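If importing `SessionLocal` directly is not desirable, another option is to wrap the existing generator dependency with `contextlib.contextmanager`, which runs the generator's `finally` block when the `with` block exits. The sketch below is self-contained and uses stand-ins: `FakeSession` and this `get_db` are assumptions that imitate the yield-then-close shape the review describes, not the project's real code.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeSession:
    """Stand-in for a database session; only tracks whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def get_db() -> Iterator[FakeSession]:
    """Assumed shape of the dependency: yield a session, close it in finally."""
    db = FakeSession()
    try:
        yield db
    finally:
        db.close()


# contextmanager() turns the generator function into a context manager factory,
# so cleanup happens deterministically at the end of the with block.
db_session = contextmanager(get_db)


def read_saved_model() -> tuple[FakeSession, str]:
    """Use the session inside a with block; the placeholder replaces the settings lookup."""
    with db_session() as db:
        model = "whisper-turbo"  # placeholder for settings_service.get_capture_settings(db)
    return db, model
```

After `read_saved_model()` returns, the session's `closed` flag is set, without the manual `next(db_gen)` / `StopIteration` handling from the first proposed fix.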

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/routes/transcription.py` around lines 44 - 49, The database session created with next(get_db()) is not properly closed, causing connection leaks on every request where model is not provided. Instead of using next(get_db()), refactor this block to either pass the database session as a dependency parameter to this function or use a proper context manager pattern with get_db() that ensures the generator's cleanup logic (the finally block) executes to close the connection. Consider modifying the function signature to accept a db parameter, or if that's not feasible, wrap the get_db() call with a context manager that guarantees the session closure after settings_service.get_capture_settings(db) completes and model is retrieved from saved.stt_model.

coderabbitai · 2026-06-20T00:46:47Z

+RUN cat > package-temp.json << 'EOFPKG' && mv package-temp.json package.json
+{
+  "name": "voicebox-container",
+  "version": "0.5.0",
+  "private": true,
+  "workspaces": [
+    "app",
+    "web"
+  ],
+  "scripts": {
+    "build:web": "cd web && bun run build"
+  },
+  "dependencies": {
+    "loaders.css": "^0.1.2",
+    "react-loaders": "^3.0.1"
+  },
+  "devDependencies": {
+    "@biomejs/biome": "2.3.12",
+    "@types/node": "^20.0.0",
+    "tailwindcss": "^4.1.18",
+    "typescript": "^5.6.0"
+  }
+}
+EOFPKG


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify parser compatibility signals for this Dockerfile change.
set -euo pipefail

echo "== Dockerfile first 5 lines (check syntax directive) =="
sed -n '1,5p' Dockerfile

echo
echo "== Locate the heredoc RUN block =="
rg -n -C3 "package-temp.json|EOFPKG|RUN cat >" Dockerfile || true

echo
echo "== Hadolint parse check (if installed) =="
if command -v hadolint >/dev/null 2>&1; then
  hadolint Dockerfile || true
else
  echo "hadolint not installed in this environment."
fi

Repository: jamiepine/voicebox

Length of output: 1045

Heredoc JSON block requires explicit syntax directive or parser-safe rewrite.

Hadolint reports a DL1000 parse error on line 18 (unexpected '{'). The Dockerfile uses heredoc syntax (<< 'EOFPKG') without a # syntax=docker/dockerfile:1.4 (or later) directive at the top of the file, so parsers without heredoc support read the JSON body as Dockerfile instructions. On such builders the build fails at the parse phase.

Add the syntax directive at the start of the file, or refactor to avoid heredoc syntax:

Option 1: Add syntax directive (1 line)

+# syntax=docker/dockerfile:1.4
 # ============================================================
 # Voicebox — Local TTS Server with Web UI (CPU)

Option 2: Refactor with printf (parser-safe, no directive needed)

-RUN cat > package-temp.json << 'EOFPKG' && mv package-temp.json package.json
-{
-  "name": "voicebox-container",
-  "version": "0.5.0",
-  "private": true,
-  "workspaces": [
-    "app",
-    "web"
-  ],
-  "scripts": {
-    "build:web": "cd web && bun run build"
-  },
-  "dependencies": {
-    "loaders.css": "^0.1.2",
-    "react-loaders": "^3.0.1"
-  },
-  "devDependencies": {
-    "@biomejs/biome": "2.3.12",
-    "@types/node": "^20.0.0",
-    "tailwindcss": "^4.1.18",
-    "typescript": "^5.6.0"
-  }
-}
-EOFPKG
+RUN printf '%s\n' \
+'{' \
+'  "name": "voicebox-container",' \
+'  "version": "0.5.0",' \
+'  "private": true,' \
+'  "workspaces": ["app", "web"],' \
+'  "scripts": { "build:web": "cd web && bun run build" },' \
+'  "dependencies": {' \
+'    "loaders.css": "^0.1.2",' \
+'    "react-loaders": "^3.0.1"' \
+'  },' \
+'  "devDependencies": {' \
+'    "@biomejs/biome": "2.3.12",' \
+'    "@types/node": "^20.0.0",' \
+'    "tailwindcss": "^4.1.18",' \
+'    "typescript": "^5.6.0"' \
+'  }' \
+'}' > package.json

🧰 Tools

🪛 Hadolint (2.14.0)

[error] 18-18: unexpected '{'
expecting '#', '', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input

(DL1000)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile` around lines 17 - 40, the Dockerfile uses heredoc syntax with the EOFPKG delimiter to create a package.json file, but a parser without heredoc support cannot interpret this without an explicit syntax directive. To fix this, add the syntax directive `# syntax=docker/dockerfile:1.4` (or a later version) as the first line of the Dockerfile, before any other instructions, to enable heredoc support. Alternatively, to avoid adding the directive, refactor the RUN command that creates package-temp.json to use printf or echo with escaped newlines instead of a heredoc, which works with the standard Dockerfile parser.

Source: Linters/SAST tools

Added Parakeet as a STT model

13280fa

coderabbitai Bot reviewed Jun 20, 2026

Added Parakeet as a STT model #766
dustinwloring1988 wants to merge 1 commit into jamiepine:main from dustinwloring1988:parakeet
