Added Parakeet as a STT model #766

Open
dustinwloring1988 wants to merge 1 commit into jamiepine:main from dustinwloring1988:parakeet

Conversation

dustinwloring1988 commented Jun 20, 2026

Summary

Adds support for NVIDIA Parakeet speech-to-text integration in Voicebox.

What's Included

  • Parakeet v2
  • Parakeet v3

Notes

  • Had to update the transformers version

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for NVIDIA Parakeet speech-to-text models alongside Whisper
    • Reorganized transcription model selection with grouped Whisper and Parakeet options
    • Updated model identifiers for consistency (e.g., whisper-turbo, parakeet-tdt-0.6b-v2)
  • Bug Fixes

    • Updated default STT model from turbo to whisper-turbo
    • Improved Docker build to avoid repository modifications
  • Localization

    • Updated translations in English, Japanese, Chinese (Simplified and Traditional)

coderabbitai Bot (Contributor) commented Jun 20, 2026

Walkthrough

Adds NVIDIA Parakeet TDT 0.6B (v2/v3) as a second STT engine alongside Whisper. A new canonical SttModelId type and STT_MODEL_PATTERN regex replace legacy bare-size identifiers throughout backend models, routes, services, ML backends (MLX and PyTorch), MCP tools, and the frontend API/UI. A DB startup migration rewrites stored bare Whisper sizes to whisper-<size>. The Dockerfile frontend build stage is independently refactored to generate a minimal temporary package.json instead of mutating the repo's existing one.
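
A minimal sketch of what such a startup migration could look like, assuming a SQLite table named capture_settings with an stt_model text column. The function name migrate_stt_model_names and the one-column schema are hypothetical; the PR's actual code in backend/database/migrations.py is not shown in this thread.

```python
import sqlite3

# Legacy bare Whisper sizes, as listed in the review of STT_MODEL_PATTERN.
LEGACY_WHISPER_SIZES = ("base", "small", "medium", "large", "turbo")


def migrate_stt_model_names(conn: sqlite3.Connection) -> int:
    """Rewrite bare Whisper sizes to whisper-<size>; return the rows changed."""
    changed = 0
    for size in LEGACY_WHISPER_SIZES:
        cur = conn.execute(
            "UPDATE capture_settings SET stt_model = ? WHERE stt_model = ?",
            (f"whisper-{size}", size),
        )
        changed += cur.rowcount
    conn.commit()
    return changed


# Usage against an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE capture_settings (stt_model TEXT)")
conn.executemany(
    "INSERT INTO capture_settings VALUES (?)",
    [("turbo",), ("whisper-small",), ("parakeet-tdt-0.6b-v2",)],
)
first_run = migrate_stt_model_names(conn)
second_run = migrate_stt_model_names(conn)  # nothing left to rewrite
rows = [r[0] for r in conn.execute(
    "SELECT stt_model FROM capture_settings ORDER BY rowid"
)]
```

Matching on exact legacy values keeps the rewrite safe to run on every startup: already-canonical and Parakeet rows are never touched.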

Changes

Parakeet STT Engine Support

Layer: STT type contracts and backend registry
File(s): app/src/lib/api/types.ts, app/src/lib/api/client.ts, app/src/lib/hooks/useTranscription.ts, backend/models.py, backend/backends/__init__.py
Summary: Introduces SttModelId union type and STT_MODEL_PATTERN regex covering both whisper-* and parakeet-* identifiers. Adds PARAKEET_HF_REPOS, STT_HF_REPOS, STT_ENGINES, Parakeet ModelConfig entries, and helpers normalize_stt_model_name, stt_model_name_to_repo, is_parakeet_model_name. Frontend API client method signatures switch from WhisperModelSize to SttModelId.

Layer: STT service abstraction and DB migration
File(s): backend/services/transcribe.py, backend/database/models.py, backend/database/migrations.py
Summary: Adds get_stt_model() and unload_stt_model() as canonical STT entrypoints; retains get_whisper_model()/unload_whisper_model() as deprecated aliases. Changes CaptureSettings.stt_model column default to whisper-turbo and adds a startup migration that rewrites legacy bare Whisper sizes to whisper-<size> in capture_settings.

Layer: MLX and PyTorch STT backends updated for Parakeet
File(s): backend/backends/mlx_backend.py, backend/backends/pytorch_backend.py
Summary: Both backends refactored from Whisper-only to family-aware: constructors take model_name registry key; loading branches on is_parakeet_model_name; PyTorch _load_model_sync selects AutoProcessor+AutoModelForTDT vs Whisper loaders; language hints applied only for Whisper; Parakeet generation extracts output.sequences.

Layer: Backend registry lifecycle hooks for Parakeet
File(s): backend/backends/__init__.py
Summary: unload_model_by_config, check_model_loaded, and get_model_load_func updated so both whisper and parakeet engines route through transcribe.get_stt_model(), matching on config.model_name instead of the prior Whisper-specific config.model_size.

Layer: Routes and services updated for STT model names
File(s): backend/routes/transcription.py, backend/routes/captures.py, backend/services/captures.py, backend/mcp_server/tools.py
Summary: Transcription route resolves STT model from explicit field → DB saved setting → default, validates against STT_HF_REPOS, triggers background download when uncached, and calls stt.transcribe(..., model_name). Captures readiness normalizes stt_model via normalize_stt_model_name. Capture service and MCP tool switch from get_whisper_model/model_size to get_stt_model/model_name.

Layer: Frontend UI: grouped STT dropdown and display name helper
File(s): app/src/components/ServerTab/CapturesPage.tsx, app/src/components/CapturesTab/CapturesTab.tsx, app/src/components/ServerSettings/ModelManagement.tsx, app/src/components/VoiceProfiles/ProfileForm.tsx, app/src/components/VoiceProfiles/SampleUpload.tsx
Summary: Transcription model dropdown rebuilt with SelectGroup/SelectLabel grouping Whisper and Parakeet options; CapturesTab adds formatSttModelName for display labels; ModelManagement adds Parakeet descriptions and includes parakeet* in STT filter; ProfileForm and SampleUpload read and forward captureSettings.stt_model to transcription requests.

Layer: i18n strings and build artifact updates
File(s): app/src/i18n/locales/*/translation.json, backend/build_binary.py, backend/voicebox-server.spec, backend/requirements.txt
Summary: Translation files (en, ja, zh-CN, zh-TW) rename Whisper option keys to whisper-*, add Parakeet model entries with group labels, and extend tail qualifiers. PyInstaller spec and build_binary.py add transformers.models.parakeet to hidden imports; requirements.txt raises transformers minimum to >=5.6.0 and switches Zipvoice to PyPI.
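
The helper names normalize_stt_model_name and is_parakeet_model_name come from the summary above, but their bodies are not shown in this thread. The following is only a sketch of how they might behave, assuming legacy bare sizes map to whisper-<size> and that Parakeet identifiers share the parakeet- prefix; the constants WHISPER_SIZES, PARAKEET_IDS, and CANONICAL_IDS are hypothetical.

```python
# Hypothetical constants; the real registry lives in backend/backends/__init__.py.
WHISPER_SIZES = ("base", "small", "medium", "large", "turbo")
PARAKEET_IDS = ("parakeet-tdt-0.6b-v2", "parakeet-tdt-0.6b-v3")
CANONICAL_IDS = tuple(f"whisper-{s}" for s in WHISPER_SIZES) + PARAKEET_IDS


def normalize_stt_model_name(name: str) -> str:
    """Map a legacy bare Whisper size to its canonical id; pass canonical ids through."""
    if name in WHISPER_SIZES:
        return f"whisper-{name}"
    if name in CANONICAL_IDS:
        return name
    raise ValueError(f"Unknown STT model: {name!r}")


def is_parakeet_model_name(name: str) -> bool:
    """Return True when the (normalized) name belongs to the Parakeet family."""
    return normalize_stt_model_name(name).startswith("parakeet-")
```

A backend can then branch on is_parakeet_model_name(model_name) once, at load time, instead of scattering string checks through the transcription path.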

Dockerfile Frontend Build Refactor

Layer: Generate minimal package.json for web build
File(s): Dockerfile
Summary: Frontend Bun build stage now generates a temporary minimal package.json with only app/web workspaces and required dependencies via echo, replacing the prior sed-based stripping of the repo's root package.json. Build command changes from bunx --bun vite build to bunx vite build.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant TranscriptionRoute as /transcribe endpoint
  participant DB as capture_settings DB
  participant STT_Registry as STT_HF_REPOS / normalize
  participant TaskManager
  participant STTBackend as get_stt_model() backend

  Client->>TranscriptionRoute: POST /transcribe (file, language?, model?)
  TranscriptionRoute->>DB: fetch saved stt_model (if model not provided)
  DB-->>TranscriptionRoute: saved stt_model or default
  TranscriptionRoute->>STT_Registry: normalize_stt_model_name(model)
  STT_Registry-->>TranscriptionRoute: canonical model_name
  TranscriptionRoute->>STTBackend: is_loaded() && model_name matches?
  alt model not cached
    TranscriptionRoute->>TaskManager: start background download task
    TaskManager-->>Client: HTTP 202 (download in progress)
  else model ready
    TranscriptionRoute->>STTBackend: transcribe(audio_path, language, model_name)
    STTBackend-->>TranscriptionRoute: transcription text
    TranscriptionRoute-->>Client: 200 OK (text)
  end
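
The resolution order in the diagram (explicit request field, then the saved setting, then the default) can be sketched as a small pure function. This is an illustration under assumptions, not the route's actual code: resolve_stt_model and KNOWN_STT_MODELS are hypothetical names, and normalization of legacy names is left out for brevity.

```python
from typing import Optional

DEFAULT_STT_MODEL = "whisper-turbo"  # default named in the release notes

# Hypothetical stand-in for the keys of STT_HF_REPOS.
KNOWN_STT_MODELS = {
    "whisper-base", "whisper-small", "whisper-medium", "whisper-large",
    "whisper-turbo", "parakeet-tdt-0.6b-v2", "parakeet-tdt-0.6b-v3",
}


def resolve_stt_model(explicit: Optional[str], saved: Optional[str]) -> str:
    """Pick the model: explicit request field > saved setting > default."""
    candidate = explicit or saved or DEFAULT_STT_MODEL
    if candidate not in KNOWN_STT_MODELS:
        raise ValueError(f"Unsupported STT model: {candidate!r}")
    return candidate
```

Keeping the precedence in one function makes it straightforward to reuse from both the HTTP route and the MCP tool, which currently repeat the same lookup.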

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • jamiepine/voicebox#238: Both PRs touch STT backend model-loading and HF repo resolution in mlx_backend.py and pytorch_backend.py; this PR extends that Whisper-specific HF mapping to a unified STT_HF_REPOS registry covering Parakeet.
  • jamiepine/voicebox#295: Both PRs modify the /transcription route and ApiClient.transcribeAudio model parameter; this PR further rewires that path to use canonical SttModelId/model_name normalization instead of bare Whisper sizes.
  • jamiepine/voicebox#544: Both PRs touch backend/mcp_server/tools.py's _transcribe_file flow; #544 introduced the initial Whisper-only selection and this PR extends it to use normalized STT model names covering both Whisper and Parakeet.

Poem

🐇 A rabbit hops through model land,
Where Whisper ruled with steady hand.
But Parakeet joins the feathered choir,
With TDT v2 and v3 to admire!
whisper-turbo defaults now reign,
And migrations sweep the old names plain.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 69.81%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main feature addition in the changeset: support for NVIDIA Parakeet as a new STT model alongside Whisper.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/mcp_server/tools.py (1)

324-324: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Undefined variable model_size will cause NameError.

The return statement references model_size but the function defines model_name. This will crash at runtime.

🐛 Proposed fix
     return {
         "text": text,
         "duration": duration,
         "language": language,
-        "model": model_size,
+        "model": model_name,
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/mcp_server/tools.py` at line 324, In the return statement at the
specified location, the dictionary key "model" is assigned the undefined
variable model_size, but the function actually defines model_name. Replace
model_size with model_name in the "model" key assignment to fix the NameError
that will occur at runtime.
🧹 Nitpick comments (2)
backend/services/captures.py (1)

121-122: 💤 Low value

Consider renaming the variable for clarity.

The variable whisper now holds a generic STT backend that could be Parakeet. Consider renaming to stt or stt_backend for clarity.

♻️ Suggested rename
-        whisper = get_stt_model()
-        resolved_stt = normalize_stt_model_name(stt_model or whisper.model_name)
-        transcript = await whisper.transcribe(str(audio_path), language, resolved_stt)
+        stt = get_stt_model()
+        resolved_stt = normalize_stt_model_name(stt_model or stt.model_name)
+        transcript = await stt.transcribe(str(audio_path), language, resolved_stt)

Same change applies to retranscribe_capture at lines 223-225.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/captures.py` around lines 121 - 122, The variable name
`whisper` is misleading because the result of `get_stt_model()` now represents a
generic STT backend that could be Parakeet or other models, not specifically
Whisper. Rename the variable `whisper` to `stt` or `stt_backend` throughout the
code where it is assigned from `get_stt_model()` and update all its references
including the usage in the `normalize_stt_model_name()` call. Apply this same
variable rename change to the `retranscribe_capture` function at lines 223-225
where the identical pattern exists.
app/src/components/CapturesTab/CapturesTab.tsx (1)

136-147: ⚡ Quick win

Tighten formatSttModelName input type to SttModelId.

Line 136 currently accepts string, which bypasses compile-time guarantees for canonical STT ids and makes regressions easier to miss.

Proposed refactor
+import type { SttModelId } from '@/lib/api/types';
...
-function formatSttModelName(modelName: string): string {
+function formatSttModelName(modelName: SttModelId): string {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/CapturesTab/CapturesTab.tsx` around lines 136 - 147, The
formatSttModelName function currently accepts a generic string type for the
modelName parameter, which lacks compile-time validation for canonical STT model
IDs. Change the function signature to accept SttModelId type instead of string
as the parameter type for modelName. This ensures that only valid STT model
identifiers can be passed to the function and prevents potential regressions.
The function implementation logic remains the same, only the input parameter
type needs to be updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/mcp_server/tools.py`:
- Around line 294-299: The database session in the block where `not model` is
checked is not properly closed because `next(get_db())` bypasses the generator's
cleanup mechanism. Replace the import of `get_db` with `SessionLocal` from the
database module, and wrap the database session creation and usage with a context
manager (with statement) to ensure the session is properly closed after calling
`settings_service.get_capture_settings(db)` to retrieve the stt_model.

In `@backend/models.py`:
- Around line 19-23: The decimal points in the Parakeet model IDs within the
STT_MODEL_PATTERN regex are unescaped, which allows invalid model names like
parakeet-tdt-0x6b-v2 to pass validation since unescaped dots match any
character. Escape the decimal points in both parakeet-tdt-0.6b-v2 and
parakeet-tdt-0.6b-v3 entries by prefixing each dot with a backslash to ensure
only literal dots are matched and prevent invalid model IDs from being persisted
through the CaptureSettingsUpdate model.

In `@backend/requirements.txt`:
- Around line 12-16: The transformers requirement on line 12
(transformers>=5.6.0) conflicts with the documented qwen-tts pin of
transformers==4.57.3, and since qwen-tts is actively used in the backend
(imported in pytorch_backend.py and qwen_custom_voice_backend.py), this version
mismatch must be resolved. Either verify the actual transformers version
constraint required by qwen-tts and update line 12 to be compatible with that
version, or update the qwen-tts installation method to use a version compatible
with transformers>=5.6.0, then update the comment on line 15 to accurately
reflect the pinned transformers version if environments are intentionally split.

In `@backend/routes/transcription.py`:
- Around line 44-49: The database session created with next(get_db()) is not
properly closed, causing connection leaks on every request where model is not
provided. Instead of using next(get_db()), refactor this block to either pass
the database session as a dependency parameter to this function or use a proper
context manager pattern with get_db() that ensures the generator's cleanup logic
(the finally block) executes to close the connection. Consider modifying the
function signature to accept a db parameter, or if that's not feasible, wrap the
get_db() call with a context manager that guarantees the session closure after
settings_service.get_capture_settings(db) completes and model is retrieved from
saved.stt_model.

In `@Dockerfile`:
- Around line 17-40: The Dockerfile uses heredoc syntax with the EOFPKG
delimiter to create a package.json file, but the Dockerfile parser cannot
interpret this without an explicit syntax directive. To fix this, add the syntax
directive `# syntax=docker/dockerfile:1.4` as the first line of the Dockerfile
before any other instructions, which will enable support for heredoc syntax.
Alternatively, if you prefer to avoid adding the directive, refactor the RUN
command that creates the package-temp.json file to use printf or echo with
escaped newlines instead of heredoc syntax, which will work with the standard
Dockerfile parser without requiring the directive.

---

Outside diff comments:
In `@backend/mcp_server/tools.py`:
- Line 324: In the return statement at the specified location, the dictionary
key "model" is assigned the undefined variable model_size, but the function
actually defines model_name. Replace model_size with model_name in the "model"
key assignment to fix the NameError that will occur at runtime.

---

Nitpick comments:
In `@app/src/components/CapturesTab/CapturesTab.tsx`:
- Around line 136-147: The formatSttModelName function currently accepts a
generic string type for the modelName parameter, which lacks compile-time
validation for canonical STT model IDs. Change the function signature to accept
SttModelId type instead of string as the parameter type for modelName. This
ensures that only valid STT model identifiers can be passed to the function and
prevents potential regressions. The function implementation logic remains the
same, only the input parameter type needs to be updated.

In `@backend/services/captures.py`:
- Around line 121-122: The variable name `whisper` is misleading because the
result of `get_stt_model()` now represents a generic STT backend that could be
Parakeet or other models, not specifically Whisper. Rename the variable
`whisper` to `stt` or `stt_backend` throughout the code where it is assigned
from `get_stt_model()` and update all its references including the usage in the
`normalize_stt_model_name()` call. Apply this same variable rename change to the
`retranscribe_capture` function at lines 223-225 where the identical pattern
exists.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b41097d-8dff-4605-bd18-4431cce2be3c

📥 Commits

Reviewing files that changed from the base of the PR and between b35b909 and 13280fa.

📒 Files selected for processing (27)
  • Dockerfile
  • app/src/components/CapturesTab/CapturesTab.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/components/ServerTab/CapturesPage.tsx
  • app/src/components/VoiceProfiles/ProfileForm.tsx
  • app/src/components/VoiceProfiles/SampleUpload.tsx
  • app/src/i18n/locales/en/translation.json
  • app/src/i18n/locales/ja/translation.json
  • app/src/i18n/locales/zh-CN/translation.json
  • app/src/i18n/locales/zh-TW/translation.json
  • app/src/lib/api/client.ts
  • app/src/lib/api/types.ts
  • app/src/lib/hooks/useTranscription.ts
  • backend/backends/__init__.py
  • backend/backends/mlx_backend.py
  • backend/backends/pytorch_backend.py
  • backend/build_binary.py
  • backend/database/migrations.py
  • backend/database/models.py
  • backend/mcp_server/tools.py
  • backend/models.py
  • backend/requirements.txt
  • backend/routes/captures.py
  • backend/routes/transcription.py
  • backend/services/captures.py
  • backend/services/transcribe.py
  • backend/voicebox-server.spec

Comment thread backend/mcp_server/tools.py
Comment on lines +294 to +299
if not model:
    from ..services import settings as settings_service
    from ..database import get_db
    db = next(get_db())
    saved = settings_service.get_capture_settings(db)
    model = saved.stt_model if saved else None


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Database session is not properly closed.

Same issue as in transcription.py: using next(get_db()) bypasses the generator's cleanup. Consider using SessionLocal() directly with a context manager.

🐛 Proposed fix
     if not model:
         from ..services import settings as settings_service
-        from ..database import get_db
-        db = next(get_db())
-        saved = settings_service.get_capture_settings(db)
-        model = saved.stt_model if saved else None
+        from ..database import SessionLocal
+        with SessionLocal() as db:
+            saved = settings_service.get_capture_settings(db)
+            model = saved.stt_model if saved else None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/mcp_server/tools.py` around lines 294 - 299, The database session in
the block where `not model` is checked is not properly closed because
`next(get_db())` bypasses the generator's cleanup mechanism. Replace the import
of `get_db` with `SessionLocal` from the database module, and wrap the database
session creation and usage with a context manager (with statement) to ensure the
session is properly closed after calling
`settings_service.get_capture_settings(db)` to retrieve the stt_model.

Comment thread backend/models.py
Comment on lines +19 to +23
STT_MODEL_PATTERN = (
    r"^("
    r"whisper-base|whisper-small|whisper-medium|whisper-large|whisper-turbo"
    r"|parakeet-tdt-0.6b-v2|parakeet-tdt-0.6b-v3"
    r"|base|small|medium|large|turbo"  # legacy bare Whisper sizes


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import re
pattern = r"^(whisper-base|whisper-small|whisper-medium|whisper-large|whisper-turbo|parakeet-tdt-0.6b-v2|parakeet-tdt-0.6b-v3|base|small|medium|large|turbo)$"
tests = ["parakeet-tdt-0.6b-v2", "parakeet-tdt-0x6b-v2"]
for t in tests:
    print(t, bool(re.fullmatch(pattern, t)))
PY

Repository: jamiepine/voicebox

Length of output: 114


🏁 Script executed:

# Check the actual code in backend/models.py at the relevant lines
head -30 backend/models.py | tail -15

Repository: jamiepine/voicebox

Length of output: 598


🏁 Script executed:

# Verify the pattern is indeed used at the mentioned line numbers
grep -n "STT_MODEL_PATTERN" backend/models.py

Repository: jamiepine/voicebox

Length of output: 392


🏁 Script executed:

# Check the context around Line 288 to confirm it's a settings update path
sed -n '280,295p' backend/models.py

Repository: jamiepine/voicebox

Length of output: 637


Escape the decimal points in Parakeet model IDs in the shared regex.

Line 22 uses 0.6b with unescaped dots, allowing invalid values like parakeet-tdt-0x6b-v2 to pass validation. These can be persisted via the CaptureSettingsUpdate model at line 288, causing failures during later registry lookups.
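
To illustrate the difference, here is a small sketch that builds the alternation from a list of identifiers and compares an unescaped pattern with one built through re.escape. The list STT_MODEL_IDS and the helper is_valid are hypothetical; generating the pattern this way is one option, not necessarily how the PR should do it.

```python
import re

# Hypothetical list of accepted identifiers (canonical ids only, for brevity).
STT_MODEL_IDS = [
    "whisper-base", "whisper-small", "whisper-medium", "whisper-large",
    "whisper-turbo", "parakeet-tdt-0.6b-v2", "parakeet-tdt-0.6b-v3",
]

# Joining raw ids leaves "." as a wildcard that matches any character.
UNESCAPED_PATTERN = "^(" + "|".join(STT_MODEL_IDS) + ")$"

# re.escape turns "." into a literal dot, so only exact ids match.
ESCAPED_PATTERN = "^(" + "|".join(re.escape(m) for m in STT_MODEL_IDS) + ")$"


def is_valid(pattern: str, value: str) -> bool:
    """Return True when value matches the whole pattern."""
    return re.fullmatch(pattern, value) is not None


typo = "parakeet-tdt-0x6b-v2"
accepted_by_unescaped = is_valid(UNESCAPED_PATTERN, typo)
accepted_by_escaped = is_valid(ESCAPED_PATTERN, typo)
```

Building the pattern from the same list that feeds the registry would also keep the validator and the lookup table from drifting apart.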

Proposed fix
 STT_MODEL_PATTERN = (
     r"^("
     r"whisper-base|whisper-small|whisper-medium|whisper-large|whisper-turbo"
-    r"|parakeet-tdt-0.6b-v2|parakeet-tdt-0.6b-v3"
+    r"|parakeet-tdt-0\.6b-v2|parakeet-tdt-0\.6b-v3"
     r"|base|small|medium|large|turbo"  # legacy bare Whisper sizes
     r")$"
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/models.py` around lines 19 - 23, The decimal points in the Parakeet
model IDs within the STT_MODEL_PATTERN regex are unescaped, which allows invalid
model names like parakeet-tdt-0x6b-v2 to pass validation since unescaped dots
match any character. Escape the decimal points in both parakeet-tdt-0.6b-v2 and
parakeet-tdt-0.6b-v3 entries by prefixing each dot with a backslash to ensure
only literal dots are matched and prevent invalid model IDs from being persisted
through the CaptureSettingsUpdate model.

Comment thread backend/requirements.txt
Comment on lines +12 to 16
transformers>=5.6.0
accelerate>=0.26.0
huggingface_hub>=0.20.0
qwen-tts>=0.0.5
# qwen-tts: installed separately with --no-deps (pins transformers==4.57.3)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Dependency pins =="
rg -n --no-heading -C2 '(^|\s)(transformers|qwen-tts|qwen_tts)(==|>=|<=|~=|@|:| )|--no-deps' \
  requirements.txt backend/requirements.txt Dockerfile README.md || true

echo
echo "== Runtime usage of qwen_tts =="
rg -n --no-heading -C2 '\bqwen_tts\b|qwen-tts' backend || true

echo
echo "== Build/bundle references =="
rg -n --no-heading -C2 'qwen_tts|qwen-tts|transformers' backend/build_binary.py backend/voicebox-server.spec || true

echo
echo "== Specifier overlap sanity check =="
python - <<'PY'
from packaging.specifiers import SpecifierSet
from packaging.version import Version

s_main = SpecifierSet(">=5.6.0")
s_qwen = SpecifierSet("==4.57.3")
v = Version("4.57.3")
print("4.57.3 satisfies main spec:", v in s_main)
print("4.57.3 satisfies qwen spec:", v in s_qwen)
PY

Repository: jamiepine/voicebox

Length of output: 18323


Resolve the transformers version conflict between direct dependency and qwen-tts pin.

Line 12 specifies transformers>=5.6.0, but Line 15 documents qwen-tts as pinning transformers==4.57.3. Version 4.57.3 does not satisfy the >=5.6.0 requirement, and since qwen_tts is actively used in the backend (imported in pytorch_backend.py and qwen_custom_voice_backend.py) and bundled in the binary build, this creates a version conflict. Clarify whether environments are intentionally split, document the required version constraints, or adjust the floor to be compatible.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/requirements.txt` around lines 12 - 16, The transformers requirement
on line 12 (transformers>=5.6.0) conflicts with the documented qwen-tts pin of
transformers==4.57.3, and since qwen-tts is actively used in the backend
(imported in pytorch_backend.py and qwen_custom_voice_backend.py), this version
mismatch must be resolved. Either verify the actual transformers version
constraint required by qwen-tts and update line 12 to be compatible with that
version, or update the qwen-tts installation method to use a version compatible
with transformers>=5.6.0, then update the comment on line 15 to accurately
reflect the pinned transformers version if environments are intentionally split.

Comment thread backend/routes/transcription.py
Comment on lines +44 to +49
if not model:
    from ..services import settings as settings_service
    from ..database import get_db
    db = next(get_db())
    saved = settings_service.get_capture_settings(db)
    model = saved.stt_model if saved else None


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Database session is not properly closed.

Using next(get_db()) bypasses the generator's finally block that closes the session. This leaks database connections on every request where model is not provided.
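
The mechanics can be seen in a self-contained sketch. It assumes get_db is a plain generator that yields one session and closes it in a finally block, which is the usual FastAPI dependency shape; FakeSession and this get_db are stand-ins, not the project's code. Advancing the generator by hand leaves cleanup to whenever the generator is closed or garbage-collected, while a context manager ties it to the end of the block.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a database session that records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def get_db():
    """Hypothetical dependency: yield a session, close it in finally."""
    db = FakeSession()
    try:
        yield db
    finally:
        db.close()


# Manual advance: the generator is suspended at the yield, so finally has not run.
gen = get_db()
manual_db = next(gen)
closed_while_suspended = manual_db.closed
gen.close()  # raises GeneratorExit at the yield, which runs the finally block
closed_after_gen_close = manual_db.closed

# Context manager: cleanup runs deterministically when the with block exits.
with contextmanager(get_db)() as managed_db:
    closed_inside_with = managed_db.closed
closed_after_with = managed_db.closed
```

With the with form, the session's lifetime is visible in the code rather than depending on when the interpreter happens to finalize the generator.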

🐛 Proposed fix using context manager pattern
         # Resolve model: explicit param > user's saved setting > backend default
         if not model:
             from ..services import settings as settings_service
             from ..database import get_db
-            db = next(get_db())
-            saved = settings_service.get_capture_settings(db)
-            model = saved.stt_model if saved else None
+            db_gen = get_db()
+            db = next(db_gen)
+            try:
+                saved = settings_service.get_capture_settings(db)
+                model = saved.stt_model if saved else None
+            finally:
+                try:
+                    next(db_gen)
+                except StopIteration:
+                    pass

Or, more cleanly, wrap the generator with contextlib:

from contextlib import contextmanager

# In the function:
if not model:
    from ..services import settings as settings_service
    from ..database import get_db

    # contextmanager() turns the get_db generator function into a context
    # manager, so its finally block runs when the with block exits.
    with contextmanager(get_db)() as db:
        saved = settings_service.get_capture_settings(db)
        model = saved.stt_model if saved else None

The cleanest fix is to extract a helper or pass db as a dependency:

+from ..database import SessionLocal
+
 # In the function:
         if not model:
             from ..services import settings as settings_service
-            from ..database import get_db
-            db = next(get_db())
-            saved = settings_service.get_capture_settings(db)
-            model = saved.stt_model if saved else None
+            with SessionLocal() as db:
+                saved = settings_service.get_capture_settings(db)
+                model = saved.stt_model if saved else None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routes/transcription.py` around lines 44 - 49, The database session
created with next(get_db()) is not properly closed, causing connection leaks on
every request where model is not provided. Instead of using next(get_db()),
refactor this block to either pass the database session as a dependency
parameter to this function or use a proper context manager pattern with get_db()
that ensures the generator's cleanup logic (the finally block) executes to close
the connection. Consider modifying the function signature to accept a db
parameter, or if that's not feasible, wrap the get_db() call with a context
manager that guarantees the session closure after
settings_service.get_capture_settings(db) completes and model is retrieved from
saved.stt_model.

Comment thread Dockerfile
Comment on lines +17 to +40
RUN cat > package-temp.json << 'EOFPKG' && mv package-temp.json package.json
{
  "name": "voicebox-container",
  "version": "0.5.0",
  "private": true,
  "workspaces": [
    "app",
    "web"
  ],
  "scripts": {
    "build:web": "cd web && bun run build"
  },
  "dependencies": {
    "loaders.css": "^0.1.2",
    "react-loaders": "^3.0.1"
  },
  "devDependencies": {
    "@biomejs/biome": "2.3.12",
    "@types/node": "^20.0.0",
    "tailwindcss": "^4.1.18",
    "typescript": "^5.6.0"
  }
}
EOFPKG


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify parser compatibility signals for this Dockerfile change.
set -euo pipefail

echo "== Dockerfile first 5 lines (check syntax directive) =="
sed -n '1,5p' Dockerfile

echo
echo "== Locate the heredoc RUN block =="
rg -n -C3 "package-temp.json|EOFPKG|RUN cat >" Dockerfile || true

echo
echo "== Hadolint parse check (if installed) =="
if command -v hadolint >/dev/null 2>&1; then
  hadolint Dockerfile || true
else
  echo "hadolint not installed in this environment."
fi

Repository: jamiepine/voicebox

Length of output: 1045


Heredoc JSON block requires explicit syntax directive or parser-safe rewrite.

Hadolint reports a DL1000 parse error on line 18 (unexpected '{'). The Dockerfile uses heredoc syntax (<< 'EOFPKG') without a # syntax=docker/dockerfile:1.4 (or newer) directive at the top of the file, so parsers that do not understand heredocs, including Hadolint here and builders without a heredoc-capable BuildKit frontend, read the JSON block as Dockerfile instructions and fail at the parse phase.

Add the syntax directive at the start of the file, or refactor to avoid heredoc syntax:

Option 1: Add syntax directive (1 line)
+# syntax=docker/dockerfile:1.4
# ============================================================
# Voicebox — Local TTS Server with Web UI (CPU)
Option 2: Refactor with printf (parser-safe, no directive needed)
-RUN cat > package-temp.json << 'EOFPKG' && mv package-temp.json package.json
-{
-  "name": "voicebox-container",
-  "version": "0.5.0",
-  "private": true,
-  "workspaces": [
-    "app",
-    "web"
-  ],
-  "scripts": {
-    "build:web": "cd web && bun run build"
-  },
-  "dependencies": {
-    "loaders.css": "^0.1.2",
-    "react-loaders": "^3.0.1"
-  },
-  "devDependencies": {
-    "@biomejs/biome": "2.3.12",
-    "@types/node": "^20.0.0",
-    "tailwindcss": "^4.1.18",
-    "typescript": "^5.6.0"
-  }
-}
-EOFPKG
+RUN printf '%s\n' \
+'{' \
+'  "name": "voicebox-container",' \
+'  "version": "0.5.0",' \
+'  "private": true,' \
+'  "workspaces": ["app", "web"],' \
+'  "scripts": { "build:web": "cd web && bun run build" },' \
+'  "dependencies": {' \
+'    "loaders.css": "^0.1.2",' \
+'    "react-loaders": "^3.0.1"' \
+'  },' \
+'  "devDependencies": {' \
+'    "@biomejs/biome": "2.3.12",' \
+'    "@types/node": "^20.0.0",' \
+'    "tailwindcss": "^4.1.18",' \
+'    "typescript": "^5.6.0"' \
+'  }' \
+'}' > package.json
🧰 Tools
🪛 Hadolint (2.14.0)

[error] 18-18: unexpected '{'
expecting '#', '', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input

(DL1000)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile` around lines 17 - 40, The Dockerfile uses heredoc syntax with the
EOFPKG delimiter to create a package.json file, but the Dockerfile parser cannot
interpret this without an explicit syntax directive. To fix this, add the syntax
directive `# syntax=docker/dockerfile:1.4` as the first line of the Dockerfile
before any other instructions, which will enable support for heredoc syntax.
Alternatively, if you prefer to avoid adding the directive, refactor the RUN
command that creates the package-temp.json file to use printf or echo with
escaped newlines instead of heredoc syntax, which will work with the standard
Dockerfile parser without requiring the directive.
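
The condition behind this finding can be approximated with a small pre-commit style check. The sketch below is a rough heuristic under stated assumptions, not Hadolint's or BuildKit's actual parsing logic; the function names and regular expressions are hypothetical and introduced only for illustration.

```python
import re

# A RUN/COPY/ADD line that opens a heredoc, e.g. `RUN cat > f << 'EOF'`.
HEREDOC_RE = re.compile(r"^\s*(?:RUN|COPY|ADD)\b.*<<-?\s*['\"]?\w+['\"]?", re.IGNORECASE)

# Parser directives are only honored at the very top of the file.
DIRECTIVE_RE = re.compile(r"^#\s*(syntax|escape|check)\s*=\s*(\S+)\s*$", re.IGNORECASE)


def uses_heredoc(dockerfile_text: str) -> bool:
    """Return True if any instruction line appears to open a heredoc."""
    return any(HEREDOC_RE.match(line) for line in dockerfile_text.splitlines())


def has_syntax_directive(dockerfile_text: str) -> bool:
    """Return True if a `# syntax=` directive appears among the leading directives."""
    for line in dockerfile_text.splitlines():
        match = DIRECTIVE_RE.match(line)
        if match is None:
            # First non-directive line ends the directive section.
            return False
        if match.group(1).lower() == "syntax":
            return True
    return False


def heredoc_without_directive(dockerfile_text: str) -> bool:
    """Flag Dockerfiles that use a heredoc but declare no syntax directive."""
    return uses_heredoc(dockerfile_text) and not has_syntax_directive(dockerfile_text)
```

A check like this only flags the pattern; whether the build actually fails still depends on the builder and frontend version in use.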

Source: Linters/SAST tools
