[codex] Add Cantonese language option #776

Open
pppan2003 wants to merge 1 commit into jamiepine:main from pppan2003:codex/cantonese-language-support

@pppan2003 pppan2003 commented Jun 22, 2026

Summary

  • add Cantonese (yue) as an explicit language option for capture transcription/STT settings
  • add Cantonese (yue) to Qwen TTS and Qwen CustomVoice language lists
  • map yue to cantonese when loading backend engines
  • update English, Traditional Chinese, and Simplified Chinese UI labels so Chinese and Cantonese are distinct

Notes

Validation

  • bun run typecheck
  • Pydantic request validation for GenerationRequest, SpeakRequest, and VoiceProfileCreate with language="yue"
  • Verified backend language mapping resolves yue to cantonese
  • End-to-end Qwen TTS runtime smoke test with an existing cloned profile using language="yue", engine="qwen", model_size="1.7B"; generation completed successfully with a 4.0s audio result and no backend error
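
The request-validation step above can be approximated without the project's Pydantic models. The following is a minimal sketch, assuming the VOICE_LANGUAGE_PATTERN value quoted in the backend/models.py review thread of this PR; `is_valid_voice_language` is a hypothetical helper and does not appear in the PR, and plain `re` stands in for Pydantic's own pattern check.

```python
import re

# Pattern copied from the backend/models.py excerpt in this PR's review thread.
VOICE_LANGUAGE_PATTERN = (
    "^(zh|yue|en|ja|ko|de|fr|ru|pt|es|it|he|ar|da|el|fi|hi|ms|nl|no|pl|sv|sw|tr)$"
)


def is_valid_voice_language(code: str) -> bool:
    """Hypothetical helper: True if `code` is accepted by the voice pattern."""
    return re.fullmatch(VOICE_LANGUAGE_PATTERN, code) is not None


print(is_valid_voice_language("yue"))        # True: the new Cantonese tag
print(is_valid_voice_language("cantonese"))  # False: only the short code is valid
```

The check is on the short tag only; the longer engine-facing name is produced later by the backend language map.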

Summary by CodeRabbit

  • New Features
    • Added Cantonese as a distinct language option for both transcription and voice generation, with updated UI labels in English, Simplified Chinese, and Traditional Chinese.
    • Separated “Mandarin / Chinese” from “Cantonese” in the language picker labels.
  • Refactor
    • Centralized backend language validation rules and expanded them to accept the new Cantonese language tag across relevant requests.
  • Documentation
    • Updated the Unreleased changelog entry to reflect the new Cantonese language support and label changes.

coderabbitai Bot commented Jun 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 72449b6a-3d67-4d1a-8faa-64af32d87322

📥 Commits

Reviewing files that changed from the base of the PR and between bc65f85 and 255de43.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • app/src/components/ServerTab/CapturesPage.tsx
  • app/src/i18n/locales/en/translation.json
  • app/src/i18n/locales/zh-CN/translation.json
  • app/src/i18n/locales/zh-TW/translation.json
  • app/src/lib/constants/languages.ts
  • backend/backends/__init__.py
  • backend/models.py
✅ Files skipped from review due to trivial changes (5)
  • app/src/i18n/locales/zh-TW/translation.json
  • CHANGELOG.md
  • app/src/components/ServerTab/CapturesPage.tsx
  • app/src/i18n/locales/en/translation.json
  • app/src/i18n/locales/zh-CN/translation.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/src/lib/constants/languages.ts
  • backend/backends/__init__.py

📝 Walkthrough

Walkthrough

Cantonese (yue) is added as a supported language end-to-end: shared regex constants in backend/models.py replace inline patterns and include yue for voice validation; backends/__init__.py registers yue in the language map and all four Qwen model configs; the frontend constants, three locale files, the captures dropdown, and changelog are updated accordingly.

Changes

Cantonese Language Support

| Layer / File(s) | Summary |
| --- | --- |
| Backend validation patterns and model configs: backend/models.py, backend/backends/__init__.py | Extracts VOICE_LANGUAGE_PATTERN and STT_LANGUAGE_PATTERN regex constants; adds yue to the voice pattern; replaces five inline regex literals across request models with these constants. Registers yue → "cantonese" in the language-name map and adds yue to the supported languages list of all four Qwen TTS and CustomVoice model configs. |
| Frontend constants, i18n strings, and UI dropdown: app/src/lib/constants/languages.ts, app/src/i18n/locales/en/translation.json, app/src/i18n/locales/zh-CN/translation.json, app/src/i18n/locales/zh-TW/translation.json, app/src/components/ServerTab/CapturesPage.tsx | Adds yue: 'Cantonese' to ALL_LANGUAGES and appends yue to ENGINE_LANGUAGES.qwen and ENGINE_LANGUAGES.qwen_custom_voice. Updates the zh label to "Mandarin / Chinese" (and locale equivalents) and adds yue translations in all three locale files. Adds the yue SelectItem to the captures transcription language dropdown. |
| Changelog documentation: CHANGELOG.md | Documents the introduction of Cantonese as a distinct language option with code yue and clarifies the UI separation between Mandarin/Chinese and Cantonese. |
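
The backend mapping described in the walkthrough can be pictured as a simple lookup from the short tag to the engine-facing language name. This is a sketch under stated assumptions: the page does not show the real map in backend/backends/__init__.py, so `LANGUAGE_NAMES`, `resolve_language_name`, and the `en` and `zh` entries are hypothetical; only yue → "cantonese" is stated by the PR.

```python
# Hypothetical stand-in for the language-name map in backend/backends/__init__.py.
LANGUAGE_NAMES = {
    "en": "english",     # assumed entry, for illustration only
    "zh": "chinese",     # assumed entry, for illustration only
    "yue": "cantonese",  # the mapping this PR registers
}


def resolve_language_name(code: str) -> str:
    """Hypothetical helper: map a short language tag to an engine language name."""
    try:
        return LANGUAGE_NAMES[code]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None


print(resolve_language_name("yue"))  # cantonese
```

Raising on unknown codes, rather than falling back to a default language, is a design choice of this sketch; the PR page does not say how the real backend handles unmapped tags.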

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hop hop, a new tongue joins the choir,
Cantonese whispers through circuits and wire,
yue in the dropdown, the backend, the map,
From Guangzhou to bytes in a blink and a tap.
The rabbit nods fondly — 廣東話 is here! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding Cantonese language support to the application, which is reflected across all modified files including UI, translations, backend constants, and validation patterns. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@pppan2003 pppan2003 marked this pull request as ready for review June 22, 2026 12:30
@pppan2003 pppan2003 force-pushed the codex/cantonese-language-support branch from bc65f85 to 255de43 Compare June 22, 2026 12:34

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/models.py`:
- Line 15: The STT_LANGUAGE_PATTERN regex constant excludes Hindi, creating a
mismatch where frontend UI allows selecting Hindi but backend validation rejects
it. Add `hi` to the regex pattern in the STT_LANGUAGE_PATTERN constant alongside
the existing language codes (en, zh, yue, ja, ko, de, fr, ru, pt, es, it) so
that frontend language selections are consistently accepted by backend
validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1630b0b7-9692-4f30-9147-823a6776e30a

📥 Commits

Reviewing files that changed from the base of the PR and between b35b909 and bc65f85.

📒 Files selected for processing (7)
  • app/src/components/ServerTab/CapturesPage.tsx
  • app/src/i18n/locales/en/translation.json
  • app/src/i18n/locales/zh-CN/translation.json
  • app/src/i18n/locales/zh-TW/translation.json
  • app/src/lib/constants/languages.ts
  • backend/backends/__init__.py
  • backend/models.py

Comment thread backend/models.py
)

VOICE_LANGUAGE_PATTERN = "^(zh|yue|en|ja|ko|de|fr|ru|pt|es|it|he|ar|da|el|fi|hi|ms|nl|no|pl|sv|sw|tr)$"
STT_LANGUAGE_PATTERN = "^(en|zh|yue|ja|ko|de|fr|ru|pt|es|it)$"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

STT_LANGUAGE_PATTERN still excludes Hindi, which breaks the frontend/backend language contract.

At line 15, STT validation omits hi, but the captures UI still allows selecting hi, so those requests can fail validation. Please include hi in the shared STT regex so that UI selections are accepted consistently.

Suggested patch
-STT_LANGUAGE_PATTERN = "^(en|zh|yue|ja|ko|de|fr|ru|pt|es|it)$"
+STT_LANGUAGE_PATTERN = "^(en|zh|yue|ja|ko|de|fr|ru|pt|es|it|hi)$"
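
The mismatch the reviewer describes can be reproduced with a small contract check that runs every UI option against the backend pattern. This is a sketch, not code from the PR: `UI_STT_OPTIONS` and `rejected_codes` are hypothetical, and the option list is an assumption (the thread only confirms that the captures dropdown offers yue and hi; the other codes mirror the backend pattern).

```python
import re

# Both patterns are copied from the suggested patch in this review thread.
CURRENT_STT_PATTERN = "^(en|zh|yue|ja|ko|de|fr|ru|pt|es|it)$"
PATCHED_STT_PATTERN = "^(en|zh|yue|ja|ko|de|fr|ru|pt|es|it|hi)$"

# Assumption: the codes offered by the captures dropdown. Only "yue" and "hi"
# are confirmed by this thread; the rest mirror the backend pattern.
UI_STT_OPTIONS = ["en", "zh", "yue", "ja", "ko", "de", "fr", "ru", "pt", "es", "it", "hi"]


def rejected_codes(options, pattern):
    """Hypothetical helper: return the UI codes the backend pattern would reject."""
    return [code for code in options if re.fullmatch(pattern, code) is None]


print(rejected_codes(UI_STT_OPTIONS, CURRENT_STT_PATTERN))  # ['hi']
print(rejected_codes(UI_STT_OPTIONS, PATCHED_STT_PATTERN))  # []
```

A check of this shape could run as a unit test so that future additions to the dropdown fail fast when the backend pattern is not updated alongside them.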
