
Feat: Implement robust 3-point audio language auto-detect with majority voting when no metadata, and add manual CLI override#4

Open
Zonrotan wants to merge 1 commit into ellite:main from Zonrotan:feature/auto-detect-fix

Conversation

@Zonrotan

Hi!
This is a fix for my reported issue #3.

This PR adds an auto-detect feature for files missing language metadata. It uses WhisperX to sample three separate 30-second clips (at 30%, 50%, and 70% of the video, purposefully avoiding intros/studio logos) and uses majority voting to determine the correct language.
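For reference, the sampling offsets and the majority vote can be sketched roughly like this (function names are illustrative, not the PR's actual helpers):

```python
from collections import Counter

def sample_offsets(duration_s, fracs=(0.30, 0.50, 0.70), clip_s=30):
    # Start times (seconds) for three 30 s clips taken mid-file,
    # deliberately skipping intros/studio logos and end credits.
    return [max(0.0, min(duration_s * f, duration_s - clip_s)) for f in fracs]

def majority_language(per_clip_codes):
    # per_clip_codes like ["en", "en", "sv"] -> "en"; silent clips yield None.
    votes = [c for c in per_clip_codes if c]
    return Counter(votes).most_common(1)[0][0] if votes else None
```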

It also adds an --audio-language flag for manual overrides.

It works really well in my testing and barely adds any time (none at all if language metadata already exists, of course).

Let me know what you think


Copilot AI left a comment


Pull request overview

Adds a more reliable audio-language detection path when container metadata is missing, so the translation pipeline can still be triggered correctly (plus a manual CLI override).

Changes:

  • Add --audio-language CLI flag to manually override detected/metadata audio language.
  • Add WhisperX-based audio sampling language detection (3×30s clips with majority vote) when metadata is missing.
  • Wire the detection/override into the audiosync workflow; add ffmpeg-python dependency.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.

File Description
requirements.txt Adds ffmpeg-python dependency for probing/extracting audio samples.
anchor/utils/whisper.py Introduces detect_audio_language_whisper() using 3-point sampling + majority vote.
anchor/utils/args.py Adds --audio-language CLI argument.
anchor/core/audiosync/audiosync.py Uses override → metadata → Whisper-based sampling to determine audio language and decide whether to translate.
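The override → metadata → sampling resolution order described above could look something like this minimal sketch (hypothetical names, not the PR's actual code):

```python
def resolve_audio_language(cli_override, metadata_lang, detect_fn):
    # Precedence: manual CLI override -> container metadata -> WhisperX
    # 3-point sampling (detect_fn is only called when both are missing).
    if cli_override:
        return cli_override.lower()
    if metadata_lang:
        return metadata_lang
    return detect_fn()
```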


Comment thread anchor/utils/whisper.py
Comment on lines +244 to +252
if detected_languages:
    # Counter counts occurrences; most_common(1) returns e.g. [('en', 2)].
    # We then extract just the language code from that nested result.
    winner = Counter(detected_languages).most_common(1)[0][0]
    return winner
else:
    # Absolute fallback if ALL three clips were purely silent
    return "en"

Copilot AI Apr 18, 2026

This hard-codes English as a fallback when no language is detected (silent clips / probe failures). Returning 'en' can incorrectly force translation/model selection for non-English audio; it’s safer to return None/"unknown" and let the rest of the pipeline keep Whisper in auto-detect mode (or explicitly handle an unknown audio language).
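A minimal sketch of the suggested change, keeping the same Counter-based vote (the function name is hypothetical):

```python
from collections import Counter

def pick_language(detected_languages):
    # Majority vote over per-clip detections. Returning None (instead of
    # a hard-coded "en") signals "unknown" so downstream code can leave
    # Whisper in auto-detect mode rather than forcing English.
    if detected_languages:
        return Counter(detected_languages).most_common(1)[0][0]
    return None
```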

Comment thread anchor/utils/whisper.py
Comment on lines +212 to +214
# Load the model ONCE outside the loop to save massive amounts of time
model = whisperx.load_model("base", device, compute_type=compute_type, asr_options={"without_timestamps": True})

Copilot AI Apr 18, 2026

detect_audio_language_whisper loads an additional WhisperX model but doesn’t explicitly free it. On CUDA this can leave VRAM allocated until GC runs and can contribute to OOM when the main sync model is loaded next. Consider del model (and torch.cuda.empty_cache() when applicable) before returning, or run this detection model on CPU explicitly.
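One way to structure the suggested cleanup, sketched with stand-in callables (`load_model`/`run_detection` stand in for `whisperx.load_model(...)` and the transcription call; `torch.cuda.empty_cache()` applies only when CUDA is in use):

```python
import gc

def detect_language_with_cleanup(load_model, run_detection):
    """Run a one-off detection model and free it before returning."""
    model = load_model()
    try:
        return run_detection(model)
    finally:
        # Drop the reference and collect promptly so memory is reclaimed
        # before the main sync model loads, instead of waiting for GC.
        del model
        gc.collect()
        # On CUDA, additionally: torch.cuda.empty_cache()
```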

Comment thread anchor/utils/args.py
"--audio-language",
type=str,
help="Manually specify the video's audio language (e.g. 'en', 'sv') bypassing metadata."
)
Copilot AI Apr 18, 2026

The closing parenthesis for this parser.add_argument(...) call is unindented compared to the rest of the file’s argument definitions, which makes the block harder to read and easy to mis-edit. Please indent it consistently with the surrounding parser.add_argument calls.

Suggested change
)
    )

Comment on lines +111 to +113
if getattr(args, "audio_language", None):
    audio_lang = args.audio_language.lower()
    console.print(f"[dim]🌐 Audio language manually overridden to: [bold cyan]{audio_lang.upper()}[/bold cyan][/dim]")
Copilot AI Apr 18, 2026

The CLI override is lowercased but not normalized. If a user passes a 3-letter code like eng/swe (common in container metadata) it won’t match the 2-letter codes used elsewhere (e.g., NLLB mapping and *.en model selection). Consider normalizing the override with the existing normalize_language_code() helper so both 2- and 3-letter inputs behave consistently.
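A minimal illustration of the kind of normalization meant here, with a deliberately tiny ISO 639-2 → 639-1 table; the repo's actual `normalize_language_code()` may differ:

```python
# Illustrative subset only; a real mapping covers all ISO 639-2 codes.
ISO_639_2_TO_1 = {
    "eng": "en", "swe": "sv",
    "deu": "de", "ger": "de",
    "fra": "fr", "fre": "fr",
}

def normalize_language_code(code):
    # Accept 2- or 3-letter codes in any case; always emit 2-letter codes
    # so NLLB mapping and *.en model selection see a consistent value.
    code = code.strip().lower()
    return ISO_639_2_TO_1.get(code, code) if len(code) == 3 else code
```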

meta_lang = get_audio_language(vid)
if meta_lang:
    console.print(f"[dim]🌐 Metadata language detected: [bold cyan]{meta_lang.upper()}[/bold cyan][/dim]")
# Detection & Language Override
Copilot AI Apr 18, 2026

Indentation is inconsistent here: the # Detection & Language Override line is less indented than the surrounding statements inside the for loop, which will end the loop block early and make the next indented line (audio_lang = None) raise an IndentationError/SyntaxError. Align this comment (and any related code) to the same indentation level as the other statements inside the loop.

Suggested change
# Detection & Language Override
    # Detection & Language Override
