Feat: Implement robust 3-point audio language auto-detect with majority voting when no metadata, and add manual CLI override #4
Conversation
Pull request overview
Adds a more reliable audio-language detection path when container metadata is missing, so the translation pipeline can still be triggered correctly (plus a manual CLI override).
Changes:
- Add `--audio-language` CLI flag to manually override the detected/metadata audio language.
- Add WhisperX-based audio sampling language detection (3×30s clips with majority vote) when metadata is missing.
- Wire the detection/override into the audiosync workflow; add the `ffmpeg-python` dependency.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| requirements.txt | Adds ffmpeg-python dependency for probing/extracting audio samples. |
| anchor/utils/whisper.py | Introduces detect_audio_language_whisper() using 3-point sampling + majority vote. |
| anchor/utils/args.py | Adds --audio-language CLI argument. |
| anchor/core/audiosync/audiosync.py | Uses override → metadata → Whisper-based sampling to determine audio language and decide whether to translate. |
```python
if detected_languages:
    # Counter counts occurrences. most_common(1) returns e.g., [('en', 2)]
    # We then extract just the language code from that nested result.
    winner = Counter(detected_languages).most_common(1)[0][0]
    return winner
else:
    # Absolute fallback if ALL three clips were purely silent
    return "en"
```
This hard-codes English as a fallback when no language is detected (silent clips / probe failures). Returning 'en' can incorrectly force translation/model selection for non-English audio; it’s safer to return None/"unknown" and let the rest of the pipeline keep Whisper in auto-detect mode (or explicitly handle an unknown audio language).
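A minimal sketch of the suggested behavior (the helper name here is hypothetical, not from the PR): return `None` instead of `"en"` so callers can keep Whisper in auto-detect mode.

```python
from collections import Counter

def pick_language(detected_languages):
    """Majority vote over per-clip detections; None when nothing usable was detected."""
    if not detected_languages:
        # All clips were silent or probing failed: report "unknown" so
        # downstream code can leave Whisper in auto-detect mode instead
        # of wrongly assuming English.
        return None
    # most_common(1) returns e.g. [('en', 2)]; take the language code.
    return Counter(detected_languages).most_common(1)[0][0]
```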
```python
# Load the model ONCE outside the loop to save massive amounts of time
model = whisperx.load_model("base", device, compute_type=compute_type, asr_options={"without_timestamps": True})
```
detect_audio_language_whisper loads an additional WhisperX model but doesn’t explicitly free it. On CUDA this can leave VRAM allocated until GC runs and can contribute to OOM when the main sync model is loaded next. Consider del model (and torch.cuda.empty_cache() when applicable) before returning, or run this detection model on CPU explicitly.
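A sketch of that cleanup pattern, with hypothetical callables standing in for `whisperx.load_model(...)` and the sampling loop; the `try/finally` guarantees the detection model is released before the main sync model loads.

```python
import gc

def detect_then_free(load_model, run_detection):
    """Run detection with a temporary model, then release it eagerly.

    load_model / run_detection are hypothetical callables standing in for
    whisperx.load_model(...) and the 3-clip sampling loop.
    """
    model = load_model()
    try:
        result = run_detection(model)
    finally:
        del model          # drop the last reference instead of waiting for GC
        gc.collect()
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # return cached VRAM to the driver
        except ImportError:
            pass  # CPU-only environment: nothing to free
    return result
```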
```python
    "--audio-language",
    type=str,
    help="Manually specify the video's audio language (e.g. 'en', 'sv') bypassing metadata."
)
```
The closing parenthesis for this parser.add_argument(...) call is unindented compared to the rest of the file’s argument definitions, which makes the block harder to read and easy to mis-edit. Please indent it consistently with the surrounding parser.add_argument calls.
```python
if getattr(args, "audio_language", None):
    audio_lang = args.audio_language.lower()
    console.print(f"[dim]🌐 Audio language manually overridden to: [bold cyan]{audio_lang.upper()}[/bold cyan][/dim]")
```
The CLI override is lowercased but not normalized. If a user passes a 3-letter code like eng/swe (common in container metadata) it won’t match the 2-letter codes used elsewhere (e.g., NLLB mapping and *.en model selection). Consider normalizing the override with the existing normalize_language_code() helper so both 2- and 3-letter inputs behave consistently.
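A rough sketch of what that normalization could look like; the mapping table below is a hypothetical stand-in for the repo's actual `normalize_language_code()` helper, covering only a few common ISO 639-2 container codes.

```python
# Hypothetical subset mapping of 3-letter container codes (ISO 639-2,
# both bibliographic and terminological variants) to the 2-letter codes
# used by the NLLB mapping and *.en model selection.
_ISO_639_2_TO_1 = {
    "eng": "en", "swe": "sv",
    "ger": "de", "deu": "de",
    "fre": "fr", "fra": "fr",
}

def normalize_language_code(code):
    """Return a 2-letter code for known 3-letter inputs; pass others through."""
    code = code.strip().lower()
    return _ISO_639_2_TO_1.get(code, code)
```

With this in place the override path becomes `audio_lang = normalize_language_code(args.audio_language)`, so `eng` and `en` behave identically downstream.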
```python
        meta_lang = get_audio_language(vid)
        if meta_lang:
            console.print(f"[dim]🌐 Metadata language detected: [bold cyan]{meta_lang.upper()}[/bold cyan][/dim]")
    # Detection & Language Override
```
Indentation is inconsistent here: the # Detection & Language Override line is less indented than the surrounding statements inside the for loop, which will end the loop block early and make the next indented line (audio_lang = None) raise an IndentationError/SyntaxError. Align this comment (and any related code) to the same indentation level as the other statements inside the loop.
Hi!
This is a fix for my reported issue #3 (comment)
This PR adds an auto-detect feature for files missing language metadata. It uses WhisperX to sample three separate 30-second clips (at 30%, 50%, and 70% of the video, purposefully avoiding intros/studio logos) and uses majority voting to determine the correct language.
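The clip placement described above can be sketched roughly like this (the function name and the clamping behavior are assumptions for illustration, not the PR's exact code):

```python
def sample_offsets(duration_s, fractions=(0.30, 0.50, 0.70), clip_len_s=30.0):
    """Start times (seconds) for the detection clips.

    The fractions deliberately skip the start of the video to avoid
    intros/studio logos; each start is clamped so a full 30 s clip
    still fits inside the file.
    """
    offsets = []
    for f in fractions:
        start = duration_s * f
        start = max(0.0, min(start, duration_s - clip_len_s))
        offsets.append(start)
    return offsets
```

Each offset would then be fed to an ffmpeg-python extraction and a WhisperX language-detection pass, with the three results majority-voted.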
It also adds an --audio-language flag for manual overrides.
It works really nicely from what I've been able to test, and it doesn't add much time at all (none if language metadata already exists, of course).
Let me know what you think