fix(cli): accept YouTube video IDs starting with a dash #600
Open
carlosacchi wants to merge 1 commit into
Argparse interprets any positional token starting with '-' as an
unknown option, so calling
youtube_transcript_api -12345678.. --format text
fails with 'unrecognized arguments'. Auto-escape 11-character tokens
matching the YouTube video ID pattern that start with a single dash
by prepending a backslash before argparse sees them. The existing
_sanitize_video_ids step then strips the backslash, leaving the
original ID intact.
Tokens starting with '--' are left untouched so long options such as
--languages are still parsed correctly; users with the (extremely
rare) video ID starting with '--' can keep using the manual backslash
escape.
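The escape-then-strip round trip described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the regex, the helper names `escape_dash_ids` and `sanitize_video_ids`, and the stand-in parser are all assumptions made for the example.

```python
import argparse
import re

# Assumed pattern: one leading dash, a non-dash second character,
# 11 characters in total, drawn from the video ID alphabet.
DASH_ID = re.compile(r"-[A-Za-z0-9_][A-Za-z0-9_-]{9}")


def escape_dash_ids(argv):
    """Prefix a backslash to tokens that look like dash-leading video IDs."""
    return ["\\" + tok if DASH_ID.fullmatch(tok) else tok for tok in argv]


def sanitize_video_ids(video_ids):
    """Hypothetical stand-in for the existing step that strips backslashes."""
    return [video_id.replace("\\", "") for video_id in video_ids]


def build_parser():
    # Stand-in parser; the real CLI defines more options than this.
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument("video_ids", nargs="+")
    parser.add_argument("--format", default="pretty")
    parser.add_argument("--languages", nargs="*", default=["en"])
    return parser


def parse(argv):
    # Escape before argparse sees the tokens, strip the escape afterwards.
    args = build_parser().parse_args(escape_dash_ids(argv))
    args.video_ids = sanitize_video_ids(args.video_ids)
    return args
```

With this sketch, `parse(["-X9P8VE94Po", "--format", "text"])` yields the original ID in `video_ids`, while `--languages` is left alone because its second character is a dash.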
Fixes #599.
Problem
`argparse` treats any positional token starting with `-` as an unknown option, so a YouTube video ID with a leading dash makes the CLI fail with an 'unrecognized arguments' error.
The existing `\-` escape works but is undocumented, and the `--` separator does not help because it forces every following token (including flags) to be treated as a positional video ID.
Change
`YouTubeTranscriptCli.__init__` now pre-escapes any incoming 11-character token that matches the YouTube video ID pattern and starts with a single dash, by prepending a backslash before argparse runs. The existing `_sanitize_video_ids` step strips that backslash after parsing, so the resulting `video_ids` list contains the original IDs.
The second character must not be a dash, so that long options such as `--languages` (which is also 11 characters long) are not misclassified as video IDs. Video IDs starting with `--` are extremely rare and can still be passed using the existing manual `\` escape.
Tests
Added `test_argument_parsing__youtube_id_starting_with_dash_is_auto_escaped` covering:
- `-X9P8VE94Po --format text` parses as `video_ids=['-X9P8VE94Po'], format='text'`
- long options such as `--languages de en` still parse correctly
Existing `test_argument_parsing__video_ids_starting_with_dash` (manual `\` escape) still passes.
Full CLI test suite: 23 passed, 1 skipped.
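For reviewers who want to see why the `--` separator mentioned under Problem is not a workaround, the behaviour can be reproduced with plain argparse. The parser below is a hypothetical stand-in, not the project's real CLI definition.

```python
import argparse

# Hypothetical stand-in parser with one positional and one option.
parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("video_ids", nargs="+")
parser.add_argument("--format", default="pretty")

# Everything after "--" is treated as positional, including "--format",
# so the flag and its value end up in video_ids and format keeps its default.
args = parser.parse_args(["--", "-X9P8VE94Po", "--format", "text"])
print(args.video_ids)
print(args.format)
```

The dash-leading ID is accepted, but `--format text` is swallowed as two extra video IDs, which is the limitation the auto-escape avoids.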