fix(cli): accept YouTube video IDs starting with a dash #600

Open
carlosacchi wants to merge 1 commit into jdepoix:master from carlosacchi:fix/cli-video-id-starting-with-dash
@carlosacchi

Fixes #599.

Problem

argparse treats a token starting with - as an option rather than a positional argument, so a YouTube video ID with a leading dash makes the CLI error out:

$ youtube_transcript_api -X9P8VE94Po --format text
youtube_transcript_api: error: unrecognized arguments: -X9P8VE94Po

The existing \- escape works but is undocumented, and the -- separator does not help, because it forces every following token (flags included) to be treated as a positional video ID.
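The three behaviours described above can be reproduced with a small stand-in parser. This is only a sketch: the parser below is hypothetical and merely assumes a positional video_ids argument plus a --format option, it is not the project's real parser, and the exact error text it prints may differ from the CLI's.

```python
import argparse

# Hypothetical stand-in parser (an assumption, not the project's real one).
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("video_ids", nargs="+")
parser.add_argument("--format", default="pretty")


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return parser.parse_args(argv)
    except SystemExit:
        # argparse reports the error on stderr and exits with status 2.
        return None


# Leading dash: the ID is taken for an option, so parsing fails.
rejected = try_parse(["-X9P8VE94Po", "--format", "text"])

# Manual backslash escape: the token no longer starts with a dash.
escaped = try_parse(["\\-X9P8VE94Po", "--format", "text"])

# "--" separator: everything after it becomes positional, flags included.
separated = try_parse(["--", "-X9P8VE94Po", "--format", "text"])
```

In this sketch the escaped call still carries the backslash in video_ids, which is why a later clean-up step is needed, and the separated call swallows --format and text as if they were video IDs.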

Change

YouTubeTranscriptCli.__init__ now pre-escapes any incoming token matching the YouTube video ID pattern with a single leading dash:

^-[A-Za-z0-9_][A-Za-z0-9_-]{9}$

by prepending a backslash before argparse runs. The existing _sanitize_video_ids step strips that backslash after parsing, so the resulting video_ids list contains the original IDs.

The second character must not be a dash, so that long options such as --languages (which is also 11 characters long) are not misclassified as video IDs. Video IDs starting with -- are extremely rare and can still be passed using the existing manual \ escape.
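A minimal sketch of the escape and strip round trip, using the pattern quoted above. The helper names pre_escape and strip_escape are hypothetical; strip_escape is only a stand-in for _sanitize_video_ids, whose real implementation may differ.

```python
import re

# Pattern from this PR: one leading dash, a non-dash second character,
# then nine more ID characters (11 characters in total).
VIDEO_ID_WITH_DASH = re.compile(r"^-[A-Za-z0-9_][A-Za-z0-9_-]{9}$")


def pre_escape(tokens):
    """Prefix a backslash to tokens that look like dash-prefixed video IDs."""
    return ["\\" + t if VIDEO_ID_WITH_DASH.match(t) else t for t in tokens]


def strip_escape(video_ids):
    """Stand-in for _sanitize_video_ids: drop one leading backslash."""
    return [v[1:] if v.startswith("\\") else v for v in video_ids]


argv = ["-X9P8VE94Po", "--languages", "de", "--format", "text"]
escaped = pre_escape(argv)
# Only the video ID is escaped; --languages is 11 characters long too,
# but its second character is a dash, so the pattern rejects it.
restored = strip_escape(escaped[:1])
```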

Tests

Added test_argument_parsing__youtube_id_starting_with_dash_is_auto_escaped covering:

  • -X9P8VE94Po --format text parses as video_ids=['-X9P8VE94Po'], format='text'
  • Multiple dash-prefixed IDs followed by --languages de en parse correctly
  • Normal IDs and options are unaffected
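The cases listed above could be exercised roughly as follows. This is an illustrative sketch rather than the test added in this PR: build_parser and parse are hypothetical helpers, and the parser only assumes a positional video_ids argument with --format and --languages options.

```python
import argparse
import re

VIDEO_ID_WITH_DASH = re.compile(r"^-[A-Za-z0-9_][A-Za-z0-9_-]{9}$")


def build_parser():
    # Hypothetical parser shape, assumed for illustration only.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("video_ids", nargs="+")
    parser.add_argument("--format", default="pretty")
    parser.add_argument("--languages", nargs="*", default=["en"])
    return parser


def parse(argv):
    # Escape dash-prefixed IDs before argparse sees them ...
    escaped = ["\\" + t if VIDEO_ID_WITH_DASH.match(t) else t for t in argv]
    args = build_parser().parse_args(escaped)
    # ... then strip the backslash again after parsing.
    args.video_ids = [
        v[1:] if v.startswith("\\") else v for v in args.video_ids
    ]
    return args


single = parse(["-X9P8VE94Po", "--format", "text"])
several = parse(["-X9P8VE94Po", "-abcdefghij", "--languages", "de", "en"])
normal = parse(["dQw4w9WgXcQ", "--format", "json"])
```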

Existing test_argument_parsing__video_ids_starting_with_dash (manual \ escape) still passes.

Full CLI test suite: 23 passed, 1 skipped.

Argparse interprets any positional token starting with '-' as an
unknown option, so calling

    youtube_transcript_api -12345678.. --format text

fails with 'unrecognized arguments'. Auto-escape 11-character tokens
matching the YouTube video ID pattern that start with a single dash
by prepending a backslash before argparse sees them. The existing
_sanitize_video_ids step then strips the backslash, leaving the
original ID intact.

Tokens starting with '--' are left untouched so long options such as
--languages are still parsed correctly; users with the (extremely
rare) video ID starting with '--' can keep using the manual backslash
escape.

Development

Successfully merging this pull request may close these issues.

CLI fails when video ID starts with a dash (argparse treats it as an option)
