Support redefinition of named capture groups in regular expressions #152026

Description

@serhiy-storchaka

Feature or enhancement

Currently the re module rejects any reuse of a group name with "redefinition of group name 'x' as group N; was group M". This prevents the natural idiom of giving the same name to corresponding groups in alternative spellings of a pattern.

For example, parsing a date written in either order currently cannot reuse names:

import re
re.compile(r'(?P<m>\d+)/(?P<d>\d+)/(?P<y>\d+)|(?P<y>\d+)-(?P<m>\d+)-(?P<d>\d+)')
# re.error: redefinition of group name 'y' as group 4; was group 3

Proposal

Allow a group name to be used for more than one group. All such groups share a single group number; the name (and that number) refer to whichever of the groups matched, or to the last one if more than one matched. The group's width spans the union of the definitions.
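
For comparison, the proposed semantics can be approximated today with distinct names that are merged after matching. This is only a sketch: the suffixed names (m_1, m_2, ...) and the helper merged_groupdict are hypothetical, and "last one" is approximated here by definition order rather than by the order in which the groups were written during matching.

```python
import re

# Hypothetical workaround: each alternative gets uniquely suffixed names
# (m_1, m_2, ...), because re currently rejects a repeated group name.
DATE = re.compile(
    r'(?P<m_1>\d+)/(?P<d_1>\d+)/(?P<y_1>\d+)'
    r'|(?P<y_2>\d+)-(?P<m_2>\d+)-(?P<d_2>\d+)'
)


def merged_groupdict(match):
    """Collapse suffixed names (m_1, m_2) onto their shared base name (m)."""
    merged = {}
    # Walk the groups in definition order so that a later group that
    # participated in the match overrides an earlier one.
    for name, index in sorted(match.re.groupindex.items(), key=lambda item: item[1]):
        base = name.rsplit('_', 1)[0]
        value = match.group(index)
        if value is not None or base not in merged:
            merged[base] = value
    return merged


# Both spellings yield {'m': '12', 'd': '25', 'y': '2024'}.
print(merged_groupdict(DATE.fullmatch('12/25/2024')))
print(merged_groupdict(DATE.fullmatch('2024-12-25')))
```

With the proposal, the suffixes and the merging step would be unnecessary: match.groupdict() would return the merged result directly.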

This matches the behavior of the third-party regex module and of PCRE's duplicate-names feature, enabled with (?J). PCRE's own canonical example, matching a weekday written as an abbreviation or in full and extracting it under one name, is exactly this alternative-spellings case.

Implementation note

This also needs a small fix in the SRE matching engine. Outside of repeats the matcher restores capture-group marks lazily (it only rewinds the lastmark high-water index), which assumes each group number is written at a single place in the bytecode. A reused group number could otherwise leak a mark from a branch that matched and was then backtracked away, or raise SystemError: The span of capturing group is wrong. Reused group numbers can be detected during code validation, and such patterns save and restore the full mark array on backtracking, exactly as is already done inside repeats; other patterns are unaffected.
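
The failure mode can be illustrated with a toy backtracking matcher. This is not the SRE code: the op names, the state layout, and the full_restore switch are made up for illustration. It only models the two strategies described above, rewinding lastmark alone versus restoring the whole mark array, for patterns in which one group number is written at two places.

```python
def run(ops, text, pos, state, full_restore):
    """Match the op list against text[pos:], backtracking over branches."""
    if not ops:
        return pos == len(text)
    op, rest = ops[0], ops[1:]
    if op[0] == 'lit':
        end = pos + len(op[1])
        return text.startswith(op[1], pos) and run(rest, text, end, state, full_restore)
    if op[0] == 'mark':
        i = op[1]
        if i > state['lastmark']:
            # Marks skipped over while raising the high-water index are unset.
            for j in range(state['lastmark'] + 1, i):
                state['marks'][j] = None
            state['lastmark'] = i
        state['marks'][i] = pos
        return run(rest, text, pos, state, full_restore)
    # op[0] == 'branch': try each alternative followed by the rest of the pattern.
    saved_lastmark = state['lastmark']
    saved_marks = list(state['marks'])
    for alternative in op[1]:
        if run(alternative + rest, text, pos, state, full_restore):
            return True
        state['lastmark'] = saved_lastmark      # lazy restore: index only
        if full_restore:
            state['marks'][:] = saved_marks     # full restore: mark values too
    return False


def group1_span(ops, text, full_restore):
    """Return the (start, end) marks of group 1, or None if unmatched."""
    state = {'marks': [None, None], 'lastmark': -1}
    if not run(ops, text, 0, state, full_restore):
        return None
    return tuple(state['marks']) if state['lastmark'] >= 1 else None


GROUP_A = [('mark', 0), ('lit', 'a'), ('mark', 1)]
# Models (?P<x>a)(?:(?P<x>b)c|bd) with both x groups sharing marks 0 and 1.
LEAK = GROUP_A + [('branch', [
    [('mark', 0), ('lit', 'b'), ('mark', 1), ('lit', 'c')],
    [('lit', 'bd')],
])]
# Models (?P<x>a)(?:b(?P<x>c)|bd): only the start mark is rewritten before failing.
INVERTED = GROUP_A + [('branch', [
    [('lit', 'b'), ('mark', 0), ('lit', 'c'), ('mark', 1)],
    [('lit', 'bd')],
])]

print(group1_span(LEAK, 'abd', full_restore=False))      # (1, 2): stale 'b' leaks
print(group1_span(LEAK, 'abd', full_restore=True))       # (0, 1): 'a', as expected
print(group1_span(INVERTED, 'abd', full_restore=False))  # (2, 1): start after end
print(group1_span(INVERTED, 'abd', full_restore=True))   # (0, 1)
```

In the toy model, the lazy strategy is safe as long as every mark index is written at one place only, because rewinding lastmark then invalidates everything the failed branch wrote. Once a branch can overwrite marks at or below the saved lastmark, only restoring the values themselves undoes the damage.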

The same mark save/restore flag also subsumes the existing workaround for the possessive-quantifier SystemError of gh-101955 (which currently installs a placeholder repeat context for the same purpose).

I have a working implementation and will open a PR.
