Feature or enhancement
Currently the re module rejects any reuse of a group name with "redefinition of group name 'x' as group N; was group M". This prevents the natural idiom of giving the same name to corresponding groups in alternative spellings of a pattern.
For example, parsing a date written in either order currently cannot reuse names:
re.compile(r'(?P<m>\d+)/(?P<d>\d+)/(?P<y>\d+)|(?P<y>\d+)-(?P<m>\d+)-(?P<d>\d+)')
# error: redefinition of group name 'y' as group 4; was group 3
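For comparison, here is a minimal sketch of the workaround that is possible today: each alternative gets its own suffixed group names, and the names are folded back together after matching. The helper name parse_date and the m1/m2 suffix convention are hypothetical, chosen only for illustration.

```python
import re

# Workaround: distinct names per alternative (suffix 1 = slash form,
# suffix 2 = dash form), because a name cannot currently be reused.
DATE = re.compile(
    r'(?P<m1>\d+)/(?P<d1>\d+)/(?P<y1>\d+)'
    r'|(?P<y2>\d+)-(?P<m2>\d+)-(?P<d2>\d+)'
)


def parse_date(text):
    """Hypothetical helper: return {'y', 'm', 'd'} strings, or None."""
    match = DATE.fullmatch(text)
    if match is None:
        return None
    fields = {}
    for name, value in match.groupdict().items():
        if value is not None:
            # Strip the one-character suffix to recover the logical name.
            fields[name[:-1]] = value
    return fields
```

With reusable names, the suffixes and the folding loop would not be needed; match.groupdict() could be used directly.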
Proposal
Allow a group name to be used for more than one group. All such groups share a single group number; the name (and that number) refers to whichever of the groups matched, or to the last one if more than one matched. The width recorded for the group spans the union of the widths of its definitions.
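The intended lookup rule can be emulated on current Python with unnamed groups. This is only a sketch: resolve_shared is a hypothetical helper, and it assumes that "the last one" means the highest-numbered participating group, which coincides with match order for the simple patterns shown.

```python
import re


def resolve_shared(match, group_numbers):
    """Hypothetical helper: the value a shared name would report.

    Walks the groups that would share the name and keeps the last one
    that took part in the match (None if none of them did).
    """
    value = None
    for number in group_numbers:
        if match.group(number) is not None:
            value = match.group(number)
    return value


# Under the proposal both groups in each pattern would be named 'x';
# today they have to stay unnamed (or carry distinct names).
alternation = re.compile(r'(a)|(b)')
sequence = re.compile(r'(a)(b)')

# Only one group participates: the name refers to that group.
alt_value = resolve_shared(alternation.fullmatch('b'), (1, 2))
# Both groups participate: the name refers to the last one.
seq_value = resolve_shared(sequence.fullmatch('ab'), (1, 2))
```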
This matches the behavior of the third-party regex module and of PCRE's duplicate-names option, (?J). PCRE's own canonical example (matching a weekday written as an abbreviation or in full and extracting it under one name) is exactly this alternative-spellings case.
Implementation note
This also needs a small fix in the SRE matching engine. Outside of repeats, the matcher restores capture-group marks lazily: on backtracking it only rewinds the lastmark high-water index, which assumes that each group number is written at a single place in the bytecode. With a reused group number, a mark could otherwise leak from a branch that matched and was then backtracked away, or the matcher could raise SystemError: The span of capturing group is wrong. Reused group numbers can be detected during code validation; patterns that contain them save and restore the full mark array on backtracking, exactly as is already done inside repeats. Other patterns are unaffected.
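The leak can be illustrated with a toy model of the mark array. This is not the SRE C code: the Marks class and the simulate function are hypothetical, and the steps of the match are replayed by hand rather than produced by a real matcher. The pattern being replayed is (?P<x>a)?(?:(?P<x>b)c|bd) against 'abd', with both definitions of x sharing group 1.

```python
class Marks:
    """Toy stand-in for the matcher's mark array (hypothetical)."""

    def __init__(self, ngroups):
        self.marks = [None] * (2 * ngroups)
        self.lastmark = -1  # high-water index; marks above it count as unset

    def set_group(self, number, start, end):
        i = 2 * (number - 1)
        self.marks[i], self.marks[i + 1] = start, end
        self.lastmark = max(self.lastmark, i + 1)

    def span(self, number):
        i = 2 * (number - 1)
        if i + 1 > self.lastmark:
            return None
        return (self.marks[i], self.marks[i + 1])


def simulate(full_restore):
    """Replay the match by hand and return the final span of group 1."""
    state = Marks(1)
    state.set_group(1, 0, 1)      # (?P<x>a)? captures 'a'

    # Entering the alternation: remember where to rewind to.
    saved_lastmark = state.lastmark
    saved_marks = list(state.marks) if full_restore else None

    state.set_group(1, 1, 2)      # first branch: reused group captures 'b'
    # The literal 'c' fails against 'd', so the branch is backtracked away.
    state.lastmark = saved_lastmark
    if full_restore:
        state.marks = saved_marks

    # The second branch, 'bd', matches without writing any marks.
    return state.span(1)
```

In this model, simulate(False) returns (1, 2), the span left behind by the abandoned branch, while simulate(True) returns (0, 1), the span of the 'a' that actually took part in the match.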
The same mark save/restore flag also subsumes the existing workaround for the possessive-quantifier SystemError of gh-101955 (which currently installs a placeholder repeat context for the same purpose).
I have a working implementation and will open a PR.
Linked PRs