feat(version-scanner): generalize and enhance regex rules for dependency scanning #17574

Open
chalmerlowe wants to merge 4 commits into main from feat/version-scanner-regex-enhancements
Conversation

@chalmerlowe chalmerlowe commented Jun 25, 2026

Contributor

feat(version-scanner): Generalize and Enhance Regex Rules for Dependency Scanning

Description

This PR enhances the regex patterns that the version scanner uses to identify dependency usage across the monorepo. The changes move the scanner from highly specific rules to a more general and robust ruleset that applies to any package or runtime.

Key Changes

  • Generalized Ruleset: Added flexible rules to capture dependency requirements (>=, <=, ==), wildcard specifications (4.x, 4.*), and custom version constant assignments.
  • Introspection Coverage: Added patterns to detect standard package introspection calls (e.g., __version__, importlib.metadata, packaging.version).
  • Collection Membership Checks: Added support for capturing version checks using collection membership (e.g., VERSION in ["3.", "4."]).
  • Qualifier Enforcement: Tightened flexible major-version matching so that it requires an accompanying qualifier (e.g., "python 3." or "protobuf 4."), which eliminates false positives from numbered list items in documentation.
  • Config Rename: Renamed regex_config.yaml to regex_pattern_config.yaml to better reflect its purpose and updated all references across the scanner script, documentation, and tests.
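The rule families listed above can be sketched in Python. This is a minimal illustration only: the PR description does not show the actual patterns in regex_pattern_config.yaml, so the template strings, rule names, and the helpers `compile_rules` and `matched_rules` below are all hypothetical. The sketch assumes each rule is a template with `{name}` and `{major}` placeholders that are filled in per dependency.

```python
import re

# Hypothetical rule templates (NOT the patterns from regex_pattern_config.yaml).
# {name} is the package name and {major} is the major version being hunted.
RULE_TEMPLATES = {
    # Requirement specifiers such as "protobuf>=4.21.0"; the \b after {major}
    # stops "4" from matching the start of "40".
    "requirement": r"\b{name}\s*(?:>=|<=|==)\s*{major}\b(?:\.\d+)*",
    # Wildcard specifications such as "protobuf 4.x" or "protobuf==4.*".
    "wildcard": r"\b{name}\s*[=<>~!]*\s*{major}\.(?:x|\*)",
    # Membership checks such as: PROTOBUF_VERSION in ["3.", "4."]
    # The quote directly before {major} keeps "3.4." from matching.
    "membership": r"\b\w*VERSION\w*\s+in\s+[\[\(][^\]\)]*[\"']{major}\.[\"']",
    # A major version that must be qualified by the package name,
    # so a numbered list item like "4. Install ..." is ignored.
    "qualified_major": r"\b{name}\s+{major}\.",
}


def compile_rules(name, major):
    """Fill the placeholders and compile every template (case-insensitive)."""
    return {
        rule: re.compile(
            template.format(name=re.escape(name), major=re.escape(major)),
            re.IGNORECASE,
        )
        for rule, template in RULE_TEMPLATES.items()
    }


def matched_rules(line, rules):
    """Return the names of the rules that match the given line, in rule order."""
    return [rule for rule, pattern in rules.items() if pattern.search(line)]


rules = compile_rules("protobuf", "4")
for sample in ["protobuf>=4.21.0", "protobuf 4.x", "4. Install protobuf"]:
    print(sample, "->", matched_rules(sample, rules))
```

Keeping the package name and major version as placeholders means one template can serve any dependency, which is the generalization this PR describes; the exact placeholder syntax used by the scanner is an assumption here.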

Impact

These enhancements improve the scanner's recall when hunting for legacy or specific dependency versions (e.g., protobuf 4.x) without flooding results with documentation noise.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request renames the regex configuration file to regex_pattern_config.yaml, applies standard code formatting to version_scanner.py, and introduces several new generic rules for identifying dependency versions. The review feedback highlights critical issues in the newly added regex patterns that could lead to a high number of false positives. Specifically, some rules (such as dependency_flexible_version and dependency_introspection) lack the {name} placeholder, allowing them to match unrelated dependencies. Additionally, the dependency_version_constant_membership pattern is too loose and can incorrectly match minor versions, while the dependency_wildcard_generic pattern uses a greedy wildcard that can span across unrelated text on the same line.
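Two of the issues raised in this review, a missing package name and a greedy wildcard, can be reproduced with a short Python sketch. The patterns below are hypothetical stand-ins written for illustration; they are not the contents of dependency_flexible_version or dependency_wildcard_generic, which are not shown on this page.

```python
import re

# Hypothetical pattern with no package name: any "4.x" on the line matches.
UNNAMED_WILDCARD = re.compile(r"\b4\.(?:x|\*)")


def greedy_wildcard(name):
    """Name followed by a greedy '.*', which can span unrelated text."""
    return re.compile(r"\b" + re.escape(name) + r".*4\.(?:x|\*)")


def bounded_wildcard(name):
    """Name followed only by whitespace or comparison operators before the version."""
    return re.compile(r"\b" + re.escape(name) + r"[\s=<>~!]*4\.(?:x|\*)")


line = "protobuf 5.0 is required; grpcio supports 4.x"

# The unnamed pattern matches even though the 4.x belongs to grpcio.
print(bool(UNNAMED_WILDCARD.search(line)))
# The greedy pattern runs from "protobuf" across the line to grpcio's "4.x".
print(bool(greedy_wildcard("protobuf").search(line)))
# The bounded pattern rejects the line but still accepts a real specifier.
print(bool(bounded_wildcard("protobuf").search(line)))
print(bool(bounded_wildcard("protobuf").search("protobuf==4.*")))
```

Restricting the characters allowed between the name and the version is one way to bound the match; a lazy quantifier with a length cap would be another, and which approach suits the scanner is a judgment call for the PR author.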

Comment thread scripts/version_scanner/regex_pattern_config.yaml Outdated
Comment thread scripts/version_scanner/regex_pattern_config.yaml Outdated
Comment thread scripts/version_scanner/regex_pattern_config.yaml Outdated
Comment thread scripts/version_scanner/regex_pattern_config.yaml Outdated
@chalmerlowe chalmerlowe force-pushed the feat/version-scanner-regex-enhancements branch from f8d1606 to 18012ab Compare June 25, 2026 12:13
Comment thread scripts/version_scanner/regex_pattern_config.yaml Outdated
@chalmerlowe chalmerlowe added the automerge Merge the pull request once unit tests and other checks pass. label Jun 25, 2026
@chalmerlowe chalmerlowe marked this pull request as ready for review June 25, 2026 12:58
@chalmerlowe chalmerlowe requested a review from a team as a code owner June 25, 2026 12:58