Skip to content

Duplicate Code: Git patch header parsing in safe-output PR handlers #32577

@github-actions

Description

@github-actions

Analysis of commit ac972de

Summary

The create-pull-request handler now contains a robust parser for git diff --git headers, while manifest_file_helpers.cjs still has two separate regex-based helpers that parse the same header format. These three production paths now implement the same patch path extraction responsibility with different behavior.

This is a maintainability issue because future fixes for quoted paths, spaces, renames, or malformed headers need to be made in multiple places, and protection checks can disagree with the new file-count check.

Duplication Details

Pattern: Git patch header path extraction

  • Severity: Medium
  • Occurrences: 3 helper implementations
  • Locations:
    • actions/setup/js/create_pull_request.cjs (lines 435-548): parseDiffGitHeader and countUniquePatchFiles
    • actions/setup/js/manifest_file_helpers.cjs (lines 15-33): extractFilenamesFromPatch
    • actions/setup/js/manifest_file_helpers.cjs (lines 49-62): extractPathsFromPatch

Code Samples

create_pull_request.cjs now scans diff --git headers with a dedicated parser:

const headerRe = /^diff --git .*$/gm;
let match;
let unparseableIdx = 0;
while ((match = headerRe.exec(patchContent)) !== null) {
  const path = parseDiffGitHeader(match[0]);
  if (path) {
    files.add(path);
  } else {
    files.add(`__unparseable_header_${match.index}_${unparseableIdx++}`);
  }
}

manifest_file_helpers.cjs separately parses the same headers with regex capture groups:

const matches = patchContent.matchAll(/^diff --git a\/(.+) b\/(.+)$/gm);
for (const match of matches) {
  for (const filePath of [match[1], match[2]]) {
    if (filePath && filePath !== "dev/null") {
      pathSet.add(filePath);
    }
  }
}

Impact Analysis

  • Maintainability: Patch parsing behavior is split across two modules and three helpers, so parser fixes must be duplicated manually.
  • Bug Risk: countUniquePatchFiles handles quoted diff --git paths and conservatively counts unparseable headers, while manifest/protected-file checks only match unquoted a/... b/... headers. The same patch can therefore be counted by the PR file limit but missed by file-protection validation.
  • Code Bloat: The codebase carries multiple patch-path extraction loops that could share one utility and expose basename/full-path/count operations from that common parser.

Refactoring Recommendations

  1. Extract a shared patch path parser

    • Suggested location: actions/setup/js/patch_path_helpers.cjs or extend actions/setup/js/manifest_file_helpers.cjs with a lower-level exported parser.
    • Return structured entries such as { oldPath, newPath, parseable } for each diff --git header.
    • Estimated effort: Medium.
  2. Rebuild existing helpers on top of the shared parser

    • Have countUniquePatchFiles, extractPathsFromPatch, and extractFilenamesFromPatch call the same parser.
    • Preserve current policy differences explicitly: PR limits may count only post-image unique paths, while protection checks should include both sides for renames.
  3. Add cross-helper regression coverage

    • Cover spaces, quoted paths, escaped quotes/backslashes, renames, deleted files, and malformed headers.
    • Verify PR file counting and manifest/protected-file checks see consistent paths where intended.

Implementation Checklist

  • Review parser behavior differences and define the shared return shape
  • Extract common diff --git header parsing utility
  • Update PR file-count logic to use the shared parser
  • Update manifest/protected-file helpers to use the shared parser
  • Add regression tests for quoted and malformed patch headers
  • Run make fmt-cjs and relevant CJS tests

Analysis Metadata

  • Analyzed Files: 1 production file changed in scope
  • Detection Method: Serena semantic symbol overview, symbol/reference analysis, and targeted pattern search
  • Commit: ac972de
  • Workflow Run: §25954628196
  • Analysis Date: 2026-05-16

References:

Generated by 🔍 Duplicate Code Detector ·

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions