Skip to content

Add gcov-based test pruning with file-level coverage cache#1316

Open
sbryngelson wants to merge 10 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v3-rebase
Open

Add gcov-based test pruning with file-level coverage cache#1316
sbryngelson wants to merge 10 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v3-rebase

Conversation

@sbryngelson
Copy link
Member

@sbryngelson sbryngelson commented Mar 15, 2026

Summary

  • File-level gcov coverage cache maps test UUIDs to exercised .fpp source files (gzip JSON, committed to repo)
  • --only-changes flag prunes tests by intersecting PR-changed files against coverage cache
  • --build-coverage-cache flag + 3-phase parallel cache builder (prepare, run, gcov collect)
  • New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or Fortran dependency graph changes
  • Dep-change detection greps PR/push diffs for added use/include statements
  • Conservative fallbacks: missing cache runs all, missing sim coverage includes test, ALWAYS_RUN_ALL files trigger full suite
  • 54 unit tests cover core coverage logic

Replaces #1284.

Test plan

  • CI lint checks pass
  • rebuild-cache job triggers (dep_changed detection)
  • Test jobs download cache artifact and prune tests via --only-changes

🤖 Generated with Claude Code

@github-actions
Copy link

github-actions bot commented Mar 15, 2026

Outdated review (resolved)This review was on an earlier revision. All findings have been addressed.

@sbryngelson sbryngelson force-pushed the gcov-test-pruning-v3-rebase branch 9 times, most recently from c4efbc6 to 68b035b Compare March 16, 2026 00:03
@github-actions
Copy link

github-actions bot commented Mar 16, 2026

Outdated review (resolved)This review was on an earlier revision. All findings have been addressed.

@sbryngelson sbryngelson marked this pull request as ready for review March 17, 2026 00:56
Copilot AI review requested due to automatic review settings March 17, 2026 00:56
@github-actions
Copy link

github-actions bot commented Mar 17, 2026

Claude Code Review

Head SHA: c6cf6c6

Files changed: 14

  • .github/workflows/test.yml (+136/-7)
  • toolchain/mfc/test/coverage.py (new, 804 lines)
  • toolchain/mfc/test/test_coverage_unit.py (new, 802 lines)
  • toolchain/mfc/test/test.py (+77/-1)
  • toolchain/mfc/test/case.py (+49/-17)
  • CMakeLists.txt, .github/workflows/common/*.sh, toolchain/mfc/cli/commands.py, others

Summary:

  • Adds gcov-based file-level coverage cache + --only-changes test pruning for PRs
  • New rebuild-cache CI job runs on Phoenix SLURM when cases.py or Fortran deps change
  • 3-phase cache builder: prepare, run in parallel (capped at 16), gcov collect
  • Conservative fallbacks throughout: missing cache → full suite; test not in cache → include
  • 54 unit tests for coverage logic

Findings:

1. Blocked tests if rebuild-cache failstest.yml lines 200–205, 244–252

The github and self jobs are conditioned on:

(needs.rebuild-cache.result == 'success' || needs.rebuild-cache.result == 'skipped')

If rebuild-cache fails (e.g. Phoenix SLURM OOM, NFS errors), all downstream test jobs are blocked — not just degraded. Consider adding || needs.rebuild-cache.result == 'failure' here, or adding continue-on-error: true to the rebuild-cache job itself, so a cache rebuild failure degrades gracefully to full-suite tests rather than blocking CI entirely.

2. git push to master with no failure handlingtest.yml lines 186–192

git rebase origin/master
git push origin HEAD:refs/heads/master

The [skip ci] tag prevents loops, but git rebase can fail if master advanced in a way that creates a conflict (rare but possible given the 20–240 minute job runtime). No continue-on-error: true here means a rebase failure fails the job. Consider continue-on-error: true on the "Commit Cache to Master" step, since a missed cache update is non-blocking for tests.

3. Phase 2 parallelism cap is hardcodedcoverage.py line 982

phase2_jobs = min(n_jobs, 16)

The cap of 16 concurrent tests (each spawning MPI processes under gcov) is documented as a memory guard, but the rebuild-cache.sh script caps NJOBS at 64 for gcov workers. On a Phoenix node with 24 cores, this is fine, but the hardcoded 16 limit in Python is separate from the shell-level cap and the two aren't connected. This is minor but could be confusing if SLURM allocation sizes change.

No issues found with:

  • GCOV_PREFIX_STRIP computation (math is correct)
  • GITHUB_EVENT_NAME propagation into SLURM (exported inside the heredoc before module load)
  • contents: write permission scope (correctly gated to push/workflow_dispatch on master, not PR events)
  • Thread safety (Phase 1 is single-threaded as documented; each Phase 2 worker gets an isolated directory)
  • Fallback chain (missing cache → full suite; git failure → full suite; no Fortran changes → skip all)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces file-level gcov coverage–based test pruning to speed up PR CI runs by running only tests whose covered .fpp files overlap with the PR’s changed files, with CI support to rebuild and distribute the coverage cache when needed.

Changes:

  • Adds a coverage cache builder and a --only-changes pruning path to the mfc.sh test command.
  • Extends CI to (optionally) rebuild/upload the coverage cache and to run PR tests with --only-changes.
  • Updates build/test infrastructure to improve gcov fidelity (Fypp line markers, gcov-friendly flags) and adds offline unit tests for the coverage logic.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
toolchain/mfc/test/coverage.py Implements coverage cache build + changed-file-based pruning logic.
toolchain/mfc/test/test.py Wires --only-changes and --build-coverage-cache into test execution and filtering.
toolchain/mfc/cli/commands.py Adds CLI flags: --build-coverage-cache, --only-changes, --changes-branch.
toolchain/mfc/test/test_coverage_unit.py Adds offline unit tests for the coverage module.
toolchain/mfc/test/test_coverage_cache.json.gz Adds the committed coverage cache artifact.
toolchain/mfc/test/case.py Refactors post_process-related parameter mods for reuse by coverage builder.
toolchain/mfc/test/cases.py Skips 1D_qbmm example due to a gfortran 12 overflow issue.
CMakeLists.txt Adjusts gcov builds (Fypp line markers; disables LTO for gcov; uses -O1 in Release for accuracy/portability).
.github/workflows/test.yml Adds cache rebuild job + artifact download; runs PR tests with --only-changes.
.github/workflows/common/test.sh Enables pruning only on PR events in SLURM-based CI runs.
.github/workflows/common/rebuild-cache.sh Adds SLURM script to rebuild the gcov cache.
.github/scripts/submit-slurm-job.sh Exports GITHUB_EVENT_NAME into SLURM job environment.
.github/file-filter.yml Adds cases_py filter and includes common workflow paths.
.gitignore Ignores legacy raw (non-gz) coverage cache file.
Comments suppressed due to low confidence (1)

toolchain/mfc/test/test.py:90

  • __filter operates on the objects returned by list_cases() (TestCaseBuilder instances), and later test() converts them via .to_case(). The current return type hint uses List[TestCase], which is misleading and makes type-checking harder; consider changing the annotations to List[TestCaseBuilder] (and importing that type) for both inputs and returned lists.
def __filter(cases_) -> typing.Tuple[typing.List[TestCase], typing.List[TestCase]]:
    cases = cases_[:]
    selected_cases = []
    skipped_cases = []

Comment on lines +137 to +194
def _parse_gcov_json_output(raw_bytes: bytes, root_dir: str) -> set:
"""
Parse gcov JSON output and return the set of .fpp file paths with coverage.
Handles both gzip-compressed (gcov 13+) and raw JSON (gcov 12) formats.
Handles concatenated JSON objects from batched gcov calls (multiple .gcno
files passed to a single gcov invocation).
Only .fpp files with at least one executed line are included.
"""
try:
text = gzip.decompress(raw_bytes).decode("utf-8", errors="replace")
except (gzip.BadGzipFile, OSError):
text = raw_bytes.decode("utf-8", errors="replace")

result = set()
real_root = os.path.realpath(root_dir)
parsed_any = False

# Parse potentially concatenated JSON objects (one per .gcno file).
decoder = json.JSONDecoder()
pos = 0
while pos < len(text):
while pos < len(text) and text[pos] in " \t\n\r":
pos += 1
if pos >= len(text):
break
try:
data, end_pos = decoder.raw_decode(text, pos)
pos = end_pos
parsed_any = True
except json.JSONDecodeError:
remaining = len(text) - pos
if remaining > 0:
cons.print(f"[yellow]Warning: gcov JSON parse error at offset {pos} ({remaining} bytes remaining) — partial coverage recorded for this test.[/yellow]")
break

for file_entry in data.get("files", []):
file_path = file_entry.get("file", "")
if not file_path.endswith(".fpp"):
continue
if any(line.get("count", 0) > 0 for line in file_entry.get("lines", [])):
try:
rel_path = os.path.relpath(os.path.realpath(file_path), real_root)
except ValueError:
rel_path = file_path
# Only keep src/ paths — build/staging/ artifacts from
# case-optimized builds are auto-generated and never
# appear in PR diffs.
if rel_path.startswith("src/"):
result.add(rel_path)

# If no JSON was parsed at all (complete garbage input), return None
# so the caller omits this test from the cache (conservatively included
# on future runs). An empty set after successful parsing means the test
# genuinely covers no .fpp files.
if not parsed_any:
return None

return result
Comment on lines +598 to +602
cache["_meta"] = {
"created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
"cases_hash": cases_hash,
"gcov_version": gcov_version,
}
Comment on lines +99 to +105
if echo "$DIFF" | \
grep -qE '^[+-][[:space:]]*(use[[:space:],]+[a-zA-Z_]|#:include[[:space:]]|include[[:space:]]+['"'"'"])'; then
echo "dep_changed=true" >> "$GITHUB_OUTPUT"
echo "Fortran dependency change detected — will rebuild coverage cache."
else
echo "dep_changed=false" >> "$GITHUB_OUTPUT"
fi
Comment on lines +151 to +158
cons.print()
cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
cons.print("-" * 50)
cons.print("[green]No Fortran source changes detected — skipping all tests.[/green]")
cons.print("-" * 50)
cons.print()
skipped_cases += cases
cases = []
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 17, 2026

Outdated review (resolved)This review was on an earlier revision. All findings have been addressed.

@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.01%. Comparing base (93e3d09) to head (b1685b8).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1316      +/-   ##
==========================================
- Coverage   45.36%   45.01%   -0.36%     
==========================================
  Files          70       70              
  Lines       20499    20561      +62     
  Branches     1953     1961       +8     
==========================================
- Hits         9300     9256      -44     
- Misses      10074    10177     +103     
- Partials     1125     1128       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sbryngelson sbryngelson force-pushed the gcov-test-pruning-v3-rebase branch from c6cf6c6 to f5d7344 Compare March 17, 2026 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants