Add gcov-based test pruning with file-level coverage cache by sbryngelson · Pull Request #1316 · MFlowCode/MFC

sbryngelson · 2026-03-15T21:33:19Z

Summary

File-level gcov coverage cache maps test UUIDs to exercised .fpp source files (gzip JSON, committed to repo)
--only-changes flag prunes tests by intersecting PR-changed files against coverage cache
--build-coverage-cache flag + 3-phase parallel cache builder (prepare, run, gcov collect)
New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or Fortran dependency graph changes
Dep-change detection greps PR/push diffs for added use/include statements
Conservative fallbacks: missing cache runs all, missing sim coverage includes test, ALWAYS_RUN_ALL files trigger full suite
54 unit tests cover core coverage logic

Replaces #1284.

Test plan

CI lint checks pass
rebuild-cache job triggers (dep_changed detection)
Test jobs download cache artifact and prune tests via --only-changes

🤖 Generated with Claude Code

github-actions · 2026-03-15T21:38:52Z

Outdated review (resolved)

This review was on an earlier revision. All findings have been addressed.

github-actions · 2026-03-16T00:49:49Z

Outdated review (resolved)

This review was on an earlier revision. All findings have been addressed.

github-actions · 2026-03-17T01:00:33Z

Claude Code Review

Head SHA: c6cf6c6

Files changed: 14

.github/workflows/test.yml (+136/-7)
toolchain/mfc/test/coverage.py (new, 804 lines)
toolchain/mfc/test/test_coverage_unit.py (new, 802 lines)
toolchain/mfc/test/test.py (+77/-1)
toolchain/mfc/test/case.py (+49/-17)
CMakeLists.txt, .github/workflows/common/*.sh, toolchain/mfc/cli/commands.py, others

Summary:

Adds gcov-based file-level coverage cache + --only-changes test pruning for PRs
New rebuild-cache CI job runs on Phoenix SLURM when cases.py or Fortran deps change
3-phase cache builder: prepare, run in parallel (capped at 16), gcov collect
Conservative fallbacks throughout: missing cache → full suite; test not in cache → include
54 unit tests for coverage logic

Findings:

1. Blocked tests if rebuild-cache fails — test.yml lines 200–205, 244–252

The github and self jobs are conditioned on:

(needs.rebuild-cache.result == 'success' || needs.rebuild-cache.result == 'skipped')

If rebuild-cache fails (e.g. Phoenix SLURM OOM, NFS errors), all downstream test jobs are blocked — not just degraded. Consider adding || needs.rebuild-cache.result == 'failure' here, or adding continue-on-error: true to the rebuild-cache job itself, so a cache rebuild failure degrades gracefully to full-suite tests rather than blocking CI entirely.

2. git push to master with no failure handling — test.yml lines 186–192

git rebase origin/master
git push origin HEAD:refs/heads/master

The [skip ci] tag prevents loops, but git rebase can fail if master advanced in a way that creates a conflict (rare but possible given the 20–240 minute job runtime). No continue-on-error: true here means a rebase failure fails the job. Consider continue-on-error: true on the "Commit Cache to Master" step, since a missed cache update is non-blocking for tests.

3. Phase 2 parallelism cap is hardcoded — coverage.py line 982

phase2_jobs = min(n_jobs, 16)

The cap of 16 concurrent tests (each spawning MPI processes under gcov) is documented as a memory guard, but the rebuild-cache.sh script caps NJOBS at 64 for gcov workers. On a Phoenix node with 24 cores, this is fine, but the hardcoded 16 limit in Python is separate from the shell-level cap and the two aren't connected. This is minor but could be confusing if SLURM allocation sizes change.

No issues found with:

GCOV_PREFIX_STRIP computation (math is correct)
GITHUB_EVENT_NAME propagation into SLURM (exported inside the heredoc before module load)
contents: write permission scope (correctly gated to push/workflow_dispatch on master, not PR events)
Thread safety (Phase 1 is single-threaded as documented; each Phase 2 worker gets an isolated directory)
Fallback chain (missing cache → full suite; git failure → full suite; no Fortran changes → skip all)

Copilot

Pull request overview

Introduces file-level gcov coverage–based test pruning to speed up PR CI runs by running only tests whose covered .fpp files overlap with the PR’s changed files, with CI support to rebuild and distribute the coverage cache when needed.

Changes:

Adds a coverage cache builder and a --only-changes pruning path to the mfc.sh test command.
Extends CI to (optionally) rebuild/upload the coverage cache and to run PR tests with --only-changes.
Updates build/test infrastructure to improve gcov fidelity (Fypp line markers, gcov-friendly flags) and adds offline unit tests for the coverage logic.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`toolchain/mfc/test/coverage.py`	Implements coverage cache build + changed-file-based pruning logic.
`toolchain/mfc/test/test.py`	Wires `--only-changes` and `--build-coverage-cache` into test execution and filtering.
`toolchain/mfc/cli/commands.py`	Adds CLI flags: `--build-coverage-cache`, `--only-changes`, `--changes-branch`.
`toolchain/mfc/test/test_coverage_unit.py`	Adds offline unit tests for the coverage module.
`toolchain/mfc/test/test_coverage_cache.json.gz`	Adds the committed coverage cache artifact.
`toolchain/mfc/test/case.py`	Refactors post_process-related parameter mods for reuse by coverage builder.
`toolchain/mfc/test/cases.py`	Skips `1D_qbmm` example due to a gfortran 12 overflow issue.
`CMakeLists.txt`	Adjusts gcov builds (Fypp line markers; disables LTO for gcov; uses -O1 in Release for accuracy/portability).
`.github/workflows/test.yml`	Adds cache rebuild job + artifact download; runs PR tests with `--only-changes`.
`.github/workflows/common/test.sh`	Enables pruning only on PR events in SLURM-based CI runs.
`.github/workflows/common/rebuild-cache.sh`	Adds SLURM script to rebuild the gcov cache.
`.github/scripts/submit-slurm-job.sh`	Exports `GITHUB_EVENT_NAME` into SLURM job environment.
`.github/file-filter.yml`	Adds `cases_py` filter and includes common workflow paths.
`.gitignore`	Ignores legacy raw (non-gz) coverage cache file.

Comments suppressed due to low confidence (1)

toolchain/mfc/test/test.py:90

__filter operates on the objects returned by list_cases() (TestCaseBuilder instances), and later test() converts them via .to_case(). The current return type hint uses List[TestCase], which is misleading and makes type-checking harder; consider changing the annotations to List[TestCaseBuilder] (and importing that type) for both inputs and returned lists.

def __filter(cases_) -> typing.Tuple[typing.List[TestCase], typing.List[TestCase]]:
    cases = cases_[:]
    selected_cases = []
    skipped_cases = []

toolchain/mfc/test/coverage.py

+def _parse_gcov_json_output(raw_bytes: bytes, root_dir: str) -> set:
+    """
+    Parse gcov JSON output and return the set of .fpp file paths with coverage.
+    Handles both gzip-compressed (gcov 13+) and raw JSON (gcov 12) formats.
+    Handles concatenated JSON objects from batched gcov calls (multiple .gcno
+    files passed to a single gcov invocation).
+    Only .fpp files with at least one executed line are included.
+    """
+    try:
+        text = gzip.decompress(raw_bytes).decode("utf-8", errors="replace")
+    except (gzip.BadGzipFile, OSError):
+        text = raw_bytes.decode("utf-8", errors="replace")
+
+    result = set()
+    real_root = os.path.realpath(root_dir)
+    parsed_any = False
+
+    # Parse potentially concatenated JSON objects (one per .gcno file).
+    decoder = json.JSONDecoder()
+    pos = 0
+    while pos < len(text):
+        while pos < len(text) and text[pos] in " \t\n\r":
+            pos += 1
+        if pos >= len(text):
+            break
+        try:
+            data, end_pos = decoder.raw_decode(text, pos)
+            pos = end_pos
+            parsed_any = True
+        except json.JSONDecodeError:
+            remaining = len(text) - pos
+            if remaining > 0:
+                cons.print(f"[yellow]Warning: gcov JSON parse error at offset {pos} ({remaining} bytes remaining) — partial coverage recorded for this test.[/yellow]")
+            break
+
+        for file_entry in data.get("files", []):
+            file_path = file_entry.get("file", "")
+            if not file_path.endswith(".fpp"):
+                continue
+            if any(line.get("count", 0) > 0 for line in file_entry.get("lines", [])):
+                try:
+                    rel_path = os.path.relpath(os.path.realpath(file_path), real_root)
+                except ValueError:
+                    rel_path = file_path
+                # Only keep src/ paths — build/staging/ artifacts from
+                # case-optimized builds are auto-generated and never
+                # appear in PR diffs.
+                if rel_path.startswith("src/"):
+                    result.add(rel_path)
+
+    # If no JSON was parsed at all (complete garbage input), return None
+    # so the caller omits this test from the cache (conservatively included
+    # on future runs).  An empty set after successful parsing means the test
+    # genuinely covers no .fpp files.
+    if not parsed_any:
+        return None
+
+    return result


toolchain/mfc/test/coverage.py

+    cache["_meta"] = {
+        "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
+        "cases_hash": cases_hash,
+        "gcov_version": gcov_version,
+    }


.github/workflows/test.yml

+          if echo "$DIFF" | \
+             grep -qE '^[+-][[:space:]]*(use[[:space:],]+[a-zA-Z_]|#:include[[:space:]]|include[[:space:]]+['"'"'"])'; then
+            echo "dep_changed=true" >> "$GITHUB_OUTPUT"
+            echo "Fortran dependency change detected — will rebuild coverage cache."
+          else
+            echo "dep_changed=false" >> "$GITHUB_OUTPUT"
+          fi


toolchain/mfc/test/test.py

+                    cons.print()
+                    cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
+                    cons.print("-" * 50)
+                    cons.print("[green]No Fortran source changes detected — skipping all tests.[/green]")
+                    cons.print("-" * 50)
+                    cons.print()
+                    skipped_cases += cases
+                    cases = []


coderabbitai · 2026-03-17T01:04:42Z

Outdated review (resolved)

This review was on an earlier revision. All findings have been addressed.

codecov · 2026-03-17T03:53:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.01%. Comparing base (93e3d09) to head (b1685b8).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1316      +/-   ##
==========================================
- Coverage   45.36%   45.01%   -0.36%     
==========================================
  Files          70       70              
  Lines       20499    20561      +62     
  Branches     1953     1961       +8     
==========================================
- Hits         9300     9256      -44     
- Misses      10074    10177     +103     
- Partials     1125     1128       +3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…rify empty gcno_copies return

…ning

…uld_run_all_tests

…owCode#1193)

sbryngelson force-pushed the gcov-test-pruning-v3-rebase branch 9 times, most recently from c4efbc6 to 68b035b Compare March 16, 2026 00:03

sbryngelson marked this pull request as ready for review March 17, 2026 00:56

Copilot AI review requested due to automatic review settings March 17, 2026 00:56

Copilot started reviewing on behalf of sbryngelson March 17, 2026 00:57 View session

Copilot AI reviewed Mar 17, 2026

View reviewed changes

sbryngelson added 10 commits March 17, 2026 13:44

Add gcov-based test pruning with file-level coverage cache

1e90404

Fix review findings: gcov error handling, diagnostics, comments

ff1cc20

Add tests for load_coverage_cache and _compute_gcov_prefix_strip, cla…

b46c163

…rify empty gcno_copies return

TEMP: disable cases.py staleness check and ALWAYS_RUN_ALL to test pru…

24235d9

…ning

Wait for rebuild-cache when it runs, skip immediately when it doesn't

295bfd0

TEMP: bypass should_run_all_tests to test pruning on this PR

14cbf82

Include example tests in coverage pruning and cache builds

ceda4b7

Revert all TEMP changes: restore staleness check, ALWAYS_RUN_ALL, sho…

dbf0013

…uld_run_all_tests

Fix _parse_gcov_json_output return type annotation to Optional[set]

5ea88f4

Fix lint: remove separator comments flagged by new source linter (MFl…

f5d7344

…owCode#1193)

sbryngelson force-pushed the gcov-test-pruning-v3-rebase branch from c6cf6c6 to f5d7344 Compare March 17, 2026 17:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add gcov-based test pruning with file-level coverage cache#1316

Add gcov-based test pruning with file-level coverage cache#1316
sbryngelson wants to merge 10 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v3-rebase

sbryngelson commented Mar 15, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 15, 2026 •

edited by sbryngelson

Loading

Uh oh!

github-actions bot commented Mar 16, 2026 •

edited by sbryngelson

Loading

Uh oh!

github-actions bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

coderabbitai bot commented Mar 17, 2026 •

edited by sbryngelson

Loading

Uh oh!

codecov bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

sbryngelson commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions bot commented Mar 15, 2026 • edited by sbryngelson Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 16, 2026 • edited by sbryngelson Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai bot commented Mar 17, 2026 • edited by sbryngelson Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

sbryngelson commented Mar 15, 2026 •

edited

Loading

github-actions bot commented Mar 15, 2026 •

edited by sbryngelson

Loading

github-actions bot commented Mar 16, 2026 •

edited by sbryngelson

Loading

github-actions bot commented Mar 17, 2026 •

edited

Loading

coderabbitai bot commented Mar 17, 2026 •

edited by sbryngelson

Loading

codecov bot commented Mar 17, 2026 •

edited

Loading