Skip to content

feat: add attribution fuzzer for e2e randomized testing#1414

Closed
svarlamov wants to merge 23 commits into
mainfrom
feat/attr-fuzzer
Closed

feat: add attribution fuzzer for e2e randomized testing#1414
svarlamov wants to merge 23 commits into
mainfrom
feat/attr-fuzzer

Conversation

@svarlamov

@svarlamov svarlamov commented May 21, 2026

Copy link
Copy Markdown
Member

Summary

  • Adds a self-contained attribution fuzzer (tests/integration/fuzzer/) that performs randomized, pathological testing of git-ai's attribution system
  • Uses a char-based oracle: each edit step allocates a unique character mapped to an attribution type (AI or KnownHuman), allowing deterministic verification at blame time without complex state tracking
  • Tests all git operations: multi-edit commits, amend chains, fast-forward merges, rebases, squash merges, and multi-file interleaving
  • Includes 21 fixed-seed tests across 3 profiles (standard, rewrite-heavy, checkpoint-heavy) plus a random-seed test that prints its seed on failure for reproduction
  • Adds Taskfile entries (test:fuzz, test:fuzz:all, test:fuzz:heavy) for running the fuzzer

Design

The fuzzer allocates unique chars (A-Z, a-z, 0-9, then Unicode U+0100+) for each edit step. Each char is mapped to an attribution type. After every commit or rewrite operation, the fuzzer runs git-ai blame and verifies that the author on each line matches the expected attribution for that line's character. This sidesteps the need to track complex state through rewrites — the char on disk tells you what the attribution should be.

Known Findings

The fuzzer identifies real attribution bugs in rewrite operations (amend, rebase, squash) where AI-attributed lines lose their attribution and show as the committer. These are pre-existing bugs, not regressions. Seeds 1, 5, and 8 reliably reproduce these issues.

Test plan

  • cargo check --tests passes
  • Fixed-seed tests are deterministic and reproducible
  • Run task test:fuzz to execute the standard fuzzer suite
  • Verify that failures are in rewrite operations (known bugs) not in normal commit flows

🤖 Generated with Claude Code


Open in Devin Review

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.

Open in Devin Review

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

🐛 1 issue in files not directly in the diff

🐛 Unregistered '?' character in execute_untracked_interleave causes oracle panic (tests/integration/fuzzer/operations.rs:4096-4101)

The execute_untracked_interleave function adds '?' characters to file_state.lines (operations.rs:4100-4101), and the comment at line 4097 claims "the oracle will skip unknown chars during blame verification." However, the oracle's verify_blame function (oracle.rs:176-189) does not skip unknown chars — it panics via unwrap_or_else when self.get(expected_char) returns None for any unregistered character.

When this operation is triggered via CombinedOp::UntrackedInterleave in engine.rs:1205-1214, verify_main_file is called immediately after, passing file_state.lines (which now contains '?' chars) to registry.verify_blame. The lookup fails and the test panics. This will cause spurious fuzzer failures whenever UntrackedInterleave is randomly selected (~1/32 probability per combined op).

View 18 additional findings in Devin Review.

Open in Devin Review

svarlamov and others added 23 commits May 28, 2026 22:47
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Multiple interleaved edits (AI + Human) per commit cycle
- Rewrite ops on same file (amend chains, rebase, squash merge)
- Multi-file rapid-fire checkpoint bursts to stress daemon
- OverwriteAll and destructive strategies enabled
- Removed Untracked attribution type (known design limitation:
  content after AI checkpoint without subsequent checkpoint gets
  attributed to AI by design)
- Replace cherry-pick with ff-merge (known daemon reflog ambiguity
  bug with cherry-pick in repos with many commits)

Found real bugs:
- AI attribution loss during rewrite operations (seeds 1, 5, 8)
- Known human edits inserted between AI lines lose attribution
- Rapid checkpoint interleaving reveals attribution boundary issues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename RewriteOp::CherryPick to FfMerge to match actual behavior
- Rename execute_cherry_pick_same_file to execute_ff_merge
- Remove unused parameters (_file_state, _allow_destructive)
- Pass actual seed to verify_blame instead of hardcoded 0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…to fuzzer

Three major enhancements to the attribution fuzzer:

1. Partial staging: tests line-level partial commits, selective file
   commits, and interleaved partial commits across multiple files.
   Forces git-ai to correctly split working log entries between
   committed and uncommitted attribution.

2. Session verification: after each commit, verifies that the authorship
   note contains the correct session types (AI sessions for AI lines,
   h_ entries for human lines). Catches session data loss during
   rewrite operations.

3. Destructive/pathological operations: hard reset, soft reset + recommit,
   checkout discard, stash/pop cycles, dirty branch switches,
   reset-and-reedit, and checkpoint-then-overwrite. Stresses the daemon
   with rapid HEAD changes and discarded working state.

New fuzzer profiles: partial_stage_heavy (60% partial ops),
destructive_heavy (50% destructive ops). New Taskfile entries:
test:fuzz:partial, test:fuzz:destructive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New operation categories:
- File operations: rename (git mv), delete+recreate, move to subdirectory,
  concurrent multi-file creation
- Stress operations: rapid checkpoint bursts (5-15 rapid-fire), double
  commit rapid fire, alternating amend (3-6 AI/human flips), amend
  attribution flip, multi-commit rebase (3-5 commits)
- Enhanced destructive: mixed reset, stash with pathspec, orphaned
  checkpoints (fire then discard), empty commit interleaving
- Enhanced partial staging: squash merge with partial staging

New fuzzer profiles:
- file_ops_heavy: 45% file operations
- stress_heavy: 55% stress operations
- chaos: equal distribution across ALL operation types (max pathological)

New test suites: fuzz_file_ops_*, fuzz_stress_*, fuzz_chaos_* (including
random seed chaos test). Total: 51 fuzzer test cases.

Updated Taskfile with test:fuzz:partial and test:fuzz:destructive targets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New pathological operations:
- Thrash: rapid cycle of edit→commit→edit→discard/amend/recommit
- Rebase-then-amend: rebase branch, then immediately amend the result
- Checkpoint on non-existent file: fire checkpoint before file exists
- Two-branch merge: create divergent branches, merge both back (true
  merge commit with multiple parents)
- Exponential amend: double file size each amend step (1→2→4→8→16→32)
- Session interleave: 4-10 alternating AI/human edits with mixed
  strategies (append/prepend/insert) before a single commit

Total operations available: 33 distinct pathological patterns across
5 categories (rewrite, destructive, partial staging, file ops, stress).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e, multi-squash, and more

Adds 10 new pathological operations:
- cherry_pick_conflict: cherry-pick with deliberate conflicts
- rapid_branch_merge: rapid create-commit-merge branch cycles
- rebase_cherry_pick_combo: interleaved rebase and cherry-pick
- reset_edit_recommit: mixed reset then re-edit and recommit
- checkpoint_storm: 5-15 rapid-fire checkpoints before single commit
- partial_amend_flip: partial stage + amend with flipped attribution
- discard_then_reedit: checkout discard then new attribution
- create_delete_batch: batch file creation then random deletion
- multi_squash: N commits squashed into one via soft reset
- alternating_amend_storm: 4-10 rapid amends alternating AI/human

New CombinedOp category in generators with combined_heavy profile.
Taskfile entry for test:fuzz:combined added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… more combined ops

New combined operations:
- rename_chain: sequential renames A→B→C→D with edits between
- fixup_squash: main commit + N fixup commits then squash
- empty_tree_rebuild: delete all files, commit, recreate from scratch
- revert_then_redo: commit, revert, then new attribution
- selective_multi_file_commit: edit multiple files, commit in batches
- amend_with_deletion: amend a commit to also delete a file
- recommit_loop: repeated soft-reset + recommit cycles

Total combined ops now: 14 variants across all pathological patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cases

New operations targeting specific bug-prone areas:
- initial_carryover: multiple checkpoint rounds without commit
- merge_conflict_resolve: branch merge with conflict resolution
- double_checkpoint_race: rapid AI→human→AI checkpoints on same file
- hunk_partial_stage: stage only first hunk, commit, then rest
- rename_during_edit: rename one file while editing another in same commit
- noop_overwrite: checkpoint identical content then real edit
- concurrent_sessions: multiple AI/human sessions interleaved
- amend_shrink: amend that reduces file size (removes lines)

Total combined ops: 22 variants. Total operations across all categories: 57.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… merge, and more

New operations targeting daemon sequencer and history rewriting:
- deep_rebase_chain: N-deep branch rebase (3-7 commits) onto diverged base
- untracked_interleave: edits without checkpoints mixed with real attributions
- rapid_head_change: multiple commits then hard reset to middle, new branch
- three_way_merge: create two branches, merge both back (octopus-style)
- edge_case_commit_flags: empty messages, long messages, special chars

Total combined ops: 27 variants. Total operation types across all categories: 62.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d more extreme ops

Final batch of pathological operations:
- rapid_lifecycle: checkpoint→commit→amend cycles in rapid succession
- multi_stash: create multiple stash entries, pop in sequence
- overwrite_and_rollback: OverwriteAll + soft reset + new content
- cherry_pick_chain: N sequential cherry-picks from a source branch
- interleaved_amend_new: alternating new commits and amends

Total combined ops: 32 variants.
Total unique operation types across all categories: 67.
Test count: 58 test cases across 9 profiles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Marathon tests run 150-200 operations in chaos mode for maximum coverage.
Marked #[ignore] so they don't run in normal test suite but can be
invoked via `task test:fuzz:marathon`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion holes

Adds squash-specific operations that replicate real-world user reports
of missing data/holes in attribution after squash merges:

- squash_mixed_attribution: 4-8 commits alternating AI/human at different positions
- squash_after_amend: branch commits amended before squash (pre-rewritten notes)
- squash_then_amend: squash then immediately amend (most common hole cause)
- squash_rebased_branch: rebase branch then squash (double rewrite)
- squash_with_overwrites: later commits overwrite earlier lines then squash
- squash_multi_file: multiple files with different attributions squashed
- squash_reset_recommit: squash, soft reset, recommit (double-squash pattern)
- squash_nonlinear_branch: branch with merge commits then squash

Also adds:
- squash_heavy FuzzerConfig profile (55% combined ratio)
- 7 fixed-seed + 1 random-seed squash test cases
- task test:fuzz:squash Taskfile entry

All seeds immediately find the known "AI lines present but no AI session"
bug, confirming the fuzzer correctly catches the reported squash issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements exact session tracking in the oracle: after each successful
verification, all session IDs (s_* for AI, h_* for human) are extracted
from the authorship note and stored. On subsequent verifications, the
oracle asserts that ALL previously-committed sessions are still present
in the current HEAD's note — sessions must never disappear through
rewrite operations (amend, squash, rebase, cherry-pick).

This catches the specific bug class where sessions representing "failed
paths" (overwritten contributions) are lost during rewrites. The
invariant: sessions accumulate monotonically; a commit's authorship note
must be a superset of all source commits' sessions.

Session tracking is reset before destructive operations (hard reset,
branch switch, thrash, mixed reset) that legitimately drop commits, and
before combined ops that involve resets (ResetEditRecommit, EmptyTree,
RecommitLoop, RapidHeadChange, OverwriteAndRollback, SquashResetRecommit).

Immediately finds real bugs: seed 0 shows 3 AI sessions lost after an
amend operation, confirming the rewrite hooks don't carry forward all
source sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove dead code (empty if-bodies, unused snapshot_sessions/restore_sessions),
remove incorrect #[allow(dead_code)], cap operation_log growth at 500 entries
to bound memory in marathon mode, and add workflow operations module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix execute_fixup_autosquash: abort failed rebase before retrying with
  different base (prevents "rebase already in progress" error)
- Fix execute_stash_pop_cycle: use read_file_state_from_disk helper
  instead of raw fs::read_to_string (handles missing file after pop)
- Fix execute_cherry_pick_conflict: use .ok() for abort (abort can fail
  if cherry-pick didn't leave conflicted state)
- Fix execute_amend_chain: make amend fallible with early return instead
  of panicking if amend fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rathon

Global sh: vars (TEST_BINARY_ARGS) aren't re-evaluated when subtask vars
override NO_CAPTURE/EXTRA_TEST_BINARY_ARGS. Add SUBTASK_BINARY_ARGS param
to test:base that subtasks can set directly, bypassing the sh: evaluation
timing issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When multiple valid chains exist in the HEAD reflog (common in repos with
many operations), pick the most recent one instead of erroring. Since we
iterate chronologically (oldest first), the last match corresponds to the
most recent reflog entries and is the correct chain for the daemon.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix 1: Panic on blame line count mismatch when verifiable chars exist
  (instead of silently skipping). Divergence only tolerated after
  mark_all_unverifiable.
- Fix 5: verify_multi_file_commit checks all files in a commit have
  their attribution tracked in the note.
- Fix 9: verify_note_line_ranges parses note attestation entries and
  verifies AI sessions only claim AI-attributed lines and vice versa.
- Fix 10: verify_note_schema validates note structure (separator, valid
  JSON, required keys, attestation format).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root causes fixed:
- fast-import race on refs/notes/ai: concurrent note writes would silently
  fail when the ref tip moved between read and write. Added retry loop with
  backoff for both notes_add_batch and notes_add_blob_batch.
- Rebase skipping commits with existing notes: rewrite_authorship_after_rebase_v2
  would skip new commits that already had a note, even when the original had AI
  attestation data not present in the new note. Now always reprocesses when the
  original has AI data.
- Empty pathspecs during rewrite-path post-commit: when working log is empty
  after a rewrite, fall back to final_state_override keys as pathspecs.
- Double-processing overwriting valid notes: never overwrite an existing note
  that has more attestation entries than the newly-generated one.
- Family sequencer blocking checkpoints behind PendingRoot: checkpoints are now
  extracted eagerly and sorted before commands to ensure working log is populated
  before commands read it.
- Synthetic human replay on already-archived commits: skip replay when old-{sha}
  archive already exists.
- Trace payload execution without family lock: acquire side_effect_exec_lock.
- Fuzzer: use dirty_files for all checkpoints (pre and post edit) to eliminate
  disk-read races, add sync_daemon_force after stash ops, use reset --hard for
  stash conflict recovery, add file existence checks in oracle.

Verified: 3 consecutive clean runs of 71 tests at 12 threads, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@svarlamov

Copy link
Copy Markdown
Member Author

superseded

@svarlamov svarlamov closed this Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant