perf: optimize checkpoint flow with 9 targeted improvements #1292
Open
svarlamov wants to merge 4 commits into
- Add parent-dir keyed repo discovery cache to avoid redundant directory walks for sibling files in build_checkpoint_files
- Hoist create_dir_all out of per-file async loop in save_current_file_states
- Skip blob write when content-addressed file already exists
- Wrap dirty_files in Arc to avoid deep-cloning all file contents
- Eliminate redundant second Myers diff by deriving line stats from attribution diff ops (with CRLF normalization)
- Avoid entries.clone() by moving entries into Checkpoint and iterating checkpoint.entries for metrics
- JSONL append-only instead of full read/prune/rewrite on every checkpoint (lazy compaction at 200KB threshold)
- Short-circuit hash migration when no 7-char hashes exist (common case)
- Reverse iteration in build_previous_file_state_maps to avoid cloning attributions for entries that would be overwritten

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
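As an illustration of the first item, here is a minimal sketch of a discovery cache keyed by parent directory. The `RepoDiscoveryCache` type, its `walks` counter, and the `discover` callback are hypothetical names invented for this example; the actual code in build_checkpoint_files may be structured differently.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical cache: maps a file's parent directory to the repo root
/// discovered for it (or None if the directory is not inside a repo).
struct RepoDiscoveryCache {
    by_parent: HashMap<PathBuf, Option<PathBuf>>,
    /// Number of real discovery walks performed (for illustration only).
    walks: usize,
}

impl RepoDiscoveryCache {
    fn new() -> Self {
        Self { by_parent: HashMap::new(), walks: 0 }
    }

    /// Returns the repo root for `file`, calling `discover` at most once
    /// per distinct parent directory. Sibling files reuse the cached result.
    fn repo_for<F>(&mut self, file: &Path, discover: F) -> Option<PathBuf>
    where
        F: FnOnce(&Path) -> Option<PathBuf>,
    {
        let parent = file.parent().unwrap_or(Path::new("")).to_path_buf();
        if let Some(hit) = self.by_parent.get(&parent) {
            return hit.clone();
        }
        // Cache miss: do the (expensive) walk once and remember the answer,
        // including negative results.
        self.walks += 1;
        let found = discover(&parent);
        self.by_parent.insert(parent, found.clone());
        found
    }
}
```

Caching negative results as well means files outside any repository also cost a single walk per directory.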
The positional comparison in Replace blocks would produce inflated stats when CRLF changes co-occurred with insertions/deletions that shifted line positions. Replace with a sub-diff on normalized slices within the Replace block so the diff algorithm correctly identifies which lines actually changed regardless of line-ending differences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
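The difference between the two approaches can be sketched as follows. This is not the PR's code: a plain LCS computation stands in for the Myers sub-diff, and `positional_stats` is only a guess at what the earlier positional comparison looked like. All function names here are hypothetical.

```rust
/// Strip one trailing '\r' so that "a\r" and "a" compare equal.
fn normalize(line: &str) -> &str {
    line.strip_suffix('\r').unwrap_or(line)
}

/// Length of the longest common subsequence of two line slices,
/// comparing lines after CRLF normalization.
fn lcs_len(old: &[&str], new: &[&str]) -> usize {
    let mut prev = vec![0usize; new.len() + 1];
    for o in old {
        let mut cur = vec![0usize; new.len() + 1];
        for (j, n) in new.iter().enumerate() {
            cur[j + 1] = if normalize(o) == normalize(n) {
                prev[j] + 1
            } else {
                prev[j + 1].max(cur[j])
            };
        }
        prev = cur;
    }
    prev[new.len()]
}

/// Sub-diff stats for one Replace block: (added, deleted).
/// Lines that differ only in line ending are treated as unchanged.
fn replace_block_stats(old: &[&str], new: &[&str]) -> (usize, usize) {
    let common = lcs_len(old, new);
    (new.len() - common, old.len() - common)
}

/// Assumed shape of the positional comparison, kept for contrast:
/// line i of `old` is compared only against line i of `new`.
fn positional_stats(old: &[&str], new: &[&str]) -> (usize, usize) {
    let paired = old.len().min(new.len());
    let changed = (0..paired)
        .filter(|&i| normalize(old[i]) != normalize(new[i]))
        .count();
    (changed + new.len() - paired, changed + old.len() - paired)
}
```

With old lines `a\r, b\r, c\r, d\r` and new lines `a, c, d` (a CRLF to LF conversion plus one deleted line), the sub-diff reports one deletion, while the positional variant counts every line after the shift as changed.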
Proves that the sub-diff approach in Replace blocks correctly handles CRLF→LF conversion combined with deletions/insertions that shift line positions, the exact scenario flagged by Devin's review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 71370f7 to f715011
When read_all_checkpoints() fails during compaction (e.g. corrupted JSON line from interrupted write), unwrap_or_default() would return an empty Vec and write_all_checkpoints would erase all data. Now we skip compaction entirely on read failure, preserving existing data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
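A minimal in-memory sketch of this failure mode and the fix. The helpers below (`parse_line`, `read_all`, `compact`, `compact_lossy`) are hypothetical stand-ins rather than the project's functions, and a brace check replaces real JSON parsing so the example needs only the standard library.

```rust
/// Hypothetical stand-in for parsing one JSONL record: any line that is
/// not wrapped in braces is treated as corrupted.
fn parse_line(line: &str) -> Result<String, String> {
    if line.starts_with('{') && line.ends_with('}') {
        Ok(line.to_string())
    } else {
        Err(format!("corrupted line: {line}"))
    }
}

/// Parse every non-empty line; a single bad line fails the whole read.
fn read_all(raw: &str) -> Result<Vec<String>, String> {
    raw.lines().filter(|l| !l.is_empty()).map(parse_line).collect()
}

/// Safe compaction: keep the last `keep_last` entries, but leave the
/// contents untouched if the read fails.
fn compact(raw: &str, keep_last: usize) -> String {
    match read_all(raw) {
        Ok(entries) => {
            let start = entries.len().saturating_sub(keep_last);
            let mut out = entries[start..].join("\n");
            out.push('\n');
            out
        }
        // Read failed: skip compaction and preserve the existing data.
        Err(_) => raw.to_string(),
    }
}

/// The hazard described above: unwrap_or_default() turns a read error
/// into an empty Vec, so the rewritten contents are empty.
fn compact_lossy(raw: &str, keep_last: usize) -> String {
    let entries = read_all(raw).unwrap_or_default();
    let start = entries.len().saturating_sub(keep_last);
    entries[start..].join("\n")
}
```

For input whose last line was truncated by an interrupted write, `compact` returns the input unchanged, while `compact_lossy` returns an empty string.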
Summary
- Hoist create_dir_all out of per-file async loop and skip existing blob writes (content-addressed dedup)
- Wrap dirty_files in Arc to avoid deep-cloning all file contents on every checkpoint
- Avoid entries.clone() by moving entries into Checkpoint and iterating checkpoint.entries for metrics
- Reverse iteration in build_previous_file_state_maps to skip cloning attributions for overwritten entries

Test plan
- task lint and task fmt clean

🤖 Generated with Claude Code