Optimize changelog diffing for large metadata sets#125

Merged: JakeShirley merged 7 commits into main from jashir/optimize-changelog-diffing on Apr 17, 2026

@JakeShirley JakeShirley commented Apr 17, 2026

Summary

The changelog diffing step in @minecraft/api-docs-generator was extremely slow and memory-hungry when processing many Minecraft metadata versions. This PR replaces four hot-path algorithms with faster equivalents.

Changes

1. Value-case comparison — JSON.stringify with sorted-key replacer

Replaces deepCopyJson + removePropertyRecursive + deepEqual with a single JSON.stringify pass per value using a sorted-key replacer. This is the main win, as it eliminates millions of unnecessary object clones on production-scale inputs.
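
A minimal sketch of the idea, not the PR's actual code: the helper names (`makeSortedKeyReplacer`, `canonicalJson`, `valuesEqual`) and the `ignoredKeys` parameter are hypothetical. The sketch assumes the ignored keys stand in for whatever `removePropertyRecursive` used to strip before comparison.

```typescript
// Hypothetical names throughout; this illustrates the technique only.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns a JSON.stringify replacer that rebuilds every plain object with its
// keys in sorted order, skipping any key listed in ignoredKeys.
function makeSortedKeyReplacer(ignoredKeys: ReadonlySet<string> = new Set<string>()) {
  return (_key: string, value: unknown): unknown => {
    // Primitives, null, and arrays pass through unchanged.
    if (value === null || typeof value !== "object" || Array.isArray(value)) {
      return value;
    }
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const k of Object.keys(source).sort()) {
      if (!ignoredKeys.has(k)) {
        sorted[k] = source[k];
      }
    }
    // JSON.stringify recurses into the returned object, so nested objects
    // are passed through this replacer as well.
    return sorted;
  };
}

// One serialization pass per value; no intermediate deep copies.
function canonicalJson(value: Json, ignoredKeys?: ReadonlySet<string>): string {
  return JSON.stringify(value, makeSortedKeyReplacer(ignoredKeys));
}

// Two values are treated as equal when their canonical strings match.
function valuesEqual(a: Json, b: Json, ignoredKeys?: ReadonlySet<string>): boolean {
  return canonicalJson(a, ignoredKeys) === canonicalJson(b, ignoredKeys);
}
```

Because keys are sorted during serialization, two objects that differ only in property order produce the same string, so a plain string comparison can stand in for a structural deep-equality check.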

2. compareArray nextSubobjects lookup — Map instead of .map().indexOf()

Builds a Map<key, index> over nextSubobjects once per call instead of allocating an intermediate array and scanning it per element. Reduces complexity from O(N·M) to O(N+M).
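
The lookup change can be sketched as follows. The function names and the `keyOf` callback are hypothetical, since the PR description does not show how `compareArray` derives keys.

```typescript
// Hypothetical helpers illustrating the Map-based lookup.

// Builds key -> index once, in O(M). The first occurrence wins so that the
// result matches what Array.prototype.indexOf would have returned.
function buildIndexByKey<T>(
  items: readonly T[],
  keyOf: (item: T) => string
): Map<string, number> {
  const indexByKey = new Map<string, number>();
  items.forEach((item, index) => {
    const key = keyOf(item);
    if (!indexByKey.has(key)) {
      indexByKey.set(key, index);
    }
  });
  return indexByKey;
}

// Pairs each previous subobject with its counterpart in next (or undefined),
// using one O(1) Map lookup per element instead of a linear scan.
function matchSubobjects<T>(
  prev: readonly T[],
  next: readonly T[],
  keyOf: (item: T) => string
): Array<[T, T | undefined]> {
  const nextIndexByKey = buildIndexByKey(next, keyOf);
  return prev.map((item): [T, T | undefined] => {
    const idx = nextIndexByKey.get(keyOf(item));
    return [item, idx === undefined ? undefined : next[idx]];
  });
}
```

Keeping the first occurrence of a duplicate key is a deliberate choice here: it preserves `indexOf` semantics, which matters if the output must stay byte-identical.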

3. getChangelogForVersion — reverse scan instead of .map().indexOf()

Scans from the end of the changelog array (where recently-added versions are) instead of mapping the entire array into a new one per call.
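
A rough sketch of the reverse scan, assuming each changelog entry carries a unique `version` string; the `VersionedChangelog` shape and the function name are hypothetical.

```typescript
// Hypothetical entry shape; the real changelog type is not shown in the PR.
interface VersionedChangelog {
  version: string;
}

// Walks from the end of the array, where recently added versions live, and
// returns the index of the matching entry or -1. No intermediate array is
// allocated. Assumes versions are unique: with duplicates this returns the
// last match, whereas .map().indexOf() would return the first.
function findChangelogIndexForVersion(
  changelogs: readonly VersionedChangelog[],
  version: string
): number {
  for (let i = changelogs.length - 1; i >= 0; i--) {
    if (changelogs[i].version === version) {
      return i;
    }
  }
  return -1;
}
```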

4. Per-module changelog copy — serialize once, parse N times

Replaces deepCopyJson(sortedChangelogs) called once per module with a single JSON.stringify followed by N JSON.parse calls.
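
One way this could look, with a hypothetical `cloneForEachModule` helper and module names as plain strings; it assumes the changelog data is JSON-safe (no functions, `undefined` values, or cycles).

```typescript
// Hypothetical helper: produces one independent deep copy per module while
// paying the serialization cost only once.
function cloneForEachModule<T>(
  value: T,
  moduleNames: readonly string[]
): Map<string, T> {
  // Serialize once up front.
  const serialized = JSON.stringify(value);
  const copies = new Map<string, T>();
  for (const name of moduleNames) {
    // Each parse yields a fresh object graph that shares nothing with the
    // original or with the other copies.
    copies.set(name, JSON.parse(serialized) as T);
  }
  return copies;
}
```

Each module can then mutate its own copy freely without affecting the source data or the copies held by other modules.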

Benchmark Results

Tested on 5 Minecraft releases (~278 MB of real script module metadata including server-bindings up to 2.5 MB each):

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Wall-clock time | 897 s | 75 s | 12× faster |
| Peak RSS | 6,683 MB | 6,596 MB | −87 MB |

Output is byte-identical to baseline across all 12 generated changelog JSON files.

Testing

  • All 44 unit tests pass (tools/api-docs-generator)
  • All 18 changelog-related snapshot tests pass (tools/api-docs-generator-test-snapshots)

Replace hot-path algorithms in ChangelogGenerator with faster equivalents:

- Value-case comparison: use JSON.stringify with sorted-key replacer instead
  of deepCopyJson + removePropertyRecursive + deepEqual. This eliminates
  millions of unnecessary object clones on large inputs.
- compareArray nextSubobjects lookup: build a Map<key, index> once per call
  instead of .map().indexOf() per element (O(N*M) -> O(N+M)).
- getChangelogForVersion: reverse linear scan instead of .map().indexOf(),
  avoiding an intermediate array allocation per call.
- Per-module changelog copy: JSON.stringify once + JSON.parse N times instead
  of deepCopyJson per module.

Benchmarked on 5 Minecraft releases (~278 MB of script module metadata):
  Before: 897s wall-clock, 6683 MB peak RSS
  After:   75s wall-clock, 6596 MB peak RSS (12x faster)
@JakeShirley JakeShirley requested a review from rlandav as a code owner April 17, 2026 18:27
@JakeShirley JakeShirley merged commit 8de3a55 into main Apr 17, 2026
3 checks passed
@JakeShirley JakeShirley deleted the jashir/optimize-changelog-diffing branch April 17, 2026 21:18
@JakeShirley JakeShirley changed the title from "Optimize changelog diffing for large metadata sets (12x speedup)" to "Optimize changelog diffing for large metadata sets" Apr 17, 2026