feat(analysis): N time-series stability comparison across A-matrix methods #381
Merged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ddaee84 to 9b1a6f4
Adds rebuild_run_index_from_drive.py, which lists Google Sheets in a diagnostics Drive folder and parses each title into (approach, baseline, sheet_id, year?, scenario?) rows for ef_run_index.csv. Closes the manual-flow gap: users who triggered diagnostics via the GH Actions UI can now auto-build the run index instead of hand-typing sheet IDs.

Title regex handles both formats currently in use:
  Manual:   [DATE, BASELINE based, A matrix with APPROACH] EFs diagnostics
  Dispatch: [DATE, YEAR, BASELINE based, A matrix with APPROACH, SCENARIO] EFs diagnostics

Also rewords the FileNotFoundError in compile_ef_diagnostics.py to point at the new script as the auto-rebuild path, with hand-write as a fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
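To make the two title formats concrete, here is a minimal sketch of how such a title could be parsed. The pattern, the `RunIndexRow` type, `parse_title`, and the sample titles are all hypothetical; the actual regex in rebuild_run_index_from_drive.py is not shown on this page. The sketch assumes DATE contains no comma and YEAR is four digits.

```python
import re
from typing import NamedTuple, Optional


class RunIndexRow(NamedTuple):
    """One row of the run index (hypothetical type, field order from the commit text)."""

    approach: str
    baseline: str
    sheet_id: str
    year: Optional[int]
    scenario: Optional[str]


# Assumed pattern covering both the manual and dispatch title formats.
TITLE_RE = re.compile(
    r"^\[(?P<date>[^,\]]+),\s*"          # DATE (assumed to contain no comma)
    r"(?:(?P<year>\d{4}),\s*)?"          # YEAR, present in dispatch titles only
    r"(?P<baseline>[^,\]]+?) based,\s*"  # BASELINE based
    r"A matrix with (?P<approach>[^,\]]+)"
    r"(?:,\s*(?P<scenario>[^,\]]+))?"    # SCENARIO, present in dispatch titles only
    r"\]\s*EFs diagnostics$"
)


def parse_title(title: str, sheet_id: str) -> Optional[RunIndexRow]:
    """Return a run-index row for a matching title, or None for unrelated sheets."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return None
    year = match.group("year")
    scenario = match.group("scenario")
    return RunIndexRow(
        approach=match.group("approach").strip(),
        baseline=match.group("baseline").strip(),
        sheet_id=sheet_id,
        year=int(year) if year else None,
        scenario=scenario.strip() if scenario else None,
    )
```

Manual titles come back with `year=None` and `scenario=None`, which lines up with the "empty year column" rows mentioned for single-year runs.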
…patcher Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-series dispatch Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…race Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ta-overwrite Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each diagnostics cell runs at model_base_year=Y so D_new/N_new are denominated in year-Y dollars. Cross-year comparison requires a single dollar reference; add D_new_ref/N_new_ref columns that apply inflation_adjust_ef_denom_to_new_base_year to land every cell on REFERENCE_DOLLAR_YEAR (2023). Step 6 single-year rows (empty year column) skip the step and behave unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dispatcher previously crashed mid-run when `gh run list` returned non-zero exit (transient API hiccup mid-batch). Now retries 3× per status with a 5s backoff and falls back to a "still busy" sentinel so the poll loop keeps spinning instead of unwinding the dispatch and losing the queue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
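The retry-then-sentinel behaviour described above could look roughly like the sketch below. It is not the dispatcher's actual code: `run_with_retry`, `STILL_BUSY`, and the injectable `runner`/`sleep` parameters are hypothetical, and "retries 3×" is read here as three attempts in total.

```python
import subprocess
import time
from typing import Callable, Sequence

STILL_BUSY = "still-busy"  # hypothetical sentinel; the real value is not shown in the PR


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    backoff_s: float = 5.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the command's stdout, or STILL_BUSY if every attempt exits non-zero."""
    for attempt in range(1, attempts + 1):
        result = runner(list(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if attempt < attempts:
            # Transient failure: wait before trying again.
            sleep(backoff_s)
    # Never raise: the caller treats the sentinel as "queue still busy" and keeps polling.
    return STILL_BUSY
```

A poll loop could call `run_with_retry(["gh", "run", "list", "--status", status])` once per status and treat `STILL_BUSY` the same as a non-empty run list, so a transient API failure delays the next dispatch instead of aborting the batch.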
mypy flags ta.cast as redundant since pd.to_numeric/concat/__getitem__ return inferable types. The casts were added to placate Pyright, which isn't a CI check. Switching back to no-cast satisfies mypy + black; ruff unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Removed 6 redundant casts that mypy flagged. The casts placated Pyright but mypy infers the same types without them.
- pd.ExcelFile.sheet_names is typed as list[int|str]; explicitly str() the tab name before passing to _parse_tab and pd.read_excel.
- Apply black formatting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9b1a6f4 to d4c3d13
9be63f6 to 50931d0
MoLi7 commented May 7, 2026
-    joined, source_year=int(year), ref_year=REFERENCE_DOLLAR_YEAR
+    joined,
+    source_year=int(float(year)),
+    ref_year=REFERENCE_DOLLAR_YEAR,
@WesIngwersen this step deflates the new N values in each year-specific diagnostics run from that run's model_base_year to a shared REFERENCE_DOLLAR_YEAR, so the time-series comparison uses N values expressed in the same dollar year.
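As a rough illustration of that step, the sketch below rescales an emission factor whose denominator is in year-Y dollars to the reference dollar year, and includes the `int(float(year))` coercion from the diff. It is not the body of inflation_adjust_ef_denom_to_new_base_year, which this page does not show: `deflate_ef_to_ref_year` is a hypothetical helper, and the price index values are made up.

```python
REFERENCE_DOLLAR_YEAR = 2023

# Made-up price index values, for illustration only.
PRICE_INDEX = {2019: 100.0, 2021: 110.0, 2023: 125.0}


def deflate_ef_to_ref_year(ef, source_year, ref_year=REFERENCE_DOLLAR_YEAR,
                           price_index=PRICE_INDEX):
    """Convert an EF per year-Y dollar into an EF per ref-year dollar.

    One year-Y dollar equals price_index[ref] / price_index[Y] ref-year dollars,
    so the per-dollar factor is scaled by price_index[Y] / price_index[ref].
    """
    # The year read from the CSV may be a float-string such as "2019.0";
    # int("2019.0") raises ValueError, so go through float first.
    year = int(float(source_year))
    return ef * price_index[year] / price_index[ref_year]
```

With these invented index values, a factor of 0.5 per 2019 dollar becomes about 0.4 per 2023 dollar, while a factor already in 2023 dollars is unchanged.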
d4c3d13 to 10ecba4
50931d0 to 627fb5b
…/cornerstone-data/bedrock into mo__step7-ef-time-series-dispatch
…stone-data/bedrock into mo__n-stability-comparison
WesIngwersen approved these changes May 12, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


cc:
Closes:
What changed? Why?

Adds `bedrock/analysis/a_matrix_time_series/compare_method_stability.py`, a new analysis script that ranks the 3 A-matrix methods (commodity-PI, industry-PI, summary-tables) by year-over-year stability of `N` across 2019–2023. `useeio` is excluded (no temporal scaling on A, so not comparable in this framing).

The script reads the per-pair tabs in `ef_comparison.xlsx`, pivots to a long panel of `N_new_ref` (deflated to 2023$), and emits:

- `output/results/n_yoy_ranking.csv`: per-approach `mean_abs_yoy_pct`, `max_abs_yoy_pct`, `abs_total_drift_pct`, each rolled up as median, p95, and emissions-weighted (by `|mean_N|`).
- `output/results/n_yoy_per_sector.csv`: per-sector per-year `N`, the 4 transition YoY %s, and the aggregates.
- `output/plots/n_indexed_lines.png`: head sectors covering 30% of |mean_N| (cap 8), `N` rebased to 2019=100, faceted by method.
- `output/plots/n_yoy_distribution.png`: per-method boxplot of mean |YoY %| + per-transition |YoY %| boxplots grouped by method.

Also fixes a year-coercion bug in `compile_ef_diagnostics.py`: the `year` column from `ef_run_index.csv` reads as float-string ("2019.0"), so `int(year)` raised; switched to `int(float(year))`.

Testing

Ran compile + the new script end-to-end against the 20 dispatched bundle_v0_2 cells; all 4 artifacts written, ranking matches expectations (industry-PI ≈ commodity-PI ≪ summary-tables on weighted YoY).
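For readers who want a feel for the stability metrics, here is a small standard-library sketch of the per-sector numbers and two of the roll-ups. The formulas are assumptions inferred from the column names (YoY % as the change relative to the previous year, total drift as last year versus first year); `sector_metrics`, `roll_up`, and the input shape are hypothetical, the p95 roll-up is omitted, and `N` is assumed non-zero.

```python
from statistics import mean, median


def sector_metrics(series):
    """Stability metrics for one sector; `series` holds N ordered by year."""
    # Year-over-year change in percent for each transition (4 for 2019-2023).
    yoy = [100.0 * (cur - prev) / prev for prev, cur in zip(series, series[1:])]
    abs_yoy = [abs(v) for v in yoy]
    return {
        "mean_abs_yoy_pct": mean(abs_yoy),
        "max_abs_yoy_pct": max(abs_yoy),
        "abs_total_drift_pct": abs(100.0 * (series[-1] - series[0]) / series[0]),
        "weight": abs(mean(series)),  # |mean_N|, used for the weighted roll-up
    }


def roll_up(panel, metric):
    """Aggregate one metric across sectors; `panel` maps sector -> N series."""
    rows = [sector_metrics(series) for series in panel.values()]
    values = [row[metric] for row in rows]
    weights = [row["weight"] for row in rows]
    weighted = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    return {"median": median(values), "weighted": weighted}
```

Running `roll_up` once per method on the same set of sectors and sorting by the weighted value would give a ranking in the spirit of `n_yoy_ranking.csv`, under the assumed definitions.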