feat(analysis): N time-series stability comparison across A-matrix methods #381

Merged
MoLi7 merged 30 commits into main from mo__n-stability-comparison
May 13, 2026
@MoLi7 MoLi7 commented May 6, 2026

Member

cc:
Closes:

What changed? Why?

Adds bedrock/analysis/a_matrix_time_series/compare_method_stability.py, a new analysis script that ranks the 3 A-matrix methods (commodity-PI, industry-PI, summary-tables) by year-over-year stability of N across 2019–2023. useeio is excluded (no temporal scaling on A — not comparable in this framing).

The script reads the per-pair tabs in ef_comparison.xlsx, pivots to a long panel of N_new_ref (deflated to 2023$), and emits:

  • output/results/n_yoy_ranking.csv — per-approach mean_abs_yoy_pct, max_abs_yoy_pct, abs_total_drift_pct, each rolled up as median, p95, and emissions-weighted (by |mean_N|).
  • output/results/n_yoy_per_sector.csv — per-sector per-year N, the 4 transition YoY %s, and the aggregates.
  • output/plots/n_indexed_lines.png — head sectors covering 30% of |mean_N| (cap 8), N rebased to 2019=100, faceted by method.
  • output/plots/n_yoy_distribution.png — per-method boxplot of mean |YoY %| + per-transition |YoY %| boxplots grouped by method.
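
The metrics listed above can be sketched in plain Python. This is a minimal illustration, not the code in compare_method_stability.py: it assumes YoY % is (N_t - N_{t-1}) / N_{t-1} * 100, that the emissions-weighted rollup is a weighted mean with weights |mean_N|, and that p95 uses linear interpolation. All function names here are hypothetical.

```python
from statistics import median


def yoy_pcts(series):
    """Year-over-year % change between consecutive values (assumed definition)."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]


def sector_metrics(series):
    """Per-sector stability metrics for one N series ordered by year."""
    abs_yoy = [abs(p) for p in yoy_pcts(series)]
    return {
        "mean_abs_yoy_pct": sum(abs_yoy) / len(abs_yoy),
        "max_abs_yoy_pct": max(abs_yoy),
        "abs_total_drift_pct": abs(100.0 * (series[-1] - series[0]) / series[0]),
        "weight": abs(sum(series) / len(series)),  # |mean_N|
    }


def percentile(values, q):
    """q-th percentile with linear interpolation between closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def rollup(values, weights):
    """Roll one per-sector metric up to the method level."""
    return {
        "median": median(values),
        "p95": percentile(values, 95),
        "weighted": weighted_mean(values, weights),
    }


def rebase(series, base_index=0):
    """Index a series so that the base year equals 100."""
    return [100.0 * v / series[base_index] for v in series]


def head_sectors(weights, share=0.30, cap=8):
    """Largest sectors by weight until `share` of the total is covered, at most `cap`."""
    total = sum(weights.values())
    picked, running = [], 0.0
    for name, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total or len(picked) >= cap:
            break
        picked.append(name)
        running += w
    return picked
```

With five yearly values per sector (2019–2023), yoy_pcts yields the four transitions reported in n_yoy_per_sector.csv, and rebase(series) gives the 2019=100 lines used in the indexed plot.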

Also fixes a year-coercion bug in compile_ef_diagnostics.py: the year column from ef_run_index.csv is read as a float-formatted string ("2019.0"), so int(year) raised a ValueError; switched to int(float(year)).
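
A minimal sketch of the coercion, for illustration only. The helper name coerce_year is hypothetical, and the whole-number check is an extra guard that the PR's int(float(year)) does not include.

```python
def coerce_year(raw):
    """Parse a year that may arrive as an int, a float, or a string like "2019.0"."""
    value = float(raw)  # accepts 2019, 2019.0, "2019" and "2019.0"
    if not value.is_integer():
        # Extra guard (an assumption, not in the PR): reject values like "2019.5".
        raise ValueError(f"not a whole year: {raw!r}")
    return int(value)
```

Calling int("2019.0") directly fails because int() only parses integer literals from strings; going through float() first avoids that.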

Testing

Ran compile + the new script end-to-end against the 20 dispatched bundle_v0_2 cells; all 4 artifacts written, ranking matches expectations (industry-PI ≈ commodity-PI ≪ summary-tables on weighted YoY).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@MoLi7 MoLi7 marked this pull request as ready for review May 6, 2026 05:12
@MoLi7 MoLi7 requested a review from WesIngwersen May 6, 2026 05:22
@MoLi7 MoLi7 force-pushed the mo__n-stability-comparison branch from ddaee84 to 9b1a6f4 Compare May 6, 2026 05:28
MoLi7 and others added 11 commits May 6, 2026 09:13
Adds rebuild_run_index_from_drive.py — lists Google Sheets in a
diagnostics Drive folder and parses each title into (approach, baseline,
sheet_id, year?, scenario?) rows for ef_run_index.csv. Closes the manual-
flow gap: users who triggered diagnostics via the GH Actions UI can now
auto-build the run index instead of hand-typing sheet IDs.

Title regex handles both formats currently in use:
  Manual:    [DATE, BASELINE based, A matrix with APPROACH] EFs diagnostics
  Dispatch:  [DATE, YEAR, BASELINE based, A matrix with APPROACH, SCENARIO] EFs diagnostics
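
One way a single pattern could cover both title formats is sketched below. It is not the regex in rebuild_run_index_from_drive.py: it assumes DATE contains no comma, YEAR is four digits, and the sample baseline and scenario values are made up. The sheet_id comes from the Drive listing, not the title, so it is not parsed here.

```python
import re

# Hypothetical pattern: YEAR and SCENARIO are optional so that both the
# manual and the dispatch title formats match.
TITLE_RE = re.compile(
    r"^\[(?P<date>[^,\]]+),\s*"
    r"(?:(?P<year>\d{4}),\s*)?"
    r"(?P<baseline>[^,\]]+?) based,\s*"
    r"A matrix with (?P<approach>[^,\]]+?)"
    r"(?:,\s*(?P<scenario>[^,\]]+?))?"
    r"\]\s*EFs diagnostics$"
)


def parse_title(title):
    """Return the parsed fields as a dict, or None if the title does not match."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return None
    fields = match.groupdict()
    if fields["year"] is not None:
        fields["year"] = int(fields["year"])
    return fields
```

For a manual title, year and scenario come back as None, which corresponds to the optional year? and scenario? columns in the run index.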

Also rewords the FileNotFoundError in compile_ef_diagnostics.py to point
at the new script as the auto-rebuild path, with hand-write as a
fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…patcher

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-series dispatch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…race

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ta-overwrite

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each diagnostics cell runs at model_base_year=Y so D_new/N_new are denominated
in year-Y dollars. Cross-year comparison requires a single dollar reference;
add D_new_ref/N_new_ref columns that apply inflation_adjust_ef_denom_to_new_base_year
to land every cell on REFERENCE_DOLLAR_YEAR (2023). Step 6 single-year rows
(empty year column) skip the step and behave unchanged.
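
The arithmetic behind the reference-dollar step can be sketched as follows. This does not reproduce inflation_adjust_ef_denom_to_new_base_year, whose signature is not shown here; rebase_ef_denominator and add_ref_column are hypothetical helpers, and the price index values are invented placeholders, not real data.

```python
REFERENCE_DOLLAR_YEAR = 2023

# Placeholder price index (NOT real data), only to make the arithmetic concrete.
PRICE_INDEX = {2019: 100.0, 2023: 125.0}


def rebase_ef_denominator(ef, source_year, ref_year, price_index):
    """Convert an intensity per source-year dollar into one per ref-year dollar.

    One source-year dollar equals price_index[ref] / price_index[source]
    ref-year dollars, so the per-dollar intensity scales by the inverse ratio.
    """
    return ef * price_index[source_year] / price_index[ref_year]


def add_ref_column(row, price_index=PRICE_INDEX):
    """Add N_new_ref to a row; rows with an empty year are returned unchanged."""
    year = row.get("year")
    if year in (None, ""):
        return row  # single-year row: skip the step
    out = dict(row)
    out["N_new_ref"] = rebase_ef_denominator(
        row["N_new"], int(float(year)), REFERENCE_DOLLAR_YEAR, price_index
    )
    return out
```

Under these placeholder numbers, an N_new of 0.5 per 2019 dollar becomes 0.4 per 2023 dollar, since a 2023 dollar buys less than a 2019 dollar.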

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dispatcher previously crashed mid-run when `gh run list` returned non-zero
exit (transient API hiccup mid-batch). Now retries 3× per status with a 5s
backoff and falls back to a "still busy" sentinel so the poll loop keeps
spinning instead of unwinding the dispatch and losing the queue.
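
The retry-then-sentinel behaviour could look roughly like this. It is a sketch, not the dispatcher's code: poll_status and STILL_BUSY are hypothetical names, "3×" is read as three attempts in total, and the run and sleep parameters exist only so the function can be exercised without calling gh.

```python
import subprocess
import time

STILL_BUSY = "still busy"  # hypothetical sentinel; the real value is not shown


def poll_status(cmd, attempts=3, backoff_s=5.0, run=subprocess.run, sleep=time.sleep):
    """Run a status command, retrying on a non-zero exit code.

    Returns the command's stdout on success. If every attempt fails, returns
    STILL_BUSY so the caller's poll loop keeps going instead of raising.
    """
    for attempt in range(attempts):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip()
        if attempt < attempts - 1:
            sleep(backoff_s)  # fixed backoff between attempts
    return STILL_BUSY
```

A caller would pass the status command as a list, for example one starting with ["gh", "run", "list"], and treat STILL_BUSY the same as an in-progress status.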

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mypy flags ta.cast as redundant since pd.to_numeric/concat/__getitem__
return inferable types. The casts were added to placate Pyright, which
isn't a CI check. Switching back to no-cast satisfies mypy + black; ruff
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Removed 6 redundant casts that mypy flagged. The casts placated Pyright
  but mypy infers the same types without them.
- pd.ExcelFile.sheet_names is typed as list[int|str]; explicitly str()
  the tab name before passing to _parse_tab and pd.read_excel.
- Apply black formatting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MoLi7 MoLi7 force-pushed the mo__n-stability-comparison branch from 9b1a6f4 to d4c3d13 Compare May 6, 2026 16:14
@MoLi7 MoLi7 force-pushed the mo__step7-ef-time-series-dispatch branch from 9be63f6 to 50931d0 Compare May 6, 2026 16:14
@WesIngwersen WesIngwersen removed their request for review May 6, 2026 21:40
@WesIngwersen

Member
[Image: n_indexed_lines]

Useful line charts and indexing, but the N values need to be converted to a common dollar year to show stability.

- joined, source_year=int(year), ref_year=REFERENCE_DOLLAR_YEAR
+ joined,
+ source_year=int(float(year)),
+ ref_year=REFERENCE_DOLLAR_YEAR,

@MoLi7 MoLi7 May 7, 2026

Member Author


@WesIngwersen this step deflates the new N values in each per-year diagnostics run from that run's model_base_year to the shared REFERENCE_DOLLAR_YEAR, so the time-series comparison uses N values expressed in the same dollar year.

MoLi7 and others added 6 commits May 11, 2026 09:59
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MoLi7 MoLi7 force-pushed the mo__n-stability-comparison branch from d4c3d13 to 10ecba4 Compare May 12, 2026 20:17
@MoLi7 MoLi7 force-pushed the mo__step7-ef-time-series-dispatch branch from 50931d0 to 627fb5b Compare May 12, 2026 20:17
@WesIngwersen WesIngwersen self-requested a review May 12, 2026 21:02
Base automatically changed from mo__step7-ef-time-series-dispatch to main May 13, 2026 00:02
MoLi7 and others added 2 commits May 12, 2026 17:05
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MoLi7 MoLi7 merged commit bbc3405 into main May 13, 2026
5 checks passed
@MoLi7 MoLi7 deleted the mo__n-stability-comparison branch May 13, 2026 00:09
2 participants