Add LMDI signal-noise diagnosis for A-matrix physical residual #405

Draft
MoLi7 wants to merge 13 commits into main from mo__lmdi-signal-noise

Conversation

MoLi7 commented May 12, 2026

Member

cc:
Closes:

What changed? Why?

Adds a 7-script pipeline under bedrock/analysis/a_matrix_time_series/signal_noise/ that diagnoses whether the A-matrix physical residual (LMDI Q_phys = A_summary / A_pi) carries real signal or BEA-revision noise.

The pipeline runs in three phases:

  • Phase A (derive_A_snapshots.py, compute_lmdi_phys.py, plot_lmdi_phys.py): per-year A snapshots for summary_tables and commodity_price_index; cell-level Q_phys + LMDI aggregation to output sector / NAICS-3.
  • Phase B (compute_consistency_tests.py, plot_consistency_tests.py, extract_signal_clean_naics3.py): lag-1 autocorr, within-NAICS-3 coherence (LMDI-weighted ICC), magnitude/shape distribution, and a 3-threshold pass/fail flag.
  • Phase C (validate_klems.py): Pearson correlation against BEA-BLS KLEMS TFP and Materials/Output ratio at NAICS-3 level.
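
The Phase A computation can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in compute_lmdi_phys.py: the function names `q_phys`, `log_mean`, and `lmdi_log_change` are hypothetical, "active" is assumed to mean that both A values are strictly positive, and the weights are assumed to be log-mean cell values normalized to sum to one.

```python
import numpy as np


def q_phys(a_summary, a_pi):
    """Cell-level ratio A_summary / A_pi; NaN where either input is not positive."""
    a_summary = np.asarray(a_summary, dtype=float)
    a_pi = np.asarray(a_pi, dtype=float)
    active = (a_summary > 0) & (a_pi > 0)  # assumed definition of an "active" cell
    q = np.full(a_summary.shape, np.nan)
    q[active] = a_summary[active] / a_pi[active]
    return q


def log_mean(a, b):
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a (a, b > 0)."""
    a, b = np.broadcast_arrays(np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    same = np.isclose(a, b)
    safe_b = np.where(same, a + 1.0, b)  # placeholder avoids 0/0 where a == b
    return np.where(same, a, (a - safe_b) / (np.log(a) - np.log(safe_b)))


def lmdi_log_change(q0, q1, v0, v1):
    """Log-mean-weighted average of ln(q1 / q0) over a group of cells.

    v0 and v1 are the cell values in the two years that supply the weights
    (an assumption; the PR does not state which quantity it weights by).
    """
    q0, q1 = np.asarray(q0, dtype=float), np.asarray(q1, dtype=float)
    w = log_mean(v1, v0)
    return float(np.sum(w * np.log(q1 / q0)) / np.sum(w))
```

With normalized weights, a group in which every cell's Q_phys doubles aggregates to exactly ln 2 regardless of the weights, which makes a convenient sanity check.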

Side changes:

  • bedrock/utils/config/usa_config.py: adds 2018 to the model_base_year Literal — Phase A.1 needs the full 2017–2024 window.
  • bedrock/analysis/a_matrix_time_series/compare_method_stability.py: expands the 2-panel |YoY| boxplot into 3 panels — pooled, per-transition, and an ECDF reading view.
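
For readers unfamiliar with the ECDF view: for each |YoY| value x it shows the share of observations at or below x. A minimal sketch of that computation, with an assumed function name and without the plotting code from compare_method_stability.py:

```python
import numpy as np


def ecdf(values):
    """Return sorted finite values and the cumulative share at or below each one."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[np.isfinite(x)])  # drop NaN/inf before ranking
    y = np.arange(1, x.size + 1) / x.size
    return x, y
```

Reading the curve at y = 0.5 gives the median |YoY|; reading it at a fixed x gives the fraction of cells that moved by no more than x.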

Testing

compute_lmdi_phys produces 816,590 active cells × 7 transitions; compute_consistency_tests headlines (ICC 0.182/0.187 dom/imp, pooled lag-1 r −0.054/−0.195) reproduce stably; extract_signal_clean_naics3 round-trips. Black, ruff, mypy clean across bedrock/analysis/a_matrix_time_series/ and bedrock/utils/config/usa_config.py.
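
A pooled lag-1 correlation of the kind quoted above can be computed by pairing each transition with the next one in the same cell and taking a single Pearson r over all pairs. The sketch below assumes the changes sit in an array of shape (cells, transitions); the function name and layout are illustrative and are not taken from compute_consistency_tests. As a benchmark, year-to-year changes of purely independent level noise have a theoretical lag-1 correlation of -0.5.

```python
import numpy as np


def pooled_lag1_r(changes):
    """Pearson r between each change and the following change, pooled over cells."""
    x = np.asarray(changes, dtype=float)
    prev = x[:, :-1].ravel()  # transition t
    nxt = x[:, 1:].ravel()    # transition t + 1, same cell
    ok = np.isfinite(prev) & np.isfinite(nxt)  # skip pairs with missing values
    return float(np.corrcoef(prev[ok], nxt[ok])[0, 1])
```

With 7 transitions per cell, each cell contributes up to 6 consecutive pairs to the pooled estimate.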

MoLi7 and others added 13 commits May 11, 2026 09:59
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds rebuild_run_index_from_drive.py — lists Google Sheets in a
diagnostics Drive folder and parses each title into (approach, baseline,
sheet_id, year?, scenario?) rows for ef_run_index.csv. Closes the manual-
flow gap: users who triggered diagnostics via the GH Actions UI can now
auto-build the run index instead of hand-typing sheet IDs.

Title regex handles both formats currently in use:
  Manual:    [DATE, BASELINE based, A matrix with APPROACH] EFs diagnostics
  Dispatch:  [DATE, YEAR, BASELINE based, A matrix with APPROACH, SCENARIO] EFs diagnostics

Also rewords the FileNotFoundError in compile_ef_diagnostics.py to point
at the new script as the auto-rebuild path, with hand-write as a
fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
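
A title parser covering the two formats listed in the commit above might look like the sketch below. It is not the regex in rebuild_run_index_from_drive.py: the pattern, the `parse_title` name, and the assumptions that DATE contains no comma and YEAR is four digits are illustrative only.

```python
import re

# Optional YEAR and SCENARIO groups let one pattern cover both title formats.
TITLE_RE = re.compile(
    r"^\[(?P<date>[^,\]]+),\s*"
    r"(?:(?P<year>\d{4}),\s*)?"
    r"(?P<baseline>[^,\]]+?) based,\s*"
    r"A matrix with (?P<approach>[^,\]]+)"
    r"(?:,\s*(?P<scenario>[^\]]+))?"
    r"\]\s*EFs diagnostics$"
)


def parse_title(title):
    """Return the parsed fields as a dict, or None if the title does not match."""
    m = TITLE_RE.match(title.strip())
    return m.groupdict() if m else None
```

For manual-format titles the `year` and `scenario` fields come back as None, so such rows can be written with empty columns.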
…patcher

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-series dispatch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…race

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ta-overwrite

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each diagnostics cell runs at model_base_year=Y so D_new/N_new are denominated
in year-Y dollars. Cross-year comparison requires a single dollar reference;
add D_new_ref/N_new_ref columns that apply inflation_adjust_ef_denom_to_new_base_year
to land every cell on REFERENCE_DOLLAR_YEAR (2023). Step 6 single-year rows
(empty year column) skip the step and behave unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
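
The re-denomination described in the commit above can be illustrated with a hypothetical helper. This is not the signature of inflation_adjust_ef_denom_to_new_base_year, and the price index passed in is an assumed input: a factor expressed per year-Y dollar is scaled by P_Y / P_ref to express it per reference-year dollar.

```python
REFERENCE_DOLLAR_YEAR = 2023


def to_reference_dollars(value_per_dollar, year, price_index, ref_year=REFERENCE_DOLLAR_YEAR):
    """Convert a per-dollar quantity from year-`year` dollars to `ref_year` dollars.

    `price_index` maps year -> price level (hypothetical input).
    Rows with no year (single-year runs) are returned unchanged.
    """
    if year is None:
        return value_per_dollar
    # One year-Y dollar equals P_ref / P_Y reference dollars, so a quantity
    # per dollar shrinks by P_Y / P_ref when prices have risen since year Y.
    return value_per_dollar * price_index[year] / price_index[ref_year]
```

With an illustrative index of 100 in 2017 and 125 in 2023, 1.0 kg per 2017 dollar becomes 0.8 kg per 2023 dollar.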
Dispatcher previously crashed mid-run when `gh run list` returned non-zero
exit (transient API hiccup mid-batch). Now retries 3× per status with a 5s
backoff and falls back to a "still busy" sentinel so the poll loop keeps
spinning instead of unwinding the dispatch and losing the queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
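
The retry behaviour described in the commit above follows a common pattern, sketched here under assumptions: the `run_with_retry` name, the `STILL_BUSY` sentinel value, and the injectable `run`/`sleep` parameters are illustrative and not taken from the dispatcher.

```python
import subprocess
import time

STILL_BUSY = "still_busy"  # hypothetical sentinel; the real value is not shown in the PR


def run_with_retry(cmd, attempts=3, backoff_s=5.0, run=subprocess.run, sleep=time.sleep):
    """Run `cmd`, retrying on non-zero exit; return stdout or the sentinel."""
    for attempt in range(attempts):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if attempt < attempts - 1:
            sleep(backoff_s)  # fixed backoff between attempts
    # Every attempt failed: report "still busy" so the caller's poll loop continues.
    return STILL_BUSY
```

A caller would invoke it as `run_with_retry(["gh", "run", "list"])` and treat `STILL_BUSY` like any other in-progress status rather than raising.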
mypy flags ta.cast as redundant since pd.to_numeric/concat/__getitem__
return inferable types. The casts were added to placate Pyright, which
isn't a CI check. Switching back to no-cast satisfies mypy + black; ruff
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Removed 6 redundant casts that mypy flagged. The casts placated Pyright
  but mypy infers the same types without them.
- pd.ExcelFile.sheet_names is typed as list[int|str]; explicitly str()
  the tab name before passing to _parse_tab and pd.read_excel.
- Apply black formatting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MoLi7 commented May 12, 2026

Member Author

MoLi7 changed the title from "feat(analysis): add LMDI signal-noise diagnosis for A-matrix" to "Add LMDI signal-noise diagnosis for A-matrix physical residual" May 12, 2026
MoLi7 marked this pull request as draft May 12, 2026 20:18
Base automatically changed from mo__n-stability-comparison to main May 13, 2026 00:09