feat(analysis): step 7 dispatch — workflow input + EF time-series dispatcher by MoLi7 · Pull Request #380 · cornerstone-data/bedrock

MoLi7 · 2026-05-05T22:50:51Z

What changed? Why?

Step 7 / Phase 1 — infra to dispatch the EF time-series across (scenario, approach, year) cells without YAML proliferation, with year-aligned GHG inventory data and Sheets-API-quota-safe serialization.

Workflow + config plumbing

Two new optional workflow inputs on generate_diagnostics.yml, mirrored as CLI options on bedrock/utils/validation/generate_diagnostics.py:

model_base_year — overrides cfg.model_base_year. Drives the A-matrix scaling target year + inflation target year.
usa_ghg_data_year — overrides cfg.usa_ghg_data_year. Drives the GHG (E) inventory year + gross-output x-vector year.

Both flow through set_global_usa_config's diagnostics_cli_overrides path. USAConfig.model_base_year Literal expands to {2019…2024} (was {2022, 2023, 2024}); usa_ghg_data_year Literal expands to {2019…2024} (was {2023, 2024}). Both keys added to DIAGNOSTICS_CLI_OVERRIDE_KEYS.
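A minimal sketch of how the two optional overrides could be validated and applied. SketchConfig, apply_cli_overrides, and the 2023 defaults are hypothetical stand-ins for illustration; the real USAConfig and set_global_usa_config are not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Literal, Optional, get_args

# Year range stated in the PR description for both fields.
Year = Literal[2019, 2020, 2021, 2022, 2023, 2024]

# Only the two keys this PR adds; the real DIAGNOSTICS_CLI_OVERRIDE_KEYS
# may contain others.
OVERRIDE_KEYS = ("model_base_year", "usa_ghg_data_year")


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for USAConfig (defaults are assumptions)."""

    model_base_year: Year = 2023
    usa_ghg_data_year: Year = 2023


def apply_cli_overrides(
    cfg: SketchConfig, overrides: dict[str, Optional[int]]
) -> SketchConfig:
    """Return a copy of cfg with every non-None override applied."""
    updates: dict[str, int] = {}
    for key, value in overrides.items():
        if value is None:
            continue  # optional input left blank: keep the config value
        if key not in OVERRIDE_KEYS:
            raise KeyError(f"{key} is not an overridable key")
        if value not in get_args(Year):
            raise ValueError(f"{key}={value} is outside {get_args(Year)}")
        updates[key] = value
    return replace(cfg, **updates)
```

Leaving an input as None keeps the value from the config file, which matches the "optional" wording above.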

GHG FBS year templatization

load_E_from_flowsa() in bedrock/transform/allocation/derived.py previously hard-coded 'GHG_national_Cornerstone_2023' / 'GHG_national_CEDA_2023'. The new_ghg_method and CEDA fallback branches now build the FBS name from cfg.usa_ghg_data_year; those FBS YAMLs and GCS parquets exist for 2019–2023. Variant FBSes (*_coa_allocation, *_electricity, etc.) only exist for 2023, so the function raises a clear ValueError if any update_*_method flag is set with year ≠ 2023, instead of failing later with an opaque "FBS not found".
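The year guard and name templating might look roughly like this. Both function names and the flag dictionary are hypothetical; only the two FBS name prefixes and the 2023-only rule come from the description above, and the sketch does not model how a variant FBS name is chosen.

```python
VARIANT_FBS_YEAR = 2023  # variant FBSes only exist for this year (per the PR)


def check_variant_year(year: int, variant_flags: dict[str, bool]) -> None:
    """Fail early if a variant method is requested for an unsupported year."""
    active = sorted(name for name, enabled in variant_flags.items() if enabled)
    if active and year != VARIANT_FBS_YEAR:
        raise ValueError(
            f"{active} require usa_ghg_data_year={VARIANT_FBS_YEAR}, got {year}: "
            "variant FBSes are not available for other years."
        )


def base_fbs_name(year: int, new_ghg_method: bool) -> str:
    """Build the non-variant FBS name from the configured GHG data year."""
    prefix = "GHG_national_Cornerstone" if new_ghg_method else "GHG_national_CEDA"
    return f"{prefix}_{year}"
```

Raising before any load is attempted is what turns the late "FBS not found" failure into an actionable message.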

Workflow-level serialization

Add concurrency: { group: generate_diagnostics, cancel-in-progress: false } to generate_diagnostics.yml. Belt-and-suspenders against the Sheets API write quota (60/min/user) — guarantees only one generate_diagnostics job is writing to Sheets at a time, regardless of dispatch source.

Dispatch script

bedrock/analysis/a_matrix_time_series/dispatch_ef_time_series.py. Per (scenario, approach, year) cell:

Creates a Sheet in the v0.3 Diagnostics Drive folder with deterministic title [{run_date}, {model_year}, {baseline} based, {approach_label}, {scenario}] EFs diagnostics.
Triggers generate_diagnostics via gh workflow run with config_name, model_base_year, usa_ghg_data_year, sheet_id, use_useeio_baseline. Both year overrides get the same value per cell.
Appends a row to output/results/ef_run_index.csv (audit trail).

Idempotent — already-recorded cells are skipped, so re-running picks up only unfilled cells. CEDA-only baseline as the starting cut.
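The per-cell steps above can be sketched as follows. The helper names, the CSV column names (scenario, approach, year), and the lowercase boolean serialization are assumptions for illustration; the title format and the workflow input names are taken from the description. Sheet creation through the Drive API is left out.

```python
import csv
from pathlib import Path


def sheet_title(run_date, model_year, baseline, approach_label, scenario) -> str:
    """Deterministic Sheet title in the format given in the PR description."""
    return (
        f"[{run_date}, {model_year}, {baseline} based, "
        f"{approach_label}, {scenario}] EFs diagnostics"
    )


def gh_workflow_args(config_name, year, sheet_id, use_useeio_baseline, git_ref):
    """Build the `gh workflow run` argv; both year inputs get the same value."""
    fields = {
        "config_name": config_name,
        "model_base_year": str(year),
        "usa_ghg_data_year": str(year),
        "sheet_id": sheet_id,
        "use_useeio_baseline": str(use_useeio_baseline).lower(),
    }
    args = ["gh", "workflow", "run", "generate_diagnostics.yml", "--ref", git_ref]
    for key, value in fields.items():
        args += ["-f", f"{key}={value}"]
    return args


def recorded_cells(index_path: Path) -> set:
    """Read (scenario, approach, year) keys already present in the run index."""
    if not index_path.exists():
        return set()
    with index_path.open(newline="") as fh:
        return {(r["scenario"], r["approach"], r["year"]) for r in csv.DictReader(fh)}


def pending_cells(cells: list, index_path: Path) -> list:
    """Drop cells that were already dispatched, which makes re-runs idempotent."""
    done = recorded_cells(index_path)
    return [c for c in cells if (c[0], c[1], str(c[2])) not in done]
```

Keying the skip check on the audit CSV rather than on Drive contents means a re-run needs no API calls to decide what is left to dispatch.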

Two scenarios with explicit names:

isolate_a_matrix — vary only A-matrix scaling, hold everything else to v0 defaults. Reuses the four Step 6 candidate YAMLs. Currently parked.
bundle_v0_2 (default) — single config 2025_usa_cornerstone_full_model representing the full v0.2 release-candidate stack. 5 cells (1 approach × 5 years).

Throttle modes (--throttle):

poll (default) — block until prior workflow runs clear via gh run list.
sleep:N — fixed N-second sleep between triggers.
none — fire immediately (only safe with bumped Sheets quota).
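One way the --throttle value could be parsed into a mode plus an optional delay. The Throttle dataclass and parse_throttle are hypothetical names, not the script's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throttle:
    mode: str  # "poll", "sleep", or "none"
    seconds: int = 0  # only meaningful when mode == "sleep"


def parse_throttle(value: str = "poll") -> Throttle:
    """Parse 'poll', 'none', or 'sleep:N' (N = whole seconds)."""
    if value in ("poll", "none"):
        return Throttle(value)
    mode, sep, raw = value.partition(":")
    if mode == "sleep" and sep and raw.isdigit():
        return Throttle("sleep", int(raw))
    raise ValueError(f"Unknown throttle {value!r}; expected poll, sleep:N, or none")
```

For example, parse_throttle("sleep:30") yields Throttle(mode="sleep", seconds=30), while a malformed value such as "sleep:abc" is rejected up front rather than mid-dispatch.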

Re-dispatch from CSV (--re-dispatch-from-csv): re-triggers workflows for cells already in ef_run_index.csv, reusing their existing Sheets. Used to recover from rate-limit batch failures.

Compile-script extension

compile_ef_diagnostics.py now tolerates the optional scenario/year columns. When populated, per-pair tab names get a {scenario}_{year}_ prefix and summary rows stamp the dimensions. Step 6's existing CSV (no scenario/year) still works — backfilled to empty strings on load.
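A dependency-free sketch of the backfill and tab-prefix behaviour, using plain dicts rather than whatever DataFrame logic the script actually has. The function names are hypothetical, and applying the prefix only when both dimensions are populated is an assumption.

```python
def backfill_dimensions(rows: list[dict]) -> list[dict]:
    """Give every index row scenario/year keys, defaulting to empty strings."""
    return [
        {**row, "scenario": row.get("scenario") or "", "year": row.get("year") or ""}
        for row in rows
    ]


def tab_name(base: str, scenario: str, year: str) -> str:
    """Prefix the per-pair tab with {scenario}_{year}_ when both are populated."""
    if scenario and year:
        return f"{scenario}_{year}_{base}"
    return base  # Step 6 rows keep their original tab names
```

With this shape, a Step 6 row that has no scenario or year passes through with its tab name untouched, while a dispatch row for bundle_v0_2 and 2019 gets the bundle_v0_2_2019_ prefix.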

Testing

Dry-run dispatches the expected 5 cells for bundle_v0_2 × 2019–2023 with the agreed title format.
compile_ef_diagnostics.py still compiles Step 6's existing 7-row index unchanged.
black --check, ruff check ., mypy bedrock (385 files) all clean.
Live dispatch run on this branch (post-merge of feat(analysis): step 6 phase 2/3 — EF diagnostics compile + plot scripts #379, with the new GHG-year wiring) currently in flight to validate end-to-end.

Pre-merge note

Live dispatch needs the workflow on the target ref. Either land this PR to main first and run with --git-ref main, or run with --git-ref mo__step7-ef-time-series-dispatch to dispatch against the branch (current state).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MoLi7 · 2026-05-05T22:51:04Z

Adds rebuild_run_index_from_drive.py, which lists Google Sheets in a diagnostics Drive folder and parses each title into (approach, baseline, sheet_id, year?, scenario?) rows for ef_run_index.csv. Closes the manual-flow gap: users who triggered diagnostics via the GH Actions UI can now auto-build the run index instead of hand-typing sheet IDs.

Title regex handles both formats currently in use:

Manual: [DATE, BASELINE based, A matrix with APPROACH] EFs diagnostics
Dispatch: [DATE, YEAR, BASELINE based, A matrix with APPROACH, SCENARIO] EFs diagnostics

Also rewords the FileNotFoundError in compile_ef_diagnostics.py to point at the new script as the auto-rebuild path, with hand-write as a fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
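A single pattern that accepts both title formats could look like the sketch below. This is not the script's actual regex; it assumes the date contains no comma, the year is four digits, and no field contains a closing bracket. parse_title is a hypothetical helper name.

```python
import re
from typing import Optional

TITLE_RE = re.compile(
    r"^\[(?P<date>[^,\]]+), "
    r"(?:(?P<year>\d{4}), )?"  # present only in dispatch titles
    r"(?P<baseline>[^,\]]+) based, "
    r"A matrix with (?P<approach>[^,\]]+)"
    r"(?:, (?P<scenario>[^,\]]+))?"  # present only in dispatch titles
    r"\] EFs diagnostics$"
)


def parse_title(title: str, sheet_id: str) -> Optional[dict]:
    """Return a run-index row for a matching title, or None otherwise."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return {
        "approach": match["approach"],
        "baseline": match["baseline"],
        "sheet_id": sheet_id,
        "year": match["year"] or "",
        "scenario": match["scenario"] or "",
    }
```

Manual titles come back with empty year and scenario strings, which lines up with the compile script treating those columns as optional.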

…patcher Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-series dispatch Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…race Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ta-overwrite Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each diagnostics cell runs at model_base_year=Y so D_new/N_new are denominated in year-Y dollars. Cross-year comparison requires a single dollar reference; add D_new_ref/N_new_ref columns that apply inflation_adjust_ef_denom_to_new_base_year to land every cell on REFERENCE_DOLLAR_YEAR (2023). Step 6 single-year rows (empty year column) skip the step and behave unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
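To illustrate the arithmetic only: an emission factor expressed per year-Y dollar can be moved to reference-year dollars by scaling with the ratio of price levels. The sketch below assumes a single economy-wide price index and uses made-up index values; it is not the behaviour of inflation_adjust_ef_denom_to_new_base_year, which may use sector-specific deflators. Returning the value unchanged when the year is missing is one reading of "skip the step".

```python
from typing import Optional

REFERENCE_DOLLAR_YEAR = 2023  # value stated in the commit message


def to_reference_dollars(
    ef: float,
    year: Optional[int],
    price_index: dict[int, float],
    ref_year: int = REFERENCE_DOLLAR_YEAR,
) -> float:
    """Convert an EF per year-`year` dollar to an EF per `ref_year` dollar.

    One ref-year dollar buys price_index[year] / price_index[ref_year] worth
    of year-`year` dollars, so the EF is scaled by that same ratio.
    """
    if year is None:
        return ef  # single-year row: nothing to align
    return ef * price_index[year] / price_index[ref_year]


# Illustrative index values only, not real deflator data.
EXAMPLE_INDEX = {2019: 100.0, 2023: 125.0}
```

With the example index, an EF of 1.0 per 2019 dollar becomes 0.8 per 2023 dollar, because a 2023 dollar represents less real output than a 2019 dollar.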

Dispatcher previously crashed mid-run when `gh run list` returned a non-zero exit code (transient API hiccup mid-batch). Now retries 3× per status with a 5s backoff and falls back to a "still busy" sentinel so the poll loop keeps spinning instead of unwinding the dispatch and losing the queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
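The retry-then-sentinel pattern described here could be sketched as below. STILL_BUSY, count_active_runs, and gh_query are hypothetical names, and the `gh run list` flags shown are one plausible way to count runs for a status, not necessarily what the dispatcher uses.

```python
import json
import subprocess
import time
from typing import Callable, Optional

STILL_BUSY = -1  # hypothetical sentinel: "could not tell, assume runs are active"


def gh_query(status: str) -> Callable[[], Optional[int]]:
    """Build a query that counts workflow runs in `status`, or None on failure."""

    def query() -> Optional[int]:
        proc = subprocess.run(
            ["gh", "run", "list", "--workflow", "generate_diagnostics.yml",
             "--status", status, "--json", "databaseId"],
            capture_output=True,
            text=True,
        )
        if proc.returncode != 0:
            return None  # transient failure: let the caller retry
        return len(json.loads(proc.stdout))

    return query


def count_active_runs(
    query: Callable[[], Optional[int]],
    retries: int = 3,
    backoff_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try `query` up to `retries` times, then fall back to STILL_BUSY."""
    for attempt in range(retries):
        result = query()
        if result is not None:
            return result
        if attempt < retries - 1:
            sleep(backoff_s)
    return STILL_BUSY
```

A poll loop that treats any non-zero result as busy will keep waiting on STILL_BUSY, so a failed status check delays the next trigger instead of aborting the batch.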

mypy flags ta.cast as redundant since pd.to_numeric/concat/__getitem__ return inferable types. The casts were added to placate Pyright, which isn't a CI check. Switching back to no-cast satisfies mypy + black; ruff unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>