Add US L0 refit H5 reconstruction utility#234

Merged
MaxGhenis merged 1 commit into main from codex/sparse-default-release-20260701 on Jul 1, 2026
@MaxGhenis

Summary

  • add a reusable US L0/refit H5 reconstruction utility that attaches saved post-L0 refit weights to an existing base support H5
  • emit a content-addressed reconstruction manifest with base H5, weight NPZ, output H5, selected counts, weight sum, copied Populace attrs, and saved calibration metadata
  • route the fiscal release builder's L0/refit export path through the same shared helper so fresh release builds and saved-weight reconstruction use one attachment implementation
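
As an illustration of what a content-addressed manifest of this kind could look like, here is a minimal sketch using only the Python standard library. It is not the code in `l0_refit_export.py`: the function names (`sha256_of_file`, `build_reconstruction_manifest`) and the manifest keys are hypothetical, and the counts and weight sum are passed in rather than derived from the H5.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large H5 files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_reconstruction_manifest(
    base_h5: Path,
    weights_npz: Path,
    output_h5: Path,
    selected_counts: dict,
    household_weight_sum: float,
) -> dict:
    """Assemble a manifest keyed by file content hashes.

    All key names below are hypothetical; the real manifest schema may differ.
    """
    return {
        "base_h5": {"path": str(base_h5), "sha256": sha256_of_file(base_h5)},
        "weights_npz": {"path": str(weights_npz), "sha256": sha256_of_file(weights_npz)},
        "output_h5": {"path": str(output_h5), "sha256": sha256_of_file(output_h5)},
        "selected_counts": dict(selected_counts),
        "household_weight_sum": household_weight_sum,
    }


def write_manifest(manifest: dict, summary_json: Path) -> None:
    """Write the manifest with sorted keys so diffs between runs stay readable."""
    summary_json.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

Recording a hash for each input and for the output means a reader can later confirm which base H5 and which weight file produced a given artifact without trusting file names or paths.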

Local reconstruction performed

uv run --package populace-build --extra us python tools/export_us_l0_refit_h5.py \
  --base-h5 /Users/maxghenis/.codex-worktrees/populace-historical-asec-pooling-20260629/out/pooled-asec-puf-support-three-year-20260629/base/base_populace_us_2024_puf_support.h5 \
  --weights-npz /Users/maxghenis/maria-l0-review/l0-paper-full-surface/runs/fixed-lambda-3asec-share0p8-e1500-matched-baselines-20260630/weights/informed_l0_refit_seed0_budget57240_l02p369e-06.npz \
  --output-h5 out/sparse-default-release-20260701/artifacts/populace_us_2024.h5 \
  --summary-json out/sparse-default-release-20260701/artifacts/populace_us_2024.l0_refit_export_summary.json

Final local artifact summary:

  • output H5: out/sparse-default-release-20260701/artifacts/populace_us_2024.h5
  • output SHA256: f028c2ee15b011a678586f5da024bd0dc2e10a5a8d6a9eef91235f34cc393c37
  • base H5 SHA256: ec290055a1856e8528b13818e506501f160398b8660f67a3edc6bbea869fbe08
  • weight NPZ SHA256: 3c2a872c7e624218f9fe2aa920210419c23c4b5b9c33cb38455f5f04fe3e3e16
  • households: 57,240
  • people: 166,302
  • tax units: 79,736
  • states: 51
  • congressional districts: 436
  • household weight sum: 134,690,323.3402448
  • saved refit loss metadata: 0.04740850352691568 on 32,633 targets
  • copied CD attrs: populace_congressional_district_vintage_target=119th_congress, populace_congressional_district_vintage_crosswalk_sha256=a61553933a585d23365bef0602328ff347a0ae00f35e8499516f0eef069ba17d

HDF5 bytes are not assumed deterministic across repeated writes; the manifest records the exact produced artifact hash.
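
Because a rebuilt H5 may differ byte for byte, a consumer would check a downloaded or copied artifact against the recorded hash rather than regenerate the file and compare. The sketch below assumes the summary JSON stores the output hash under a top-level key named `output_sha256`; that key name and the function names are hypothetical and may not match the actual summary schema.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_matches_summary(
    artifact: Path,
    summary_json: Path,
    key: str = "output_sha256",  # hypothetical key name
) -> bool:
    """Check that the artifact on disk is exactly the one the summary describes."""
    summary = json.loads(summary_json.read_text())
    recorded = str(summary[key]).lower()
    return file_sha256(artifact) == recorded
```

A `False` result only says the bytes differ from the recorded artifact; it does not by itself say the data content is different, since two valid writes of the same data can hash differently.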

Tests

  • uv run ruff format packages/populace-build/src/populace/build/us_runtime/l0_refit_export.py packages/populace-build/tests/test_us_l0_refit_export.py tools/export_us_l0_refit_h5.py tools/build_us_fiscal_refresh_release.py
  • uv run ruff check packages/populace-build/src/populace/build/us_runtime/l0_refit_export.py packages/populace-build/tests/test_us_l0_refit_export.py tools/export_us_l0_refit_h5.py tools/build_us_fiscal_refresh_release.py
  • uv run pytest packages/populace-build/tests/test_us_l0_refit_export.py packages/populace-build/tests/test_us_fiscal_refresh_builder.py::test_l0_refit_export_subsets_clean_base_frame
  • local H5 load/provenance check against policyengine_us.data.USSingleYearDataset

MaxGhenis merged commit a518894 into main on Jul 1, 2026
4 checks passed
MaxGhenis deleted the codex/sparse-default-release-20260701 branch on July 1, 2026 09:19