
Make target materialization checkpoints reusable across calibration-only runs #217

Description

@MaxGhenis

Problem

Changing calibration settings such as epochs, loss weights, or optimizer options should not rerun expensive PolicyEngine target materialization. Today the fiscal refresh scorer/build path can rematerialize JCT reform vectors such as SALT even when the reform definition itself did not change.

This is especially visible when validating Populace on the full CD target surface: tools/score_us_fiscal_targets.py and tools/build_us_fiscal_refresh_release.py rematerialize JCT targets because the cache identity includes broad build state, such as the build commit and the target registry version. That is conservative, but too coarse for launch iteration: any unrelated commit or registry bump invalidates the cached reform vectors.

Desired design

Use finer-grained checkpoints:

  • Reform-vector cache keyed by the inputs that actually determine per-household reform estimates: (base_h5_sha256, policyengine_us_version, reform_id, period, congressional_district_crosswalk_sha256 if geography-dependent). Do not include unrelated target registry changes or the whole Populace build commit when the reform code and PE-US version are unchanged.
  • Target-frame / constraint-matrix cache keyed by (base_h5_sha256, exact target_surface.sha256, materializer_code_version, policyengine_us_version).
  • Calibration-only reruns should load the cached target frame/matrix and only solve weights.
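A minimal sketch of the two checkpoint keys above, assuming each key is the SHA-256 of a canonical JSON encoding of exactly its listed inputs. The class names `ReformVectorKey` and `TargetFrameKey`, the `digest()` method, and the `kind` tag are hypothetical illustrations, not existing Populace code.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


def _digest(payload: dict) -> str:
    """Hash a dict deterministically: sorted keys, compact separators."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ReformVectorKey:
    """Hypothetical key: only the inputs that determine per-household reform estimates."""
    base_h5_sha256: str
    policyengine_us_version: str
    reform_id: str
    period: int
    # Set only when the reform vector is geography-dependent; otherwise None.
    congressional_district_crosswalk_sha256: Optional[str] = None

    def digest(self) -> str:
        # "kind" keeps reform-vector digests distinct from target-frame digests.
        return _digest({"kind": "reform_vector", **asdict(self)})


@dataclass(frozen=True)
class TargetFrameKey:
    """Hypothetical key: inputs that determine the target frame / constraint matrix."""
    base_h5_sha256: str
    target_surface_sha256: str
    materializer_code_version: str
    policyengine_us_version: str

    def digest(self) -> str:
        return _digest({"kind": "target_frame", **asdict(self)})
```

Calibration settings (epochs, loss weights, optimizer options) are deliberately absent from both keys, so a calibration-only rerun produces the same digests and can reuse the cached artifacts, while a change to the base H5 or the exact target surface yields a new digest.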

Acceptance criteria

  • A calibration-only rerun with the same base H5 and target surface does not rerun JCT/SALT reform simulations.
  • A target-surface semantic change still invalidates the target frame/matrix safely.
  • Two different H5s, such as incumbent vs candidate with different record counts, do not share incompatible per-household vectors.
  • Diagnostics/manifest record which checkpoint keys were used.
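The acceptance criteria could be exercised with a load-or-compute store like the one below. This is a sketch assuming a directory that maps a key digest to a cached per-household vector; `CheckpointStore` and `load_or_compute` are hypothetical names, and a real implementation would likely store arrays in a binary format rather than JSON.

```python
import json
from pathlib import Path
from typing import Callable, List


class CheckpointStore:
    """Hypothetical directory-backed cache of per-household vectors, keyed by digest."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # Records which checkpoint keys were used and whether each was a hit.
        self.manifest: List[dict] = []

    def load_or_compute(self, key_digest: str, n_households: int,
                        compute: Callable[[], List[float]]) -> List[float]:
        path = self.root / f"{key_digest}.json"
        if path.exists():
            vector = json.loads(path.read_text())
            status = "hit"
        else:
            vector = compute()  # the expensive reform simulation runs only here
            status = "miss"
        # Refuse vectors whose length does not match the H5 being calibrated.
        if len(vector) != n_households:
            raise ValueError(
                f"checkpoint {key_digest} has {len(vector)} records, "
                f"expected {n_households}"
            )
        if status == "miss":
            # Write via a temp file so a crash never leaves a partial checkpoint.
            tmp = path.with_suffix(".tmp")
            tmp.write_text(json.dumps(vector))
            tmp.replace(path)
        self.manifest.append({"key": key_digest, "status": status})
        return vector
```

Because `base_h5_sha256` is already part of the key, two different H5s should never resolve to the same file; the length check is a second line of defense against incompatible vectors, such as incumbent vs candidate H5s with different record counts. The `manifest` list is what would be written into diagnostics to show which keys were used.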

Context

This came up during the June 2026 full-CD Populace validation. A corrected target-surface score needed fresh materialization, but future changes limited to epochs, loss weights, or optimizer settings should not pay that cost again.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
