Skip to content

Make policyengine-api-v2-alpha runtime execution honor model and dataset version selection #219

@MaxGhenis

Description

@MaxGhenis

Problem

policyengine-api-v2-alpha has the strongest database model for versioning on paper and the weakest runtime enforcement in practice.

Today the repo:

  • stores tax_benefit_model_version_id on simulations
  • dedupes simulations by (dataset_id, model_version_id, policy_id, dynamic_id)
  • exposes model-version rows via API
  • defines a DatasetVersion model

But the actual execution paths still import us_latest / uk_latest directly and do not honor the selected DB model version when running simulations. Datasets also do not yet have a real compatibility contract tying them to a model/data release bundle.

Relevant code paths:

  • API layer and analysis flow: src/policyengine_api/api/analysis.py
  • runtime execution paths using *_latest: src/policyengine_api/modal_app.py
  • dataset schema: src/policyengine_api/models/dataset.py
  • dataset version schema: src/policyengine_api/models/dataset_version.py
  • storage cache keyed only by object path: src/policyengine_api/services/storage.py
  • public docs still imply stronger dataset-version semantics than the runtime actually enforces: docs/src/app/endpoints/datasets/page.tsx

This makes the current version tables partially decorative. The database can say “run version X on dataset Y”, but the worker can still execute against the latest installed country package and a weakly versioned dataset path.

Desired contract

If a client specifies a TaxBenefitModelVersion and dataset, the runtime must execute that exact compatible bundle, not whatever the latest worker image happens to expose.

That requires:

  • real runtime resolution from selected DB version rows to executable model/data bundles
  • a first-class compatibility contract between datasets and model/data releases
  • cache and dedupe keys that reflect the resolved execution bundle, not just IDs or object names

What should change

  1. Make worker execution honor the selected TaxBenefitModelVersion instead of importing us_latest / uk_latest.
  2. Turn DatasetVersion into a real execution-time contract, or remove it if we do not intend to use it.
  3. Tie datasets to compatible model/data bundle identity explicitly.
  4. Make storage cache keys bundle-aware rather than object-name-only.
  5. Align docs, ORM models, and runtime behavior so they all describe the same dataset-version semantics.
  6. Return the resolved immutable bundle in simulation and analysis responses.

Acceptance criteria

  • Runtime execution resolves from the selected TaxBenefitModelVersion and a compatible dataset/data release rather than from *_latest imports.
  • DatasetVersion is either actively enforced in execution or intentionally removed in favor of a clearer alternative.
  • Dataset records have an explicit compatibility story with model/data bundle identity.
  • Cache and dedupe keys include the resolved bundle identity rather than only dataset IDs, model version IDs, or object names.
  • Public docs and API schemas match the actual runtime contract.
  • Simulation and analysis responses expose the resolved model/data bundle used for execution.

Upstream dependencies

This should consume the release contracts from:

And stay aligned with:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions