Problem
policyengine-api-v2-alpha has the strongest database model for versioning on paper and the weakest runtime enforcement in practice.
Today the repo:
- stores tax_benefit_model_version_id on simulations
- dedupes simulations by (dataset_id, model_version_id, policy_id, dynamic_id)
- exposes model-version rows via the API
- defines a DatasetVersion model
But the actual execution paths still import us_latest / uk_latest directly and do not honor the selected DB model version when running simulations. Datasets also do not yet have a real compatibility contract tying them to a model/data release bundle.
Relevant code paths:
- API layer and analysis flow: src/policyengine_api/api/analysis.py
- runtime execution paths using *_latest: src/policyengine_api/modal_app.py
- dataset schema: src/policyengine_api/models/dataset.py
- dataset version schema: src/policyengine_api/models/dataset_version.py
- storage cache keyed only by object path: src/policyengine_api/services/storage.py
- public docs that still imply stronger dataset-version semantics than the runtime actually enforces: docs/src/app/endpoints/datasets/page.tsx
This makes the current version tables partially decorative. The database can say “run version X on dataset Y”, but the worker can still execute against the latest installed country package and a weakly versioned dataset path.
Desired contract
If a client specifies a TaxBenefitModelVersion and dataset, the runtime must execute that exact compatible bundle, not whatever the latest worker image happens to expose.
That requires:
- real runtime resolution from selected DB version rows to executable model/data bundles
- a first-class compatibility contract between datasets and model/data releases
- cache and dedupe keys that reflect the resolved execution bundle, not just IDs or object names
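One way to read the third requirement is sketched below: derive a single fingerprint from the resolved bundle and build both the dedupe key and the storage cache key from it. This is a minimal sketch, not the repo's implementation. ResolvedBundle, bundle_fingerprint, simulation_dedupe_key, storage_cache_key, and every field name and value are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ResolvedBundle:
    """Hypothetical immutable identity of what a worker actually executes."""

    country: str              # e.g. "us" or "uk"
    model_version: str        # version of the country model package
    data_release: str         # release identifier of the dataset build
    dataset_object_path: str  # storage path of the dataset artifact


def bundle_fingerprint(bundle: ResolvedBundle) -> str:
    """Hash the bundle fields in canonical order so equal bundles hash equally."""
    canonical = json.dumps(asdict(bundle), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def simulation_dedupe_key(
    bundle: ResolvedBundle, policy_id: Optional[str], dynamic_id: Optional[str]
) -> str:
    """Dedupe on the resolved bundle rather than on raw dataset/model row IDs."""
    return f"{bundle_fingerprint(bundle)}:{policy_id}:{dynamic_id}"


def storage_cache_key(bundle: ResolvedBundle) -> str:
    """Prefix the object path with the fingerprint so a new release misses the cache."""
    return f"{bundle_fingerprint(bundle)}/{bundle.dataset_object_path}"
```

Under this scheme, two requests that name the same object path but resolve to different model versions get different cache and dedupe keys, which is the property the list above asks for.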
What should change
- Make worker execution honor the selected TaxBenefitModelVersion instead of importing us_latest / uk_latest.
- Turn DatasetVersion into a real execution-time contract, or remove it if we do not intend to use it.
- Tie datasets to compatible model/data bundle identity explicitly.
- Make storage cache keys bundle-aware rather than object-name-only.
- Align docs, ORM models, and runtime behavior so they all describe the same dataset-version semantics.
- Return the resolved immutable bundle in simulation and analysis responses.
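The first three changes could take a "fail closed" shape: the worker refuses to run unless the installed model matches the selected version and the dataset declares compatibility with it. The sketch below illustrates that idea only. ModelVersionRow and DatasetRow are simplified stand-ins for the real ORM rows, and compatible_model_versions, installed_versions, and BundleResolutionError are assumptions, not existing names in the repo.

```python
from __future__ import annotations

from dataclasses import dataclass


class BundleResolutionError(Exception):
    """Raised when the requested model/dataset pair cannot be executed exactly."""


@dataclass(frozen=True)
class ModelVersionRow:
    """Simplified stand-in for a TaxBenefitModelVersion row (hypothetical fields)."""

    country: str
    version: str


@dataclass(frozen=True)
class DatasetRow:
    """Simplified stand-in for a dataset row with an explicit compatibility list."""

    name: str
    compatible_model_versions: frozenset[str]


def resolve_for_execution(
    model_row: ModelVersionRow,
    dataset_row: DatasetRow,
    installed_versions: dict[str, str],
) -> tuple[str, str, str]:
    """Return (country, model version, dataset name) or raise; never fall back to latest."""
    installed = installed_versions.get(model_row.country)
    if installed != model_row.version:
        raise BundleResolutionError(
            f"worker has {model_row.country} model {installed!r}, "
            f"but version {model_row.version!r} was selected"
        )
    if model_row.version not in dataset_row.compatible_model_versions:
        raise BundleResolutionError(
            f"dataset {dataset_row.name!r} does not declare compatibility "
            f"with model version {model_row.version!r}"
        )
    return (model_row.country, model_row.version, dataset_row.name)
```

How the installed version is discovered, and whether a mismatch should route the job to a different worker image instead of raising, are left open here.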
Acceptance criteria
- Runtime execution resolves from the selected TaxBenefitModelVersion and a compatible dataset/data release rather than from *_latest imports.
- DatasetVersion is either actively enforced in execution or intentionally removed in favor of a clearer alternative.
- Dataset records have an explicit compatibility story with model/data bundle identity.
- Cache and dedupe keys include the resolved bundle identity rather than only dataset IDs, model version IDs, or object names.
- Public docs and API schemas match the actual runtime contract.
- Simulation and analysis responses expose the resolved model/data bundle used for execution.
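For the last criterion, a response could carry the resolved bundle as a nested object next to the existing fields. The shape below is only an illustration. SimulationResponse, ResolvedBundleInfo, and their fields are hypothetical and do not reflect the current API schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ResolvedBundleInfo:
    """Hypothetical summary of the model/data bundle that was executed."""

    model_version: str
    data_release: str


@dataclass(frozen=True)
class SimulationResponse:
    """Hypothetical response shape; the real schema may differ."""

    simulation_id: str
    status: str
    resolved_bundle: ResolvedBundleInfo


def to_payload(response: SimulationResponse) -> dict:
    """Serialize to a plain dict; asdict also converts the nested bundle."""
    return asdict(response)
```

A client can then compare resolved_bundle against the version it requested and detect drift without inspecting the worker.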
Upstream dependencies
This should consume the release contracts from:
And stay aligned with:
- policyengine.py: the immutable release boundary for country model and data versions (policyengine.py#270)