Problem
policyengine-api-v2 already has a useful model-version routing layer, but its public contract is still model-only.
Today the simulation gateway:
- resolves country + version to a versioned Modal app
- defaults omitted versions to a mutable latest registry entry
- returns the resolved model version in submit responses
But it does not have first-class data-release semantics. Dataset identity is still whatever data path the caller passes through or whatever default the worker uses. That means the API can tell a client which model package version ran, but not which immutable model+data bundle actually produced the result.
Relevant code paths:
- gateway request/response models and endpoints: projects/policyengine-api-simulation/src/modal/gateway/models.py, .../gateway/endpoints.py
- worker execution path: projects/policyengine-api-simulation/src/modal/simulation.py
- versioned Modal app naming/install behavior: projects/policyengine-api-simulation/src/modal/app.py
- mutable registry updates for latest: projects/policyengine-api-simulation/src/modal/utils/update_version_registry.py
- checked-in dependency spec, which is open-ended while the runtime installs exact env-driven versions: projects/policyengine-api-simulation/pyproject.toml
Desired contract
The simulation API should route and report an immutable execution bundle, not just a model package version.
At minimum that bundle should include:
- country model package version
- country data package version
- resolved dataset artifact or manifest revision
- bundle or manifest identifier suitable for caching and replay
latest can remain a convenience alias for discovery, but the API response for an executed job should always contain the resolved immutable bundle.
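As a minimal sketch of what such a bundle could look like, the following models it as a frozen dataclass with a deterministic identifier derived from its fields. The class name ExecutionBundle, the field names, and the 16-character truncated SHA-256 identifier are all assumptions for illustration, not existing code in policyengine-api-v2.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExecutionBundle:
    """Hypothetical immutable model+data bundle (names are illustrative)."""

    country: str
    model_version: str      # country model package version
    data_version: str       # country data package version
    dataset_revision: str   # resolved dataset artifact or manifest revision

    @property
    def bundle_id(self) -> str:
        # Canonical JSON (sorted keys, no extra whitespace) so that the
        # same field values always hash to the same identifier.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Because the identifier depends only on the bundle's contents, two jobs that resolve to the same model and data release share an identifier, which is what makes it usable for caching and replay.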
What should change
- Add first-class data-release semantics to the gateway request/response contract.
- Resolve a full model+data bundle before execution, not just a model version.
- Return the resolved bundle in submit and status responses.
- Make registries store or resolve immutable bundle identities rather than only mutable latest model-version pointers.
- Keep latest as a moving alias if needed, but never let it be the only provenance attached to a completed job.
- Align deployment/runtime configuration so checked-in specs and runtime-installed versions tell the same bundle story.
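The resolution step described in this list could be sketched roughly as follows. The in-memory BUNDLES and LATEST_ALIAS registries, the bundle identifiers, the version strings, and the resolve_bundle function are hypothetical stand-ins; the real gateway would read from its own registry store.

```python
from typing import Optional

# Hypothetical immutable bundle registry: entries are never rewritten.
BUNDLES = {
    "us-bundle-1": {"country": "us", "model_version": "1.1.0",
                    "data_version": "0.9.0", "dataset_revision": "rev-001"},
    "us-bundle-2": {"country": "us", "model_version": "1.2.0",
                    "data_version": "0.9.1", "dataset_revision": "rev-002"},
}

# Hypothetical mutable alias: the only part allowed to move over time.
LATEST_ALIAS = {"us": "us-bundle-2"}


def resolve_bundle(country: str, bundle_ref: Optional[str] = None) -> dict:
    """Resolve an explicit bundle reference, or the latest alias, to a full bundle."""
    bundle_id = bundle_ref if bundle_ref is not None else LATEST_ALIAS[country]
    if bundle_id not in BUNDLES:
        raise KeyError(f"unknown bundle: {bundle_id}")
    bundle = BUNDLES[bundle_id]
    if bundle["country"] != country:
        raise ValueError(f"bundle {bundle_id} does not belong to {country}")
    # The response carries the resolved identity, never just "latest".
    return {"bundle_id": bundle_id, **bundle,
            "requested": bundle_ref if bundle_ref is not None else "latest"}
```

Under these assumptions, a request that omits the reference still gets back a concrete bundle_id, so the submit and status responses can report the immutable bundle even when the caller asked for latest.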
Acceptance criteria
- Gateway requests can specify data release identity directly or via a higher-level bundle reference.
- Submit/status responses include the resolved model/data bundle used for execution.
- Worker execution uses the resolved data release rather than implicit defaults.
- Cache and dedupe semantics are bundle-aware rather than model-version-only.
- The mutable latest registry remains optional discovery sugar and is not the only provenance available for executed simulations.
- Deployment/runtime metadata can reconstruct the exact execution bundle for a given deployed version.
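One way to read the bundle-aware cache and dedupe criterion is that the cache key must cover the resolved bundle identity as well as the simulation parameters. The sketch below assumes a hypothetical cache_key helper and an opaque bundle identifier string; it is not the gateway's current caching code.

```python
import hashlib
import json


def cache_key(bundle_id: str, simulation_params: dict) -> str:
    """Hypothetical cache key covering both the bundle and the request parameters."""
    # Sorted keys make the key independent of parameter ordering.
    payload = json.dumps(
        {"bundle_id": bundle_id, "params": simulation_params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With a key like this, the same parameters run against two different data releases are cached separately, whereas a model-version-only key would conflate them.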
Upstream dependencies
This should consume the data-release contracts from:
And it should stay aligned with:
- policyengine.py: the immutable release boundary for country model and data versions (policyengine.py#270)