Add first-class data release semantics to the policyengine-api-v2 simulation gateway #434

@MaxGhenis

Problem

policyengine-api-v2 already has a useful model-version routing layer, but its public contract is still model-only.

Today the simulation gateway:

  • resolves country + version to a versioned Modal app
  • defaults omitted versions to a mutable latest registry entry
  • returns the resolved model version in submit responses

But it does not have first-class data-release semantics. Dataset identity is still whatever data path the caller passes through or whatever default the worker uses. That means the API can tell a client which model package version ran, but not which immutable model+data bundle actually produced the result.

Relevant code paths:

  • gateway request/response models and endpoints: projects/policyengine-api-simulation/src/modal/gateway/models.py, .../gateway/endpoints.py
  • worker execution path: projects/policyengine-api-simulation/src/modal/simulation.py
  • versioned Modal app naming/install behavior: projects/policyengine-api-simulation/src/modal/app.py
  • mutable registry updates for latest: projects/policyengine-api-simulation/src/modal/utils/update_version_registry.py
  • checked-in dependency spec is open-ended while runtime installs exact env-driven versions: projects/policyengine-api-simulation/pyproject.toml

Desired contract

The simulation API should route and report an immutable execution bundle, not just a model package version.

At minimum that bundle should include:

  • country model package version
  • country data package version
  • resolved dataset artifact or manifest revision
  • bundle or manifest identifier suitable for caching and replay

latest can remain a convenience alias for discovery, but the API response for an executed job should always contain the resolved immutable bundle.
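To make the bundle concrete, here is a minimal sketch of what such a record could look like. The class name, field names, and hashing scheme are all assumptions for illustration and do not reflect the existing gateway models; the version strings are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExecutionBundle:
    """Illustrative only: these field names are assumptions, not the gateway's schema."""

    country: str
    model_version: str     # country model package version
    data_version: str      # country data package version
    dataset_revision: str  # resolved dataset artifact or manifest revision

    @property
    def bundle_id(self) -> str:
        # Hash a canonical JSON encoding so identical bundles always share an id.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Placeholder values, not real releases.
bundle = ExecutionBundle(
    country="us",
    model_version="1.2.3",
    data_version="4.5.6",
    dataset_revision="rev-abc123",
)
same = ExecutionBundle("us", "1.2.3", "4.5.6", "rev-abc123")
other_data = ExecutionBundle("us", "1.2.3", "4.5.7", "rev-abc123")

print(bundle.bundle_id == same.bundle_id)        # identical contents, identical id
print(bundle.bundle_id == other_data.bundle_id)  # a data bump changes the id
```

Deriving the identifier from the bundle contents is one option; a registry-assigned manifest id would work equally well as long as it never gets reused for different contents.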

What should change

  1. Add first-class data-release semantics to the gateway request/response contract.
  2. Resolve a full model+data bundle before execution, not just a model version.
  3. Return the resolved bundle in submit and status responses.
  4. Make registries store or resolve immutable bundle identities rather than only mutable latest model-version pointers.
  5. Keep latest as a moving alias if needed, but never let it be the only provenance attached to a completed job.
  6. Align deployment/runtime configuration so checked-in specs and runtime-installed versions tell the same bundle story.
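Items 2, 4, and 5 could be sketched roughly as follows: a registry that stores immutable bundles, treats latest as a movable alias, and always hands back the resolved bundle id. `Bundle`, `BundleRegistry`, and their methods are hypothetical names, not part of the current codebase, and the in-memory dicts stand in for whatever store the real registry uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Bundle:
    # Hypothetical shape of an immutable model+data bundle.
    country: str
    model_version: str
    data_version: str
    dataset_revision: str


class BundleRegistry:
    """In-memory stand-in for a registry of immutable bundles plus mutable aliases."""

    def __init__(self) -> None:
        self._bundles: Dict[str, Bundle] = {}
        self._aliases: Dict[Tuple[str, str], str] = {}

    def register(self, bundle_id: str, bundle: Bundle) -> None:
        # A bundle id may never be rebound to different contents.
        existing = self._bundles.get(bundle_id)
        if existing is not None and existing != bundle:
            raise ValueError(f"bundle {bundle_id!r} is immutable and already registered")
        self._bundles[bundle_id] = bundle

    def set_alias(self, country: str, alias: str, bundle_id: str) -> None:
        # Aliases such as "latest" are the only mutable part of the registry.
        if bundle_id not in self._bundles:
            raise KeyError(bundle_id)
        self._aliases[(country, alias)] = bundle_id

    def resolve(self, country: str, ref: Optional[str] = None) -> Tuple[str, Bundle]:
        # An omitted ref falls back to "latest"; the caller always gets the immutable id.
        ref = ref or "latest"
        bundle_id = self._aliases.get((country, ref), ref)
        bundle = self._bundles.get(bundle_id)
        if bundle is None or bundle.country != country:
            raise KeyError(f"no bundle {ref!r} for country {country!r}")
        return bundle_id, bundle


registry = BundleRegistry()
registry.register("us-b1", Bundle("us", "1.0.0", "2.0.0", "rev-1"))
registry.set_alias("us", "latest", "us-b1")

# Resolve at submit time and store the immutable id with the job.
pinned_id, pinned_bundle = registry.resolve("us")

# Moving "latest" later does not change what the earlier job recorded.
registry.register("us-b2", Bundle("us", "1.1.0", "2.1.0", "rev-2"))
registry.set_alias("us", "latest", "us-b2")
```

In this sketch a job submitted before the alias moved still reports `us-b1`, which is the provenance property item 5 asks for.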

Acceptance criteria

  • Gateway requests can specify data release identity directly or via a higher-level bundle reference.
  • Submit/status responses include the resolved model/data bundle used for execution.
  • Worker execution uses the resolved data release rather than implicit defaults.
  • Cache and dedupe semantics are bundle-aware rather than model-version-only.
  • The mutable latest registry remains optional discovery sugar and is not the only provenance available for executed simulations.
  • Deployment/runtime metadata can reconstruct the exact execution bundle for a given deployed version.
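For the cache and dedupe criterion, one possible approach is to fold the resolved bundle id into the key alongside the request payload. The function name, the payload shape, and the bundle ids below are assumptions made for illustration only.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(bundle_id: str, payload: Dict[str, Any]) -> str:
    """Hypothetical bundle-aware cache key: same payload, different bundle, different key."""
    canonical = json.dumps(
        {"bundle_id": bundle_id, "payload": payload},
        sort_keys=True,          # make the key independent of dict ordering
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two orderings of the same illustrative request payload.
request_a = {"reform": {"example.parameter": 0.1}, "year": 2025}
request_b = {"year": 2025, "reform": {"example.parameter": 0.1}}

print(cache_key("us-b1", request_a) == cache_key("us-b1", request_b))  # same bundle, same request
print(cache_key("us-b1", request_a) == cache_key("us-b2", request_a))  # different bundle
```

Keying on the resolved id rather than on the alias matters here: keying on latest would let a cached result from an older data release be served after the alias moves.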

Upstream dependencies

This should consume the data-release contracts from:

And it should stay aligned with:
