Add ACA marketplace bronze-selection target ETL#618

Open
daphnehanse11 wants to merge 12 commits into main from codex/aca-marketplace-plan-selection

Conversation


@daphnehanse11 daphnehanse11 commented Mar 17, 2026

Summary

This PR has been refocused to the ACA marketplace targets ETL path.

It now:

  • adds policyengine_us_data/db/etl_aca_marketplace.py to transform 2024 CMS state metal status data into state-level ACA marketplace targets
  • loads two state-level target strata per HC.gov state: all APTC marketplace tax units and the bronze-plan subset
  • checks in the CMS-derived state target input at policyengine_us_data/storage/calibration_targets/aca_marketplace_state_metal_selection_2024.csv
  • trims the tests, changelog, and storage exceptions to match the narrower scope
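
The transform described above could be sketched roughly as follows. This is a minimal pandas sketch, not the actual `etl_aca_marketplace.py`; the column names and the bronze-subset logic are assumptions about the CMS metal-selection schema:

```python
import pandas as pd

# Hypothetical CMS state metal-selection rows; real column names may differ.
cms = pd.DataFrame({
    "state": ["TX", "TX", "FL"],
    "metal_level": ["Bronze", "Silver", "Bronze"],
    "aptc_tax_units": [500_000, 700_000, 900_000],
})

def build_state_targets(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the two strata per state: all APTC tax units and the bronze subset."""
    all_units = df.groupby("state")["aptc_tax_units"].sum().rename("all_aptc_tax_units")
    bronze = (
        df[df["metal_level"] == "Bronze"]
        .groupby("state")["aptc_tax_units"]
        .sum()
        .rename("bronze_aptc_tax_units")
    )
    return pd.concat([all_units, bronze], axis=1).fillna(0).reset_index()

targets = build_state_targets(cms)
```

Each output row then maps onto one state-level target stratum in the calibration-targets CSV.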

Why

This follows review feedback to keep unified_matrix_builder.py generic and to avoid publish_local_area.py / calibration-plumbing changes in this PR.

Out of Scope

This PR does not:

  • add marketplace-specific logic to unified_matrix_builder.py
  • modify publish_local_area.py
  • include the proxy builder, H5 publishing support, or local-area calibration plumbing from the earlier draft
  • change enhanced_cps.py or loss.py
  • add the underlying policyengine-us formulas for used_aca_ptc or selected_marketplace_plan_benchmark_ratio

Those downstream pieces can follow separately once the upstream variable path is in place.

Validation

Checks run locally in the fresh worktree:

  • python3 -m py_compile policyengine_us_data/db/etl_aca_marketplace.py tests/unit/test_aca_marketplace_targets.py policyengine_us_data/calibration/unified_matrix_builder.py policyengine_us_data/calibration/publish_local_area.py
  • git diff --check
  • attempted uv run pytest tests/unit/test_aca_marketplace_targets.py -q, but it did not return a result in this sandbox


@baogorek baogorek left a comment


Hi @daphnehanse11 , I'm going to let my Claude do the talking below, but the short of it is that there's a lot to do. I think Codex went for the quick win, and there's just not a quick win here.

  The CMS data sourcing is thorough and the underlying goal of decomposing PTC into used vs. unused makes sense. However, I think the approach needs to be restructured. The matrix builder should stay generic and not contain variable-specific logic, and the variables you're deriving don't yet exist in the places they need to for calibration to actually work.

  Here's the full path I'd suggest, roughly in dependency order:

  1. policyengine-us: Add used_aca_ptc, unused_aca_ptc, and selects_bronze_marketplace_plan as real calculated variables with formulas and parameters. The state-level bronze selection probabilities and price ratios from your CMS data become parameters there. Everything downstream depends on these existing first.
  2. ETL scripts (policy_data.db): Derive state-level calibration targets (e.g., total used PTC by state) from the CMS data and load them into the targets database. That's where calibration targets live now.
  3. enhanced_cps.py: Wire up the bronze plan selection so the legacy calibration pipeline has access to the new variables.
  4. target_config.yaml: Add the new variable names so the unified matrix builder picks them up — no code changes to the builder itself, just config.

  With this approach, the matrix builder never needs to know what these variables are. It just sees new names in the config and new rows in the database, same as any other target.

  I'd suggest starting with step 1 since everything else depends on it.

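The config-driven pattern in step 4 can be sketched as below. This is an illustrative sketch only: the config keys, target names, and function signature are hypothetical, not the real target_config.yaml schema or unified_matrix_builder.py API.

```python
# Illustrative config fragment: the builder sees only target names,
# never variable-specific logic. Real target_config.yaml keys may differ.
CONFIG = {
    "targets": [
        "aca_marketplace_aptc_tax_units",
        "aca_marketplace_bronze_tax_units",
    ]
}

def build_matrix_rows(config: dict, db_rows: dict[str, float]) -> dict[str, float]:
    """Pick up each configured target generically from the targets database."""
    return {name: db_rows[name] for name in config["targets"] if name in db_rows}

# Adding a new target means adding a name to the config and a row to the
# database; the builder code itself never changes.
rows = build_matrix_rows(CONFIG, {
    "aca_marketplace_aptc_tax_units": 1_200_000.0,
    "aca_marketplace_bronze_tax_units": 500_000.0,
})
```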
anth-volk added a commit that referenced this pull request Mar 24, 2026
New "under construction" node type (amber dashed) for showing
pipeline changes that are actively being developed:

US:
- PR #611: Pipeline orchestrator in Overview (Modal hardening)
- PR #540: Category takeup rerandomization in Stage 2, extracted
  puf_impute.py + source_impute.py modules in Stage 4
- PR #618: CMS marketplace data + plan selection in Stage 5

UK:
- PR #291: New Stage 9 — OA calibration pipeline (6 phases)
- PR #296: New Stage 10 — Adversarial weight regularisation
- PR #279: Modal GPU calibration nodes in Stages 6, 7, Overview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
anth-volk added a commit that referenced this pull request Mar 27, 2026
…-plan-selection

# Conflicts:
#	policyengine_us_data/calibration/unified_matrix_builder.py
#	policyengine_us_data/storage/calibration_targets/README.md
#	tests/unit/test_aca_marketplace_plan_selection_proxies.py
#	tests/unit/test_aca_marketplace_targets.py
#	tests/unit/test_marketplace_plan_selection.py

@baogorek baogorek left a comment


@daphnehanse11 I'm requesting that this PR be refocused to the targets ETL and perhaps the ECPS logic. Please note that this current PR will not affect the ECPS, because it touches neither loss.py nor enhanced_cps.py. I don't think your coding agent was able to pick up on the two distinct paths.

I cannot approve the changes in unified_matrix_builder.py or publish_local_area.py, and I recommend that they be removed from the PR. Hard-coded variables in the matrix builder are what made the junkyard the junkyard. We need to do everything humanly (or codexly) possible to never, ever hard-code a variable in unified_matrix_builder.py.

It is possible that publish_local_area.py will need a small modification before this works in local area calibration. Once these targets are in, we can start building models locally and test out the changes. So I really think this needs to be a two-part process.

So if you want the ECPS to be improved, which will get you a benefit now, this PR needs a separate edit to loss.py or enhanced_cps.py. In that case, some CSVs are acceptable in the storage/calibration folder. If you only want better local area h5 calibration, then there should be no CSVs at all, with the exception of sources that are not available for download online (like our national "Tips" target). Please see etl_medicaid.py for reference.

Note: the meaning of "ETL" is
E: Extract from the original source
T: Transform the data
L: Load the data into the database.
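
Following the etl_medicaid.py reference pattern, the shape of such a script is roughly as below. This is an illustrative skeleton with an in-memory database and inline data standing in for the downloaded source; the table and column names are assumptions, not the real policy_data.db schema:

```python
import csv
import io
import sqlite3

# E: Extract — a CSV literal stands in for the file fetched from the source.
RAW = """state,metal_level,aptc_tax_units
TX,Bronze,500000
TX,Silver,700000
"""

def extract() -> list[dict]:
    return list(csv.DictReader(io.StringIO(RAW)))

# T: Transform — aggregate source rows into one target row per state.
def transform(rows: list[dict]) -> list[tuple]:
    totals: dict[str, int] = {}
    for r in rows:
        totals[r["state"]] = totals.get(r["state"], 0) + int(r["aptc_tax_units"])
    return [(state, "all_aptc_tax_units", value) for state, value in totals.items()]

# L: Load — insert the transformed targets into the database.
def load(targets: list[tuple], conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS targets (state TEXT, name TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO targets VALUES (?, ?, ?)", targets)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
```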

Forgive me for being tough on this PR: the target sourcing is excellent work. There is just a lot of risk in modifying some of these files.


vercel bot commented Apr 13, 2026

The latest updates on your projects.

Project: pipeline-diagrams
Deployment: Error
Updated (UTC): Apr 13, 2026 5:49pm

@daphnehanse11 daphnehanse11 changed the title Add ACA marketplace plan selection proxies Add ACA marketplace bronze-selection target ETL Apr 13, 2026
