diff --git a/README.md b/README.md
index 0a1a7eb..55c9290 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# State Legislative Tracker
+# Tax and Transfer Bill Tracker

-Tracks state tax and benefit legislation relevant to [PolicyEngine](https://policyengine.org), scores bills for modelability, and computes fiscal impacts using microsimulation.
+Tracks state and federal tax and transfer legislation relevant to [PolicyEngine](https://policyengine.org), while keeping a state-first browsing experience for state bills. The pipeline scores bills for modelability and computes fiscal impacts using microsimulation.

 **Live app:** [state-legislative-tracker.modal.run](https://policengine--state-legislative-tracker.modal.run)

@@ -27,7 +27,7 @@ Tracks state tax and benefit legislation relevant to [PolicyEngine](https://poli
                                    ▼
 ┌─────────────────────────────────────────────────────────────────────┐
 │                       React Frontend (Modal)                        │
-│   Dashboard showing scored bills, impact analyses, district maps    │
+│   State-first tracker with a federal workspace and shared analysis  │
 └─────────────────────────────────────────────────────────────────────┘
 ```
diff --git a/docs/GENERAL_BILL_TRACKER_ARCHITECTURE.md b/docs/GENERAL_BILL_TRACKER_ARCHITECTURE.md
new file mode 100644
index 0000000..27c804b
--- /dev/null
+++ b/docs/GENERAL_BILL_TRACKER_ARCHITECTURE.md
@@ -0,0 +1,133 @@
+# Unified Bill Tracker Direction
+
+## Goal
+
+Move from a `2026 state legislative session tracker` to a broader `tax and transfer bill tracker` that:
+
+- keeps `state` browsing as the main user experience for state legislation
+- adds `federal` as a first-class destination instead of a special case
+- supports multiple sessions instead of centering the product on one year
+- keeps the existing scoring, encoding, and microsimulation workflow
+
+## What Is Coupled Today
+
+### Product framing
+
+- [README.md](../README.md) originally framed the app as a state legislative tracker
+- [src/App.jsx](../src/App.jsx) was hard-coded around `2026 State Legislative Tracker` and state-session language
+
+### Routing
+
+- [src/App.jsx](../src/App.jsx) originally only understood:
+  - `/`
+  - `/:state`
+  - `/:state/:billId`
+- that makes `state` the only valid top-level destination
+
+### Static state/session backbone
+
+- [src/data/states.js](../src/data/states.js) still carries important display metadata
+- the problem is not that it exists; the problem is when it doubles as the application structure
+
+### Content model
+
+- [src/components/StatePanel.jsx](../src/components/StatePanel.jsx) is correctly state-first, but federal content only appears as an attachment to states
+- [src/context/DataContext.jsx](../src/context/DataContext.jsx) still treats federal research as a special-case fake-state model
+
+### Pipeline assumptions
+
+- [scripts/openstates_monitor.py](../scripts/openstates_monitor.py) and [scripts/refresh_bill_status.py](../scripts/refresh_bill_status.py) are state/OpenStates-specific
+- federal ingestion will need a second source, but it should plug into the same downstream bill pipeline
+
+## Product Direction
+
+The right structure is:
+
+- state-first UX
+- federal as a peer surface
+- jurisdiction-first data model underneath
+
+That means:
+
+- the homepage still starts with states
+- the map remains useful for state legislation
+- federal gets its own page and navigation affordance
+- sessions remain visible and useful, but they stop being the product backbone
+
+## Recommended UI Shape
+
+### Keep these
+
+- homepage map and state search
+- state pages as the primary state workflow
+- state bill detail pages
+
+### Add these
+
+- `/federal` as a first-class route
+- a federal page using the same research and bill pipeline concepts
+- search and breadcrumbs that understand both state and federal destinations
+
+### Add later if it proves useful
+
+- shared bill detail routes independent of state/federal
+- session views such as `2026 session` or `119th Congress`
+- a generic bill index across jurisdictions
+
+## Data Model Direction
+
+The schema should move toward explicit jurisdiction fields.
+
+For `processed_bills` and `research`, prefer:
+
+- `jurisdiction_type`
+- `jurisdiction_code`
+- `jurisdiction_name`
+- `session_name`
+
+Keep `session` and `year` separate:
+
+- `session_name` is the primary legislative unit
+- `activity_year` is a secondary filter derived from bill and research dates
+- `effective_year` or `tax_year` should remain separate policy metadata
+
+Keep `state` temporarily for compatibility if needed, but stop relying on:
+
+- `state = "all"` as the main federal representation
+- `relevant_states` as the main way to model federal content
+
+`relevant_states` is still useful, but as targeting metadata rather than the core federal identity.
+
+## Refactor Sequence
+
+### Phase 1
+
+- update product copy
+- add a federal destination in the UI
+- keep state pages and the map intact
+
+### Phase 2
+
+- introduce jurisdiction-aware schema fields
+- backfill state rows
+- define a federal ingestion source abstraction
+
+### Phase 3
+
+- reduce [src/data/states.js](../src/data/states.js) to display metadata
+- move session and jurisdiction truth into data-driven structures
+
+### Phase 4
+
+- add shared bill/session views if user behavior shows they are valuable
+
+## Prototype On This Branch
+
+This branch now reflects the first architectural step:
+
+- [src/App.jsx](../src/App.jsx) supports a first-class `/federal` route
+- [src/components/FederalPanel.jsx](../src/components/FederalPanel.jsx) provides a federal workspace
+- [src/components/StateSearchCombobox.jsx](../src/components/StateSearchCombobox.jsx) can navigate to either a state page or the federal page
+- [src/context/DataContext.jsx](../src/context/DataContext.jsx) now exposes federal bill/research helpers alongside state helpers
+
+This is the right test: it changes the product structure without discarding the state-centric workflow that users actually want.
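The Data Model Direction section can be made concrete with a small record type. This is a sketch under stated assumptions, not the project's schema: it uses a Python dataclass in place of the real `processed_bills`/`research` tables, and `BillRecord`, `from_legacy_row`, the `"US"` federal code, and the legacy `session`/`last_action_date`/`tax_year` keys are all hypothetical. It shows the rules the doc lists: federal rows get an explicit jurisdiction instead of `state = "all"`, `activity_year` is derived from dates rather than stored as the session, and `relevant_states` survives only as targeting metadata.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


# Hypothetical record shape: the jurisdiction_* and session_name fields follow the
# doc; the helper below, the "US" code, and the legacy row keys are assumptions.
@dataclass
class BillRecord:
    jurisdiction_type: str                   # "state" or "federal"
    jurisdiction_code: str                   # e.g. "NY"; "US" for federal (assumed)
    jurisdiction_name: str
    session_name: str                        # primary legislative unit
    last_action_date: Optional[str] = None   # ISO date or timestamp string (assumed)
    tax_year: Optional[int] = None           # policy metadata, never the session
    relevant_states: list = field(default_factory=list)  # targeting metadata only

    @property
    def activity_year(self) -> Optional[int]:
        """Secondary filter derived from the bill's dates rather than its session."""
        if not self.last_action_date:
            return None
        return date.fromisoformat(self.last_action_date[:10]).year


def from_legacy_row(row: dict, state_names: dict) -> BillRecord:
    """Backfill jurisdiction fields from a legacy row keyed by `state` (hypothetical keys)."""
    common = dict(
        session_name=row.get("session", ""),
        last_action_date=row.get("last_action_date"),
        tax_year=row.get("tax_year"),
        relevant_states=list(row.get("relevant_states") or []),
    )
    if row["state"] == "all":
        # Legacy fake-state federal rows become an explicit federal jurisdiction.
        return BillRecord("federal", "US", "United States", **common)
    code = row["state"]
    return BillRecord("state", code, state_names.get(code, code), **common)
```

Keeping `tax_year` apart from `session_name` mirrors the doc's point that effective or tax years are policy metadata, not the legislative unit.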
diff --git a/scripts/openstates_monitor.py b/scripts/openstates_monitor.py
index 9013664..5e59a14 100644
--- a/scripts/openstates_monitor.py
+++ b/scripts/openstates_monitor.py
@@ -108,7 +108,13 @@ def openstates_request(endpoint, params=None, max_retries=3):
     url = f"{OPENSTATES_BASE_URL}{endpoint}"

     for attempt in range(max_retries):
-        response = requests.get(url, headers=headers, params=params or {})
+        try:
+            response = requests.get(url, headers=headers, params=params or {}, timeout=45)
+        except requests.RequestException as e:
+            wait = 5 * (attempt + 1)
+            print(f" Request failed ({e.__class__.__name__}), retrying in {wait}s...")
+            time.sleep(wait)
+            continue

         if response.status_code == 429:
             wait = 15 * (attempt + 1)  # 15s, 30s, 45s
@@ -116,11 +122,17 @@ def openstates_request(endpoint, params=None, max_retries=3):
             time.sleep(wait)
             continue

+        if response.status_code in {500, 502, 503, 504}:
+            wait = 5 * (attempt + 1)
+            print(f" OpenStates {response.status_code}, retrying in {wait}s...")
+            time.sleep(wait)
+            continue
+
         response.raise_for_status()
         return response.json()

     # Final attempt without retry
-    response = requests.get(url, headers=headers, params=params or {})
+    response = requests.get(url, headers=headers, params=params or {}, timeout=45)
     response.raise_for_status()
     return response.json()
diff --git a/scripts/refresh_bill_status.py b/scripts/refresh_bill_status.py
index 4fda29c..cba5fcf 100644
--- a/scripts/refresh_bill_status.py
+++ b/scripts/refresh_bill_status.py
@@ -29,12 +29,22 @@
 import json
 import argparse
 import time
+import re
+import difflib
 import requests
+from datetime import datetime

 # ============== Configuration ==============

 OPENSTATES_API_KEY = os.environ.get("OPENSTATES_API_KEY")
 OPENSTATES_BASE_URL = "https://v3.openstates.org"
+RECENT_CREATED_SINCE = f"{datetime.utcnow().year - 1}-01-01"
+
+STOPWORDS = {
+    "act", "bill", "state", "tax", "income", "credit", "credits", "reduction",
+    "increase", "expanded", "expansion", "child", "marriage", "penalty",
+    "elimination", "supplemental", "empire",
+}

 # Legislative stage classification based on action classifications
 # Order matters — later stages override earlier ones
@@ -90,6 +100,54 @@
     "dead": "Dead/Withdrawn",
 }

+BILL_NUMBER_RE = re.compile(r"\b(?!FY)([A-Z]{1,3}\.?\s*\d+(?:\s*S\d+)?)\b", re.I)
+
+
+class RateLimitExhaustedError(RuntimeError):
+    """Raised when OpenStates continues returning 429 after retries."""
+
+
+def normalize_bill_number(value):
+    """Normalize bill numbers across spacing and leading-zero variants."""
+    if not value:
+        return None
+
+    value = re.sub(r"\s+", "", value).replace(".", "").upper()
+    return re.sub(r"([A-Z]+)0+(\d)", r"\1\2", value)
+
+
+def normalize_text(value):
+    """Lowercase and strip punctuation for fuzzy title comparisons."""
+    return re.sub(r"[^a-z0-9 ]+", " ", (value or "").lower())
+
+
+def token_set(value):
+    """Tokenize bill titles while dropping generic legislative filler."""
+    tokens = set()
+    for token in normalize_text(value).split():
+        if len(token) <= 2 or token in STOPWORDS or token.isdigit():
+            continue
+        tokens.add(token)
+    return tokens
+
+
+def title_similarity_score(left, right):
+    """Return sequence and token-overlap similarity for two bill titles."""
+    left_norm = normalize_text(left)
+    right_norm = normalize_text(right)
+    ratio = difflib.SequenceMatcher(None, left_norm, right_norm).ratio()
+    left_tokens = token_set(left)
+    right_tokens = token_set(right)
+    overlap = len(left_tokens & right_tokens) / max(1, len(left_tokens | right_tokens))
+    return ratio, overlap
+
+
+def normalize_action_date(value):
+    """Collapse ISO timestamps to YYYY-MM-DD for stable comparison/storage."""
+    if not value:
+        return None
+    return str(value)[:10]
+

 def openstates_request(endpoint, params=None, max_retries=3):
     """Make a request to the OpenStates API v3 with retry on rate limit."""
@@ -108,6 +166,12 @@ def openstates_request(endpoint, params=None, max_retries=3):
             time.sleep(wait)
             continue

+        if response.status_code in {500, 502, 503, 504}:
+            wait = 5 * (attempt + 1)
+            print(f" OpenStates {response.status_code}, retrying in {wait}s...")
+            time.sleep(wait)
+            continue
+
         if response.status_code == 404:
             return None

@@ -116,8 +180,14 @@ def openstates_request(endpoint, params=None, max_retries=3):
     response = requests.get(url, headers=headers, params=params or {})

     if response.status_code == 429:
-        print(f" Still rate limited after {max_retries} retries, skipping")
-        return None
+        raise RateLimitExhaustedError(
+            f"OpenStates rate limit exhausted after {max_retries} retries"
+        )
+    if response.status_code in {500, 502, 503, 504}:
+        raise requests.HTTPError(
+            f"OpenStates transient error persisted ({response.status_code})",
+            response=response,
+        )
     response.raise_for_status()
     return response.json()

@@ -155,39 +225,50 @@ def classify_stage(actions):
     return stage


-def search_bill_on_openstates(state_name, bill_number):
+def search_bill_on_openstates(state_name, bill_number, title=""):
     """
     Search for a bill by state + identifier on OpenStates.
     Returns the bill detail with actions, or None.
     """
-    # Clean bill number for search (e.g., "HB05133" -> "HB 5133", "SB0032" -> "SB 32")
     clean_num = bill_number.strip()
+    target_norm = normalize_bill_number(clean_num)
     params = {
         "jurisdiction": state_name,
         "q": clean_num,
-        "per_page": 5,
+        "per_page": 8,
         "include": "actions",
+        "sort": "updated_desc",
+        "created_since": RECENT_CREATED_SINCE,
     }

     data = openstates_request("/bills", params)
     if not data or not data.get("results"):
         return None

-    # Find best match by identifier
+    candidates = []
     for result in data["results"]:
-        result_id = result.get("identifier", "").replace(" ", "").upper()
-        search_id = clean_num.replace(" ", "").upper()
-        # Strip leading zeros for comparison
-        import re
-        result_norm = re.sub(r'([A-Z]+)0*(\d+)', r'\1\2', result_id)
-        search_norm = re.sub(r'([A-Z]+)0*(\d+)', r'\1\2', search_id)
+        result_norm = normalize_bill_number(result.get("identifier", ""))
+        if target_norm and result_norm != target_norm:
+            continue
+
+        ratio, overlap = title_similarity_score(title, result.get("title", ""))
+        latest_date = normalize_action_date(result.get("latest_action_date"))
+        recency_bonus = 20 if latest_date and latest_date >= RECENT_CREATED_SINCE else 0
+        score = ratio * 100 + overlap * 100 + recency_bonus
+        candidates.append((score, ratio, overlap, result))

-        if result_norm == search_norm:
-            return result
+    if not candidates:
+        return None

-    # If no exact match, return first result as fallback
-    return data["results"][0] if data["results"] else None
+    candidates.sort(key=lambda item: item[0], reverse=True)
+    _, ratio, overlap, result = candidates[0]
+
+    # Reject low-confidence title mismatches to avoid wrong-session collisions.
+    if title and ratio < 0.22 and overlap == 0:
+        return None
+
+    return result


 def get_bill_detail(openstates_id):
@@ -269,6 +350,9 @@ def main():
     skipped = 0
     errors = 0

+    interrupted_by_rate_limit = False
+    resume_offset = None
+
     for i, bill in enumerate(bills):
         state = bill["state"]
         bn = bill["bill_number"]
@@ -278,7 +362,7 @@ def main():

         try:
             # Search for the bill on OpenStates by state + bill number
-            detail = search_bill_on_openstates(state_name, bn)
+            detail = search_bill_on_openstates(state_name, bn, bill.get("title", ""))

             if not detail:
                 print("not found on OpenStates")
@@ -291,15 +375,19 @@ def main():

             # Get latest action info
             latest_action = detail.get("latest_action_description", "")
-            latest_action_date = detail.get("latest_action_date", "") or None
+            latest_action_date = normalize_action_date(detail.get("latest_action_date", "") or None)

             # Determine if anything changed
             old_action = bill.get("last_action", "")
-            old_date = bill.get("last_action_date", "")
+            old_date = normalize_action_date(bill.get("last_action_date", ""))

             stage_label = STAGE_LABELS.get(stage, stage)

-            if latest_action == old_action and latest_action_date == old_date:
+            if (
+                latest_action == old_action
+                and latest_action_date == old_date
+                and stage_label == (bill.get("status") or "")
+            ):
                 print(f"{stage_label} (no change)")
                 skipped += 1
             else:
@@ -319,6 +407,11 @@ def main():

                 updated += 1

+        except RateLimitExhaustedError as e:
+            print(f"STOPPING: {e}")
+            interrupted_by_rate_limit = True
+            resume_offset = args.offset + i
+            break
         except Exception as e:
             print(f"ERROR: {e}")
             errors += 1
@@ -332,6 +425,8 @@ def main():
     print(f" Updated: {updated}")
     print(f" No change: {skipped}")
     print(f" Errors: {errors}")
+    if interrupted_by_rate_limit:
+        print(f" Resume with: --offset {resume_offset}")

     return 0
diff --git a/src/App.jsx b/src/App.jsx
index 967f301..06bbe3f 100644
--- a/src/App.jsx
+++ b/src/App.jsx
@@ -4,6 +4,7 @@ import Breadcrumb from "./components/Breadcrumb";
 import StateSearchCombobox from "./components/StateSearchCombobox";
 import { RecentActivitySidebar } from "./components/BillActivityFeed";

+const FederalPanel = lazy(() => import("./components/FederalPanel"));
 const StatePanel = lazy(() => import("./components/StatePanel"));
 const ReformAnalyzer = lazy(() => import("./components/reform/ReformAnalyzer"));
 import { useData } from "./context/DataContext";
@@ -11,6 +12,11 @@ import { stateData } from "./data/states";
 import { colors, mapColors, typography, spacing } from "./designTokens";
 import { track } from "./lib/analytics";
 import { BASE_PATH } from "./lib/basePath";
+import {
+  FEDERAL_JURISDICTION,
+  isFederalJurisdiction,
+  isStateJurisdiction,
+} from "./lib/jurisdictions";

 function parsePath() {
   // Support old hash URLs for backward compat
@@ -18,11 +24,15 @@ function parsePath() {
   // Strip BASE_PATH prefix before parsing
   const raw = hash || window.location.pathname;
   const path = (BASE_PATH ? raw.replace(BASE_PATH, "") : raw).replace(/^\//, "");
-  if (!path) return { state: null, billId: null };
+  if (!path) return { jurisdiction: null, billId: null };
   const parts = path.split("/");
-  const state = parts[0].toUpperCase();
+  const segment = parts[0];
+  const state = segment.toUpperCase();
   const billId = parts[1] || null;
-  return { state: stateData[state] ? state : null, billId };
+  if (segment.toLowerCase() === FEDERAL_JURISDICTION) {
+    return { jurisdiction: FEDERAL_JURISDICTION, billId };
+  }
+  return { jurisdiction: stateData[state] ? state : null, billId };
 }

 function notifyParent(path) {
@@ -47,8 +57,8 @@ function LoadingPlaceholder() {
 }

 function App() {
-  const { statesWithBills, getBillsForState } = useData();
-  const [selectedState, setSelectedState] = useState(() => parsePath().state);
+  const { statesWithBills, getBillsForState, getFederalBills } = useData();
+  const [selectedJurisdiction, setSelectedJurisdiction] = useState(() => parsePath().jurisdiction);
   const [billId, setBillId] = useState(() => parsePath().billId);

   const activeStates = useMemo(
@@ -68,40 +78,44 @@ function App() {
     }
   }, []);

-  const handleStateSelect = useCallback((abbr) => {
-    setSelectedState(abbr);
+  const handleJurisdictionSelect = useCallback((jurisdiction) => {
+    setSelectedJurisdiction(jurisdiction);
     setBillId(null);
-    if (abbr) {
-      history.pushState(null, "", BASE_PATH + "/" + abbr);
-      notifyParent("/" + abbr);
-      track("state_selected", { state_abbr: abbr, state_name: stateData[abbr]?.name });
+    if (jurisdiction) {
+      history.pushState(null, "", BASE_PATH + "/" + jurisdiction);
+      notifyParent("/" + jurisdiction);
+      if (isFederalJurisdiction(jurisdiction)) {
+        track("federal_selected", { jurisdiction });
+      } else {
+        track("state_selected", { state_abbr: jurisdiction, state_name: stateData[jurisdiction]?.name });
+      }
     } else {
       history.pushState(null, "", BASE_PATH + "/");
       notifyParent("/");
     }
   }, []);

-  const handleBillSelect = useCallback((stateAbbr, id) => {
-    setSelectedState(stateAbbr);
+  const handleBillSelect = useCallback((jurisdiction, id) => {
+    setSelectedJurisdiction(jurisdiction);
     setBillId(id);
-    history.pushState(null, "", `${BASE_PATH}/${stateAbbr}/${id}`);
-    notifyParent(`/${stateAbbr}/${id}`);
+    history.pushState(null, "", `${BASE_PATH}/${jurisdiction}/${id}`);
+    notifyParent(`/${jurisdiction}/${id}`);
   }, []);

   const handleNavigateHome = useCallback(() => {
-    handleStateSelect(null);
-  }, [handleStateSelect]);
+    handleJurisdictionSelect(null);
+  }, [handleJurisdictionSelect]);

-  const handleNavigateState = useCallback(() => {
-    if (selectedState) {
-      handleStateSelect(selectedState);
+  const handleNavigateJurisdiction = useCallback(() => {
+    if (selectedJurisdiction) {
+      handleJurisdictionSelect(selectedJurisdiction);
     }
-  }, [selectedState, handleStateSelect]);
+  }, [selectedJurisdiction, handleJurisdictionSelect]);

   useEffect(() => {
     const onPopState = () => {
-      const { state, billId: bid } = parsePath();
-      setSelectedState(state);
+      const { jurisdiction, billId: bid } = parsePath();
+      setSelectedJurisdiction(jurisdiction);
       setBillId(bid);
       const strippedPath = BASE_PATH
         ? window.location.pathname.replace(BASE_PATH, "")
@@ -114,20 +128,25 @@ function App() {

   // Resolve bill for bill page
   const activeBill = useMemo(() => {
-    if (!selectedState || !billId) return null;
-    const bills = getBillsForState(selectedState);
+    if (!selectedJurisdiction || !billId) return null;
+    const bills = isFederalJurisdiction(selectedJurisdiction)
+      ? getFederalBills()
+      : getBillsForState(selectedJurisdiction);
     return bills.find((b) => b.id === billId) || null;
-  }, [selectedState, billId, getBillsForState]);
+  }, [selectedJurisdiction, billId, getBillsForState, getFederalBills]);

   // Determine view
-  const isBillPage = selectedState && billId && activeBill?.reformConfig;
-  const isStatePage = selectedState && !isBillPage;
+  const isBillPage =
+    isStateJurisdiction(selectedJurisdiction) &&
+    selectedJurisdiction &&
+    billId &&
+    activeBill?.reformConfig;
+  const isJurisdictionPage = selectedJurisdiction && !isBillPage;

   return (
{/* Header */}
-
+
-
+
PolicyEngine -
-

- 2026 State Legislative Tracker -

-

- PolicyEngine State Tax Research -

-
+

+ Bill Tracker +

- +
+
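The routing and view-selection changes in `src/App.jsx` reduce to a pair of pure functions. The Python sketch below transliterates `parsePath` and the `isBillPage`/`isJurisdictionPage` checks so the rules can be tested in isolation. It assumes `FEDERAL_JURISDICTION` is the string `"federal"` (the `/federal` route suggests this, but `src/lib/jurisdictions` is not part of the diff), treats any non-null, non-federal jurisdiction as a state in place of `isStateJurisdiction`, and uses a hypothetical three-state `STATE_CODES` set instead of `stateData`.

```python
from typing import Optional, Tuple

# Assumed value of FEDERAL_JURISDICTION; src/lib/jurisdictions is not shown.
FEDERAL_JURISDICTION = "federal"
# Hypothetical stand-in for the keys of stateData.
STATE_CODES = {"CA", "NY", "UT"}


def parse_path(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (jurisdiction, bill_id) for a path with BASE_PATH already stripped."""
    path = path[1:] if path.startswith("/") else path  # drop one leading slash, like /^\//
    if not path:
        return None, None
    parts = path.split("/")
    segment = parts[0]
    bill_id = parts[1] if len(parts) > 1 and parts[1] else None
    if segment.lower() == FEDERAL_JURISDICTION:
        return FEDERAL_JURISDICTION, bill_id
    state = segment.upper()
    return (state if state in STATE_CODES else None), bill_id


def resolve_view(jurisdiction: Optional[str], bill_id: Optional[str], has_reform_config: bool) -> str:
    """Mirror isBillPage / isJurisdictionPage: bill pages exist only for states."""
    is_state = jurisdiction is not None and jurisdiction != FEDERAL_JURISDICTION
    if is_state and bill_id and has_reform_config:
        return "bill"
    if jurisdiction:
        return "jurisdiction"
    return "home"
```

Under these assumptions a federal URL that carries a bill ID still renders the federal workspace rather than a bill page, which matches the `isStateJurisdiction(...)` guard on `isBillPage`.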
@@ -179,61 +202,58 @@ function App() {
         {isBillPage && (
}>
         )}
-        {/* === State Page === */}
-        {isStatePage && (
+        {/* === Jurisdiction Page === */}
+        {isJurisdictionPage && (
- }> - handleBillSelect(selectedState, id)} - /> + {isFederalJurisdiction(selectedJurisdiction) ? ( + + ) : ( + handleBillSelect(selectedJurisdiction, id)} + /> + )}
         )}
         {/* === Home Page === */}
-        {!selectedState && (
+        {!selectedJurisdiction && (
           <>
-            {/* Intro */}
-
+

-                State Tax Policy Research
+                Select a state to explore legislation

-                Explore state legislative sessions and PolicyEngine analysis. Select a state to see tax changes, active bills, and related research.
+                Click a state on the map or use the search bar above.

@@ -259,8 +279,8 @@ function App() {
States with Published Analysis - +
)}
- +
- +
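The bill-matching change in `scripts/refresh_bill_status.py` is easiest to check offline. The sketch below reproduces the ranking inside `search_bill_on_openstates` over plain dicts instead of live OpenStates responses: results must match the requested bill number after normalization, are scored as `ratio * 100 + overlap * 100` plus a 20-point recency bonus, and the best one is rejected when a title was supplied but shares almost nothing with it. The helper name `rank_candidates` and the fixed `recent_since` default (the script derives it from the current year) are assumptions; the normalization rules, weights, and 0.22 threshold are copied from the diff.

```python
import difflib
import re
from typing import Optional

# Copied from the diff: generic title words excluded from the token overlap.
STOPWORDS = {
    "act", "bill", "state", "tax", "income", "credit", "credits", "reduction",
    "increase", "expanded", "expansion", "child", "marriage", "penalty",
    "elimination", "supplemental", "empire",
}


def normalize_bill_number(value: str) -> Optional[str]:
    """'HB 05133', 'hb5133' and 'H.B. 5133' all normalize to 'HB5133'."""
    if not value:
        return None
    value = re.sub(r"\s+", "", value).replace(".", "").upper()
    return re.sub(r"([A-Z]+)0+(\d)", r"\1\2", value)  # strip zeros after the prefix


def normalize_text(value: str) -> str:
    return re.sub(r"[^a-z0-9 ]+", " ", (value or "").lower())


def token_set(value: str) -> set:
    """Content words only: drop short tokens, numbers, and STOPWORDS."""
    return {t for t in normalize_text(value).split()
            if len(t) > 2 and t not in STOPWORDS and not t.isdigit()}


def title_similarity_score(left: str, right: str):
    """(character-sequence ratio, Jaccard overlap of content words)."""
    ratio = difflib.SequenceMatcher(None, normalize_text(left), normalize_text(right)).ratio()
    lt, rt = token_set(left), token_set(right)
    return ratio, len(lt & rt) / max(1, len(lt | rt))


def rank_candidates(results, bill_number, title="", recent_since="2025-01-01"):
    """Offline version of the ranking in search_bill_on_openstates (hypothetical helper)."""
    target = normalize_bill_number(bill_number.strip())
    candidates = []
    for result in results:
        # Only exact identifier matches (after normalization) are eligible.
        if target and normalize_bill_number(result.get("identifier", "")) != target:
            continue
        ratio, overlap = title_similarity_score(title, result.get("title", ""))
        latest = (result.get("latest_action_date") or "")[:10]
        bonus = 20 if latest and latest >= recent_since else 0
        candidates.append((ratio * 100 + overlap * 100 + bonus, ratio, overlap, result))
    if not candidates:
        return None  # unlike the old code, no "first result" fallback
    candidates.sort(key=lambda item: item[0], reverse=True)
    _, ratio, overlap, best = candidates[0]
    # Reject weak title matches: likely the same number from a different session.
    if title and ratio < 0.22 and overlap == 0:
        return None
    return best
```

Filtering on the normalized identifier before scoring is what removes the old behavior of returning `data["results"][0]` when nothing matched exactly; a bill number reused across sessions now has to win on title similarity or recency.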
@@ -536,6 +556,18 @@ function QuickLinkCard({ href, title, description }) {
   );
 }

+// Nav Tab Component
+function NavTab({ active, onClick, children }) {
+  return (
+
+  );
+}
+
 // Footer Link Component
 function FooterLink({ href, children }) {
   return (
diff --git a/src/components/BillActivityFeed.jsx b/src/components/BillActivityFeed.jsx
index c6a38f7..3eadce1 100644
--- a/src/components/BillActivityFeed.jsx
+++ b/src/components/BillActivityFeed.jsx
@@ -2,6 +2,7 @@ import { useState, useEffect, useMemo } from "react";
 import { supabase } from "../lib/supabase";
 import { useData } from "../context/DataContext";
 import { colors, typography, spacing } from "../designTokens";
+import { ALL_YEARS, matchesSessionScope, matchesYearFilter } from "../lib/sessionFilters";

 const REQUEST_API_PATH = "/api/bill-analysis-request";
 const MAILCHIMP_SUBSCRIBE_URL =
@@ -774,13 +775,21 @@ function StageSummaryBar({ bills }) {

 const DEFAULT_VISIBLE = 5;

-export function StateBillActivity({ stateAbbr, onBillSelect }) {
+export function StateBillActivity({ stateAbbr, onBillSelect, sessionYearSet = null, selectedYear = ALL_YEARS }) {
   const { bills, loading } = useProcessedBills(stateAbbr);
   const { research } = useData();
   const [expanded, setExpanded] = useState(false);
   const [actionBill, setActionBill] = useState(null);
   const [requestBill, setRequestBill] = useState(null);

+  const scopedBills = useMemo(
+    () => bills.filter((bill) => (
+      matchesSessionScope(bill, sessionYearSet, "last_action_date") &&
+      matchesYearFilter(bill, selectedYear, "last_action_date")
+    )),
+    [bills, sessionYearSet, selectedYear],
+  );
+
   const { analyzedBillIds, billToResearchId } = useMemo(() => {
     const ids = new Set();
     const lookup = {};
@@ -800,11 +809,11 @@ export function StateBillActivity({ stateAbbr, onBillSelect }) {
   }, [research]);

   const unananalyzedBills = useMemo(
-    () => bills.filter((b) => {
+    () => scopedBills.filter((b) => {
       const norm = `${b.state}:${b.bill_number.replace(/\s+/g, "").replace(/^([A-Z]+)0+(\d)/, "$1$2").toUpperCase()}`;
       return !analyzedBillIds.has(norm);
     }),
-    [bills, analyzedBillIds],
+    [scopedBills, analyzedBillIds],
   );

   if (loading || !unananalyzedBills.length) return null;
diff --git a/src/components/Breadcrumb.jsx b/src/components/Breadcrumb.jsx
index 769b96c..2050480 100644
--- a/src/components/Breadcrumb.jsx
+++ b/src/components/Breadcrumb.jsx
@@ -1,5 +1,6 @@
 import { colors, typography, spacing } from "../designTokens";
 import { stateData } from "../data/states";
+import { getJurisdictionLabel } from "../lib/jurisdictions";

 const ArrowLeft = () => (
@@ -13,8 +14,10 @@ const ChevronRight = () => (
 );

-export default function Breadcrumb({ stateAbbr, billLabel, onNavigateHome, onNavigateState }) {
-  const onBack = billLabel ? onNavigateState : onNavigateHome;
+export default function Breadcrumb({ jurisdiction, billLabel, onNavigateHome, onNavigateJurisdiction }) {
+  const onBack = billLabel ? onNavigateJurisdiction : onNavigateHome;
+  const jurisdictionLabel = getJurisdictionLabel(jurisdiction, stateData);
+
   return (