Correct net-VAT liability scaling; rewrite paper on the corrected model by MaxGhenis · Pull Request #21 · PolicyEngine/firm-microsim-paper

MaxGhenis · 2026-07-01T22:17:26Z

Summary

Completes the net-VAT liability fix from #15 (superseding draft PR #17), re-runs the entire pipeline on the corrected model, and rewrites the paper against the corrected numbers plus a seven-referee review (methodology, red-team, domain, citations, neutrality, style, reproducibility).

The defect and its consequences

vat_liability_k was turnover − input (no ×0.20), inflating per-firm liabilities ~5×, with ρ ∈ [0.6, 1.5] leaving ~47% of firms with negative value added. The weight optimiser compensated on band totals (aggregate scores barely moved) while distorting the near-threshold density — manufacturing the paper's "reproduction" of the administrative bunching step.
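A minimal sketch of the scaling defect described above, assuming the standard 20% rate and treating ρ as the input-to-turnover ratio. The function names and the example firm are hypothetical and are not taken from the repository.

```python
STANDARD_RATE = 0.20  # standard rate used in the corrected formula


def liability_old(turnover: float, inputs: float) -> float:
    """Pre-fix quantity: value added itself, with no rate applied."""
    return turnover - inputs


def liability_corrected(turnover: float, inputs: float) -> float:
    """Corrected net VAT: the rate applied to value added."""
    return STANDARD_RATE * (turnover - inputs)


# Hypothetical firm: £80k turnover, input ratio rho = 0.6.
turnover = 80_000.0
rho = 0.6
inputs = rho * turnover

old = liability_old(turnover, inputs)
new = liability_corrected(turnover, inputs)

# The inflation factor is 1 / 0.20 for every firm, whatever its rho.
inflation = old / new

# Any rho above 1 makes value added (and so the liability) negative.
negative_va = liability_corrected(turnover, 1.2 * turnover) < 0
```

The constant factor is why aggregate band totals could still be matched by reweighting while per-firm liabilities were wrong.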

Model fixes

v = 0.20 × (turnover − input); ρ ∈ [0.1, 0.95], mean VA share 40%, zero negative-VA firms (from Fix vat liability scaling #17, applied to current main)
Behavioural solver rewritten: the damped fixed-point iteration has no fixed point for firms straddling a reform notch (iterates oscillate; results were iteration-parity artifacts). Replaced with a closed-form region-confined solve using the formulation-A response ratio y* = y_obs[(1−τf₁)/(1−τf₀)]^e (deductible share cancels). e→0 nesting now exact; baseline reproduction asserted; raise-to-£100k behavioural ≡ static at every e — a derived invariance, asserted in the crosscheck
Bunching estimator: removed non-identified elasticity outputs (σ/Π/eps, incl. an ad hoc τ/2 normalisation from a deleted model); fixed bootstrap replicate weights (were scaled n/W)
Recovery test redesigned: the old injection was invisible to the counterfactual fit by construction (donor+deposit both inside the exclusion window); new KW-consistent injection extends beyond the window and is scored by the headline estimator
Secondary-notch mass block in dominated_region_mass; new results/static_sweep.txt artifact (the sweep table was previously figure-pixels only); voluntary-retention sensitivity on the anchor
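The formulation-A response ratio in the solver item above can be sketched as follows. This is an illustration under stated assumptions, not the repository's solver: `tau` is taken to be the statutory rate, `f0` and `f1` are the baseline and reform factors that multiply it in the formula (their exact definition is not restated here), and the function name is hypothetical.

```python
def response_turnover(y_obs: float, tau: float, f0: float, f1: float, e: float) -> float:
    """y* = y_obs * [(1 - tau*f1) / (1 - tau*f0)] ** e."""
    return y_obs * ((1.0 - tau * f1) / (1.0 - tau * f0)) ** e


y_obs = 95_000.0

# e -> 0 nesting: with zero elasticity the firm stays at its observed turnover.
static = response_turnover(y_obs, tau=0.20, f0=1.0, f1=0.5, e=0.0)

# If a reform leaves the factor unchanged in the firm's region (f1 == f0),
# the ratio is exactly 1 and turnover does not move at any elasticity.
unchanged = [response_turnover(y_obs, 0.20, 1.0, 1.0, e) for e in (0.05, 0.09, 0.14)]

# A lower factor (for example a reduced-rate band) raises turnover,
# and by more at a higher elasticity.
reduced = [response_turnover(y_obs, 0.20, 1.0, 0.5, e) for e in (0.05, 0.14)]
```

The unchanged-factor case is one way to read the invariance asserted for the raise-to-£100k reform, on the assumption that a pure threshold move does not alter the factor within a region.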

Headline results on the corrected model

Object	Old	Corrected
Common-base menu (2023-24, base)	£183.6bn	£184.7bn
Raise to £100k	−£508m	−£698m (behavioural = static at every e)
Graduated taper	−£336m	−£466m
Reduced rate 10% / 15%	−£343m / −£171m	−£460m / −£230m
Anchor 85→90k (2025-26) vs HMRC −£185m	−£175m ("within £10m")	−£248m full-dereg / −£141m retention — brackets HMRC in all 5 years
Bunching at 85k	E=8,712, b=0.060 ("reproduces admin step")	E=0, b=−0.139
Bunching at 90k (2024-25 vintage)	—	E≈196k spurious, spec-robust — band-edge artifact
Behavioural offsets	40–75% of static	0% (raise), <4% (bands)

Paper rewrite

Every section updated. Section 5 reframed around the correction: band-calibrated synthetic data cannot support bunching inference in either direction — the corrected data show nothing at £85k while the same generator shows a huge spurious signal at the £90k band edge, robust in all 16 sensitivity cells. The earlier draft's artifact is documented openly as the sharpest demonstration of the thesis. Value-added "robustness" paragraph (which implied dominated widths above the £21,250 statutory cap) replaced by the formulation-A invariance result. Anchor mechanism corrected (uprated counterfactual threshold path, not "fiscal drag"). Citations: LLAT sample setting corrected (2004–14, £58k–£81k thresholds — not "the same £85,000 notch"); Chetty et al. credited for the polynomial counterfactual; HMT 2018 call for evidence + Council Directive (EU) 2020/285 added; URLs/DOIs throughout. Full referee reports available on request.

Upstream

populace's experimental UK firm generator (Add experimental UK firm generator populace#223) was ported pre-fix and inherits the same mis-scaling — flagged in the appendix and README; upstream issue to follow.

Closes #15. Supersedes #17.

🤖 Generated with Claude Code

v_i = 0.20 x (turnover - inputs) in generator, calibration, and validator; rho recentred to mean ~0.6 and clamped to [0.1, 0.95] so value added is strictly positive; secondary-notch mass block in dominated_region_mass. Cherry-picked from origin/fix-vat-liability-scaling (issue #15) without the tex changes, which are superseded by the rewrite in this branch. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- Replace the damped fixed-point forward solve (no fixed point at reform notches; iterates oscillate) with a closed-form region-confined solve: each firm re-optimises within the schedule region containing its observed turnover, using the formulation-A response ratio y* = y_obs [(1-tau f1)/(1-tau f0)]^e (deductible share cancels). e->0 nesting is now exact; a pure threshold raise provably has zero intensive-margin offset (behavioural == static at every e), asserted in the crosscheck; baseline reproduction asserted in reform_revenue.
- Remove the non-identified elasticity outputs (sigma/Pi/eps) and the tau/2 wedge normalisation inherited from a deleted sigmoid model from the bunching estimator; keep the geometry (b, b_llat, E, Delta_R, y_R).
- Fix bootstrap replicate weights to sum to the population mass W rather than the row count n (E replicates were scaled by n/W).
- Redesign the recovery test: KW-consistent injection with missing mass extending beyond the exclusion window, scored by the headline estimator (the previous injection was invisible to the counterfactual fit by construction).
- Relabel e sweep values as assumptions (0.05 external KW anchor); remove the false 'calibrated to reduced-form estimates' provenance.
- report.py: fall back to the generic output CSV name; taper guard in the dynamic CLI.

Addresses issues #15 and the methodology findings in the review.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
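A sketch of the replicate-weight fix listed in this commit, assuming a plain row resample with replacement. The helper name and the toy weights are hypothetical. The only point illustrated is that each replicate's weights are rescaled to the population mass W rather than left on a row-count scale.

```python
import random
from collections import Counter


def replicate_weights(weights, rng):
    """Resample rows with replacement and return weights that sum to W."""
    n = len(weights)
    total = sum(weights)  # population mass W
    # draws[i] = number of times row i appears in this replicate
    draws = Counter(rng.choices(range(n), k=n))
    raw = [draws[i] * w for i, w in enumerate(weights)]
    # Rescale so the replicate represents the same population mass W
    scale = total / sum(raw)
    return [r * scale for r in raw]


rng = random.Random(0)
w = [120.0, 80.0, 300.0, 50.0, 450.0]  # toy calibrated weights
rep = replicate_weights(w, rng)
```

Without the final rescale, any weighted count computed on a replicate (such as an excess-mass estimate) would be off by a factor that depends on the row count chosen by the analyst.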

Both vintages rebuilt with v = 0.20(turnover - inputs); all artifacts regenerated: calibration report (89.4%/89.4%), static sweep + anchor (new static_sweep.txt artifact with voluntary-retention sensitivity), reform menu (raise -698m, taper -466m, 10% band -460m, 15% band -230m on the common 184.7bn base), behavioural table (raise == static at every e; band offsets second-order), bunching (E=0 at 85k; spurious E=196k at the 90k band edge on the 2024-25 vintage), placebo (E=0 under all treatments), redesigned recovery test, dominated-region masses, iso-optimum verification. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Abstract/intro/data/static/bunching/model/behavioural/conclusion and both appendices rewritten against the corrected pipeline and the review findings:
- Section 5 reframed: the corrected data contain no bunching at 85k (E=0, b=-0.139) and a spurious spec-robust signal at the 90k band edge (E~196k); the earlier draft's 'reproduced step' is documented as a liability-scaling artifact and its correction reported openly.
- Anchor comparison rewritten: full-deregistration (-248m, 2025-26) and 43% voluntary-retention (-141m) conventions bracket HMRC's -185m in every forecast year; 'fiscal drag lifts the frozen baseline' replaced with the uprated counterfactual threshold path the code implements.
- Behavioural section rewritten around the region-confined solver: raise-to-100k behavioural == static at every e (derived and asserted); reduced-rate offsets under 4%; taper exclusion restated structurally; the orphaned 'pathological FOC solve' claim removed; e sweep relabelled as assumptions (KW 0.05 anchor; LLAT UK 0.09-0.14 noted).
- Dominated region: formulation-A invariance replaces the erroneous per-firm tau0/(1-tau0) robustness paragraph (which implied widths above the statutory cap); masses updated to the corrected population.
- Static: new sweep (2024-25 vintage, explicitly introduced), menu on the 184.7bn base, direct-vs-smooth method reconciliation, +5.8% base overshoot disclosed against HMRC liability-target sums.
- Neutrality/style: metric-scoped verdicts only; hedges consolidated; self-praise removed; findings renumbered to four consistently.
- Citations: LLAT setting corrected (2004-14 sample, 58k-81k thresholds); Chetty et al. credited for the polynomial counterfactual; URLs/DOIs added throughout; benedek2015 to techreport; belloncopestake year fixed; HMT 2018 call for evidence and Council Directive (EU) 2020/285 added; ONS panel relabelled to the VAT/PAYE-registered frame with a BPE note; 150k step FRS attribution removed (band edge; step absent in corrected data).
- Title: 'An Open Firm-Level Microsimulation of the UK VAT Registration Threshold'.

README tables updated; upstream Populace inheritance of the mis-scaling flagged. Closes #15. Supersedes #17.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Adds the OBR Mar-2023 EFO Chart C data (already in-repo) as calibration targets for the 2023-24 vintage: five bins at/above the threshold as direct counts (same universe as the coarse HMRC band they refine), twenty below as shape targets normalised over the £65k-£85k window (the chart universe is narrower than the ONS frame there). The 2023-24 profile is interpolated between the 2019-20 outturn and the 2025-26 frozen-threshold projection. The near-threshold density now rises into the threshold and steps down across it - the administratively observed bunching profile, held as an explicit cited target. Placebo B regenerates without the fine targets and returns E=0. Responds to Nikhil's review comment that the previous optimizer-equilibrium shape looked economically backwards.

Numbers on the OBR-target build: menu -783/-550/-497/-248 on £184.8bn; anchor conventions overshoot HMRC early (-366/-209 vs -185) and match by 2026-27 (-125.2 vs -125), sign flip reproduced; dominated-band mass ~160k; bunching at 85k reads the target-inherited E=14,663 (placebo->0); 90k spurious signal unchanged; recovery 55/75/82%; calibration holds at 89.4%. All propagated through paper, artifacts, and figures; anchor figure y-limits now scale with data.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
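The difference between direct-count and shape targets in this commit can be sketched as below. The bin values are invented for illustration and are not the OBR Chart C data; the function name and `frame_mass` are hypothetical.

```python
def shape_targets(bin_counts):
    """Normalise bin counts to shares of the window total (shape only)."""
    total = sum(bin_counts)
    return [c / total for c in bin_counts]


# Illustrative £1k bins just below a threshold; NOT the chart's actual values.
chart_bins = [9_000, 9_500, 10_200, 11_000, 12_500]
shares = shape_targets(chart_bins)

# Shape targets do not depend on the level of the source universe:
# a chart covering twice as many firms implies the same shares.
shares_scaled = shape_targets([2 * c for c in chart_bins])

# Applying the shares to a window mass from the synthetic frame gives
# bin targets on the frame's own scale (frame_mass is a made-up figure).
frame_mass = 60_000
frame_targets = [s * frame_mass for s in shares]
```

A direct-count target would instead use `chart_bins` as they are, which is only appropriate where the chart and the frame cover the same universe.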

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…s; add --fast stratified sampling

Root-cause fix for the seam Nikhil's review surfaced: the HMRC GBP1-to-Threshold liability total is remitted by voluntary registrants (~GBP2,150 average net, input-reclaim traders) the liability model does not represent, but was calibrated against the whole below-threshold population at the model's ~8% net rate (1.76x over-subscribed), draining weights in the GBP50k-85k region. Now an informational diagnostic, like liability by sector. One exclusion fixes three artifacts: the weight cliff at the OBR window edge (shoulder now continuous at ~1.05-1.15 average weight), the 2024-25 vintage's weak liability calibration (79% -> 92%; overall 89.4% -> 92.8%), and the spec-robust E~196k monster at the 90k band edge (now E=11,072 with b_LLAT=1.23 - within 10% of the administrative 1.361, the paper's cautionary exhibit in its sharpest form).

Bootstrap SEs removed throughout: the synthetic file is a deterministic construction, so row-resampling has no estimand and its dispersion scales with the analyst-chosen row count. Replaced by specification grids plus generator-seed sensitivity (results/seed_sensitivity.txt: E +/-2%, reform costs +/-GBP1m across three full-size seeds).

--fast mode: stratified thinning (30% in the GBP15k-150k window, 5% tails, per-stratum floors and exact ratio base weights so all targets stay true totals; optimizer initialised at and penalised toward base). 15 seconds per vintage vs ~13 minutes; aggregates within 0.3%, local bunching stats within ~5%.

Definitive numbers: menu -784/-550/-497/-249 on GBP184.8bn; anchor -358/-366/-219/-75/+101 full-dereg, -204/-208/-125/-43/+57 retention (2026-27 matches -125.1 vs -125); sweep base GBP200.9bn; E=7,873 (b_LLAT 2.55) target-inherited at 85k, placebos -> 0; recovery 15/50/62%. All propagated through paper, README, artifacts, figures. Closes #23.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
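A sketch of the stratified thinning idea behind the --fast mode, assuming rows are simple (id, weight) pairs. The function, the floor of 50 and the toy strata are hypothetical; only the 30% and 5% rates come from the commit message. The ratio adjustment shown preserves each stratum's weighted count; how the repository keeps its other targets exact is not shown here.

```python
import random


def thin_stratum(rows, rate, floor, rng):
    """Keep a random subset of (id, weight) rows and rescale the kept
    weights so the stratum's weighted total is unchanged."""
    k = min(len(rows), max(floor, round(rate * len(rows))))
    kept = rng.sample(rows, k)
    total = sum(w for _, w in rows)
    kept_total = sum(w for _, w in kept)
    ratio = total / kept_total  # exact ratio adjustment for this stratum
    return [(i, w * ratio) for i, w in kept]


rng = random.Random(0)

# Toy strata of (row id, weight).
window = [(i, 1.0 + (i % 7) * 0.1) for i in range(1_000)]
tail = [(i, 2.0 + (i % 3) * 0.5) for i in range(1_000, 1_400)]

thin_window = thin_stratum(window, rate=0.30, floor=50, rng=rng)
# 5% of 400 rows is 20, so the per-stratum floor of 50 applies here.
thin_tail = thin_stratum(tail, rate=0.05, floor=50, rng=rng)
```

Starting the optimizer from these ratio-adjusted weights means the thinned file already reproduces the stratum totals before any reweighting step runs.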

…sons

Responds to review: the turnover-distribution levels reflect the ONS VAT/PAYE-registered frame; unregistered sole traders - most of the below-threshold mass and the cross-threshold cliff in all-business administrative charts (e.g. Tax Policy Associates 2018-19) - are outside it, so level comparisons to such charts are not like-for-like. Above the threshold the universes coincide and levels match (ours 10.7-11.5k/1k vs ~10.4k all-business).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

vahid-ahmadi · 2026-07-02T08:31:42Z

cc @MaxGhenis — flagging some data-generation design questions on this branch before we lock it in.

Since this PR rewrites the generation pipeline substantially, I think we should capture sector heterogeneity of VAT liability properly at the same time. Right now it isn't, and the new setup makes it structurally harder:

1. Sector heterogeneity is a three-level caricature

The input/output ratio is a single global Beta(4,2) mapped to [0.2, 0.8], plus two hand-coded SIC sets (17 "negative-liability" sectors shifted up by U[0, 0.3], 5 "high-liability" sectors shifted down by U[0, 0.2]) — calibrated to nothing (generate.py:82–84, 164–170). The branch's own calibration report shows the consequence: VAT liability by sector = 45% accuracy, demoted to an "informational diagnostic", while every calibrated dimension sits at 78–94%.

2. The new clamp makes negative sectors unreachable by construction

We have the right data in-repo (data/processed/2023-24/hmrc_vat_liability_by_sector.csv, ~80 sectors) and it includes genuinely negative sectors — crop production −£2,330m, fishing −£100m, oil/gas −£460m (zero-rated outputs → repayment traders). The new ρ ∈ [0.1, 0.95] clamp forces every firm's liability strictly positive, so those sector targets can never be matched — the 45% is a ceiling, not a tuning problem. The old model was wrong at the firm level (no ×0.20) but could at least represent negative sector aggregates; this branch fixes the rate but silently deletes the repayment-trader mechanism.
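To make the ceiling argument explicit, here is a short derivation under the assumption that ρ denotes the input-to-turnover ratio x/y (consistent with the corrected formula, although the symbol's definition is not restated in this thread):

```latex
v \;=\; 0.20\,(y - x) \;=\; 0.20\,y\,(1 - \rho),
\qquad \rho = \frac{x}{y} \in [0.1,\ 0.95]
\;\Longrightarrow\;
0.01\,y \;\le\; v \;\le\; 0.18\,y .
```

Every firm with positive turnover therefore has a strictly positive liability, so a non-negatively weighted sum over any sector's firms is also positive and cannot reach a negative sector target, whatever weights the optimizer chooses.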

This matters for the paper directly: sector composition near £85–90k sets the net-rate distribution τ₀ there, which drives the anchor cost, every behavioural wedge, and the firm-level dominated-region analysis.

3. The HMRC overshoot is mostly convention, not shape

The anchor is now −358/−365 vs HMRC's −150/−185, but this branch's own artifact shows that with 43% voluntary retention (Liu et al.) of released-firm liability the series becomes −204 / −208 / −125 / −43 / +57 vs HMRC's −150 / −185 / −125 / −50 / +65: within ~£25m in every year except the first, where the gap is £54m. So the headline gap is dominated by the assume-everyone-deregisters convention, not by the Beta parameters. The residual is where sector/VA-share composition near the threshold matters.
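The arithmetic behind the two conventions can be checked with a few lines. This sketch assumes the retention convention simply scales the full-deregistration cost by the non-retained share, which reproduces the quoted figures to rounding; it is not taken from the repository, and the variable names are hypothetical.

```python
RETENTION = 0.43  # share of released-firm liability assumed retained

# Figures in £m as quoted in this comment. Only the first two
# full-deregistration years are quoted here.
full_dereg = [-358, -365]
retention_quoted = [-204, -208, -125, -43, 57]
hmrc = [-150, -185, -125, -50, 65]

# Retention convention: only the non-retained share of the cost is lost.
retention_from_full = [(1 - RETENTION) * x for x in full_dereg]

# Year-by-year absolute gap between the quoted retention series and HMRC.
gaps = [abs(r - h) for r, h in zip(retention_quoted, hmrc)]
```

Printing `gaps` gives the per-year distance from the HMRC costing, which is the quantity the consistency check rests on.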

4. The near-threshold density shape is an optimizer artifact

On the corrected model the 2023-24 file has a density step up at £85k (below/above ≈ 0.74 — the opposite of admin data), and the 2024-25 vintage piles ~54k firms/£1k just below £90k vs ~6.9k above (7.8×). Root cause: turnover is drawn ~uniform within coarse ONS bands and the Adam reweighting has no target constraining within-band density shape, so it manufactures spikes to reconcile count and liability totals. No Beta(a,b) choice fixes this — it lives in the weights.
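One way to quantify the within-band roughness described above is a second-difference measure over the binned weighted density. The sketch below is illustrative only: the function is not part of the repository, and the bin values are invented (the 54.0 and 6.9 merely echo, in thousands, the per-£1k magnitudes quoted above).

```python
def roughness(density):
    """Sum of squared second differences of a binned density."""
    return sum(
        (density[i - 1] - 2 * density[i] + density[i + 1]) ** 2
        for i in range(1, len(density) - 1)
    )


# Thousands of firms per £1k bin, invented for illustration.
smooth = [10.0, 9.8, 9.6, 9.4, 9.2, 9.0]   # linear decline: no curvature
spiky = [10.0, 9.8, 54.0, 6.9, 9.2, 9.0]   # a spike just below a band edge

smooth_score = roughness(smooth)
spiky_score = roughness(spiky)
```

A term like this added to a calibration loss would penalise spikes that count and liability totals alone leave unconstrained, at the cost of also damping any genuine step unless the threshold bin is exempted.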

Proposed fixes (in order of value)

Per-sector Beta parameters, calibrated: set each sector's mean input ratio μ_s from HMRC sector liability ÷ sector turnover (or ONS Supply–Use intermediate-consumption shares), then a_s = μ_s·κ, b_s = (1−μ_s)·κ with a single concentration κ ≈ 6–10. Replaces both hand-coded sector sets. (a, b are exactly the right knobs — but per sector, matched to data, not global.)
Reintroduce repayment traders via the correct mechanism: a sector-level effective output rate r_s < 0.20 (zero-rating share: food, agriculture, exports), so liability = r_s·y − 0.20·x goes negative where it genuinely is — then liability-by-sector becomes a real calibration target instead of a diagnostic. (Not ρ > 1, which was the old bug's enabler.) If we'd rather keep the standard-rate approximation, we should drop negative sectors from the target explicitly and document it.
Constrain near-threshold density shape: a smoothness penalty (e.g. second differences of weighted density over ~£60k–£120k) or fine near-threshold count targets in the calibration loss, so the optimizer can't fabricate steps/spikes. Without this, any bunching/placebo statement on the synthetic file is at the optimizer's mercy.
Make the 43%-retention anchor the headline convention (or co-headline) — it closes most of the HMRC gap for a documented, literature-anchored reason. Caveat: once retention is tuned, the anchor is a consistency check, not validation.
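The first two proposals can be sketched together: a mean-and-concentration parameterisation of the per-sector Beta, and a liability formula with a sector-level effective output rate. The sector names, their parameter values and the £80k turnover are hypothetical; only the formulas a_s = μ_s·κ, b_s = (1−μ_s)·κ and liability = r_s·y − 0.20·x come from the list above.

```python
import random

STANDARD_RATE = 0.20  # rate applied to inputs in the proposed formula


def beta_params(mu: float, kappa: float) -> tuple[float, float]:
    """a = mu * kappa, b = (1 - mu) * kappa, so the Beta mean equals mu."""
    return mu * kappa, (1.0 - mu) * kappa


def net_liability(turnover: float, inputs: float, output_rate: float) -> float:
    """liability = r_s * y - 0.20 * x, with r_s the effective output rate."""
    return output_rate * turnover - STANDARD_RATE * inputs


kappa = 8.0  # single concentration, inside the suggested 6-10 range
y = 80_000.0

# Hypothetical sectors: (mean input ratio mu_s, effective output rate r_s).
sectors = {"standard_rated": (0.60, 0.20), "mostly_zero_rated": (0.60, 0.05)}

# Liability of a firm sitting exactly at its sector's mean input ratio.
at_mean = {name: net_liability(y, mu * y, r) for name, (mu, r) in sectors.items()}

# Firm-level input ratios drawn from the sector Beta; always inside (0, 1).
rng = random.Random(0)
a, b = beta_params(0.60, kappa)
draws = [rng.betavariate(a, b) for _ in range(1_000)]
```

With the same input ratio, the zero-rated-heavy sector goes negative while the standard-rated one stays positive, without ever needing ρ above 1.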

One dependency to sequence deliberately: with per-sector rates and negative liabilities restored, the behavioural layer's creditor handling becomes first-order again (creditors rationally never deregister), so the solver convention needs to be fixed at the same time.

Both sides of the threshold now enter as shape-only targets scaled to the synthetic frame's own mass per side (below over £65-85k, above over £85-90k). The previous build imported OBR chart levels as direct counts above the threshold while frame-scaling the shape below, which inverted the cross-threshold ordering (more mass just above than just below the notch - economically backwards). The cross-threshold step now comes from the frame's own band structure; the OBR data supply only the within-side geometry.

Regenerated the 2023-24 vintage and every downstream artifact, and propagated through the paper:
- Menu (common £85k base): raise-to-100k -753m, taper -520m, 10% band -484m, 15% band -242m
- Anchor (85k->90k): retention convention now matches HMRC's published 2025-26 costing to £1m (-184.1m vs -185m); full-dereg -323m
- Near-threshold density 13,839 -> 10,319 per £1k across the notch (correct ordering, -25% step)
- Bunching: E=7,933, b_LLAT=1.91 (target-inherited as before); placebos still 0; recovery 15/49/62%
- Dominated-region mass 156,040; reduced-rate totals 152,675 (15%) / 154,840 (10%)
- Seed sensitivity re-run under the new scaling via new reproducible scripts/seed_sensitivity.py (E +/-161, costs +/-£2m)
- data.tex/conclusion.tex now describe the side-consistent convention; turnover-distribution caption flags the £90k fine-window edge as a target boundary

Responds to review discussion of the cross-threshold ordering.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Propagates the side-consistent frame-scaling build (paper PR #21) on top of Vahid's presentation trims: menu -753/-520/-484/-242; anchor retention matches HMRC 2025-26 to £1m (-184.1m vs -185m), full-dereg -323m; E=7,933; dominated region ~156k firms; behavioural table -753 flat, offsets <6%; appendix build slide states the shape-only side-consistent convention. Data figures re-synced. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MaxGhenis and others added 4 commits July 1, 2026 22:37

MaxGhenis and others added 3 commits July 2, 2026 08:17

Fix lint (unused import)

048c09f

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MaxGhenis mentioned this pull request Jul 2, 2026

Poster: definitive-build numbers #24

Closed

MaxGhenis mentioned this pull request Jul 2, 2026

Extend the population frame to all businesses (BPE) so threshold cuts and the administrative cliff are in-frame #25

Open

MaxGhenis merged commit fab07c0 into main Jul 2, 2026
2 checks passed

MaxGhenis deleted the fix/vat-liability-scaling-v2 branch July 2, 2026 09:25

This was referenced Jul 2, 2026

Fix vat liability scaling #17

Closed

Restore VAT-liability-by-sector as a calibration target #2

Open

MaxGhenis merged 9 commits into main from fix/vat-liability-scaling-v2
