Skip to content

Correct net-VAT liability scaling; rewrite paper on the corrected model#21

Merged
MaxGhenis merged 9 commits into
mainfrom
fix/vat-liability-scaling-v2
Jul 2, 2026
Merged

Correct net-VAT liability scaling; rewrite paper on the corrected model#21
MaxGhenis merged 9 commits into
mainfrom
fix/vat-liability-scaling-v2

Conversation

@MaxGhenis

Copy link
Copy Markdown
Contributor

Summary

Completes the net-VAT liability fix from #15 (superseding draft PR #17), re-runs the entire pipeline on the corrected model, and rewrites the paper against the corrected numbers plus a seven-referee review (methodology, red-team, domain, citations, neutrality, style, reproducibility).

The defect and its consequences

vat_liability_k was turnover − input (no ×0.20), inflating per-firm liabilities ~5×, with ρ ∈ [0.6, 1.5] leaving ~47% of firms with negative value added. The weight optimiser compensated on band totals (aggregate scores barely moved) while distorting the near-threshold density — manufacturing the paper's "reproduction" of the administrative bunching step.

Model fixes

  • v = 0.20 × (turnover − input); ρ ∈ [0.1, 0.95], mean VA share 40%, zero negative-VA firms (from Fix vat liability scaling #17, applied to current main)
  • Behavioural solver rewritten: the damped fixed-point iteration has no fixed point for firms straddling a reform notch (iterates oscillate; results were iteration-parity artifacts). Replaced with a closed-form region-confined solve using the formulation-A response ratio y* = y_obs[(1−τf₁)/(1−τf₀)]^e (deductible share cancels). e→0 nesting now exact; baseline reproduction asserted; raise-to-£100k behavioural ≡ static at every e — a derived invariance, asserted in the crosscheck
  • Bunching estimator: removed non-identified elasticity outputs (σ/Π/eps, incl. an ad hoc τ/2 normalisation from a deleted model); fixed bootstrap replicate weights (were scaled n/W)
  • Recovery test redesigned: the old injection was invisible to the counterfactual fit by construction (donor+deposit both inside the exclusion window); new KW-consistent injection extends beyond the window and is scored by the headline estimator
  • Secondary-notch mass block in dominated_region_mass; new results/static_sweep.txt artifact (the sweep table was previously figure-pixels only); voluntary-retention sensitivity on the anchor

Headline results on the corrected model

Object Old Corrected
Common-base menu (2023-24, base) £183.6bn £184.7bn
Raise to £100k −£508m −£698m (behavioural = static at every e)
Graduated taper −£336m −£466m
Reduced rate 10% / 15% −£343m / −£171m −£460m / −£230m
Anchor 85→90k (2025-26) vs HMRC −£185m −£175m ("within £10m") −£248m full-dereg / −£141m retention — brackets HMRC in all 5 years
Bunching at 85k E=8,712, b=0.060 ("reproduces admin step") E=0, b=−0.139
Bunching at 90k (2024-25 vintage) E≈196k spurious, spec-robust — band-edge artifact
Behavioural offsets 40–75% of static 0% (raise), <4% (bands)

Paper rewrite

Every section updated. Section 5 reframed around the correction: band-calibrated synthetic data cannot support bunching inference in either direction — the corrected data show nothing at £85k while the same generator shows a huge spurious signal at the £90k band edge, robust in all 16 sensitivity cells. The earlier draft's artifact is documented openly as the sharpest demonstration of the thesis. Value-added "robustness" paragraph (which implied dominated widths above the £21,250 statutory cap) replaced by the formulation-A invariance result. Anchor mechanism corrected (uprated counterfactual threshold path, not "fiscal drag"). Citations: LLAT sample setting corrected (2004–14, £58k–£81k thresholds — not "the same £85,000 notch"); Chetty et al. credited for the polynomial counterfactual; HMT 2018 call for evidence + Council Directive (EU) 2020/285 added; URLs/DOIs throughout. Full referee reports available on request.

Upstream

Closes #15. Supersedes #17.

🤖 Generated with Claude Code

MaxGhenis and others added 4 commits July 1, 2026 22:37
v_i = 0.20 x (turnover - inputs) in generator, calibration, and validator;
rho recentred to mean ~0.6 and clamped to [0.1, 0.95] so value added is
strictly positive; secondary-notch mass block in dominated_region_mass.

Cherry-picked from origin/fix-vat-liability-scaling (issue #15) without
the tex changes, which are superseded by the rewrite in this branch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Replace the damped fixed-point forward solve (no fixed point at reform
  notches; iterates oscillate) with a closed-form region-confined solve:
  each firm re-optimises within the schedule region containing its observed
  turnover, using the formulation-A response ratio
  y* = y_obs [(1-tau f1)/(1-tau f0)]^e (deductible share cancels).
  e->0 nesting is now exact; a pure threshold raise provably has zero
  intensive-margin offset (behavioural == static at every e) — asserted in
  the crosscheck; baseline reproduction asserted in reform_revenue.
- Remove the non-identified elasticity outputs (sigma/Pi/eps) and the
  tau/2 wedge normalisation inherited from a deleted sigmoid model from the
  bunching estimator; keep the geometry (b, b_llat, E, Delta_R, y_R).
- Fix bootstrap replicate weights to sum to the population mass W rather
  than the row count n (E replicates were scaled by n/W).
- Redesign the recovery test: KW-consistent injection with missing mass
  extending beyond the exclusion window, scored by the headline estimator
  (the previous injection was invisible to the counterfactual fit by
  construction).
- Relabel e sweep values as assumptions (0.05 external KW anchor); remove
  the false 'calibrated to reduced-form estimates' provenance.
- report.py: fall back to the generic output CSV name; taper guard in the
  dynamic CLI.

Addresses issues #15 and the methodology findings in the review.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both vintages rebuilt with v = 0.20(turnover - inputs); all artifacts
regenerated: calibration report (89.4%/89.4%), static sweep + anchor
(new static_sweep.txt artifact with voluntary-retention sensitivity),
reform menu (raise -698m, taper -466m, 10% band -460m, 15% band -230m
on the common 184.7bn base), behavioural table (raise == static at
every e; band offsets second-order), bunching (E=0 at 85k; spurious
E=196k at the 90k band edge on the 2024-25 vintage), placebo (E=0
under all treatments), redesigned recovery test, dominated-region
masses, iso-optimum verification.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Abstract/intro/data/static/bunching/model/behavioural/conclusion and both
appendices rewritten against the corrected pipeline and the review findings:

- Section 5 reframed: the corrected data contain no bunching at 85k
  (E=0, b=-0.139) and a spurious spec-robust signal at the 90k band edge
  (E~196k); the earlier draft's 'reproduced step' is documented as a
  liability-scaling artifact and its correction reported openly.
- Anchor comparison rewritten: full-deregistration (-248m, 2025-26) and
  43% voluntary-retention (-141m) conventions bracket HMRC's -185m in
  every forecast year; 'fiscal drag lifts the frozen baseline' replaced
  with the uprated counterfactual threshold path the code implements.
- Behavioural section rewritten around the region-confined solver:
  raise-to-100k behavioural == static at every e (derived and asserted);
  reduced-rate offsets under 4%; taper exclusion restated structurally;
  the orphaned 'pathological FOC solve' claim removed; e sweep relabelled
  as assumptions (KW 0.05 anchor; LLAT UK 0.09-0.14 noted).
- Dominated region: formulation-A invariance replaces the erroneous
  per-firm tau0/(1-tau0) robustness paragraph (which implied widths above
  the statutory cap); masses updated to the corrected population.
- Static: new sweep (2024-25 vintage, explicitly introduced), menu on the
  184.7bn base, direct-vs-smooth method reconciliation, +5.8% base
  overshoot disclosed against HMRC liability-target sums.
- Neutrality/style: metric-scoped verdicts only; hedges consolidated;
  self-praise removed; findings renumbered to four consistently.
- Citations: LLAT setting corrected (2004-14 sample, 58k-81k thresholds);
  Chetty et al. credited for the polynomial counterfactual; URLs/DOIs
  added throughout; benedek2015 to techreport; belloncopestake year fixed;
  HMT 2018 call for evidence and Council Directive (EU) 2020/285 added;
  ONS panel relabelled to the VAT/PAYE-registered frame with a BPE note;
  150k step FRS attribution removed (band edge; step absent in corrected
  data).
- Title: 'An Open Firm-Level Microsimulation of the UK VAT Registration
  Threshold'. README tables updated; upstream Populace inheritance of the
  mis-scaling flagged.

Closes #15. Supersedes #17.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
MaxGhenis and others added 3 commits July 2, 2026 08:17
Adds the OBR Mar-2023 EFO Chart C data (already in-repo) as calibration
targets for the 2023-24 vintage: five bins at/above the threshold as
direct counts (same universe as the coarse HMRC band they refine), twenty
below as shape targets normalised over the £65k-£85k window (the chart
universe is narrower than the ONS frame there). The 2023-24 profile is
interpolated between the 2019-20 outturn and the 2025-26 frozen-threshold
projection. The near-threshold density now rises into the threshold and
steps down across it - the administratively observed bunching profile,
held as an explicit cited target. Placebo B regenerates without the fine
targets and returns E=0. Responds to Nikhil's review comment that the
previous optimizer-equilibrium shape looked economically backwards.

Numbers on the OBR-target build: menu -783/-550/-497/-248 on £184.8bn;
anchor conventions overshoot HMRC early (-366/-209 vs -185) and match by
2026-27 (-125.2 vs -125), sign flip reproduced; dominated-band mass
~160k; bunching at 85k reads the target-inherited E=14,663 (placebo->0);
90k spurious signal unchanged; recovery 55/75/82%; calibration holds at
89.4%. All propagated through paper, artifacts, and figures; anchor
figure y-limits now scale with data.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s; add --fast stratified sampling

Root-cause fix for the seam Nikhil's review surfaced: the HMRC
GBP1-to-Threshold liability total is remitted by voluntary registrants
(~GBP2,150 average net, input-reclaim traders) the liability model does
not represent, but was calibrated against the whole below-threshold
population at the model's ~8% net rate (1.76x over-subscribed), draining
weights in the GBP50k-85k region. Now an informational diagnostic, like
liability by sector. One exclusion fixes three artifacts: the weight
cliff at the OBR window edge (shoulder now continuous at ~1.05-1.15
average weight), the 2024-25 vintage's weak liability calibration
(79% -> 92%; overall 89.4% -> 92.8%), and the spec-robust E~196k monster
at the 90k band edge (now E=11,072 with b_LLAT=1.23 - within 10% of the
administrative 1.361, the paper's cautionary exhibit in its sharpest
form).

Bootstrap SEs removed throughout: the synthetic file is a deterministic
construction, so row-resampling has no estimand and its dispersion
scales with the analyst-chosen row count. Replaced by specification
grids plus generator-seed sensitivity (results/seed_sensitivity.txt:
E +/-2%, reform costs +/-GBP1m across three full-size seeds).

--fast mode: stratified thinning (30% in the GBP15k-150k window, 5%
tails, per-stratum floors and exact ratio base weights so all targets
stay true totals; optimizer initialised at and penalised toward base).
15 seconds per vintage vs ~13 minutes; aggregates within 0.3%, local
bunching stats within ~5%.

Definitive numbers: menu -784/-550/-497/-249 on GBP184.8bn; anchor
-358/-366/-219/-75/+101 full-dereg, -204/-208/-125/-43/+57 retention
(2026-27 matches -125.1 vs -125); sweep base GBP200.9bn; E=7,873
(b_LLAT 2.55) target-inherited at 85k, placebos -> 0; recovery
15/50/62%. All propagated through paper, README, artifacts, figures.

Closes #23.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…sons

Responds to review: the turnover-distribution levels reflect the ONS
VAT/PAYE-registered frame; unregistered sole traders - most of the
below-threshold mass and the cross-threshold cliff in all-business
administrative charts (e.g. Tax Policy Associates 2018-19) - are outside
it, so level comparisons to such charts are not like-for-like. Above the
threshold the universes coincide and levels match (ours 10.7-11.5k/1k vs
~10.4k all-business).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@vahid-ahmadi

Copy link
Copy Markdown
Contributor

cc @MaxGhenis — flagging some data-generation design questions on this branch before we lock it in.

Since this PR rewrites the generation pipeline substantially, I think we should capture sector heterogeneity of VAT liability properly at the same time. Right now it isn't, and the new setup makes it structurally harder:

1. Sector heterogeneity is a three-level caricature

The input/output ratio is a single global Beta(4,2) mapped to [0.2, 0.8], plus two hand-coded SIC sets (17 "negative-liability" sectors shifted up by U[0, 0.3], 5 "high-liability" sectors shifted down by U[0, 0.2]) — calibrated to nothing (generate.py:82–84, 164–170). The branch's own calibration report shows the consequence: VAT liability by sector = 45% accuracy, demoted to an "informational diagnostic", while every calibrated dimension sits at 78–94%.

2. The new clamp makes negative sectors unreachable by construction

We have the right data in-repo (data/processed/2023-24/hmrc_vat_liability_by_sector.csv, ~80 sectors) and it includes genuinely negative sectors — crop production −£2,330m, fishing −£100m, oil/gas −£460m (zero-rated outputs → repayment traders). The new ρ ∈ [0.1, 0.95] clamp forces every firm's liability strictly positive, so those sector targets can never be matched — the 45% is a ceiling, not a tuning problem. The old model was wrong at the firm level (no ×0.20) but could at least represent negative sector aggregates; this branch fixes the rate but silently deletes the repayment-trader mechanism.

This matters for the paper directly: sector composition near £85–90k sets the net-rate distribution τ₀ there, which drives the anchor cost, every behavioural wedge, and the firm-level dominated-region analysis.

3. The HMRC overshoot is mostly convention, not shape

Anchor now −358/−365 vs HMRC −150/−185 — but this branch's own artifact shows that with 43% voluntary retention (Liu et al.) of released-firm liability the series becomes −204 / −208 / −125 / −43 / +57 vs HMRC's −150 / −185 / −125 / −50 / +65: within ~£25m every year. So the headline gap is dominated by the assume-everyone-deregisters convention, not by the Beta parameters. The residual is where sector/VA-share composition near the threshold matters.

4. The near-threshold density shape is an optimizer artifact

On the corrected model the 2023-24 file has a density step up at £85k (below/above ≈ 0.74 — the opposite of admin data), and the 2024-25 vintage piles ~54k firms/£1k just below £90k vs ~6.9k above (7.8×). Root cause: turnover is drawn ~uniform within coarse ONS bands and the Adam reweighting has no target constraining within-band density shape, so it manufactures spikes to reconcile count and liability totals. No Beta(a,b) choice fixes this — it lives in the weights.

Proposed fixes (in order of value)

  1. Per-sector Beta parameters, calibrated: set each sector's mean input ratio μ_s from HMRC sector liability ÷ sector turnover (or ONS Supply–Use intermediate-consumption shares), then a_s = μ_s·κ, b_s = (1−μ_s)·κ with a single concentration κ ≈ 6–10. Replaces both hand-coded sector sets. (a, b are exactly the right knobs — but per sector, matched to data, not global.)
  2. Reintroduce repayment traders via the correct mechanism: a sector-level effective output rate r_s < 0.20 (zero-rating share: food, agriculture, exports), so liability = r_s·y − 0.20·x goes negative where it genuinely is — then liability-by-sector becomes a real calibration target instead of a diagnostic. (Not ρ > 1, which was the old bug's enabler.) If we'd rather keep the standard-rate approximation, we should drop negative sectors from the target explicitly and document it.
  3. Constrain near-threshold density shape: a smoothness penalty (e.g. second differences of weighted density over ~£60k–£120k) or fine near-threshold count targets in the calibration loss, so the optimizer can't fabricate steps/spikes. Without this, any bunching/placebo statement on the synthetic file is at the optimizer's mercy.
  4. Make the 43%-retention anchor the headline convention (or co-headline) — it closes most of the HMRC gap for a documented, literature-anchored reason. Caveat: once retention is tuned, the anchor is a consistency check, not validation.

One dependency to sequence deliberately: with per-sector rates and negative liabilities restored, the behavioural layer's creditor handling becomes first-order again (creditors rationally never deregister), so the solver convention needs to be fixed at the same time.

Both sides of the threshold now enter as shape-only targets scaled to
the synthetic frame's own mass per side (below over £65-85k, above over
£85-90k). The previous build imported OBR chart levels as direct counts
above the threshold while frame-scaling the shape below, which inverted
the cross-threshold ordering (more mass just above than just below the
notch - economically backwards). The cross-threshold step now comes
from the frame's own band structure; the OBR data supply only the
within-side geometry.

Regenerated the 2023-24 vintage and every downstream artifact, and
propagated through the paper:

- Menu (common £85k base): raise-to-100k -753m, taper -520m, 10% band
  -484m, 15% band -242m
- Anchor (85k->90k): retention convention now matches HMRC's published
  2025-26 costing to £1m (-184.1m vs -185m); full-dereg -323m
- Near-threshold density 13,839 -> 10,319 per £1k across the notch
  (correct ordering, -25% step)
- Bunching: E=7,933, b_LLAT=1.91 (target-inherited as before); placebos
  still 0; recovery 15/49/62%
- Dominated-region mass 156,040; reduced-rate totals 152,675 (15%) /
  154,840 (10%)
- Seed sensitivity re-run under the new scaling via new reproducible
  scripts/seed_sensitivity.py (E +/-161, costs +/-£2m)
- data.tex/conclusion.tex now describe the side-consistent convention;
  turnover-distribution caption flags the £90k fine-window edge as a
  target boundary

Responds to review discussion of the cross-threshold ordering.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
MaxGhenis added a commit that referenced this pull request Jul 2, 2026
Propagates the side-consistent frame-scaling build (paper PR #21) on top
of Vahid's presentation trims: menu -753/-520/-484/-242; anchor
retention matches HMRC 2025-26 to £1m (-184.1m vs -185m), full-dereg
-323m; E=7,933; dominated region ~156k firms; behavioural table -753
flat, offsets <6%; appendix build slide states the shape-only
side-consistent convention. Data figures re-synced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@MaxGhenis MaxGhenis merged commit fab07c0 into main Jul 2, 2026
2 checks passed
@MaxGhenis MaxGhenis deleted the fix/vat-liability-scaling-v2 branch July 2, 2026 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix per-firm VAT liability scaling (value-added → net VAT) + repair bunching reproduction

2 participants