Han Feedback: plan-implementation — 2026-06-30

# Han Feedback — 2026-06-30

**Skills used:** `han:plan-implementation`
**Context:** Planning the implementation of a dashboard feature with two additions — a revenue-mix KPI/column and a "gone-dark" client detector (clients with prior-year revenue and none this year) flagged in a list that is built from a filtered population.
**Outcome:** Full implementation plan produced in 1 round (medium team, 4 specialists), 9 decisions, shipped and verified — but it carried a latent design flaw that the team had already surfaced and the aggregation silently dropped.

---

## What worked well

- **The junior-developer found the load-bearing flaw on the first pass.** Its R1 output named, verbatim, that the list is built only from records flagged "active," and that a client which has gone dark "is already invisible to the page and can't be flagged by this metric." That is the exact failure that later showed up in manual testing — the detector was structurally pinned at zero because every member of its target population is excluded by the upstream filter before the check runs. The reframing agent earned its seat.
- **The data-engineer's grain analysis was correct and durable.** It pinned an additive-streams assumption introduced by a recent schema change and recommended a regression test to lock it. That test is exactly what protects the aggregation from a silent future break. Strong, evidence-cited, no over-reach.
- **The convergent finding across two specialists was real signal.** Data-engineer and test-engineer independently flagged that a portfolio-level percentage cannot be back-computed from per-row percentages (revenue weighting is lost). Catching the same defect from two domains raised confidence correctly and the plan adopted the raw-sum approach.
- **Discovery notes as shared context held up.** Specialists read the prepared notes first and built on them; the corrected provenance (a prior PR deferred rather than removed the work; a newer PR moved the schema) was captured once and reused, not rediscovered.
- **YAGNI discipline was right-sized.** Drill-down drawers, filter chips, an index, and a percentage clamp were all correctly deferred with reopening triggers rather than built speculatively.

---

## What didn't work

- **The single most important failure: the deterministic aggregation (Step 5) demoted the junior-developer's finding to a "hidden assumption" bullet and never converted it to an Open Question or a user-facing decision.** The finding met the bar for escalation — it invalidates the feature's core value, not a code detail — yet because it was phrased as an observation rather than "the spec is silent on X," the text-rule classifier filed it under spec-maturity noise and the gate math (which needs ≥5 spec-level findings across ≥3 specialists) never tripped on a single high-severity one. A lone finding that nullifies the feature should be able to trip the gate on its own. Severity, not just count, needs to feed the gate.
- **No specialist was positioned to catch the interaction.** The flaw lives at the seam between "how the list population is built" (an existing filter) and "what the new detector assumes about that population." The junior-developer reached across that seam; the domain specialists each stayed in their lane and the orchestration didn't route the cross-cutting finding to anyone to adjudicate. When the junior-developer is the only one who crosses a seam, that finding deserves *more* weight, not less.
- **The plan's own RAID log recorded the assumption as "Open — documented" and shipped it.** Documenting a value-nullifying assumption in a RAID table is not the same as surfacing it for a decision. The plan read as "ship as planned, one non-blocking open item," when in fact the open item was the difference between the feature working and the feature being inert. The operator caught it in manual testing — the plan should have caught it at synthesis.
- **The aggregation summary made the omission invisible downstream.** Because the finding was logged as a hidden-assumption bullet rather than an OQ with a recommended answer, the synthesis agent never had it in front of it as a decision needing a call, and the operator never saw it in the plan summary. The lossy step was the classification, and nothing after it could recover what was dropped.

---

## Overall

The team did its job — the right finding was on the table after round one, in plain language, from the agent specifically meant to stress-test assumptions. The failure was entirely in the deterministic aggregation layer between the specialists and the plan: a count-based, keyword-classified gate that treats a feature-nullifying observation the same as a minor stylistic note because it didn't contain the phrase "the spec is silent." The fix is not more specialists or more rounds; it is a severity dimension on the spec-maturity gate so that a single finding which invalidates the feature's premise forces an Open Question with a recommended answer, regardless of how many specialists raised it or how it was phrased. Everything else about the run was good — fast convergence, correct YAGNI calls, durable regression tests. This one classification gap turned a good plan into one that shipped a latent dud, and it was recoverable only because a human exercised the feature by hand.

---

## Rating (informal)

| Dimension | Score |
|---|---|
| Specialist finding quality (the right flaw was found) | 5/5 |
| Aggregation fidelity (did findings survive to the plan?) | 2/5 |
| Spec-maturity gate calibration (severity vs. count) | 2/5 |
| Cross-seam finding routing | 2/5 |
| YAGNI discipline | 5/5 |
| Regression-test instinct (grain lock) | 5/5 |
| Round efficiency | 4/5 |


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Han Feedback: plan-implementation — 2026-06-30 #101

Han Feedback — 2026-06-30

What worked well

What didn't work

Overall

Rating (informal)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Dimension	Score
Specialist finding quality (the right flaw was found)	5/5
Aggregation fidelity (did findings survive to the plan?)	2/5
Spec-maturity gate calibration (severity vs. count)	2/5
Cross-seam finding routing	2/5
YAGNI discipline	5/5
Regression-test instinct (grain lock)	5/5
Round efficiency	4/5

Uh oh!

Han Feedback: plan-implementation — 2026-06-30 #101

Description

Han Feedback — 2026-06-30

What worked well

What didn't work

Overall

Rating (informal)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions