Skip to content

Han Feedback: plan-implementation — 2026-06-30 #101

Description

@mjansen401

Han Feedback — 2026-06-30

Skills used: han:plan-implementation
Context: Planning the implementation of a dashboard feature with two additions — a revenue-mix KPI/column and a "gone-dark" client detector (clients with prior-year revenue and none this year) flagged in a list that is built from a filtered population.
Outcome: Full implementation plan produced in 1 round (medium team, 4 specialists), 9 decisions, shipped and verified — but it carried a latent design flaw that the team had already surfaced and the aggregation silently dropped.


What worked well

  • The junior-developer found the load-bearing flaw on the first pass. Its R1 output named, verbatim, that the list is built only from records flagged "active," and that a client which has gone dark "is already invisible to the page and can't be flagged by this metric." That is the exact failure that later showed up in manual testing — the detector was structurally pinned at zero because every member of its target population is excluded by the upstream filter before the check runs. The reframing agent earned its seat.
  • The data-engineer's grain analysis was correct and durable. It pinned an additive-streams assumption introduced by a recent schema change and recommended a regression test to lock it. That test is exactly what protects the aggregation from a silent future break. Strong, evidence-cited, no over-reach.
  • The convergent finding across two specialists was real signal. Data-engineer and test-engineer independently flagged that a portfolio-level percentage cannot be back-computed from per-row percentages (revenue weighting is lost). Catching the same defect from two domains raised confidence correctly and the plan adopted the raw-sum approach.
  • Discovery notes as shared context held up. Specialists read the prepared notes first and built on them; the corrected provenance (a prior PR deferred rather than removed the work; a newer PR moved the schema) was captured once and reused, not rediscovered.
  • YAGNI discipline was right-sized. Drill-down drawers, filter chips, an index, and a percentage clamp were all correctly deferred with reopening triggers rather than built speculatively.

What didn't work

  • The single most important failure: the deterministic aggregation (Step 5) demoted the junior-developer's finding to a "hidden assumption" bullet and never converted it to an Open Question or a user-facing decision. The finding met the bar for escalation — it invalidates the feature's core value, not a code detail — yet because it was phrased as an observation rather than "the spec is silent on X," the text-rule classifier filed it under spec-maturity noise and the gate math (which needs ≥5 spec-level findings across ≥3 specialists) never tripped on a single high-severity one. A lone finding that nullifies the feature should be able to trip the gate on its own. Severity, not just count, needs to feed the gate.
  • No specialist was positioned to catch the interaction. The flaw lives at the seam between "how the list population is built" (an existing filter) and "what the new detector assumes about that population." The junior-developer reached across that seam; the domain specialists each stayed in their lane and the orchestration didn't route the cross-cutting finding to anyone to adjudicate. When the junior-developer is the only one who crosses a seam, that finding deserves more weight, not less.
  • The plan's own RAID log recorded the assumption as "Open — documented" and shipped it. Documenting a value-nullifying assumption in a RAID table is not the same as surfacing it for a decision. The plan read as "ship as planned, one non-blocking open item," when in fact the open item was the difference between the feature working and the feature being inert. The operator caught it in manual testing — the plan should have caught it at synthesis.
  • The aggregation summary made the omission invisible downstream. Because the finding was logged as a hidden-assumption bullet rather than an OQ with a recommended answer, the synthesis agent never had it in front of it as a decision needing a call, and the operator never saw it in the plan summary. The lossy step was the classification, and nothing after it could recover what was dropped.

Overall

The team did its job — the right finding was on the table after round one, in plain language, from the agent specifically meant to stress-test assumptions. The failure was entirely in the deterministic aggregation layer between the specialists and the plan: a count-based, keyword-classified gate that treats a feature-nullifying observation the same as a minor stylistic note because it didn't contain the phrase "the spec is silent." The fix is not more specialists or more rounds; it is a severity dimension on the spec-maturity gate so that a single finding which invalidates the feature's premise forces an Open Question with a recommended answer, regardless of how many specialists raised it or how it was phrased. Everything else about the run was good — fast convergence, correct YAGNI calls, durable regression tests. This one classification gap turned a good plan into one that shipped a latent dud, and it was recoverable only because a human exercised the feature by hand.


Rating (informal)

Dimension Score
Specialist finding quality (the right flaw was found) 5/5
Aggregation fidelity (did findings survive to the plan?) 2/5
Spec-maturity gate calibration (severity vs. count) 2/5
Cross-seam finding routing 2/5
YAGNI discipline 5/5
Regression-test instinct (grain lock) 5/5
Round efficiency 4/5

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions