test(e2e): harden approved-real manifest boundaries by DeliciousBuding · Pull Request #410 · TokenDanceLab/AgentHub

DeliciousBuding · 2026-06-28T19:09:51Z

Summary

Harden observed and approved-real data-mode manifest validation so replay/readiness rows cannot overclaim real execution.
Require approved-real rows with real_tested=true to include approval reference plus real-login and real CLI/model/API claims.
Keep packaged Desktop and release upload claims tied to packaged-release evidence, and align smoke-matrix contract checks with current stubbed-Hub replay naming.
Sync the active SPEC progress log for T3.3.

Task Issue

Closes #388

S.U.P.E.R Review

Ports over Implementation: evidence claims are machine-validated contract fields, not prose-only assertions.
Environment-Agnostic: observed, stubbed, approved-real, and packaged-release evidence levels stay separate.

Tests

corepack.cmd pnpm --dir app install --frozen-lockfile --prefer-offline
corepack.cmd pnpm --dir app/shared exec vitest run src/testing/e2eDataModeContract.test.ts src/testing/e2eEvidenceManifest.test.ts (19 passed)
pwsh ./scripts/verify/verify-real-e2e-contract.ps1
pwsh ./tests/contract/scripts/verify-approved-real-preflight.ps1 -RepoRoot .
pwsh ./tests/contract/scripts/verify-e2e-smoke-matrix.ps1 -RepoRoot .
pwsh ./scripts/verify/verify-doc-ssot.ps1
pwsh ./scripts/verify/verify-project-skills.ps1
git diff --check

Evidence Boundary

real_tested=false for this PR's verification: no real TokenDance ID login, no real CLI/model/API execution, no Hub production dispatch, no packaged Tauri/Desktop installer, no signing, no release upload.
The added approved-real real_tested=true examples are fixture/unit manifest contract cases only; they validate required fields and do not execute a real path.

Project Governance

Instruction surfaces: .agents/skills/real-e2e-acceptance/SKILL.md updated with the durable approved-real row boundary.
Progress: docs/progress/MASTER.md updated for T3.3 implementation state.

Summary by CodeRabbit

Bug Fixes
- Tightened validation for evidence and E2E contract data so invalid combinations are rejected earlier.
- Improved checks around approved-real and packaged-release entries to keep evidence details consistent.
- Updated smoke-matrix expectations to reflect the current replay/stubbed-Hub workflow.

coderabbitai · 2026-06-28T19:10:05Z

📝 Walkthrough

Walkthrough

Tightens E2E evidence boundary enforcement: validateEvidenceRow now requires approval_ref plus explicit real_login/real_cli_or_model claims on approved-real rows with real_tested=true, and gates release_upload on packaged-release evidence level. validateE2EDataModeScenario adds similar guards for observed-hub-replay and approved-real-preflight data sources. Tests, verification scripts, and SKILL.md documentation are updated accordingly, and smoke-matrix scripts align to web-stubbed-hub-playwright naming.

Changes

T3.3 Observed/Approved-Real Manifest Boundary Hardening

Layer / File(s)	Summary
Evidence manifest validation rules `app/shared/src/testing/e2eEvidenceManifest.ts`	`validateEvidenceRow` adds a branch requiring `approval_ref` + `real_login` + `real_cli_or_model` for `approved-real` rows with `real_tested=true`; `release_upload` check now requires `evidence_level === 'packaged-release'`.
Data-mode contract scenario validation `app/shared/src/testing/e2eDataModeContract.ts`	`validateE2EDataModeScenario` gains branches rejecting `observed-hub-replay` scenarios with real-tested flags or TokenDance secret usage, and rejecting `approved-real-preflight` scenarios with mock adapter usage or wrong `dataMode`.
Evidence manifest and data-mode contract tests `app/shared/src/testing/e2eEvidenceManifest.test.ts`, `app/shared/src/testing/e2eDataModeContract.test.ts`	New test cases cover observed-local non-real builds, approved-real acceptance/rejection on missing/present claims, packaged-release claim validity, observed-hub-replay manifest output, and approved-real-preflight manifest output with negative mutation checks.
Verification scripts and SKILL.md `scripts/verify/verify-real-e2e-contract.ps1`, `tests/contract/scripts/verify-e2e-smoke-matrix.ps1`, `.agents/skills/real-e2e-acceptance/SKILL.md`	`verify-real-e2e-contract.ps1` asserts smoke-matrix rows do not set `real_tested=true` and checks SKILL.md for required boundary strings; smoke-matrix script updates expected Web row to `web-stubbed-hub-playwright` and task-contract assertions to stubbed-Hub data source; SKILL.md documents the new `approved-real`/`real_tested=true` rule.
Progress log `docs/progress/MASTER.md`	Adds T3.3 session log entry for 2026-06-29.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

TokenDanceLab/AgentHub#395: Introduced the buildChatFlowEvidenceManifest/validateChatFlowEvidenceManifest contract that this PR directly tightens with new approved-real and release_upload validation rules.
TokenDanceLab/AgentHub#397: Also modifies app/shared/src/testing/e2eDataModeContract.ts and its tests, adding request-decision logic in the same scenario validation module extended here.
TokenDanceLab/AgentHub#406: Extends e2eDataModeContract.ts scenario checks and evidence-level expectations for specific dataSource modes, closely related to the new observed-hub-replay/approved-real-preflight branches added here.

Poem

🐇 Hop, hop! The boundary's tight,
Real claims need proof, or they're not right.
approval_ref must be in place,
Mock adapters cannot take real's space.
The manifest knows what's true or not—
No overclaiming the evidence lot! ✅

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly names the main change: hardening approved-real e2e manifest boundaries.
Linked Issues check	✅ Passed	The changes enforce approved-real and readiness/stubbed boundaries, keep packaged-release separate, and add the requested verifier and fixture coverage.
Out of Scope Changes check	✅ Passed	The diff stays focused on manifest boundary hardening, related tests, and the documented progress or instruction surfaces.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch task/388-manifest-boundary

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/shared/src/testing/e2eDataModeContract.test.ts`:
- Around line 296-304: The e2e data mode contract tests are asserting exact
validator error prose, which makes them brittle to wording-only changes. Update
the affected expectations in the e2eDataModeContract.test suite to verify the
semantic violation shape returned by the validator (for example, the specific
failure categories or identifiers in the result from the contract check) rather
than matching hard-coded full error strings. Keep the existing test cases for
desktop-observed-replay, but assert the meaningful contract fields from the
validator output instead of exact message text.

In `@app/shared/src/testing/e2eEvidenceManifest.test.ts`:
- Around line 158-165: The test for validateChatFlowEvidenceManifest is pinning
exact validator prose, making wording-only changes fail unnecessarily. Update
the assertions to check stable machine-readable error codes/reasons, or use
predicate-based matching for the expected invariant instead of comparing full
error strings. Keep the coverage around the approved-real-missing-claims cases,
but make the checks resilient to message wording changes in
e2eEvidenceManifest.test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 4027946d-7258-4bb1-bd5d-7d2a69f717bd

📥 Commits

Reviewing files that changed from the base of the PR and between 1b167aa and 9e04be8.

📒 Files selected for processing (8)

.agents/skills/real-e2e-acceptance/SKILL.md
app/shared/src/testing/e2eDataModeContract.test.ts
app/shared/src/testing/e2eDataModeContract.ts
app/shared/src/testing/e2eEvidenceManifest.test.ts
app/shared/src/testing/e2eEvidenceManifest.ts
docs/progress/MASTER.md
scripts/verify/verify-real-e2e-contract.ps1
tests/contract/scripts/verify-e2e-smoke-matrix.ps1

coderabbitai · 2026-06-28T19:15:12Z

+    }, [])).toEqual({
+      ok: false,
+      errors: [
+        'desktop-observed-replay uses observed-hub-replay but dataMode is approved-real',
+        'desktop-observed-replay uses observed-hub-replay but claims real login was tested',
+        'desktop-observed-replay uses observed-hub-replay but claims real CLI/model execution',
+        'desktop-observed-replay uses observed-hub-replay but marks TokenDance ID secret usage',
+      ],
+    });


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Keep the failure assertions semantic, not prose-coupled.

These cases lock the suite to exact validator wording instead of the contract violation being reported. That makes harmless message edits look like regressions. As per coding guidelines, "**/*.{ts,tsx,js,jsx}: Do not write tests that merely replicate implementation branches, assert constant strings, hard-code error text as behavior, or mock the function under test itself."

Also applies to: 339-344

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/shared/src/testing/e2eDataModeContract.test.ts` around lines 296 - 304, The e2e data mode contract tests are asserting exact validator error prose, which makes them brittle to wording-only changes. Update the affected expectations in the e2eDataModeContract.test suite to verify the semantic violation shape returned by the validator (for example, the specific failure categories or identifiers in the result from the contract check) rather than matching hard-coded full error strings. Keep the existing test cases for desktop-observed-replay, but assert the meaningful contract fields from the validator output instead of exact message text.

Source: Coding guidelines

coderabbitai · 2026-06-28T19:15:12Z

+    expect(validateChatFlowEvidenceManifest(manifest)).toEqual({
+      ok: false,
+      errors: [
+        'approved-real-missing-claims row approved-real-row sets real_tested=true without approval_ref',
+        'approved-real-missing-claims row approved-real-row sets real_tested=true without real_login claim',
+        'approved-real-missing-claims row approved-real-row sets real_tested=true without real_cli_or_model claim',
+      ],
+    });


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Avoid pinning validator prose in these tests.

These assertions hard-code the full validation messages, so wording-only changes will fail the suite even when the contract is unchanged. Prefer stable machine-readable reasons/codes, or at least predicate-based checks for the relevant invariant instead of exact prose. As per coding guidelines, "**/*.{ts,tsx,js,jsx}: Do not write tests that merely replicate implementation branches, assert constant strings, hard-code error text as behavior, or mock the function under test itself."

Also applies to: 221-223

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/shared/src/testing/e2eEvidenceManifest.test.ts` around lines 158 - 165, The test for validateChatFlowEvidenceManifest is pinning exact validator prose, making wording-only changes fail unnecessarily. Update the assertions to check stable machine-readable error codes/reasons, or use predicate-based matching for the expected invariant instead of comparing full error strings. Keep the coverage around the approved-real-missing-claims cases, but make the checks resilient to message wording changes in e2eEvidenceManifest.test.

Source: Coding guidelines

test(e2e): harden approved-real manifest boundaries

9e04be8

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

DeliciousBuding mentioned this pull request Jun 28, 2026

[T3.3] Observed and approved-real manifest boundary #388

Closed

6 tasks

DeliciousBuding merged commit 3c907a4 into dev/delicious233 Jun 28, 2026
21 checks passed

DeliciousBuding deleted the task/388-manifest-boundary branch June 28, 2026 19:16

DeliciousBuding mentioned this pull request Jun 28, 2026

docs(progress): sync phase 3 completion #411

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(e2e): harden approved-real manifest boundaries#410

test(e2e): harden approved-real manifest boundaries#410
DeliciousBuding merged 1 commit into
dev/delicious233from
task/388-manifest-boundary

DeliciousBuding commented Jun 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Uh oh!

coderabbitai Bot Jun 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

DeliciousBuding commented Jun 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Task Issue

S.U.P.E.R Review

Tests

Evidence Boundary

Project Governance

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DeliciousBuding commented Jun 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading