Skip to content

test(e2e): harden approved-real manifest boundaries#410

Merged
DeliciousBuding merged 1 commit into
dev/delicious233from
task/388-manifest-boundary
Jun 28, 2026
Merged

test(e2e): harden approved-real manifest boundaries#410
DeliciousBuding merged 1 commit into
dev/delicious233from
task/388-manifest-boundary

Conversation

@DeliciousBuding

@DeliciousBuding DeliciousBuding commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Harden observed and approved-real data-mode manifest validation so replay/readiness rows cannot overclaim real execution.
  • Require approved-real rows with real_tested=true to include approval reference plus real-login and real CLI/model/API claims.
  • Keep packaged Desktop and release upload claims tied to packaged-release evidence, and align smoke-matrix contract checks with current stubbed-Hub replay naming.
  • Sync the active SPEC progress log for T3.3.

Task Issue

Closes #388

S.U.P.E.R Review

  • Ports over Implementation: evidence claims are machine-validated contract fields, not prose-only assertions.
  • Environment-Agnostic: observed, stubbed, approved-real, and packaged-release evidence levels stay separate.

Tests

  • corepack.cmd pnpm --dir app install --frozen-lockfile --prefer-offline
  • corepack.cmd pnpm --dir app/shared exec vitest run src/testing/e2eDataModeContract.test.ts src/testing/e2eEvidenceManifest.test.ts (19 passed)
  • pwsh ./scripts/verify/verify-real-e2e-contract.ps1
  • pwsh ./tests/contract/scripts/verify-approved-real-preflight.ps1 -RepoRoot .
  • pwsh ./tests/contract/scripts/verify-e2e-smoke-matrix.ps1 -RepoRoot .
  • pwsh ./scripts/verify/verify-doc-ssot.ps1
  • pwsh ./scripts/verify/verify-project-skills.ps1
  • git diff --check

Evidence Boundary

  • real_tested=false for this PR's verification: no real TokenDance ID login, no real CLI/model/API execution, no Hub production dispatch, no packaged Tauri/Desktop installer, no signing, no release upload.
  • The added approved-real real_tested=true examples are fixture/unit manifest contract cases only; they validate required fields and do not execute a real path.

Project Governance

  • Instruction surfaces: .agents/skills/real-e2e-acceptance/SKILL.md updated with the durable approved-real row boundary.
  • Progress: docs/progress/MASTER.md updated for T3.3 implementation state.

Summary by CodeRabbit

  • Bug Fixes
    • Tightened validation for evidence and E2E contract data so invalid combinations are rejected earlier.
    • Improved checks around approved-real and packaged-release entries to keep evidence details consistent.
    • Updated smoke-matrix expectations to reflect the current replay/stubbed-Hub workflow.

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Tightens E2E evidence boundary enforcement: validateEvidenceRow now requires approval_ref plus explicit real_login/real_cli_or_model claims on approved-real rows with real_tested=true, and gates release_upload on packaged-release evidence level. validateE2EDataModeScenario adds similar guards for observed-hub-replay and approved-real-preflight data sources. Tests, verification scripts, and SKILL.md documentation are updated accordingly, and smoke-matrix scripts align to web-stubbed-hub-playwright naming.

Changes

T3.3 Observed/Approved-Real Manifest Boundary Hardening

Layer / File(s) Summary
Evidence manifest validation rules
app/shared/src/testing/e2eEvidenceManifest.ts
validateEvidenceRow adds a branch requiring approval_ref + real_login + real_cli_or_model for approved-real rows with real_tested=true; release_upload check now requires evidence_level === 'packaged-release'.
Data-mode contract scenario validation
app/shared/src/testing/e2eDataModeContract.ts
validateE2EDataModeScenario gains branches rejecting observed-hub-replay scenarios with real-tested flags or TokenDance secret usage, and rejecting approved-real-preflight scenarios with mock adapter usage or wrong dataMode.
Evidence manifest and data-mode contract tests
app/shared/src/testing/e2eEvidenceManifest.test.ts, app/shared/src/testing/e2eDataModeContract.test.ts
New test cases cover observed-local non-real builds, approved-real acceptance/rejection on missing/present claims, packaged-release claim validity, observed-hub-replay manifest output, and approved-real-preflight manifest output with negative mutation checks.
Verification scripts and SKILL.md
scripts/verify/verify-real-e2e-contract.ps1, tests/contract/scripts/verify-e2e-smoke-matrix.ps1, .agents/skills/real-e2e-acceptance/SKILL.md
verify-real-e2e-contract.ps1 asserts smoke-matrix rows do not set real_tested=true and checks SKILL.md for required boundary strings; smoke-matrix script updates expected Web row to web-stubbed-hub-playwright and task-contract assertions to stubbed-Hub data source; SKILL.md documents the new approved-real/real_tested=true rule.
Progress log
docs/progress/MASTER.md
Adds T3.3 session log entry for 2026-06-29.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • TokenDanceLab/AgentHub#395: Introduced the buildChatFlowEvidenceManifest/validateChatFlowEvidenceManifest contract that this PR directly tightens with new approved-real and release_upload validation rules.
  • TokenDanceLab/AgentHub#397: Also modifies app/shared/src/testing/e2eDataModeContract.ts and its tests, adding request-decision logic in the same scenario validation module extended here.
  • TokenDanceLab/AgentHub#406: Extends e2eDataModeContract.ts scenario checks and evidence-level expectations for specific dataSource modes, closely related to the new observed-hub-replay/approved-real-preflight branches added here.

Poem

🐇 Hop, hop! The boundary's tight,
Real claims need proof, or they're not right.
approval_ref must be in place,
Mock adapters cannot take real's space.
The manifest knows what's true or not—
No overclaiming the evidence lot! ✅

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly names the main change: hardening approved-real e2e manifest boundaries.
Linked Issues check ✅ Passed The changes enforce approved-real and readiness/stubbed boundaries, keep packaged-release separate, and add the requested verifier and fixture coverage.
Out of Scope Changes check ✅ Passed The diff stays focused on manifest boundary hardening, related tests, and the documented progress or instruction surfaces.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch task/388-manifest-boundary

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/shared/src/testing/e2eDataModeContract.test.ts`:
- Around line 296-304: The e2e data mode contract tests are asserting exact
validator error prose, which makes them brittle to wording-only changes. Update
the affected expectations in the e2eDataModeContract.test suite to verify the
semantic violation shape returned by the validator (for example, the specific
failure categories or identifiers in the result from the contract check) rather
than matching hard-coded full error strings. Keep the existing test cases for
desktop-observed-replay, but assert the meaningful contract fields from the
validator output instead of exact message text.

In `@app/shared/src/testing/e2eEvidenceManifest.test.ts`:
- Around line 158-165: The test for validateChatFlowEvidenceManifest is pinning
exact validator prose, making wording-only changes fail unnecessarily. Update
the assertions to check stable machine-readable error codes/reasons, or use
predicate-based matching for the expected invariant instead of comparing full
error strings. Keep the coverage around the approved-real-missing-claims cases,
but make the checks resilient to message wording changes in
e2eEvidenceManifest.test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 4027946d-7258-4bb1-bd5d-7d2a69f717bd

📥 Commits

Reviewing files that changed from the base of the PR and between 1b167aa and 9e04be8.

📒 Files selected for processing (8)
  • .agents/skills/real-e2e-acceptance/SKILL.md
  • app/shared/src/testing/e2eDataModeContract.test.ts
  • app/shared/src/testing/e2eDataModeContract.ts
  • app/shared/src/testing/e2eEvidenceManifest.test.ts
  • app/shared/src/testing/e2eEvidenceManifest.ts
  • docs/progress/MASTER.md
  • scripts/verify/verify-real-e2e-contract.ps1
  • tests/contract/scripts/verify-e2e-smoke-matrix.ps1

Comment on lines +296 to +304
}, [])).toEqual({
ok: false,
errors: [
'desktop-observed-replay uses observed-hub-replay but dataMode is approved-real',
'desktop-observed-replay uses observed-hub-replay but claims real login was tested',
'desktop-observed-replay uses observed-hub-replay but claims real CLI/model execution',
'desktop-observed-replay uses observed-hub-replay but marks TokenDance ID secret usage',
],
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Keep the failure assertions semantic, not prose-coupled.

These cases lock the suite to exact validator wording instead of the contract violation being reported. That makes harmless message edits look like regressions. As per coding guidelines, "**/*.{ts,tsx,js,jsx}: Do not write tests that merely replicate implementation branches, assert constant strings, hard-code error text as behavior, or mock the function under test itself."

Also applies to: 339-344

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/shared/src/testing/e2eDataModeContract.test.ts` around lines 296 - 304,
The e2e data mode contract tests are asserting exact validator error prose,
which makes them brittle to wording-only changes. Update the affected
expectations in the e2eDataModeContract.test suite to verify the semantic
violation shape returned by the validator (for example, the specific failure
categories or identifiers in the result from the contract check) rather than
matching hard-coded full error strings. Keep the existing test cases for
desktop-observed-replay, but assert the meaningful contract fields from the
validator output instead of exact message text.

Source: Coding guidelines

Comment on lines +158 to +165
expect(validateChatFlowEvidenceManifest(manifest)).toEqual({
ok: false,
errors: [
'approved-real-missing-claims row approved-real-row sets real_tested=true without approval_ref',
'approved-real-missing-claims row approved-real-row sets real_tested=true without real_login claim',
'approved-real-missing-claims row approved-real-row sets real_tested=true without real_cli_or_model claim',
],
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Avoid pinning validator prose in these tests.

These assertions hard-code the full validation messages, so wording-only changes will fail the suite even when the contract is unchanged. Prefer stable machine-readable reasons/codes, or at least predicate-based checks for the relevant invariant instead of exact prose. As per coding guidelines, "**/*.{ts,tsx,js,jsx}: Do not write tests that merely replicate implementation branches, assert constant strings, hard-code error text as behavior, or mock the function under test itself."

Also applies to: 221-223

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/shared/src/testing/e2eEvidenceManifest.test.ts` around lines 158 - 165,
The test for validateChatFlowEvidenceManifest is pinning exact validator prose,
making wording-only changes fail unnecessarily. Update the assertions to check
stable machine-readable error codes/reasons, or use predicate-based matching for
the expected invariant instead of comparing full error strings. Keep the
coverage around the approved-real-missing-claims cases, but make the checks
resilient to message wording changes in e2eEvidenceManifest.test.

Source: Coding guidelines

@DeliciousBuding DeliciousBuding merged commit 3c907a4 into dev/delicious233 Jun 28, 2026
21 checks passed
@DeliciousBuding DeliciousBuding deleted the task/388-manifest-boundary branch June 28, 2026 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant