test(e2e): harden approved-real manifest boundaries#410
Conversation
📝 WalkthroughWalkthroughTightens E2E evidence boundary enforcement: ChangesT3.3 Observed/Approved-Real Manifest Boundary Hardening
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/shared/src/testing/e2eDataModeContract.test.ts`:
- Around line 296-304: The e2e data mode contract tests are asserting exact
validator error prose, which makes them brittle to wording-only changes. Update
the affected expectations in the e2eDataModeContract.test suite to verify the
semantic violation shape returned by the validator (for example, the specific
failure categories or identifiers in the result from the contract check) rather
than matching hard-coded full error strings. Keep the existing test cases for
desktop-observed-replay, but assert the meaningful contract fields from the
validator output instead of exact message text.
In `@app/shared/src/testing/e2eEvidenceManifest.test.ts`:
- Around line 158-165: The test for validateChatFlowEvidenceManifest is pinning
exact validator prose, making wording-only changes fail unnecessarily. Update
the assertions to check stable machine-readable error codes/reasons, or use
predicate-based matching for the expected invariant instead of comparing full
error strings. Keep the coverage around the approved-real-missing-claims cases,
but make the checks resilient to message wording changes in
e2eEvidenceManifest.test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 4027946d-7258-4bb1-bd5d-7d2a69f717bd
📒 Files selected for processing (8)
.agents/skills/real-e2e-acceptance/SKILL.mdapp/shared/src/testing/e2eDataModeContract.test.tsapp/shared/src/testing/e2eDataModeContract.tsapp/shared/src/testing/e2eEvidenceManifest.test.tsapp/shared/src/testing/e2eEvidenceManifest.tsdocs/progress/MASTER.mdscripts/verify/verify-real-e2e-contract.ps1tests/contract/scripts/verify-e2e-smoke-matrix.ps1
| }, [])).toEqual({ | ||
| ok: false, | ||
| errors: [ | ||
| 'desktop-observed-replay uses observed-hub-replay but dataMode is approved-real', | ||
| 'desktop-observed-replay uses observed-hub-replay but claims real login was tested', | ||
| 'desktop-observed-replay uses observed-hub-replay but claims real CLI/model execution', | ||
| 'desktop-observed-replay uses observed-hub-replay but marks TokenDance ID secret usage', | ||
| ], | ||
| }); |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Keep the failure assertions semantic, not prose-coupled.
These cases lock the suite to exact validator wording instead of the contract violation being reported. That makes harmless message edits look like regressions. As per coding guidelines, "**/*.{ts,tsx,js,jsx}: Do not write tests that merely replicate implementation branches, assert constant strings, hard-code error text as behavior, or mock the function under test itself."
Also applies to: 339-344
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/shared/src/testing/e2eDataModeContract.test.ts` around lines 296 - 304,
The e2e data mode contract tests are asserting exact validator error prose,
which makes them brittle to wording-only changes. Update the affected
expectations in the e2eDataModeContract.test suite to verify the semantic
violation shape returned by the validator (for example, the specific failure
categories or identifiers in the result from the contract check) rather than
matching hard-coded full error strings. Keep the existing test cases for
desktop-observed-replay, but assert the meaningful contract fields from the
validator output instead of exact message text.
Source: Coding guidelines
| expect(validateChatFlowEvidenceManifest(manifest)).toEqual({ | ||
| ok: false, | ||
| errors: [ | ||
| 'approved-real-missing-claims row approved-real-row sets real_tested=true without approval_ref', | ||
| 'approved-real-missing-claims row approved-real-row sets real_tested=true without real_login claim', | ||
| 'approved-real-missing-claims row approved-real-row sets real_tested=true without real_cli_or_model claim', | ||
| ], | ||
| }); |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Avoid pinning validator prose in these tests.
These assertions hard-code the full validation messages, so wording-only changes will fail the suite even when the contract is unchanged. Prefer stable machine-readable reasons/codes, or at least predicate-based checks for the relevant invariant instead of exact prose. As per coding guidelines, "**/*.{ts,tsx,js,jsx}: Do not write tests that merely replicate implementation branches, assert constant strings, hard-code error text as behavior, or mock the function under test itself."
Also applies to: 221-223
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/shared/src/testing/e2eEvidenceManifest.test.ts` around lines 158 - 165,
The test for validateChatFlowEvidenceManifest is pinning exact validator prose,
making wording-only changes fail unnecessarily. Update the assertions to check
stable machine-readable error codes/reasons, or use predicate-based matching for
the expected invariant instead of comparing full error strings. Keep the
coverage around the approved-real-missing-claims cases, but make the checks
resilient to message wording changes in e2eEvidenceManifest.test.
Source: Coding guidelines
Summary
real_tested=trueto include approval reference plus real-login and real CLI/model/API claims.packaged-releaseevidence, and align smoke-matrix contract checks with current stubbed-Hub replay naming.Task Issue
Closes #388
S.U.P.E.R Review
Tests
corepack.cmd pnpm --dir app install --frozen-lockfile --prefer-offlinecorepack.cmd pnpm --dir app/shared exec vitest run src/testing/e2eDataModeContract.test.ts src/testing/e2eEvidenceManifest.test.ts(19 passed)pwsh ./scripts/verify/verify-real-e2e-contract.ps1pwsh ./tests/contract/scripts/verify-approved-real-preflight.ps1 -RepoRoot .pwsh ./tests/contract/scripts/verify-e2e-smoke-matrix.ps1 -RepoRoot .pwsh ./scripts/verify/verify-doc-ssot.ps1pwsh ./scripts/verify/verify-project-skills.ps1git diff --checkEvidence Boundary
real_tested=falsefor this PR's verification: no real TokenDance ID login, no real CLI/model/API execution, no Hub production dispatch, no packaged Tauri/Desktop installer, no signing, no release upload.real_tested=trueexamples are fixture/unit manifest contract cases only; they validate required fields and do not execute a real path.Project Governance
.agents/skills/real-e2e-acceptance/SKILL.mdupdated with the durable approved-real row boundary.docs/progress/MASTER.mdupdated for T3.3 implementation state.Summary by CodeRabbit