test(chat): add visual qa artifact reports by DeliciousBuding · Pull Request #414 · TokenDanceLab/AgentHub

DeliciousBuding · 2026-06-29T08:20:22Z

Summary

Make Desktop/Web focused chat Visual QA write stable screenshot, metrics, and report artifacts.
Include Visual QA metrics/report paths in the chat acceptance manifest rows.
Add Node contract validation for the Visual QA artifact/report schema.

Evidence

node tests/contract/scripts/verify-chat-visual-qa.mjs --repo-root .
node tests/contract/scripts/verify-chat-acceptance.mjs --repo-root .
corepack.cmd pnpm --dir app/desktop run test:visual:chat-flow
corepack.cmd pnpm --dir app/web run test:visual:chat-flow
corepack.cmd pnpm --dir app run test:acceptance:chat-flow -- --command-timeout-sec 420
pwsh ./scripts/verify/verify-doc-ssot.ps1
pwsh ./scripts/verify/verify-project-skills.ps1
pwsh ./scripts/verify/verify-real-e2e-contract.ps1
git diff --check

Artifact Examples

Desktop screenshot: app/desktop/.tmp/manual-chat-flow-uiux/desktop-1440x810-chat-flow.png
Desktop metrics: app/desktop/.tmp/manual-chat-flow-uiux/desktop-1440x810-chat-flow.metrics.json
Desktop report: app/desktop/.tmp/manual-chat-flow-uiux/desktop-chat-flow-visual-qa.json
Web screenshot: app/web/.tmp/manual-chat-flow-uiux/web-1440x810-chat-flow.png
Web metrics: app/web/.tmp/manual-chat-flow-uiux/web-1440x810-chat-flow.metrics.json
Web report: app/web/.tmp/manual-chat-flow-uiux/web-chat-flow-visual-qa.json
Acceptance manifest: .tmp/chat-acceptance/run-76904/chat-acceptance-manifest.json

Boundary

real_tested=false. This proves fixture/unit, Playwright UI, and Visual QA evidence only. It does not claim real TokenDance ID login, real CLI/model/API execution, packaged Desktop, signing, release upload, or production deployment.

Closes #390

Summary by CodeRabbit

New Features
- Manual chat QA checks now generate structured visual QA reports and separate metrics files for both Desktop and Web.
- Acceptance checks now track multiple artifacts per run, including screenshots, metrics, and report JSON files.
Bug Fixes
- QA failures are now recorded in a report before the script exits, making failed runs easier to review.
- Validation now captures and logs failure details instead of stopping before artifacts are saved.
Tests
- Added contract coverage to verify the new report format and expected artifact paths.

coderabbitai · 2026-06-29T08:20:37Z

📝 Walkthrough

>Runner: throws or returns

Runner->>writeVisualQaReport: { status: passed/failed, metrics, failure }
writeVisualQaReport->>Disk: write metricsPath JSON
writeVisualQaReport->>Disk: write reportPath JSON
writeVisualQaReport-->>Runner: report object
Runner->>Runner: throw failure (if present)

</diagram>
</layer>
<layer id="acceptance_gate_artifacts" title="Chat acceptance gate artifact registration" depends_on="main_flow_failure_capture">
<summary>The desktop-chat-visual-qa and web-chat-visual-qa gates in chat-acceptance.mjs expand artifacts from a single PNG to arrays including the metrics JSON and visual-qa report JSON.</summary>
<ranges>
range_25bfb5398936
range_ba7bc923a10f
</ranges>
</layer>
<layer id="contract_verification" title="Contract verification scripts for Visual QA paths and acceptance manifest" depends_on="acceptance_gate_artifacts">
<summary>New verify-chat-visual-qa.mjs asserts stable artifact paths, schema strings, evidence flags, and Desktop non-approval markers in both scripts and the acceptance runner; verify-chat-acceptance.mjs gains assertions that manifest rows include the new JSON artifact paths.</summary>
<ranges>
range_c3c6ff3b4bb7
range_4449c4619f7a
range_a7d123230247
range_5da7ccb554c6
range_f417706ea351
range_643be75270eb
</ranges>
</layer>
<layer id="progress_log" title="Session log entry" depends_on="contract_verification">
<summary>MASTER.md records the T4.2 implementation session entry.</summary>
<ranges>
range_2f8327b4287e
</ranges>
</layer>
</cohort>
<unassigned_ranges>
</unassigned_ranges>
-->

## Walkthrough

The Desktop and Web manual chat flow scripts gain `writeVisualQaReport` helpers that persist structured metrics and visual QA report JSON files, with `assertMetrics` failures captured before throwing. The chat acceptance gate registers the new JSON artifacts alongside the existing screenshot, and a new contract verification script asserts stable paths, schema markers, and evidence flags in both scripts and the acceptance runner.

## Changes

**Visual QA artifact persistence and contract verification**

| Layer / File(s) | Summary |
|---|---|
| **Output path constants and `writeVisualQaReport` helper** <br> `app/desktop/scripts/manual-chat-flow-check.mjs`, `app/web/scripts/manual-chat-flow-check.mjs` | Both scripts add `metricsPath`/`reportPath` constants and a `writeVisualQaReport` function that builds a schema'd report object and writes both JSON files to `outputDir`. |
| **Failure capture and deferred throw in main flow** <br> `app/desktop/scripts/manual-chat-flow-check.mjs`, `app/web/scripts/manual-chat-flow-check.mjs` | Both scripts wrap `assertMetrics` in try/catch to record failure, call `writeVisualQaReport` with pass/fail status, and re-throw only after artifacts are written. |
| **Chat acceptance gate artifact registration** <br> `scripts/verify/chat-acceptance.mjs` | `desktop-chat-visual-qa` and `web-chat-visual-qa` gate `artifacts` fields expanded from a single PNG to arrays including `*.metrics.json` and `*-visual-qa.json`. |
| **Contract verification for Visual QA scripts and acceptance manifest** <br> `tests/contract/scripts/verify-chat-visual-qa.mjs`, `tests/contract/scripts/verify-chat-acceptance.mjs`, `docs/progress/MASTER.md` | New `verify-chat-visual-qa.mjs` asserts stable artifact paths, schema strings, evidence flags, and Desktop non-approval markers; `verify-chat-acceptance.mjs` gains assertions that manifest rows include the new JSON artifact paths. |

## Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

## Possibly related issues

- **`#390` [T4.2] Add semi-automated Visual QA artifact loop**: This PR directly implements the acceptance criteria — screenshot and metrics paths are included in the manifest, `writeVisualQaReport` produces structured JSON, and contract tests validate the outputs.
- **TokenDanceLab/AgentHub#389**: The expanded artifact registration and manifest assertions align with the focused chat acceptance gate work described in this issue.

## Possibly related PRs

- [TokenDanceLab/AgentHub#395](https://github.com/TokenDanceLab/AgentHub/pull/395): Established the manifest schema and `real_tested`/evidence-level constraints that the new `*-visual-qa.json` report fields and contract assertions directly depend on.
- [TokenDanceLab/AgentHub#412](https://github.com/TokenDanceLab/AgentHub/pull/412): Added the Node chat acceptance bundle verifier and manifest/artifact gating flow that this PR extends with the new metrics and visual QA report artifact entries.

## Poem

> 🐇 Hop hop, the QA trail is clear,
> Two JSON files now always appear!
> Metrics written, report in place,
> No failure lost without a trace.
> The manifest knows what paths to keep —
> A rabbit's work before I sleep. ✨

</details>

<!-- walkthrough_end -->
<!-- pre_merge_checks_walkthrough_start -->

<details>
<summary>🚥 Pre-merge checks | ✅ 4 | ❌ 1</summary>

### ❌ Failed checks (1 warning)

|     Check name     | Status     | Explanation                                                                          | Resolution                                                                         |
| :----------------: | :--------- | :----------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

<details>
<summary>✅ Passed checks (4 passed)</summary>

|         Check name         | Status   | Explanation                                                                                                                            |
| :------------------------: | :------- | :------------------------------------------------------------------------------------------------------------------------------------- |
|      Description Check     | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.                                                                            |
|         Title check        | ✅ Passed | The title is concise and matches the main change: adding visual QA artifact reports for chat flows.                                    |
|     Linked Issues check    | ✅ Passed | The PR adds stable screenshot, metrics, and report artifacts plus manifest and contract validation for chat Visual QA.                 |
| Out of Scope Changes check | ✅ Passed | Changes stay focused on chat Visual QA artifacts, manifest updates, contract checks, and related docs with no obvious unrelated scope. |

</details>

</details>

<!-- pre_merge_checks_walkthrough_end -->
<!-- finishing_touch_checkbox_start -->

<details>
<summary>✨ Finishing Touches</summary>

<details>
<summary>📝 Generate docstrings</summary>

- [ ] <!-- {"checkboxId": "7962f53c-55bc-4827-bfbf-6a18da830691"} --> Create stacked PR
- [ ] <!-- {"checkboxId": "3e1879ae-f29b-4d0d-8e06-d12b7ba33d98"} --> Commit on current branch

</details>
<details>
<summary>🧪 Generate unit tests (beta)</summary>

- [ ] <!-- {"checkboxId": "f47ac10b-58cc-4372-a567-0e02b2c3d479", "radioGroupId": "utg-output-choice-group-unknown_comment_id"} -->   Create PR with unit tests
- [ ] <!-- {"checkboxId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "radioGroupId": "utg-output-choice-group-unknown_comment_id"} -->   Commit unit tests in branch `task/390-visual-qa-artifact-loop`

</details>

</details>

<!-- finishing_touch_checkbox_end -->
<!-- tips_start -->

---




<sub>Comment `@coderabbitai help` to get the list of available commands.</sub>

<!-- tips_end -->

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/contract/scripts/verify-chat-visual-qa.mjs`:
- Line 51: The verification in verify-chat-visual-qa.mjs only rejects
approved-real when paired with real_tested:true, so it can still slip through
with real_tested:false. Update the assertion in the script check to explicitly
reject any occurrence of approved-real in the script text, alongside the
existing real_tested and pnpm tauri build guards, so the scope restriction is
enforced regardless of other flags.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 514be412-39c3-47ad-ae86-ac3603bac721

📥 Commits

Reviewing files that changed from the base of the PR and between f7e6812 and 46d5aec.

📒 Files selected for processing (6)

app/desktop/scripts/manual-chat-flow-check.mjs
app/web/scripts/manual-chat-flow-check.mjs
docs/progress/MASTER.md
scripts/verify/chat-acceptance.mjs
tests/contract/scripts/verify-chat-acceptance.mjs
tests/contract/scripts/verify-chat-visual-qa.mjs

coderabbitai · 2026-06-29T08:25:54Z

+  assert(script.includes('Visual QA screenshot:'), `${label} stdout prints screenshot path`);
+  assert(script.includes('Visual QA metrics:'), `${label} stdout prints metrics path`);
+  assert(script.includes('Visual QA report:'), `${label} stdout prints report path`);
+  assert(!/real_tested:\s*true|approved-real.+real_tested:\s*true|pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim approved-real or packaged Desktop evidence`);


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Block approved-real explicitly.

This regex still passes if the script starts emitting approved-real while keeping real_tested: false, so it does not actually enforce the stated scope restriction.

Suggested fix

- assert(!/real_tested:\s*true|approved-real.+real_tested:\s*true|pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim approved-real or packaged Desktop evidence`); + assert(!/real_tested:\s*true/i.test(script), `${label} script does not claim real_tested=true`); + assert(!/approved-real/i.test(script), `${label} script does not claim approved-real evidence`); + assert(!/pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim packaged Desktop evidence`);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

assert(!/real_tested:\s*true|approved-real.+real_tested:\s*true|pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim approved-real or packaged Desktop evidence`);

assert(!/real_tested:\s*true/i.test(script), `${label} script does not claim real_tested=true`);

assert(!/approved-real/i.test(script), `${label} script does not claim approved-real evidence`);

assert(!/pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim packaged Desktop evidence`);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/contract/scripts/verify-chat-visual-qa.mjs` at line 51, The verification in verify-chat-visual-qa.mjs only rejects approved-real when paired with real_tested:true, so it can still slip through with real_tested:false. Update the assertion in the script check to explicitly reject any occurrence of approved-real in the script text, alongside the existing real_tested and pnpm tauri build guards, so the scope restriction is enforced regardless of other flags.

test(chat): add visual qa artifact reports

46d5aec

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

DeliciousBuding merged commit de666c5 into dev/delicious233 Jun 29, 2026
21 checks passed

DeliciousBuding deleted the task/390-visual-qa-artifact-loop branch June 29, 2026 08:26

This was referenced Jun 29, 2026

[T4.2] Add semi-automated Visual QA artifact loop #390

Closed

docs(progress): sync t4.2 completion #415

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(chat): add visual qa artifact reports#414

test(chat): add visual qa artifact reports#414
DeliciousBuding merged 1 commit into
dev/delicious233from
task/390-visual-qa-artifact-loop

DeliciousBuding commented Jun 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-  assert(!/real_tested:\s*true|approved-real.+real_tested:\s*true|pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim approved-real or packaged Desktop evidence`);
+  assert(!/real_tested:\s*true/i.test(script), `${label} script does not claim real_tested=true`);
+  assert(!/approved-real/i.test(script), `${label} script does not claim approved-real evidence`);
+  assert(!/pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim packaged Desktop evidence`);

Uh oh!

Conversation

DeliciousBuding commented Jun 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Evidence

Artifact Examples

Boundary

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DeliciousBuding commented Jun 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading