Skip to content

test(chat): add visual qa artifact reports#414

Merged
DeliciousBuding merged 1 commit into
dev/delicious233from
task/390-visual-qa-artifact-loop
Jun 29, 2026
Merged

test(chat): add visual qa artifact reports#414
DeliciousBuding merged 1 commit into
dev/delicious233from
task/390-visual-qa-artifact-loop

Conversation

@DeliciousBuding

@DeliciousBuding DeliciousBuding commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Make Desktop/Web focused chat Visual QA write stable screenshot, metrics, and report artifacts.
  • Include Visual QA metrics/report paths in the chat acceptance manifest rows.
  • Add Node contract validation for the Visual QA artifact/report schema.

Evidence

  • node tests/contract/scripts/verify-chat-visual-qa.mjs --repo-root .
  • node tests/contract/scripts/verify-chat-acceptance.mjs --repo-root .
  • corepack.cmd pnpm --dir app/desktop run test:visual:chat-flow
  • corepack.cmd pnpm --dir app/web run test:visual:chat-flow
  • corepack.cmd pnpm --dir app run test:acceptance:chat-flow -- --command-timeout-sec 420
  • pwsh ./scripts/verify/verify-doc-ssot.ps1
  • pwsh ./scripts/verify/verify-project-skills.ps1
  • pwsh ./scripts/verify/verify-real-e2e-contract.ps1
  • git diff --check

Artifact Examples

  • Desktop screenshot: app/desktop/.tmp/manual-chat-flow-uiux/desktop-1440x810-chat-flow.png
  • Desktop metrics: app/desktop/.tmp/manual-chat-flow-uiux/desktop-1440x810-chat-flow.metrics.json
  • Desktop report: app/desktop/.tmp/manual-chat-flow-uiux/desktop-chat-flow-visual-qa.json
  • Web screenshot: app/web/.tmp/manual-chat-flow-uiux/web-1440x810-chat-flow.png
  • Web metrics: app/web/.tmp/manual-chat-flow-uiux/web-1440x810-chat-flow.metrics.json
  • Web report: app/web/.tmp/manual-chat-flow-uiux/web-chat-flow-visual-qa.json
  • Acceptance manifest: .tmp/chat-acceptance/run-76904/chat-acceptance-manifest.json

Boundary

real_tested=false. This proves fixture/unit, Playwright UI, and Visual QA evidence only. It does not claim real TokenDance ID login, real CLI/model/API execution, packaged Desktop, signing, release upload, or production deployment.

Closes #390

Summary by CodeRabbit

  • New Features

    • Manual chat QA checks now generate structured visual QA reports and separate metrics files for both Desktop and Web.
    • Acceptance checks now track multiple artifacts per run, including screenshots, metrics, and report JSON files.
  • Bug Fixes

    • QA failures are now recorded in a report before the script exits, making failed runs easier to review.
    • Validation now captures and logs failure details instead of stopping before artifacts are saved.
  • Tests

    • Added contract coverage to verify the new report format and expected artifact paths.

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough >Runner: throws or returns

Runner->>writeVisualQaReport: { status: passed/failed, metrics, failure }
writeVisualQaReport->>Disk: write metricsPath JSON
writeVisualQaReport->>Disk: write reportPath JSON
writeVisualQaReport-->>Runner: report object
Runner->>Runner: throw failure (if present)

</diagram>
</layer>
<layer id="acceptance_gate_artifacts" title="Chat acceptance gate artifact registration" depends_on="main_flow_failure_capture">
<summary>The desktop-chat-visual-qa and web-chat-visual-qa gates in chat-acceptance.mjs expand artifacts from a single PNG to arrays including the metrics JSON and visual-qa report JSON.</summary>
<ranges>
range_25bfb5398936
range_ba7bc923a10f
</ranges>
</layer>
<layer id="contract_verification" title="Contract verification scripts for Visual QA paths and acceptance manifest" depends_on="acceptance_gate_artifacts">
<summary>New verify-chat-visual-qa.mjs asserts stable artifact paths, schema strings, evidence flags, and Desktop non-approval markers in both scripts and the acceptance runner; verify-chat-acceptance.mjs gains assertions that manifest rows include the new JSON artifact paths.</summary>
<ranges>
range_c3c6ff3b4bb7
range_4449c4619f7a
range_a7d123230247
range_5da7ccb554c6
range_f417706ea351
range_643be75270eb
</ranges>
</layer>
<layer id="progress_log" title="Session log entry" depends_on="contract_verification">
<summary>MASTER.md records the T4.2 implementation session entry.</summary>
<ranges>
range_2f8327b4287e
</ranges>
</layer>
</cohort>
<unassigned_ranges>
</unassigned_ranges>
-->

## Walkthrough

The Desktop and Web manual chat flow scripts gain `writeVisualQaReport` helpers that persist structured metrics and visual QA report JSON files, with `assertMetrics` failures captured before throwing. The chat acceptance gate registers the new JSON artifacts alongside the existing screenshot, and a new contract verification script asserts stable paths, schema markers, and evidence flags in both scripts and the acceptance runner.

## Changes

**Visual QA artifact persistence and contract verification**

| Layer / File(s) | Summary |
|---|---|
| **Output path constants and `writeVisualQaReport` helper** <br> `app/desktop/scripts/manual-chat-flow-check.mjs`, `app/web/scripts/manual-chat-flow-check.mjs` | Both scripts add `metricsPath`/`reportPath` constants and a `writeVisualQaReport` function that builds a schema'd report object and writes both JSON files to `outputDir`. |
| **Failure capture and deferred throw in main flow** <br> `app/desktop/scripts/manual-chat-flow-check.mjs`, `app/web/scripts/manual-chat-flow-check.mjs` | Both scripts wrap `assertMetrics` in try/catch to record failure, call `writeVisualQaReport` with pass/fail status, and re-throw only after artifacts are written. |
| **Chat acceptance gate artifact registration** <br> `scripts/verify/chat-acceptance.mjs` | `desktop-chat-visual-qa` and `web-chat-visual-qa` gate `artifacts` fields expanded from a single PNG to arrays including `*.metrics.json` and `*-visual-qa.json`. |
| **Contract verification for Visual QA scripts and acceptance manifest** <br> `tests/contract/scripts/verify-chat-visual-qa.mjs`, `tests/contract/scripts/verify-chat-acceptance.mjs`, `docs/progress/MASTER.md` | New `verify-chat-visual-qa.mjs` asserts stable artifact paths, schema strings, evidence flags, and Desktop non-approval markers; `verify-chat-acceptance.mjs` gains assertions that manifest rows include the new JSON artifact paths. |

## Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

## Possibly related issues

- **`#390` [T4.2] Add semi-automated Visual QA artifact loop**: This PR directly implements the acceptance criteria — screenshot and metrics paths are included in the manifest, `writeVisualQaReport` produces structured JSON, and contract tests validate the outputs.
- **TokenDanceLab/AgentHub#389**: The expanded artifact registration and manifest assertions align with the focused chat acceptance gate work described in this issue.

## Possibly related PRs

- [TokenDanceLab/AgentHub#395](https://github.com/TokenDanceLab/AgentHub/pull/395): Established the manifest schema and `real_tested`/evidence-level constraints that the new `*-visual-qa.json` report fields and contract assertions directly depend on.
- [TokenDanceLab/AgentHub#412](https://github.com/TokenDanceLab/AgentHub/pull/412): Added the Node chat acceptance bundle verifier and manifest/artifact gating flow that this PR extends with the new metrics and visual QA report artifact entries.

## Poem

> 🐇 Hop hop, the QA trail is clear,
> Two JSON files now always appear!
> Metrics written, report in place,
> No failure lost without a trace.
> The manifest knows what paths to keep —
> A rabbit's work before I sleep. ✨

</details>

<!-- walkthrough_end -->
<!-- pre_merge_checks_walkthrough_start -->

<details>
<summary>🚥 Pre-merge checks | ✅ 4 | ❌ 1</summary>

### ❌ Failed checks (1 warning)

|     Check name     | Status     | Explanation                                                                          | Resolution                                                                         |
| :----------------: | :--------- | :----------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

<details>
<summary>✅ Passed checks (4 passed)</summary>

|         Check name         | Status   | Explanation                                                                                                                            |
| :------------------------: | :------- | :------------------------------------------------------------------------------------------------------------------------------------- |
|      Description Check     | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.                                                                            |
|         Title check        | ✅ Passed | The title is concise and matches the main change: adding visual QA artifact reports for chat flows.                                    |
|     Linked Issues check    | ✅ Passed | The PR adds stable screenshot, metrics, and report artifacts plus manifest and contract validation for chat Visual QA.                 |
| Out of Scope Changes check | ✅ Passed | Changes stay focused on chat Visual QA artifacts, manifest updates, contract checks, and related docs with no obvious unrelated scope. |

</details>

</details>

<!-- pre_merge_checks_walkthrough_end -->
<!-- finishing_touch_checkbox_start -->

<details>
<summary>✨ Finishing Touches</summary>

<details>
<summary>📝 Generate docstrings</summary>

- [ ] <!-- {"checkboxId": "7962f53c-55bc-4827-bfbf-6a18da830691"} --> Create stacked PR
- [ ] <!-- {"checkboxId": "3e1879ae-f29b-4d0d-8e06-d12b7ba33d98"} --> Commit on current branch

</details>
<details>
<summary>🧪 Generate unit tests (beta)</summary>

- [ ] <!-- {"checkboxId": "f47ac10b-58cc-4372-a567-0e02b2c3d479", "radioGroupId": "utg-output-choice-group-unknown_comment_id"} -->   Create PR with unit tests
- [ ] <!-- {"checkboxId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "radioGroupId": "utg-output-choice-group-unknown_comment_id"} -->   Commit unit tests in branch `task/390-visual-qa-artifact-loop`

</details>

</details>

<!-- finishing_touch_checkbox_end -->
<!-- tips_start -->

---




<sub>Comment `@coderabbitai help` to get the list of available commands.</sub>

<!-- tips_end -->

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/contract/scripts/verify-chat-visual-qa.mjs`:
- Line 51: The verification in verify-chat-visual-qa.mjs only rejects
approved-real when paired with real_tested:true, so it can still slip through
with real_tested:false. Update the assertion in the script check to explicitly
reject any occurrence of approved-real in the script text, alongside the
existing real_tested and pnpm tauri build guards, so the scope restriction is
enforced regardless of other flags.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 514be412-39c3-47ad-ae86-ac3603bac721

📥 Commits

Reviewing files that changed from the base of the PR and between f7e6812 and 46d5aec.

📒 Files selected for processing (6)
  • app/desktop/scripts/manual-chat-flow-check.mjs
  • app/web/scripts/manual-chat-flow-check.mjs
  • docs/progress/MASTER.md
  • scripts/verify/chat-acceptance.mjs
  • tests/contract/scripts/verify-chat-acceptance.mjs
  • tests/contract/scripts/verify-chat-visual-qa.mjs

assert(script.includes('Visual QA screenshot:'), `${label} stdout prints screenshot path`);
assert(script.includes('Visual QA metrics:'), `${label} stdout prints metrics path`);
assert(script.includes('Visual QA report:'), `${label} stdout prints report path`);
assert(!/real_tested:\s*true|approved-real.+real_tested:\s*true|pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim approved-real or packaged Desktop evidence`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Block approved-real explicitly.

This regex still passes if the script starts emitting approved-real while keeping real_tested: false, so it does not actually enforce the stated scope restriction.

Suggested fix
-  assert(!/real_tested:\s*true|approved-real.+real_tested:\s*true|pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim approved-real or packaged Desktop evidence`);
+  assert(!/real_tested:\s*true/i.test(script), `${label} script does not claim real_tested=true`);
+  assert(!/approved-real/i.test(script), `${label} script does not claim approved-real evidence`);
+  assert(!/pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim packaged Desktop evidence`);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert(!/real_tested:\s*true|approved-real.+real_tested:\s*true|pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim approved-real or packaged Desktop evidence`);
assert(!/real_tested:\s*true/i.test(script), `${label} script does not claim real_tested=true`);
assert(!/approved-real/i.test(script), `${label} script does not claim approved-real evidence`);
assert(!/pnpm\s+tauri\s+build/i.test(script), `${label} script does not claim packaged Desktop evidence`);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/contract/scripts/verify-chat-visual-qa.mjs` at line 51, The
verification in verify-chat-visual-qa.mjs only rejects approved-real when paired
with real_tested:true, so it can still slip through with real_tested:false.
Update the assertion in the script check to explicitly reject any occurrence of
approved-real in the script text, alongside the existing real_tested and pnpm
tauri build guards, so the scope restriction is enforced regardless of other
flags.

@DeliciousBuding DeliciousBuding merged commit de666c5 into dev/delicious233 Jun 29, 2026
21 checks passed
@DeliciousBuding DeliciousBuding deleted the task/390-visual-qa-artifact-loop branch June 29, 2026 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant