diff --git a/docs/README.md b/docs/README.md
index 2fda3f8e..f6bf2ab9 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,6 +1,6 @@
 # AgentHub 文档
 
-最后更新:2026-06-28
+最后更新:2026-06-29
 
 ## 快速入口
 
diff --git a/docs/analysis/module-inventory.md b/docs/analysis/module-inventory.md
deleted file mode 100644
index 985e0356..00000000
--- a/docs/analysis/module-inventory.md
+++ /dev/null
@@ -1,134 +0,0 @@
-# Real Foundation Hardening - Module Inventory
-
-## Summary
-
-| Module | Responsibility | Dependencies | Files | Lines | Complexity | S.U.P.E.R Score |
-|:--|:--|:--|--:|--:|:--|:--|
-| Shared ChatView | Render transcript items, cards, bubbles, markdown, auto-scroll | React, shared transcript, markdown UI | 29 | 6603 | High | S🟡 U🟢 P🟡 E🟢 R🟡 |
-| Shared Transcript | Normalize Hub/Edge/runtime input to `TranscriptBlock[]` | shared events, diff helpers | 20 | 5412 | High | S🟢 U🟢 P🟢 E🟢 R🟡 |
-| E2E Data Contract | Validate surface/data/auth/request boundary claims | demo dataMode | 2 | 477 | Medium | S🟢 U🟢 P🟢 E🟢 R🟢 |
-| Desktop Renderer | Tauri/Vite shell, platform wiring, local edge health, UI app | shared, Tauri, React Query | 346 | 62424 | High | S🟡 U🟡 P🟡 E🟡 R🟡 |
-| Web Renderer | Hub-facing shell, auth/session, workbench app | shared, Hub client, React Query | 162 | 32026 | High | S🟡 U🟡 P🟡 E🟡 R🟡 |
-| Desktop/Web E2E | Browser behavior checks for chat flow and data boundaries | Playwright, shared contract | 8 | 2649 | Medium | S🟡 U🟢 P🟡 E🟢 R🟡 |
-| Visual QA Scripts | Screenshot + DOM geometry acceptance | Playwright, app stubs | 3+ | 1600+ | Medium | S🟡 U🟢 P🟡 E🟡 R🟡 |
-| Hub Server | Auth, IM, task/event API, routing, audit | Go, DB, Redis-like stores | 426 | 405896 | Critical | S🟡 U🟡 P🟡 E🟡 R🟡 |
-| Edge Server | Local projects, run lifecycle, event store, adapters | Go, filesystem, process adapters | 195 | 75154 | Critical | S🟡 U🟡 P🟡 E🟡 R🟡 |
-| API Contracts | REST/OpenAPI and WS event contract | YAML/docs | 5 | 7511 | Medium | S🟢 U🟢 P🟡 E🟢 R🟡 |
-| Verification Scripts | Governance/evidence/readiness checks | PowerShell, shell | 29 | 6268 | Medium | S🟡 U🟢 P🟡 E🟡 R🟡 |
-
-## Module Details
-
-### Shared ChatView
-
-- **Path**: `app/shared/src/chatview/`
-- **Responsibility**: Convert normalized transcript items into the visible IM timeline: user bubbles, agent groups, cards, markdown, and scroll behavior.
-- **Public API**: `blocksToTranscriptItems`, `ChatViewTranscript`, `Transcript`, `AgentGroup`, `RowItem`.
-- **Internal Dependencies**: `app/shared/src/transcript/`, shared UI markdown and design tokens.
-- **External Dependencies**: React, `react-markdown`, `remark-gfm`, syntax/markdown helpers.
-- **Complexity Rating**: High.
-- **Transformation Notes**: This is the most important UI contract boundary. Implementation already has card ordering and scroll tests; tasks should tighten behavior rather than fork components.
-- **S.U.P.E.R Assessment**:
-  - **S**: Partial. Rendering, grouping, card stack logic, metadata, and scroll behavior are close but still intertwined.
-  - **U**: Good. It consumes `TranscriptItem[]` and does not reach into Hub/Edge.
-  - **P**: Partial. `TranscriptBlock`/`TranscriptItem` types are explicit; visual QA manifests are less formal.
-  - **E**: Good. Browser/runtime specific work is outside shared ChatView.
-  - **R**: Partial. Renderer can be reused by Desktop/Web, but replacing card behavior requires careful shared tests.
-
-### Shared Transcript
-
-- **Path**: `app/shared/src/transcript/`
-- **Responsibility**: Normalize Hub messages, Edge events, runtime events, ordering, evidence refs, and diagnostics into `TranscriptBlock[]`.
-- **Public API**: `normalizeEdgeEventsToTranscript`, `normalizeHubMessagesToTranscript`, `normalizeHubRuntimeEventsToTranscript`, `normalizeThreadItemsToTranscript`, `orderTranscriptBlocks`, transcript types.
-- **Internal Dependencies**: shared event/diff types.
-- **External Dependencies**: none material for the core normalizer path.
-- **Complexity Rating**: High.
-- **Transformation Notes**: Strong candidate for contract-first fixes. The normalizer is the correct place to filter runtime diagnostics and preserve linear order before UI rendering.
-- **S.U.P.E.R Assessment**:
-  - **S**: Good. Normalization and ordering are separated by file.
-  - **U**: Good. Data flows raw source -> normalized blocks.
-  - **P**: Good. `TranscriptBlock` is the serializable internal port.
-  - **E**: Good. Pure TypeScript tests can run without services.
-  - **R**: Partial. Upstream source replacement is feasible, but Hub/Edge event shape drift still needs stronger golden fixtures.
-
-### E2E Data Contract
-
-- **Path**: `app/shared/src/testing/e2eDataModeContract.ts`
-- **Responsibility**: Classify observed requests and validate scenario claims across surface, data source, auth/execution, and phase.
-- **Public API**: `createE2EDataModeScenario`, `assertE2EDataModeScenario`, `buildE2EDataModeManifest`.
-- **Internal Dependencies**: `app/shared/src/demo/dataMode.ts`.
-- **Complexity Rating**: Medium.
-- **Transformation Notes**: This is a clean port. Future E2E should import it rather than duplicate mode switch logic.
-- **S.U.P.E.R Assessment**: All five principles are currently healthy.
-
-### Desktop Renderer
-
-- **Path**: `app/desktop/src/`
-- **Responsibility**: Desktop app shell, adapter wiring, Tauri host bridge, Local Edge preflight, and shared workbench rendering.
-- **Complexity Rating**: High.
-- **Transformation Notes**: Desktop Vite evidence proves renderer behavior only. Packaged Tauri sidecar/sqlite/icon/installer must be separate.
-- **S.U.P.E.R Assessment**:
-  - **S**: Partial. Shell, settings, host integration, and local edge concerns share a large source tree.
-  - **U**: Partial. Architecture says renderer must not execute raw CLI; tasks should verify this boundary where touched.
-  - **P**: Partial. Platform adapter is the intended port; some UI tests still rely on app state setup.
-  - **E**: Partial. Desktop has expected local environment dependencies.
-  - **R**: Partial. Shared UI is replaceable; host/runtime wiring is costlier.
-
-### Web Renderer
-
-- **Path**: `app/web/src/`
-- **Responsibility**: Web app shell, Hub session, Hub APIs, remote target routing, and shared workbench rendering.
-- **Complexity Rating**: High.
-- **Transformation Notes**: Web must remain Hub-only and must not silently fall back to mock in guarded Hub flows.
-- **S.U.P.E.R Assessment**:
-  - **S**: Partial. Auth/session, app shell, settings, and task wiring are broad.
-  - **U**: Partial. Correct direction is Web -> Hub -> shared UI; boundary tests exist and should be expanded only where useful.
-  - **P**: Partial. Hub stubs exist but need clearer evidence manifests.
-  - **E**: Partial. Browser-safe constraints are clear; env/session setup is still test-sensitive.
-  - **R**: Partial. Shared UI is replaceable; Hub adapter behavior needs stronger contracts.
-
-### Desktop/Web E2E
-
-- **Path**: `app/desktop/src/__e2e__/`, `app/web/src/__e2e__/`
-- **Responsibility**: Real browser validation for chat flow, routing, boundary, and optimistic send.
-- **Complexity Rating**: Medium.
-- **Transformation Notes**: Existing specs are valuable. Avoid adding tests that only check constants or duplicate implementation switches.
-- **S.U.P.E.R Assessment**:
-  - **S**: Partial. Some specs mix behavior checks, boundary assertions, and stub setup.
-  - **U**: Good. Browser drives app, app produces visible behavior and request logs.
-  - **P**: Partial. Uses shared contract but lacks a single acceptance manifest format.
-  - **E**: Good. Runs against local Vite surfaces.
-  - **R**: Partial. Stub setup is reusable but not yet a shared harness.
-
-### Visual QA Scripts
-
-- **Path**: `app/desktop/scripts/manual-chat-flow-check.mjs`, `app/web/scripts/manual-chat-flow-check.mjs`, `app/web/scripts/visual-qa.mjs`
-- **Responsibility**: Produce screenshots plus DOM geometry metrics for visual and interaction acceptance.
-- **Complexity Rating**: Medium.
-- **Transformation Notes**: Desktop/Web chat-flow scripts and broader Web visual QA are aligned on the `1440x810` desktop acceptance viewport by T1.2.
-- **S.U.P.E.R Assessment**:
-  - **S**: Partial. Scripts combine server startup, stubbing, actions, metrics, and report output.
-  - **U**: Good. Inputs are URL/stubs; outputs are screenshots/JSON.
-  - **P**: Partial. Reports are JSON but schema is informal.
-  - **E**: Partial. Local Vite assumptions are expected but should be explicit.
-  - **R**: Partial. Shared visual harness is not yet extracted.
-
-### Hub Server, Edge Server, API Contracts
-
-- **Paths**: `hub-server/`, `edge-server/`, `api/`
-- **Responsibility**: Hub collaboration/auth/event APIs, local execution/event store/adapters, REST and WS contracts.
-- **Complexity Rating**: Critical for Hub/Edge, Medium for `api/`.
-- **Transformation Notes**: This SPEC should touch backend code only where front-end E2E reveals contract mismatch. Approved-real and packaged claims need explicit gates.
-- **S.U.P.E.R Assessment**: Mixed partial across S/U/P/E/R because service scope is large and environment-bound, but API/event contracts provide the correct ports.
-
-### Verification Scripts
-
-- **Path**: `scripts/verify/`
-- **Responsibility**: Enforce doc SSOT, project skill whitelist, CI gate shape, real E2E contract, and readiness claims.
-- **Complexity Rating**: Medium.
-- **Transformation Notes**: Add only contract checks with real protection value. Do not create root-level script wrappers.
-- **S.U.P.E.R Assessment**:
-  - **S**: Partial. Some checks are broad but categorized under `scripts/verify/`.
-  - **U**: Good. Read-only checkers produce pass/fail.
-  - **P**: Partial. Some output is human text; manifests should be more schema-like.
-  - **E**: Partial. PowerShell is Windows-first but project operates on Windows.
-  - **R**: Partial. Checkers are replaceable if contracts remain stable.
diff --git a/docs/analysis/project-overview.md b/docs/analysis/project-overview.md
deleted file mode 100644
index 4208b8d8..00000000
--- a/docs/analysis/project-overview.md
+++ /dev/null
@@ -1,98 +0,0 @@
-# Real Foundation Hardening - Project Overview
-
-## Preliminary Direction
-
-Build a real, product-grade foundation for AgentHub Desktop/Web chat workflow, shared transcript rendering, Hub/Edge data boundaries, and E2E/Visual QA evidence. Mobile is boundary-only in this SPEC.
-
-## Current Architecture
-
-```mermaid
-flowchart LR
-    Desktop["Desktop shell"] --> DesktopAdapter["Desktop platform adapter"]
-    Web["Web shell"] --> WebAdapter["Web platform adapter"]
-    DesktopAdapter --> Shared["app/shared workbench + chatview"]
-    WebAdapter --> Shared
-    Shared --> Transcript["TranscriptBlock normalizers"]
-    DesktopAdapter --> Edge["Local Edge"]
-    WebAdapter --> Hub["Hub Server"]
-    Hub --> EdgeRelay["Edge routing / relay"]
-    Edge --> Runtime["Runtime adapters"]
-    EdgeRelay --> Runtime
-```
-
-The intended direction already exists in the code shape: Desktop and Web render through `app/shared`, while the shells provide platform adapters and data/session wiring. The unstable area is not a missing renderer; it is the product contract around real-time ordering, optimistic send, card grouping, visual QA, and honest evidence labels.
-
-## Technology Stack
-
-| Layer | Current | Target For This SPEC |
-|:--|:--|:--|
-| Shared UI | React 19, TypeScript, CSS Modules/tokens | Same, with stricter shared transcript contracts |
-| Desktop | Vite renderer, Tauri 2 shell, Playwright | Same; Vite evidence remains separate from packaged Desktop evidence |
-| Web | Vite, Hub-facing adapter, Playwright | Same; Web remains Hub-only |
-| Hub/Edge | Go services, REST JSON, typed WS events | Contract alignment only unless task touches handlers |
-| Test tooling | Vitest, Playwright, visual scripts, PowerShell verifiers | Focused E2E + Visual QA + evidence manifests |
-| Package manager | pnpm/Corepack per app | No package manager change |
-
-## Entry Points
-
-| Surface | Entry |
-|:--|:--|
-| Shared transcript | `app/shared/src/transcript/`, `app/shared/src/chatview/` |
-| Data-mode contract | `app/shared/src/testing/e2eDataModeContract.ts` |
-| Desktop UI | `app/desktop/src/App.tsx`, `app/desktop/src/platform/`, `app/desktop/src/__e2e__/` |
-| Web UI | `app/web/src/App.tsx`, `app/web/src/platform/`, `app/web/src/__e2e__/` |
-| Visual QA | `app/desktop/scripts/manual-chat-flow-check.mjs`, `app/web/scripts/manual-chat-flow-check.mjs`, `app/web/scripts/visual-qa.mjs` |
-| Hub/Edge contracts | `api/openapi.yaml`, `api/events.md`, `hub-server/`, `edge-server/` |
-
-## Build And Run
-
-| Scope | Existing Command |
-|:--|:--|
-| Shared tests | `corepack pnpm --dir app/shared test` |
-| Desktop tests | `corepack pnpm --dir app/desktop test`, `corepack pnpm --dir app/desktop typecheck` |
-| Desktop chat E2E | `corepack pnpm --dir app/desktop test:e2e:chat-flow` |
-| Desktop chat Visual QA | `corepack pnpm --dir app/desktop test:visual:chat-flow` |
-| Web type/build | `corepack.cmd pnpm --dir app/web typecheck`, `corepack.cmd pnpm --dir app/web build` |
-| Web chat E2E | `corepack.cmd pnpm --dir app/web test:e2e:chat-flow` |
-| Web stubbed Hub E2E | `corepack.cmd pnpm --dir app/web test:e2e:stubbed-hub` |
-| Web chat Visual QA | `corepack.cmd pnpm --dir app/web test:visual:chat-flow` |
-| Governance gates | `pwsh ./scripts/verify/verify-doc-ssot.ps1`, `pwsh ./scripts/verify/verify-real-e2e-contract.ps1`, `pwsh ./scripts/verify/verify-project-skills.ps1` |
-
-## Testing Baseline
-
-Useful existing coverage:
-
-- `app/shared/src/transcript/*test.ts` covers Hub/Edge normalization, ordering, runtime diagnostics, and evidence.
-- `app/shared/src/chatview/*test.tsx` covers adapter integration, markdown rendering, CSS contract, and auto-follow.
-- `app/desktop/src/__e2e__/chat-flow-ui.spec.ts` covers optimistic send, no flash/disappear, scroll follow, no debug labels, no overflow, and merged approval/preview card stack.
-- `app/web/src/__e2e__/chat-flow-contract.spec.ts` covers Hub-shaped replay, markdown table rendering, tool result ordering, inspector-only subagent details, optimistic send, and Web boundary assertions.
-- `app/shared/src/testing/e2eDataModeContract.ts` separates surface, data source, auth/execution, request phase, and `real_tested`.
-
-Current gaps:
-
-- T1.2 aligns the full Web visual QA scene naming/viewport with the `1440x810` architecture acceptance contract.
-- Visual QA is split between manual chat-flow scripts and broader Web visual QA; there is no project-level manifest that clearly records automated vs semi-automated evidence.
-- Existing tests protect key regressions, but the acceptance bundle is not yet a single reusable gate for "chat workflow is merge-ready".
-- Packaged Desktop sidecar/sqlite/icon/installer claims remain outside Vite Playwright and must stay separately gated.
-
-## Project Governance Baseline
-
-| Surface | Current Resolution |
-|:--|:--|
-| Shared agent rules | `AGENTS.md` is the only project rule entry |
-| Claude-specific rules | None; no separate Claude-only rule surface is active |
-| Current SPEC state | No active `docs/progress/MASTER.md` before this branch |
-| Durable project skill | `.agents/skills/real-e2e-acceptance/SKILL.md` |
-| Native memory | Codex memory is available for prior AgentHub worktree and approved-real guidance |
-| Repo fallback memory | None selected; do not create one silently |
-| GitHub mode | `GITHUB_STANDARD`: repo/issues work; Projects scope missing |
-
-## External Integrations
-
-| Integration | Boundary |
-|:--|:--|
-| Hub Server | Web/Desktop Hub session, IM messages, agent-task events |
-| Local Edge | Desktop local execution and health preflight only; Web cannot direct-call it |
-| TokenDance ID | Real login proof only when approved-real login gates run |
-| Runtime CLIs / model APIs | No real spend/CLI claim without explicit approved-real evidence |
-| Tauri packaged Desktop | Separate packaged-release evidence, not proven by Vite renderer E2E |
diff --git a/docs/analysis/risk-assessment.md b/docs/analysis/risk-assessment.md
deleted file mode 100644
index a33924fa..00000000
--- a/docs/analysis/risk-assessment.md
+++ /dev/null
@@ -1,77 +0,0 @@
-# Real Foundation Hardening - Risk Assessment
-
-## S.U.P.E.R Architecture Health Summary
-
-| Principle | Status | Key Findings | Transformation Priority |
-|:--|:--|:--|:--|
-| **S** Single Purpose | 🟡 | Shared renderer and visual scripts mix several concerns; tests can be clearer without splitting everything immediately. | High |
-| **U** Unidirectional Flow | 🟡 | Intended flow is documented and mostly implemented, but Desktop/Web adapters and E2E stubs can still blur source/runtime boundaries. | High |
-| **P** Ports over Implementation | 🟡 | `TranscriptBlock` and data-mode contracts are strong ports; Visual QA and acceptance manifests are still informal. | High |
-| **E** Environment-Agnostic | 🟡 | Vite tests are local and stable; packaged Desktop, real login, CLI/model/API paths require separate environment-specific gates. | Medium |
-| **R** Replaceable Parts | 🟡 | Shared UI reuse is real, but replacing Hub/Edge sources or Visual QA harness still has medium ripple cost. | Medium |
-
-**Overall Health**: 0/5 fully healthy at SPEC scope, but all principles are partially supported. This is refactoring-needed, not a rewrite.
-
-### S.U.P.E.R Violation Hotspots
-
-| Hotspot | Severity | Why It Matters |
-|:--|:--|:--|
-| Visual QA evidence split | High | Automated Playwright, manual visual scripts, and Web visual QA do not yet produce one honest acceptance bundle. |
-| Chat flow source/event merge | High | User sends, Hub messages, Edge runtime events, tool results, subagent reports, and inspector-only details must remain one linear product timeline. |
-| Evidence wording | High | Stubbed Hub, Vite renderer, readiness, approved-real, and packaged Desktop can be accidentally overclaimed. |
-| Web visual viewport drift | Medium | T1.2 keeps `app/web/scripts/visual-qa.mjs` aligned with the `1440x810` architecture contract and guards against stale active script references. |
-| Backend/frontend contract drift | Medium | Hub/Edge event shapes can change without immediately breaking a shared transcript golden fixture. |
-
-## Risk Matrix
-
-| Risk | Impact | Likelihood | Severity | Mitigation |
-|:--|:--|:--|:--|:--|
-| Main chat stream shows mock/debug/mode state | User-facing UI becomes noisy and misleading | Medium | High | Shared normalizer/render tests plus Playwright assertions that transcript excludes debug labels |
-| User message flashes or disappears during send/refetch | Core chat workflow feels broken | Medium | High | Optimistic send contract in shared unit + Desktop/Web E2E with mutation probe |
-| Tool calls/results/subagent reports render out of order | Users cannot trust agent activity | Medium | High | Golden event fixture tests and Web/Desktop E2E ordering assertions |
-| Visual QA produces screenshots but not behavioral proof | False confidence | Medium | High | Require DOM metrics and behavior assertions next to screenshots |
-| Stubbed Hub evidence is reported as real | Governance and release risk | Medium | High | Manifest must include `real_tested=false` and evidence level |
-| Web direct-calls Local Edge | Security/product boundary violation | Low/Medium | High | Reuse `e2eDataModeContract` in Web tests |
-| Desktop Vite E2E is treated as packaged Desktop proof | Release readiness overclaim | Medium | High | Separate packaged-release issue/gate; no packaged claim in chat-flow tasks |
-| Adding broad snapshot tests slows work without protection | Developer speed loss | Medium | Medium | Prefer targeted behavior/contract/geometry tests |
-
-## High-Severity Risks
-
-### Evidence Overclaim
-
-The project has many evidence levels. The most damaging failure is not a red test; it is a green stubbed test reported as real login, real model execution, or packaged Desktop behavior. Every task in this SPEC must label evidence level and `real_tested` honestly.
-
-### Timeline Integrity
-
-The user experience depends on an IM-like linear transcript. Any split between Hub messages, Edge runtime events, optimistic sends, tool cards, subagent reports, and inspector-only details can cause disappearing messages, wrong order, duplicated cards, or noisy internal state.
-
-### Visual QA Drift
-
-Visual QA must be half automated and half agent-inspected. Scripts should fail on measurable layout problems, and screenshots should be available for human/agent review. A screenshot alone is not acceptance.
-
-## Technical Debt
-
-- ChatView grouping and rendering are implemented in a few central files, so behavioral changes can have wide visual impact.
-- Visual QA scripts contain their own local stubs and reporting shape; useful but not yet a project-level evidence bundle.
-- Some broader Web visual QA scenes still include mobile-heavy coverage. Mobile is out of scope for detailed work in this SPEC.
-- Backend service directories are very large; this SPEC should avoid broad backend rewrites and only fix contract mismatches.
-
-## Testing Risks
-
-- Playwright tests can become slow or flaky if they start real services unnecessarily.
-- Tests that mirror implementation switches in `dataMode` would add little protection; E2E should assert observed requests and visible behavior.
-- Manual Visual QA scripts must not be treated as "manual only"; they need machine-failing metrics plus screenshot artifacts.
-- Packaged Desktop cannot be inferred from Vite renderer tests.
-
-## Project Governance Risks
-
-- Current rules correctly centralize in `AGENTS.md`; adding duplicate rules to roadmap/MASTER would recreate the prior doc mess.
-- Native memory is available, but this SPEC must not create a repo-local fallback memory file without explicit selection.
-- GitHub Projects are unavailable with current token scope, so this run must use `GITHUB_STANDARD`.
-
-## Compatibility Concerns
-
-- `dataMode` remains a compatibility field and must not be repurposed as the full truth source.
-- Existing scripts/verify layout must remain under `scripts/verify/`; no root wrapper scripts.
-- Current Desktop/Web app scripts should continue to work while any shared harness is introduced.
-- Mobile contract notes can be updated if needed, but no native/mobile UI rewrite belongs in this SPEC.
diff --git a/docs/architecture.md b/docs/architecture.md
index 726b153b..804e7e50 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,6 +1,6 @@
 # AgentHub 架构概览
 
-最后更新:2026-06-28
+最后更新:2026-06-29
 
 本文档是架构入口,只保留当前结构、边界和 owner 链接。旧长版架构说明见 [history.md](history.md)。
 
@@ -108,7 +108,7 @@ Conversation -> Message -> TranscriptBlock -> TranscriptItem -> RowItem / UserMs
 
 ## Acceptance Gates
 
-证据包入口只保留 owner 链接;证据等级矩阵和真实边界规则由 [.agents/skills/real-e2e-acceptance/SKILL.md](../.agents/skills/real-e2e-acceptance/SKILL.md) 维护,当前专项进度和验收记录见 [progress/MASTER.md](progress/MASTER.md)。
+证据包入口只保留 owner 链接;证据等级矩阵和真实边界规则由 [.agents/skills/real-e2e-acceptance/SKILL.md](../.agents/skills/real-e2e-acceptance/SKILL.md) 维护。已完成 SPEC 的验收记录只从 [history.md](history.md) 指向的外部归档追溯。
 
 | 变更 | 最低验收 |
 |---|---|
@@ -122,6 +122,6 @@ Conversation -> Message -> TranscriptBlock -> TranscriptItem -> RowItem / UserMs
 ## 文档权威
 
 - 当前规则:[../AGENTS.md](../AGENTS.md)
-- 当前 SPEC 进度:`docs/progress/MASTER.md`(仅当文件存在时有效)
+- 当前 SPEC 进度:执行中才临时存在 `docs/progress/MASTER.md`;完成后从 [history.md](history.md) 追溯
 - 总进度:[roadmap.md](roadmap.md)
 - 安全风险:[governance/security-risk-register.md](governance/security-risk-register.md)
diff --git a/docs/developer-quickstart.md b/docs/developer-quickstart.md
index 9758e702..1b4bf8f7 100644
--- a/docs/developer-quickstart.md
+++ b/docs/developer-quickstart.md
@@ -1,8 +1,8 @@
 # AgentHub 开发快速上手
 
-最后更新:2026-06-28
+最后更新:2026-06-29
 
-本文档只保留新人启动本地开发环境需要的最短路径。规则、分支、E2E 证据等级和发布门禁以 `AGENTS.md`、当前 `docs/progress/MASTER.md`(仅当存在时)、`docs/roadmap.md` 和 `.agents/skills/real-e2e-acceptance/SKILL.md` 为准。
+本文档只保留新人启动本地开发环境需要的最短路径。规则、分支、E2E 证据等级和发布门禁以 `AGENTS.md`、执行中的 `docs/progress/MASTER.md`(仅当存在时)、`docs/roadmap.md` 和 `.agents/skills/real-e2e-acceptance/SKILL.md` 为准。
 
 ## 前置条件
 
diff --git a/docs/history.md b/docs/history.md
index eb638de1..7f4f23e8 100644
--- a/docs/history.md
+++ b/docs/history.md
@@ -1,6 +1,6 @@
 # AgentHub History
 
-最后更新:2026-06-28
+最后更新:2026-06-29
 
 本文件是 AgentHub 源仓的历史材料索引。历史 longform、日期型审计、旧发布材料、过期设计、参考调研、完成的 spec-driven 工件和过期项目 skill 不再保存在 AgentHub active source tree。
 
@@ -17,6 +17,8 @@
 | Root evidence archive commit | `bc774192` merge commit; source archive commit `6cb00e9` |
 | Repo structure SPEC archive PR | TokenDanceLab/docs#4 |
 | Repo structure SPEC archive commit | `b7c6478d` merge commit; source archive commit `b845480` |
+| Real Foundation Hardening SPEC archive PR | TokenDanceLab/docs#5 |
+| Real Foundation Hardening SPEC archive commit | `c6bd127` merge commit; source archive commit `f8206d2` |
 | Archive root | `archive/agenthub/` |
 
 ## Migrated Paths
@@ -29,6 +31,7 @@
 | archived project skills | `archive/agenthub/repo/docs/archives/project-skills/` |
 | `css-audit-results.json` | `archive/agenthub/repo/root-evidence/css-audit-results.json` |
 | repo structure cleanup SPEC | `archive/agenthub/repo/specs/repo-structure-doc-tooling-cleanup/` |
+| real foundation hardening SPEC | `archive/agenthub/repo/specs/real-foundation-hardening/` |
 
 ## Rules
 
diff --git a/docs/plan/dependency-graph.md b/docs/plan/dependency-graph.md
deleted file mode 100644
index 7179580d..00000000
--- a/docs/plan/dependency-graph.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# Real Foundation Hardening - Dependency Graph
-
-```mermaid
-graph TD
-    subgraph P1["Phase 1: Evidence Contract Foundation"]
-        T11["T1.1 Evidence manifest contract"]
-        T12["T1.2 Visual QA viewport/report"]
-        T13["T1.3 Data-mode boundary reuse"]
-        T14["T1.4 Evidence docs without duplication"]
-        T11 --> T12
-        T11 --> T13
-        T11 --> T14
-    end
-
-    subgraph P2["Phase 2: Shared Chat Timeline Hardening"]
-        T21["T2.1 Mixed-source golden fixtures"]
-        T22["T2.2 Optimistic send and auto-follow"]
-        T23["T2.3 Card grouping and radii"]
-        T24["T2.4 Markdown and debug filtering"]
-        T13 --> T21
-        T21 --> T22
-        T21 --> T23
-        T21 --> T24
-    end
-
-    subgraph P3["Phase 3: Desktop/Web Boundary And Backend Truth"]
-        T31["T3.1 Web Hub-only check"]
-        T32["T3.2 Desktop phase split"]
-        T33["T3.3 Observed/approved-real manifest boundary"]
-        T22 --> T31
-        T13 --> T32
-        T11 --> T33
-    end
-
-    subgraph P4["Phase 4: Real E2E And Visual QA Closure"]
-        T41["T4.1 Chat acceptance gate"]
-        T42["T4.2 Semi-automated Visual QA artifact loop"]
-        T43["T4.3 Packaged Desktop claim separation"]
-        T22 --> T41
-        T23 --> T41
-        T31 --> T41
-        T32 --> T41
-        T12 --> T42
-        T41 --> T42
-        T41 --> T43
-    end
-
-    subgraph P5["Phase 5: Acceptance, Merge, Archive"]
-        T51["T5.1 Final acceptance matrix"]
-        T52["T5.2 Merge readiness and archive"]
-        T41 --> T51
-        T42 --> T51
-        T43 --> T51
-        T51 --> T52
-    end
-```
-
-## Parallel Lane Notes
-
-- Phase 1 can split evidence manifest/data-boundary work from docs, but Visual QA report work depends on the manifest shape.
-- Phase 2 can split optimistic send/scroll from card/markdown rendering after golden fixtures exist.
-- Phase 3 should run Web, Desktop, and manifest boundary work in separate worktrees to reduce conflicts.
-- Phase 4 should keep acceptance-gate scripting separate from the semi-automated Visual QA artifact loop.
diff --git a/docs/plan/milestones.md b/docs/plan/milestones.md
deleted file mode 100644
index 6b526a03..00000000
--- a/docs/plan/milestones.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# Real Foundation Hardening - Milestones
-
-| # | Milestone | Target Phase | Criteria | Status |
-|:--|:--|:--|:--|:--|
-| 1 | Phase 1: Evidence Contract Foundation | After Phase 1 | Evidence manifest, data-mode phase boundary, Visual QA viewport/report contract, and concise docs are in place | Pending |
-| 2 | Phase 2: Shared Chat Timeline Hardening | After Phase 2 | Shared transcript/chatview golden tests prove send, ordering, card grouping, markdown/table, debug filtering, and auto-follow | Pending |
-| 3 | Phase 3: Desktop/Web Boundary And Backend Truth | After Phase 3 | Web Hub-only, Desktop entry/workbench phase split, and observed/approved-real boundaries are enforced | Pending |
-| 4 | Phase 4: Real E2E And Visual QA Closure | After Phase 4 | One acceptance bundle runs automated Playwright plus Visual QA artifacts and keeps packaged Desktop claims separate | Pending |
-| 5 | Phase 5: Acceptance, Merge, Archive | After Phase 5 | Final gates pass, PRs merge through `dev/delicious233`, and SPEC artifacts are archived externally per project rules | Pending |
-
-## Milestone Drift Strategy
-
-Each GitHub milestone stores adaptive state in its description. Thresholds use the spec-driven defaults: annotate at 20%, replan at 40%, and rescope at 60% of phase task count.
-
-## Non-Goals
-
-- No Mobile UI/native deep work.
-- No real login, real model/API spend, production deploy, packaged Desktop, signing, notarization, updater, or release upload unless an approved-real/package gate is explicitly opened.
-- No broad visual redesign outside defects proven by the acceptance tests.
diff --git a/docs/plan/task-breakdown.md b/docs/plan/task-breakdown.md
deleted file mode 100644
index 168b234c..00000000
--- a/docs/plan/task-breakdown.md
+++ /dev/null
@@ -1,163 +0,0 @@
-# Real Foundation Hardening - Task Breakdown
-
-## Confirmed Task Definition
-
-Establish a clean, real engineering foundation for Desktop/Web AgentHub chat: shared transcript correctness, optimistic send, linear ordering, card grouping, markdown/table rendering, clean UI without debug/mock pollution, honest data/execution boundaries, and a useful E2E + Visual QA acceptance loop. Mobile remains boundary-only.
-
-## Overview
-
-- **Total Phases**: 5
-- **Total Tasks**: 16
-- **Estimated Total Effort**: XL
-- **Tracking Mode**: GITHUB_STANDARD
-
-## S.U.P.E.R Design Constraints
-
-- **S**: Keep transcript normalization, rendering, E2E stubbing, Visual QA metrics, and evidence packaging as separate responsibilities.
-- **U**: Preserve source -> normalizer -> `TranscriptBlock[]` -> ChatView -> DOM. UI must not reach into Hub/Edge internals.
-- **P**: New cross-layer behavior needs typed/serializable contracts or manifest fields before implementation.
-- **E**: Vite, stubbed Hub, observed local, approved-real, and packaged Desktop are different environments and must not share claims.
-- **R**: Prefer shared helpers and fixtures that Desktop/Web can reuse without forking components.
-
-## Testing And Governance Constraints
-
-- Chat workflow changes require shared Vitest plus Desktop/Web Playwright.
-- Visual changes require Visual QA metrics and screenshots; screenshots alone are not acceptance.
-- Stubbed/fixture/readiness gates must write `real_tested=false`.
-- No debug/mode/mock labels may be added to main transcript bubbles/cards.
-- No root script wrappers; scripts remain under `scripts/verify/`, `scripts/dev/`, `scripts/release/`, `scripts/smoke/`, or app-local scripts.
-- Durable future-agent rules go to `AGENTS.md` only; current progress goes to `docs/progress/MASTER.md`.
-
-## Phase 1: Evidence Contract Foundation
-
-**Goal**: Make the acceptance target machine-honest before changing product behavior.
-**Prerequisite**: SPEC analysis accepted.
-**S.U.P.E.R Focus**: P, U, E.
-
-| # | Task | Priority | Effort | Depends On | Lane | S.U.P.E.R | Test Expectation | Memory Impact | Acceptance Criteria |
-|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
-| T1.1 | Define chat-flow evidence manifest contract | P0 | M | - | A | P,E | Add/update unit tests for manifest validation | Update memory only if a durable evidence invariant emerges | Manifest records surface, evidence level, data source, auth/execution, `real_tested`, screenshots, metrics, and commands |
-| T1.2 | Align Visual QA viewports and report shape | P0 | M | T1.1 | A | P,E,R | Update app visual scripts and verifier tests | None unless command changes become stable rules | Desktop/Web chat Visual QA use 1440x810; stale 1440x920 active references removed or justified |
-| T1.3 | Reuse data-mode boundary helper in acceptance gates | P0 | S | T1.1 | B | U,P,R | Shared/unit tests plus affected Playwright assertions | None | E2E request assertions are phase-aware and do not duplicate mode switch logic |
-| T1.4 | Document the evidence bundle without rule duplication | P1 | S | T1.1 | B | S,P | Docs-only; run doc SSOT and real-e2e contract verifiers | None | `docs/architecture.md`/`docs/roadmap.md` link to the bundle; rules remain in `AGENTS.md` and skill |
-
-### Parallel Lanes
-
-| Lane | Tasks | Combined Effort | Merge Risk | Key Files |
-|:--|:--|:--|:--|:--|
-| A | T1.1, T1.2 | L | Medium | `app/*/scripts/*chat-flow*`, `scripts/verify/*`, shared testing files |
-| B | T1.3, T1.4 | M | Low | `app/shared/src/testing/`, docs |
-
-## Phase 2: Shared Chat Timeline Hardening
-
-**Goal**: Fix the user-visible chat flow contract at the shared layer first.
-**Prerequisite**: Phase 1 manifest/Visual QA contract.
-**S.U.P.E.R Focus**: S, U, P, R.
-
-| # | Task | Priority | Effort | Depends On | Lane | S.U.P.E.R | Test Expectation | Memory Impact | Acceptance Criteria |
-|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
-| T2.1 | Add golden mixed-source transcript fixtures | P0 | M | T1.3 | A | U,P,R | Shared Vitest golden tests | None | Hub user message, Edge tool call/result, agent reply, subagent/inspector detail, markdown table order is deterministic |
-| T2.2 | Harden optimistic send and auto-follow contract | P0 | M | T2.1 | A | U,R | Shared auto-scroll tests plus Desktop/Web E2E | None | User send appears immediately, never flashes away, and scroll follows submit without stealing scrollback |
-| T2.3 | Harden card grouping and rounded-stack rules | P0 | M | T2.1 | B | S,R | Shared render/CSS tests plus Desktop Playwright geometry | None | Consecutive related cards merge visually; inner radii collapse; unrelated cards stay distinct |
-| T2.4 | Keep markdown/table rendering and debug filtering clean | P0 | S | T2.1 | B | S,P | Shared render tests plus Web Playwright | None | Markdown tables render; mode/mock/runtime diagnostics do not appear in transcript bubbles |
-
-### Parallel Lanes
-
-| Lane | Tasks | Combined Effort | Merge Risk | Key Files |
-|:--|:--|:--|:--|:--|
-| A | T2.1, T2.2 | L | Medium | `app/shared/src/transcript/`, `app/shared/src/chatview/`, app E2E |
-| B | T2.3, T2.4 | M | Medium | ChatView components/CSS/tests |
-
-## Phase 3: Desktop/Web Boundary And Backend Truth
-
-**Goal**: Keep product mode, data source, auth, and execution truth separate across Desktop/Web.
-**Prerequisite**: Phase 2 shared timeline stable.
-**S.U.P.E.R Focus**: U, P, E.
-
-| # | Task | Priority | Effort | Depends On | Lane | S.U.P.E.R | Test Expectation | Memory Impact | Acceptance Criteria |
-|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
-| T3.1 | Web Hub-only guarded-flow check | P0 | M | T2.2 | A | U,E | Web Playwright + data-mode contract | None | Web never direct-calls Local Edge and does not silently fall back to mock after Hub guard |
-| T3.2 | Desktop entry-preflight vs workbench-runtime split | P0 | M | T1.3 | B | U,E | Desktop Playwright request-phase assertions | None | Desktop may probe Local Edge health on entry, but Demo workbench performs no Hub/Edge runtime requests |
-| T3.3 | Observed/approved-real manifest boundary | P1 | M | T1.1 | C | P,E | Verifier tests and manifest fixture tests | Update `AGENTS.md` only if an agent-facing rule changes | Stubbed and readiness manifests cannot claim real login, CLI/model/API, packaged Desktop, or release |
-
-### Parallel Lanes
-
-| Lane | Tasks | Combined Effort | Merge Risk | Key Files |
-|:--|:--|:--|:--|:--|
-| A | T3.1 | M | Low | `app/web/src/__e2e__/`, Web adapter |
-| B | T3.2 | M | Low | `app/desktop/src/__e2e__/`, Desktop adapter |
-| C | T3.3 | M | Medium | `scripts/verify/`, tests/contracts |
-
-## Phase 4: Real E2E And Visual QA Closure
-
-**Goal**: Turn tests into a repeatable acceptance loop for agents and CI.
-**Prerequisite**: Phase 3 boundaries stable.
-**S.U.P.E.R Focus**: S, P, E, R.
-
-| # | Task | Priority | Effort | Depends On | Lane | S.U.P.E.R | Test Expectation | Memory Impact | Acceptance Criteria |
-|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
-| T4.1 | Add focused chat acceptance gate | P0 | M | T2.2,T2.3,T3.1,T3.2 | A | S,P,R | Script/verifier tests plus actual Desktop/Web commands | None | One command/report lists shared unit, Desktop/Web Playwright, Visual QA, and evidence boundaries |
-| T4.2 | Add semi-automated Visual QA artifact loop | P0 | M | T1.2,T4.1 | B | P,E | Visual QA script output validation | None | Agent can inspect screenshot + JSON metrics; pass/fail is machine readable |
-| T4.3 | Keep packaged Desktop claim separate | P1 | S | T4.1 | C | E,P | Verifier/docs checks | None | Acceptance output says Vite renderer is not packaged Desktop; packaged-release remains a separate gate |
-
-### Parallel Lanes
-
-| Lane | Tasks | Combined Effort | Merge Risk | Key Files |
-|:--|:--|:--|:--|:--|
-| A | T4.1 | M | Medium | `scripts/verify/`, package scripts |
-| B | T4.2 | M | Medium | visual scripts, artifacts |
-| C | T4.3 | S | Low | docs/verifiers |
-
-## Phase 5: Acceptance, Merge, Archive
-
-**Goal**: Prove the foundation and merge without leaving active SPEC clutter.
-**Prerequisite**: Phases 1-4 complete.
-**S.U.P.E.R Focus**: P, E.
-
-| # | Task | Priority | Effort | Depends On | Lane | S.U.P.E.R | Test Expectation | Memory Impact | Acceptance Criteria |
-|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
-| T5.1 | Run final acceptance matrix | P0 | L | T4.1,T4.2,T4.3 | A | P,E | Full targeted gates with evidence summary | Record durable gotchas in native memory if discovered | All required commands pass or failures are explicitly scoped and fixed |
-| T5.2 | Merge readiness and archive SPEC | P0 | M | T5.1 | A | S,P | Doc SSOT, real-e2e contract, project skills, diff check | None | PRs merged to `dev/delicious233`, promoted to `master` if approved, SPEC artifacts archived via `docs/history.md` external archive |
-
-### Parallel Lanes
-
-| Lane | Tasks | Combined Effort | Merge Risk | Key Files |
-|:--|:--|:--|:--|:--|
-| A | T5.1, T5.2 | L | Medium | all touched surfaces |
-
-## Required Acceptance Commands
-
-Minimum per implementation PR:
-
-```powershell
-git diff --check
-pwsh ./scripts/verify/verify-doc-ssot.ps1
-pwsh ./scripts/verify/verify-project-skills.ps1
-pwsh ./scripts/verify/verify-real-e2e-contract.ps1
-```
-
-When chat/UI behavior changes:
-
-```powershell
-corepack pnpm --dir app/shared test
-corepack pnpm --dir app/desktop test:e2e:chat-flow
-corepack pnpm --dir app/desktop test:visual:chat-flow
-corepack.cmd pnpm --dir app/web test:e2e:chat-flow
-corepack.cmd pnpm --dir app/web test:visual:chat-flow
-```
-
-When Web Hub boundary changes:
-
-```powershell
-corepack.cmd pnpm --dir app/web test:e2e:stubbed-hub
-pwsh ./scripts/verify/verify-web-hub-boundary.ps1
-```
-
-When Hub/Edge code changes:
-
-```powershell
-cd edge-server; go test ./... -short -count=1
-cd ../hub-server; go test ./... -short -count=1
-```
-
-Approved-real, packaged Desktop, signing, release upload, and production deploy are excluded unless explicitly approved in a task.
diff --git a/docs/progress/MASTER.md b/docs/progress/MASTER.md
deleted file mode 100644
index 6e4a77df..00000000
--- a/docs/progress/MASTER.md
+++ /dev/null
@@ -1,123 +0,0 @@
-# Real Foundation Hardening - Progress Tracker
-
-> **Task**: Desktop/Web chat workflow, shared transcript, data-boundary, and real E2E/Visual QA foundation hardening
-> **Started**: 2026-06-28
-> **Last Updated**: 2026-06-29
-> **Mode**: GITHUB_STANDARD
-> **Repo**: TokenDanceLab/AgentHub
-
-## GitHub Resources
-
-- **All Issues**: `gh issue list -R TokenDanceLab/AgentHub --label "spec-driven" --state all`
-- **Project Board**: Not created; current `gh` token lacks `read:project`/Project scope.
-
-## References
-
-- [Project Overview](../analysis/project-overview.md)
-- [Module Inventory](../analysis/module-inventory.md)
-- [Risk Assessment](../analysis/risk-assessment.md)
-- [Task Breakdown](../plan/task-breakdown.md)
-- [Dependency Graph](../plan/dependency-graph.md)
-- [Milestones](../plan/milestones.md)
-
-## Milestones
-
-| Phase | Name | Milestone URL | Open | Closed | Total |
-|:--|:--|:--|--:|--:|--:|
-| 1 | Evidence Contract Foundation | https://github.com/TokenDanceLab/AgentHub/milestone/17 | 0 | 4 | 4 |
-| 2 | Shared Chat Timeline Hardening | https://github.com/TokenDanceLab/AgentHub/milestone/18 | 0 | 4 | 4 |
-| 3 | Desktop/Web Boundary And Backend Truth | https://github.com/TokenDanceLab/AgentHub/milestone/19 | 0 | 3 | 3 |
-| 4 | Real E2E And Visual QA Closure | https://github.com/TokenDanceLab/AgentHub/milestone/20 | 0 | 3 | 3 |
-| 5 | Acceptance, Merge, Archive | https://github.com/TokenDanceLab/AgentHub/milestone/21 | 2 | 0 | 2 |
-
-## Issue Mapping
-
-| Task ID | Issue | Title | Status |
-|:--|:--|:--|:--|
-| T1.1 | #378 | Define chat-flow evidence manifest contract | closed via #395 |
-| T1.2 | #379 | Align Visual QA viewports and report shape | closed via #396 |
-| T1.3 | #380 | Reuse data-mode boundary helper in acceptance gates | closed via #397 |
-| T1.4 | #381 | Document evidence bundle without rule duplication | closed via #399 |
-| T2.1 | #382 | Add golden mixed-source transcript fixtures | closed via #401 |
-| T2.2 | #383 | Harden optimistic send and auto-follow contract | closed via #402 |
-| T2.3 | #384 | Harden card grouping and rounded-stack rules | closed via #404 |
-| T2.4 | #385 | Keep markdown/table rendering and debug filtering clean | closed via #405 |
-| T3.1 | #386 | Web Hub-only guarded-flow check | closed via #406 |
-| T3.2 | #387 | Desktop entry-preflight vs workbench-runtime split | closed via #408 |
-| T3.3 | #388 | Observed and approved-real manifest boundary | closed via #410 |
-| T4.1 | #389 | Add focused chat acceptance gate | closed via #412 |
-| T4.2 | #390 | Add semi-automated Visual QA artifact loop | closed via #414 |
-| T4.3 | #391 | Keep packaged Desktop claim separate | closed via #416 |
-| T5.1 | #392 | Run final acceptance matrix | open |
-| T5.2 | #393 | Merge readiness and archive SPEC | open |
-
-## Quick Status Commands
-
-```powershell
-gh api repos/TokenDanceLab/AgentHub/milestones --jq '.[] | select(.title | startswith("Phase ")) | "\(.title): \(.open_issues) open, \(.closed_issues) closed"'
-gh issue list -R TokenDanceLab/AgentHub --milestone "Phase 1: Evidence Contract Foundation" --state open --json number,title
-gh issue list -R TokenDanceLab/AgentHub --label "spec-driven" --state all --json number,title,state,milestone
-```
-
-## Phase Checklist
-
-- [x] Phase 1: Evidence Contract Foundation (4/4 tasks) - [milestone](https://github.com/TokenDanceLab/AgentHub/milestone/17)
-- [x] Phase 2: Shared Chat Timeline Hardening (4/4 tasks) - [milestone](https://github.com/TokenDanceLab/AgentHub/milestone/18)
-- [x] Phase 3: Desktop/Web Boundary And Backend Truth (3/3 tasks) - [milestone](https://github.com/TokenDanceLab/AgentHub/milestone/19)
-- [x] Phase 4: Real E2E And Visual QA Closure (3/3 tasks) - [milestone](https://github.com/TokenDanceLab/AgentHub/milestone/20)
-- [ ] Phase 5: Acceptance, Merge, Archive (0/2 tasks) - [milestone](https://github.com/TokenDanceLab/AgentHub/milestone/21)
-
-## Current Status
-
-**Active Phase**: Phase 5
-**Active Task**: T5.1 - Run final acceptance matrix (#392)
-**Blockers**: None for execution. Phase 4 closed at 3/3 with drift_score 1. GitHub Project board requires refreshed `project` scope and is intentionally skipped.
-
-## Governance Status
-
-**Shared instruction surface**: `AGENTS.md`
-**Claude Code instruction surface**: unavailable; no separate Claude-only rule surface is active
-**Other platform rule surfaces**: none active for this SPEC
-**Memory surface**: native Codex memory available; no repo fallback selected
-**Memory fallback path**: none
-**Project skill**: `.agents/skills/real-e2e-acceptance/SKILL.md`
-
-## Execution Telemetry
-
-Per-task telemetry is stored in GitHub issue comments before task closure. Adaptive drift state is stored in each milestone description.
-
-## Next Steps
-
-1. Start T5.1 (#392) in a separate task worktree from `origin/dev/delicious233`.
-2. Run the final acceptance matrix and keep each evidence level honest.
-3. Preserve `real_tested=false` boundaries unless an approved-real or packaged-release run is explicitly approved and executed.
-4. After T5.1, run T5.2 (#393) for merge readiness and external SPEC archive.
-
-## Session Log
-
-| Date | Session | Summary |
-|:--|:--|:--|
-| 2026-06-28 | spec setup | Created analysis docs, plan docs, GitHub milestones #17-#21, and task issues #378-#393 from branch `spec/real-foundation-hardening`. |
-| 2026-06-28 | T1.1 implementation | Added shared chat-flow evidence manifest contract and tests; full shared Vitest passed, targeted TypeScript passed, broad `app/shared lint` remains blocked by pre-existing story/test type debt unrelated to this task. |
-| 2026-06-28 | T1.2 implementation | Aligned Web Visual QA desktop scenes to `1440x810`, added screenshot/DOM-metrics report output, tightened the real-e2e verifier, and generated a failing Visual QA report that now records the remaining shell brand-image assertion. |
-| 2026-06-28 | T1.3 implementation | Added shared E2E request-decision helper and reused it from Desktop/Web Playwright boundary gates; Desktop chat-flow and Web stubbed Hub E2E passed with `real_tested=false` boundaries. |
-| 2026-06-28 | adaptive replan sync | #380 merged via #397; Phase 1 drift reached the replan threshold, so #381 must stay as a narrow docs-only closure task before Phase 1 completes. |
-| 2026-06-28 | T1.4 implementation | Linked the evidence bundle from architecture and roadmap without duplicating the real-e2e skill or AGENTS rules; scope remained docs-only. |
-| 2026-06-28 | Phase 1 complete | #381 merged via #399; milestone #17 closed with 4/4 tasks complete and Phase 2 is now active. |
-| 2026-06-28 | T2.1 implementation | Added shared golden mixed-source transcript fixtures and switched Web chat-flow E2E to consume the shared fixture; #401 merged. |
-| 2026-06-28 | T2.2 implementation | Tightened optimistic send auto-follow so pending-to-confirmed reconciliation does not steal scrollback; Desktop/Web chat-flow E2E and Visual QA passed with `real_tested=false`; #402 merged. |
-| 2026-06-28 | T2.3 implementation | Split related card stacks from unrelated consecutive cards, removed nested preview-card framing, and verified Desktop/Web chat-flow geometry with `real_tested=false`; #404 merged and #384 closed manually because non-default base did not auto-close it. |
-| 2026-06-29 | T2.4 implementation | Expanded shared runtime diagnostic filtering, kept Markdown table rendering under shared/Web tests, verified Desktop/Web chat-flow and Visual QA with `real_tested=false`; #405 merged and #385 closed manually because non-default base did not auto-close it. |
-| 2026-06-29 | T3.1 implementation | Enforced Web guarded Hub-only data boundary, added stubbed-Hub manifest `evidence_level`, blocked Local Edge/TDI/Gateway boundary attempts in Playwright, and verified Web/shared gates with `real_tested=false`; #406 merged and #386 closed manually because non-default base did not auto-close it. |
-| 2026-06-29 | Phase 3 sync | Updated Phase 3 live state after #406: milestone #19 is 1/3 complete, T3.2 (#387) is the active next task, and adaptive milestone `completed_tasks` is 1. |
-| 2026-06-29 | T3.2 implementation | Added Desktop entry/workbench data-boundary Playwright coverage, moved chat-flow phase marking to the Demo transition boundary, hardened disabled health polling against in-flight updates, and verified Desktop Vite gates with `real_tested=false`; #408 merged and #387 closed manually because non-default base did not auto-close it. |
-| 2026-06-29 | Phase 3 sync | Updated Phase 3 live state after #408: milestone #19 is 2/3 complete, adaptive drift_score is 1, #388 has a drift warning, and T3.3 (#388) is the active next task. |
-| 2026-06-29 | T3.3 implementation | Hardened observed/approved-real manifest boundaries, kept packaged-release claims separate, aligned smoke-matrix contract checks with current stubbed-Hub replay names, and verified shared/contract gates with `real_tested=false`; PR pending. |
-| 2026-06-29 | Phase 3 complete | #388 merged via #410 and closed manually because non-default base did not auto-close it; milestone #19 is closed at 3/3 with adaptive drift_score 2, so Phase 4 requires a lightweight checkpoint before T4.1. |
-| 2026-06-29 | T4.1 implementation | Added a Node-based focused chat acceptance bundle for shared unit, Desktop/Web Playwright, and Desktop/Web Visual QA; package entry passed with `real_tested=false`; PR pending. |
-| 2026-06-29 | Phase 4 sync | #389 merged via #412 and closed manually because non-default base did not auto-close it; milestone #20 is 1/3 complete with drift_score 1, and #390 is the active next task with a drift warning. |
-| 2026-06-29 | T4.2 implementation | Added Desktop/Web Visual QA metrics and report artifacts, locked their paths in the chat acceptance manifest, and verified the full chat acceptance bundle with `real_tested=false`; PR pending. |
-| 2026-06-29 | Phase 4 sync | #390 merged via #414 and closed manually because non-default base did not auto-close it; milestone #20 is 2/3 complete with drift_score 1, and #391 is the active next task. |
-| 2026-06-29 | T4.3 implementation | Added an explicit skipped `packaged-release` boundary row to chat acceptance, kept packaged Desktop proof opt-in, and verified docs/real-e2e/Tauri readiness contracts; PR pending. |
-| 2026-06-29 | Phase 4 complete | #391 merged via #416 and closed manually because non-default base did not auto-close it; milestone #20 is closed at 3/3 with drift_score 1, and Phase 5 / #392 is active. |
-| 2026-06-29 | T5.1 final matrix | Ran final chat acceptance, Visual QA artifact contracts, governance verifiers, and Tauri readiness contract; chat acceptance passed with only the opt-in `packaged-release` row skipped and `real_tested=false`; PR pending. |
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 74856522..00a4f390 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -17,16 +17,7 @@ AgentHub is an IM-style multi-Agent collaboration workbench. What users face is contacts

 ## Current SPEC

-The current active spec-driven initiative is Real Foundation Hardening; see [progress/MASTER.md](progress/MASTER.md) for progress. It covers the foundational closure of the Desktop/Web chat workflow, shared transcript, data-boundary, and real E2E/Visual QA foundation; the evidence-level matrix is still maintained only in [.agents/skills/real-e2e-acceptance/SKILL.md](../.agents/skills/real-e2e-acceptance/SKILL.md). For the archive of the completed Repo Structure Doc Tooling Cleanup, see [history.md](history.md).
-
-Current initiative status:
-
-| Phase | Status | Notes |
-|---|---|---|
-| Phase 1 Evidence Contract Foundation | Done | evidence manifest, Visual QA viewport/report, data-mode boundary helper, and docs-only evidence bundle owner links have been merged |
-| Phase 2 Shared Chat Timeline Hardening | Done | shared transcript fixtures, optimistic send/auto-follow, card grouping, and markdown/table/debug filtering have been merged |
-| Phase 3 Desktop/Web Boundary And Backend Truth | In progress | Web Hub-only guarded flow has been merged; Desktop entry/workbench split and observed/approved-real boundary are tracked in [progress/MASTER.md](progress/MASTER.md) |
-| Phase 4-5 | Pending | real E2E/Visual QA closure, final acceptance, merge readiness, and archive are still tracked in [progress/MASTER.md](progress/MASTER.md) |
+There is currently no active spec-driven initiative. Real Foundation Hardening is complete and externally archived; it closed out the Desktop/Web chat workflow, shared transcript, data-boundary, and real E2E/Visual QA foundation. See [history.md](history.md) for the archive entry. The evidence-level matrix is still maintained only in [.agents/skills/real-e2e-acceptance/SKILL.md](../.agents/skills/real-e2e-acceptance/SKILL.md).

 Recently completed:

@@ -37,6 +28,7 @@ AgentHub is an IM-style multi-Agent collaboration workbench. What users face is contacts
 | Phase 3 Source And Test Alignment | Done | API/Hub, module READMEs, progress SSOT, chat flow, frontend architecture, backend performance/leaks, Desktop packaged evidence, active docs, and Web/Mobile client lanes have been aligned |
 | Phase 4 Acceptance And Merge Readiness | Done | Aggregate acceptance, architecture approval, archiving, and merge preparation; the next round of repository structure cleanup must open a separate SPEC |
 | Repo Structure Doc Tooling Cleanup | Done | History archive, ADR summaries, root evidence, scripts/tests layering, root-level wrapper removal, external SPEC archive |
+| Real Foundation Hardening | Done | Chat-flow evidence manifest, Desktop/Web Playwright, Visual QA, data-boundary, observed/approved-real manifest boundary, packaged-release skipped boundary, and the final acceptance matrix have been closed out and archived |

 ## Current Priorities