feat(e2e): scenario-based setup matrix + Phase 1 migration infrastructure by jyaunches · Pull Request #3363 · NVIDIA/NemoClaw

jyaunches · 2026-05-11T21:15:20Z

Summary

Supersedes #3290. Introduces declarative scenario-based E2E orchestration and the Phase 1 pre-flight infrastructure that prepares for wave-by-wave migration of the existing test/e2e/test-*.sh scripts into the new structure. The old legacy scripts remain in place and unchanged; no E2E workflow is deleted or muted by this PR.

The scope grew beyond the original #3290 title after rebase against main and the addition of Phase 1 helpers, fixtures, parity harness, and convention lint. Because upstream rules forbid force-push on feature branches, this is opened as a fresh PR with clean linear history rather than rewriting #3290 in place.

Related Issue

Supersedes #3290. No single tracking issue — foundation for the E2E scenario-matrix refactor discussed across #2737, #3154, and the E2E test-gap audit.

Changes

Part 1 — scenario-based E2E orchestration (commits 1–3, previously in #3290)

Declarative metadata (test/e2e/scenarios.yaml, expected-states.yaml, suites.yaml) for initial scenarios: Ubuntu cloud OpenClaw/Hermes, macOS, WSL, GPU local Ollama, Brev launchable, no-Docker negative.
TypeScript resolver/validator/plan-printer/coverage report under test/e2e/resolver/ (invoked via tsx, uses js-yaml already in root package.json; unit-tested in the Vitest cli project).
Shell helpers under test/e2e/lib/ that produce a normalized context at $E2E_CONTEXT_DIR (default .e2e/). Wraps existing sandbox-teardown.sh and install-path-refresh.sh; does not duplicate them.
Entry scripts: run-scenario.sh, run-suites.sh, coverage-report.sh with --plan-only, --dry-run (E2E_DRY_RUN=1) flags.
Suite scripts under test/e2e/suites/{smoke,inference,credentials,lifecycle,messaging,onboarding,platform,sandbox,security}/.
New workflow .github/workflows/e2e-scenarios.yaml — manual (workflow_dispatch) with scenario, plan_only, and suite_filter inputs. Existing workflows are unchanged.
.gitignore: add .e2e/ runtime context directory.

Part 2 — Phase 1 pre-flight migration infrastructure (commits 4–7, new)

test/e2e/MIGRATION.md — in-tree tracker mapping each legacy test-*.sh script to its target scenario + suite home. Includes the reuse-absorption table showing where pair-variant coverage (e.g. OpenClaw + Hermes) collapses into a single suite.
Fixtures (test/e2e/lib/fixtures/): fake-openai.sh, fake-{telegram,discord,slack}.sh (built on shared _fake-http-stub.sh), older-base-image.sh. Follow the inline-Node http.createServer pattern from test-messaging-providers.sh; no new framework.
Helpers (test/e2e/lib/): logging.sh (PASS/FAIL/=== Phase markers), sandbox-exec.sh (e2e_sandbox_exec / e2e_sandbox_exec_stdin). env.sh auto-sources logging.sh.
Assertion helpers (test/e2e/lib/assert/): inference-works.sh, no-credentials-leaked.sh, policy-preset-applied.sh, messaging-bridge-reachable.sh.
Install splits (test/e2e/lib/setup/): install-{repo,curl,ollama,launchable}.sh + install.sh dispatcher, so each migrated suite can compose the minimal install it needs.
Runtime probe wiring: run-scenario.sh --validate-only flag (mutually exclusive with --plan-only) that reads $E2E_CONTEXT_DIR/context.env, runs the expected-state validator, and emits JSON — skipping install/onboard/suites. resolver/index.ts gains E2E_PROBE_OVERRIDES_JSON escape hatch for probe keys with embedded underscores.
Convention lint: scripts/e2e/lint-conventions.ts (6 rules, Vitest-backed in the cli project) blocks new orphan test-*.sh scripts and enforces parity-map.yaml completeness.
Parity harness: scripts/e2e/compare-parity.sh, test/e2e/parity-map.yaml (seeded with 39 legacy-script placeholder entries), .github/workflows/e2e-parity-compare.yaml. This is the mechanism that will drive waves 2–12 of the migration — each migrated suite gets dispatched against the parity-compare workflow until zero divergence vs. its legacy counterpart.

What this PR does not change

Existing E2E workflows (nightly-e2e.yaml, macos-e2e.yaml, wsl-e2e.yaml, ollama-proxy-e2e.yaml, e2e-branch-validation.yaml, sandbox-images-and-e2e.yaml) run as before.
Existing test/e2e/test-*.sh scripts are not deleted. Per-wave retirement happens in follow-up PRs once parity-map.yaml entries report zero divergence.
No new behavior in the product itself; this is test-harness scaffolding only.

Migration sequencing (follow-up PRs)

Waves 2–12 migrate the 40 legacy test-*.sh scripts into the scenario matrix, one wave per PR, gated by parity-compare zero-divergence. Phase 13 is the merge gate that flips the default CI dispatch from nightly-e2e.yaml to e2e-parity-compare.yaml.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes — 3,258 cli tests, 0 failures (41 new tests from Phase 1)
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes (test/e2e/README.md, test/e2e/MIGRATION.md, AGENTS.md)
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Fork E2E evidence (Phase 1 branch dispatch)

Dispatched nightly-e2e.yaml on jyaunches/NemoClaw against this branch (with a temporary local-only repo-gate lift, not part of this PR). Result:

Outcome	Count	Notes
✅ success	35	All CPU E2E jobs green on fork
⏭ skipped by design	2	`gpu-e2e`, `gpu-double-onboard-e2e` — need NVIDIA self-hosted GPU runner
⏸ unrunnable on fork	1	`hermes-slack-e2e` — needs `linux-amd64-cpu4` runner (NVIDIA-internal)
⚠️ pre-existing fork-environment hang	1	`hermes-e2e` hangs during `nemohermes onboard` on fork (not a Phase 1 regression — same legacy script, no changes to onboard path)

35/35 of jobs that could run on the fork pass. This attests that Phase 1 infrastructure does not regress any existing E2E coverage.

Signed-off-by: Julie Yaunches jyaunches@nvidia.com

Summary by CodeRabbit

New Features
- Introduced scenario-based end-to-end testing framework with declarative setup profiles and ordered validation suites.
Tests
- Added CI workflows for scenario execution and parity comparison with legacy tests.
- New validation suites for smoke tests, inference, credentials, platform checks, and security.
Documentation
- Added E2E testing guide and migration tracker for moving tests to scenario-based architecture.
Chores
- Added test infrastructure, fixtures, and assertion helpers supporting multiple platforms and configurations.

Reorganize E2E around declarative setup scenarios, reusable expected-state configs, and suite sequences, while keeping all existing E2E workflows unchanged. Adds: - test/e2e/scenarios.yaml, expected-states.yaml, suites.yaml — declarative metadata for initial scenarios (Ubuntu cloud OpenClaw/Hermes, macOS, WSL, GPU local Ollama, Brev launchable, no-Docker negative), expected states, and suite sequences. - test/e2e/resolver/ — TypeScript resolver/validator/plan-printer/coverage report invoked via tsx. Uses js-yaml (already in root package.json). Unit-tested in the Vitest cli project. - test/e2e/lib/{context,env,install,onboard,gateway,sandbox,artifacts, cleanup,emit-context-from-plan}.sh — reusable shell helpers producing a normalized context at \$E2E_CONTEXT_DIR (default .e2e/). Wraps existing test/e2e/lib/sandbox-teardown.sh and install-path-refresh.sh. - test/e2e/{run-scenario.sh, run-suites.sh, coverage-report.sh} — entry points: resolve + plan, execute suites in order, emit coverage matrix. --plan-only emits human-readable stdout and stable JSON at \$E2E_CONTEXT_DIR/plan.json; --dry-run (E2E_DRY_RUN=1) gates destructive actions. - test/e2e/suites/{smoke,inference,credentials,local-ollama-inference, ollama-proxy,platform-macos,platform-wsl,hermes-specific}/*.sh — suite step scripts. - .github/workflows/e2e-scenarios.yaml — manual (workflow_dispatch) runner accepting a scenario id with optional plan_only and suite_filter inputs. Existing nightly-e2e / macos-e2e / wsl-e2e / ollama-proxy-e2e / e2e-branch-validation / sandbox-images-and-e2e workflows are unchanged. - test/e2e/README.md — documents entrypoints, scenarios, suites, and the plan-only contract. Tests (11 new files in the Vitest cli project, 55 scenarios): - e2e-scenario-schema.test.ts, e2e-scenario-resolver.test.ts, e2e-context-helper.test.ts, e2e-lib-helpers.test.ts, e2e-suite-runner.test.ts, e2e-scenario-first-migration.test.ts, e2e-scenarios-workflow.test.ts, e2e-expected-state-validator.test.ts, e2e-scenario-additional-families.test.ts, e2e-coverage-report.test.ts, e2e-metadata-final-hygiene.test.ts. Guards: - Resolver-time and runtime enforcement of suite \`requires_state\` vs scenario expected_state. - Schema guards rejecting array-form \`expected_states\` and premature introduction of \`overrides\` / \`preflight-failure-no-sandbox\` before their declared first consumers. - Metadata hygiene guard tests (final metadata shape, coverage gaps surfaced in report). Other: - AGENTS.md: note the scenario-based runner under test/e2e/. - .gitignore: add .e2e/ runtime context directory. Scenarios that require off-host infrastructure (cloud secrets, macOS/WSL runners, GPU runner, Brev) are wired via workflow_dispatch and validated in follow-up runs; full retirement of legacy test/e2e/test-*.sh scripts is intentionally deferred and tracked separately. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

Under `set -u` on bash 3.2 (the default on macOS runners) \"\${arr[@]}\" on an empty array raises \"unbound variable\" and fails the summary loops at the end of a successful run. Switch to the \${arr[@]+...} safe-expansion pattern so the loops expand to nothing when no steps were recorded. Unblocks macos-e2e (Scenario 9.1) on PR #3290.

…ss CodeRabbit review Reshape the scaffolding so the file system reflects the scenario organization informed by the UAT / NV QA bug hotspot analysis (446 issues traced to 213 fix PRs), and fold in CodeRabbit's 15 actionable review items on PR #3290. ## Reorganization `test/e2e/lib/`: lib/ artifacts.sh cleanup.sh context.sh (generic scaffolding - unchanged) emit-context-from-plan.sh env.sh install-path-refresh.sh sandbox-teardown.sh (existing; preserved) setup/ <-- install.sh, onboard.sh (dimension dispatchers) assert/ <-- gateway-alive.sh (was gateway.sh), sandbox-alive.sh (was sandbox.sh) fixtures/ <-- roadmap README; fixtures land with first consumers `test/e2e/suites/` — grouped by functional area matching the bug hotspot buckets: suites/ smoke/ (unchanged; baseline) onboarding/ <-- hermes/ (was hermes-specific/) inference/ <-- cloud/ (was direct files), ollama-gpu/ (was local-ollama-inference/), ollama-auth-proxy/ (was ollama-proxy/) security/ <-- credentials/ (was at suites/credentials/) platform/ <-- macos/ (was platform-macos/), wsl/ (was platform-wsl/) lifecycle/ sandbox/ messaging/ (new dirs with roadmap READMEs) Each new directory ships with a README.md documenting the originating bug class, the legacy `test/e2e/test-*.sh` script (where one exists), and the planned coverage. Suite IDs in `suites.yaml` stay stable; only script paths move. ## CodeRabbit review items addressed 1. `.github/workflows/e2e-scenarios.yaml` — add a `resolve-runner` job that routes each scenario to the correct runner (macos-latest / windows-latest / self-hosted / ubuntu-latest) based on the scenario id prefix. Previously `runs-on: ubuntu-latest` was hard-coded for every scenario. 2. All test files — add `timeout: Number(process.env.E2E_SPAWN_TIMEOUT_MS ?? 60_000)` to every `spawnSync` call so a stuck subprocess cannot block a Vitest worker. 3. `coverage-report.sh` — use `npx --no-install tsx` so the lockfile pins the version; fail closed if tsx is missing. 4. `lib/context.sh` — validate keys as POSIX identifiers before interpolating into grep regexes; reject newlines in values that would corrupt the line-oriented `context.env` format. 5. `lib/emit-context-from-plan.sh` — fail fast if `plan.json` is missing its `scenario_id` field rather than silently seeding empty. 6. `lib/setup/install.sh` — pin the installer source via `E2E_INSTALLER_URL` / `E2E_INSTALLER_SHA256` overrides; download to a temp file and sha256-verify before exec instead of streaming `curl | bash` over the network. 7. `lib/assert/sandbox-alive.sh` — fix regex that had an empty first alternative (`"^|..."`) and therefore always matched. Replace with `"(^|[[:space:]])name([[:space:]]|$)"` to properly detect "sandbox not found". 8. `test/e2e/README.md` — regenerate to reflect the current 7-scenario catalog, new directory layout, runner contracts, and post-reorg roadmap. 9. `resolver/index.ts` (`validate-state`) — require an explicit `--probes-from-state` flag to seed probes from the expected state. `run-scenario.sh` passes the flag in `--dry-run` mode only; live mode now fails closed when real probes are missing rather than silently self-validating. 10. `run-scenario.sh` resolver fallback — use `npx --no-install tsx` and fail closed with a clear message if tsx is not installed. 11. `run-scenario.sh` (non-dry-run) — exit 4 instead of 0 when full suite execution is not yet wired for the scenario. Silent-pass is now observable in CI. 12-15. `suites/inference/**` — replace `curl ... | head -c N` with `body="$(curl ...)"; printf '%s\n' "${body:0:N}"`. The pipe pattern was brittle under `pipefail`: `head` closing early could make successful requests appear failed. ## Test state 55/55 Vitest `cli` tests pass after reorg and fixes. `prek run --all-files` exits 0. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

Introduces test/e2e/MIGRATION.md — an in-tree tracker of legacy test-*.sh scripts being migrated to the scenario matrix. Per-wave completion is recorded there as the migration progresses.

Adds 30 tests covering Phase 1 deliverables from specs/2026-05-11_e2e-test-migration/ (tests.md 1.A-1.H): - 1.A logging helpers (lib/logging.sh) - 1.B sandbox-exec helper (lib/sandbox-exec.sh) - 1.C fixtures (fake-openai, fake-{telegram,discord,slack}, older-base-image) - 1.D assertion helpers (inference-works, no-credentials-leaked, policy-preset-applied, messaging-bridge-reachable) - 1.E install dispatcher splits (install-{repo,curl,ollama,launchable}.sh) - 1.F run-scenario.sh --validate-only flag - 1.G convention lint (scripts/e2e/lint-conventions.ts) - 1.H parity harness (scripts/e2e/compare-parity.sh + parity-map.yaml) All 30 tests fail as expected (red phase) \u2014 implementation follows.

Lands shared fixtures, helpers, assertion helpers, install-method splits, conventions + lint, and the parity-compare CI harness that unblock the per-wave migration phases (2\u201312). Deliverables (per specs/2026-05-11_e2e-test-migration/spec.md Phase 1): Fixtures (test/e2e/lib/fixtures/): - fake-openai.sh: local OpenAI-compatible endpoint (Risk #2 mitigation) - fake-{telegram,discord,slack}.sh: messaging stubs via shared _fake-http-stub.sh harness - older-base-image.sh: tagged ghcr base-image Dockerfile generator Helpers (test/e2e/lib/): - logging.sh: canonical e2e_{section,info,pass,fail} with stable PASS:/FAIL:/=== Phase markers (absorbs reuse category #1) - sandbox-exec.sh: canonical nemoclaw-shell wrapper with safe quoting, exit-code propagation, and dry-run short-circuit (category #10) - env.sh: auto-sources logging.sh so every consumer gets it for free Assertion helpers (test/e2e/lib/assert/): - inference-works.sh: chat-completion round-trip - no-credentials-leaked.sh: credential-pattern scan - policy-preset-applied.sh: gateway policy preset verification - messaging-bridge-reachable.sh: L7 proxy / bridge reachability Install dispatcher splits (test/e2e/lib/setup/): - install-{repo,curl,ollama,launchable}.sh (four profiles) - install.sh: dispatcher routes by profile/method name (category #5) Runtime probe wiring: - run-scenario.sh: adds --validate-only flag (probe-only, no setup) - resolver/index.ts: E2E_PROBE_OVERRIDES_JSON escape hatch for keys with embedded underscores (e.g. security.policy_engine) Convention lint + parity harness: - scripts/e2e/lint-conventions.ts: enforces 6 conventions on suite step scripts + requires parity-map.yaml entries for legacy scripts - scripts/e2e/compare-parity.sh: diffs legacy vs scenario PASS/FAIL via parity-map.yaml; flaky: true marker supported (Risk #4) - test/e2e/parity-map.yaml: seeded with one entry per existing legacy script; migration phases 2\u201312 append assertion mappings - .github/workflows/e2e-parity-compare.yaml: dispatches legacy script + migrated scenario on same runner and diffs outcomes Tests (all passing, 41 total): - test/e2e-lib-helpers.test.ts: +18 tests (1.A\u20131.E) - test/e2e-convention-lint.test.ts: +11 tests (1.G\u20131.H) - test/e2e-expected-state-validator.test.ts: +2 tests (1.F) No regressions: full cli Vitest project (3258 tests) still green.

Follow-up to Phase 1. Documents the 13 duplication categories being absorbed by the Wave 0 fixtures/asserts/conventions and the expected LOC reduction. No code changes.

coderabbitai · 2026-05-11T21:15:58Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a scenario-based E2E framework: metadata (scenarios/expected-states/suites), TS resolver/validator/reporting, Bash runners/libs/assertions, fixtures, docs, CI workflows for scenarios and parity, parity/lint tools, and comprehensive Vitest coverage.

Changes

Scenario-driven E2E system

Layer / File(s)	Summary
Resolver, validator, coverage reporting `test/e2e/runtime/resolver/*`	TypeScript CLI to load/validate metadata, resolve plans, validate expected states, and render coverage.
Runners and shared libs `test/e2e/runtime/.sh`, `test/e2e/runtime/lib/`	Bash entrypoints for scenario planning/execution/suites with standardized env, context, logging, artifacts.
Scenarios, states, suites `test/e2e/nemoclaw_scenarios/`, `test/e2e/validation_suites//`	YAML catalogs, onboarding/install helpers, fixtures, and assertion/suite scripts across platforms and inference.
CI workflows and tooling `.github/workflows/`, `scripts/e2e/`, `.gitignore`	Manual scenario workflow, parity-compare workflow, parity diff script, lint-conventions, and .e2e ignore.
Docs and tests `test/e2e/docs/`, `AGENTS.md`, `test/e2e/scenario-framework-tests/`, `vitest.config.ts`	README/migration docs, agent note, Vitest suites validating resolver, runners, workflows, fixtures, and conventions.

Sequence Diagram(s)

sequenceDiagram
  participant Dev as Developer
  participant GA as GitHub Actions
  participant RS as run-scenario.sh
  participant TV as TS Resolver/Validator
  participant SU as Suite Runner
  participant AR as Artifacts

  Dev->>GA: Trigger e2e-scenarios (scenario, plan_only)
  GA->>RS: Invoke with inputs
  RS->>TV: Resolve plan / validate-state
  TV-->>RS: plan.json / expected-state report
  RS->>SU: Execute suites (if enabled)
  SU-->>AR: Logs, context.env, reports

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

CI/CD, enhancement: feature, configuration

Suggested reviewers

cv

Poem

I mapped the trails the suites will tread,
With YAML stars to show the thread.
A rabbit taps the CI drum—
Plan, validate, the fixtures hum.
Logs like carrots safely stowed,
Parity checked, we hit the road. 🥕✨

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/e2e-scenario-matrix-phase1

coderabbitai

Actionable comments posted: 11

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (15)

test/e2e/suites/sandbox/README.md-29-30 (1)

29-30: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Capitalize the proper noun "Telegram".

The term "telegram" appears twice in lowercase when referring to the Telegram messaging platform. Proper nouns should be capitalized in documentation.

📝 Proposed fix

-- **Policy preservation during rebuild is untested.** UAT `#1952` (Telegram
-  policy lost on rebuild) + UAT `#2010` (telegram policy apparently applied
+- **Policy preservation during rebuild is untested.** UAT `#1952` (Telegram
+  policy lost on rebuild) + UAT `#2010` (Telegram policy apparently applied
   but gateway blocks it) remain live blind spots.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/sandbox/README.md` around lines 29 - 30, In the README entry
titled "Policy preservation during rebuild is untested." update the two
lowercase occurrences of "telegram" to the proper noun "Telegram" (e.g., in the
UAT references UAT `#1952` and UAT `#2010` lines) so both mentions of the messaging
platform are capitalized consistently.

AGENTS.md-30-30 (1)

30-30: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align E2E wording with the updated scenario-based architecture.

Line 30 now documents scenario-based E2E, but the Testing Strategy section still broadly states E2E runs on ephemeral Brev instances; tighten that wording so only the branch-validation project is Brev-specific.

As per coding guidelines: AGENTS.md must “Document agent architecture and design patterns” and “Use consistent formatting and structure for agent documentation”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@AGENTS.md` at line 30, Update the Testing Strategy section in AGENTS.md to
align with the scenario-based E2E wording on line referencing `test/e2e/`:
explicitly state that end-to-end tests use the scenario-based runner described
in `test/e2e/README.md`, and restrict the note about running on ephemeral Brev
instances to only the branch-validation project (mention `branch-validation` by
name). Ensure the doc follows the agent docs guidelines by keeping the structure
consistent (section title "Testing Strategy", a short paragraph, and a bullet
pointing to `test/e2e/README.md`) and update any conflicting sentences that
currently imply all E2E runs use Brev.

test/e2e/suites/messaging/README.md-8-9 (1)

8-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale onboarding reference in docs.

Line 8 references onboard.ts, but this migration path is wired through shell onboarding helpers (onboard.sh), so this is misleading for contributors.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/messaging/README.md` around lines 8 - 9, The README's
onboarding reference is stale: replace the mention of onboard.ts with the
correct shell helper onboard.sh and clarify that messaging migration is wired
through the shell onboarding helpers (onboard.sh) rather than a TypeScript
script; update the sentence that mentions "policy preset OR `onboard.ts`" to
reference "policy preset OR `onboard.sh` (shell onboarding helpers)" and ensure
the surrounding text explains that onboarding is performed via the shell helper
workflow.

test/e2e/resolver/validator.ts-60-77 (1)

60-77: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the unused key parameter from compare.

key is unused (Line 61), and the current calls pass it only to discard it.

🧹 Proposed fix

 function compare(
-  key: string,
   expected: ProbeValue,
   actual: ProbeValue | undefined,
 ): boolean {
   if (actual === undefined) return false;
   return expected === actual;
 }
@@
-    const ok = compare(key, expected, actual);
+    const ok = compare(expected, actual);
@@
-      const ok = compare(key, expected as ProbeValue, actual);
+      const ok = compare(expected as ProbeValue, actual);

As per coding guidelines, **/*.{js,ts,tsx,jsx} requires intentionally unused variables to be prefixed with _.

Also applies to: 91-94

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/resolver/validator.ts` around lines 60 - 77, The compare function
declares an unused parameter `key` which violates the lint rule for unused
variables; update the function signature in `compare` to remove the `key`
parameter (or rename it to `_key` if you prefer to signal intentional unused)
and then update all call sites (e.g., the call inside `validateExpectedState`
where `compare(key, expected, actual)` is invoked and the other occurrences
around lines 91-94) to pass only the remaining parameters (or keep passing
`_key` if you renamed it), ensuring types/signatures match.

test/e2e/suites/security/credentials/00-credentials-present.sh-11-14 (1)

11-14: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct ShellCheck source hints to match this script’s directory depth.

Line 11 and Line 13 currently reference ../../lib/...; from test/e2e/suites/security/credentials/, the proper relative hint is ../../../lib/....
Suggested patch
-# shellcheck source=../../lib/env.sh
+# shellcheck source=../../../lib/env.sh
 . "${LIB_DIR}/env.sh"
-# shellcheck source=../../lib/context.sh
+# shellcheck source=../../../lib/context.sh
 . "${LIB_DIR}/context.sh"
As per coding guidelines, `**/*.{sh,bash}` must enforce ShellCheck linting using the repository `.shellcheckrc`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/security/credentials/00-credentials-present.sh` around lines
11 - 14, Update the ShellCheck source hint paths used by the script lines that
source env.sh and context.sh so they reflect the script's actual directory
depth: change the source hints from ../../lib/... to ../../../lib/... for the
lines sourcing ". \"${LIB_DIR}/env.sh\"" and ". \"${LIB_DIR}/context.sh\"";
ensure the corrected hints are the only change and that the script remains
covered by the repository .shellcheckrc linting rules.

test/e2e/suites/platform/macos/00-macos-smoke.sh-14-17 (1)

14-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix ShellCheck source hint paths for this nested directory.

Line 14 and Line 16 point to ../../lib/..., but this script is under test/e2e/suites/platform/macos/, so the correct hint is ../../../lib/.... Keeping the hint accurate ensures ShellCheck evaluates the intended sourced files.
Suggested patch
-# shellcheck source=../../lib/env.sh
+# shellcheck source=../../../lib/env.sh
 . "${LIB_DIR}/env.sh"
-# shellcheck source=../../lib/context.sh
+# shellcheck source=../../../lib/context.sh
 . "${LIB_DIR}/context.sh"
As per coding guidelines, `**/*.{sh,bash}` must enforce ShellCheck linting using the repository `.shellcheckrc`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/platform/macos/00-macos-smoke.sh` around lines 14 - 17,
Update the ShellCheck source hint comments to point to the correct nested lib
path: change the two comments that currently read "shellcheck
source=../../lib/env.sh" and "shellcheck source=../../lib/context.sh" to use
"../../../lib/env.sh" and "../../../lib/context.sh" respectively so ShellCheck
resolves the intended files referenced by the subsequent . "${LIB_DIR}/env.sh"
and . "${LIB_DIR}/context.sh" source lines; ensure both hint comments are
corrected in the same script to satisfy the repository ShellCheck rules
(.shellcheckrc).

test/e2e/lib/assert/sandbox-alive.sh-31-35 (1)

31-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Escape or avoid regex metacharacters in the sandbox name matching.

Line 34 interpolates ${name} directly into the grep ERE pattern, so sandbox names with regex metacharacters (., +, *, [, ], etc.) produce false positives or negatives. Use grep -F for literal matching or the field-based awk approach for token boundary matching.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/lib/assert/sandbox-alive.sh` around lines 31 - 35, The current grep
-qE uses unescaped ${name} which lets regex metacharacters in sandbox names
break matching; change the check that runs "nemoclaw list | grep -qE ..." to a
literal token equality check (e.g., use awk or a loop) that compares fields to
the variable rather than interpolating it into a regex: capture the output of
"nemoclaw list", pass the shell variable name as a literal (awk -v name="$name")
and scan fields (for (i=1;i<=NF;i++) if ($i==name) found=1) and base the
success/failure on that result instead of the current grep -qE pattern. Ensure
you still write the same error message using ${name} when not found.

test/e2e/MIGRATION.md-62-62 (1)

62-62: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace emoji status markers with plain text legend tokens.

Line 62 uses emoji symbols in technical prose, which violates the markdown style rule.

As per coding guidelines: "No emoji in technical prose."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/MIGRATION.md` at line 62, The legend line currently uses emoji
symbols ("⬜", "🟨", "✅", "🔵") which violates the "No emoji in technical prose"
rule; replace them with plain-text tokens or words (e.g., "NOT_STARTED",
"IN_PROGRESS", "MIGRATED", "PARITY_VERIFIED" or simply "not started · in
progress · migrated · parity verified") so the line reads without emoji while
preserving the same meaning; update the line containing "Legend: ⬜ not started ·
🟨 in progress · ✅ migrated · 🔵 parity verified" accordingly.

test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh-25-26 (1)

25-26: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail the step when /api/tags returns an empty body.

Lines 25-26 currently allow an empty 2xx response to pass, which can hide gateway/inference regressions.

Proposed fix

 body="$(curl -fsS --max-time 10 "${url%/}/api/tags")"
+if [[ -z "${body}" ]]; then
+  echo "local-ollama-inference:ollama-models-health: empty response" >&2
+  exit 1
+fi
 printf '%s\n' "${body:0:512}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh` around lines
25 - 26, The test currently assigns the curl response to the variable body and
prints the first 512 chars but does not fail if body is empty; update the logic
after the curl call (the body="$(curl ...)" assignment and the printf) to detect
an empty response (e.g., test if body is zero-length) and exit non‑zero with a
clear error message when empty so the step fails on empty 2xx responses;
reference the body variable and the curl invocation in your change and ensure
the printf remains for debugging when non-empty.

test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh-29-30 (1)

29-30: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a non-empty response assertion for the inference probe.

Line 29 currently treats any HTTP 2xx as success, even if the body is empty. That can mask broken inference routes with false-positive suite passes.

Proposed fix

 body="$(nemoclaw shell "${name}" -- curl -fsS --max-time 10 "http://${route}/v1/models")"
+if [[ -z "${body}" ]]; then
+  echo "inference:sandbox-inference-local: empty response" >&2
+  exit 1
+fi
 printf '%s\n' "${body:0:512}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh` around
lines 29 - 30, The probe currently treats any 2xx as success but doesn’t assert
the response body is non-empty; update the inference probe after the curl that
sets the variable body (from the nemoclaw shell call) to validate that body is
not empty—e.g., check that the length of the variable body (used in the printf)
is greater than zero and fail the script (non-zero exit/print a clear error) if
it is empty so empty-success responses are treated as failures.

scripts/e2e/lint-conventions.ts-145-146 (1)

145-146: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle missing value after --root as CLI misuse.

At Line 145, --root with no following value currently falls back to auto root instead of exiting with code 2. That makes invalid invocations look valid.

Proposed fix

-    if (a === "--root") root = args.shift();
+    if (a === "--root") {
+      const v = args.shift();
+      if (!v || v.startsWith("-")) {
+        process.stderr.write("lint-conventions: --root requires a value\n");
+        process.exit(2);
+      }
+      root = v;
+    }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/e2e/lint-conventions.ts` around lines 145 - 146, The CLI parsing
branch that handles the "--root" flag (where code does: if (a === "--root") root
= args.shift();) must validate that a value was actually provided; update the
parsing logic around the argument loop (referencing the variables/identifiers a,
args, and root) to check that args.shift() returns a non-empty, non-option
string and if not, print a concise usage/error message and exit with code 2
(e.g., process.exit(2)) to treat this as CLI misuse rather than silently falling
back to auto root. Ensure the same guard rejects cases where the next token
starts with "-" so a missing/invalid value is handled.

test/e2e-metadata-final-hygiene.test.ts-67-76 (1)

67-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add an explicit existence check for expected_state references.

Right now, Line 67 only reads meta.expectedStates.expected_states[sc.expected_state] for negative-scenario logic; it does not fail when the key is missing. A bad expected_state reference can slip through if sc.suites is non-empty.

Proposed fix

   for (const [id, sc] of Object.entries(meta.scenarios.setup_scenarios)) {
     if (!sc.expected_state) {
       problems.push(`${id}: missing expected_state`);
       continue;
     }
+    if (!(sc.expected_state in meta.expectedStates.expected_states)) {
+      problems.push(`${id}: expected_state '${sc.expected_state}' not found`);
+      continue;
+    }
     // Negative scenarios (preflight failures) intentionally have no suites.
     const state = meta.expectedStates.expected_states[sc.expected_state] as {
       failure?: { expected?: boolean };
     };

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-metadata-final-hygiene.test.ts` around lines 67 - 76, The code reads
meta.expectedStates.expected_states[sc.expected_state] without verifying the key
exists, so add an explicit existence check for sc.expected_state in
meta.expectedStates.expected_states before using it: if sc.expected_state is
missing or not a key, push a problem like `${id}: unknown expected_state
'${sc.expected_state}'` (or similar) and continue; only then set state =
meta.expectedStates.expected_states[sc.expected_state] and compute isNegative
from state?.failure?.expected. This ensures bad expected_state references are
caught even when sc.suites is non-empty.

scripts/e2e/lint-conventions.ts-76-84 (1)

76-84: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

no-section-call misses e2e_section despite rule intent.

Line 77 says this rule covers section/e2e_section, but Line 83 only matches section. This leaves a documented violation path unlinted.

Proposed fix

-        if (/^section\s+["']/.test(line)) {
-          return "calls section; filename carries the phase label";
+        if (/^(section|e2e_section)\s+/.test(line)) {
+          return "calls section/e2e_section; filename carries the phase label";
         }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/e2e/lint-conventions.ts` around lines 76 - 84, The rule object with
id "no-section-call" currently only checks for /^section\s+["']/ in its test
function; update that test's regex to also detect e2e_section and the combined
form section/e2e_section (for example:
/^(?:section|e2e_section|section\/e2e_section)\s+["']/) so the rule enforces the
documented coverage; modify the regex inside the test arrow function that
iterates lines in the "no-section-call" rule accordingly.

test/e2e-lib-helpers.test.ts-198-198 (1)

198-198: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the established PATH fallback pattern in spawned-test env overrides.

At Line 198, Line 242, and Line 359, ${process.env.PATH} should include a fallback to avoid producing ...:undefined in constrained environments.

Proposed fix

-        { PATH: `${bin}:${process.env.PATH}` },
+        { PATH: `${bin}:${process.env.PATH || ""}` },
@@
-        { PATH: `${bin}:${process.env.PATH}`, TOKEN: "SHOULD_NOT_EXPAND" },
+        { PATH: `${bin}:${process.env.PATH || ""}`, TOKEN: "SHOULD_NOT_EXPAND" },
@@
-        { PATH: `${bin}:${process.env.PATH}` },
+        { PATH: `${bin}:${process.env.PATH || ""}` },

Based on learnings: “In this repo’s test suite, prefer the established POSIX PATH separator : … PATH: \${fakeBin}:${process.env.PATH || ""}``.”

Also applies to: 242-242, 359-359

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-lib-helpers.test.ts` at line 198, Three occurrences build
environment overrides like `{ PATH: `${bin}:${process.env.PATH}` }` which can
produce `:undefined`; update each override to use the established POSIX
separator and a safe fallback, e.g. replace `${bin}:${process.env.PATH}` with
`${fakeBin}:${process.env.PATH || ""}` (or `${bin}:${process.env.PATH || ""}` as
appropriate) so the PATH entry never contains "undefined"; apply this fix to all
instances of the PATH env override in the test file.

test/e2e/resolver/index.ts-79-98 (1)

79-98: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate command early and keep the allowed-command message accurate.

Right now unknown commands are rejected only after metadata load and scenario resolution, and the error text (Line 127) omits coverage. This can produce misleading failures for invalid CLI invocations.

Proposed fix

 function main(): number {
   let parsed: ReturnType<typeof parseArgs>;
@@
   const { command, scenarioId, contextDir, metadataDir } = parsed;
+  if (!["plan", "validate-state", "coverage"].includes(command)) {
+    process.stderr.write(
+      `resolver: unknown command '${command}' (expected: coverage|plan|validate-state)\n`,
+    );
+    return 2;
+  }
+
   if (command === "coverage") {
@@
-    process.stderr.write(
-      `resolver: unknown command '${command}' (expected: plan|validate-state <scenario-id>)\n`,
-    );
-    return 2;
+    process.stderr.write("resolver: internal error: unreachable command branch\n");
+    return 2;
   } catch (err) {

Also applies to: 126-129

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/resolver/index.ts` around lines 79 - 98, After destructuring parsed
(const { command, scenarioId, contextDir, metadataDir }), validate that command
is one of the supported commands (include "coverage" alongside the other allowed
commands) immediately — before calling loadMetadataFromDir or resolveScenario —
and if it's invalid write a clear error to stderr listing the exact allowed
commands (make sure "coverage" is included) and return the proper non-zero exit
code; update the existing unknown-command error message (the one currently
omitting "coverage") and apply the same early-validation logic to the later
branch referenced in the review so unknown commands are rejected before any
metadata loading or scenario resolution.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e-parity-compare.yaml:
- Around line 82-87: Validate the github.event.inputs.legacy_script value before
executing it: ensure it matches the strict basename pattern
^test-[A-Za-z0-9._-]+\.sh$ and does not contain path separators or traversal (no
'/' or '..'), and fail the step with an error if it doesn't match; keep the
existing executability check for test/e2e/${{ github.event.inputs.legacy_script
}} but only call bash on that path after the pattern validation succeeds so only
allowed test-*.sh basenames under test/e2e are executed.

In @.github/workflows/e2e-scenarios.yaml:
- Around line 19-23: The workflow input "plan_only" currently defaults to
"false", causing manual dispatches to run the non-dry-run path that fails in
Phase 1; change the default value of the "plan_only" input to "true" so manual
runs default to dry-run, and verify the conditional that triggers the
non-dry-run path only executes when inputs.plan_only == 'false' (i.e., ensure
the steps that perform install/onboard/suites are gated by that condition).

In `@test/e2e/lib/assert/inference-works.sh`:
- Around line 53-59: The pipeline checks using printf "${out}" | grep -q
'"choices"' and '"content"' can produce false negatives under set -euo pipefail;
replace those grep pipelines with shell string tests against the out variable
(e.g. use [[ $out == *'"choices"'* ]] and [[ $out == *'"content"'* ]] in the
same file) so the checks don't rely on pipes, and make the script executable
(chmod +x) so it follows the existing executable assertion files; update the
failure messages to remain the same if tests fail.

In `@test/e2e/lib/assert/no-credentials-leaked.sh`:
- Around line 29-35: The credential regexes in the local patterns array miss
modern token formats; update the patterns array in no-credentials-leaked.sh (the
local patterns variable) to include OpenAI project keys (e.g.,
sk-proj-[A-Za-z0-9-]{...} and general sk-[A-Za-z0-9]{16,} variants), GitHub
fine-grained tokens (github_pat_[A-Za-z0-9_]{...}) and other GitHub token
prefixes (gho_, ghu_, ghs_, ghr_), and ensure existing patterns like 'sk-...',
'ghp_...', 'xox...', and 'AKIA...' remain but broadened where appropriate so the
new entries detect these modern formats; add or expand entries within the
patterns array accordingly.

In `@test/e2e/lib/context.sh`:
- Around line 65-69: _e2e_context_is_sensitive_key currently uses case patterns
like *TOKEN*/*SECRET* which are case-sensitive, so lowercase keys validated by
_e2e_context_validate_key leak secrets; update _e2e_context_is_sensitive_key to
normalize the input key to a single case (e.g., convert the parameter to
uppercase using bash expansion like key_upper=${1^^}) and run the case pattern
matching against that normalized variable so patterns (*TOKEN*, *SECRET*, etc.)
match regardless of original case.

In `@test/e2e/lib/setup/install-curl.sh`:
- Around line 40-41: The installer invocation (bash "${tmp}") can fail but its
exit status is lost because nemoclaw_refresh_install_env runs unconditionally;
change install-curl.sh to save the exit code of the installer (e.g., capture
"$?") immediately after running bash "${tmp}", then call
nemoclaw_refresh_install_env and finally exit with the saved installer status so
failures aren't masked; also make the script executable (chmod +x) so it meets
the execution requirement.

In `@test/e2e/lib/setup/install-ollama.sh`:
- Line 25: Replace the unsafe "curl | bash" pipeline by downloading
"${ollama_url}" to a temp file (use mktemp), check curl's exit status, verify
the downloaded installer checksum using the same checksum logic used by
e2e_install_curl, then execute the installer from the temp file with bash and
remove the temp file; update the block that currently contains `if ! curl -fsSL
--retry 3 --retry-delay 2 "${ollama_url}" | bash; then` to first save to a temp
path, validate the checksum, run `bash "$tempfile"` only on success, and fail
the script if any step (download, checksum, execution) fails.

In `@test/e2e/lib/setup/install-repo.sh`:
- Around line 24-28: The subshell that runs the two commands "( cd
\"${repo_root}\" || exit; npm install; npm link )" can mask failures if npm
install fails but npm link succeeds; update that subshell so npm link only runs
when npm install succeeds by chaining the commands with && (or enable errexit
inside the subshell), i.e., ensure the sequence uses "npm install && npm link"
(or "set -e" then run both) so a failing npm install causes the subshell and the
script to fail; locate the subshell containing the npm install and npm link
calls and replace the separated commands accordingly.

In `@test/e2e/lib/setup/onboard.sh`:
- Around line 20-38: The dry-run short-circuit (e2e_env_is_dry_run) currently
returns success before verifying the profile, letting invalid profiles pass;
change the flow so the profile is validated first using the existing case block
(the same cases for cloud-openclaw, cloud-hermes, local-ollama-openclaw) or a
small helper (e.g., e2e_is_supported_profile) that mirrors that case logic and
returns non-zero for unknown profiles, then perform the e2e_env_is_dry_run check
to print the “[dry-run] onboard profile=…” message and exit 0; finally, call the
actual onboarding functions (e2e_onboard_cloud_openclaw,
e2e_onboard_cloud_hermes, e2e_onboard_local_ollama_openclaw) only after
validation and dry-run handling.

In `@test/e2e/MIGRATION.md`:
- Around line 6-10: The opening summary claims "migrates all existing
`test/e2e/test-*.sh` scripts" but the status line later reads "0 / 40 scripts
migrated", creating a contradiction; update the text so both reflect the same
migration phase—either change the opening sentence to indicate a staged/partial
migration (e.g., "begins migration of ... Strategy B; legacy scripts remain...")
or update the status line to the correct migrated count; look for the exact
phrases "migrates all existing `test/e2e/test-*.sh` scripts" and "0 / 40 scripts
migrated" in MIGRATION.md and make them consistent.

In `@test/e2e/suites/inference/cloud/00-models-health.sh`:
- Around line 31-32: The piped truncation echo "${body}" | head -c 512 can fail
under set -o pipefail due to SIGPIPE; replace this pipeline by truncating the
shell variable with parameter expansion (take the first 512 bytes from the
variable named body using ${body:0:512}) and print that directly (use printf to
avoid echo semantics), keeping the separate echo afterwards to emit the newline
so behavior matches the original output.

---

Minor comments:
In `@AGENTS.md`:
- Line 30: Update the Testing Strategy section in AGENTS.md to align with the
scenario-based E2E wording on line referencing `test/e2e/`: explicitly state
that end-to-end tests use the scenario-based runner described in
`test/e2e/README.md`, and restrict the note about running on ephemeral Brev
instances to only the branch-validation project (mention `branch-validation` by
name). Ensure the doc follows the agent docs guidelines by keeping the structure
consistent (section title "Testing Strategy", a short paragraph, and a bullet
pointing to `test/e2e/README.md`) and update any conflicting sentences that
currently imply all E2E runs use Brev.

In `@scripts/e2e/lint-conventions.ts`:
- Around line 145-146: The CLI parsing branch that handles the "--root" flag
(where code does: if (a === "--root") root = args.shift();) must validate that a
value was actually provided; update the parsing logic around the argument loop
(referencing the variables/identifiers a, args, and root) to check that
args.shift() returns a non-empty, non-option string and if not, print a concise
usage/error message and exit with code 2 (e.g., process.exit(2)) to treat this
as CLI misuse rather than silently falling back to auto root. Ensure the same
guard rejects cases where the next token starts with "-" so a missing/invalid
value is handled.
- Around line 76-84: The rule object with id "no-section-call" currently only
checks for /^section\s+["']/ in its test function; update that test's regex to
also detect e2e_section and the combined form section/e2e_section (for example:
/^(?:section|e2e_section|section\/e2e_section)\s+["']/) so the rule enforces the
documented coverage; modify the regex inside the test arrow function that
iterates lines in the "no-section-call" rule accordingly.

In `@test/e2e-lib-helpers.test.ts`:
- Line 198: Three occurrences build environment overrides like `{ PATH:
`${bin}:${process.env.PATH}` }` which can produce `:undefined`; update each
override to use the established POSIX separator and a safe fallback, e.g.
replace `${bin}:${process.env.PATH}` with `${fakeBin}:${process.env.PATH || ""}`
(or `${bin}:${process.env.PATH || ""}` as appropriate) so the PATH entry never
contains "undefined"; apply this fix to all instances of the PATH env override
in the test file.

In `@test/e2e-metadata-final-hygiene.test.ts`:
- Around line 67-76: The code reads
meta.expectedStates.expected_states[sc.expected_state] without verifying the key
exists, so add an explicit existence check for sc.expected_state in
meta.expectedStates.expected_states before using it: if sc.expected_state is
missing or not a key, push a problem like `${id}: unknown expected_state
'${sc.expected_state}'` (or similar) and continue; only then set state =
meta.expectedStates.expected_states[sc.expected_state] and compute isNegative
from state?.failure?.expected. This ensures bad expected_state references are
caught even when sc.suites is non-empty.

In `@test/e2e/lib/assert/sandbox-alive.sh`:
- Around line 31-35: The current grep -qE uses unescaped ${name} which lets
regex metacharacters in sandbox names break matching; change the check that runs
"nemoclaw list | grep -qE ..." to a literal token equality check (e.g., use awk
or a loop) that compares fields to the variable rather than interpolating it
into a regex: capture the output of "nemoclaw list", pass the shell variable
name as a literal (awk -v name="$name") and scan fields (for (i=1;i<=NF;i++) if
($i==name) found=1) and base the success/failure on that result instead of the
current grep -qE pattern. Ensure you still write the same error message using
${name} when not found.

In `@test/e2e/MIGRATION.md`:
- Line 62: The legend line currently uses emoji symbols ("⬜", "🟨", "✅", "🔵")
which violates the "No emoji in technical prose" rule; replace them with
plain-text tokens or words (e.g., "NOT_STARTED", "IN_PROGRESS", "MIGRATED",
"PARITY_VERIFIED" or simply "not started · in progress · migrated · parity
verified") so the line reads without emoji while preserving the same meaning;
update the line containing "Legend: ⬜ not started · 🟨 in progress · ✅ migrated
· 🔵 parity verified" accordingly.

In `@test/e2e/resolver/index.ts`:
- Around line 79-98: After destructuring parsed (const { command, scenarioId,
contextDir, metadataDir }), validate that command is one of the supported
commands (include "coverage" alongside the other allowed commands) immediately —
before calling loadMetadataFromDir or resolveScenario — and if it's invalid
write a clear error to stderr listing the exact allowed commands (make sure
"coverage" is included) and return the proper non-zero exit code; update the
existing unknown-command error message (the one currently omitting "coverage")
and apply the same early-validation logic to the later branch referenced in the
review so unknown commands are rejected before any metadata loading or scenario
resolution.

In `@test/e2e/resolver/validator.ts`:
- Around line 60-77: The compare function declares an unused parameter `key`
which violates the lint rule for unused variables; update the function signature
in `compare` to remove the `key` parameter (or rename it to `_key` if you prefer
to signal intentional unused) and then update all call sites (e.g., the call
inside `validateExpectedState` where `compare(key, expected, actual)` is invoked
and the other occurrences around lines 91-94) to pass only the remaining
parameters (or keep passing `_key` if you renamed it), ensuring types/signatures
match.

In `@test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh`:
- Around line 29-30: The probe currently treats any 2xx as success but doesn’t
assert the response body is non-empty; update the inference probe after the curl
that sets the variable body (from the nemoclaw shell call) to validate that body
is not empty—e.g., check that the length of the variable body (used in the
printf) is greater than zero and fail the script (non-zero exit/print a clear
error) if it is empty so empty-success responses are treated as failures.

In `@test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh`:
- Around line 25-26: The test currently assigns the curl response to the
variable body and prints the first 512 chars but does not fail if body is empty;
update the logic after the curl call (the body="$(curl ...)" assignment and the
printf) to detect an empty response (e.g., test if body is zero-length) and exit
non‑zero with a clear error message when empty so the step fails on empty 2xx
responses; reference the body variable and the curl invocation in your change
and ensure the printf remains for debugging when non-empty.

In `@test/e2e/suites/messaging/README.md`:
- Around line 8-9: The README's onboarding reference is stale: replace the
mention of onboard.ts with the correct shell helper onboard.sh and clarify that
messaging migration is wired through the shell onboarding helpers (onboard.sh)
rather than a TypeScript script; update the sentence that mentions "policy
preset OR `onboard.ts`" to reference "policy preset OR `onboard.sh` (shell
onboarding helpers)" and ensure the surrounding text explains that onboarding is
performed via the shell helper workflow.

In `@test/e2e/suites/platform/macos/00-macos-smoke.sh`:
- Around line 14-17: Update the ShellCheck source hint comments to point to the
correct nested lib path: change the two comments that currently read "shellcheck
source=../../lib/env.sh" and "shellcheck source=../../lib/context.sh" to use
"../../../lib/env.sh" and "../../../lib/context.sh" respectively so ShellCheck
resolves the intended files referenced by the subsequent . "${LIB_DIR}/env.sh"
and . "${LIB_DIR}/context.sh" source lines; ensure both hint comments are
corrected in the same script to satisfy the repository ShellCheck rules
(.shellcheckrc).

In `@test/e2e/suites/sandbox/README.md`:
- Around line 29-30: In the README entry titled "Policy preservation during
rebuild is untested." update the two lowercase occurrences of "telegram" to the
proper noun "Telegram" (e.g., in the UAT references UAT `#1952` and UAT `#2010`
lines) so both mentions of the messaging platform are capitalized consistently.

In `@test/e2e/suites/security/credentials/00-credentials-present.sh`:
- Around line 11-14: Update the ShellCheck source hint paths used by the script
lines that source env.sh and context.sh so they reflect the script's actual
directory depth: change the source hints from ../../lib/... to ../../../lib/...
for the lines sourcing ". \"${LIB_DIR}/env.sh\"" and ".
\"${LIB_DIR}/context.sh\""; ensure the corrected hints are the only change and
that the script remains covered by the repository .shellcheckrc linting rules.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7146ad7a-3e0c-461e-bc05-64bd68afcaf8

📥 Commits

Reviewing files that changed from the base of the PR and between fc07c37 and 387767d.

📒 Files selected for processing (81)

.github/workflows/e2e-parity-compare.yaml
.github/workflows/e2e-scenarios.yaml
.gitignore
AGENTS.md
scripts/e2e/compare-parity.sh
scripts/e2e/lint-conventions.ts
test/e2e-context-helper.test.ts
test/e2e-convention-lint.test.ts
test/e2e-coverage-report.test.ts
test/e2e-expected-state-validator.test.ts
test/e2e-lib-helpers.test.ts
test/e2e-metadata-final-hygiene.test.ts
test/e2e-scenario-additional-families.test.ts
test/e2e-scenario-first-migration.test.ts
test/e2e-scenario-resolver.test.ts
test/e2e-scenario-schema.test.ts
test/e2e-scenarios-workflow.test.ts
test/e2e-suite-runner.test.ts
test/e2e/MIGRATION.md
test/e2e/README.md
test/e2e/coverage-report.sh
test/e2e/expected-states.yaml
test/e2e/lib/artifacts.sh
test/e2e/lib/assert/README.md
test/e2e/lib/assert/gateway-alive.sh
test/e2e/lib/assert/inference-works.sh
test/e2e/lib/assert/messaging-bridge-reachable.sh
test/e2e/lib/assert/no-credentials-leaked.sh
test/e2e/lib/assert/policy-preset-applied.sh
test/e2e/lib/assert/sandbox-alive.sh
test/e2e/lib/cleanup.sh
test/e2e/lib/context.sh
test/e2e/lib/emit-context-from-plan.sh
test/e2e/lib/env.sh
test/e2e/lib/fixtures/README.md
test/e2e/lib/fixtures/_fake-http-stub.sh
test/e2e/lib/fixtures/fake-discord.sh
test/e2e/lib/fixtures/fake-openai.sh
test/e2e/lib/fixtures/fake-slack.sh
test/e2e/lib/fixtures/fake-telegram.sh
test/e2e/lib/fixtures/older-base-image.sh
test/e2e/lib/logging.sh
test/e2e/lib/sandbox-exec.sh
test/e2e/lib/setup/README.md
test/e2e/lib/setup/install-curl.sh
test/e2e/lib/setup/install-launchable.sh
test/e2e/lib/setup/install-ollama.sh
test/e2e/lib/setup/install-repo.sh
test/e2e/lib/setup/install.sh
test/e2e/lib/setup/onboard.sh
test/e2e/parity-map.yaml
test/e2e/resolver/coverage.ts
test/e2e/resolver/index.ts
test/e2e/resolver/js-yaml.d.ts
test/e2e/resolver/load.ts
test/e2e/resolver/plan.ts
test/e2e/resolver/schema.ts
test/e2e/resolver/validator.ts
test/e2e/run-scenario.sh
test/e2e/run-suites.sh
test/e2e/scenarios.yaml
test/e2e/suites.yaml
test/e2e/suites/inference/cloud/00-models-health.sh
test/e2e/suites/inference/cloud/01-chat-completion.sh
test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh
test/e2e/suites/inference/ollama-auth-proxy/00-proxy-reachable.sh
test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh
test/e2e/suites/inference/ollama-gpu/01-ollama-chat-completion.sh
test/e2e/suites/lifecycle/README.md
test/e2e/suites/messaging/README.md
test/e2e/suites/onboarding/README.md
test/e2e/suites/onboarding/hermes/00-hermes-health.sh
test/e2e/suites/platform/macos/00-macos-smoke.sh
test/e2e/suites/platform/wsl/00-wsl-smoke.sh
test/e2e/suites/sandbox/README.md
test/e2e/suites/security/README.md
test/e2e/suites/security/credentials/00-credentials-present.sh
test/e2e/suites/smoke/00-cli-available.sh
test/e2e/suites/smoke/01-gateway-health.sh
test/e2e/suites/smoke/02-sandbox-listed.sh
test/e2e/suites/smoke/03-sandbox-shell.sh

Delete the 8 placeholder sub-READMEs under lib/{assert,fixtures,setup}/ and suites/{lifecycle,messaging,onboarding,sandbox,security}/ — their roadmap content was stale before it even landed, and the yaml files (scenarios.yaml, expected-states.yaml, suites.yaml) plus MIGRATION.md are the real sources of truth. Rewrite test/e2e/README.md to be brief but broad: the core model, the three declarative inputs, how to run, file layout, and how to extend. Removes the matrix catalog (now lives in scenarios.yaml) and the how-to-add-* cookbook (now: 'read the schema, edit the yaml').

wscurran · 2026-05-11T21:48:13Z

✨
Related open PRs:

Consolidates all documentation for the e2e setup-scenario matrix under test/e2e/docs/. The top level of test/e2e/ now contains only code, runners, and declarative inputs — no prose. - test/e2e/README.md -> test/e2e/docs/README.md - test/e2e/MIGRATION.md -> test/e2e/docs/MIGRATION.md Fix 4 relative links inside docs/README.md that pointed at sibling yaml/resolver paths; now they go one directory up.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

test/e2e/docs/MIGRATION.md (1)

6-10: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve the migration-status contradiction in the opening summary.

The opening states "This PR migrates all existing test/e2e/test-*.sh scripts" but Line 58 reports 0 / 40 scripts migrated, and the PR objectives confirm this is "test-harness scaffolding only; no product behavior changes." This contradiction makes the scope and merge gate unclear.

📝 Proposed fix

-This PR migrates all existing `test/e2e/test-*.sh` scripts into the
-scenario-based runner introduced by PR `#3290`. Full deep migration
-(Strategy B). Legacy scripts remain in the repo during this PR and run
-in parallel for 1–2 nightly cycles after merge; a follow-up PR retires
-them once parity is verified.
+This PR introduces Phase 1 migration infrastructure for moving existing
+`test/e2e/test-*.sh` scripts into the scenario-based runner introduced
+by PR `#3290` (Strategy B). Legacy scripts remain unchanged in this PR;
+migration waves 1–11 will move them into scenarios, followed by parity
+verification and retirement in a future phase.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/docs/MIGRATION.md` around lines 6 - 10, Update the opening summary
in MIGRATION.md to remove the contradictory claim that "This PR migrates all
existing `test/e2e/test-*.sh` scripts" and instead state that this PR provides
scenario-runner scaffolding only (no scripts migrated yet), aligning it with the
reported "0 / 40 scripts migrated" status and the PR objective that this is
test-harness scaffolding only; ensure the updated text makes the merge gate
clear by explaining legacy scripts remain and full migration will follow in a
separate PR.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/docs/README.md`:
- Around line 19-23: The relative links in test/e2e/docs/README.md still point
to peer files in test/e2e; update each link target (e.g., `scenarios.yaml`,
`expected-states.yaml`, `suites.yaml` and any other single-segment filenames
referenced in the document) to prepend `../` so they resolve to the parent
directory (e.g., `../scenarios.yaml`, `../expected-states.yaml`,
`../suites.yaml`); search the file for plain filename links and replace them
with the `../`-prefixed versions to fix all broken references.

---

Duplicate comments:
In `@test/e2e/docs/MIGRATION.md`:
- Around line 6-10: Update the opening summary in MIGRATION.md to remove the
contradictory claim that "This PR migrates all existing `test/e2e/test-*.sh`
scripts" and instead state that this PR provides scenario-runner scaffolding
only (no scripts migrated yet), aligning it with the reported "0 / 40 scripts
migrated" status and the PR objective that this is test-harness scaffolding
only; ensure the updated text makes the merge gate clear by explaining legacy
scripts remain and full migration will follow in a separate PR.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 55bd4844-0af6-45a2-8537-068a41e01a54

📥 Commits

Reviewing files that changed from the base of the PR and between 77d1d38 and dc6cb9e.

📒 Files selected for processing (2)

test/e2e/docs/MIGRATION.md
test/e2e/docs/README.md

…untime/ Moves orchestration machinery (the part that runs any scenario/suite and doesn't depend on scenario- or suite-specific domain) into a dedicated runtime/ bucket. - run-scenario.sh, run-suites.sh, coverage-report.sh -> runtime/ - resolver/*.ts -> runtime/resolver/ - lib/{context,env,logging,cleanup,artifacts,sandbox-teardown}.sh -> runtime/lib/ No behavior change — paths only. Internal source-paths in moved files are updated to the new layout as part of this commit.

Each yaml now lives in the directory that owns its concern: - scenarios.yaml -> nemoclaw_scenarios/scenarios.yaml - expected-states.yaml -> nemoclaw_scenarios/expected-states.yaml - suites.yaml -> validation_suites/suites.yaml - parity-map.yaml -> docs/parity-map.yaml (migration-era tracker) Paths are updated in callers in subsequent commits.

…narios/ The install dimension is part of scenario setup. Colocates install scripts with the other scenario-side files. Renames for clarity at the same time: lib/setup/install.sh -> install/dispatch.sh lib/setup/install-repo.sh -> install/repo-current.sh lib/setup/install-curl.sh -> install/public-curl.sh lib/setup/install-launchable.sh -> install/launchable.sh lib/setup/install-ollama.sh -> install/ollama.sh lib/install-path-refresh.sh -> install/helpers/install-path-refresh.sh lib/emit-context-from-plan.sh -> helpers/emit-context-from-plan.sh

The old lib/setup/onboard.sh was a single file: one dispatcher + three per-path worker functions (cloud-openclaw, cloud-hermes, local-ollama-openclaw). Splitting matches the install-side layout and lets each onboarding path be found and edited in one file. lib/setup/onboard.sh -> onboard/dispatch.sh + one file per path: cloud-openclaw.sh cloud-hermes.sh local-ollama-openclaw.sh dispatch.sh sources the three sibling files and retains the e2e_onboard() case-statement dispatcher.

Fixtures (fake-openai, fake-telegram, fake-discord, fake-slack, older-base-image, and the shared _fake-http-stub) are all scenario- setup scaffolding — they exist to support scenarios that need a fake inference endpoint, fake messaging provider, or an older base image to exercise upgrade paths. lib/fixtures/ -> nemoclaw_scenarios/fixtures/

…nder validation_suites/ Everything that validates a set-up scenario now lives in one bucket. lib/assert/ -> validation_suites/assert/ lib/sandbox-exec.sh -> validation_suites/sandbox-exec.sh suites/smoke/ -> validation_suites/smoke/ suites/inference/ -> validation_suites/inference/ suites/platform/ -> validation_suites/platform/ suites/security/ -> validation_suites/security/ suites/onboarding/hermes/ -> validation_suites/hermes/ The hermes/ rename deliberately drops the 'onboarding/' parent. The Hermes suite is functional validation of a Hermes sandbox, not an onboarding step — keeping it under 'onboarding/' alongside nemoclaw_scenarios/onboard/ invited confusion.

Vitest tests, parity tooling, and CI workflows now point at the new per-bucket locations: scenarios.yaml / expected-states.yaml -> nemoclaw_scenarios/ suites.yaml -> validation_suites/ parity-map.yaml -> docs/ run-*.sh / coverage-report.sh -> runtime/ Updated files: - .github/workflows/e2e-scenarios.yaml - .github/workflows/e2e-parity-compare.yaml - scripts/e2e/compare-parity.sh - scripts/e2e/lint-conventions.ts - 12 test/e2e-*.test.ts files - test/e2e/docs/README.md (relative-link repairs) No functional changes.

…th-refresh install-path-refresh.sh moved from test/e2e/lib/ to test/e2e/nemoclaw_scenarios/install/helpers/. The 8 legacy test-*.sh scripts that source it need their source paths updated while the scripts still exist (they will be deleted wave-by-wave during migration). Updated (no logic change, only the source path line): test-diagnostics.sh test-deployment-services.sh test-credential-migration.sh test-cloud-onboard-e2e.sh test-cloud-inference-e2e.sh test-network-policy.sh test-openclaw-inference-switch.sh test-hermes-inference-switch.sh

The e2e-metadata-final-hygiene test asserts that test/e2e/docs/README.md contains a heading matching /adding a new setup scenario|how to add/i. The recent README rewrite used 'Adding to the matrix' which didn't match either alternative. Rename to 'How to add a scenario, state, or suite' — clearer and keeps the test green.

…ded on main Drift from main — test-hermes-inference-switch.sh, test-openclaw-inference-switch.sh, and test-openshell-gateway-upgrade.sh were added to main after this branch forked. The convention lint correctly flagged them as orphans (Risk #1: new legacy scripts with no migration mapping). Added placeholder entries (empty scenario + assertions) to bring the map back in sync with the legacy surface area. Actual mappings land with their respective migration waves.

github-actions · 2026-05-12T15:05:33Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: e2e-scenarios-plan-only, e2e-parity-compare-empty-diff

Workflow run

Full advisor summary

Pi Semantic E2E Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. This PR is purely additive E2E scaffolding: a new scenario-based runner, parity tooling, manual-dispatch workflows, and docs. No production source is modified, no existing legacy test/e2e/test-*.sh script is changed, and the new workflows are workflow_dispatch only. Real product user flows (install, onboard, sandbox lifecycle, inference routing, credentials, security boundaries, deployment) are unaffected, so no required E2E job is needed to merge. Existing nightly-e2e/macos-e2e/wsl-e2e/regression-e2e workflows continue to validate product behavior on their normal cadence.

Optional E2E

e2e-scenarios-plan-only (low): Smoke-validate the new declarative scenario runner end-to-end via --plan-only against a representative ubuntu scenario. This exercises test/e2e/runtime/resolver/{load,plan,validator,schema}.ts and the scenarios.yaml/expected-states.yaml/suites.yaml chain without any destructive side effects, catching schema/reference mistakes that scenario-framework-tests may miss.
e2e-parity-compare-empty-diff (low): Confidence check that the parity comparator (scripts/e2e/compare-parity.sh + parity-map.yaml) renders a clean zero-divergence report when no legacy_script is provided. Validates the new e2e-parity-compare workflow plumbing without consuming legacy E2E budget.

New E2E recommendations

vitest-config (high): The new vitest project 'e2e-scenario-framework' (test/e2e/scenario-framework-tests/**) is declared in vitest.config.ts but is not invoked by the prek pre-push hook in .pre-commit-config.yaml (which only runs --project cli and --project plugin) nor by any GitHub workflow. Without wiring, scaffolding regressions in run-scenario.sh, resolver/, parity tooling, and convention lint will pass CI silently.
- Suggested test: Add a CI step (e.g. in .github/actions/basic-checks/action.yaml or a new prek hook) that runs npx vitest run --project e2e-scenario-framework on every PR.
e2e-framework-scaffolding (medium): scripts/e2e/lint-conventions.ts enforces the migration-spec conventions (no /tmp/.log, no own traps, parity-map coverage of new test-.sh, current parity inventory) but no PR workflow invokes it. A new legacy script or stale parity-inventory.generated.json would land unblocked.
- Suggested test: Add a PR-time job (or prek pre-push hook) that runs npx tsx scripts/e2e/lint-conventions.ts and npx tsx scripts/e2e/check-parity-map.ts so the conventions documented in test/e2e/docs/README.md are actually enforced.
ci-workflows-manual-only (medium): e2e-scenarios.yaml and e2e-parity-compare.yaml are workflow_dispatch only, so PRs that touch the scenario runner cannot get automatic coverage. Once the framework stabilizes, a small pull_request trigger that runs --plan-only for a fixed scenario list would catch resolver/schema regressions automatically.
- Suggested test: Extend e2e-scenarios.yaml with a pull_request trigger that always runs run-scenario.sh <id> --plan-only for a small representative scenario set (ubuntu, gpu, macos, wsl) when test/e2e/** changes.

copy-pr-bot · 2026-05-12T15:13:17Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

…rix-phase1

…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

jyaunches added 7 commits May 11, 2026 17:05

docs(e2e): add MIGRATION.md tracking legacy-to-scenario mapping

a3215e7

Introduces test/e2e/MIGRATION.md — an in-tree tracker of legacy test-*.sh scripts being migrated to the scenario matrix. Per-wave completion is recorded there as the migration progresses.

docs(e2e): expand MIGRATION.md with reuse-absorption table

387767d

Follow-up to Phase 1. Documents the 13 duplication categories being absorbed by the Wave 0 fixtures/asserts/conventions and the expected LOC reduction. No code changes.

jyaunches mentioned this pull request May 11, 2026

feat(e2e): introduce scenario-based setup matrix and runner #3290

Closed

12 tasks

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

wscurran added enhancement: testing Use this label to identify requests to improve NemoClaw test coverage. E2E End-to-end testing — Brev infrastructure, test cases, nightly failures, and coverage gaps labels May 11, 2026

github-advanced-security AI found potential problems May 11, 2026

View reviewed changes

Comment thread test/e2e/scenario-framework-tests/e2e-context-helper.test.ts Fixed

Comment thread test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts Fixed

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread test/e2e/docs/README.md Outdated

jyaunches added 10 commits May 12, 2026 10:47

jyaunches marked this pull request as draft May 12, 2026 15:13

jyaunches added 15 commits May 13, 2026 11:05

fix(e2e): seed new legacy parity entries

4e04f12

docs(spec): simplify e2e parity plan

6f865d9

docs(spec): add e2e parity test plan

47ac12f

docs(spec): add e2e parity validation plan

eaca286

fix(e2e): dedupe parity map entries

385d86a

fix(e2e): untrack ignored parity spec

abefc0d

docs(spec): apply e2e parity design review

492c30d

docs(spec): apply e2e parity implementation review

f614360

test: Add failing tests for Phase 1

9cf738a

feat: Implement Phase 1 - Inventory Legacy Assertions

7920672

Mark Phase 1 as completed [7920672]

618d8cc

test: Add failing tests for Phase 2

089015c

feat: Implement Phase 2 - Enforce Parity Map Schema

3f24605

Mark Phase 2 as completed [3f24605]

2f07256

test(cli): relax installer version timeout

26ad7b7

github-advanced-security AI found potential problems May 13, 2026

View reviewed changes

Comment thread scripts/e2e/extract-legacy-assertions.ts Fixed

jyaunches and others added 10 commits May 13, 2026 11:48

fix(e2e): satisfy parity script hooks

0dc6950

style(e2e): format parity scripts

97958fb

fix(e2e): untrack parity spec again

5c84816

style(e2e): format convention lint

dec8b36

Merge remote-tracking branch 'origin/main' into feat/e2e-scenario-mat…

c223fa6

…rix-phase1

fix(e2e): seed gateway drift parity entry

830bd05

test(e2e): complete parity validation mapping

3528aa0

style(e2e): format retired wrapper lint

fd8ac54

Merge branch 'main' into feat/e2e-scenario-matrix-phase1

893c126

Potential fix for pull request finding 'CodeQL / Unused variable, imp…

e175d18

…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

cv approved these changes May 13, 2026

View reviewed changes

jyaunches merged commit 1a15445 into main May 13, 2026
25 checks passed

coderabbitai Bot mentioned this pull request May 14, 2026

fix(e2e): run scenario workflow for real #3493

Open

17 tasks

cv assigned cjagwani May 14, 2026

Conversation

jyaunches commented May 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Part 1 — scenario-based E2E orchestration (commits 1–3, previously in #3290)

Part 2 — Phase 1 pre-flight migration infrastructure (commits 4–7, new)

What this PR does not change

Migration sequencing (follow-up PRs)

Type of Change

Verification

Fork E2E evidence (Phase 1 branch dispatch)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wscurran commented May 11, 2026

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

Pi Semantic E2E Advisor

Required E2E

Optional E2E

New E2E recommendations

Uh oh!

copy-pr-bot Bot commented May 12, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jyaunches commented May 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 11, 2026 •

edited

Loading

github-actions Bot commented May 12, 2026 •

edited

Loading