Skip to content

feat(e2e): scenario-based setup matrix + Phase 1 migration infrastructure#3363

Merged
jyaunches merged 63 commits into
mainfrom
feat/e2e-scenario-matrix-phase1
May 13, 2026
Merged

feat(e2e): scenario-based setup matrix + Phase 1 migration infrastructure#3363
jyaunches merged 63 commits into
mainfrom
feat/e2e-scenario-matrix-phase1

Conversation

@jyaunches
Copy link
Copy Markdown
Contributor

@jyaunches jyaunches commented May 11, 2026

Summary

Supersedes #3290. Introduces declarative scenario-based E2E orchestration and the Phase 1 pre-flight infrastructure that prepares for wave-by-wave migration of the existing test/e2e/test-*.sh scripts into the new structure. The old legacy scripts remain in place and unchanged; no E2E workflow is deleted or muted by this PR.

The scope grew beyond the original #3290 title after rebase against main and the addition of Phase 1 helpers, fixtures, parity harness, and convention lint. Because upstream rules forbid force-push on feature branches, this is opened as a fresh PR with clean linear history rather than rewriting #3290 in place.

Related Issue

Supersedes #3290. No single tracking issue — foundation for the E2E scenario-matrix refactor discussed across #2737, #3154, and the E2E test-gap audit.

Changes

Part 1 — scenario-based E2E orchestration (commits 1–3, previously in #3290)

  • Declarative metadata (test/e2e/scenarios.yaml, expected-states.yaml, suites.yaml) for initial scenarios: Ubuntu cloud OpenClaw/Hermes, macOS, WSL, GPU local Ollama, Brev launchable, no-Docker negative.
  • TypeScript resolver/validator/plan-printer/coverage report under test/e2e/resolver/ (invoked via tsx, uses js-yaml already in root package.json; unit-tested in the Vitest cli project).
  • Shell helpers under test/e2e/lib/ that produce a normalized context at $E2E_CONTEXT_DIR (default .e2e/). Wraps existing sandbox-teardown.sh and install-path-refresh.sh; does not duplicate them.
  • Entry scripts: run-scenario.sh, run-suites.sh, coverage-report.sh with --plan-only, --dry-run (E2E_DRY_RUN=1) flags.
  • Suite scripts under test/e2e/suites/{smoke,inference,credentials,lifecycle,messaging,onboarding,platform,sandbox,security}/.
  • New workflow .github/workflows/e2e-scenarios.yaml — manual (workflow_dispatch) with scenario, plan_only, and suite_filter inputs. Existing workflows are unchanged.
  • .gitignore: add .e2e/ runtime context directory.

Part 2 — Phase 1 pre-flight migration infrastructure (commits 4–7, new)

  • test/e2e/MIGRATION.md — in-tree tracker mapping each legacy test-*.sh script to its target scenario + suite home. Includes the reuse-absorption table showing where pair-variant coverage (e.g. OpenClaw + Hermes) collapses into a single suite.
  • Fixtures (test/e2e/lib/fixtures/): fake-openai.sh, fake-{telegram,discord,slack}.sh (built on shared _fake-http-stub.sh), older-base-image.sh. Follow the inline-Node http.createServer pattern from test-messaging-providers.sh; no new framework.
  • Helpers (test/e2e/lib/): logging.sh (PASS/FAIL/=== Phase markers), sandbox-exec.sh (e2e_sandbox_exec / e2e_sandbox_exec_stdin). env.sh auto-sources logging.sh.
  • Assertion helpers (test/e2e/lib/assert/): inference-works.sh, no-credentials-leaked.sh, policy-preset-applied.sh, messaging-bridge-reachable.sh.
  • Install splits (test/e2e/lib/setup/): install-{repo,curl,ollama,launchable}.sh + install.sh dispatcher, so each migrated suite can compose the minimal install it needs.
  • Runtime probe wiring: run-scenario.sh --validate-only flag (mutually exclusive with --plan-only) that reads $E2E_CONTEXT_DIR/context.env, runs the expected-state validator, and emits JSON — skipping install/onboard/suites. resolver/index.ts gains E2E_PROBE_OVERRIDES_JSON escape hatch for probe keys with embedded underscores.
  • Convention lint: scripts/e2e/lint-conventions.ts (6 rules, Vitest-backed in the cli project) blocks new orphan test-*.sh scripts and enforces parity-map.yaml completeness.
  • Parity harness: scripts/e2e/compare-parity.sh, test/e2e/parity-map.yaml (seeded with 39 legacy-script placeholder entries), .github/workflows/e2e-parity-compare.yaml. This is the mechanism that will drive waves 2–12 of the migration — each migrated suite gets dispatched against the parity-compare workflow until zero divergence vs. its legacy counterpart.

What this PR does not change

  • Existing E2E workflows (nightly-e2e.yaml, macos-e2e.yaml, wsl-e2e.yaml, ollama-proxy-e2e.yaml, e2e-branch-validation.yaml, sandbox-images-and-e2e.yaml) run as before.
  • Existing test/e2e/test-*.sh scripts are not deleted. Per-wave retirement happens in follow-up PRs once parity-map.yaml entries report zero divergence.
  • No new behavior in the product itself; this is test-harness scaffolding only.

Migration sequencing (follow-up PRs)

Waves 2–12 migrate the 40 legacy test-*.sh scripts into the scenario matrix, one wave per PR, gated by parity-compare zero-divergence. Phase 13 is the merge gate that flips the default CI dispatch from nightly-e2e.yaml to e2e-parity-compare.yaml.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes — 3,258 cli tests, 0 failures (41 new tests from Phase 1)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes (test/e2e/README.md, test/e2e/MIGRATION.md, AGENTS.md)
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Fork E2E evidence (Phase 1 branch dispatch)

Dispatched nightly-e2e.yaml on jyaunches/NemoClaw against this branch (with a temporary local-only repo-gate lift, not part of this PR). Result:

Outcome Count Notes
✅ success 35 All CPU E2E jobs green on fork
⏭ skipped by design 2 gpu-e2e, gpu-double-onboard-e2e — need NVIDIA self-hosted GPU runner
⏸ unrunnable on fork 1 hermes-slack-e2e — needs linux-amd64-cpu4 runner (NVIDIA-internal)
⚠️ pre-existing fork-environment hang 1 hermes-e2e hangs during nemohermes onboard on fork (not a Phase 1 regression — same legacy script, no changes to onboard path)

35/35 of jobs that could run on the fork pass. This attests that Phase 1 infrastructure does not regress any existing E2E coverage.


Signed-off-by: Julie Yaunches jyaunches@nvidia.com

Summary by CodeRabbit

  • New Features

    • Introduced scenario-based end-to-end testing framework with declarative setup profiles and ordered validation suites.
  • Tests

    • Added CI workflows for scenario execution and parity comparison with legacy tests.
    • New validation suites for smoke tests, inference, credentials, platform checks, and security.
  • Documentation

    • Added E2E testing guide and migration tracker for moving tests to scenario-based architecture.
  • Chores

    • Added test infrastructure, fixtures, and assertion helpers supporting multiple platforms and configurations.

Review Change Stack

jyaunches added 7 commits May 11, 2026 17:05
Reorganize E2E around declarative setup scenarios, reusable expected-state
configs, and suite sequences, while keeping all existing E2E workflows
unchanged.

Adds:

- test/e2e/scenarios.yaml, expected-states.yaml, suites.yaml — declarative
  metadata for initial scenarios (Ubuntu cloud OpenClaw/Hermes, macOS, WSL,
  GPU local Ollama, Brev launchable, no-Docker negative), expected states,
  and suite sequences.
- test/e2e/resolver/ — TypeScript resolver/validator/plan-printer/coverage
  report invoked via tsx. Uses js-yaml (already in root package.json).
  Unit-tested in the Vitest cli project.
- test/e2e/lib/{context,env,install,onboard,gateway,sandbox,artifacts,
  cleanup,emit-context-from-plan}.sh — reusable shell helpers producing a
  normalized context at \$E2E_CONTEXT_DIR (default .e2e/). Wraps existing
  test/e2e/lib/sandbox-teardown.sh and install-path-refresh.sh.
- test/e2e/{run-scenario.sh, run-suites.sh, coverage-report.sh} — entry
  points: resolve + plan, execute suites in order, emit coverage matrix.
  --plan-only emits human-readable stdout and stable JSON at
  \$E2E_CONTEXT_DIR/plan.json; --dry-run (E2E_DRY_RUN=1) gates
  destructive actions.
- test/e2e/suites/{smoke,inference,credentials,local-ollama-inference,
  ollama-proxy,platform-macos,platform-wsl,hermes-specific}/*.sh — suite
  step scripts.
- .github/workflows/e2e-scenarios.yaml — manual (workflow_dispatch) runner
  accepting a scenario id with optional plan_only and suite_filter inputs.
  Existing nightly-e2e / macos-e2e / wsl-e2e / ollama-proxy-e2e /
  e2e-branch-validation / sandbox-images-and-e2e workflows are unchanged.
- test/e2e/README.md — documents entrypoints, scenarios, suites, and the
  plan-only contract.

Tests (11 new files in the Vitest cli project, 55 scenarios):

- e2e-scenario-schema.test.ts, e2e-scenario-resolver.test.ts,
  e2e-context-helper.test.ts, e2e-lib-helpers.test.ts,
  e2e-suite-runner.test.ts, e2e-scenario-first-migration.test.ts,
  e2e-scenarios-workflow.test.ts, e2e-expected-state-validator.test.ts,
  e2e-scenario-additional-families.test.ts, e2e-coverage-report.test.ts,
  e2e-metadata-final-hygiene.test.ts.

Guards:

- Resolver-time and runtime enforcement of suite \`requires_state\` vs
  scenario expected_state.
- Schema guards rejecting array-form \`expected_states\` and premature
  introduction of \`overrides\` / \`preflight-failure-no-sandbox\` before
  their declared first consumers.
- Metadata hygiene guard tests (final metadata shape, coverage gaps
  surfaced in report).

Other:

- AGENTS.md: note the scenario-based runner under test/e2e/.
- .gitignore: add .e2e/ runtime context directory.

Scenarios that require off-host infrastructure (cloud secrets, macOS/WSL
runners, GPU runner, Brev) are wired via workflow_dispatch and validated
in follow-up runs; full retirement of legacy test/e2e/test-*.sh scripts is
intentionally deferred and tracked separately.

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Under `set -u` on bash 3.2 (the default on macOS runners) \"\${arr[@]}\"
on an empty array raises \"unbound variable\" and fails the summary
loops at the end of a successful run. Switch to the \${arr[@]+...}
safe-expansion pattern so the loops expand to nothing when no steps
were recorded.

Unblocks macos-e2e (Scenario 9.1) on PR #3290.
…ss CodeRabbit review

Reshape the scaffolding so the file system reflects the scenario
organization informed by the UAT / NV QA bug hotspot analysis (446
issues traced to 213 fix PRs), and fold in CodeRabbit's 15 actionable
review items on PR #3290.

## Reorganization

`test/e2e/lib/`:

    lib/
      artifacts.sh cleanup.sh context.sh   (generic scaffolding - unchanged)
      emit-context-from-plan.sh env.sh
      install-path-refresh.sh sandbox-teardown.sh   (existing; preserved)
      setup/     <-- install.sh, onboard.sh (dimension dispatchers)
      assert/    <-- gateway-alive.sh (was gateway.sh),
                     sandbox-alive.sh (was sandbox.sh)
      fixtures/  <-- roadmap README; fixtures land with first consumers

`test/e2e/suites/` — grouped by functional area matching the bug
hotspot buckets:

    suites/
      smoke/                  (unchanged; baseline)
      onboarding/             <-- hermes/ (was hermes-specific/)
      inference/              <-- cloud/ (was direct files),
                                  ollama-gpu/ (was local-ollama-inference/),
                                  ollama-auth-proxy/ (was ollama-proxy/)
      security/               <-- credentials/ (was at suites/credentials/)
      platform/               <-- macos/ (was platform-macos/),
                                  wsl/ (was platform-wsl/)
      lifecycle/ sandbox/ messaging/   (new dirs with roadmap READMEs)

Each new directory ships with a README.md documenting the originating
bug class, the legacy `test/e2e/test-*.sh` script (where one exists),
and the planned coverage. Suite IDs in `suites.yaml` stay stable; only
script paths move.

## CodeRabbit review items addressed

1. `.github/workflows/e2e-scenarios.yaml` — add a `resolve-runner` job
   that routes each scenario to the correct runner
   (macos-latest / windows-latest / self-hosted / ubuntu-latest) based
   on the scenario id prefix. Previously `runs-on: ubuntu-latest` was
   hard-coded for every scenario.
2. All test files — add `timeout: Number(process.env.E2E_SPAWN_TIMEOUT_MS
   ?? 60_000)` to every `spawnSync` call so a stuck subprocess cannot
   block a Vitest worker.
3. `coverage-report.sh` — use `npx --no-install tsx` so the lockfile
   pins the version; fail closed if tsx is missing.
4. `lib/context.sh` — validate keys as POSIX identifiers before
   interpolating into grep regexes; reject newlines in values that
   would corrupt the line-oriented `context.env` format.
5. `lib/emit-context-from-plan.sh` — fail fast if `plan.json` is
   missing its `scenario_id` field rather than silently seeding empty.
6. `lib/setup/install.sh` — pin the installer source via
   `E2E_INSTALLER_URL` / `E2E_INSTALLER_SHA256` overrides; download to
   a temp file and sha256-verify before exec instead of streaming
   `curl | bash` over the network.
7. `lib/assert/sandbox-alive.sh` — fix regex that had an empty first
   alternative (`"^|..."`) and therefore always matched. Replace with
   `"(^|[[:space:]])name([[:space:]]|$)"` to properly detect
   "sandbox not found".
8. `test/e2e/README.md` — regenerate to reflect the current 7-scenario
   catalog, new directory layout, runner contracts, and post-reorg
   roadmap.
9. `resolver/index.ts` (`validate-state`) — require an explicit
   `--probes-from-state` flag to seed probes from the expected state.
   `run-scenario.sh` passes the flag in `--dry-run` mode only; live
   mode now fails closed when real probes are missing rather than
   silently self-validating.
10. `run-scenario.sh` resolver fallback — use `npx --no-install tsx`
    and fail closed with a clear message if tsx is not installed.
11. `run-scenario.sh` (non-dry-run) — exit 4 instead of 0 when full
    suite execution is not yet wired for the scenario. Silent-pass is
    now observable in CI.
12-15. `suites/inference/**` — replace `curl ... | head -c N` with
    `body="$(curl ...)"; printf '%s\n' "${body:0:N}"`. The pipe pattern
    was brittle under `pipefail`: `head` closing early could make
    successful requests appear failed.

## Test state

55/55 Vitest `cli` tests pass after reorg and fixes.
`prek run --all-files` exits 0.

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Introduces test/e2e/MIGRATION.md — an in-tree tracker of legacy
test-*.sh scripts being migrated to the scenario matrix. Per-wave
completion is recorded there as the migration progresses.
Adds 30 tests covering Phase 1 deliverables from
specs/2026-05-11_e2e-test-migration/ (tests.md 1.A-1.H):

- 1.A logging helpers (lib/logging.sh)
- 1.B sandbox-exec helper (lib/sandbox-exec.sh)
- 1.C fixtures (fake-openai, fake-{telegram,discord,slack}, older-base-image)
- 1.D assertion helpers (inference-works, no-credentials-leaked,
  policy-preset-applied, messaging-bridge-reachable)
- 1.E install dispatcher splits (install-{repo,curl,ollama,launchable}.sh)
- 1.F run-scenario.sh --validate-only flag
- 1.G convention lint (scripts/e2e/lint-conventions.ts)
- 1.H parity harness (scripts/e2e/compare-parity.sh + parity-map.yaml)

All 30 tests fail as expected (red phase) \u2014 implementation follows.
Lands shared fixtures, helpers, assertion helpers, install-method
splits, conventions + lint, and the parity-compare CI harness that
unblock the per-wave migration phases (2\u201312).

Deliverables (per specs/2026-05-11_e2e-test-migration/spec.md Phase 1):

Fixtures (test/e2e/lib/fixtures/):
- fake-openai.sh: local OpenAI-compatible endpoint (Risk #2 mitigation)
- fake-{telegram,discord,slack}.sh: messaging stubs via shared
  _fake-http-stub.sh harness
- older-base-image.sh: tagged ghcr base-image Dockerfile generator

Helpers (test/e2e/lib/):
- logging.sh: canonical e2e_{section,info,pass,fail} with stable
  PASS:/FAIL:/=== Phase markers (absorbs reuse category #1)
- sandbox-exec.sh: canonical nemoclaw-shell wrapper with safe quoting,
  exit-code propagation, and dry-run short-circuit (category #10)
- env.sh: auto-sources logging.sh so every consumer gets it for free

Assertion helpers (test/e2e/lib/assert/):
- inference-works.sh: chat-completion round-trip
- no-credentials-leaked.sh: credential-pattern scan
- policy-preset-applied.sh: gateway policy preset verification
- messaging-bridge-reachable.sh: L7 proxy / bridge reachability

Install dispatcher splits (test/e2e/lib/setup/):
- install-{repo,curl,ollama,launchable}.sh (four profiles)
- install.sh: dispatcher routes by profile/method name (category #5)

Runtime probe wiring:
- run-scenario.sh: adds --validate-only flag (probe-only, no setup)
- resolver/index.ts: E2E_PROBE_OVERRIDES_JSON escape hatch for keys
  with embedded underscores (e.g. security.policy_engine)

Convention lint + parity harness:
- scripts/e2e/lint-conventions.ts: enforces 6 conventions on suite
  step scripts + requires parity-map.yaml entries for legacy scripts
- scripts/e2e/compare-parity.sh: diffs legacy vs scenario PASS/FAIL
  via parity-map.yaml; flaky: true marker supported (Risk #4)
- test/e2e/parity-map.yaml: seeded with one entry per existing legacy
  script; migration phases 2\u201312 append assertion mappings
- .github/workflows/e2e-parity-compare.yaml: dispatches legacy script
  + migrated scenario on same runner and diffs outcomes

Tests (all passing, 41 total):
- test/e2e-lib-helpers.test.ts: +18 tests (1.A\u20131.E)
- test/e2e-convention-lint.test.ts: +11 tests (1.G\u20131.H)
- test/e2e-expected-state-validator.test.ts: +2 tests (1.F)

No regressions: full cli Vitest project (3258 tests) still green.
Follow-up to Phase 1. Documents the 13 duplication categories being
absorbed by the Wave 0 fixtures/asserts/conventions and the expected
LOC reduction. No code changes.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a scenario-based E2E framework: metadata (scenarios/expected-states/suites), TS resolver/validator/reporting, Bash runners/libs/assertions, fixtures, docs, CI workflows for scenarios and parity, parity/lint tools, and comprehensive Vitest coverage.

Changes

Scenario-driven E2E system

Layer / File(s) Summary
Resolver, validator, coverage reporting
test/e2e/runtime/resolver/*
TypeScript CLI to load/validate metadata, resolve plans, validate expected states, and render coverage.
Runners and shared libs
test/e2e/runtime/*.sh, test/e2e/runtime/lib/*
Bash entrypoints for scenario planning/execution/suites with standardized env, context, logging, artifacts.
Scenarios, states, suites
test/e2e/nemoclaw_scenarios/*, test/e2e/validation_suites/**/*
YAML catalogs, onboarding/install helpers, fixtures, and assertion/suite scripts across platforms and inference.
CI workflows and tooling
.github/workflows/*, scripts/e2e/*, .gitignore
Manual scenario workflow, parity-compare workflow, parity diff script, lint-conventions, and .e2e ignore.
Docs and tests
test/e2e/docs/*, AGENTS.md, test/e2e/scenario-framework-tests/*, vitest.config.ts
README/migration docs, agent note, Vitest suites validating resolver, runners, workflows, fixtures, and conventions.

Sequence Diagram(s)

sequenceDiagram
  participant Dev as Developer
  participant GA as GitHub Actions
  participant RS as run-scenario.sh
  participant TV as TS Resolver/Validator
  participant SU as Suite Runner
  participant AR as Artifacts

  Dev->>GA: Trigger e2e-scenarios (scenario, plan_only)
  GA->>RS: Invoke with inputs
  RS->>TV: Resolve plan / validate-state
  TV-->>RS: plan.json / expected-state report
  RS->>SU: Execute suites (if enabled)
  SU-->>AR: Logs, context.env, reports
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

CI/CD, enhancement: feature, configuration

Suggested reviewers

  • cv

Poem

I mapped the trails the suites will tread,
With YAML stars to show the thread.
A rabbit taps the CI drum—
Plan, validate, the fixtures hum.
Logs like carrots safely stowed,
Parity checked, we hit the road. 🥕✨

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/e2e-scenario-matrix-phase1

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (15)
test/e2e/suites/sandbox/README.md-29-30 (1)

29-30: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Capitalize the proper noun "Telegram".

The term "telegram" appears twice in lowercase when referring to the Telegram messaging platform. Proper nouns should be capitalized in documentation.

📝 Proposed fix
-- **Policy preservation during rebuild is untested.** UAT `#1952` (Telegram
-  policy lost on rebuild) + UAT `#2010` (telegram policy apparently applied
+- **Policy preservation during rebuild is untested.** UAT `#1952` (Telegram
+  policy lost on rebuild) + UAT `#2010` (Telegram policy apparently applied
   but gateway blocks it) remain live blind spots.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/sandbox/README.md` around lines 29 - 30, In the README entry
titled "Policy preservation during rebuild is untested." update the two
lowercase occurrences of "telegram" to the proper noun "Telegram" (e.g., in the
UAT references UAT `#1952` and UAT `#2010` lines) so both mentions of the messaging
platform are capitalized consistently.
AGENTS.md-30-30 (1)

30-30: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align E2E wording with the updated scenario-based architecture.

Line 30 now documents scenario-based E2E, but the Testing Strategy section still broadly states E2E runs on ephemeral Brev instances; tighten that wording so only the branch-validation project is Brev-specific.

As per coding guidelines: AGENTS.md must “Document agent architecture and design patterns” and “Use consistent formatting and structure for agent documentation”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@AGENTS.md` at line 30, Update the Testing Strategy section in AGENTS.md to
align with the scenario-based E2E wording on line referencing `test/e2e/`:
explicitly state that end-to-end tests use the scenario-based runner described
in `test/e2e/README.md`, and restrict the note about running on ephemeral Brev
instances to only the branch-validation project (mention `branch-validation` by
name). Ensure the doc follows the agent docs guidelines by keeping the structure
consistent (section title "Testing Strategy", a short paragraph, and a bullet
pointing to `test/e2e/README.md`) and update any conflicting sentences that
currently imply all E2E runs use Brev.
test/e2e/suites/messaging/README.md-8-9 (1)

8-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale onboarding reference in docs.

Line 8 references onboard.ts, but this migration path is wired through shell onboarding helpers (onboard.sh), so this is misleading for contributors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/messaging/README.md` around lines 8 - 9, The README's
onboarding reference is stale: replace the mention of onboard.ts with the
correct shell helper onboard.sh and clarify that messaging migration is wired
through the shell onboarding helpers (onboard.sh) rather than a TypeScript
script; update the sentence that mentions "policy preset OR `onboard.ts`" to
reference "policy preset OR `onboard.sh` (shell onboarding helpers)" and ensure
the surrounding text explains that onboarding is performed via the shell helper
workflow.
test/e2e/resolver/validator.ts-60-77 (1)

60-77: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the unused key parameter from compare.

key is unused (Line 61), and the current calls pass it only to discard it.

🧹 Proposed fix
 function compare(
-  key: string,
   expected: ProbeValue,
   actual: ProbeValue | undefined,
 ): boolean {
   if (actual === undefined) return false;
   return expected === actual;
 }
@@
-    const ok = compare(key, expected, actual);
+    const ok = compare(expected, actual);
@@
-      const ok = compare(key, expected as ProbeValue, actual);
+      const ok = compare(expected as ProbeValue, actual);

As per coding guidelines, **/*.{js,ts,tsx,jsx} requires intentionally unused variables to be prefixed with _.

Also applies to: 91-94

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/resolver/validator.ts` around lines 60 - 77, The compare function
declares an unused parameter `key` which violates the lint rule for unused
variables; update the function signature in `compare` to remove the `key`
parameter (or rename it to `_key` if you prefer to signal intentional unused)
and then update all call sites (e.g., the call inside `validateExpectedState`
where `compare(key, expected, actual)` is invoked and the other occurrences
around lines 91-94) to pass only the remaining parameters (or keep passing
`_key` if you renamed it), ensuring types/signatures match.
test/e2e/suites/security/credentials/00-credentials-present.sh-11-14 (1)

11-14: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct ShellCheck source hints to match this script’s directory depth.

Line 11 and Line 13 currently reference ../../lib/...; from test/e2e/suites/security/credentials/, the proper relative hint is ../../../lib/....

Suggested patch
-# shellcheck source=../../lib/env.sh
+# shellcheck source=../../../lib/env.sh
 . "${LIB_DIR}/env.sh"
-# shellcheck source=../../lib/context.sh
+# shellcheck source=../../../lib/context.sh
 . "${LIB_DIR}/context.sh"
As per coding guidelines, `**/*.{sh,bash}` must enforce ShellCheck linting using the repository `.shellcheckrc`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/security/credentials/00-credentials-present.sh` around lines
11 - 14, Update the ShellCheck source hint paths used by the script lines that
source env.sh and context.sh so they reflect the script's actual directory
depth: change the source hints from ../../lib/... to ../../../lib/... for the
lines sourcing ". \"${LIB_DIR}/env.sh\"" and ". \"${LIB_DIR}/context.sh\"";
ensure the corrected hints are the only change and that the script remains
covered by the repository .shellcheckrc linting rules.
test/e2e/suites/platform/macos/00-macos-smoke.sh-14-17 (1)

14-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix ShellCheck source hint paths for this nested directory.

Line 14 and Line 16 point to ../../lib/..., but this script is under test/e2e/suites/platform/macos/, so the correct hint is ../../../lib/.... Keeping the hint accurate ensures ShellCheck evaluates the intended sourced files.

Suggested patch
-# shellcheck source=../../lib/env.sh
+# shellcheck source=../../../lib/env.sh
 . "${LIB_DIR}/env.sh"
-# shellcheck source=../../lib/context.sh
+# shellcheck source=../../../lib/context.sh
 . "${LIB_DIR}/context.sh"
As per coding guidelines, `**/*.{sh,bash}` must enforce ShellCheck linting using the repository `.shellcheckrc`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/platform/macos/00-macos-smoke.sh` around lines 14 - 17,
Update the ShellCheck source hint comments to point to the correct nested lib
path: change the two comments that currently read "shellcheck
source=../../lib/env.sh" and "shellcheck source=../../lib/context.sh" to use
"../../../lib/env.sh" and "../../../lib/context.sh" respectively so ShellCheck
resolves the intended files referenced by the subsequent . "${LIB_DIR}/env.sh"
and . "${LIB_DIR}/context.sh" source lines; ensure both hint comments are
corrected in the same script to satisfy the repository ShellCheck rules
(.shellcheckrc).
test/e2e/lib/assert/sandbox-alive.sh-31-35 (1)

31-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Escape or avoid regex metacharacters in the sandbox name matching.

Line 34 interpolates ${name} directly into the grep ERE pattern, so sandbox names with regex metacharacters (., +, *, [, ], etc.) produce false positives or negatives. Use grep -F for literal matching or the field-based awk approach for token boundary matching.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/lib/assert/sandbox-alive.sh` around lines 31 - 35, The current grep
-qE uses unescaped ${name} which lets regex metacharacters in sandbox names
break matching; change the check that runs "nemoclaw list | grep -qE ..." to a
literal token equality check (e.g., use awk or a loop) that compares fields to
the variable rather than interpolating it into a regex: capture the output of
"nemoclaw list", pass the shell variable name as a literal (awk -v name="$name")
and scan fields (for (i=1;i<=NF;i++) if ($i==name) found=1) and base the
success/failure on that result instead of the current grep -qE pattern. Ensure
you still write the same error message using ${name} when not found.
test/e2e/MIGRATION.md-62-62 (1)

62-62: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace emoji status markers with plain text legend tokens.

Line 62 uses emoji symbols in technical prose, which violates the markdown style rule.

As per coding guidelines: "No emoji in technical prose."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/MIGRATION.md` at line 62, The legend line currently uses emoji
symbols ("⬜", "🟨", "✅", "🔵") which violates the "No emoji in technical prose"
rule; replace them with plain-text tokens or words (e.g., "NOT_STARTED",
"IN_PROGRESS", "MIGRATED", "PARITY_VERIFIED" or simply "not started · in
progress · migrated · parity verified") so the line reads without emoji while
preserving the same meaning; update the line containing "Legend: ⬜ not started ·
🟨 in progress · ✅ migrated · 🔵 parity verified" accordingly.
test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh-25-26 (1)

25-26: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail the step when /api/tags returns an empty body.

Lines 25-26 currently allow an empty 2xx response to pass, which can hide gateway/inference regressions.

Proposed fix
 body="$(curl -fsS --max-time 10 "${url%/}/api/tags")"
+if [[ -z "${body}" ]]; then
+  echo "local-ollama-inference:ollama-models-health: empty response" >&2
+  exit 1
+fi
 printf '%s\n' "${body:0:512}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh` around lines
25 - 26, The test currently assigns the curl response to the variable body and
prints the first 512 chars but does not fail if body is empty; update the logic
after the curl call (the body="$(curl ...)" assignment and the printf) to detect
an empty response (e.g., test if body is zero-length) and exit non‑zero with a
clear error message when empty so the step fails on empty 2xx responses;
reference the body variable and the curl invocation in your change and ensure
the printf remains for debugging when non-empty.
test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh-29-30 (1)

29-30: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a non-empty response assertion for the inference probe.

Line 29 currently treats any HTTP 2xx as success, even if the body is empty. That can mask broken inference routes with false-positive suite passes.

Proposed fix
 body="$(nemoclaw shell "${name}" -- curl -fsS --max-time 10 "http://${route}/v1/models")"
+if [[ -z "${body}" ]]; then
+  echo "inference:sandbox-inference-local: empty response" >&2
+  exit 1
+fi
 printf '%s\n' "${body:0:512}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh` around
lines 29 - 30, The probe currently treats any 2xx as success but doesn’t assert
the response body is non-empty; update the inference probe after the curl that
sets the variable body (from the nemoclaw shell call) to validate that body is
not empty—e.g., check that the length of the variable body (used in the printf)
is greater than zero and fail the script (non-zero exit/print a clear error) if
it is empty so empty-success responses are treated as failures.
scripts/e2e/lint-conventions.ts-145-146 (1)

145-146: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle missing value after --root as CLI misuse.

At Line 145, --root with no following value currently falls back to auto root instead of exiting with code 2. That makes invalid invocations look valid.

Proposed fix
-    if (a === "--root") root = args.shift();
+    if (a === "--root") {
+      const v = args.shift();
+      if (!v || v.startsWith("-")) {
+        process.stderr.write("lint-conventions: --root requires a value\n");
+        process.exit(2);
+      }
+      root = v;
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/e2e/lint-conventions.ts` around lines 145 - 146, The CLI parsing
branch that handles the "--root" flag (where code does: if (a === "--root") root
= args.shift();) must validate that a value was actually provided; update the
parsing logic around the argument loop (referencing the variables/identifiers a,
args, and root) to check that args.shift() returns a non-empty, non-option
string and if not, print a concise usage/error message and exit with code 2
(e.g., process.exit(2)) to treat this as CLI misuse rather than silently falling
back to auto root. Ensure the same guard rejects cases where the next token
starts with "-" so a missing/invalid value is handled.
test/e2e-metadata-final-hygiene.test.ts-67-76 (1)

67-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add an explicit existence check for expected_state references.

Right now, Line 67 only reads meta.expectedStates.expected_states[sc.expected_state] for negative-scenario logic; it does not fail when the key is missing. A bad expected_state reference can slip through if sc.suites is non-empty.

Proposed fix
   for (const [id, sc] of Object.entries(meta.scenarios.setup_scenarios)) {
     if (!sc.expected_state) {
       problems.push(`${id}: missing expected_state`);
       continue;
     }
+    if (!(sc.expected_state in meta.expectedStates.expected_states)) {
+      problems.push(`${id}: expected_state '${sc.expected_state}' not found`);
+      continue;
+    }
     // Negative scenarios (preflight failures) intentionally have no suites.
     const state = meta.expectedStates.expected_states[sc.expected_state] as {
       failure?: { expected?: boolean };
     };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-metadata-final-hygiene.test.ts` around lines 67 - 76, The code reads
meta.expectedStates.expected_states[sc.expected_state] without verifying the key
exists, so add an explicit existence check for sc.expected_state in
meta.expectedStates.expected_states before using it: if sc.expected_state is
missing or not a key, push a problem like `${id}: unknown expected_state
'${sc.expected_state}'` (or similar) and continue; only then set state =
meta.expectedStates.expected_states[sc.expected_state] and compute isNegative
from state?.failure?.expected. This ensures bad expected_state references are
caught even when sc.suites is non-empty.
scripts/e2e/lint-conventions.ts-76-84 (1)

76-84: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

no-section-call misses e2e_section despite rule intent.

Line 77 says this rule covers section/e2e_section, but Line 83 only matches section. This leaves a documented violation path unlinted.

Proposed fix
-        if (/^section\s+["']/.test(line)) {
-          return "calls section; filename carries the phase label";
+        if (/^(section|e2e_section)\s+/.test(line)) {
+          return "calls section/e2e_section; filename carries the phase label";
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/e2e/lint-conventions.ts` around lines 76 - 84, The rule object with
id "no-section-call" currently only checks for /^section\s+["']/ in its test
function; update that test's regex to also detect e2e_section and the combined
form section/e2e_section (for example:
/^(?:section|e2e_section|section\/e2e_section)\s+["']/) so the rule enforces the
documented coverage; modify the regex inside the test arrow function that
iterates lines in the "no-section-call" rule accordingly.
test/e2e-lib-helpers.test.ts-198-198 (1)

198-198: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the established PATH fallback pattern in spawned-test env overrides.

At Line 198, Line 242, and Line 359, ${process.env.PATH} should include a fallback to avoid producing ...:undefined in constrained environments.

Proposed fix
-        { PATH: `${bin}:${process.env.PATH}` },
+        { PATH: `${bin}:${process.env.PATH || ""}` },
@@
-        { PATH: `${bin}:${process.env.PATH}`, TOKEN: "SHOULD_NOT_EXPAND" },
+        { PATH: `${bin}:${process.env.PATH || ""}`, TOKEN: "SHOULD_NOT_EXPAND" },
@@
-        { PATH: `${bin}:${process.env.PATH}` },
+        { PATH: `${bin}:${process.env.PATH || ""}` },

Based on learnings: “In this repo’s test suite, prefer the established POSIX PATH separator :PATH: \${fakeBin}:${process.env.PATH || ""}``.”

Also applies to: 242-242, 359-359

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-lib-helpers.test.ts` at line 198, Three occurrences build
environment overrides like `{ PATH: `${bin}:${process.env.PATH}` }` which can
produce `:undefined`; update each override to use the established POSIX
separator and a safe fallback, e.g. replace `${bin}:${process.env.PATH}` with
`${fakeBin}:${process.env.PATH || ""}` (or `${bin}:${process.env.PATH || ""}` as
appropriate) so the PATH entry never contains "undefined"; apply this fix to all
instances of the PATH env override in the test file.
test/e2e/resolver/index.ts-79-98 (1)

79-98: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate command early and keep the allowed-command message accurate.

Right now unknown commands are rejected only after metadata load and scenario resolution, and the error text (Line 127) omits coverage. This can produce misleading failures for invalid CLI invocations.

Proposed fix
 function main(): number {
   let parsed: ReturnType<typeof parseArgs>;
@@
   const { command, scenarioId, contextDir, metadataDir } = parsed;
+  if (!["plan", "validate-state", "coverage"].includes(command)) {
+    process.stderr.write(
+      `resolver: unknown command '${command}' (expected: coverage|plan|validate-state)\n`,
+    );
+    return 2;
+  }
+
   if (command === "coverage") {
@@
-    process.stderr.write(
-      `resolver: unknown command '${command}' (expected: plan|validate-state <scenario-id>)\n`,
-    );
-    return 2;
+    process.stderr.write("resolver: internal error: unreachable command branch\n");
+    return 2;
   } catch (err) {

Also applies to: 126-129

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/resolver/index.ts` around lines 79 - 98, After destructuring parsed
(const { command, scenarioId, contextDir, metadataDir }), validate that command
is one of the supported commands (include "coverage" alongside the other allowed
commands) immediately — before calling loadMetadataFromDir or resolveScenario —
and if it's invalid write a clear error to stderr listing the exact allowed
commands (make sure "coverage" is included) and return the proper non-zero exit
code; update the existing unknown-command error message (the one currently
omitting "coverage") and apply the same early-validation logic to the later
branch referenced in the review so unknown commands are rejected before any
metadata loading or scenario resolution.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e-parity-compare.yaml:
- Around line 82-87: Validate the github.event.inputs.legacy_script value before
executing it: ensure it matches the strict basename pattern
^test-[A-Za-z0-9._-]+\.sh$ and does not contain path separators or traversal (no
'/' or '..'), and fail the step with an error if it doesn't match; keep the
existing executability check for test/e2e/${{ github.event.inputs.legacy_script
}} but only call bash on that path after the pattern validation succeeds so only
allowed test-*.sh basenames under test/e2e are executed.

In @.github/workflows/e2e-scenarios.yaml:
- Around line 19-23: The workflow input "plan_only" currently defaults to
"false", causing manual dispatches to run the non-dry-run path that fails in
Phase 1; change the default value of the "plan_only" input to "true" so manual
runs default to dry-run, and verify the conditional that triggers the
non-dry-run path only executes when inputs.plan_only == 'false' (i.e., ensure
the steps that perform install/onboard/suites are gated by that condition).

In `@test/e2e/lib/assert/inference-works.sh`:
- Around line 53-59: The pipeline checks using printf "${out}" | grep -q
'"choices"' and '"content"' can produce false negatives under set -euo pipefail;
replace those grep pipelines with shell string tests against the out variable
(e.g. use [[ $out == *'"choices"'* ]] and [[ $out == *'"content"'* ]] in the
same file) so the checks don't rely on pipes, and make the script executable
(chmod +x) so it follows the existing executable assertion files; update the
failure messages to remain the same if tests fail.

In `@test/e2e/lib/assert/no-credentials-leaked.sh`:
- Around line 29-35: The credential regexes in the local patterns array miss
modern token formats; update the patterns array in no-credentials-leaked.sh (the
local patterns variable) to include OpenAI project keys (e.g.,
sk-proj-[A-Za-z0-9-]{...} and general sk-[A-Za-z0-9]{16,} variants), GitHub
fine-grained tokens (github_pat_[A-Za-z0-9_]{...}) and other GitHub token
prefixes (gho_, ghu_, ghs_, ghr_), and ensure existing patterns like 'sk-...',
'ghp_...', 'xox...', and 'AKIA...' remain but broadened where appropriate so the
new entries detect these modern formats; add or expand entries within the
patterns array accordingly.

In `@test/e2e/lib/context.sh`:
- Around line 65-69: _e2e_context_is_sensitive_key currently uses case patterns
like *TOKEN*/*SECRET* which are case-sensitive, so lowercase keys validated by
_e2e_context_validate_key leak secrets; update _e2e_context_is_sensitive_key to
normalize the input key to a single case (e.g., convert the parameter to
uppercase using bash expansion like key_upper=${1^^}) and run the case pattern
matching against that normalized variable so patterns (*TOKEN*, *SECRET*, etc.)
match regardless of original case.

In `@test/e2e/lib/setup/install-curl.sh`:
- Around line 40-41: The installer invocation (bash "${tmp}") can fail but its
exit status is lost because nemoclaw_refresh_install_env runs unconditionally;
change install-curl.sh to save the exit code of the installer (e.g., capture
"$?") immediately after running bash "${tmp}", then call
nemoclaw_refresh_install_env and finally exit with the saved installer status so
failures aren't masked; also make the script executable (chmod +x) so it meets
the execution requirement.

In `@test/e2e/lib/setup/install-ollama.sh`:
- Line 25: Replace the unsafe "curl | bash" pipeline by downloading
"${ollama_url}" to a temp file (use mktemp), check curl's exit status, verify
the downloaded installer checksum using the same checksum logic used by
e2e_install_curl, then execute the installer from the temp file with bash and
remove the temp file; update the block that currently contains `if ! curl -fsSL
--retry 3 --retry-delay 2 "${ollama_url}" | bash; then` to first save to a temp
path, validate the checksum, run `bash "$tempfile"` only on success, and fail
the script if any step (download, checksum, execution) fails.

In `@test/e2e/lib/setup/install-repo.sh`:
- Around line 24-28: The subshell that runs the two commands "( cd
\"${repo_root}\" || exit; npm install; npm link )" can mask failures if npm
install fails but npm link succeeds; update that subshell so npm link only runs
when npm install succeeds by chaining the commands with && (or enable errexit
inside the subshell), i.e., ensure the sequence uses "npm install && npm link"
(or "set -e" then run both) so a failing npm install causes the subshell and the
script to fail; locate the subshell containing the npm install and npm link
calls and replace the separated commands accordingly.

In `@test/e2e/lib/setup/onboard.sh`:
- Around line 20-38: The dry-run short-circuit (e2e_env_is_dry_run) currently
returns success before verifying the profile, letting invalid profiles pass;
change the flow so the profile is validated first using the existing case block
(the same cases for cloud-openclaw, cloud-hermes, local-ollama-openclaw) or a
small helper (e.g., e2e_is_supported_profile) that mirrors that case logic and
returns non-zero for unknown profiles, then perform the e2e_env_is_dry_run check
to print the “[dry-run] onboard profile=…” message and exit 0; finally, call the
actual onboarding functions (e2e_onboard_cloud_openclaw,
e2e_onboard_cloud_hermes, e2e_onboard_local_ollama_openclaw) only after
validation and dry-run handling.

In `@test/e2e/MIGRATION.md`:
- Around line 6-10: The opening summary claims "migrates all existing
`test/e2e/test-*.sh` scripts" but the status line later reads "0 / 40 scripts
migrated", creating a contradiction; update the text so both reflect the same
migration phase—either change the opening sentence to indicate a staged/partial
migration (e.g., "begins migration of ... Strategy B; legacy scripts remain...")
or update the status line to the correct migrated count; look for the exact
phrases "migrates all existing `test/e2e/test-*.sh` scripts" and "0 / 40 scripts
migrated" in MIGRATION.md and make them consistent.

In `@test/e2e/suites/inference/cloud/00-models-health.sh`:
- Around line 31-32: The piped truncation echo "${body}" | head -c 512 can fail
under set -o pipefail due to SIGPIPE; replace this pipeline by truncating the
shell variable with parameter expansion (take the first 512 bytes from the
variable named body using ${body:0:512}) and print that directly (use printf to
avoid echo semantics), keeping the separate echo afterwards to emit the newline
so behavior matches the original output.

---

Minor comments:
In `@AGENTS.md`:
- Line 30: Update the Testing Strategy section in AGENTS.md to align with the
scenario-based E2E wording on line referencing `test/e2e/`: explicitly state
that end-to-end tests use the scenario-based runner described in
`test/e2e/README.md`, and restrict the note about running on ephemeral Brev
instances to only the branch-validation project (mention `branch-validation` by
name). Ensure the doc follows the agent docs guidelines by keeping the structure
consistent (section title "Testing Strategy", a short paragraph, and a bullet
pointing to `test/e2e/README.md`) and update any conflicting sentences that
currently imply all E2E runs use Brev.

In `@scripts/e2e/lint-conventions.ts`:
- Around line 145-146: The CLI parsing branch that handles the "--root" flag
(where code does: if (a === "--root") root = args.shift();) must validate that a
value was actually provided; update the parsing logic around the argument loop
(referencing the variables/identifiers a, args, and root) to check that
args.shift() returns a non-empty, non-option string and if not, print a concise
usage/error message and exit with code 2 (e.g., process.exit(2)) to treat this
as CLI misuse rather than silently falling back to auto root. Ensure the same
guard rejects cases where the next token starts with "-" so a missing/invalid
value is handled.
- Around line 76-84: The rule object with id "no-section-call" currently only
checks for /^section\s+["']/ in its test function; update that test's regex to
also detect e2e_section and the combined form section/e2e_section (for example:
/^(?:section|e2e_section|section\/e2e_section)\s+["']/) so the rule enforces the
documented coverage; modify the regex inside the test arrow function that
iterates lines in the "no-section-call" rule accordingly.

In `@test/e2e-lib-helpers.test.ts`:
- Line 198: Three occurrences build environment overrides like `{ PATH:
`${bin}:${process.env.PATH}` }` which can produce `:undefined`; update each
override to use the established POSIX separator and a safe fallback, e.g.
replace `${bin}:${process.env.PATH}` with `${fakeBin}:${process.env.PATH || ""}`
(or `${bin}:${process.env.PATH || ""}` as appropriate) so the PATH entry never
contains "undefined"; apply this fix to all instances of the PATH env override
in the test file.

In `@test/e2e-metadata-final-hygiene.test.ts`:
- Around line 67-76: The code reads
meta.expectedStates.expected_states[sc.expected_state] without verifying the key
exists, so add an explicit existence check for sc.expected_state in
meta.expectedStates.expected_states before using it: if sc.expected_state is
missing or not a key, push a problem like `${id}: unknown expected_state
'${sc.expected_state}'` (or similar) and continue; only then set state =
meta.expectedStates.expected_states[sc.expected_state] and compute isNegative
from state?.failure?.expected. This ensures bad expected_state references are
caught even when sc.suites is non-empty.

In `@test/e2e/lib/assert/sandbox-alive.sh`:
- Around line 31-35: The current grep -qE uses unescaped ${name} which lets
regex metacharacters in sandbox names break matching; change the check that runs
"nemoclaw list | grep -qE ..." to a literal token equality check (e.g., use awk
or a loop) that compares fields to the variable rather than interpolating it
into a regex: capture the output of "nemoclaw list", pass the shell variable
name as a literal (awk -v name="$name") and scan fields (for (i=1;i<=NF;i++) if
($i==name) found=1) and base the success/failure on that result instead of the
current grep -qE pattern. Ensure you still write the same error message using
${name} when not found.

In `@test/e2e/MIGRATION.md`:
- Line 62: The legend line currently uses emoji symbols ("⬜", "🟨", "✅", "🔵")
which violates the "No emoji in technical prose" rule; replace them with
plain-text tokens or words (e.g., "NOT_STARTED", "IN_PROGRESS", "MIGRATED",
"PARITY_VERIFIED" or simply "not started · in progress · migrated · parity
verified") so the line reads without emoji while preserving the same meaning;
update the line containing "Legend: ⬜ not started · 🟨 in progress · ✅ migrated
· 🔵 parity verified" accordingly.

In `@test/e2e/resolver/index.ts`:
- Around line 79-98: After destructuring parsed (const { command, scenarioId,
contextDir, metadataDir }), validate that command is one of the supported
commands (include "coverage" alongside the other allowed commands) immediately —
before calling loadMetadataFromDir or resolveScenario — and if it's invalid
write a clear error to stderr listing the exact allowed commands (make sure
"coverage" is included) and return the proper non-zero exit code; update the
existing unknown-command error message (the one currently omitting "coverage")
and apply the same early-validation logic to the later branch referenced in the
review so unknown commands are rejected before any metadata loading or scenario
resolution.

In `@test/e2e/resolver/validator.ts`:
- Around line 60-77: The compare function declares an unused parameter `key`
which violates the lint rule for unused variables; update the function signature
in `compare` to remove the `key` parameter (or rename it to `_key` if you prefer
to signal intentional unused) and then update all call sites (e.g., the call
inside `validateExpectedState` where `compare(key, expected, actual)` is invoked
and the other occurrences around lines 91-94) to pass only the remaining
parameters (or keep passing `_key` if you renamed it), ensuring types/signatures
match.

In `@test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh`:
- Around line 29-30: The probe currently treats any 2xx as success but doesn’t
assert the response body is non-empty; update the inference probe after the curl
that sets the variable body (from the nemoclaw shell call) to validate that body
is not empty—e.g., check that the length of the variable body (used in the
printf) is greater than zero and fail the script (non-zero exit/print a clear
error) if it is empty so empty-success responses are treated as failures.

In `@test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh`:
- Around line 25-26: The test currently assigns the curl response to the
variable body and prints the first 512 chars but does not fail if body is empty;
update the logic after the curl call (the body="$(curl ...)" assignment and the
printf) to detect an empty response (e.g., test if body is zero-length) and exit
non‑zero with a clear error message when empty so the step fails on empty 2xx
responses; reference the body variable and the curl invocation in your change
and ensure the printf remains for debugging when non-empty.

In `@test/e2e/suites/messaging/README.md`:
- Around line 8-9: The README's onboarding reference is stale: replace the
mention of onboard.ts with the correct shell helper onboard.sh and clarify that
messaging migration is wired through the shell onboarding helpers (onboard.sh)
rather than a TypeScript script; update the sentence that mentions "policy
preset OR `onboard.ts`" to reference "policy preset OR `onboard.sh` (shell
onboarding helpers)" and ensure the surrounding text explains that onboarding is
performed via the shell helper workflow.

In `@test/e2e/suites/platform/macos/00-macos-smoke.sh`:
- Around line 14-17: Update the ShellCheck source hint comments to point to the
correct nested lib path: change the two comments that currently read "shellcheck
source=../../lib/env.sh" and "shellcheck source=../../lib/context.sh" to use
"../../../lib/env.sh" and "../../../lib/context.sh" respectively so ShellCheck
resolves the intended files referenced by the subsequent . "${LIB_DIR}/env.sh"
and . "${LIB_DIR}/context.sh" source lines; ensure both hint comments are
corrected in the same script to satisfy the repository ShellCheck rules
(.shellcheckrc).

In `@test/e2e/suites/sandbox/README.md`:
- Around line 29-30: In the README entry titled "Policy preservation during
rebuild is untested." update the two lowercase occurrences of "telegram" to the
proper noun "Telegram" (e.g., in the UAT references UAT `#1952` and UAT `#2010`
lines) so both mentions of the messaging platform are capitalized consistently.

In `@test/e2e/suites/security/credentials/00-credentials-present.sh`:
- Around line 11-14: Update the ShellCheck source hint paths used by the script
lines that source env.sh and context.sh so they reflect the script's actual
directory depth: change the source hints from ../../lib/... to ../../../lib/...
for the lines sourcing ". \"${LIB_DIR}/env.sh\"" and ".
\"${LIB_DIR}/context.sh\""; ensure the corrected hints are the only change and
that the script remains covered by the repository .shellcheckrc linting rules.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7146ad7a-3e0c-461e-bc05-64bd68afcaf8

📥 Commits

Reviewing files that changed from the base of the PR and between fc07c37 and 387767d.

📒 Files selected for processing (81)
  • .github/workflows/e2e-parity-compare.yaml
  • .github/workflows/e2e-scenarios.yaml
  • .gitignore
  • AGENTS.md
  • scripts/e2e/compare-parity.sh
  • scripts/e2e/lint-conventions.ts
  • test/e2e-context-helper.test.ts
  • test/e2e-convention-lint.test.ts
  • test/e2e-coverage-report.test.ts
  • test/e2e-expected-state-validator.test.ts
  • test/e2e-lib-helpers.test.ts
  • test/e2e-metadata-final-hygiene.test.ts
  • test/e2e-scenario-additional-families.test.ts
  • test/e2e-scenario-first-migration.test.ts
  • test/e2e-scenario-resolver.test.ts
  • test/e2e-scenario-schema.test.ts
  • test/e2e-scenarios-workflow.test.ts
  • test/e2e-suite-runner.test.ts
  • test/e2e/MIGRATION.md
  • test/e2e/README.md
  • test/e2e/coverage-report.sh
  • test/e2e/expected-states.yaml
  • test/e2e/lib/artifacts.sh
  • test/e2e/lib/assert/README.md
  • test/e2e/lib/assert/gateway-alive.sh
  • test/e2e/lib/assert/inference-works.sh
  • test/e2e/lib/assert/messaging-bridge-reachable.sh
  • test/e2e/lib/assert/no-credentials-leaked.sh
  • test/e2e/lib/assert/policy-preset-applied.sh
  • test/e2e/lib/assert/sandbox-alive.sh
  • test/e2e/lib/cleanup.sh
  • test/e2e/lib/context.sh
  • test/e2e/lib/emit-context-from-plan.sh
  • test/e2e/lib/env.sh
  • test/e2e/lib/fixtures/README.md
  • test/e2e/lib/fixtures/_fake-http-stub.sh
  • test/e2e/lib/fixtures/fake-discord.sh
  • test/e2e/lib/fixtures/fake-openai.sh
  • test/e2e/lib/fixtures/fake-slack.sh
  • test/e2e/lib/fixtures/fake-telegram.sh
  • test/e2e/lib/fixtures/older-base-image.sh
  • test/e2e/lib/logging.sh
  • test/e2e/lib/sandbox-exec.sh
  • test/e2e/lib/setup/README.md
  • test/e2e/lib/setup/install-curl.sh
  • test/e2e/lib/setup/install-launchable.sh
  • test/e2e/lib/setup/install-ollama.sh
  • test/e2e/lib/setup/install-repo.sh
  • test/e2e/lib/setup/install.sh
  • test/e2e/lib/setup/onboard.sh
  • test/e2e/parity-map.yaml
  • test/e2e/resolver/coverage.ts
  • test/e2e/resolver/index.ts
  • test/e2e/resolver/js-yaml.d.ts
  • test/e2e/resolver/load.ts
  • test/e2e/resolver/plan.ts
  • test/e2e/resolver/schema.ts
  • test/e2e/resolver/validator.ts
  • test/e2e/run-scenario.sh
  • test/e2e/run-suites.sh
  • test/e2e/scenarios.yaml
  • test/e2e/suites.yaml
  • test/e2e/suites/inference/cloud/00-models-health.sh
  • test/e2e/suites/inference/cloud/01-chat-completion.sh
  • test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh
  • test/e2e/suites/inference/ollama-auth-proxy/00-proxy-reachable.sh
  • test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh
  • test/e2e/suites/inference/ollama-gpu/01-ollama-chat-completion.sh
  • test/e2e/suites/lifecycle/README.md
  • test/e2e/suites/messaging/README.md
  • test/e2e/suites/onboarding/README.md
  • test/e2e/suites/onboarding/hermes/00-hermes-health.sh
  • test/e2e/suites/platform/macos/00-macos-smoke.sh
  • test/e2e/suites/platform/wsl/00-wsl-smoke.sh
  • test/e2e/suites/sandbox/README.md
  • test/e2e/suites/security/README.md
  • test/e2e/suites/security/credentials/00-credentials-present.sh
  • test/e2e/suites/smoke/00-cli-available.sh
  • test/e2e/suites/smoke/01-gateway-health.sh
  • test/e2e/suites/smoke/02-sandbox-listed.sh
  • test/e2e/suites/smoke/03-sandbox-shell.sh

Comment thread .github/workflows/e2e-parity-compare.yaml
Comment thread .github/workflows/e2e-scenarios.yaml
Comment thread test/e2e/validation_suites/assert/inference-works.sh
Comment thread test/e2e/validation_suites/assert/no-credentials-leaked.sh
Comment thread test/e2e/runtime/lib/context.sh
Comment thread test/e2e/nemoclaw_scenarios/install/ollama.sh
Comment thread test/e2e/nemoclaw_scenarios/install/repo-current.sh
Comment thread test/e2e/lib/setup/onboard.sh Outdated
Comment thread test/e2e/docs/MIGRATION.md
Comment thread test/e2e/validation_suites/inference/cloud/00-models-health.sh
Delete the 8 placeholder sub-READMEs under lib/{assert,fixtures,setup}/
and suites/{lifecycle,messaging,onboarding,sandbox,security}/ — their
roadmap content was stale before it even landed, and the yaml files
(scenarios.yaml, expected-states.yaml, suites.yaml) plus MIGRATION.md
are the real sources of truth.

Rewrite test/e2e/README.md to be brief but broad: the core model, the
three declarative inputs, how to run, file layout, and how to extend.
Removes the matrix catalog (now lives in scenarios.yaml) and the
how-to-add-* cookbook (now: 'read the schema, edit the yaml').
@wscurran wscurran added enhancement: testing Use this label to identify requests to improve NemoClaw test coverage. E2E End-to-end testing — Brev infrastructure, test cases, nightly failures, and coverage gaps labels May 11, 2026
@wscurran
Copy link
Copy Markdown
Contributor

Consolidates all documentation for the e2e setup-scenario matrix under
test/e2e/docs/. The top level of test/e2e/ now contains only
code, runners, and declarative inputs — no prose.

- test/e2e/README.md       -> test/e2e/docs/README.md
- test/e2e/MIGRATION.md    -> test/e2e/docs/MIGRATION.md

Fix 4 relative links inside docs/README.md that pointed at sibling
yaml/resolver paths; now they go one directory up.
Comment thread test/e2e/scenario-framework-tests/e2e-context-helper.test.ts Fixed
Comment thread test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts Fixed
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
test/e2e/docs/MIGRATION.md (1)

6-10: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve the migration-status contradiction in the opening summary.

The opening states "This PR migrates all existing test/e2e/test-*.sh scripts" but Line 58 reports 0 / 40 scripts migrated, and the PR objectives confirm this is "test-harness scaffolding only; no product behavior changes." This contradiction makes the scope and merge gate unclear.

📝 Proposed fix
-This PR migrates all existing `test/e2e/test-*.sh` scripts into the
-scenario-based runner introduced by PR `#3290`. Full deep migration
-(Strategy B). Legacy scripts remain in the repo during this PR and run
-in parallel for 1–2 nightly cycles after merge; a follow-up PR retires
-them once parity is verified.
+This PR introduces Phase 1 migration infrastructure for moving existing
+`test/e2e/test-*.sh` scripts into the scenario-based runner introduced
+by PR `#3290` (Strategy B). Legacy scripts remain unchanged in this PR;
+migration waves 1–11 will move them into scenarios, followed by parity
+verification and retirement in a future phase.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/docs/MIGRATION.md` around lines 6 - 10, Update the opening summary
in MIGRATION.md to remove the contradictory claim that "This PR migrates all
existing `test/e2e/test-*.sh` scripts" and instead state that this PR provides
scenario-runner scaffolding only (no scripts migrated yet), aligning it with the
reported "0 / 40 scripts migrated" status and the PR objective that this is
test-harness scaffolding only; ensure the updated text makes the merge gate
clear by explaining legacy scripts remain and full migration will follow in a
separate PR.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/docs/README.md`:
- Around line 19-23: The relative links in test/e2e/docs/README.md still point
to peer files in test/e2e; update each link target (e.g., `scenarios.yaml`,
`expected-states.yaml`, `suites.yaml` and any other single-segment filenames
referenced in the document) to prepend `../` so they resolve to the parent
directory (e.g., `../scenarios.yaml`, `../expected-states.yaml`,
`../suites.yaml`); search the file for plain filename links and replace them
with the `../`-prefixed versions to fix all broken references.

---

Duplicate comments:
In `@test/e2e/docs/MIGRATION.md`:
- Around line 6-10: Update the opening summary in MIGRATION.md to remove the
contradictory claim that "This PR migrates all existing `test/e2e/test-*.sh`
scripts" and instead state that this PR provides scenario-runner scaffolding
only (no scripts migrated yet), aligning it with the reported "0 / 40 scripts
migrated" status and the PR objective that this is test-harness scaffolding
only; ensure the updated text makes the merge gate clear by explaining legacy
scripts remain and full migration will follow in a separate PR.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 55bd4844-0af6-45a2-8537-068a41e01a54

📥 Commits

Reviewing files that changed from the base of the PR and between 77d1d38 and dc6cb9e.

📒 Files selected for processing (2)
  • test/e2e/docs/MIGRATION.md
  • test/e2e/docs/README.md

Comment thread test/e2e/docs/README.md Outdated
jyaunches added 10 commits May 12, 2026 10:47
…untime/

Moves orchestration machinery (the part that runs any scenario/suite
and doesn't depend on scenario- or suite-specific domain) into a
dedicated runtime/ bucket.

- run-scenario.sh, run-suites.sh, coverage-report.sh -> runtime/
- resolver/*.ts                                      -> runtime/resolver/
- lib/{context,env,logging,cleanup,artifacts,sandbox-teardown}.sh -> runtime/lib/

No behavior change — paths only. Internal source-paths in moved files
are updated to the new layout as part of this commit.
Each yaml now lives in the directory that owns its concern:

- scenarios.yaml       -> nemoclaw_scenarios/scenarios.yaml
- expected-states.yaml -> nemoclaw_scenarios/expected-states.yaml
- suites.yaml          -> validation_suites/suites.yaml
- parity-map.yaml      -> docs/parity-map.yaml  (migration-era tracker)

Paths are updated in callers in subsequent commits.
…narios/

The install dimension is part of scenario setup. Colocates install
scripts with the other scenario-side files.

Renames for clarity at the same time:
  lib/setup/install.sh           -> install/dispatch.sh
  lib/setup/install-repo.sh      -> install/repo-current.sh
  lib/setup/install-curl.sh      -> install/public-curl.sh
  lib/setup/install-launchable.sh -> install/launchable.sh
  lib/setup/install-ollama.sh    -> install/ollama.sh
  lib/install-path-refresh.sh    -> install/helpers/install-path-refresh.sh
  lib/emit-context-from-plan.sh  -> helpers/emit-context-from-plan.sh
The old lib/setup/onboard.sh was a single file: one dispatcher +
three per-path worker functions (cloud-openclaw, cloud-hermes,
local-ollama-openclaw). Splitting matches the install-side layout
and lets each onboarding path be found and edited in one file.

  lib/setup/onboard.sh  ->  onboard/dispatch.sh + one file per path:
                              cloud-openclaw.sh
                              cloud-hermes.sh
                              local-ollama-openclaw.sh

dispatch.sh sources the three sibling files and retains the
e2e_onboard() case-statement dispatcher.
Fixtures (fake-openai, fake-telegram, fake-discord, fake-slack,
older-base-image, and the shared _fake-http-stub) are all scenario-
setup scaffolding — they exist to support scenarios that need a
fake inference endpoint, fake messaging provider, or an older base
image to exercise upgrade paths.

  lib/fixtures/  ->  nemoclaw_scenarios/fixtures/
…nder validation_suites/

Everything that validates a set-up scenario now lives in one bucket.

  lib/assert/                  -> validation_suites/assert/
  lib/sandbox-exec.sh          -> validation_suites/sandbox-exec.sh
  suites/smoke/                -> validation_suites/smoke/
  suites/inference/            -> validation_suites/inference/
  suites/platform/             -> validation_suites/platform/
  suites/security/             -> validation_suites/security/
  suites/onboarding/hermes/    -> validation_suites/hermes/

The hermes/ rename deliberately drops the 'onboarding/' parent. The
Hermes suite is functional validation of a Hermes sandbox, not an
onboarding step — keeping it under 'onboarding/' alongside
nemoclaw_scenarios/onboard/ invited confusion.
Vitest tests, parity tooling, and CI workflows now point at the new
per-bucket locations:

  scenarios.yaml / expected-states.yaml  ->  nemoclaw_scenarios/
  suites.yaml                            ->  validation_suites/
  parity-map.yaml                        ->  docs/
  run-*.sh / coverage-report.sh          ->  runtime/

Updated files:
  - .github/workflows/e2e-scenarios.yaml
  - .github/workflows/e2e-parity-compare.yaml
  - scripts/e2e/compare-parity.sh
  - scripts/e2e/lint-conventions.ts
  - 12 test/e2e-*.test.ts files
  - test/e2e/docs/README.md (relative-link repairs)

No functional changes.
…th-refresh

install-path-refresh.sh moved from test/e2e/lib/ to
test/e2e/nemoclaw_scenarios/install/helpers/. The 8 legacy
test-*.sh scripts that source it need their source paths updated
while the scripts still exist (they will be deleted wave-by-wave
during migration).

Updated (no logic change, only the source path line):
  test-diagnostics.sh
  test-deployment-services.sh
  test-credential-migration.sh
  test-cloud-onboard-e2e.sh
  test-cloud-inference-e2e.sh
  test-network-policy.sh
  test-openclaw-inference-switch.sh
  test-hermes-inference-switch.sh
The e2e-metadata-final-hygiene test asserts that test/e2e/docs/README.md
contains a heading matching /adding a new setup scenario|how to add/i.
The recent README rewrite used 'Adding to the matrix' which didn't
match either alternative. Rename to 'How to add a scenario, state,
or suite' — clearer and keeps the test green.
…ded on main

Drift from main — test-hermes-inference-switch.sh,
test-openclaw-inference-switch.sh, and test-openshell-gateway-upgrade.sh
were added to main after this branch forked. The convention lint
correctly flagged them as orphans (Risk #1: new legacy scripts with
no migration mapping).

Added placeholder entries (empty scenario + assertions) to bring the
map back in sync with the legacy surface area. Actual mappings land
with their respective migration waves.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

E2E Advisor Recommendation

Required E2E: None
Optional E2E: e2e-scenarios-plan-only, e2e-parity-compare-empty-diff

Workflow run

Full advisor summary

Pi Semantic E2E Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. This PR is purely additive E2E scaffolding: a new scenario-based runner, parity tooling, manual-dispatch workflows, and docs. No production source is modified, no existing legacy test/e2e/test-*.sh script is changed, and the new workflows are workflow_dispatch only. Real product user flows (install, onboard, sandbox lifecycle, inference routing, credentials, security boundaries, deployment) are unaffected, so no required E2E job is needed to merge. Existing nightly-e2e/macos-e2e/wsl-e2e/regression-e2e workflows continue to validate product behavior on their normal cadence.

Optional E2E

  • e2e-scenarios-plan-only (low): Smoke-validate the new declarative scenario runner end-to-end via --plan-only against a representative ubuntu scenario. This exercises test/e2e/runtime/resolver/{load,plan,validator,schema}.ts and the scenarios.yaml/expected-states.yaml/suites.yaml chain without any destructive side effects, catching schema/reference mistakes that scenario-framework-tests may miss.
  • e2e-parity-compare-empty-diff (low): Confidence check that the parity comparator (scripts/e2e/compare-parity.sh + parity-map.yaml) renders a clean zero-divergence report when no legacy_script is provided. Validates the new e2e-parity-compare workflow plumbing without consuming legacy E2E budget.

New E2E recommendations

  • vitest-config (high): The new vitest project 'e2e-scenario-framework' (test/e2e/scenario-framework-tests/**) is declared in vitest.config.ts but is not invoked by the prek pre-push hook in .pre-commit-config.yaml (which only runs --project cli and --project plugin) nor by any GitHub workflow. Without wiring, scaffolding regressions in run-scenario.sh, resolver/, parity tooling, and convention lint will pass CI silently.
    • Suggested test: Add a CI step (e.g. in .github/actions/basic-checks/action.yaml or a new prek hook) that runs npx vitest run --project e2e-scenario-framework on every PR.
  • e2e-framework-scaffolding (medium): scripts/e2e/lint-conventions.ts enforces the migration-spec conventions (no /tmp/.log, no own traps, parity-map coverage of new test-.sh, current parity inventory) but no PR workflow invokes it. A new legacy script or stale parity-inventory.generated.json would land unblocked.
    • Suggested test: Add a PR-time job (or prek pre-push hook) that runs npx tsx scripts/e2e/lint-conventions.ts and npx tsx scripts/e2e/check-parity-map.ts so the conventions documented in test/e2e/docs/README.md are actually enforced.
  • ci-workflows-manual-only (medium): e2e-scenarios.yaml and e2e-parity-compare.yaml are workflow_dispatch only, so PRs that touch the scenario runner cannot get automatic coverage. Once the framework stabilizes, a small pull_request trigger that runs --plan-only for a fixed scenario list would catch resolver/schema regressions automatically.
    • Suggested test: Extend e2e-scenarios.yaml with a pull_request trigger that always runs run-scenario.sh <id> --plan-only for a small representative scenario set (ubuntu, gpu, macos, wsl) when test/e2e/** changes.

@jyaunches jyaunches marked this pull request as draft May 12, 2026 15:13
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 12, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Comment thread scripts/e2e/extract-legacy-assertions.ts Fixed
@jyaunches jyaunches merged commit 1a15445 into main May 13, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

E2E End-to-end testing — Brev infrastructure, test cases, nightly failures, and coverage gaps enhancement: testing Use this label to identify requests to improve NemoClaw test coverage. v0.0.41 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants