feat(e2e): scenario-based setup matrix + Phase 1 migration infrastructure#3363
Conversation
Reorganize E2E around declarative setup scenarios, reusable expected-state
configs, and suite sequences, while keeping all existing E2E workflows
unchanged.
Adds:
- test/e2e/scenarios.yaml, expected-states.yaml, suites.yaml — declarative
metadata for initial scenarios (Ubuntu cloud OpenClaw/Hermes, macOS, WSL,
GPU local Ollama, Brev launchable, no-Docker negative), expected states,
and suite sequences.
- test/e2e/resolver/ — TypeScript resolver/validator/plan-printer/coverage
report invoked via tsx. Uses js-yaml (already in root package.json).
Unit-tested in the Vitest cli project.
- test/e2e/lib/{context,env,install,onboard,gateway,sandbox,artifacts,
cleanup,emit-context-from-plan}.sh — reusable shell helpers producing a
normalized context at \$E2E_CONTEXT_DIR (default .e2e/). Wraps existing
test/e2e/lib/sandbox-teardown.sh and install-path-refresh.sh.
- test/e2e/{run-scenario.sh, run-suites.sh, coverage-report.sh} — entry
points: resolve + plan, execute suites in order, emit coverage matrix.
--plan-only emits human-readable stdout and stable JSON at
\$E2E_CONTEXT_DIR/plan.json; --dry-run (E2E_DRY_RUN=1) gates
destructive actions.
- test/e2e/suites/{smoke,inference,credentials,local-ollama-inference,
ollama-proxy,platform-macos,platform-wsl,hermes-specific}/*.sh — suite
step scripts.
- .github/workflows/e2e-scenarios.yaml — manual (workflow_dispatch) runner
accepting a scenario id with optional plan_only and suite_filter inputs.
Existing nightly-e2e / macos-e2e / wsl-e2e / ollama-proxy-e2e /
e2e-branch-validation / sandbox-images-and-e2e workflows are unchanged.
- test/e2e/README.md — documents entrypoints, scenarios, suites, and the
plan-only contract.
Tests (11 new files in the Vitest cli project, 55 scenarios):
- e2e-scenario-schema.test.ts, e2e-scenario-resolver.test.ts,
e2e-context-helper.test.ts, e2e-lib-helpers.test.ts,
e2e-suite-runner.test.ts, e2e-scenario-first-migration.test.ts,
e2e-scenarios-workflow.test.ts, e2e-expected-state-validator.test.ts,
e2e-scenario-additional-families.test.ts, e2e-coverage-report.test.ts,
e2e-metadata-final-hygiene.test.ts.
Guards:
- Resolver-time and runtime enforcement of suite \`requires_state\` vs
scenario expected_state.
- Schema guards rejecting array-form \`expected_states\` and premature
introduction of \`overrides\` / \`preflight-failure-no-sandbox\` before
their declared first consumers.
- Metadata hygiene guard tests (final metadata shape, coverage gaps
surfaced in report).
Other:
- AGENTS.md: note the scenario-based runner under test/e2e/.
- .gitignore: add .e2e/ runtime context directory.
Scenarios that require off-host infrastructure (cloud secrets, macOS/WSL
runners, GPU runner, Brev) are wired via workflow_dispatch and validated
in follow-up runs; full retirement of legacy test/e2e/test-*.sh scripts is
intentionally deferred and tracked separately.
Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Under `set -u` on bash 3.2 (the default on macOS runners) \"\${arr[@]}\"
on an empty array raises \"unbound variable\" and fails the summary
loops at the end of a successful run. Switch to the \${arr[@]+...}
safe-expansion pattern so the loops expand to nothing when no steps
were recorded.
Unblocks macos-e2e (Scenario 9.1) on PR #3290.
…ss CodeRabbit review Reshape the scaffolding so the file system reflects the scenario organization informed by the UAT / NV QA bug hotspot analysis (446 issues traced to 213 fix PRs), and fold in CodeRabbit's 15 actionable review items on PR #3290. ## Reorganization `test/e2e/lib/`: lib/ artifacts.sh cleanup.sh context.sh (generic scaffolding - unchanged) emit-context-from-plan.sh env.sh install-path-refresh.sh sandbox-teardown.sh (existing; preserved) setup/ <-- install.sh, onboard.sh (dimension dispatchers) assert/ <-- gateway-alive.sh (was gateway.sh), sandbox-alive.sh (was sandbox.sh) fixtures/ <-- roadmap README; fixtures land with first consumers `test/e2e/suites/` — grouped by functional area matching the bug hotspot buckets: suites/ smoke/ (unchanged; baseline) onboarding/ <-- hermes/ (was hermes-specific/) inference/ <-- cloud/ (was direct files), ollama-gpu/ (was local-ollama-inference/), ollama-auth-proxy/ (was ollama-proxy/) security/ <-- credentials/ (was at suites/credentials/) platform/ <-- macos/ (was platform-macos/), wsl/ (was platform-wsl/) lifecycle/ sandbox/ messaging/ (new dirs with roadmap READMEs) Each new directory ships with a README.md documenting the originating bug class, the legacy `test/e2e/test-*.sh` script (where one exists), and the planned coverage. Suite IDs in `suites.yaml` stay stable; only script paths move. ## CodeRabbit review items addressed 1. `.github/workflows/e2e-scenarios.yaml` — add a `resolve-runner` job that routes each scenario to the correct runner (macos-latest / windows-latest / self-hosted / ubuntu-latest) based on the scenario id prefix. Previously `runs-on: ubuntu-latest` was hard-coded for every scenario. 2. All test files — add `timeout: Number(process.env.E2E_SPAWN_TIMEOUT_MS ?? 60_000)` to every `spawnSync` call so a stuck subprocess cannot block a Vitest worker. 3. `coverage-report.sh` — use `npx --no-install tsx` so the lockfile pins the version; fail closed if tsx is missing. 4. `lib/context.sh` — validate keys as POSIX identifiers before interpolating into grep regexes; reject newlines in values that would corrupt the line-oriented `context.env` format. 5. `lib/emit-context-from-plan.sh` — fail fast if `plan.json` is missing its `scenario_id` field rather than silently seeding empty. 6. `lib/setup/install.sh` — pin the installer source via `E2E_INSTALLER_URL` / `E2E_INSTALLER_SHA256` overrides; download to a temp file and sha256-verify before exec instead of streaming `curl | bash` over the network. 7. `lib/assert/sandbox-alive.sh` — fix regex that had an empty first alternative (`"^|..."`) and therefore always matched. Replace with `"(^|[[:space:]])name([[:space:]]|$)"` to properly detect "sandbox not found". 8. `test/e2e/README.md` — regenerate to reflect the current 7-scenario catalog, new directory layout, runner contracts, and post-reorg roadmap. 9. `resolver/index.ts` (`validate-state`) — require an explicit `--probes-from-state` flag to seed probes from the expected state. `run-scenario.sh` passes the flag in `--dry-run` mode only; live mode now fails closed when real probes are missing rather than silently self-validating. 10. `run-scenario.sh` resolver fallback — use `npx --no-install tsx` and fail closed with a clear message if tsx is not installed. 11. `run-scenario.sh` (non-dry-run) — exit 4 instead of 0 when full suite execution is not yet wired for the scenario. Silent-pass is now observable in CI. 12-15. `suites/inference/**` — replace `curl ... | head -c N` with `body="$(curl ...)"; printf '%s\n' "${body:0:N}"`. The pipe pattern was brittle under `pipefail`: `head` closing early could make successful requests appear failed. ## Test state 55/55 Vitest `cli` tests pass after reorg and fixes. `prek run --all-files` exits 0. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Introduces test/e2e/MIGRATION.md — an in-tree tracker of legacy test-*.sh scripts being migrated to the scenario matrix. Per-wave completion is recorded there as the migration progresses.
Adds 30 tests covering Phase 1 deliverables from
specs/2026-05-11_e2e-test-migration/ (tests.md 1.A-1.H):
- 1.A logging helpers (lib/logging.sh)
- 1.B sandbox-exec helper (lib/sandbox-exec.sh)
- 1.C fixtures (fake-openai, fake-{telegram,discord,slack}, older-base-image)
- 1.D assertion helpers (inference-works, no-credentials-leaked,
policy-preset-applied, messaging-bridge-reachable)
- 1.E install dispatcher splits (install-{repo,curl,ollama,launchable}.sh)
- 1.F run-scenario.sh --validate-only flag
- 1.G convention lint (scripts/e2e/lint-conventions.ts)
- 1.H parity harness (scripts/e2e/compare-parity.sh + parity-map.yaml)
All 30 tests fail as expected (red phase) \u2014 implementation follows.
Lands shared fixtures, helpers, assertion helpers, install-method splits, conventions + lint, and the parity-compare CI harness that unblock the per-wave migration phases (2\u201312). Deliverables (per specs/2026-05-11_e2e-test-migration/spec.md Phase 1): Fixtures (test/e2e/lib/fixtures/): - fake-openai.sh: local OpenAI-compatible endpoint (Risk #2 mitigation) - fake-{telegram,discord,slack}.sh: messaging stubs via shared _fake-http-stub.sh harness - older-base-image.sh: tagged ghcr base-image Dockerfile generator Helpers (test/e2e/lib/): - logging.sh: canonical e2e_{section,info,pass,fail} with stable PASS:/FAIL:/=== Phase markers (absorbs reuse category #1) - sandbox-exec.sh: canonical nemoclaw-shell wrapper with safe quoting, exit-code propagation, and dry-run short-circuit (category #10) - env.sh: auto-sources logging.sh so every consumer gets it for free Assertion helpers (test/e2e/lib/assert/): - inference-works.sh: chat-completion round-trip - no-credentials-leaked.sh: credential-pattern scan - policy-preset-applied.sh: gateway policy preset verification - messaging-bridge-reachable.sh: L7 proxy / bridge reachability Install dispatcher splits (test/e2e/lib/setup/): - install-{repo,curl,ollama,launchable}.sh (four profiles) - install.sh: dispatcher routes by profile/method name (category #5) Runtime probe wiring: - run-scenario.sh: adds --validate-only flag (probe-only, no setup) - resolver/index.ts: E2E_PROBE_OVERRIDES_JSON escape hatch for keys with embedded underscores (e.g. security.policy_engine) Convention lint + parity harness: - scripts/e2e/lint-conventions.ts: enforces 6 conventions on suite step scripts + requires parity-map.yaml entries for legacy scripts - scripts/e2e/compare-parity.sh: diffs legacy vs scenario PASS/FAIL via parity-map.yaml; flaky: true marker supported (Risk #4) - test/e2e/parity-map.yaml: seeded with one entry per existing legacy script; migration phases 2\u201312 append assertion mappings - .github/workflows/e2e-parity-compare.yaml: dispatches legacy script + migrated scenario on same runner and diffs outcomes Tests (all passing, 41 total): - test/e2e-lib-helpers.test.ts: +18 tests (1.A\u20131.E) - test/e2e-convention-lint.test.ts: +11 tests (1.G\u20131.H) - test/e2e-expected-state-validator.test.ts: +2 tests (1.F) No regressions: full cli Vitest project (3258 tests) still green.
Follow-up to Phase 1. Documents the 13 duplication categories being absorbed by the Wave 0 fixtures/asserts/conventions and the expected LOC reduction. No code changes.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a scenario-based E2E framework: metadata (scenarios/expected-states/suites), TS resolver/validator/reporting, Bash runners/libs/assertions, fixtures, docs, CI workflows for scenarios and parity, parity/lint tools, and comprehensive Vitest coverage. ChangesScenario-driven E2E system
Sequence Diagram(s)sequenceDiagram
participant Dev as Developer
participant GA as GitHub Actions
participant RS as run-scenario.sh
participant TV as TS Resolver/Validator
participant SU as Suite Runner
participant AR as Artifacts
Dev->>GA: Trigger e2e-scenarios (scenario, plan_only)
GA->>RS: Invoke with inputs
RS->>TV: Resolve plan / validate-state
TV-->>RS: plan.json / expected-state report
RS->>SU: Execute suites (if enabled)
SU-->>AR: Logs, context.env, reports
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 11
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (15)
test/e2e/suites/sandbox/README.md-29-30 (1)
29-30:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCapitalize the proper noun "Telegram".
The term "telegram" appears twice in lowercase when referring to the Telegram messaging platform. Proper nouns should be capitalized in documentation.
📝 Proposed fix
-- **Policy preservation during rebuild is untested.** UAT `#1952` (Telegram - policy lost on rebuild) + UAT `#2010` (telegram policy apparently applied +- **Policy preservation during rebuild is untested.** UAT `#1952` (Telegram + policy lost on rebuild) + UAT `#2010` (Telegram policy apparently applied but gateway blocks it) remain live blind spots.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/suites/sandbox/README.md` around lines 29 - 30, In the README entry titled "Policy preservation during rebuild is untested." update the two lowercase occurrences of "telegram" to the proper noun "Telegram" (e.g., in the UAT references UAT `#1952` and UAT `#2010` lines) so both mentions of the messaging platform are capitalized consistently.AGENTS.md-30-30 (1)
30-30:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign E2E wording with the updated scenario-based architecture.
Line 30 now documents scenario-based E2E, but the Testing Strategy section still broadly states E2E runs on ephemeral Brev instances; tighten that wording so only the branch-validation project is Brev-specific.
As per coding guidelines:
AGENTS.mdmust “Document agent architecture and design patterns” and “Use consistent formatting and structure for agent documentation”.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@AGENTS.md` at line 30, Update the Testing Strategy section in AGENTS.md to align with the scenario-based E2E wording on line referencing `test/e2e/`: explicitly state that end-to-end tests use the scenario-based runner described in `test/e2e/README.md`, and restrict the note about running on ephemeral Brev instances to only the branch-validation project (mention `branch-validation` by name). Ensure the doc follows the agent docs guidelines by keeping the structure consistent (section title "Testing Strategy", a short paragraph, and a bullet pointing to `test/e2e/README.md`) and update any conflicting sentences that currently imply all E2E runs use Brev.test/e2e/suites/messaging/README.md-8-9 (1)
8-9:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix stale onboarding reference in docs.
Line 8 references
onboard.ts, but this migration path is wired through shell onboarding helpers (onboard.sh), so this is misleading for contributors.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/suites/messaging/README.md` around lines 8 - 9, The README's onboarding reference is stale: replace the mention of onboard.ts with the correct shell helper onboard.sh and clarify that messaging migration is wired through the shell onboarding helpers (onboard.sh) rather than a TypeScript script; update the sentence that mentions "policy preset OR `onboard.ts`" to reference "policy preset OR `onboard.sh` (shell onboarding helpers)" and ensure the surrounding text explains that onboarding is performed via the shell helper workflow.test/e2e/resolver/validator.ts-60-77 (1)
60-77:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove the unused
keyparameter fromcompare.
keyis unused (Line 61), and the current calls pass it only to discard it.🧹 Proposed fix
function compare( - key: string, expected: ProbeValue, actual: ProbeValue | undefined, ): boolean { if (actual === undefined) return false; return expected === actual; } @@ - const ok = compare(key, expected, actual); + const ok = compare(expected, actual); @@ - const ok = compare(key, expected as ProbeValue, actual); + const ok = compare(expected as ProbeValue, actual);As per coding guidelines,
**/*.{js,ts,tsx,jsx}requires intentionally unused variables to be prefixed with_.Also applies to: 91-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/resolver/validator.ts` around lines 60 - 77, The compare function declares an unused parameter `key` which violates the lint rule for unused variables; update the function signature in `compare` to remove the `key` parameter (or rename it to `_key` if you prefer to signal intentional unused) and then update all call sites (e.g., the call inside `validateExpectedState` where `compare(key, expected, actual)` is invoked and the other occurrences around lines 91-94) to pass only the remaining parameters (or keep passing `_key` if you renamed it), ensuring types/signatures match.test/e2e/suites/security/credentials/00-credentials-present.sh-11-14 (1)
11-14:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCorrect ShellCheck source hints to match this script’s directory depth.
Line 11 and Line 13 currently reference
../../lib/...; fromtest/e2e/suites/security/credentials/, the proper relative hint is../../../lib/....As per coding guidelines, `**/*.{sh,bash}` must enforce ShellCheck linting using the repository `.shellcheckrc`.Suggested patch
-# shellcheck source=../../lib/env.sh +# shellcheck source=../../../lib/env.sh . "${LIB_DIR}/env.sh" -# shellcheck source=../../lib/context.sh +# shellcheck source=../../../lib/context.sh . "${LIB_DIR}/context.sh"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/suites/security/credentials/00-credentials-present.sh` around lines 11 - 14, Update the ShellCheck source hint paths used by the script lines that source env.sh and context.sh so they reflect the script's actual directory depth: change the source hints from ../../lib/... to ../../../lib/... for the lines sourcing ". \"${LIB_DIR}/env.sh\"" and ". \"${LIB_DIR}/context.sh\""; ensure the corrected hints are the only change and that the script remains covered by the repository .shellcheckrc linting rules.test/e2e/suites/platform/macos/00-macos-smoke.sh-14-17 (1)
14-17:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix ShellCheck source hint paths for this nested directory.
Line 14 and Line 16 point to
../../lib/..., but this script is undertest/e2e/suites/platform/macos/, so the correct hint is../../../lib/.... Keeping the hint accurate ensures ShellCheck evaluates the intended sourced files.As per coding guidelines, `**/*.{sh,bash}` must enforce ShellCheck linting using the repository `.shellcheckrc`.Suggested patch
-# shellcheck source=../../lib/env.sh +# shellcheck source=../../../lib/env.sh . "${LIB_DIR}/env.sh" -# shellcheck source=../../lib/context.sh +# shellcheck source=../../../lib/context.sh . "${LIB_DIR}/context.sh"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/suites/platform/macos/00-macos-smoke.sh` around lines 14 - 17, Update the ShellCheck source hint comments to point to the correct nested lib path: change the two comments that currently read "shellcheck source=../../lib/env.sh" and "shellcheck source=../../lib/context.sh" to use "../../../lib/env.sh" and "../../../lib/context.sh" respectively so ShellCheck resolves the intended files referenced by the subsequent . "${LIB_DIR}/env.sh" and . "${LIB_DIR}/context.sh" source lines; ensure both hint comments are corrected in the same script to satisfy the repository ShellCheck rules (.shellcheckrc).test/e2e/lib/assert/sandbox-alive.sh-31-35 (1)
31-35:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winEscape or avoid regex metacharacters in the sandbox name matching.
Line 34 interpolates
${name}directly into the grep ERE pattern, so sandbox names with regex metacharacters (.,+,*,[,], etc.) produce false positives or negatives. Usegrep -Ffor literal matching or the field-based awk approach for token boundary matching.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/lib/assert/sandbox-alive.sh` around lines 31 - 35, The current grep -qE uses unescaped ${name} which lets regex metacharacters in sandbox names break matching; change the check that runs "nemoclaw list | grep -qE ..." to a literal token equality check (e.g., use awk or a loop) that compares fields to the variable rather than interpolating it into a regex: capture the output of "nemoclaw list", pass the shell variable name as a literal (awk -v name="$name") and scan fields (for (i=1;i<=NF;i++) if ($i==name) found=1) and base the success/failure on that result instead of the current grep -qE pattern. Ensure you still write the same error message using ${name} when not found.test/e2e/MIGRATION.md-62-62 (1)
62-62:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReplace emoji status markers with plain text legend tokens.
Line 62 uses emoji symbols in technical prose, which violates the markdown style rule.
As per coding guidelines: "No emoji in technical prose."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/MIGRATION.md` at line 62, The legend line currently uses emoji symbols ("⬜", "🟨", "✅", "🔵") which violates the "No emoji in technical prose" rule; replace them with plain-text tokens or words (e.g., "NOT_STARTED", "IN_PROGRESS", "MIGRATED", "PARITY_VERIFIED" or simply "not started · in progress · migrated · parity verified") so the line reads without emoji while preserving the same meaning; update the line containing "Legend: ⬜ not started · 🟨 in progress · ✅ migrated · 🔵 parity verified" accordingly.test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh-25-26 (1)
25-26:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFail the step when
/api/tagsreturns an empty body.Lines 25-26 currently allow an empty 2xx response to pass, which can hide gateway/inference regressions.
Proposed fix
body="$(curl -fsS --max-time 10 "${url%/}/api/tags")" +if [[ -z "${body}" ]]; then + echo "local-ollama-inference:ollama-models-health: empty response" >&2 + exit 1 +fi printf '%s\n' "${body:0:512}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh` around lines 25 - 26, The test currently assigns the curl response to the variable body and prints the first 512 chars but does not fail if body is empty; update the logic after the curl call (the body="$(curl ...)" assignment and the printf) to detect an empty response (e.g., test if body is zero-length) and exit non‑zero with a clear error message when empty so the step fails on empty 2xx responses; reference the body variable and the curl invocation in your change and ensure the printf remains for debugging when non-empty.test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh-29-30 (1)
29-30:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd a non-empty response assertion for the inference probe.
Line 29 currently treats any HTTP 2xx as success, even if the body is empty. That can mask broken inference routes with false-positive suite passes.
Proposed fix
body="$(nemoclaw shell "${name}" -- curl -fsS --max-time 10 "http://${route}/v1/models")" +if [[ -z "${body}" ]]; then + echo "inference:sandbox-inference-local: empty response" >&2 + exit 1 +fi printf '%s\n' "${body:0:512}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh` around lines 29 - 30, The probe currently treats any 2xx as success but doesn’t assert the response body is non-empty; update the inference probe after the curl that sets the variable body (from the nemoclaw shell call) to validate that body is not empty—e.g., check that the length of the variable body (used in the printf) is greater than zero and fail the script (non-zero exit/print a clear error) if it is empty so empty-success responses are treated as failures.scripts/e2e/lint-conventions.ts-145-146 (1)
145-146:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHandle missing value after
--rootas CLI misuse.At Line 145,
--rootwith no following value currently falls back to auto root instead of exiting with code 2. That makes invalid invocations look valid.Proposed fix
- if (a === "--root") root = args.shift(); + if (a === "--root") { + const v = args.shift(); + if (!v || v.startsWith("-")) { + process.stderr.write("lint-conventions: --root requires a value\n"); + process.exit(2); + } + root = v; + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/e2e/lint-conventions.ts` around lines 145 - 146, The CLI parsing branch that handles the "--root" flag (where code does: if (a === "--root") root = args.shift();) must validate that a value was actually provided; update the parsing logic around the argument loop (referencing the variables/identifiers a, args, and root) to check that args.shift() returns a non-empty, non-option string and if not, print a concise usage/error message and exit with code 2 (e.g., process.exit(2)) to treat this as CLI misuse rather than silently falling back to auto root. Ensure the same guard rejects cases where the next token starts with "-" so a missing/invalid value is handled.test/e2e-metadata-final-hygiene.test.ts-67-76 (1)
67-76:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd an explicit existence check for
expected_statereferences.Right now, Line 67 only reads
meta.expectedStates.expected_states[sc.expected_state]for negative-scenario logic; it does not fail when the key is missing. A badexpected_statereference can slip through ifsc.suitesis non-empty.Proposed fix
for (const [id, sc] of Object.entries(meta.scenarios.setup_scenarios)) { if (!sc.expected_state) { problems.push(`${id}: missing expected_state`); continue; } + if (!(sc.expected_state in meta.expectedStates.expected_states)) { + problems.push(`${id}: expected_state '${sc.expected_state}' not found`); + continue; + } // Negative scenarios (preflight failures) intentionally have no suites. const state = meta.expectedStates.expected_states[sc.expected_state] as { failure?: { expected?: boolean }; };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e-metadata-final-hygiene.test.ts` around lines 67 - 76, The code reads meta.expectedStates.expected_states[sc.expected_state] without verifying the key exists, so add an explicit existence check for sc.expected_state in meta.expectedStates.expected_states before using it: if sc.expected_state is missing or not a key, push a problem like `${id}: unknown expected_state '${sc.expected_state}'` (or similar) and continue; only then set state = meta.expectedStates.expected_states[sc.expected_state] and compute isNegative from state?.failure?.expected. This ensures bad expected_state references are caught even when sc.suites is non-empty.scripts/e2e/lint-conventions.ts-76-84 (1)
76-84:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
no-section-callmissese2e_sectiondespite rule intent.Line 77 says this rule covers
section/e2e_section, but Line 83 only matchessection. This leaves a documented violation path unlinted.Proposed fix
- if (/^section\s+["']/.test(line)) { - return "calls section; filename carries the phase label"; + if (/^(section|e2e_section)\s+/.test(line)) { + return "calls section/e2e_section; filename carries the phase label"; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/e2e/lint-conventions.ts` around lines 76 - 84, The rule object with id "no-section-call" currently only checks for /^section\s+["']/ in its test function; update that test's regex to also detect e2e_section and the combined form section/e2e_section (for example: /^(?:section|e2e_section|section\/e2e_section)\s+["']/) so the rule enforces the documented coverage; modify the regex inside the test arrow function that iterates lines in the "no-section-call" rule accordingly.test/e2e-lib-helpers.test.ts-198-198 (1)
198-198:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse the established PATH fallback pattern in spawned-test env overrides.
At Line 198, Line 242, and Line 359,
${process.env.PATH}should include a fallback to avoid producing...:undefinedin constrained environments.Proposed fix
- { PATH: `${bin}:${process.env.PATH}` }, + { PATH: `${bin}:${process.env.PATH || ""}` }, @@ - { PATH: `${bin}:${process.env.PATH}`, TOKEN: "SHOULD_NOT_EXPAND" }, + { PATH: `${bin}:${process.env.PATH || ""}`, TOKEN: "SHOULD_NOT_EXPAND" }, @@ - { PATH: `${bin}:${process.env.PATH}` }, + { PATH: `${bin}:${process.env.PATH || ""}` },Based on learnings: “In this repo’s test suite, prefer the established POSIX PATH separator
:…PATH: \${fakeBin}:${process.env.PATH || ""}``.”Also applies to: 242-242, 359-359
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e-lib-helpers.test.ts` at line 198, Three occurrences build environment overrides like `{ PATH: `${bin}:${process.env.PATH}` }` which can produce `:undefined`; update each override to use the established POSIX separator and a safe fallback, e.g. replace `${bin}:${process.env.PATH}` with `${fakeBin}:${process.env.PATH || ""}` (or `${bin}:${process.env.PATH || ""}` as appropriate) so the PATH entry never contains "undefined"; apply this fix to all instances of the PATH env override in the test file.test/e2e/resolver/index.ts-79-98 (1)
79-98:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate command early and keep the allowed-command message accurate.
Right now unknown commands are rejected only after metadata load and scenario resolution, and the error text (Line 127) omits
coverage. This can produce misleading failures for invalid CLI invocations.Proposed fix
function main(): number { let parsed: ReturnType<typeof parseArgs>; @@ const { command, scenarioId, contextDir, metadataDir } = parsed; + if (!["plan", "validate-state", "coverage"].includes(command)) { + process.stderr.write( + `resolver: unknown command '${command}' (expected: coverage|plan|validate-state)\n`, + ); + return 2; + } + if (command === "coverage") { @@ - process.stderr.write( - `resolver: unknown command '${command}' (expected: plan|validate-state <scenario-id>)\n`, - ); - return 2; + process.stderr.write("resolver: internal error: unreachable command branch\n"); + return 2; } catch (err) {Also applies to: 126-129
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/resolver/index.ts` around lines 79 - 98, After destructuring parsed (const { command, scenarioId, contextDir, metadataDir }), validate that command is one of the supported commands (include "coverage" alongside the other allowed commands) immediately — before calling loadMetadataFromDir or resolveScenario — and if it's invalid write a clear error to stderr listing the exact allowed commands (make sure "coverage" is included) and return the proper non-zero exit code; update the existing unknown-command error message (the one currently omitting "coverage") and apply the same early-validation logic to the later branch referenced in the review so unknown commands are rejected before any metadata loading or scenario resolution.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/e2e-parity-compare.yaml:
- Around line 82-87: Validate the github.event.inputs.legacy_script value before
executing it: ensure it matches the strict basename pattern
^test-[A-Za-z0-9._-]+\.sh$ and does not contain path separators or traversal (no
'/' or '..'), and fail the step with an error if it doesn't match; keep the
existing executability check for test/e2e/${{ github.event.inputs.legacy_script
}} but only call bash on that path after the pattern validation succeeds so only
allowed test-*.sh basenames under test/e2e are executed.
In @.github/workflows/e2e-scenarios.yaml:
- Around line 19-23: The workflow input "plan_only" currently defaults to
"false", causing manual dispatches to run the non-dry-run path that fails in
Phase 1; change the default value of the "plan_only" input to "true" so manual
runs default to dry-run, and verify the conditional that triggers the
non-dry-run path only executes when inputs.plan_only == 'false' (i.e., ensure
the steps that perform install/onboard/suites are gated by that condition).
In `@test/e2e/lib/assert/inference-works.sh`:
- Around line 53-59: The pipeline checks using printf "${out}" | grep -q
'"choices"' and '"content"' can produce false negatives under set -euo pipefail;
replace those grep pipelines with shell string tests against the out variable
(e.g. use [[ $out == *'"choices"'* ]] and [[ $out == *'"content"'* ]] in the
same file) so the checks don't rely on pipes, and make the script executable
(chmod +x) so it follows the existing executable assertion files; update the
failure messages to remain the same if tests fail.
In `@test/e2e/lib/assert/no-credentials-leaked.sh`:
- Around line 29-35: The credential regexes in the local patterns array miss
modern token formats; update the patterns array in no-credentials-leaked.sh (the
local patterns variable) to include OpenAI project keys (e.g.,
sk-proj-[A-Za-z0-9-]{...} and general sk-[A-Za-z0-9]{16,} variants), GitHub
fine-grained tokens (github_pat_[A-Za-z0-9_]{...}) and other GitHub token
prefixes (gho_, ghu_, ghs_, ghr_), and ensure existing patterns like 'sk-...',
'ghp_...', 'xox...', and 'AKIA...' remain but broadened where appropriate so the
new entries detect these modern formats; add or expand entries within the
patterns array accordingly.
In `@test/e2e/lib/context.sh`:
- Around line 65-69: _e2e_context_is_sensitive_key currently uses case patterns
like *TOKEN*/*SECRET* which are case-sensitive, so lowercase keys validated by
_e2e_context_validate_key leak secrets; update _e2e_context_is_sensitive_key to
normalize the input key to a single case (e.g., convert the parameter to
uppercase using bash expansion like key_upper=${1^^}) and run the case pattern
matching against that normalized variable so patterns (*TOKEN*, *SECRET*, etc.)
match regardless of original case.
In `@test/e2e/lib/setup/install-curl.sh`:
- Around line 40-41: The installer invocation (bash "${tmp}") can fail but its
exit status is lost because nemoclaw_refresh_install_env runs unconditionally;
change install-curl.sh to save the exit code of the installer (e.g., capture
"$?") immediately after running bash "${tmp}", then call
nemoclaw_refresh_install_env and finally exit with the saved installer status so
failures aren't masked; also make the script executable (chmod +x) so it meets
the execution requirement.
In `@test/e2e/lib/setup/install-ollama.sh`:
- Line 25: Replace the unsafe "curl | bash" pipeline by downloading
"${ollama_url}" to a temp file (use mktemp), check curl's exit status, verify
the downloaded installer checksum using the same checksum logic used by
e2e_install_curl, then execute the installer from the temp file with bash and
remove the temp file; update the block that currently contains `if ! curl -fsSL
--retry 3 --retry-delay 2 "${ollama_url}" | bash; then` to first save to a temp
path, validate the checksum, run `bash "$tempfile"` only on success, and fail
the script if any step (download, checksum, execution) fails.
In `@test/e2e/lib/setup/install-repo.sh`:
- Around line 24-28: The subshell that runs the two commands "( cd
\"${repo_root}\" || exit; npm install; npm link )" can mask failures if npm
install fails but npm link succeeds; update that subshell so npm link only runs
when npm install succeeds by chaining the commands with && (or enable errexit
inside the subshell), i.e., ensure the sequence uses "npm install && npm link"
(or "set -e" then run both) so a failing npm install causes the subshell and the
script to fail; locate the subshell containing the npm install and npm link
calls and replace the separated commands accordingly.
In `@test/e2e/lib/setup/onboard.sh`:
- Around line 20-38: The dry-run short-circuit (e2e_env_is_dry_run) currently
returns success before verifying the profile, letting invalid profiles pass;
change the flow so the profile is validated first using the existing case block
(the same cases for cloud-openclaw, cloud-hermes, local-ollama-openclaw) or a
small helper (e.g., e2e_is_supported_profile) that mirrors that case logic and
returns non-zero for unknown profiles, then perform the e2e_env_is_dry_run check
to print the “[dry-run] onboard profile=…” message and exit 0; finally, call the
actual onboarding functions (e2e_onboard_cloud_openclaw,
e2e_onboard_cloud_hermes, e2e_onboard_local_ollama_openclaw) only after
validation and dry-run handling.
In `@test/e2e/MIGRATION.md`:
- Around line 6-10: The opening summary claims "migrates all existing
`test/e2e/test-*.sh` scripts" but the status line later reads "0 / 40 scripts
migrated", creating a contradiction; update the text so both reflect the same
migration phase—either change the opening sentence to indicate a staged/partial
migration (e.g., "begins migration of ... Strategy B; legacy scripts remain...")
or update the status line to the correct migrated count; look for the exact
phrases "migrates all existing `test/e2e/test-*.sh` scripts" and "0 / 40 scripts
migrated" in MIGRATION.md and make them consistent.
In `@test/e2e/suites/inference/cloud/00-models-health.sh`:
- Around line 31-32: The piped truncation echo "${body}" | head -c 512 can fail
under set -o pipefail due to SIGPIPE; replace this pipeline by truncating the
shell variable with parameter expansion (take the first 512 bytes from the
variable named body using ${body:0:512}) and print that directly (use printf to
avoid echo semantics), keeping the separate echo afterwards to emit the newline
so behavior matches the original output.
---
Minor comments:
In `@AGENTS.md`:
- Line 30: Update the Testing Strategy section in AGENTS.md to align with the
scenario-based E2E wording on line referencing `test/e2e/`: explicitly state
that end-to-end tests use the scenario-based runner described in
`test/e2e/README.md`, and restrict the note about running on ephemeral Brev
instances to only the branch-validation project (mention `branch-validation` by
name). Ensure the doc follows the agent docs guidelines by keeping the structure
consistent (section title "Testing Strategy", a short paragraph, and a bullet
pointing to `test/e2e/README.md`) and update any conflicting sentences that
currently imply all E2E runs use Brev.
In `@scripts/e2e/lint-conventions.ts`:
- Around line 145-146: The CLI parsing branch that handles the "--root" flag
(where code does: if (a === "--root") root = args.shift();) must validate that a
value was actually provided; update the parsing logic around the argument loop
(referencing the variables/identifiers a, args, and root) to check that
args.shift() returns a non-empty, non-option string and if not, print a concise
usage/error message and exit with code 2 (e.g., process.exit(2)) to treat this
as CLI misuse rather than silently falling back to auto root. Ensure the same
guard rejects cases where the next token starts with "-" so a missing/invalid
value is handled.
- Around line 76-84: The rule object with id "no-section-call" currently only
checks for /^section\s+["']/ in its test function; update that test's regex to
also detect e2e_section and the combined form section/e2e_section (for example:
/^(?:section|e2e_section|section\/e2e_section)\s+["']/) so the rule enforces the
documented coverage; modify the regex inside the test arrow function that
iterates lines in the "no-section-call" rule accordingly.
In `@test/e2e-lib-helpers.test.ts`:
- Line 198: Three occurrences build environment overrides like `{ PATH:
`${bin}:${process.env.PATH}` }` which can produce `:undefined`; update each
override to use the established POSIX separator and a safe fallback, e.g.
replace `${bin}:${process.env.PATH}` with `${fakeBin}:${process.env.PATH || ""}`
(or `${bin}:${process.env.PATH || ""}` as appropriate) so the PATH entry never
contains "undefined"; apply this fix to all instances of the PATH env override
in the test file.
In `@test/e2e-metadata-final-hygiene.test.ts`:
- Around line 67-76: The code reads
meta.expectedStates.expected_states[sc.expected_state] without verifying the key
exists, so add an explicit existence check for sc.expected_state in
meta.expectedStates.expected_states before using it: if sc.expected_state is
missing or not a key, push a problem like `${id}: unknown expected_state
'${sc.expected_state}'` (or similar) and continue; only then set state =
meta.expectedStates.expected_states[sc.expected_state] and compute isNegative
from state?.failure?.expected. This ensures bad expected_state references are
caught even when sc.suites is non-empty.
In `@test/e2e/lib/assert/sandbox-alive.sh`:
- Around line 31-35: The current grep -qE uses unescaped ${name} which lets
regex metacharacters in sandbox names break matching; change the check that runs
"nemoclaw list | grep -qE ..." to a literal token equality check (e.g., use awk
or a loop) that compares fields to the variable rather than interpolating it
into a regex: capture the output of "nemoclaw list", pass the shell variable
name as a literal (awk -v name="$name") and scan fields (for (i=1;i<=NF;i++) if
($i==name) found=1) and base the success/failure on that result instead of the
current grep -qE pattern. Ensure you still write the same error message using
${name} when not found.
In `@test/e2e/MIGRATION.md`:
- Line 62: The legend line currently uses emoji symbols ("⬜", "🟨", "✅", "🔵")
which violates the "No emoji in technical prose" rule; replace them with
plain-text tokens or words (e.g., "NOT_STARTED", "IN_PROGRESS", "MIGRATED",
"PARITY_VERIFIED" or simply "not started · in progress · migrated · parity
verified") so the line reads without emoji while preserving the same meaning;
update the line containing "Legend: ⬜ not started · 🟨 in progress · ✅ migrated
· 🔵 parity verified" accordingly.
In `@test/e2e/resolver/index.ts`:
- Around line 79-98: After destructuring parsed (const { command, scenarioId,
contextDir, metadataDir }), validate that command is one of the supported
commands (include "coverage" alongside the other allowed commands) immediately —
before calling loadMetadataFromDir or resolveScenario — and if it's invalid
write a clear error to stderr listing the exact allowed commands (make sure
"coverage" is included) and return the proper non-zero exit code; update the
existing unknown-command error message (the one currently omitting "coverage")
and apply the same early-validation logic to the later branch referenced in the
review so unknown commands are rejected before any metadata loading or scenario
resolution.
In `@test/e2e/resolver/validator.ts`:
- Around line 60-77: The compare function declares an unused parameter `key`
which violates the lint rule for unused variables; update the function signature
in `compare` to remove the `key` parameter (or rename it to `_key` if you prefer
to signal intentional unused) and then update all call sites (e.g., the call
inside `validateExpectedState` where `compare(key, expected, actual)` is invoked
and the other occurrences around lines 91-94) to pass only the remaining
parameters (or keep passing `_key` if you renamed it), ensuring types/signatures
match.
In `@test/e2e/suites/inference/cloud/02-inference-local-from-sandbox.sh`:
- Around line 29-30: The probe currently treats any 2xx as success but doesn’t
assert the response body is non-empty; update the inference probe after the curl
that sets the variable body (from the nemoclaw shell call) to validate that body
is not empty—e.g., check that the length of the variable body (used in the
printf) is greater than zero and fail the script (non-zero exit/print a clear
error) if it is empty so empty-success responses are treated as failures.
In `@test/e2e/suites/inference/ollama-gpu/00-ollama-models-health.sh`:
- Around line 25-26: The test currently assigns the curl response to the
variable body and prints the first 512 chars but does not fail if body is empty;
update the logic after the curl call (the body="$(curl ...)" assignment and the
printf) to detect an empty response (e.g., test if body is zero-length) and exit
non‑zero with a clear error message when empty so the step fails on empty 2xx
responses; reference the body variable and the curl invocation in your change
and ensure the printf remains for debugging when non-empty.
In `@test/e2e/suites/messaging/README.md`:
- Around line 8-9: The README's onboarding reference is stale: replace the
mention of onboard.ts with the correct shell helper onboard.sh and clarify that
messaging migration is wired through the shell onboarding helpers (onboard.sh)
rather than a TypeScript script; update the sentence that mentions "policy
preset OR `onboard.ts`" to reference "policy preset OR `onboard.sh` (shell
onboarding helpers)" and ensure the surrounding text explains that onboarding is
performed via the shell helper workflow.
In `@test/e2e/suites/platform/macos/00-macos-smoke.sh`:
- Around line 14-17: Update the ShellCheck source hint comments to point to the
correct nested lib path: change the two comments that currently read "shellcheck
source=../../lib/env.sh" and "shellcheck source=../../lib/context.sh" to use
"../../../lib/env.sh" and "../../../lib/context.sh" respectively so ShellCheck
resolves the intended files referenced by the subsequent . "${LIB_DIR}/env.sh"
and . "${LIB_DIR}/context.sh" source lines; ensure both hint comments are
corrected in the same script to satisfy the repository ShellCheck rules
(.shellcheckrc).
In `@test/e2e/suites/sandbox/README.md`:
- Around line 29-30: In the README entry titled "Policy preservation during
rebuild is untested." update the two lowercase occurrences of "telegram" to the
proper noun "Telegram" (e.g., in the UAT references UAT `#1952` and UAT `#2010`
lines) so both mentions of the messaging platform are capitalized consistently.
In `@test/e2e/suites/security/credentials/00-credentials-present.sh`:
- Around line 11-14: Update the ShellCheck source hint paths used by the script
lines that source env.sh and context.sh so they reflect the script's actual
directory depth: change the source hints from ../../lib/... to ../../../lib/...
for the lines sourcing ". \"${LIB_DIR}/env.sh\"" and ".
\"${LIB_DIR}/context.sh\""; ensure the corrected hints are the only change and
that the script remains covered by the repository .shellcheckrc linting rules.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7146ad7a-3e0c-461e-bc05-64bd68afcaf8
📒 Files selected for processing (81)
.github/workflows/e2e-parity-compare.yaml.github/workflows/e2e-scenarios.yaml.gitignoreAGENTS.mdscripts/e2e/compare-parity.shscripts/e2e/lint-conventions.tstest/e2e-context-helper.test.tstest/e2e-convention-lint.test.tstest/e2e-coverage-report.test.tstest/e2e-expected-state-validator.test.tstest/e2e-lib-helpers.test.tstest/e2e-metadata-final-hygiene.test.tstest/e2e-scenario-additional-families.test.tstest/e2e-scenario-first-migration.test.tstest/e2e-scenario-resolver.test.tstest/e2e-scenario-schema.test.tstest/e2e-scenarios-workflow.test.tstest/e2e-suite-runner.test.tstest/e2e/MIGRATION.mdtest/e2e/README.mdtest/e2e/coverage-report.shtest/e2e/expected-states.yamltest/e2e/lib/artifacts.shtest/e2e/lib/assert/README.mdtest/e2e/lib/assert/gateway-alive.shtest/e2e/lib/assert/inference-works.shtest/e2e/lib/assert/messaging-bridge-reachable.shtest/e2e/lib/assert/no-credentials-leaked.shtest/e2e/lib/assert/policy-preset-applied.shtest/e2e/lib/assert/sandbox-alive.shtest/e2e/lib/cleanup.shtest/e2e/lib/context.shtest/e2e/lib/emit-context-from-plan.shtest/e2e/lib/env.shtest/e2e/lib/fixtures/README.mdtest/e2e/lib/fixtures/_fake-http-stub.shtest/e2e/lib/fixtures/fake-discord.shtest/e2e/lib/fixtures/fake-openai.shtest/e2e/lib/fixtures/fake-slack.shtest/e2e/lib/fixtures/fake-telegram.shtest/e2e/lib/fixtures/older-base-image.shtest/e2e/lib/logging.shtest/e2e/lib/sandbox-exec.shtest/e2e/lib/setup/README.mdtest/e2e/lib/setup/install-curl.shtest/e2e/lib/setup/install-launchable.shtest/e2e/lib/setup/install-ollama.shtest/e2e/lib/setup/install-repo.shtest/e2e/lib/setup/install.shtest/e2e/lib/setup/onboard.shtest/e2e/parity-map.yamltest/e2e/resolver/coverage.tstest/e2e/resolver/index.tstest/e2e/resolver/js-yaml.d.tstest/e2e/resolver/load.tstest/e2e/resolver/plan.tstest/e2e/resolver/schema.tstest/e2e/resolver/validator.tstest/e2e/run-scenario.shtest/e2e/run-suites.shtest/e2e/scenarios.yamltest/e2e/suites.yamltest/e2e/suites/inference/cloud/00-models-health.shtest/e2e/suites/inference/cloud/01-chat-completion.shtest/e2e/suites/inference/cloud/02-inference-local-from-sandbox.shtest/e2e/suites/inference/ollama-auth-proxy/00-proxy-reachable.shtest/e2e/suites/inference/ollama-gpu/00-ollama-models-health.shtest/e2e/suites/inference/ollama-gpu/01-ollama-chat-completion.shtest/e2e/suites/lifecycle/README.mdtest/e2e/suites/messaging/README.mdtest/e2e/suites/onboarding/README.mdtest/e2e/suites/onboarding/hermes/00-hermes-health.shtest/e2e/suites/platform/macos/00-macos-smoke.shtest/e2e/suites/platform/wsl/00-wsl-smoke.shtest/e2e/suites/sandbox/README.mdtest/e2e/suites/security/README.mdtest/e2e/suites/security/credentials/00-credentials-present.shtest/e2e/suites/smoke/00-cli-available.shtest/e2e/suites/smoke/01-gateway-health.shtest/e2e/suites/smoke/02-sandbox-listed.shtest/e2e/suites/smoke/03-sandbox-shell.sh
Delete the 8 placeholder sub-READMEs under lib/{assert,fixtures,setup}/
and suites/{lifecycle,messaging,onboarding,sandbox,security}/ — their
roadmap content was stale before it even landed, and the yaml files
(scenarios.yaml, expected-states.yaml, suites.yaml) plus MIGRATION.md
are the real sources of truth.
Rewrite test/e2e/README.md to be brief but broad: the core model, the
three declarative inputs, how to run, file layout, and how to extend.
Removes the matrix catalog (now lives in scenarios.yaml) and the
how-to-add-* cookbook (now: 'read the schema, edit the yaml').
Consolidates all documentation for the e2e setup-scenario matrix under test/e2e/docs/. The top level of test/e2e/ now contains only code, runners, and declarative inputs — no prose. - test/e2e/README.md -> test/e2e/docs/README.md - test/e2e/MIGRATION.md -> test/e2e/docs/MIGRATION.md Fix 4 relative links inside docs/README.md that pointed at sibling yaml/resolver paths; now they go one directory up.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/e2e/docs/MIGRATION.md (1)
6-10:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve the migration-status contradiction in the opening summary.
The opening states "This PR migrates all existing
test/e2e/test-*.shscripts" but Line 58 reports0 / 40 scripts migrated, and the PR objectives confirm this is "test-harness scaffolding only; no product behavior changes." This contradiction makes the scope and merge gate unclear.📝 Proposed fix
-This PR migrates all existing `test/e2e/test-*.sh` scripts into the -scenario-based runner introduced by PR `#3290`. Full deep migration -(Strategy B). Legacy scripts remain in the repo during this PR and run -in parallel for 1–2 nightly cycles after merge; a follow-up PR retires -them once parity is verified. +This PR introduces Phase 1 migration infrastructure for moving existing +`test/e2e/test-*.sh` scripts into the scenario-based runner introduced +by PR `#3290` (Strategy B). Legacy scripts remain unchanged in this PR; +migration waves 1–11 will move them into scenarios, followed by parity +verification and retirement in a future phase.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/docs/MIGRATION.md` around lines 6 - 10, Update the opening summary in MIGRATION.md to remove the contradictory claim that "This PR migrates all existing `test/e2e/test-*.sh` scripts" and instead state that this PR provides scenario-runner scaffolding only (no scripts migrated yet), aligning it with the reported "0 / 40 scripts migrated" status and the PR objective that this is test-harness scaffolding only; ensure the updated text makes the merge gate clear by explaining legacy scripts remain and full migration will follow in a separate PR.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/docs/README.md`:
- Around line 19-23: The relative links in test/e2e/docs/README.md still point
to peer files in test/e2e; update each link target (e.g., `scenarios.yaml`,
`expected-states.yaml`, `suites.yaml` and any other single-segment filenames
referenced in the document) to prepend `../` so they resolve to the parent
directory (e.g., `../scenarios.yaml`, `../expected-states.yaml`,
`../suites.yaml`); search the file for plain filename links and replace them
with the `../`-prefixed versions to fix all broken references.
---
Duplicate comments:
In `@test/e2e/docs/MIGRATION.md`:
- Around line 6-10: Update the opening summary in MIGRATION.md to remove the
contradictory claim that "This PR migrates all existing `test/e2e/test-*.sh`
scripts" and instead state that this PR provides scenario-runner scaffolding
only (no scripts migrated yet), aligning it with the reported "0 / 40 scripts
migrated" status and the PR objective that this is test-harness scaffolding
only; ensure the updated text makes the merge gate clear by explaining legacy
scripts remain and full migration will follow in a separate PR.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 55bd4844-0af6-45a2-8537-068a41e01a54
📒 Files selected for processing (2)
test/e2e/docs/MIGRATION.mdtest/e2e/docs/README.md
…untime/
Moves orchestration machinery (the part that runs any scenario/suite
and doesn't depend on scenario- or suite-specific domain) into a
dedicated runtime/ bucket.
- run-scenario.sh, run-suites.sh, coverage-report.sh -> runtime/
- resolver/*.ts -> runtime/resolver/
- lib/{context,env,logging,cleanup,artifacts,sandbox-teardown}.sh -> runtime/lib/
No behavior change — paths only. Internal source-paths in moved files
are updated to the new layout as part of this commit.
Each yaml now lives in the directory that owns its concern: - scenarios.yaml -> nemoclaw_scenarios/scenarios.yaml - expected-states.yaml -> nemoclaw_scenarios/expected-states.yaml - suites.yaml -> validation_suites/suites.yaml - parity-map.yaml -> docs/parity-map.yaml (migration-era tracker) Paths are updated in callers in subsequent commits.
…narios/ The install dimension is part of scenario setup. Colocates install scripts with the other scenario-side files. Renames for clarity at the same time: lib/setup/install.sh -> install/dispatch.sh lib/setup/install-repo.sh -> install/repo-current.sh lib/setup/install-curl.sh -> install/public-curl.sh lib/setup/install-launchable.sh -> install/launchable.sh lib/setup/install-ollama.sh -> install/ollama.sh lib/install-path-refresh.sh -> install/helpers/install-path-refresh.sh lib/emit-context-from-plan.sh -> helpers/emit-context-from-plan.sh
The old lib/setup/onboard.sh was a single file: one dispatcher +
three per-path worker functions (cloud-openclaw, cloud-hermes,
local-ollama-openclaw). Splitting matches the install-side layout
and lets each onboarding path be found and edited in one file.
lib/setup/onboard.sh -> onboard/dispatch.sh + one file per path:
cloud-openclaw.sh
cloud-hermes.sh
local-ollama-openclaw.sh
dispatch.sh sources the three sibling files and retains the
e2e_onboard() case-statement dispatcher.
Fixtures (fake-openai, fake-telegram, fake-discord, fake-slack, older-base-image, and the shared _fake-http-stub) are all scenario- setup scaffolding — they exist to support scenarios that need a fake inference endpoint, fake messaging provider, or an older base image to exercise upgrade paths. lib/fixtures/ -> nemoclaw_scenarios/fixtures/
…nder validation_suites/ Everything that validates a set-up scenario now lives in one bucket. lib/assert/ -> validation_suites/assert/ lib/sandbox-exec.sh -> validation_suites/sandbox-exec.sh suites/smoke/ -> validation_suites/smoke/ suites/inference/ -> validation_suites/inference/ suites/platform/ -> validation_suites/platform/ suites/security/ -> validation_suites/security/ suites/onboarding/hermes/ -> validation_suites/hermes/ The hermes/ rename deliberately drops the 'onboarding/' parent. The Hermes suite is functional validation of a Hermes sandbox, not an onboarding step — keeping it under 'onboarding/' alongside nemoclaw_scenarios/onboard/ invited confusion.
Vitest tests, parity tooling, and CI workflows now point at the new per-bucket locations: scenarios.yaml / expected-states.yaml -> nemoclaw_scenarios/ suites.yaml -> validation_suites/ parity-map.yaml -> docs/ run-*.sh / coverage-report.sh -> runtime/ Updated files: - .github/workflows/e2e-scenarios.yaml - .github/workflows/e2e-parity-compare.yaml - scripts/e2e/compare-parity.sh - scripts/e2e/lint-conventions.ts - 12 test/e2e-*.test.ts files - test/e2e/docs/README.md (relative-link repairs) No functional changes.
…th-refresh install-path-refresh.sh moved from test/e2e/lib/ to test/e2e/nemoclaw_scenarios/install/helpers/. The 8 legacy test-*.sh scripts that source it need their source paths updated while the scripts still exist (they will be deleted wave-by-wave during migration). Updated (no logic change, only the source path line): test-diagnostics.sh test-deployment-services.sh test-credential-migration.sh test-cloud-onboard-e2e.sh test-cloud-inference-e2e.sh test-network-policy.sh test-openclaw-inference-switch.sh test-hermes-inference-switch.sh
The e2e-metadata-final-hygiene test asserts that test/e2e/docs/README.md contains a heading matching /adding a new setup scenario|how to add/i. The recent README rewrite used 'Adding to the matrix' which didn't match either alternative. Rename to 'How to add a scenario, state, or suite' — clearer and keeps the test green.
…ded on main Drift from main — test-hermes-inference-switch.sh, test-openclaw-inference-switch.sh, and test-openshell-gateway-upgrade.sh were added to main after this branch forked. The convention lint correctly flagged them as orphans (Risk #1: new legacy scripts with no migration mapping). Added placeholder entries (empty scenario + assertions) to bring the map back in sync with the legacy surface area. Actual mappings land with their respective migration waves.
E2E Advisor RecommendationRequired E2E: None Full advisor summaryPi Semantic E2E AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
|
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Summary
Supersedes #3290. Introduces declarative scenario-based E2E orchestration and the Phase 1 pre-flight infrastructure that prepares for wave-by-wave migration of the existing
test/e2e/test-*.shscripts into the new structure. The old legacy scripts remain in place and unchanged; no E2E workflow is deleted or muted by this PR.The scope grew beyond the original #3290 title after rebase against
mainand the addition of Phase 1 helpers, fixtures, parity harness, and convention lint. Because upstream rules forbid force-push on feature branches, this is opened as a fresh PR with clean linear history rather than rewriting #3290 in place.Related Issue
Supersedes #3290. No single tracking issue — foundation for the E2E scenario-matrix refactor discussed across #2737, #3154, and the E2E test-gap audit.
Changes
Part 1 — scenario-based E2E orchestration (commits 1–3, previously in #3290)
test/e2e/scenarios.yaml,expected-states.yaml,suites.yaml) for initial scenarios: Ubuntu cloud OpenClaw/Hermes, macOS, WSL, GPU local Ollama, Brev launchable, no-Docker negative.test/e2e/resolver/(invoked viatsx, usesjs-yamlalready in rootpackage.json; unit-tested in the Vitestcliproject).test/e2e/lib/that produce a normalized context at$E2E_CONTEXT_DIR(default.e2e/). Wraps existingsandbox-teardown.shandinstall-path-refresh.sh; does not duplicate them.run-scenario.sh,run-suites.sh,coverage-report.shwith--plan-only,--dry-run(E2E_DRY_RUN=1) flags.test/e2e/suites/{smoke,inference,credentials,lifecycle,messaging,onboarding,platform,sandbox,security}/..github/workflows/e2e-scenarios.yaml— manual (workflow_dispatch) withscenario,plan_only, andsuite_filterinputs. Existing workflows are unchanged..gitignore: add.e2e/runtime context directory.Part 2 — Phase 1 pre-flight migration infrastructure (commits 4–7, new)
test/e2e/MIGRATION.md— in-tree tracker mapping each legacytest-*.shscript to its target scenario + suite home. Includes the reuse-absorption table showing where pair-variant coverage (e.g. OpenClaw + Hermes) collapses into a single suite.test/e2e/lib/fixtures/):fake-openai.sh,fake-{telegram,discord,slack}.sh(built on shared_fake-http-stub.sh),older-base-image.sh. Follow the inline-Nodehttp.createServerpattern fromtest-messaging-providers.sh; no new framework.test/e2e/lib/):logging.sh(PASS/FAIL/=== Phase markers),sandbox-exec.sh(e2e_sandbox_exec/e2e_sandbox_exec_stdin).env.shauto-sourceslogging.sh.test/e2e/lib/assert/):inference-works.sh,no-credentials-leaked.sh,policy-preset-applied.sh,messaging-bridge-reachable.sh.test/e2e/lib/setup/):install-{repo,curl,ollama,launchable}.sh+install.shdispatcher, so each migrated suite can compose the minimal install it needs.run-scenario.sh --validate-onlyflag (mutually exclusive with--plan-only) that reads$E2E_CONTEXT_DIR/context.env, runs the expected-state validator, and emits JSON — skipping install/onboard/suites.resolver/index.tsgainsE2E_PROBE_OVERRIDES_JSONescape hatch for probe keys with embedded underscores.scripts/e2e/lint-conventions.ts(6 rules, Vitest-backed in thecliproject) blocks new orphantest-*.shscripts and enforcesparity-map.yamlcompleteness.scripts/e2e/compare-parity.sh,test/e2e/parity-map.yaml(seeded with 39 legacy-script placeholder entries),.github/workflows/e2e-parity-compare.yaml. This is the mechanism that will drive waves 2–12 of the migration — each migrated suite gets dispatched against the parity-compare workflow until zero divergence vs. its legacy counterpart.What this PR does not change
nightly-e2e.yaml,macos-e2e.yaml,wsl-e2e.yaml,ollama-proxy-e2e.yaml,e2e-branch-validation.yaml,sandbox-images-and-e2e.yaml) run as before.test/e2e/test-*.shscripts are not deleted. Per-wave retirement happens in follow-up PRs onceparity-map.yamlentries report zero divergence.Migration sequencing (follow-up PRs)
Waves 2–12 migrate the 40 legacy
test-*.shscripts into the scenario matrix, one wave per PR, gated by parity-compare zero-divergence. Phase 13 is the merge gate that flips the default CI dispatch fromnightly-e2e.yamltoe2e-parity-compare.yaml.Type of Change
Verification
npx prek run --all-filespassesnpm testpasses — 3,258clitests, 0 failures (41 new tests from Phase 1)test/e2e/README.md,test/e2e/MIGRATION.md,AGENTS.md)make docsbuilds without warnings (doc changes only)Fork E2E evidence (Phase 1 branch dispatch)
Dispatched
nightly-e2e.yamlonjyaunches/NemoClawagainst this branch (with a temporary local-only repo-gate lift, not part of this PR). Result:gpu-e2e,gpu-double-onboard-e2e— need NVIDIA self-hosted GPU runnerhermes-slack-e2e— needslinux-amd64-cpu4runner (NVIDIA-internal)hermes-e2ehangs duringnemohermes onboardon fork (not a Phase 1 regression — same legacy script, no changes to onboard path)35/35 of jobs that could run on the fork pass. This attests that Phase 1 infrastructure does not regress any existing E2E coverage.
Signed-off-by: Julie Yaunches jyaunches@nvidia.com
Summary by CodeRabbit
New Features
Tests
Documentation
Chores