docs(evals): add initial integrations e2e spec by khaliqgant · Pull Request #127 · AgentWorkforce/relayfile

khaliqgant · 2026-05-09T19:33:55Z

Summary

Adds a live E2E eval suite for the initial Relayfile integrations: Linear, Slack, Notion, and GitHub.
Specifies setup/OAuth/mount prerequisites, discovery-contract checks, provider-specific file-native writeback flows, cleanup rules, and evidence artifacts.
Adds a rubric for PASS/BLOCKED/FAIL review so another agent can execute the run consistently.

Testing

Not run live; this is a live-provider eval spec requiring OAuth and disposable provider resources.
Sanity checked the Markdown suite locally for ASCII/structure before committing.
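
The PR does not say how the ASCII check was done. One way such a check might look is sketched below; the helper name `check_ascii` is hypothetical and not part of the suite.

```bash
# Hypothetical helper (an assumption, not the author's actual check):
# succeed only if the file holds nothing but printable ASCII and whitespace.
check_ascii() {
  local file="$1"
  # In the C locale, [:print:] covers 0x20-0x7E only, so any other
  # non-whitespace byte makes grep match and the check fail.
  if LC_ALL=C grep -n '[^[:print:][:space:]]' "$file"; then
    return 1
  fi
  return 0
}

# Example: check_ascii evals/suites/initial-integrations-e2e/cases.md
```

Matching lines are printed by `grep -n`, so a failing run also shows where the offending bytes are.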

coderabbitai · 2026-05-09T19:34:09Z

📝 Walkthrough

This PR adds a comprehensive E2E specification for initial Relayfile integrations (Linear, Slack, Notion, GitHub): seven live cases validating mounted discovery and file-native writeback, shared polling/evidence helpers, an acceptance rubric, and trajectory records documenting the completed spec.

Changes

Initial Integrations E2E Evaluation Suite

| Layer / File(s) | Summary |
|---|---|
| Foundation & Preconditions `evals/suites/initial-integrations-e2e/cases.md` | E2E suite scope spans four initial providers; global preconditions define required tools, OAuth setup, environment variables, run metadata, and safety blocking rules. |
| Evidence Contract & Helpers `evals/suites/initial-integrations-e2e/cases.md` | Evidence bundle contract enumerates required output artifacts and includes shared helpers (`wait_for_file_contains`, `wait_for_writeback_drain`, `wait_for_provider_roots`) for bounded polling and error capture. |
| Setup, Mount & Discovery `evals/suites/initial-integrations-e2e/cases.md` | `initial-e2e.setup-connect-mount` creates workspace, connects integrations, pulls and mounts state; `initial-e2e.discovery-contract` validates mounted `.adapter.md`, `.schema.json`, `.create.example.json`, forbids `new.json`, and runs Node schema validation. |
| Provider Writeback Cases `evals/suites/initial-integrations-e2e/cases.md` | Cases test file-native writeback for Linear (issue create/patch/delete), Slack (message/reply/reaction), Notion (page create/patch), and GitHub (PR review submit); each uses non-canonical filenames, checks read-only rejection, and records provider evidence. |
| Final Health & Validation `evals/suites/initial-integrations-e2e/cases.md` | `initial-e2e.final-health-and-regression-sweep` captures final writeback/Relayfile status, ensures no pending/dead-lettered operations, and enforces absence of `new.json`. |
| Acceptance Rubric `evals/suites/initial-integrations-e2e/rubric.md` | Defines PASS/BLOCKED/FAIL criteria, required connections and discovery behaviors, terminal writeback state (`pending: 0`, empty `deadLettered`), and an evidence-review checklist. |
| Trajectory Metadata `.trajectories/completed/2026-05/traj_brjdrgcnnwhs.json`, `.trajectories/completed/2026-05/traj_brjdrgcnnwhs.md`, `.trajectories/index.json` | Adds a completed trajectory JSON/markdown documenting the spec completion and updates the trajectories index `lastUpdated` and entries. |
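
The "forbids `new.json`" rule in the discovery and final-sweep rows could be enforced with a plain `find` over the mount root. This is only a sketch: the helper name `assert_no_new_json` is hypothetical, and the suite may check this differently.

```bash
# Hypothetical helper: fail if any file named new.json exists under a root.
assert_no_new_json() {
  local root="$1"
  local hits
  # Collect every regular file called new.json at any depth.
  hits="$(find "$root" -type f -name 'new.json' 2>/dev/null)"
  if [ -n "$hits" ]; then
    printf 'unexpected new.json:\n%s\n' "$hits" >&2
    return 1
  fi
  return 0
}

# Example: assert_no_new_json "$EVAL_LOCAL_DIR"
```

Printing the offending paths to stderr keeps stdout clean for evidence capture while still showing which provider root violated the rule.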

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Add relayfile eval harness #114

Poem

🐰 I hopped through mounts and schemas bright,
Connected Slack and Linear by moonlight,
Notion and GitHub joined the play,
Files wrote back and then ran away—
E2E done! I nibble a carrot of delight.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: adding documentation for an initial integrations E2E evaluation spec. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the E2E eval suite for initial Relayfile integrations and testing approach. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.trajectories/completed/2026-05/traj_brjdrgcnnwhs.json:
- Line 19: The projectId field contains a machine-specific absolute path;
replace this value with a portable repo-relative identifier or remove the
projectId entry altogether if unused. Locate the "projectId" key in the JSON
blob (symbol: projectId) and change
"/Users/khaliqgant/Projects/AgentWorkforce/relayfile" to a neutral value such as
"relayfile" or "./relayfile" (or delete the projectId property) so the metadata
no longer leaks local environment details.

In @.trajectories/index.json:
- Line 244: Replace the absolute user-specific path value in the "path" field
inside .trajectories/index.json with a repository-relative path (e.g., change
"/Users/khaliqgant/Projects/AgentWorkforce/relayfile/.trajectories/completed/2026-05/traj_brjdrgcnnwhs.json"
to ".trajectories/completed/2026-05/traj_brjdrgcnnwhs.json"); update the JSON
entry so the "path" key holds the repo-relative string to avoid leaking local
environment details and ensure portability.
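
The two path findings above come down to stripping a machine-specific prefix. A minimal sketch using shell parameter expansion follows; `to_repo_relative` is a hypothetical name, and how the trajectory tooling actually writes these fields is not shown in this PR.

```bash
# Hypothetical helper: turn an absolute path into a repo-relative one.
#   $1 = repository root, $2 = absolute path
to_repo_relative() {
  local root="${1%/}"   # drop one trailing slash, if present
  local abs="$2"
  case "$abs" in
    # Path lives under the root: remove the "<root>/" prefix.
    "$root"/*) printf '%s\n' "${abs#"$root"/}" ;;
    # Path is outside the root: leave it unchanged.
    *)         printf '%s\n' "$abs" ;;
  esac
}

# Example: to_repo_relative "$(git rev-parse --show-toplevel)" "$some_abs_path"
```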

In `@evals/suites/initial-integrations-e2e/cases.md`:
- Around line 341-343: The jq invocations that build JSON from literals/args
(the lines using jq --arg run "$EVAL_RUN_ID" '{description: ("Patched by " +
$run)}' > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>") must run with no
input, so add the -n flag to jq (e.g., jq -n --arg run "$EVAL_RUN_ID" ...) to
prevent jq from waiting for stdin or failing in automated runs; apply the same
-n addition to the other similar jq invocation around the EVAL_LOCAL_DIR path at
the second occurrence.
- Around line 176-183: After mounting in background with relayfile mount, add a
bounded readiness poll before running the deterministic assertions (relayfile
status, relayfile tree, relayfile writeback status): call the existing polling
helper to wait until the mount is reported ready (e.g., relayfile status shows
the workspace is mounted and provider roots/pending==0, or relayfile tree
returns the expected root listing) with a sensible timeout and interval, then
proceed to tee the outputs; apply the same readiness-wait change to the similar
block referenced at lines 187-194 to avoid race flakes.
- Around line 102-103: The grep in wait_for_file_contains currently treats
$needle as a regex which can mis-match when needle contains metacharacters;
change the check that uses grep -q "$needle" "$target" to use fixed-string mode
grep -Fq "$needle" "$target" so $needle is matched literally (locate the shell
function wait_for_file_contains and the lines referencing target and needle to
update the grep invocation).
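
For illustration, a bounded poll with a literal match might look like the sketch below. The function name, `target`, and `needle` come from the comment above; the argument order, the timeout and interval defaults, and the loop structure are assumptions, since the helper's real body is not shown here.

```bash
# Sketch only: argument order and defaults are assumed, not taken from cases.md.
#   $1 = file to watch, $2 = literal text to find,
#   $3 = timeout in seconds (default 60), $4 = poll interval (default 1)
wait_for_file_contains() {
  local target="$1" needle="$2" timeout="${3:-60}" interval="${4:-1}"
  local elapsed=0
  while :; do
    # -F treats $needle as a fixed string, so "." or "(" match literally;
    # -q suppresses output; "--" guards against needles starting with "-".
    if [ -f "$target" ] && grep -Fq -- "$needle" "$target"; then
      return 0
    fi
    # Give up once the time budget is spent (at least one check always runs).
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

With a regex match, a needle such as `a.b` would also accept `axb`; with `-F` it does not.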

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 6ad8074 and 78c4cb8.

📒 Files selected for processing (5)

.trajectories/completed/2026-05/traj_brjdrgcnnwhs.json
.trajectories/completed/2026-05/traj_brjdrgcnnwhs.md
.trajectories/index.json
evals/suites/initial-integrations-e2e/cases.md
evals/suites/initial-integrations-e2e/rubric.md

devin-ai-integration

Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.

devin-ai-integration · 2026-05-09T19:36:24Z

+   jq --arg run "$EVAL_RUN_ID" '{description: ("Patched by " + $run)}' \
+     > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"


🟡 Missing jq -n flag causes eval script to hang on stdin

The jq command on line 341 constructs a new JSON object ({description: ...}) but is missing the -n flag, unlike the create commands on lines 327, 408, and 527 which all correctly use jq -n. Without -n, jq reads from stdin and will block indefinitely in an interactive terminal. An agent following this template would copy the jq invocation as-is (only substituting the <canonical-linear-issue-path> placeholder) and produce a hanging command. The fix is to add -n to match the pattern used everywhere else in the file.

Suggested change

-   jq --arg run "$EVAL_RUN_ID" '{description: ("Patched by " + $run)}' \
-     > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
+   jq -n --arg run "$EVAL_RUN_ID" '{description: ("Patched by " + $run)}' \
+     > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"

devin-ai-integration · 2026-05-09T19:36:25Z

+5. Attempt a read-only mutation against the canonical issue:
+
+   ```bash
+   jq '{id: "not-the-real-id"}' > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"


🟡 Missing jq -n flag causes eval script to hang on stdin

The jq command on line 348 constructs a new JSON object ({id: "not-the-real-id"}) but is missing the -n flag. Same root cause as the patch command above — without -n, jq reads from stdin and will block. Every other jq invocation in this file that creates a new object uses -n (lines 327, 408, 527).

Suggested change

-   jq '{id: "not-the-real-id"}' > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
+   jq -n '{id: "not-the-real-id"}' > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
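
The effect of `-n` in both findings can be seen in a small sketch. The function name `build_patch_body` and the run id `run-123` are hypothetical stand-ins for the suite's `$EVAL_RUN_ID` flow.

```bash
# Build the patch body with -n (--null-input) so jq never reads stdin.
build_patch_body() {
  jq -n --arg run "$1" '{description: ("Patched by " + $run)}'
}

# Guarded so the sketch is harmless on machines without jq installed.
if command -v jq >/dev/null 2>&1; then
  build_patch_body "run-123"

  # Without -n the filter runs once per input value. With empty stdin
  # (as in many automated runs) it never runs, so nothing is emitted and
  # a redirect target would be left as an empty file.
  jq '{id: "not-the-real-id"}' </dev/null
fi
```

So in a terminal the missing flag shows up as a hang, while with empty stdin it can show up as a silently empty payload file. Either way the write does not carry the intended JSON.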

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/suites/initial-integrations-e2e/cases.md`:
- Around line 198-203: The readiness/drain helper calls wait_for_provider_roots
and wait_for_writeback_drain can return non-zero but the script keeps running;
modify the block that calls wait_for_provider_roots and wait_for_writeback_drain
so failures immediately abort the run: check each command's exit status and on
non-zero print a clear error message and exit non-zero (or enable strict mode
like set -e at the top of the script), and also treat any "dead letters"
detection from wait_for_writeback_drain as a fatal condition—use the functions'
return codes (wait_for_provider_roots, wait_for_writeback_drain) to gate
continuing to the evidence collection commands (relayfile status/tree/writeback
status).

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 78c4cb8 and 98cd3f4.

📒 Files selected for processing (5)

.trajectories/completed/2026-05/traj_brjdrgcnnwhs.json
.trajectories/completed/2026-05/traj_brjdrgcnnwhs.md
.trajectories/index.json
evals/suites/initial-integrations-e2e/cases.md
evals/suites/initial-integrations-e2e/rubric.md

✅ Files skipped from review due to trivial changes (3)

.trajectories/completed/2026-05/traj_brjdrgcnnwhs.md
.trajectories/index.json
.trajectories/completed/2026-05/traj_brjdrgcnnwhs.json

coderabbitai · 2026-05-09T19:41:29Z

+wait_for_provider_roots 180
+wait_for_writeback_drain 180
+relayfile status "$EVAL_WORKSPACE" | tee "$EVAL_EVIDENCE_DIR/status-after-mount.txt"
+relayfile tree "$EVAL_WORKSPACE" / --depth 3 | tee "$EVAL_EVIDENCE_DIR/02-tree-before.txt"
+relayfile writeback status "$EVAL_WORKSPACE" --json \
+  | tee "$EVAL_EVIDENCE_DIR/04-writeback-status-before.json"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when readiness/drain checks time out or detect dead letters.

On Line 198 and Line 199, the wait helpers can return non-zero, but the script continues because the block doesn’t enforce abort semantics. That can produce misleading evidence and false PASS interpretation.

Suggested doc patch

-wait_for_provider_roots 180
-wait_for_writeback_drain 180
+wait_for_provider_roots 180 || {
+  echo "BLOCKED_PROVIDER_ROOTS_TIMEOUT" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
+  exit 22
+}
+wait_for_writeback_drain 180 || {
+  rc=$?
+  if [ "$rc" -eq 2 ]; then
+    echo "FAIL_DEAD_LETTERED_WRITEBACK" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
+  else
+    echo "BLOCKED_WRITEBACK_DRAIN_TIMEOUT" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
+  fi
+  exit 23
+}
 relayfile status "$EVAL_WORKSPACE" | tee "$EVAL_EVIDENCE_DIR/status-after-mount.txt"
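
The patch relies on `wait_for_writeback_drain` returning 2 for dead letters and another non-zero code for a timeout. The helper body is not shown in this thread, so the sketch below is only one way that contract might be met. `probe_writeback`, `classify_drain`, and the `DRAINED` label are hypothetical, and the probe is stubbed so the example runs without the `relayfile` CLI.

```bash
# Hypothetical probe: prints "<pending> <deadLettered>" counts. A real run
# would derive them from `relayfile writeback status "$EVAL_WORKSPACE" --json`.
probe_writeback() { echo "0 0"; }

# Assumed return convention: 0 = drained, 2 = dead letters seen, 1 = timeout.
#   $1 = timeout in seconds, $2 = poll interval (default 1)
wait_for_writeback_drain() {
  local timeout="$1" interval="${2:-1}" elapsed=0 counts pending dead
  while :; do
    counts="$(probe_writeback)"
    pending="${counts% *}"   # text before the space
    dead="${counts#* }"      # text after the space
    [ "$dead" -gt 0 ] && return 2      # dead letters never recover by waiting
    [ "$pending" -eq 0 ] && return 0   # queue is empty
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Map the return code to the verdict labels used in the suggested patch.
classify_drain() {
  local rc=0
  wait_for_writeback_drain "$1" || rc=$?
  case "$rc" in
    0) echo "DRAINED" ;;
    2) echo "FAIL_DEAD_LETTERED_WRITEBACK" ;;
    *) echo "BLOCKED_WRITEBACK_DRAIN_TIMEOUT" ;;
  esac
  return "$rc"
}
```

Capturing `$?` immediately after the `||` matters: any later command would overwrite it before the FAIL versus BLOCKED decision is made.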

coderabbitai Bot reviewed May 9, 2026

devin-ai-integration Bot reviewed May 9, 2026

docs(evals): add initial integrations e2e spec

98cd3f4

khaliqgant force-pushed the codex/initial-integrations-e2e-eval branch from 78c4cb8 to 98cd3f4 on May 9, 2026 19:39

coderabbitai Bot reviewed May 9, 2026

khaliqgant merged commit 5fe347c into main May 9, 2026
7 checks passed

khaliqgant deleted the codex/initial-integrations-e2e-eval branch May 9, 2026 19:55

This was referenced May 9, 2026

Webhook docs #128

Open

docs: positioning strategy — storage bridge, integration priority, pricing, market framing #129

Closed
