communicate: add codex adapter via app-server JSON-RPC (parity with claude-sdk) #803

@willwashburn

Problem

Communicate-mode adapters in packages/sdk/src/communicate/adapters/ cover
claude-sdk, pi, ai-sdk, crewai, google-adk, langgraph, and
openai-agents — but there is no codex adapter. Codex is currently
reachable only through tier-1 PTY mode (relay.codex.spawn(...) in
packages/sdk/src/relay.ts:463), which means:

  • Inbound messages are delivered as keystroke injection into a PTY, racing
    with TUI state and dependent on per-platform node-pty / conpty / winpty
    bindings (see the per-platform broker binaries broker-darwin-arm64,
    broker-linux-arm64, broker-win32-x64, …).
  • "Agent ready" detection is prompt-sniffing in broker/src/helpers.rs
    (cf. the broader fix in #800, "broker: composable wait-conditions for CLI
    readiness (steal from ht)").
  • There is no structured signal for turn started / turn completed; the
    delivery_injected → active → verified state machine has to infer from
    PTY output.
  • thread/fork-style multi-agent patterns (branch N workers from one
    checkpoint) aren't expressible at all.

The reason claude-sdk is in the structured tier and codex isn't is purely
that Anthropic shipped an embeddable library
(@anthropic-ai/claude-agent-sdk) and OpenAI hadn't shipped a stable
control plane. That has now changed.

Prior art

OpenAI now ships codex app-server:
a JSON-RPC 2.0 control plane over stdio (default), unix socket, or
experimental websocket. It is what the official Codex VS Code extension
talks to. Surface area relevant to a relay adapter:

  • thread/start | resume | fork — synchronous threadId at spawn time,
    plus ephemeral: true for in-memory forks.
  • turn/start | steer | interrupt — structured input delivery and clean
    cancel/steer mid-flight.
  • item/started, item/completed, turn/started, turn/completed
    notifications — the structured analog of PostToolUse / Stop hooks.
  • mcpServerStatus/list, config/mcpServer/reload — register
    relaycast as an MCP server on the agent without re-spawning.
  • initialize / initialized handshake as the readiness signal (replaces
    prompt sniffing for the codex case).
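To make the control-plane surface above concrete, here is a minimal sketch of the client-side core such an adapter would need: request/response correlation by id, plus fan-out of server notifications such as turn/completed. It is illustrative only. The class name JsonRpcClient, the injected write callback, and the newline-delimited framing are assumptions, not the codex app-server's documented wire format; only the method and notification names come from the list above.

```typescript
// Hypothetical JSON-RPC 2.0 client core. Framing (one JSON object per line)
// is an assumption; check the codex app-server docs before relying on it.
interface RpcError { code: number; message: string }

interface Pending {
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
}

class JsonRpcClient {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private handlers = new Map<string, Array<(params: unknown) => void>>();
  private readonly write: (line: string) => void;

  // `write` would normally wrap the child process's stdin.
  constructor(write: (line: string) => void) {
    this.write = write;
  }

  /** Send a request; the promise settles when a response with the same id arrives. */
  request(method: string, params?: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.write(JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n");
    });
  }

  /** Register a handler for a server notification, e.g. "turn/completed". */
  on(method: string, handler: (params: unknown) => void): void {
    const list = this.handlers.get(method) ?? [];
    list.push(handler);
    this.handlers.set(method, list);
  }

  /** Feed one line read from the peer's stdout. */
  handleLine(line: string): void {
    if (line.trim() === "") return;
    const msg = JSON.parse(line) as {
      id?: number; method?: string; params?: unknown;
      result?: unknown; error?: RpcError;
    };
    if (msg.method !== undefined) {
      // Notifications carry no id. Server-initiated requests (method + id)
      // are out of scope for this sketch and are ignored.
      if (msg.id === undefined) {
        for (const h of this.handlers.get(msg.method) ?? []) h(msg.params);
      }
      return;
    }
    if (msg.id === undefined) return;
    const entry = this.pending.get(msg.id);
    if (!entry) return; // unknown or already-settled id
    this.pending.delete(msg.id);
    if (msg.error) entry.reject(new Error(`${msg.error.code}: ${msg.error.message}`));
    else entry.resolve(msg.result);
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

Injecting write keeps the class free of process handling, so the same core could sit behind stdio today and a unix socket later, and tests can drive it with plain strings.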

Schema is regenerated per Codex version
(codex app-server generate-ts); we already pin
version: '0.124.0' in packages/shared/cli-registry.yaml.

Proposal

Add packages/sdk/src/communicate/adapters/codex.ts, modeled on
claude-sdk.ts. The shape mapping is direct:

| claude-sdk.ts does | codex adapter equivalent |
| --- | --- |
| Add relaycast to options.mcpServers | Ensure relaycast MCP server present in config.toml (or register via config/mcpServer/reload) before thread/start |
| PostToolUse hook returns systemMessage | Subscribe to item/completed; on relevant items, drain inbox via relay.inbox() and call turn/steer with the formatted messages |
| Stop hook returns { continue: true, systemMessage } | On turn/completed, drain inbox; if non-empty, prepend formatted messages to the next turn/start input |

Unlike claude-sdk, this adapter is out-of-process — it spawns or attaches
to a codex app-server over stdio JSON-RPC. The wiring is more code than
an in-process hook adapter, but it sits in the same architectural tier and
removes the PTY dependency for codex.
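The two inbox-draining rows of the mapping above can be sketched as a small router. This is a rough illustration under stated assumptions: InboxRouter, InboxMessage, formatMessages, and the { threadId, input } parameter shape are hypothetical, and relay.inbox() is modeled as synchronous for brevity. Only the method names turn/steer and turn/start and the two trigger notifications come from the proposal.

```typescript
// Hypothetical types: the real relay inbox and codex input schema may differ.
interface InboxMessage { from: string; text: string }
interface RelayInbox { inbox(): InboxMessage[] } // assumed to drain and return pending messages
type Send = (method: string, params: Record<string, unknown>) => void;

function formatMessages(messages: InboxMessage[]): string {
  return messages.map((m) => `[relay] ${m.from}: ${m.text}`).join("\n");
}

class InboxRouter {
  private carryOver: string[] = [];
  private readonly relay: RelayInbox;
  private readonly send: Send;
  private readonly threadId: string;

  constructor(relay: RelayInbox, send: Send, threadId: string) {
    this.relay = relay;
    this.send = send;
    this.threadId = threadId;
  }

  /** item/completed: steer the in-flight turn if messages are waiting. */
  onItemCompleted(): void {
    const messages = this.relay.inbox();
    if (messages.length === 0) return;
    this.send("turn/steer", { threadId: this.threadId, input: formatMessages(messages) });
  }

  /** turn/completed: nothing to steer, so hold messages for the next turn. */
  onTurnCompleted(): void {
    const messages = this.relay.inbox();
    if (messages.length > 0) this.carryOver.push(formatMessages(messages));
  }

  /** Start a turn, prepending anything held since the last turn/completed. */
  startTurn(text: string): void {
    const input = [...this.carryOver, text].join("\n");
    this.carryOver = [];
    this.send("turn/start", { threadId: this.threadId, input });
  }
}
```

Keeping the carry-over buffer inside the router means the caller of startTurn never has to know whether messages arrived between turns.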

Sketch

// packages/sdk/src/communicate/adapters/codex.ts
export interface CodexAdapterOptions {
  cwd?: string;
  model?: string;
  permissionProfile?: string;
  // Stdio transport by default; ws/unix-socket as future opt-ins.
}

export function onRelay(
  name: string,
  options: CodexAdapterOptions,
  relay: RelayLike = new Relay(name),
): CodexHandle {
  // 1. Spawn `codex app-server` with stdio transport, do initialize
  //    handshake with clientInfo.name = 'agent_relay'.
  // 2. Ensure relaycast MCP server registered.
  // 3. thread/start → record threadId synchronously.
  // 4. Subscribe to item/* and turn/* notifications.
  // 5. On item/completed: drainInbox(relay) → turn/steer if non-empty.
  // 6. On turn/completed: drainInbox(relay) → store for next turn/start.
  // 7. Expose .send(text) → turn/start, .interrupt() → turn/interrupt,
  //    .fork(opts) → thread/fork, .close() → graceful shutdown.
  throw new Error('codex adapter sketch: not implemented');
}

Files to touch

  • New: packages/sdk/src/communicate/adapters/codex.ts
  • New: packages/sdk/src/communicate/adapters/codex-jsonrpc.ts
    (transport + handshake; reusable if we ever talk to other JSON-RPC agents
    the same way — acp-bridge is precedent)
  • Update: packages/sdk/src/communicate/index.ts (add onCodexRelay export
    and a discriminator branch in onRelay())
  • Update: packages/sdk/src/communicate/adapters/index.ts
  • Update: packages/sdk/package.json — no new runtime dep; codex is a
    binary we shell out to (already in cli-registry.yaml)
  • Tests: mirror tests/communicate/adapters/test_claude_sdk.py against a
    fake JSON-RPC peer
  • Docs: short note in the communicate README about the new adapter and the
    PTY-vs-app-server tradeoff
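For the tests item above, a fake JSON-RPC peer could be as small as the following sketch. FakeRpcPeer and its respondTo/handle methods are hypothetical names, and the { threadId } result used in the usage note is an illustrative shape rather than the codex schema. The -32601 code is the standard JSON-RPC 2.0 "Method not found" error.

```typescript
type Handler = (params: unknown) => unknown;

// In-memory stand-in for `codex app-server`: consumes request lines and
// returns response lines, recording everything it was asked.
class FakeRpcPeer {
  readonly received: Array<{ method: string; params: unknown }> = [];
  private handlers = new Map<string, Handler>();

  /** Register a canned result for a method. */
  respondTo(method: string, handler: Handler): this {
    this.handlers.set(method, handler);
    return this;
  }

  /** Handle one line from the client; returns null for notifications. */
  handle(line: string): string | null {
    const msg = JSON.parse(line) as { id?: number; method: string; params?: unknown };
    this.received.push({ method: msg.method, params: msg.params });
    if (msg.id === undefined) return null; // notification: no response
    const handler = this.handlers.get(msg.method);
    if (!handler) {
      return JSON.stringify({
        jsonrpc: "2.0",
        id: msg.id,
        error: { code: -32601, message: "Method not found" },
      });
    }
    return JSON.stringify({ jsonrpc: "2.0", id: msg.id, result: handler(msg.params) });
  }
}
```

A test would register something like peer.respondTo("thread/start", () => ({ threadId: "t-1" })), pipe the adapter's writes into peer.handle, and feed the returned lines back into the adapter's reader.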

Caveats / scope

  • Doesn't replace PTY mode. Tier-1 PTY remains for foreground
    "user-watching" sessions and for claude / gemini / cursor. This adapter
    targets SDK-driven background workers and headless/CI flows where TTY
    allocation is awkward.
  • No user-facing TUI. The app-server is a backend; if relay wants to
    surface activity to a watching human, it has to render the
    item/* stream itself (out of scope here).
  • Schema versioning. Pin against codex 0.124.0 from
    cli-registry.yaml; add a version probe at handshake time and fail fast
    if the connected app-server is older than what the adapter expects. Some
    methods (dynamic tools, realtime, webrtc) require
    capabilities.experimentalApi — adapter opts in only for what it uses.
  • WebSocket transport is documented as experimental/unsupported in the
    codex README; this adapter uses stdio. Unix socket later if useful.
  • Auth / config inheritance. The adapter should run codex app-server
    with the user's existing $CODEX_HOME, so ChatGPT/API auth and
    config.toml settings carry over without extra wiring.
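The fail-fast version probe from the schema-versioning caveat might look like the sketch below. How the adapter obtains the version string (from the handshake response or from the CLI) is left open here; parseVersion, compareVersions, and assertMinVersion are hypothetical helpers, and pre-release suffixes are deliberately ignored.

```typescript
// Parse the leading "major.minor.patch" of a version string.
function parseVersion(v: string): number[] {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  if (!m) throw new Error(`unparseable version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Returns -1, 0, or 1, comparing numerically component by component.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

// Throw early if the connected app-server predates the pinned schema version.
function assertMinVersion(reported: string, required: string = "0.124.0"): void {
  if (compareVersions(reported, required) < 0) {
    throw new Error(`codex app-server ${reported} is older than required ${required}`);
  }
}
```

Comparing numerically rather than as strings matters here: a plain string comparison would rank "0.99.0" above "0.124.0".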

Why now

  • Codex now has a stable control plane to bind to; communicate mode has had
    a codex-shaped hole since it was introduced.
  • It removes one of the two things keeping the broker on the hot path for
    codex (the other being foreground PTY UX).
  • It's a clean prerequisite for richer telemetry / trajectory consumption
    per turn — turn/completed includes structured token usage and item
    history that PTY parsing has to reconstruct heuristically.

Effort

Medium. Adapter + JSON-RPC client + tests is ~500–1000 LOC. No new runtime
deps. Risk concentrated in the lifecycle / reconnection logic; the
event-shape mapping itself is well-defined by the codex schema.

Labels: enhancement (New feature or request)