
Add per-sub-agent timeout / token budget / cancellation API #1276

@kevinlims

Summary

Consumers running multi-sub-agent sessions have no way to bound an individual sub-agent's execution time or cost. The only cancellation APIs are session-scoped (session.abort(), session.disconnect()). Both act on the whole session, so they also stop any other sub-agents running concurrently. A consumer with one runaway sub-agent must choose between letting it keep consuming cost and SLA time, or killing the whole session and losing the other agents' work.

Scenario

const session = await client.createSession({
  customAgents: [
    { name: "researcher", description: "Codebase research", prompt: "..." },  // expected ~5 min
    { name: "fixer",      description: "Apply fixes",      prompt: "..." },  // expected ~15 min
  ],
});

await session.sendAndWait({ prompt: "Dispatch researcher then fixer." });
// On a complex codebase, researcher runs 90 minutes instead of 5.
// There's no way to kill JUST the researcher — only the whole session.

What's missing

CustomAgentConfig currently accepts {name, displayName?, description?, tools?, prompt, mcpServers?, infer?, skills?}. It has no maxTurnMs, maxTokens, maxToolCalls, or other budget field.

The Session class exposes session.abort() (cancels the current message and leaves the session valid, though see #1273: an in-flight sendAndWait can still hang) and session.disconnect() (kills the whole session). There is no per-sub-agent equivalent such as session.agent("name").cancel() or session.cancelSubagent(toolCallId).

Suggested API

  • Per-agent budget options on CustomAgentConfig: maxTurnMs, maxTokens, maxToolCalls. The SDK/CLI enforces them and emits a subagent.budget_exceeded event when one is exceeded.
  • Per-sub-agent cancellation API: session.cancelSubagent({ name?, toolCallId? }) that kills a specific sub-agent without disconnecting the session.
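To make the budget proposal concrete, here is a minimal sketch of how the suggested fields and the enforcement check might fit together. Nothing in it exists in @github/copilot-sdk 0.3.0: the field names maxTurnMs, maxTokens and maxToolCalls come from the proposal, while SubagentUsage, the BudgetExceededEvent payload fields and the checkBudget helper are hypothetical, and treating maxTokens as input plus output tokens is an assumption. The numbers in the usage lines are illustrative only.

```typescript
// HYPOTHETICAL: none of these types exist in @github/copilot-sdk 0.3.0.
// Field names follow the proposal; payload fields are assumptions.

/** Proposed per-agent limits for CustomAgentConfig. Unset limits are not enforced. */
interface SubagentBudget {
  maxTurnMs?: number;    // wall-clock limit for one sub-agent run
  maxTokens?: number;    // assumed: input + output tokens for that run
  maxToolCalls?: number; // tool invocations for that run
}

/** Usage observed so far for one running sub-agent (assumed shape). */
interface SubagentUsage {
  elapsedMs: number;
  tokens: number;
  toolCalls: number;
}

/** Assumed payload of the proposed subagent.budget_exceeded event. */
interface BudgetExceededEvent {
  type: "subagent.budget_exceeded";
  agentName: string;
  toolCallId: string;
  limit: keyof SubagentBudget;
  observed: number;
  allowed: number;
}

/** Returns an event for the first exceeded limit, or undefined if within budget. */
function checkBudget(
  agentName: string,
  toolCallId: string,
  budget: SubagentBudget,
  usage: SubagentUsage,
): BudgetExceededEvent | undefined {
  const checks: Array<[keyof SubagentBudget, number]> = [
    ["maxTurnMs", usage.elapsedMs],
    ["maxTokens", usage.tokens],
    ["maxToolCalls", usage.toolCalls],
  ];
  for (const [limit, observed] of checks) {
    const allowed = budget[limit];
    if (allowed !== undefined && observed > allowed) {
      return { type: "subagent.budget_exceeded", agentName, toolCallId, limit, observed, allowed };
    }
  }
  return undefined;
}

// The researcher from the scenario: expected ~5 minutes, still running at 90.
const researcherBudget: SubagentBudget = { maxTurnMs: 5 * 60_000, maxTokens: 200_000 };
const exceeded = checkBudget("researcher", "call_1", researcherBudget, {
  elapsedMs: 90 * 60_000,
  tokens: 50_000,
  toolCalls: 40,
});
console.log(exceeded?.limit); // prints maxTurnMs
```

The sketch reports only the first exceeded limit, which is enough to decide whether to stop the sub-agent; an SDK implementation might instead report every violated limit in one event.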

Consumer impact

Production pipelines with cost or SLA requirements have to build their own per-agent monitor that watches subagent.started / subagent.completed events and tracks each agent's runtime. When an agent overruns, they can either log and tolerate it (giving up cost containment) or kill the whole session (losing the work of the other concurrent agents). Neither option is acceptable.
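A minimal sketch of the consumer-side workaround described above, assuming the consumer forwards each subagent.started / subagent.completed event into it. The event names come from the issue; the payload fields toolCallId and agentName, and the SubagentWatchdog class itself, are hypothetical and are not taken from the SDK's documented event shape. The clock is injectable so the timing logic can be exercised without waiting.

```typescript
// ASSUMPTION: the issue names subagent.started / subagent.completed events, but
// the payload fields used here (toolCallId, agentName) are illustrative guesses.
type SubagentEvent = {
  type: "subagent.started" | "subagent.completed";
  toolCallId: string;
  agentName: string;
};

interface OverdueAgent {
  name: string;
  toolCallId: string;
  runningMs: number;
}

/** Hypothetical consumer-side monitor: tracks wall-clock runtime per running sub-agent. */
class SubagentWatchdog {
  private readonly running = new Map<string, { name: string; startedAt: number }>();
  private readonly limitsMs: Record<string, number>;
  private readonly now: () => number;

  constructor(limitsMs: Record<string, number>, now: () => number = Date.now) {
    this.limitsMs = limitsMs; // per-agent wall-clock limit, keyed by agent name
    this.now = now;           // injectable clock so the logic can be tested
  }

  /** Feed every subagent.started / subagent.completed event into this method. */
  onEvent(event: SubagentEvent): void {
    if (event.type === "subagent.started") {
      this.running.set(event.toolCallId, { name: event.agentName, startedAt: this.now() });
    } else {
      this.running.delete(event.toolCallId);
    }
  }

  /** Sub-agents still running past their limit; agents without a limit are ignored. */
  overdue(): OverdueAgent[] {
    const now = this.now();
    const result: OverdueAgent[] = [];
    this.running.forEach(({ name, startedAt }, toolCallId) => {
      const limit = this.limitsMs[name];
      const runningMs = now - startedAt;
      if (limit !== undefined && runningMs > limit) {
        result.push({ name, toolCallId, runningMs });
      }
    });
    return result;
  }
}

// Simulated clock: both agents start at t=0, then six minutes pass.
let clock = 0;
const watchdog = new SubagentWatchdog({ researcher: 5 * 60_000, fixer: 15 * 60_000 }, () => clock);
watchdog.onEvent({ type: "subagent.started", toolCallId: "call_1", agentName: "researcher" });
watchdog.onEvent({ type: "subagent.started", toolCallId: "call_2", agentName: "fixer" });
clock = 6 * 60_000;
console.log(watchdog.overdue().map((a) => a.name)); // only the researcher is over its limit
```

The watchdog can only detect an overrun. Acting on it today still means session.abort() or session.disconnect(), which is the gap this issue describes; each OverdueAgent entry already carries the name and toolCallId that the proposed session.cancelSubagent({ name?, toolCallId? }) would take.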

Production observation: a single misbehaving sub-agent (a multi-tenant analysis worker) burned ~$150-200 in a single session before the surrounding watchdog hit the per-session ceiling. With a per-agent budget, the loss would have been ~$15.

Environment

- SDK: @github/copilot-sdk@0.3.0
- CLI: @github/copilot@1.0.45
- Node: 22 LTS
- OS: Windows 11
- Model: claude-sonnet-4-6
