Skip to content

fix(mcp): forward maxResultSizeChars via _meta to bypass Zod annotation stripping#756

Merged
dltn merged 6 commits intoanthropics:mainfrom
ChristianRyan:fix/mcp-large-output-max-result-size-annotation
Apr 2, 2026
Merged

fix(mcp): forward maxResultSizeChars via _meta to bypass Zod annotation stripping#756
dltn merged 6 commits intoanthropics:mainfrom
ChristianRyan:fix/mcp-large-output-max-result-size-annotation

Conversation

@ChristianRyan
Copy link
Copy Markdown
Contributor

@ChristianRyan ChristianRyan commented Mar 27, 2026

Summary

  • Populates _meta["anthropic/maxResultSizeChars"] from ToolAnnotations(maxResultSizeChars=N) in create_sdk_mcp_server
  • Includes _meta in the JSONRPC tools/list response so the CLI subprocess receives it
  • Adds test coverage documenting the two-layer spill regression and confirming the _meta pipeline

Background

Large MCP tool results (>50K chars — DB schemas, DDLs, data profiles) are silently saved to a temp file with only a 2KB preview sent to Claude. This blocks external customers from upgrading past SDK v0.1.17 (Dec 2025).

Root cause — two independent spill layers in the bundled CLI:

Layer File Threshold Controlled by
1 mcpValidation.ts ~25K tokens MAX_MCP_OUTPUT_TOKENS env var ✓
2 toolResultStorage.ts 50K chars, hardcoded Nothing — Math.min(MCPTool=100K, DEFAULT=50K)

The MCP TypeScript SDK's ToolAnnotations Zod schema uses z.core.$strip, which silently discards unknown fields at parse time. So maxResultSizeChars cannot be passed via annotations. Instead, we use _meta with a namespaced key — the same pattern already used by anthropic/searchHint and anthropic/alwaysLoad in the CLI.

Changes

src/claude_agent_sdk/__init__.pycreate_sdk_mcp_server extracts maxResultSizeChars from ToolAnnotations and populates Tool._meta["anthropic/maxResultSizeChars"]

src/claude_agent_sdk/_internal/query.py_handle_sdk_mcp_request includes _meta in the JSONRPC tools/list response payload

tests/test_mcp_large_output.py — 18 tests documenting both spill layers, env-var passthrough, layer-2 boundary, and <persisted-output> detection

tests/test_sdk_mcp_integration.pytest_max_result_size_chars_annotation_flows_to_cli confirms _meta["anthropic/maxResultSizeChars"] appears in the tools/list response

Usage (no API change)

@tool(
    "get_schema",
    "Returns a large DB schema.",
    {},
    annotations=ToolAnnotations(maxResultSizeChars=500_000),
)
async def get_schema(args): ...

Companion CLI PR

https://github.com/anthropics/claude-cli-internal/pull/24940 — reads _meta["anthropic/maxResultSizeChars"] in client.ts and removes the Math.min clamp in getPersistenceThreshold

Test plan

  • tests/test_mcp_large_output.py — 18 tests passing
  • test_max_result_size_chars_annotation_flows_to_cli — confirms _meta pipeline end-to-end
  • Existing test_tool_annotations_in_jsonrpc still passes (standard annotations unaffected)

🤖 Generated with Claude Code

…tion fix

Two-layer spill in the bundled CLI (mcpValidation.ts + toolResultStorage.ts)
causes MCP tool results >50K chars to be saved to a temp file with only a
2KB preview sent to Claude, silently degrading quality for data-agent use
cases (DB schemas, DDLs, data profiles).

Root cause (confirmed via claude-cli-internal, 2026-03-27):
  Layer 1 — MAX_MCP_OUTPUT_TOKENS controls mcpValidation.ts (~25K token
    default). Raising this env var works and the SDK passes it through
    correctly.
  Layer 2 — toolResultStorage.ts maybePersistLargeToolResult has a
    hardcoded 50K char clamp (toolLimits.ts DEFAULT_MAX_RESULT_SIZE_CHARS).
    No env var exists to raise it; MCPTool declares 100K but
    Math.min(100K, 50K) = 50K makes the declaration moot.

The chosen fix (Option 3) requires a companion change in claude-cli-internal:
  MCPTool.ts reads annotations.maxResultSizeChars from the tools/list MCP
  response and passes it as declaredMaxResultSizeChars, bypassing the
  Math.min clamp when explicitly set. No SDK code changes are needed —
  ToolAnnotations already has extra="allow" and the annotation pipeline
  already serialises arbitrary extra fields into the JSONRPC payload.

This commit adds:
  - tests/test_mcp_large_output.py: full regression documentation covering
    both spill layers, env-var passthrough correctness, and message-parser
    behaviour for inline vs <persisted-output> content
  - test_sdk_mcp_integration.py: test_max_result_size_chars_annotation_flows_to_cli
    confirms maxResultSizeChars set via ToolAnnotations(maxResultSizeChars=N)
    appears in the JSONRPC tools/list payload the CLI receives — locking in
    the SDK half of the Option 3 fix

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 27, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@f9fc8e0). Learn more about missing BASE report.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #756   +/-   ##
=======================================
  Coverage        ?   84.17%           
=======================================
  Files           ?       14           
  Lines           ?     2470           
  Branches        ?        0           
=======================================
  Hits            ?     2079           
  Misses          ?      391           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ChristianRyan and others added 2 commits March 27, 2026 17:22
…on stripping

The MCP TypeScript SDK's ToolAnnotations Zod schema uses z.core.$strip, which
silently discards unknown fields (including maxResultSizeChars) at parse time.
The companion CLI change (claude-cli-internal#24940) reads the value from
_meta['anthropic/maxResultSizeChars'] instead, following the same pattern as
anthropic/searchHint and anthropic/alwaysLoad.

Changes:
- create_sdk_mcp_server: extract maxResultSizeChars from ToolAnnotations and
  populate Tool._meta['anthropic/maxResultSizeChars'] when present
- query.py _handle_sdk_mcp_request tools/list: include _meta in the JSONRPC
  response payload so the CLI subprocess receives it

Users set the threshold via ToolAnnotations (no API change):
  @tool("get_schema", "Returns DB schema", {},
        annotations=ToolAnnotations(maxResultSizeChars=500_000))
  async def get_schema(args): ...

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The MCP SDK's Zod schema strips unknown annotation fields, so the value
is forwarded via _meta['anthropic/maxResultSizeChars'] instead.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@ChristianRyan ChristianRyan changed the title tests: document large-MCP-result spill regression and lock in Option 3 annotation fix fix(mcp): forward maxResultSizeChars via _meta to bypass Zod annotation stripping Mar 31, 2026
ChristianRyan and others added 2 commits March 31, 2026 14:08
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@km-anthropic
Copy link
Copy Markdown
Collaborator

@claude review

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small, well-scoped fix forwarding maxResultSizeChars via _meta with good test coverage.

Extended reasoning...

Overview

This PR makes two small production changes: (1) a _build_meta() helper in __init__.py that extracts maxResultSizeChars from ToolAnnotations and places it under _meta["anthropic/maxResultSizeChars"], and (2) two lines in query.py to include _meta in the JSONRPC tools/list response. Two test files are added/modified to cover the new behavior.

Security risks

None. The change forwards an existing annotation value through an established metadata channel (_meta with namespaced keys). No auth, crypto, or permission logic is affected.

Level of scrutiny

Low — the production diff is under 20 lines across two files, follows the existing _meta pattern used by anthropic/searchHint and anthropic/alwaysLoad, and is well-covered by tests. The switch from Tool(...) to Tool.model_validate({...}) is mechanical and necessary to pass _meta through Pydantic.

Other factors

Codecov reports all modified lines are covered. The two bug reports are both nits about the same dead-code variable in a single test (test_persisted_output_is_not_full_content) — low impact, test-only. No outstanding reviewer comments to address.

Comment on lines +288 to +290
msg = parse_message(_user_message_with_tool_result(PERSISTED_CONTENT))
assert isinstance(msg, UserMessage)
blocks = [b for b in msg.content if isinstance(b, ToolResultBlock)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 In test_persisted_output_is_not_full_content, the large_content variable is dead code — it is defined but never used as input to any function. The assertion blocks[0].content != large_content is trivially true because it compares two completely unrelated hardcoded strings, so the test does not actually verify what its docstring claims ("Claude receives only the 2 KB preview, not the original large content").

Extended reasoning...

What the bug is

In test_persisted_output_is_not_full_content (line 288), the variable large_content is assigned "THE ANSWER IS: 42\n" + "padding\n" * 5000 but is never used as input to any operation. The test then parses PERSISTED_CONTENT (a module-level constant simulating CLI spill output) and asserts blocks[0].content != large_content.

Why the assertion is trivially true

PERSISTED_CONTENT is a hardcoded string starting with <persisted-output>\n... containing a 2KB preview of "x" * 2000. large_content is "THE ANSWER IS: 42\n" + "padding\n" * 5000. These two strings share no common prefix and will never be equal regardless of any code behavior. The != assertion will always pass.

Step-by-step proof

  1. large_content = "THE ANSWER IS: 42\n" + "padding\n" * 5000 — a ~40KB string starting with "THE ANSWER IS: 42"
  2. msg = parse_message(_user_message_with_tool_result(PERSISTED_CONTENT)) — parses the hardcoded PERSISTED_CONTENT, not large_content
  3. blocks[0].content will equal PERSISTED_CONTENT, which starts with <persisted-output>
  4. blocks[0].content != large_content compares PERSISTED_CONTENT to large_content — trivially True
  5. The test passes but proves nothing about the system under test

What the test should verify

The docstring says "Claude receives only the 2 KB preview, not the original large content." To meaningfully test this, large_content should be used as the input to _user_message_with_tool_result(), and then the assertion should verify the output differs from the input (or check that the output is the truncated preview). As written, large_content is dead code — a linter like ruff would flag it as F841.

Impact

This is a test-only issue with no production impact. The test provides a false sense of coverage for the persisted-output behavior. If the message parser were changed to, say, strip <persisted-output> tags or transform content in an unexpected way, this test would not catch the regression it claims to guard against.

Comment on lines +288 to +290
msg = parse_message(_user_message_with_tool_result(PERSISTED_CONTENT))
assert isinstance(msg, UserMessage)
blocks = [b for b in msg.content if isinstance(b, ToolResultBlock)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 In test_persisted_output_is_not_full_content, large_content is assigned but never used as input to the test. The assertion blocks[0].content != large_content is trivially true since it compares two completely unrelated hardcoded strings (PERSISTED_CONTENT vs large_content). Remove the dead variable or feed large_content through the test fixture so the assertion is meaningful.

Extended reasoning...

The test test_persisted_output_is_not_full_content (line 288) defines large_content = "THE ANSWER IS: 42\n" + "padding\n" * 5000 but then parses PERSISTED_CONTENT (a module-level constant) via _user_message_with_tool_result(PERSISTED_CONTENT). The assertion on line 294 checks blocks[0].content != large_content, which compares the parsed PERSISTED_CONTENT string against the locally-defined large_content string.

These two strings are completely unrelated — PERSISTED_CONTENT is a <persisted-output> XML wrapper with a 2KB preview, while large_content is "THE ANSWER IS: 42\n" followed by 5000 lines of "padding\n". The != assertion will always pass regardless of any code changes, making this a no-op test.

The test docstring claims to verify "Claude receives only the 2 KB preview, not the original large content," but the test never connects large_content to the simulated tool result. For the test to be meaningful, large_content would need to be the original content that gets spilled, and the test would need to verify that the parsed result differs from that original. As written, a linter (ruff F841) would flag large_content as an unused local variable.

The impact is low since this is a test-only issue with no production consequences, but the test provides false confidence — it appears to verify spill behavior but actually verifies nothing.

To fix: either remove the dead large_content variable and rewrite the assertion to check something meaningful (e.g., that the content length is approximately 2KB, matching the preview size), or restructure the test so large_content is actually used as the pre-spill input.

…ssertion

The test_persisted_output_is_not_full_content test defined large_content
but never used it as input — the != assertion was trivially true.
Now verifies the parsed preview is under the layer-2 threshold.
@dltn dltn merged commit e94a74d into anthropics:main Apr 2, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants