feat(codex): emulate Claude server-side defer_loading mechanism by Adamcf123 · Pull Request #1892 · router-for-me/CLIProxyAPI

Adamcf123 · 2026-03-05T17:17:47Z

Problem

The previous fix (d26ad82) only stripped the defer_loading field from tools to avoid Codex 400 errors, but left the core mechanism broken in two ways:

- All tool schemas were forwarded to Codex upfront, including tools that Claude's server would have hidden until explicitly loaded. This defeats the context-window management purpose of defer_loading.
- tool_reference content blocks in message history were passed through unhandled. Codex does not recognize this content type, so the conversation history became malformed.

What Claude's server actually does

When the advanced-tool-use-2025-11-20 beta is active, Claude's server:

- Only injects non-deferred tool schemas into the model's visible context
- Exposes deferred tool names via <available-deferred-tools> in the system prompt (name only, no schema)
- When the model calls ToolSearch and the client returns tool_reference, injects the full schema text into the next prompt

Each HTTP request maps to one LLM inference pass. tool_reference handling happens at prompt-construction time, not as a separate inference.

This PR

The proxy now fully emulates the Claude server role for the Codex translation path:

1. Pre-scan tools array (before message processing)
Builds two maps from the request's tools array:

- toolSchemaMap: tool name → {description, input_schema} for schema injection
- deferredToolNames: set of tools carrying defer_loading: true
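
The pre-scan can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual code: the `tool` and `toolSchema` types, the `prescanTools` function, and the `Read`/`WebFetch` tool names are hypothetical, and only the two map names come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tool holds the fields of a Claude tool definition that the pre-scan reads.
// (Hypothetical type for this sketch.)
type tool struct {
	Name         string          `json:"name"`
	Description  string          `json:"description"`
	InputSchema  json.RawMessage `json:"input_schema"`
	DeferLoading bool            `json:"defer_loading"`
}

// toolSchema is the value stored in toolSchemaMap.
type toolSchema struct {
	Description string
	InputSchema json.RawMessage
}

// prescanTools walks the request's tools array once and returns
// toolSchemaMap (name -> schema) and deferredToolNames (set of names).
func prescanTools(rawTools []byte) (map[string]toolSchema, map[string]bool, error) {
	var tools []tool
	if err := json.Unmarshal(rawTools, &tools); err != nil {
		return nil, nil, err
	}
	toolSchemaMap := make(map[string]toolSchema, len(tools))
	deferredToolNames := make(map[string]bool)
	for _, t := range tools {
		toolSchemaMap[t.Name] = toolSchema{Description: t.Description, InputSchema: t.InputSchema}
		if t.DeferLoading {
			deferredToolNames[t.Name] = true
		}
	}
	return toolSchemaMap, deferredToolNames, nil
}

func main() {
	raw := []byte(`[
		{"name":"Read","description":"Read a file","input_schema":{"type":"object","properties":{"path":{"type":"string"}}}},
		{"name":"WebFetch","description":"Fetch a URL","input_schema":{"type":"object","properties":{"url":{"type":"string"}}},"defer_loading":true}
	]`)
	schemas, deferred, err := prescanTools(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(schemas), deferred["WebFetch"], deferred["Read"]) // prints: 2 true false
}
```

Running the scan before message processing means every later tool_reference lookup is a plain map access.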

2. Convert tool_reference → schema text
When a tool_reference content block is encountered inside a tool_result, it is replaced with an input_text block containing:

Tool '<name>' is now available.

Description: <description from tools array>

Parameters:
<input_schema.properties JSON>

The content is sourced from the original tool definition in the request — no hardcoding. The tool name is also recorded as loaded.
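
A minimal sketch of how the replacement text could be rendered from the pre-scanned schema. The function name `renderToolReference`, the `toolSchema` type, the exact blank-line layout, and the choice to emit only the first line for an unknown tool are assumptions for illustration; the PR only specifies the template above and that unknown references produce no schema section.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toolSchema is a hypothetical holder for one entry of toolSchemaMap.
type toolSchema struct {
	Description string
	InputSchema json.RawMessage
}

// renderToolReference returns the text for the input_text block that
// replaces a tool_reference, and records the tool name as loaded.
func renderToolReference(name string, toolSchemaMap map[string]toolSchema, loaded map[string]bool) string {
	loaded[name] = true

	var b strings.Builder
	fmt.Fprintf(&b, "Tool '%s' is now available.", name)

	schema, ok := toolSchemaMap[name]
	if !ok {
		// Defensive path: unknown tool, so no description or parameters.
		return b.String()
	}
	if schema.Description != "" {
		fmt.Fprintf(&b, "\n\nDescription: %s", schema.Description)
	}
	// Extract only input_schema.properties, keeping its original JSON text.
	var parsed struct {
		Properties json.RawMessage `json:"properties"`
	}
	if err := json.Unmarshal(schema.InputSchema, &parsed); err == nil && len(parsed.Properties) > 0 {
		fmt.Fprintf(&b, "\n\nParameters:\n%s", string(parsed.Properties))
	}
	return b.String()
}

func main() {
	schemas := map[string]toolSchema{
		"WebFetch": {
			Description: "Fetch a URL",
			InputSchema: json.RawMessage(`{"type":"object","properties":{"url":{"type":"string"}}}`),
		},
	}
	loaded := map[string]bool{}
	fmt.Println(renderToolReference("WebFetch", schemas, loaded))
}
```

Because the text is built from the request's own tool definition, a duplicate tool_reference simply renders the same text again while the loaded set stays unchanged.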

3. Filter the tools array
Only tools that meet either of these conditions are forwarded to Codex:

- Not marked defer_loading: true (always visible)
- Marked defer_loading: true AND already loaded via a tool_reference in message history

This reproduces the "tools appear progressively as the model loads them" behavior that Claude's server provides natively.
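
The filter rule can be sketched as below, again with the standard library only. `filterTools` and the tool names are hypothetical, and the sketch covers only the visibility decision, not the translation of each tool into Codex's own tool format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterTools keeps non-deferred tools plus deferred tools whose names
// appear in loaded (i.e. were referenced by a tool_reference in history).
func filterTools(rawTools []byte, loaded map[string]bool) ([]byte, error) {
	var tools []map[string]json.RawMessage
	if err := json.Unmarshal(rawTools, &tools); err != nil {
		return nil, err
	}
	// Start from a non-nil slice so an empty result encodes as [].
	kept := make([]map[string]json.RawMessage, 0, len(tools))
	for _, t := range tools {
		var name string
		var deferred bool
		// Missing fields leave the zero values in place.
		_ = json.Unmarshal(t["name"], &name)
		_ = json.Unmarshal(t["defer_loading"], &deferred)
		if deferred && !loaded[name] {
			continue // still hidden from the model
		}
		// The field itself is stripped, as in the earlier fix (d26ad82).
		delete(t, "defer_loading")
		kept = append(kept, t)
	}
	return json.Marshal(kept)
}

func main() {
	raw := []byte(`[{"name":"Read"},{"name":"WebFetch","defer_loading":true},{"name":"Grep","defer_loading":true}]`)
	out, err := filterTools(raw, map[string]bool{"WebFetch": true})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints: [{"name":"Read"},{"name":"WebFetch"}]
}
```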

Test coverage

7 unit tests in codex_claude_request_test.go:

Test	Scenario
`InitialRequest`	Deferred tools filtered out on first request
`WithToolReference`	Loaded tool appears in tools array; schema text injected
`MultipleTools`	Only referenced tool added; others remain hidden
`DuplicateToolReference`	Same tool referenced twice — appears once in tools, schema injected each time
`UnknownToolReference`	Defensive path: tool_reference for unknown tool — no panic, no schema section
`AllDeferredNoReference`	All tools deferred, none loaded — `tools: []` is valid JSON, not null
`MixedContentInToolResult`	text + tool_reference in same tool_result — correct sequential indexing
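
The `AllDeferredNoReference` case matters because of how Go's encoding/json treats slices: a nil slice encodes as null, while an empty non-nil slice encodes as []. The short program below demonstrates that difference; `marshalTools` is a hypothetical helper and is not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalTools encodes a request fragment containing only a tools field.
func marshalTools(tools []string) string {
	out, err := json.Marshal(map[string][]string{"tools": tools})
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	var nilTools []string    // declared but never initialized
	emptyTools := []string{} // initialized, zero elements

	fmt.Println(marshalTools(nilTools))   // prints: {"tools":null}
	fmt.Println(marshalTools(emptyTools)) // prints: {"tools":[]}
}
```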

End-to-end validation: verified against real Codex traffic (2026-03-06 logs, 32 requests, all HTTP 200, correct filtering and injection behavior confirmed).

Previously, d26ad82 only stripped the defer_loading field to fix 400 errors, leaving the core mechanism broken: Codex received all tool schemas upfront and tool_reference content blocks in message history were passed through unhandled.

This commit makes the proxy fully emulate what Claude's server does:

- Pre-scan the tools array to build toolSchemaMap and deferredToolNames before processing messages, so tool_reference handling can look up the correct schema at conversion time.
- Filter the tools array: only non-deferred tools and deferred tools already loaded via tool_reference are forwarded to Codex.
- Convert tool_reference content blocks to input_text containing the full schema text (description + parameters from the original tool definition), matching what Claude's server injects into the prompt.

Adds codex_claude_request_test.go covering three scenarios: initial request filtering, tool_reference injection with full schema text, and multi-tool partial loading.

Cover four high-risk scenarios not included in the initial test suite:

- Duplicate tool_reference for the same tool across turns
- tool_reference pointing to a tool not in the tools array (defensive path)
- All tools deferred with no tool_reference (empty tools array output)
- text and tool_reference mixed in the same tool_result content block

gemini-code-assist · 2026-03-05T17:18:05Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Codex translator's ability to manage tool visibility and context by accurately emulating Claude's defer_loading mechanism. It resolves previous issues where all tool schemas were sent upfront and tool_reference blocks were unhandled, leading to more efficient context management and preventing errors in conversation history. The changes ensure that tools are introduced to the model only when relevant, mirroring the behavior of Claude's native server-side processing.

Highlights

- Emulated Claude's defer_loading: The proxy now fully emulates Claude's server-side defer_loading mechanism for tools in the Codex translation path, ensuring tools are exposed progressively.
- Handled tool_reference content blocks: tool_reference content blocks within tool_result messages are now correctly converted into input_text blocks containing the tool's description and schema, preventing malformed conversation history.
- Improved tool filtering: Tools are now filtered based on their defer_loading status and whether they have been explicitly loaded via a tool_reference, optimizing context window management.
- Comprehensive test coverage: New unit tests have been added to cover various scenarios related to defer_loading, tool_reference handling, and mixed content in tool_result messages.

Changelog

internal/translator/codex/claude/codex_claude_request.go
- Implemented pre-scanning of the tools array to identify deferred tools and cache their schemas.
- Added logic to convert tool_reference content blocks within tool_result messages into input_text blocks containing the tool's description and schema.
- Modified tool filtering to only include non-deferred tools or deferred tools that have been explicitly loaded via a tool_reference.
internal/translator/codex/claude/codex_claude_request_test.go
- Added comprehensive unit tests for defer_loading functionality, covering initial requests, tool_reference handling, multiple tools, duplicate references, unknown tool references, all-deferred scenarios, and mixed content in tool_result.

Activity

A new feature was introduced to emulate Claude's server-side defer_loading mechanism for tools.
New unit tests were added to ensure the correct behavior of the defer_loading and tool_reference handling.
End-to-end validation was performed against real Codex traffic, confirming correct filtering and injection behavior.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request effectively addresses the limitations of the previous defer_loading implementation by fully emulating Claude's server-side mechanism for handling deferred tools. The changes introduce a robust pre-scanning phase to identify and cache tool schemas, correctly process tool_reference content blocks by injecting the full schema text, and accurately filter the tools array based on their deferred status and whether they have been loaded. The addition of comprehensive unit tests, covering various scenarios including initial requests, tool referencing, multiple tools, duplicate references, unknown tool references, all-deferred tools, and mixed content, demonstrates a thorough approach to ensuring correctness and reliability. The solution is well-engineered and directly resolves the context-window management and malformed conversation history issues.

Adamcf123 added 2 commits March 6, 2026 00:55

gemini-code-assist bot reviewed Mar 5, 2026

Adamcf123 mentioned this pull request Mar 5, 2026

bug(codex): defer_loading strips field only — all tool schemas still forwarded upfront, and the expected tool search is not effective using models #1894

Closed

Adamcf123 closed this Mar 6, 2026

Adamcf123 wants to merge 2 commits into router-for-me:main from Adamcf123:feat/codex-defer-loading
