feat(codex): emulate Claude server-side defer_loading mechanism #1892

Closed
Adamcf123 wants to merge 2 commits into router-for-me:main from Adamcf123:feat/codex-defer-loading

Conversation

@Adamcf123
Contributor

Problem

The previous fix (d26ad82) only stripped the defer_loading field from tools to avoid Codex 400 errors, but left the core mechanism broken in two ways:

  1. All tool schemas were forwarded to Codex upfront, including tools that Claude's server would have hidden until explicitly loaded. This defeats the context-window management purpose of defer_loading.
  2. tool_reference content blocks in message history were passed through unhandled. Codex does not recognize this content type, so the conversation history became malformed.

What Claude's server actually does

When the advanced-tool-use-2025-11-20 beta is active, Claude's server:

  • Only injects non-deferred tool schemas into the model's visible context
  • Exposes deferred tool names via <available-deferred-tools> in the system prompt (name only, no schema)
  • When the model calls ToolSearch and the client returns tool_reference, Claude's server injects the full schema text into the next prompt

Each HTTP request maps to one LLM inference pass. tool_reference handling happens at prompt-construction time, not as a separate inference.

This PR

The proxy now fully emulates the Claude server role for the Codex translation path:

1. Pre-scan tools array (before message processing)
Builds two maps from the request's tools array:

  • toolSchemaMap: tool name → {description, input_schema} for schema injection
  • deferredToolNames: set of tools carrying defer_loading: true
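
The pre-scan can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: it uses the standard library's encoding/json, and the tool and toolSchema types and the prescanTools function are hypothetical names chosen for this example. Only the map names toolSchemaMap and deferredToolNames and the JSON field names come from the description above.

```go
package main

import "encoding/json"

// tool mirrors the subset of a Claude tool definition that the pre-scan reads.
// The struct is illustrative; only the JSON field names come from the PR text.
type tool struct {
	Name         string          `json:"name"`
	Description  string          `json:"description"`
	InputSchema  json.RawMessage `json:"input_schema"`
	DeferLoading bool            `json:"defer_loading"`
}

// toolSchema is the value stored in toolSchemaMap (hypothetical type).
type toolSchema struct {
	Description string
	InputSchema json.RawMessage
}

// prescanTools walks the raw tools array once and returns
// toolSchemaMap (name -> schema) and deferredToolNames (set of names).
func prescanTools(rawTools []byte) (map[string]toolSchema, map[string]bool, error) {
	var tools []tool
	if err := json.Unmarshal(rawTools, &tools); err != nil {
		return nil, nil, err
	}
	toolSchemaMap := make(map[string]toolSchema, len(tools))
	deferredToolNames := make(map[string]bool)
	for _, t := range tools {
		if t.Name == "" {
			continue // skip entries without a usable name
		}
		toolSchemaMap[t.Name] = toolSchema{
			Description: t.Description,
			InputSchema: t.InputSchema,
		}
		if t.DeferLoading {
			deferredToolNames[t.Name] = true
		}
	}
	return toolSchemaMap, deferredToolNames, nil
}
```

Running the scan before any message is processed means a later tool_reference lookup never depends on the order of fields in the request body.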

2. Convert tool_reference → schema text
When a tool_reference content block is encountered inside a tool_result, it is replaced with an input_text block containing:

Tool '<name>' is now available.

Description: <description from tools array>

Parameters:
<input_schema.properties JSON>

The content is sourced from the original tool definition in the request — no hardcoding. The tool name is also recorded as loaded.
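
A sketch of how the replacement text might be assembled is shown below. The function name convertToolReference, the toolSchema type, and the loaded set are assumptions made for illustration; the text layout follows the template above, and the behavior for an unknown tool (announcement line only, no schema section) follows the UnknownToolReference test description. Recording unknown names as loaded is also an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"strings"
)

// toolSchema holds what the pre-scan cached for one tool (hypothetical type).
type toolSchema struct {
	Description string
	InputSchema json.RawMessage
}

// convertToolReference returns the input_text body for a tool_reference block
// and records the tool as loaded. The loaded map must be non-nil.
func convertToolReference(name string, schemas map[string]toolSchema, loaded map[string]bool) string {
	loaded[name] = true

	var b strings.Builder
	b.WriteString("Tool '" + name + "' is now available.")

	s, ok := schemas[name]
	if !ok {
		// Defensive path: unknown tool, so there is no schema to inject.
		return b.String()
	}
	if s.Description != "" {
		b.WriteString("\n\nDescription: " + s.Description)
	}
	// Extract only input_schema.properties, keeping its original JSON bytes.
	var schema struct {
		Properties json.RawMessage `json:"properties"`
	}
	if err := json.Unmarshal(s.InputSchema, &schema); err == nil && len(schema.Properties) > 0 {
		b.WriteString("\n\nParameters:\n" + string(schema.Properties))
	}
	return b.String()
}
```

Because the description and parameters are read from the cached definition, the injected text stays in sync with whatever the client sent in the tools array.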

3. Filter the tools array
Only tools that pass either condition are forwarded to Codex:

  • Not marked defer_loading: true (always visible)
  • Marked defer_loading: true AND already loaded via a tool_reference in message history

This reproduces the "tools appear progressively as the model loads them" behavior that Claude's server provides natively.
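
The two conditions reduce to a single predicate. The following sketch assumes a simplified tool type and a loaded set populated while message history is converted; both names, and the function visibleTools, are hypothetical.

```go
package main

// tool is a simplified stand-in for a parsed tool definition (hypothetical).
type tool struct {
	Name         string
	DeferLoading bool
}

// visibleTools keeps a tool if it is not deferred, or if it is deferred but
// was already loaded via a tool_reference earlier in the conversation.
func visibleTools(tools []tool, loaded map[string]bool) []tool {
	// Start from an empty, non-nil slice so "no tools" stays distinguishable
	// from "tools missing" in later serialization.
	visible := make([]tool, 0, len(tools))
	for _, t := range tools {
		if !t.DeferLoading || loaded[t.Name] {
			visible = append(visible, t)
		}
	}
	return visible
}
```

Reading from a nil map is safe in Go, so a first request with no loaded tools can pass nil for loaded and still filter correctly.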

Test coverage

7 unit tests in codex_claude_request_test.go:

| Test | Scenario |
| --- | --- |
| InitialRequest | Deferred tools filtered out on first request |
| WithToolReference | Loaded tool appears in tools array; schema text injected |
| MultipleTools | Only referenced tool added; others remain hidden |
| DuplicateToolReference | Same tool referenced twice: appears once in tools, schema injected each time |
| UnknownToolReference | Defensive path: tool_reference for unknown tool, no panic, no schema section |
| AllDeferredNoReference | All tools deferred, none loaded: tools: [] is valid JSON, not null |
| MixedContentInToolResult | text + tool_reference in same tool_result, correct sequential indexing |
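
The AllDeferredNoReference case guards against a common Go pitfall: with the standard library's encoding/json, a nil slice marshals to null while an empty non-nil slice marshals to []. The sketch below illustrates that difference only; the codexRequest type and both function names are hypothetical, and the PR's translator may build its JSON by other means.

```go
package main

import "encoding/json"

// codexRequest is a hypothetical, heavily trimmed request body.
type codexRequest struct {
	Tools []string `json:"tools"`
}

// encodeToolsNaive marshals the slice as-is; a nil slice becomes null.
func encodeToolsNaive(tools []string) (string, error) {
	out, err := json.Marshal(codexRequest{Tools: tools})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// encodeTools normalizes nil to an empty slice so the field is emitted as [].
func encodeTools(tools []string) (string, error) {
	if tools == nil {
		tools = []string{}
	}
	return encodeToolsNaive(tools)
}
```

Normalizing at the serialization boundary keeps the filtering code free to return either nil or an empty slice.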

End-to-end validation: verified against real Codex traffic (2026-03-06 logs, 32 requests, all HTTP 200, correct filtering and injection behavior confirmed).

Previously, d26ad82 only stripped the defer_loading field to fix 400
errors, leaving the core mechanism broken: Codex received all tool
schemas upfront and tool_reference content blocks in message history
were passed through unhandled.

This commit makes the proxy fully emulate what Claude's server does:

- Pre-scan the tools array to build toolSchemaMap and deferredToolNames
  before processing messages, so tool_reference handling can look up
  the correct schema at conversion time.
- Filter the tools array: only non-deferred tools and deferred tools
  already loaded via tool_reference are forwarded to Codex.
- Convert tool_reference content blocks to input_text containing the
  full schema text (description + parameters from the original tool
  definition), matching what Claude's server injects into the prompt.

Adds codex_claude_request_test.go covering three scenarios: initial
request filtering, tool_reference injection with full schema text, and
multi-tool partial loading.

Cover four high-risk scenarios not included in the initial test suite:
- Duplicate tool_reference for the same tool across turns
- tool_reference pointing to a tool not in the tools array (defensive path)
- All tools deferred with no tool_reference (empty tools array output)
- text and tool_reference mixed in the same tool_result content block
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Codex translator's ability to manage tool visibility and context by accurately emulating Claude's defer_loading mechanism. It resolves previous issues where all tool schemas were sent upfront and tool_reference blocks were unhandled, leading to more efficient context management and preventing errors in conversation history. The changes ensure that tools are introduced to the model only when relevant, mirroring the behavior of Claude's native server-side processing.

Highlights

  • Emulated Claude's defer_loading: The proxy now fully emulates Claude's server-side defer_loading mechanism for tools in the Codex translation path, ensuring tools are exposed progressively.
  • Handled tool_reference content blocks: tool_reference content blocks within tool_result messages are now correctly converted into input_text blocks containing the tool's description and schema, preventing malformed conversation history.
  • Improved tool filtering: Tools are now filtered based on their defer_loading status and whether they have been explicitly loaded via a tool_reference, optimizing context window management.
  • Comprehensive test coverage: New unit tests have been added to cover various scenarios related to defer_loading, tool_reference handling, and mixed content in tool_result messages.

Changelog
  • internal/translator/codex/claude/codex_claude_request.go
    • Implemented pre-scanning of the tools array to identify deferred tools and cache their schemas.
    • Added logic to convert tool_reference content blocks within tool_result messages into input_text blocks containing the tool's description and schema.
    • Modified tool filtering to only include non-deferred tools or deferred tools that have been explicitly loaded via a tool_reference.
  • internal/translator/codex/claude/codex_claude_request_test.go
    • Added comprehensive unit tests for defer_loading functionality, covering initial requests, tool_reference handling, multiple tools, duplicate references, unknown tool references, all-deferred scenarios, and mixed content in tool_result.
Activity
  • A new feature was introduced to emulate Claude's server-side defer_loading mechanism for tools.
  • New unit tests were added to ensure the correct behavior of the defer_loading and tool_reference handling.
  • End-to-end validation was performed against real Codex traffic, confirming correct filtering and injection behavior.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses the limitations of the previous defer_loading implementation by fully emulating Claude's server-side mechanism for handling deferred tools. The changes introduce a robust pre-scanning phase to identify and cache tool schemas, correctly process tool_reference content blocks by injecting the full schema text, and accurately filter the tools array based on their deferred status and whether they have been loaded. The addition of comprehensive unit tests, covering various scenarios including initial requests, tool referencing, multiple tools, duplicate references, unknown tool references, all-deferred tools, and mixed content, demonstrates a thorough approach to ensuring correctness and reliability. The solution is well-engineered and directly resolves the context-window management and malformed conversation history issues.
