feat(codex): emulate Claude server-side defer_loading mechanism #1892
Adamcf123 wants to merge 2 commits into router-for-me:main
Previously, d26ad82 only stripped the defer_loading field to fix 400 errors, leaving the core mechanism broken: Codex received all tool schemas upfront, and tool_reference content blocks in message history were passed through unhandled.

This commit makes the proxy fully emulate what Claude's server does:
- Pre-scan the tools array to build toolSchemaMap and deferredToolNames before processing messages, so tool_reference handling can look up the correct schema at conversion time.
- Filter the tools array: only non-deferred tools and deferred tools already loaded via tool_reference are forwarded to Codex.
- Convert tool_reference content blocks to input_text containing the full schema text (description + parameters from the original tool definition), matching what Claude's server injects into the prompt.

Adds codex_claude_request_test.go covering three scenarios: initial request filtering, tool_reference injection with full schema text, and multi-tool partial loading.
Cover four high-risk scenarios not included in the initial test suite:
- Duplicate tool_reference for the same tool across turns
- tool_reference pointing to a tool not in the tools array (defensive path)
- All tools deferred with no tool_reference (empty tools array output)
- text and tool_reference mixed in the same tool_result content block
Code Review
This pull request effectively addresses the limitations of the previous defer_loading implementation by fully emulating Claude's server-side mechanism for handling deferred tools. The changes introduce a robust pre-scanning phase to identify and cache tool schemas, correctly process tool_reference content blocks by injecting the full schema text, and accurately filter the tools array based on their deferred status and whether they have been loaded. The addition of comprehensive unit tests, covering various scenarios including initial requests, tool referencing, multiple tools, duplicate references, unknown tool references, all-deferred tools, and mixed content, demonstrates a thorough approach to ensuring correctness and reliability. The solution is well-engineered and directly resolves the context-window management and malformed conversation history issues.
Problem
The previous fix (d26ad82) only stripped the defer_loading field from tools to avoid Codex 400 errors, but left the core mechanism broken in two ways:
- Codex received every tool schema upfront, regardless of defer_loading.
- tool_reference content blocks in message history were passed through unhandled. Codex does not know this content type, so the conversation history became malformed.

What Claude's server actually does
When the advanced-tool-use-2025-11-20 beta is active, Claude's server:
- lists deferred tools by name only (no schema) in <available-deferred-tools> in the system prompt;
- when a tool_reference is returned for a tool, injects that tool's full schema text into the next prompt.

Each HTTP request maps to one LLM inference pass; tool_reference handling happens at prompt-construction time, not as a separate inference.

This PR
The proxy now fully emulates the Claude server role for the Codex translation path:
1. Pre-scan tools array (before message processing)
Builds two maps from the request's tools array:
- toolSchemaMap: tool name → {description, input_schema}, used for schema injection
- deferredToolNames: the set of tool names carrying defer_loading: true

2. Convert tool_reference → schema text

When a tool_reference content block is encountered inside a tool_result, it is replaced with an input_text block containing the full schema text of the referenced tool (its description and parameters). The content is sourced from the original tool definition in the request, not hardcoded. The tool name is also recorded as loaded.
3. Filter the tools array
Only tools that meet either condition are forwarded to Codex:
- no defer_loading: true (always visible)
- defer_loading: true AND already loaded via a tool_reference in message history

This reproduces the "tools appear progressively as the model loads them" behavior that Claude's server provides natively.
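The two filter conditions reduce to a single predicate. A minimal Go sketch, assuming the loaded set has already been filled while converting tool_reference blocks in message history; the tool type and filterTools helper are hypothetical simplifications, not the PR's code. Input order is preserved.

```go
package main

import "fmt"

// tool is a hypothetical, reduced view of a tool definition.
type tool struct {
	Name         string
	DeferLoading bool
}

// filterTools keeps every non-deferred tool, plus deferred tools that a
// tool_reference has already loaded.
func filterTools(tools []tool, loaded map[string]bool) []tool {
	out := make([]tool, 0, len(tools)) // non-nil even when nothing survives
	for _, t := range tools {
		if !t.DeferLoading || loaded[t.Name] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tools := []tool{{"read_file", false}, {"web_search", true}, {"run_sql", true}}
	loaded := map[string]bool{"web_search": true}
	for _, t := range filterTools(tools, loaded) {
		fmt.Println(t.Name)
	}
	// read_file
	// web_search
}
```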
Test coverage
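7 unit tests in codex_claude_request_test.go:
- InitialRequest: deferred tools are filtered out of the initial request
- WithToolReference: a tool_reference is replaced with the full schema text
- MultipleTools: only the referenced subset of several deferred tools is loaded
- DuplicateToolReference: the same tool is referenced again in a later turn
- UnknownToolReference: a tool_reference points to a tool not in the tools array (defensive path)
- AllDeferredNoReference: every tool is deferred and none is referenced; tools: [] is valid JSON, not null
- MixedContentInToolResult: text and tool_reference appear in the same tool_result content block

End-to-end validation: verified against real Codex traffic (2026-03-06 logs, 32 requests, all HTTP 200, correct filtering and injection behavior confirmed).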
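The AllDeferredNoReference check guards against a standard encoding/json behavior: a nil slice marshals to null, while an empty, non-nil slice marshals to []. The sketch below assumes the filtered tools end up in a Go slice encoded with encoding/json; the request type and encodeTools helper are hypothetical, and the PR may build its JSON differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is a hypothetical stand-in for the outgoing Codex payload.
type request struct {
	Tools []string `json:"tools"`
}

// encodeTools marshals the payload; marshaling a []string cannot fail.
func encodeTools(tools []string) string {
	b, err := json.Marshal(request{Tools: tools})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var nilTools []string    // e.g. `var out []string` with no appends
	emptyTools := []string{} // explicitly initialized, non-nil

	fmt.Println(encodeTools(nilTools))   // {"tools":null}
	fmt.Println(encodeTools(emptyTools)) // {"tools":[]}
}
```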