
fix: DeepSeek tool call parsing - nested objects & JSON repair#94

Closed
valkryhx wants to merge 4 commits into CJackHwang:main from valkryhx:main

Conversation


@valkryhx valkryhx commented Mar 16, 2026

Summary

Fixes DeepSeek tool call parsing: nested JSON objects are now supported, and missing array brackets are repaired automatically.

Problem

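When returning tool calls, DeepSeek sometimes emits malformed JSON:

  • Missing array brackets: {"todos": {"content": "task1"}, {"content": "task2"}}
  • Missing brackets around a sequence of objects that contain nested objects: {"input": {"q": "value"}}, {"input": {"path": "file"}}
  • Unquoted key names: {tool_calls: [...]}

In these cases the tool call is returned as plain text, so the client can neither recognize nor execute it.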

Solution

1. Upgrade the regular expression to support one level of nesting

// Before: cannot handle nested {}
var missingArrayBracketsPattern = regexp.MustCompile(`(:\s*)(\{[^{}]*\}(?:\s*,\s*\{[^{}]*\})+)`)

// After: supports one level of nesting
var missingArrayBracketsPattern = regexp.MustCompile(`(:\s*)(\{(?:[^{}]|\{[^{}]*\})*\}(?:\s*,\s*\{(?:[^{}]|\{[^{}]*\})*\})+)`)

2. Add the RepairLooseJSON function

  • Repairs unquoted key names: {key: -> {"key":
  • Repairs missing array brackets: {"a":1}, {"b":2} -> [{"a":1}, {"b":2}]
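
The two repairs can be sketched as follows. This is not the PR's RepairLooseJSON; `repairLooseJSONSketch` and `unquotedKey` are hypothetical names, and the strategy (quote bare identifiers that follow `{` or `,`, then try wrapping the whole text in brackets, accepting a candidate only if `json.Valid` agrees) is an assumption about one reasonable approach.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// unquotedKey matches a bare identifier used as an object key,
// i.e. one that follows '{' or ',' and precedes ':'.
var unquotedKey = regexp.MustCompile(`([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)`)

// repairLooseJSONSketch returns the input unchanged when it is already
// valid or when no repair produces valid JSON.
func repairLooseJSONSketch(s string) string {
	s = strings.TrimSpace(s)
	if json.Valid([]byte(s)) {
		return s
	}
	// Step 1: quote bare keys.
	repaired := unquotedKey.ReplaceAllString(s, `${1}"${2}"${3}`)
	if json.Valid([]byte(repaired)) {
		return repaired
	}
	// Step 2: treat a top-level run of objects as a missing array.
	wrapped := "[" + repaired + "]"
	if json.Valid([]byte(wrapped)) {
		return wrapped
	}
	return s
}

func main() {
	fmt.Println(repairLooseJSONSketch(`{tool_calls: [{name: "x"}]}`))
	fmt.Println(repairLooseJSONSketch(`{"a":1}, {"b":2}`))
}
```

Validating each candidate before returning it keeps the regex from silently corrupting input it does not understand, for example a string value that happens to contain `, word:`.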

3. Strengthen keyword detection

Multiple tool call syntaxes are supported:

  • tool_calls
  • function.name:
  • [tool_call_history]
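
A minimal sketch of case-insensitive detection over these three keywords, modeled on the `strings.ToLower` and `keywords` lines quoted in the review further down. The function name `findKeywordStart` and the earliest-match policy are assumptions, not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// toolKeywords mirrors the keyword list quoted in the review snippet.
var toolKeywords = []string{"tool_calls", "function.name:", "[tool_call_history]"}

// findKeywordStart returns the byte offset of the earliest keyword in s,
// compared case-insensitively, or -1 when none is present.
func findKeywordStart(s string) int {
	lower := strings.ToLower(s)
	best := -1
	for _, kw := range toolKeywords {
		if idx := strings.Index(lower, kw); idx >= 0 && (best < 0 || idx < best) {
			best = idx
		}
	}
	return best
}

func main() {
	fmt.Println(findKeywordStart("abc function.name: foo"))    // 4
	fmt.Println(findKeywordStart("Hello [TOOL_CALL_HISTORY]")) // 6
	fmt.Println(findKeywordStart("plain text"))                // -1
}
```

The offset is computed on the lowercased copy; for ASCII text it matches the original string, but a few non-ASCII characters change byte length when lowercased, so the offset would need care before slicing the original.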

4. Add OOM protection

  • Limit the range of the backward search
  • Limit the scan length when extracting a JSON object
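
The scan-length limit can be illustrated with a bounded brace matcher. This is a sketch under assumptions: `extractJSONObjectBounded` and its `maxScan` parameter are hypothetical, and the PR's extractJSONObject may differ in signature and limits.

```go
package main

import "fmt"

// extractJSONObjectBounded returns the balanced {...} object that starts
// at s[start], scanning at most maxScan bytes. It gives up (false) if the
// object is not closed within that window, so an unclosed object in a
// long stream cannot force an unbounded scan.
func extractJSONObjectBounded(s string, start, maxScan int) (string, bool) {
	if start < 0 || start >= len(s) || s[start] != '{' {
		return "", false
	}
	end := len(s)
	if maxScan > 0 && start+maxScan < end {
		end = start + maxScan
	}
	depth, inString, escaped := 0, false, false
	for i := start; i < end; i++ {
		c := s[i]
		if inString {
			// Braces inside string literals must not affect depth.
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return s[start : i+1], true
			}
		}
	}
	return "", false
}

func main() {
	obj, ok := extractJSONObjectBounded(`x {"a": {"b": "}"}} y`, 2, 100)
	fmt.Println(obj, ok) // {"a": {"b": "}"}} true
}
```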

Files Changed

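| File | Change |
| --- | --- |
| internal/util/toolcalls_parse.go | Added the RepairLooseJSON function and the regex fix |
| internal/util/toolcalls_test.go | Added 10+ test cases |
| internal/util/toolcalls_candidates.go | Strengthened keyword detection and added OOM protection |
| internal/adapter/openai/tool_sieve_core.go | Added multi-keyword pattern matching |
| internal/js/helpers/stream-tool-sieve/sieve.js | Synced the fix to the JS version |
| TESTING.md | Added documentation for debugging commands |
| README.md | Added a quick reference for tests |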

Test Cases Added

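  1. Single-level nested objects, 2 elements
  2. 3 single-level nested objects
  3. Mixed nesting: objects alongside primitive values
  4. 4 nested objects
  5. Comma-separated objects without whitespace
  6. Objects containing arrays
  7. Real DeepSeek output from an 8-queens task
  8. Simple objects without nesting
  9. Complex single-level nesting
  10. 5 nested objects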

Verification

go test -v -run 'TestParseToolCalls|TestRepair' ./internal/util/
# All 33 tests pass

- Fix: Expand stream sieve keywords to support function.name: and [TOOL_CALL_HISTORY]

- Fix: Add repairInvalidJSONBackslashes to handle unescaped backslashes in Windows paths

- Sync: Update JS stream sieve to match Go implementation

- Test: Add unit tests for backslash repair and deepseek format parsing

- Tool: Move repair json test tool to tests/repair_json_tool.go

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7318d1f4a8


return -1
}
lower := strings.ToLower(s)
keywords := []string{"tool_calls", "function.name:", "[tool_call_history]"}

P1: Restrict capture start to payloads consumeToolCapture can parse

Including function.name: and [tool_call_history] in findToolSegmentStart makes the sieve enter capture mode for text-kv style output, but consumeToolCapture still only looks for tool_calls, so these captures never become ready until stream finalization. In practice, any chunk containing function.name: can cause the rest of the response to be buffered and emitted late as plain text instead of structured tool calls.


if i+1 < len(runes) {
next := runes[i+1]
switch next {
case '"', '\\', '/', 'b', 'f', 'n', 'r', 't':

P2: Escape \n/\t when repairing Windows-style path backslashes

The repair routine treats \n, \t, \r, etc. as already-valid escapes and leaves them unchanged, so inputs like C:\new\tools (model intended literal backslashes) are decoded as newline/tab characters after JSON unmarshal; this silently mutates tool arguments and can execute the wrong path/command instead of preserving the original Windows path text.


huangxun added 2 commits March 17, 2026 16:24
- Upgrade missingArrayBracketsPattern regex to support single-level nested {} objects
- This fixes DeepSeek's list hallucination where tool call JSON objects contain nested fields like {"input": {"q": "value"}}
- Add comprehensive test cases covering 2-5 nested objects, mixed nested/primitive fields, and real DeepSeek 8-queen output patterns
- Add RepairLooseJSON function to repair unquoted keys and missing array brackets

Fixes: DeepSeek tool call parsing with nested JSON objects
…ds and safety limits

- Add support for multiple keywords: tool_calls, function.name:, [tool_call_history]
- Add OOM protection with search limits in extractToolCallObjects
- Add max scan length limit in extractJSONObject to prevent OOM on unclosed objects
- Update tool_sieve to handle more tool call patterns
- Add loose JSON repair in parseToolCallPayload for better error recovery

This improves DeepSeek tool call parsing robustness.

vercel bot commented Mar 17, 2026

@valkryhx is attempting to deploy a commit to the cjack's projects Team on Vercel.

A member of the Team first needs to authorize it.

- Add targeted test commands to TESTING.md for debugging tool call issues
- Add quick test commands reference in README.md
- Document specific test cases for DeepSeek tool call parsing
@valkryhx valkryhx changed the title fix: correctly parse and emit tool calls from DeepSeek responses fix: DeepSeek tool call parsing - nested objects & JSON repair Mar 17, 2026
@CJackHwang CJackHwang closed this Mar 17, 2026