
fix: guard against IndexError when LLM API returns empty choices list #1876

Open

qizwiz wants to merge 1 commit into microsoft:main from qizwiz:fix/llm-response-empty-choices-crash

Conversation

@qizwiz

@qizwiz qizwiz commented May 14, 2026

Problem

Three places in markitdown call `response.choices[0].message.content` immediately after `client.chat.completions.create(...)` without checking whether `choices` is non-empty:

  • `packages/markitdown/src/markitdown/converters/_image_converter.py:138`
  • `packages/markitdown/src/markitdown/converters/_llm_caption.py:50`
  • `packages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py:102`

The OpenAI API (and OpenAI-compatible providers) can return an empty choices list in three documented scenarios:

  1. Content filtering — when the image triggers a policy violation, the API returns `finish_reason: "content_filter"` with an empty choices list
  2. Streaming edge cases — SSE stream closed before any choices are emitted
  3. OpenAI-compatible providers — local LLMs, proxies, and alternative providers may return non-standard response shapes

In all three cases, `choices[0]` raises `IndexError: list index out of range`. This crash is silent in development (dev images don't hit content filters) and surfaces in production on real user content.
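The failure mode can be reproduced without any API call by indexing an empty list the way the call sites do (the `SimpleNamespace` response shape here is an illustrative stand-in, not markitdown's actual client object):

```python
# Minimal reproduction of the crash path: a response whose choices list is
# empty, accessed the way the three call sites access it.
from types import SimpleNamespace

response = SimpleNamespace(choices=[])  # e.g. a content-filtered response

try:
    content = response.choices[0].message.content
except IndexError as exc:
    print(exc)  # list index out of range
```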

Formal verification

This was found via `pact` static analysis and formally verified with the Z3 SMT solver:

Bug model (SAT): `content_filtered=True → choices_len=0`, `access_index=0 → 0 ≥ 0` → IndexError
Fix model (UNSAT): with the `if not response.choices` guard, `access_attempted ∧ choices_len=0` is a contradiction — IndexError is unreachable on all trigger paths.
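The two models above can be sanity-checked with a stdlib-only sketch that brute-forces the same state space (this is an illustration of the SAT/UNSAT argument, not the actual `pact`/Z3 run):

```python
# Enumerate small states of the two models: is the out-of-bounds access
# (access_index >= choices_len) reachable with and without the guard?
from itertools import product

def out_of_bounds_reachable(guarded: bool) -> bool:
    """Return True if some state reaches choices[0] on an empty list."""
    for content_filtered, choices_len in product([False, True], range(3)):
        if content_filtered and choices_len != 0:
            continue  # content filtering forces an empty choices list
        if guarded and choices_len == 0:
            continue  # `if not response.choices` skips the access
        access_index = 0
        if access_index >= choices_len:  # the IndexError condition
            return True
    return False

print(out_of_bounds_reachable(guarded=False))  # bug model: True  (SAT)
print(out_of_bounds_reachable(guarded=True))   # fix model: False (UNSAT)
```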

Fix

```python
# _image_converter.py and _llm_caption.py
response = client.chat.completions.create(model=model, messages=messages)
if not response.choices:
    return None
return response.choices[0].message.content
```

```python
# _ocr_service.py (inline — consistent with existing `text or ""` guard below)
text = response.choices[0].message.content if response.choices else None
```

The `_ocr_service.py` path already has a bare `except Exception` that returns `OCRResult(text="")` on failure, so the `None` propagates safely through the existing `text.strip() if text else ""` guard on the next line.
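The guard's behavior can be exercised with a small test double (the `extract_content` helper and `SimpleNamespace` response shapes below are hypothetical, written only to mirror the patched code path):

```python
# Hypothetical unit test for the guard; response objects are stand-ins
# for the OpenAI client's response type.
from types import SimpleNamespace

def extract_content(response):
    # Same guard as in the patched converters.
    if not response.choices:
        return None
    return response.choices[0].message.content

empty = SimpleNamespace(choices=[])
ok = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="a caption"))]
)

assert extract_content(empty) is None        # previously raised IndexError
assert extract_content(ok) == "a caption"    # normal path unchanged
```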

Prior art

This exact crash pattern appears in multiple open issues across the LLM ecosystem: plastic-labs/honcho#676, aden-hive/hive#4767, TheR1D/shell_gpt#741, langchain-community#475.

@qizwiz
Author

qizwiz commented May 14, 2026

@microsoft-github-policy-service agree

The OpenAI API can return an empty choices list when:
- Content filtering blocks the image response
- A streaming edge case closes before choices are emitted
- An OpenAI-compatible provider returns a non-standard response shape

In all three cases, `response.choices[0]` raises IndexError. This is a
silent crash in production — content filters fire on real user images,
not on dev test images, so the bug is invisible in local testing.

Three affected paths:
- _image_converter.py: return None when no choices (caller handles None)
- _llm_caption.py: return None when no choices (caller handles None)
- _ocr_service.py: inline ternary, consistent with existing `text or ""`
  guard already on the next line

Formally verified: Z3 SMT solver proves IndexError is satisfiable under
content_filtered=True → choices_len=0 (SAT), and proves the guard makes
it UNSAT — no assignment produces IndexError after the check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
qizwiz force-pushed the fix/llm-response-empty-choices-crash branch from 5719e76 to 9f80bf3 on May 14, 2026 at 14:06
