Send x-llamastack-provider-data header with MCP auth to llama-stack #1179

Open
max-svistunov wants to merge 1 commit into lightspeed-core:main from max-svistunov:lcore-1328-mcp-headers

Conversation

max-svistunov (Contributor) commented Feb 19, 2026

Description

Restore the x-llamastack-provider-data HTTP header, which had regressed and stopped being sent. With the header restored, llama-stack can again forward auth credentials to each MCP server.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude Opus 4.6
  • Generated by: Claude Opus 4.6

Related Tickets & Documents

  • Related Issue: LCORE-1328
  • Closes: LCORE-1328

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Unit tests:

uv run pytest tests/unit/utils/test_responses.py -v

On live stack:

Send a POST /v1/streaming_query with MCP-HEADERS: {"mock-mcp": {"X-Authorization": "my-secret-client-token"}} and confirm the MCP server receives the headers.
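A rough Python sketch of that live check (the endpoint path and MCP-HEADERS header name come from this description; the base URL and request-body fields are assumptions):

```python
import json
import urllib.request

# Base URL is an assumption; point it at your running stack.
url = "http://localhost:8080/v1/streaming_query"

# Per-MCP-server auth headers, keyed by server name as in the description above.
mcp_headers = {"mock-mcp": {"X-Authorization": "my-secret-client-token"}}

req = urllib.request.Request(
    url,
    data=json.dumps({"query": "hello"}).encode(),  # body fields are illustrative
    headers={
        "Content-Type": "application/json",
        "MCP-HEADERS": json.dumps(mcp_headers),
    },
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to send against a live stack
```

On the MCP side, the mock server should then see X-Authorization on incoming tool calls.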

Summary by CodeRabbit

  • New Features

    • Support for sending provider-specific headers with API requests to enable downstream provider authentication.
    • API request parameters extended to accept per-request extra headers.
  • Tests

    • Added unit tests verifying provider header construction and absence when not applicable.

coderabbitai bot (Contributor) commented Feb 19, 2026

Walkthrough

Adds MCP provider-data header construction and propagation: a helper builds an x-llamastack-provider-data header from MCP tool definitions and prepare_responses_params now returns ResponsesApiParams with an extra_headers field carrying those headers when present.

Changes

Cohort / File(s) Summary
Types: Responses API params
src/utils/types.py
Added optional extra_headers: Optional[dict[str, str]] field to ResponsesApiParams to carry per-request HTTP headers (e.g. x-llamastack-provider-data).
Responses helper & flow
src/utils/responses.py
Added _build_provider_data_headers(tools) helper that extracts MCP tool headers and builds x-llamastack-provider-data; prepare_responses_params now computes and includes extra_headers in the returned ResponsesApiParams.
Unit tests
tests/unit/utils/test_responses.py
Added two async tests verifying that prepare_responses_params populates extra_headers with MCP provider-data when MCP tools include headers and leaves it None when no MCP headers exist.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped; CodeRabbit’s high-level summary is enabled)
  • Title check: ✅ Passed. The title accurately describes the main change: restoring the x-llamastack-provider-data header for MCP authentication. It is concise, specific, and directly reflects the core functionality added in this PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/responses.py (1)

315-327: ⚠️ Potential issue | 🔴 Critical

Extract extra_headers from ResponsesApiParams.model_dump() before passing to client.responses.create().

The regression is not fixed: all three callers (a2a.py line 330, query.py line 33, streaming_query.py line 42) use **responses_params.model_dump(), which incorrectly injects extra_headers into the JSON body instead of passing it as a transport-level HTTP header kwarg. The corrected pattern must be:

body = responses_params.model_dump(exclude={"extra_headers"}, exclude_none=True)
await client.responses.create(**body, extra_headers=responses_params.extra_headers)

Without this fix, the x-llamastack-provider-data header carrying MCP tool authentication is silently discarded, leaving the backend without required provider credentials.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 315 - 327,
ResponsesApiParams.model_dump() is including extra_headers in the JSON body;
change all call sites that do
client.responses.create(**responses_params.model_dump(), ...) to instead extract
the body via responses_params.model_dump(exclude={"extra_headers"},
exclude_none=True) and pass responses_params.extra_headers as the extra_headers
kwarg to client.responses.create(...,
extra_headers=responses_params.extra_headers). Update the callers that build
ResponsesApiParams (the places invoking ResponsesApiParams and then calling
model_dump before client.responses.create) to use this two-step pattern so the
x-llamastack-provider-data header is sent as a transport header rather than part
of the JSON payload.
🧹 Nitpick comments (1)
tests/unit/utils/test_responses.py (1)

930-938: "extra_headers" in dumped is always True — the assertion is vacuous.

Pydantic's model_dump() includes every declared field regardless of its value, so the presence check can never fail and adds no test signal. The meaningful assertion is the is not None check on line 936.

🧹 Proposed cleanup
-        dumped = result.model_dump()
-        assert (
-            "extra_headers" in dumped
-        ), "extra_headers missing from ResponsesApiParams"
-        assert (
-            dumped["extra_headers"] is not None
-        ), "extra_headers should not be None when MCP tools have headers"
+        dumped = result.model_dump()
+        assert (
+            dumped["extra_headers"] is not None
+        ), "extra_headers should not be None when MCP tools have headers"
         assert "x-llamastack-provider-data" in dumped["extra_headers"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 930 - 938, The assertion
`"extra_headers" in dumped` is vacuous because Pydantic's model_dump() always
includes declared fields; remove that line and instead ensure meaningful checks:
keep the existing `assert dumped["extra_headers"] is not None` and the key
presence assertion `assert "x-llamastack-provider-data" in
dumped["extra_headers"]` (or strengthen it to `assert dumped["extra_headers"]`
to ensure non-empty), referencing the test variable `dumped` produced by
`result.model_dump()` and the `extra_headers` field.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/responses.py`:
- Around line 220-224: The dict comprehension uses tool["server_url"] which can
raise KeyError if a tool dict lacks that key; update the comprehension in
responses.py (the block that builds mcp_headers) to safely access server_url
(e.g., use tool.get("server_url")) and include an extra guard so only tools with
a non-empty server_url, type == "mcp", and headers are included. Ensure the
resulting mcp_headers still maps server_url -> headers and keep the dict typing
intact.


coderabbitai bot (Contributor) left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/responses.py (1)

315-327: ⚠️ Potential issue | 🔴 Critical

Verify extra_headers are consumed as HTTP-level parameters, not body fields, in all endpoint callers.

The issue identified in the original review is confirmed across all three endpoint callers:

  • src/app/endpoints/query.py line 301
  • src/app/endpoints/streaming_query.py lines 287–288
  • src/app/endpoints/a2a.py line 330

All three use **responses_params.model_dump() to unpack the ResponsesApiParams object. This forces extra_headers into the request body kwargs rather than passing it as the HTTP-level extra_headers keyword argument to client.responses.create(). The headers are silently lost.

The correct pattern is:

params_dict = responses_params.model_dump(exclude={'extra_headers'})
stream = await client.responses.create(
    **params_dict,
    extra_headers=responses_params.extra_headers
)

Without this fix, MCP provider authentication headers will never reach the Llama Stack backend, reintroducing the exact regression the PR aims to solve.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 315 - 327, The
ResponsesApiParams.extra_headers field is being included in the request body
because callers use responses_params.model_dump() without excluding it; update
each caller that invokes client.responses.create (e.g., where responses_params
is used in query/streaming_query/a2a endpoints) to first do params_dict =
responses_params.model_dump(exclude={'extra_headers'}) and then call
client.responses.create(**params_dict,
extra_headers=responses_params.extra_headers, stream=stream if applicable) so
extra_headers is passed as the HTTP-level keyword argument to
client.responses.create rather than serialized into the request body.
🧹 Nitpick comments (1)
src/utils/responses.py (1)

201-229: Dual auth propagation — intentional but worth documenting.

get_mcp_tools already embeds headers inside each MCP tool_def entry (line 420). _build_provider_data_headers then extracts those same headers and re-encodes them as x-llamastack-provider-data. Auth credentials therefore travel to llama-stack via two channels simultaneously.

Per the official llama-stack documentation, mcp_headers should be keyed by server URL, which this implementation correctly does, so the format is sound. The dual propagation appears intentional (restoring the header mechanism as a regression fix while keeping the existing in-body headers as a fallback), but a short inline comment explaining the coexistence would prevent future readers from treating one path as a dead-code removal candidate.

Additionally, the llama-stack convention from published examples uses X-LlamaStack-Provider-Data (mixed case) as the header name, while the code uses lowercase x-llamastack-provider-data. HTTP headers are case-insensitive so there is no functional difference, but aligning casing with the documented convention aids readability.

✏️ Optional: document dual-channel intent + align casing
-    return {"x-llamastack-provider-data": json.dumps({"mcp_headers": mcp_headers})}
+    # Auth headers are also present in each tool_def["headers"] for llama-stack
+    # implementations that read from the tool definition directly; this header
+    # provides the same data via the standard provider-data mechanism.
+    return {"X-LlamaStack-Provider-Data": json.dumps({"mcp_headers": mcp_headers})}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 201 - 229, Add a short inline comment in
_build_provider_data_headers explaining that get_mcp_tools already embeds
per-server headers into each tool_def and that this function intentionally
re-encodes them into the Llama Stack provider header as a dual-propagation
fallback; also change the returned header key from "x-llamastack-provider-data"
to the documented casing "X-LlamaStack-Provider-Data" to match examples (HTTP
headers are case-insensitive) while preserving the existing json payload
structure and behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.


Duplicate comments:
In `@src/utils/responses.py`:
- Around line 220-224: The comprehension assigning mcp_headers is already safe
because the filter includes tool.get("server_url"), so no code change is needed;
remove the duplicate review comment and mark the prior review as resolved (or
delete the redundant "[duplicate_comment]" note) to avoid confusion, referencing
the mcp_headers dict comprehension and the tools iteration so reviewers know
which code the resolution applies to.

