Prevent update_chat_ctx from deleting in-flight function calls using function call attributes#5021

Closed
StianHanssen wants to merge 3 commits into livekit:main from StianHanssen:fix-tool-sync-can-cause-item-loss-function-call-approach

Conversation

@StianHanssen
Contributor

Summary

update_chat_ctx can delete in-flight function_call items from the OpenAI Realtime server, causing cascading "failed to insert item: previous_item_id not found" corruption of _remote_chat_ctx.

The root cause is a timing gap between two context-tracking structures:

  • _remote_chat_ctx: Updated immediately when the server sends conversation.item.added
  • _agent._chat_ctx: Updated later, only when tool execution starts (_tool_execution_started_cb)

If update_chat_ctx runs during this window (e.g. from context management), the diff sees the function_call in remote but not in local, treats it as intentionally removed, and sends a delete event. The existing _is_content_empty guard only protects message items — function_call items pass through unconditionally.

A unit test gist replicating the exact pipeline demonstrates how update_chat_ctx deletes in-flight function_call items.
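The timing gap can be reduced to a toy model (plain lists and a simplified diff, not the real LiveKit API) to show how the in-flight call ends up scheduled for deletion:

```python
# Toy reproduction of the race described above. The real diff lives in
# compute_chat_ctx_diff; this sketch only keeps the part that matters here.

def compute_diff(remote, local):
    """Items present remotely but absent locally are scheduled for deletion."""
    return [item for item in remote if item not in local]

# conversation.item.added updates the remote context immediately...
remote_ctx = ["msg1", "msg2", "function_call"]
# ...but the agent context only learns about the call when tool execution starts.
local_ctx = ["msg1", "msg2"]

# update_chat_ctx running inside this window sees the call as "removed".
to_remove = compute_diff(remote_ctx, local_ctx)
print(to_remove)  # ['function_call'] -- the in-flight call gets deleted
```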

Fix

Use a shared-object flag (extra["dispatched"]) on FunctionCall items to distinguish in-flight from intentionally removed function calls.

  1. openai_item_to_livekit_item sets extra["dispatched"] = False when creating a FunctionCall from a server event.

  2. _handle_function_call reuses the same FunctionCall object from _remote_chat_ctx instead of creating a new one. This is safe because conversation.item.added always precedes response.output_item.done on the websocket. The same Python object is now shared across both contexts.

  3. _create_update_chat_ctx_events skips deletion for any function_call with extra["dispatched"] == False. Once the flag is True, summarization and other callers can delete the item normally.

  4. agent_activity.py sets extra["dispatched"] = True when tool execution starts or when all function calls for a generation are finalized (after await exe_task). Since the object is shared, this is visible to the diff guard immediately — no cross-package signaling needed.
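The mechanism in steps 1–4 can be sketched as follows (a minimal sketch with assumed names; the real FunctionCall is a richer type in livekit.agents.llm):

```python
from dataclasses import dataclass, field

@dataclass
class FunctionCall:
    call_id: str
    extra: dict = field(default_factory=dict)

def should_skip_deletion(item) -> bool:
    # An in-flight call (dispatched == False) must never be deleted by the
    # diff; once dispatched, normal deletion (e.g. summarization) is allowed.
    return isinstance(item, FunctionCall) and not item.extra.get("dispatched", True)

fc = FunctionCall(call_id="call_1", extra={"dispatched": False})
assert should_skip_deletion(fc)       # still in flight: protected

fc.extra["dispatched"] = True         # flipped when tool execution starts
assert not should_skip_deletion(fc)   # now callers may delete it normally
```

Because the very same FunctionCall object sits in both `_remote_chat_ctx` and `_agent._chat_ctx`, flipping `extra["dispatched"]` in one place is immediately visible to the diff guard in the other.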

Future consideration

This fix only tracks function_call items. function_call_output items are currently client-initiated (manual_function_calls=True), so they enter _agent._chat_ctx before _remote_chat_ctx and are not vulnerable to this race. If auto_tool_reply_generation is enabled in a future configuration (server-generated outputs), a guard should be added to cover function_call_output items as well.

@longcw
Contributor

longcw commented Mar 13, 2026

@StianHanssen I just realized the error failed to insert item: previous_item_id not found happens when the realtime API returns the tool response message and the tool call has somehow been deleted from the remote_chat_ctx.

That means the tool is already done and the internal update_chat_ctx including the tool call and tool output was already called. Before OAI returns the tool response, another update_chat_ctx is called (maybe from the summarization) that deletes the tool call, so the race condition happens.

It's actually fine if the summarization step deletes in-flight tool calls; the internal update_chat_ctx after the tool executes will add the tool call item back. Update: the issue is with the following, where we read the session's chat ctx and only append the tool output. But this is easy to fix by computing the diff and ensuring the tool call and tool output are both present here.

chat_ctx = self._rt_session.chat_ctx.copy()
chat_ctx.items.extend(new_fnc_outputs)
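The "ensure both are present" guard could be sketched like this (illustrative names only; the real items are livekit.agents.llm types with more fields):

```python
from collections import namedtuple

# Stand-ins for the real FunctionCall / FunctionCallOutput items.
Call = namedtuple("Call", "call_id")
Output = namedtuple("Output", "call_id")

def with_tool_results(remote_items, fnc_calls, fnc_outputs):
    """Append outputs, re-adding any tool call a concurrent update deleted."""
    items = list(remote_items)
    present = {getattr(i, "call_id", None) for i in items}
    for call, output in zip(fnc_calls, fnc_outputs):
        if call.call_id not in present:
            items.append(call)  # the call was deleted remotely; restore it
        items.append(output)
    return items

# The snapshot no longer contains the tool call (it was deleted mid-flight):
snapshot = ["summ1", "msg4", "msg5"]
items = with_tool_results(snapshot, [Call("c1")], [Output("c1")])
print(items[-2:])  # [Call(call_id='c1'), Output(call_id='c1')]
```

The output is then placed directly after its restored call, so the diff never emits an orphaned function_call_output.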

This happens because there is no lock between the local update_chat_ctx and the realtime API response creation. We could monitor whether a response (either from tool output or from generate_reply) is being created, and avoid update_chat_ctx during this period.

@StianHanssen
Contributor Author

StianHanssen commented Mar 13, 2026

@longcw Thanks for the reply, I really appreciate all the time you have dedicated to this issue.

From what I understand, this is what we have observed happening:
The failed to insert item: previous_item_id not found error belongs to the next thing arriving. So the tool call is already gone, and when the next item (i.e. a message) arrives, we get a failure to insert it into the remote context. Furthermore, this results in a cascading effect: items whose previous ID links to the message that just failed will also fail. So you will see several of these failures in a row. The remote context may end up missing the next 3 or more items of any kind.

Even if the operation succeeded on the OpenAI server, what happens afterwards is that the diff finds certain items still missing in the remote context and tries to insert them again into the OpenAI server. So even in the event of success on the OpenAI server, you will get duplicate messages pushed.

We have seen this manifest in the agent losing track, repeating itself, or even repeating tool calls. In our system, repeating tool calls like that has a very negative effect because we use them for conversation flow control in the agent.

From what I gather from my analysis, the in-flight function call removal triggers a cascading failure that has a noticeable negative effect on the agent.

@longcw
Contributor

longcw commented Mar 13, 2026

The failed to insert item: previous_item_id not found, belongs to the next thing arriving. So the tool call is already gone, and when the next item (i.e. a message) arrives, we get a failure to insert it into remote context.

yeah this is exactly what I mean, and from my understanding this happens because the remote_chat_ctx got modified while the OAI API is generating a new response. To avoid this, we can await the potential response.create in update_chat_ctx, or generate_reply and tool output generation should await the in-progress update_chat_ctx.

@longcw
Contributor

longcw commented Mar 13, 2026

actually, I tested that removing the tool call from the chat_ctx before tool execution and returning only the tool output works fine; the OAI realtime API can generate a tool response without the tool call in the chat ctx. I didn't mean this is the correct behavior, but most likely this is not the reason for the error you have seen.

so my proposal is to track the in-progress generate_reply and share the lock of update_chat_ctx with it.

@StianHanssen
Contributor Author

StianHanssen commented Mar 13, 2026

to avoid this, we can await the potential response.create in update_chat_ctx, or generate_reply and tool output generation should await the inprogress update_chat_ctx.

Hmm, I don't think the problem is solely that a tool call arrives during update_chat_ctx. It is about the time delay between when that item arrives in the two contexts.

  1. A tool call comes from OpenAI before update_chat_ctx so no await needed in this instance.
    1. This goes into remote context
    2. We also put it into the channel that will eventually put it into local context, when it is actually called in LiveKit.
  2. update_chat_ctx happens
    1. At this point, it is in remote_chat_ctx, but not in local context.

@longcw
Contributor

longcw commented Mar 13, 2026

a tool call arrives during update_chat_ctx

no, I mean a tool response triggered by response.create arrives during update_chat_ctx, or any new responses from generate_reply.

@StianHanssen
Contributor Author

I didn't mean this is the correct behavior but most likely this is not the reason of the error you have seen.

We have some more extensive tests in our product's test suite that simulate the agent running this scenario, and we were able to reproduce the issue. I also found that the solution I have put forth resolves the issue.

We have been using my fork of LiveKit for the past week or so, without the issue appearing again.

All that said, it could very well be that a more nuanced issue is happening underneath that this resolves accidentally.

@StianHanssen
Contributor Author

no, I mean a tool response triggered by response.create arrives during update_chat_ctx, or any new responses from generate_reply.

Sorry, perhaps I am still misunderstanding. I will take some time to read over and double-check my understanding with you.

@longcw
Contributor

longcw commented Mar 13, 2026

yeah your fix works by avoiding deleting the tool call item, but I think the root cause is that the tool response arrives during the update_chat_ctx. This is a race condition between update_chat_ctx and generate_reply in general. It can happen not only to tool calls but also to message items, when generate_reply is called while update_chat_ctx is deleting the latest message item.

@StianHanssen
Contributor Author

StianHanssen commented Mar 13, 2026

@longcw Just to confirm, am I correct to describe the scenario you say as:

  1. generate_reply sends response.create → server starts generating a response
  2. Summarization's update_chat_ctx sends conversation.item.delete for item X
  3. Server processes the delete → sends conversation.item.deleted → we remove X from _remote_chat_ctx
  4. Server's response sends conversation.item.added for a new item with previous_item_id = X (stale, because the response was committed before the delete was processed)
  5. X not found in _remote_chat_ctx → insert fails

I looked over the logs for such failure cases. And this is what we see:

  1. T+0ms - Summarization begins. It calls update_chat_ctx twice: once to insert the summary, once to delete old items (we want to make sure summary is in before we delete old items).
  2. T+82ms - While summarization is running, the OpenAI server finalizes a function_call item as part of its response. This enters _remote_chat_ctx immediately via conversation.item.added. It has not yet reached _agent._chat_ctx because that only happens when _tool_execution_started_cb fires, which requires the item to travel through the function stream and be dispatched by _execute_tools_task.
  3. T+227ms - Summarization's first update_chat_ctx completes (summary inserted).
  4. T+434ms - Summarization's second update_chat_ctx runs (delete old items). Inside _create_update_chat_ctx_events, the diff compares _remote_chat_ctx (which contains the function_call) against the caller's ChatContext (which does not, since the function_call hasn't reached _agent._chat_ctx yet). The diff puts the function_call in to_remove. The _is_content_empty guard only protects message items with empty content, so function_call items pass through. A conversation.item.delete event is sent. The server confirms deletion.
  5. T+1,057ms - _tool_execution_started_cb finally fires, appending the now server-deleted function_call to _agent._chat_ctx. Permanent divergence between the two contexts.
  6. T+2,474ms - The server sends conversation.item.added for the next item in its response, with previous_item_id pointing to the deleted function_call. It no longer exists in _remote_chat_ctx, so the insert fails.
  7. T+2,474ms onward - Every subsequent item that chains off the failed one also fails. Four consecutive items are lost. The agent loses track of the conversation.

So based on the logs, the function_call was deleted ~400ms after creation but ~600ms before tool execution even started. If I understand you correctly, your scenario assumes the tool has already executed and update_chat_ctx has already synced the tool call + output to the server. In our case, the function_call was deleted 400ms after arriving from the server but 600ms before _tool_execution_started_cb even fired. No tool output exists yet, no response is being generated from tool results. The function_call just hasn't reached _agent._chat_ctx yet, so the diff treats it as removed. It's a race between update_chat_ctx and the delayed arrival into local context, not between update_chat_ctx and generate_reply.

That is not to say, the scenario you outlined isn't realistic, that could also be a real race condition. But it looks like two different issues to me.

Sorry, if I am still misunderstanding your proposal here 🙏

Perhaps, if we have a branch with the lock solution, I could apply it to our product and give it a test. It should be quite quick to confirm if it solves our issue.

@longcw
Contributor

longcw commented Mar 13, 2026

yes, that's my assumption.

T+2,474ms - The server sends conversation.item.added for the next item in its response, with previous_item_id pointing to the deleted function_call. It no longer exists in _remote_chat_ctx, so the insert fails.

from your logs, the error happened here, way after the function call was deleted at T+434ms.

the question is how this new item was generated and why it has the deleted FC as the previous item: is it the tool response or a voice-triggered response? It looks like it's not the tool response, since the previous item is the function call; otherwise the previous one would be the tool output.

if it's a voice-triggered response, it's returned after the FC was deleted but uses the FC as the previous item. Then it looks like a race condition on OAI's side to me. IMO this is a general issue that may happen to FCs or regular messages, whenever the latest item is deleted while a new response is being created.

proposal

  • add locks on our side for update_chat_ctx and generate_reply to protect parts of them
  • for the voice-triggered response, add a fallback so that any item missing its previous item is appended to the end of the remote_chat_ctx, avoiding follow-up errors
  • updating the tool output should add the tool call back if it's deleted

for your specific case, maybe you can always keep the latest item after summarization. If you already have that logic, then the in-flight FC is the exception that got deleted unexpectedly, and that's why preventing it from deletion fixed the issue?
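The first proposal bullet (serializing update_chat_ctx against response creation) could be sketched with a shared asyncio.Lock; this is only an illustration with invented names, not LiveKit's API:

```python
import asyncio

class Session:
    """Toy session: context edits and reply generation share one lock."""
    def __init__(self):
        self._ctx_lock = asyncio.Lock()
        self.events = []

    async def update_chat_ctx(self, items):
        async with self._ctx_lock:
            self.events.append(("update", tuple(items)))
            await asyncio.sleep(0)  # stands in for the server round-trip

    async def generate_reply(self):
        async with self._ctx_lock:
            self.events.append(("reply_start",))
            await asyncio.sleep(0)  # response generation in progress
            self.events.append(("reply_done",))

async def main():
    s = Session()
    # A concurrent context edit while a reply is in flight...
    await asyncio.gather(s.generate_reply(), s.update_chat_ctx(["msg1"]))
    return s.events

events = asyncio.run(main())
# ...can no longer land between reply_start and reply_done.
assert events.index(("reply_done",)) == events.index(("reply_start",)) + 1
```

The open question noted below still applies: holding the lock serializes the two paths, so a slow update_chat_ctx delays the reply.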

@StianHanssen
Contributor Author

StianHanssen commented Mar 13, 2026

the question is how this new item generated and why it has the deleted FC as the previous item, is it the tool response or a voice triggered response?

I am quite puzzled by this as well. I thought it could be that the function_call_output was placed at the end of the conversation (via the diff's previous_item_id, not next to the function_call) because the function_call was already deleted from the snapshot when the diff ran, but I am really not sure if that is the case 🤔

add locks in our side for the update_chat_ctx and generate_reply to protect part of it

This seems sensible to me. I am a little concerned about how much it can delay the response from the model, as the timeout for update_chat_ctx is 5 seconds. Worth testing to see the impact.

for the voice triggered response, add a fallback that we always add the item missing the previous item to the end of the remote_chat_ctx to avoid follow-up errors

Hmm, yes, that could maybe work. I think it might trigger the diff to try to reorder again on the next update_chat_ctx, which may carry a slight risk.

update tool output should add tool call back if it's deleted

Yeah, this makes total sense if we can assume function_call_output always arrives after function_call. I really do think this is what normally happens, but I feel a bit of unease because I have logs showing otherwise.

for your specific case, maybe you can always keep the latest item after summarization. if you already have that logic then the in-flight FC is the exception that it's deleted unexpectedly. and that's why preventing it from deletion fixed the issue?

That is exactly right. Our summarization only targets old items (the oldest 50% of the chat context). The in-flight FC was brand new; it should never have been in to_remove. The diff caught it because it was in _remote_chat_ctx but hadn't reached _agent._chat_ctx yet. This is what caught us by surprise initially: we only operate on old items, yet somehow it affects new items trying to insert.

Perhaps a good next action on my part is to try to understand why the next item was not function_call_output. That would allow me to be certain this solution would also resolve my issue.

@StianHanssen
Contributor Author

@longcw I did some more investigation. I added logging to _create_update_chat_ctx_events to capture the diff output, the remote/local context at diff time, and each outgoing create/delete event with its previous_item_id. Here is the timeline from a reproduction.

Starting state:

Server conversation: [msg1(user), msg2(assistant), msg3(user), msg4(assistant), msg5(user)]

The server has just received user speech (msg5) and starts generating a response.

Step 1: Server generates a response containing a function_call

Server creates function_call as part of response. conversation.item.added arrives, inserting it into _remote_chat_ctx. It has NOT yet reached _agent._chat_ctx (that only happens when _tool_execution_started_cb fires).

  • Server: [msg1, msg2, msg3, msg4, msg5, function_call]
  • _remote_chat_ctx: [msg1, msg2, msg3, msg4, msg5, function_call]
  • _agent._chat_ctx: [msg1, msg2, msg3, msg4, msg5] (no function_call yet)

Step 2: User speaks. The server starts generating a new response (VAD)

While the function_call is waiting to be dispatched to tool execution, the agent is triggered via VAD. The server detects end of speech and commits a new response while the function_call still exists on the server, so the new response item (msg6) gets committed with previous_item_id = function_call.

That response was still generating (audio streaming etc), so conversation.item.added for msg6 had not arrived yet.

Step 3: Summarization's update_chat_ctx runs

Our summarization calls self.agent.update_chat_ctx(chat_ctx), which passes a copy of _agent._chat_ctx to the diff (we make sure the copy contains the newest items from _agent._chat_ctx).

Diff compares:

  • _remote_chat_ctx: [msg1, msg2, msg3, msg4, msg5, function_call]
  • chat_ctx: [summ1, msg4, msg5]

function_call is in remote but not in local. Diff puts it in to_remove. DELETE sent. Server confirms. function_call removed from _remote_chat_ctx.

Server: [summ1, msg4, msg5] (function_call deleted)
_remote_chat_ctx: [summ1, msg4, msg5]
_agent._chat_ctx: [summ1, msg4, msg5]

Step 4: Tool executes

_tool_execution_started_cb fires. function_call appended to _agent._chat_ctx. Tool runs and completes.

_remote_chat_ctx: [summ1, msg4, msg5]
_agent._chat_ctx: [summ1, msg4, msg5, function_call]

Step 5: LiveKit's internal update_chat_ctx sends function_call_output

agent_activity.py line 2757-2758 runs:

chat_ctx = self._rt_session.chat_ctx.copy()   # reads _remote_chat_ctx, NOT _agent._chat_ctx
chat_ctx.items.extend(new_fnc_outputs)         # appends function_call_output

This builds the context from _remote_chat_ctx (which does NOT have function_call) and appends function_call_output. Calls self._rt_session.update_chat_ctx(chat_ctx).

Diff compares:

  • _remote_chat_ctx: [summ1, msg4, msg5]
  • passed context: [summ1, msg4, msg5, function_call_output]

Only function_call_output is new. Diff creates it with previous_item_id = msg5 (the last item). Server accepts. conversation.item.added arrives with previous_item_id = msg5. Insert into _remote_chat_ctx succeeds.

_remote_chat_ctx: [summ1, msg4, msg5, function_call_output]

Step 6: A later server-initiated response arrives

The next failing item is a server-created message with previous_item_id = function_call.

This item belongs to the response started in Step 2, so the predecessor link had already been committed before the delete in Step 3.

function_call is not in _remote_chat_ctx. Insert fails. Every subsequent item that chains off msg6 also fails. Cascade begins.

TLDR:

Two different update_chat_ctx callers produce different results:

  • Summarization passes _agent._chat_ctx (no function_call) which causes the deletion
  • LiveKit's internal tool output code reads _remote_chat_ctx (also no function_call) so the function_call_output is placed at the end, disconnected from its function_call

And the first item that fails to insert is not the function_call_output but the response message, because:

  • function_call_output is client-initiated, we control its previous_item_id (set to msg5, which exists)
  • The response message is server-initiated, the server controls its previous_item_id (set to function_call, which was deleted)

I also found that this happened each time I reproduced it. It was never function_call_output that was the next item failing to insert.
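The cascade in Step 6 comes down to predecessor chaining, which can be modeled in a few lines (illustrative only; the real insert logic lives in the plugin's _remote_chat_ctx handling):

```python
def insert(ctx, item_id, previous_item_id):
    """Insert an item only if its named predecessor exists in the context."""
    if previous_item_id not in ctx:
        return False  # "failed to insert item: previous_item_id not found"
    ctx[item_id] = previous_item_id
    return True

ctx = {"msg5": "msg4"}  # the function_call was already deleted
results = [
    insert(ctx, "msg6", "function_call"),  # fails: predecessor gone
    insert(ctx, "msg7", "msg6"),           # fails: chains off the failed item
    insert(ctx, "msg8", "msg7"),           # fails: cascade continues
]
print(results)  # [False, False, False]
```

Once the first server-initiated item fails, every item chained behind it is lost too, which matches the four consecutive losses seen in the logs.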

@StianHanssen
Contributor Author

StianHanssen commented Mar 16, 2026

In conclusion, these are my thoughts:

I think your proposal may technically work, though I cannot say for certain without testing it. Because it attempts to repair the issue rather than prevent it, I think it comes with some risks:

  • _remote_chat_ctx will no longer always represent the server state exactly, and there is some risk of drift, because orphaned items would be appended at the end. If we go that route, I think it would be important to document clearly that _remote_chat_ctx can diverge from the server in certain edge cases.
  • compute_chat_ctx_diff may later try to restore the order to match local context, which could result in additional delete/create operations. Since those operations can affect the front of the context, there is a risk of negatively impacting the agent.

My personal view is:

  • In general, I think it is risky to manipulate the front of the chat context on the OpenAI server, because we know the server uses that context to generate responses (more recent = more important), and because we send individual operations rather than a transaction. That means the server can begin generating a response in the middle of a multi-step update. So while we can keep improving the robustness of front-of-context operations, I would rather avoid putting ourselves in that situation in the first place. I would even suggest documenting that this should generally be avoided where possible.
  • I think the shared lock you propose makes sense for the race condition you described.
    • However, it does not prevent the issue I am describing. That said, appending orphaned items at the end would circumvent the issue, with the risks described above.
  • When it comes to the particular problem I am trying to solve, I think a preventative solution such as the one I put forward in this PR is safer. It also has the benefit of maintaining _remote_chat_ctx as a mirror of the OpenAI server.
  • That said, I would be happy to test your proposal as well and see whether it introduces any adverse effects. I can understand the desirability of keeping the logic within the plugin.

@longcw
Contributor

longcw commented Mar 16, 2026

Hi @StianHanssen, thanks for your investigation! I agree with your concerns about my proposal and I noticed them as well when I was thinking about the implementation.

the root cause is the out-of-sync between rt_session._remote_chat_ctx and agent._chat_ctx, and I think it may happen not only to function calls but also to regular messages. A chat message is added to agent._chat_ctx after the audio playout, while the one in remote_chat_ctx is added as soon as it's created.

I'll create a fix that creates placeholder items in local chat_ctx when remote_chat_ctx updated, for both function call and chat messages.
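The placeholder idea could look roughly like this (a sketch with assumed names and shapes; the actual fix landed in #5114):

```python
# Local mirror keyed by item id; values are illustrative dicts, not real types.
local_ctx = {}

def on_remote_item_added(item_id, item_type):
    # Insert a placeholder the moment the server announces the item, so a
    # later diff never mistakes the in-flight item for an intentional removal.
    local_ctx.setdefault(item_id, {"type": item_type, "placeholder": True})

def on_item_finalized(item_id, content):
    # The real item replaces the placeholder later, e.g. when tool execution
    # starts (function calls) or audio playout finishes (chat messages).
    local_ctx[item_id] = {
        "type": local_ctx[item_id]["type"],
        "placeholder": False,
        "content": content,
    }

on_remote_item_added("fc_1", "function_call")
assert "fc_1" in local_ctx                    # the diff now sees it locally
on_item_finalized("fc_1", "result")
assert local_ctx["fc_1"]["placeholder"] is False
```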

@longcw
Contributor

longcw commented Mar 16, 2026

this should be fixed by #5114

@longcw longcw closed this in #5114 Mar 18, 2026