
Add support for final-answer tool calls #570

Open

eb8680 wants to merge 18 commits into master from eb-final-answer

Conversation

@eb8680
Contributor

@eb8680 eb8680 commented Feb 16, 2026

Addresses #549

This PR adds a new effectful.ops.types.Annotation for Tool return types, effectful.handlers.llm.template.IsFinal. When a Template generates a call to an IsFinal-annotated Tool whose return type is compatible with the Template's, the result of the tool call is returned directly from call_assistant and Template.__apply__ as the final result of the Template call, with no further call_assistant turns or serialization/postprocessing.

This PR corresponds to one fairly conservative corner of the design space sketched in #549:

  • Only well-formed, successful tool calls can be used as final answers - if an LLM-generated IsFinal tool call cannot be decoded with valid arguments and a static output type that matches the original Template's, or if executing the tool call raises a runtime error that is captured by RetryLLMHandler, the IsFinal annotation is disregarded and the LLM proceeds to another call_assistant step as usual.
  • It does not allow a Template to use arbitrary type-compatible but un-annotated Tools to produce a final answer - if a Tool is not explicitly annotated with IsFinal, its output is always sent through at least one more call_assistant step.
  • It does not allow a Template to use an IsFinal-annotated Tool with an incompatible return type as a non-final tool; attempting to call such a tool triggers an error.
  • It does not require the LLM to generate an IsFinal tool call to compute a final answer, even when a type-compatible IsFinal-annotated Tool is available - the LLM can use different tools, or none at all, to get a result.

Any or all of these points might be worth revisiting before landing this PR. I would expect the functionality in #549, and especially this instantiation of it, to be useful mostly in cases like #526, where we have something very general like a code-generation or text-to-image-generation tool that can be used with many Templates and that we always want to use whenever it is type-compatible.

This would probably be much easier to use in conjunction with polymorphism (#489).

@eb8680 eb8680 linked an issue Feb 16, 2026 that may be closed by this pull request
@eb8680 eb8680 marked this pull request as ready for review February 24, 2026 07:51
@eb8680 eb8680 requested a review from datvo06 February 24, 2026 07:51
@datvo06
Contributor

datvo06 commented Feb 24, 2026

I'm rerunning the tests first. It seems like we ran out of quota for the notebook tests.


@datvo06 datvo06 left a comment


At first glance, I think the check in call_assistant can cause some trouble (see below). I don't have a clear answer on how to address these cases yet, but we can add tests, mark them xfail, and create issues.

return_annotation = typing.get_args(tool_sig.return_annotation)[0]
if not issubclass(
    _simple_type(return_annotation), response_format.base
):

I think this might cause trouble in cases where we use IsFinal with a return_annotation that doesn't match the outer template's.


For example, this script fails due to a mismatch between the final_text return type and the Template return type. But I guess that's ok.

from typing import Annotated

from effectful.handlers.llm import Template, Tool
from effectful.handlers.llm.completions import LiteLLMProvider
from effectful.handlers.llm.template import IsFinal
from effectful.ops.semantics import handler
from effectful.ops.types import NotHandled


@Tool.define
def final_text() -> Annotated[str, IsFinal]:
    """Return a final text result."""
    return "123"


@Template.define
def task() -> int:
    """Call final_text."""
    raise NotHandled


with handler(LiteLLMProvider(model="gpt-4o-mini")):
    task()

Result:

  File "/Users/nguyendat/Marc/effectful/effectful/handlers/llm/completions.py", line 239, in call_assistant
    raise ToolCallDecodingError(
effectful.handlers.llm.completions.ToolCallDecodingError: Error decoding tool call 'final_text': IsFinal tool 'final_text' has signature <Signature () -> Annotated[str, <effectful.handlers.llm.template._IsFinalAnnotation object at 0x100d3d100>]>, but the enclosing template expects <class 'int'>.. Please provide a valid response and try again.


This case is more troublesome, but that is also because Python forbids class checks on TypedDicts. Still, this would work fine if the IsFinal check weren't there.

from typing import Annotated, TypedDict

from effectful.handlers.llm import Template, Tool
from effectful.handlers.llm.completions import LiteLLMProvider
from effectful.handlers.llm.template import IsFinal
from effectful.ops.semantics import handler
from effectful.ops.types import NotHandled


class Payload(TypedDict):
    x: int


@Tool.define
def final_payload() -> Annotated[Payload, IsFinal]:
    """Return final payload."""
    return {"x": 1}


@Template.define
def task() -> Payload:
    """Call final_payload."""
    raise NotHandled


with handler(LiteLLMProvider(model="gpt-4o-mini")):
    task()

Result:

  File "/Users/nguyendat/Marc/effectful/effectful/handlers/llm/completions.py", line 239, in call_assistant
    raise ToolCallDecodingError(
effectful.handlers.llm.completions.ToolCallDecodingError: Error decoding tool call 'final_payload': TypedDict does not support instance and class checks. Please provide a valid response and try again.


Or Literal:

from typing import Annotated, Literal

from effectful.handlers.llm import Template, Tool
from effectful.handlers.llm.completions import LiteLLMProvider
from effectful.handlers.llm.template import IsFinal
from effectful.ops.semantics import handler
from effectful.ops.types import NotHandled


@Tool.define
def final_payload() -> Annotated[Literal[1, 2, 3], IsFinal]:
    """Return final payload."""
    return 1


@Template.define
def task() -> Literal[1, 2, 3]:
    """Call final_payload."""
    raise NotHandled


with handler(LiteLLMProvider(model="gpt-4o-mini")):
    task()

Result:

  File "/Users/nguyendat/Marc/effectful/effectful/handlers/llm/completions.py", line 239, in call_assistant
    raise ToolCallDecodingError(
effectful.handlers.llm.completions.ToolCallDecodingError: Error decoding tool call 'final_payload': Subscripted generics cannot be used with class and instance checks. Please provide a valid response and try again.
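
Both TypeErrors above come from CPython's typing runtime rather than from effectful itself: TypedDict classes and subscripted generics each reject runtime class checks. A standalone sketch (independent of the PR's _simple_type helper) reproducing the failures:

```python
from typing import Literal, TypedDict


class Payload(TypedDict):
    x: int


def compat_error(expected: object) -> str:
    """Run an issubclass-based compatibility check like the one quoted
    above and return the TypeError message it raises, or "ok" if it
    succeeds."""
    try:
        issubclass(dict, expected)  # type: ignore[arg-type]
        return "ok"
    except TypeError as e:
        return str(e)


print(compat_error(Payload))           # TypedDict does not support instance and class checks
print(compat_error(Literal[1, 2, 3]))  # Subscripted generics cannot be used with class and instance checks
```

Any runtime IsFinal compatibility check would therefore need to be structural (e.g. comparing typing.get_origin/typing.get_args results) rather than relying on issubclass.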

Contributor Author

The behavior on these examples is consistent with the design choices laid out in the PR description, so those choices probably need to be revisited. For maximum flexibility we might want to let the LLM choose whether a tool call is final, instead of relying solely on the annotation as in this PR. For example, we could inject a fake is_final argument into every tool schema sent to the LLM and read off its value from the tool call request. We should probably also collect a few more examples like this that reflect more realistic use cases of this behavior.
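
The injected-argument idea could look roughly like this (a hypothetical sketch over OpenAI-style tool schemas; inject_is_final and pop_is_final are illustrative names, not effectful API):

```python
import copy


def inject_is_final(tool_schema: dict) -> dict:
    """Return a copy of an OpenAI-style tool schema with a synthetic
    boolean is_final parameter the LLM can set on each call."""
    schema = copy.deepcopy(tool_schema)
    params = schema["function"].setdefault(
        "parameters", {"type": "object", "properties": {}}
    )
    params["properties"]["is_final"] = {
        "type": "boolean",
        "description": "Set to true if this call's result should be the final answer.",
    }
    return schema


def pop_is_final(arguments: dict) -> tuple[dict, bool]:
    """Strip the synthetic flag from decoded call arguments before
    dispatching to the real tool; the flag tells the loop whether to stop."""
    args = dict(arguments)
    return args, bool(args.pop("is_final", False))


schema = inject_is_final({"type": "function", "function": {"name": "final_text"}})
args, is_final = pop_is_final({"is_final": True})
```

This would shift the final/non-final decision from a static annotation to a per-call choice by the LLM, at the cost of polluting every tool schema with an extra parameter.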

@jfeser
Contributor

jfeser commented Feb 25, 2026

If the primary goal is to enable the LLM to use tools that produce output that doesn't roundtrip through text, it might be simpler to attack that problem directly. I'm skeptical that tool calling training will allow LLMs to effectively use tools that have the side effect of ending the interaction.

One alternative could be to encode "unencodable" tool output using a textual pointer. This could be a hash or some other unique string that the LLM could either produce as output or pass as an argument to a further tool call. The encoding and decoding logic would be responsible for maintaining the shared state that would map these pointers back to their objects. This approach would also have the pleasant side effect of enabling the LLM to chain tools that produce e.g. images or video without needing to write a script.
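
The pointer encoding could be sketched as a small registry (hypothetical; PointerRegistry is an illustrative name, and content-hash keying is only one possible way to generate handles):

```python
import hashlib
import pickle
from typing import Any


class PointerRegistry:
    """Map opaque textual handles to objects that don't roundtrip through text.

    encode() would run when serializing tool output for the LLM; decode()
    would run when a handle comes back as an argument to a later tool call.
    """

    def __init__(self) -> None:
        self._objects: dict[str, Any] = {}

    def encode(self, obj: Any) -> str:
        # A content hash gives a stable, unique-enough textual pointer.
        key = hashlib.sha256(pickle.dumps(obj)).hexdigest()[:12]
        self._objects[key] = obj
        return f"<obj:{key}>"

    def decode(self, handle: str) -> Any:
        return self._objects[handle.removeprefix("<obj:").removesuffix(">")]


registry = PointerRegistry()
handle = registry.encode({"image_bytes": b"\x89PNG..."})
assert registry.decode(handle) == {"image_bytes": b"\x89PNG..."}
```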

@eb8680
Contributor Author

eb8680 commented Feb 25, 2026

I'm skeptical that tool calling training will allow LLMs to effectively use tools that have the side effect of ending the interaction.

Isn't this just a funny sort of structured output, followed by extra information from the tool call in the next user message? It's possible that it doesn't work that well in practice but I know it's a pattern that's used quite a bit in smolagents, among other libraries.


Development

Successfully merging this pull request may close these issues.

Templates should be able to return Tool call results as final answers

3 participants