Also added corresponding tests
effectful/handlers/llm/encoding.py
```python
# required fields in the JSON schema.
#
# https://docs.python.org/3/library/typing.html#annotating-tuples
if (len(args) == 2 and args[1] is Ellipsis) or len(args) == 1:
```
All of this new logic is specific to tuple. You should probably have separate `Encodable.define` patterns for `Sequence`, `MutableSequence` and `tuple`. The tuple pattern should return a `SequenceEncodable` when the type parameter is unspecified (`ty is tuple`) or variadic (`tuple` or `tuple[T, ...]`), but a `TupleEncodable` when it is a finitary tuple (`typing.get_origin(ty) is tuple` with fixed arguments: `tuple[()]`, `tuple[T]`, `tuple[T1, T2]`, etc.).
I updated it to three separate dispatch patterns:

- `_encodable_sequence` for immutable sequences.
- `_encodable_mutable_sequence` for mutable sequences.
- `_encodable_tuple` for tuples. When `ty` is unspecified or variadic, it returns a `SequenceEncodable`; otherwise it returns a `TupleEncodable`.
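A minimal sketch of that dispatch split (the helper name `classify_tuple` is hypothetical; the real `Encodable.define` patterns in effectful differ in detail):

```python
import typing


def classify_tuple(ty: type) -> str:
    """Illustrates which encodable a tuple annotation maps to (sketch)."""
    if ty is tuple:
        return "SequenceEncodable"  # unparameterized: variable length
    args = typing.get_args(ty)
    if len(args) == 2 and args[1] is Ellipsis:
        return "SequenceEncodable"  # tuple[T, ...]: variadic
    # tuple[()], tuple[T], tuple[T1, T2], ...: fixed arity
    return "TupleEncodable"
```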
effectful/handlers/llm/encoding.py
```python
@dataclass
class ImmutableSequenceEncodable[T](MutableSequenceEncodable[T]):
```
This subtyping relationship should be reversed: collections.abc.MutableSequence is a subtype of collections.abc.Sequence, so ImmutableSequenceEncodable should just be SequenceEncodable and MutableSequenceEncodable should be a subtype of SequenceEncodable.
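A sketch of the suggested hierarchy, mirroring `collections.abc` (class names from the review; the field layout is assumed for illustration):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class SequenceEncodable(Generic[T]):
    """Encodes read-only sequences (Sequence, variadic tuples)."""
    element_encoder: object = None  # assumed field, for illustration


@dataclass
class MutableSequenceEncodable(SequenceEncodable[T]):
    """Encodes mutable sequences; subtype direction matches collections.abc."""
```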
```python
assert decoded == value
assert isinstance(decoded, tuple)
# enc type produces an "items" JSON schema (OpenAI compatible)
Model = pydantic.create_model("Model", value=(encodable.enc, ...))
```
I'm a little confused about the need for Model in all these tests. I wonder if it might be better to land #548 first.
I see. Encodable worked in tests, but in practice, running the tuple encodable raised an error. Let me see if I can reproduce it.
Yeah, in conjunction with #566, this would also fail.
"""Reproduces the tuple[str] prefixItems schema bug (errors on master).
On master, pydantic generates {"prefixItems": [...], "type": "array"} for
tuple types, which OpenAI rejects with:
"Invalid schema for function: array schema missing items"
"""
from effectful.handlers.llm import Template
from effectful.handlers.llm.completions import LiteLLMProvider
from effectful.ops.semantics import handler
from effectful.ops.types import NotHandled
@Template.define
def propose(tool_name: str, params: tuple[str, str]) -> str:
"""Propose how to call {tool_name} with {params}."""
raise NotHandled
@Template.define
def haiku(topic: str) -> str:
"""Write a haiku about {topic}."""
raise NotHandled
with handler(LiteLLMProvider()):
print(haiku("cherry blossoms"))Result:
```
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/Users/nguyendat/Marc/effectful/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2378, in exception_type
    raise e
  File "/Users/nguyendat/Marc/effectful/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 444, in exception_type
    raise BadRequestError(
litellm.exceptions.BadRequestError: litellm.BadRequestError: OpenAIException - Invalid schema for function 'propose': In context=('properties', 'params'), array schema missing items.
```

That's because OpenAI doesn't accept pydantic's tuple schema with `prefixItems`:
```
schema: {
  "properties": {
    "params": {
      "maxItems": 2,
      "minItems": 2,
      "prefixItems": [
        {"type": "string"},
        {"type": "string"}
      ],
      "title": "Params",
      "type": "array"
    }
  },
  "required": ["params"],
  "title": "Debug",
  "type": "object"
}
```
With `model_cls` as a substitute for the tuple, it's OK:
```
schema: {
  "$defs": {
    "TupleItems": {
      "additionalProperties": false,
      "properties": {
        "item_0": {"title": "Item 0", "type": "string"},
        "item_1": {"title": "Item 1", "type": "string"}
      },
      "required": ["item_0", "item_1"],
      "title": "TupleItems",
      "type": "object"
    }
  },
  "properties": {
    "params": {"$ref": "#/$defs/TupleItems"}
  },
  "required": ["params"],
  "title": "Debug",
  "type": "object"
}
```
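A surrogate model like the `TupleItems` above can be built with `pydantic.create_model`; a sketch (the `item_{i}` field-naming scheme follows the schema shown, other details are assumed):

```python
import pydantic

element_types = (str, str)  # e.g. for params: tuple[str, str]
TupleItems = pydantic.create_model(
    "TupleItems",
    **{f"item_{i}": (t, ...) for i, t in enumerate(element_types)},
)

schema = TupleItems.model_json_schema()
# An object schema with "properties"/"required" and no "prefixItems",
# so it stays inside OpenAI's accepted subset.
```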
effectful/handlers/llm/encoding.py
```python
model_instance = self.model_cls(
    **{f"item_{i}": v for i, v in enumerate(encoded_value)}
)
adapter = pydantic.TypeAdapter(self.model_cls)
```
I'm confused about this. Why do we need adapter if model_cls is already a BaseModel? Alternatively, why do we need model_cls at all? model_cls feels like an unnecessary layer of indirection.
Right, we don't need TypeAdapter here at all. We still need model_cls to avoid the serialization error.
I'm removing type adapter here.
effectful/handlers/llm/encoding.py
```diff
 encoded_ty = typing.cast(
     type[typing.Any],
-    tuple[*(enc.enc for enc in element_encoders)],  # type: ignore
+    typing.Annotated[model_cls, pydantic.AfterValidator(_model_to_tuple)],
```
Why is this necessary? Shouldn't this be happening in decode?
That was silly: I wanted to make minimal modifications to `test_type_to_encodable_type_tuple()`, which tested that the value was still a tuple. I moved the conversion to `decode` now.
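A sketch of where the conversion now lives (a hypothetical minimal class; effectful's actual `TupleEncodable` has more machinery):

```python
import pydantic


class TupleCodec:
    """Round-trips a fixed-arity tuple through an item_{i} model (sketch)."""

    def __init__(self, *element_types: type):
        self.arity = len(element_types)
        self.model_cls = pydantic.create_model(
            "TupleItems",
            **{f"item_{i}": (t, ...) for i, t in enumerate(element_types)},
        )

    def encode(self, value: tuple) -> dict:
        return self.model_cls(
            **{f"item_{i}": v for i, v in enumerate(value)}
        ).model_dump()

    def decode(self, encoded: dict) -> tuple:
        # model-to-tuple conversion happens here, not via an AfterValidator
        model = self.model_cls.model_validate(encoded)
        return tuple(getattr(model, f"item_{i}") for i in range(self.arity))
```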
tests/test_handlers_llm_encoding.py
```python
encoded = encodable.encode(value)
# Encodes to a list (like list[str])
assert isinstance(encoded, list)
assert encoded == ["hello", "world", "foo"]
```
There are a bunch of tests here (including existing ones untouched by this PR, to be fair) that rely on inlined gold values and will break immediately under any changes to Encodable. It would be better if possible to write parametrized tests that check equational laws of Encodable which should be insensitive to implementation details.
I added both parameterized tests and OpenAI serialization tests.
Even though it is not complete, it's our best effort so far: neither the OpenAI docs nor their tooling provide complete local schema validation.
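The equational-law style looks roughly like this, shown with a pydantic-based stand-in for an encodable and a plain loop instead of `pytest.mark.parametrize` (effectful's actual fixtures and `Encodable` API are assumed):

```python
import pydantic


class Codec:
    """Stand-in encodable: the law under test is decode(encode(x)) == x."""

    def __init__(self, ty):
        self.adapter = pydantic.TypeAdapter(ty)

    def encode(self, value):
        return self.adapter.dump_python(value)

    def decode(self, encoded):
        return self.adapter.validate_python(encoded)


CASES = [
    (list[int], [1, 2, 3]),
    (list[str], ["hello", "world"]),
    (dict[str, int], {"a": 1}),
]

for ty, value in CASES:
    codec = Codec(ty)
    # round-trip law: insensitive to the encoded representation
    assert codec.decode(codec.encode(value)) == value
```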
This PR closes #566. The root problem is that pydantic serializes Python tuple types using `prefixItems` in JSON Schema, which is valid JSON Schema but not in OpenAI's accepted subset. OpenAI requires either `items` (for arrays) or `properties` (for objects).
This affected all tuple forms: `tuple[str]`, `tuple[str, ...]`, `tuple[int, str]`, etc.
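The offending schema can be reproduced with pydantic alone (model name here is illustrative):

```python
import pydantic

# Fixed-length tuples get "prefixItems" in pydantic's JSON Schema output,
# which OpenAI's function-schema subset rejects
# ("array schema missing items").
Debug = pydantic.create_model("Debug", params=(tuple[str, str], ...))
params_schema = Debug.model_json_schema()["properties"]["params"]

assert "prefixItems" in params_schema
assert "items" not in params_schema  # hence OpenAI's complaint
```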
This PR follows the tuple annotation rules (https://docs.python.org/3/library/typing.html#annotating-tuples) and divides the encoding into two cases:

- `tuple[T, ...]` and `tuple[T]` are handled with `ImmutableSequenceEncodable` (variable length).
- `tuple[T1, T2, ...]` or `tuple[T, T, T]` are handled with `TupleEncodable`, which creates a pydantic model for serialization.