Fix Tuple Encoding #568

Open

datvo06 wants to merge 15 commits into master from dn_fix_tuple_encoding

Conversation

@datvo06 (Contributor) commented Feb 13, 2026

This PR closes #566. The actual problem is that Pydantic serializes Python tuple types using prefixItems in JSON Schema, which is valid JSON Schema but not part of OpenAI's accepted subset: OpenAI requires either items (for arrays) or properties (for objects).
This affected all tuple forms: tuple[str], tuple[str, ...], tuple[int, str], etc.
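For reference, this is straightforward to reproduce with pydantic v2 alone (a minimal sketch; exact key order in the output may differ):

import pydantic

# pydantic encodes fixed-length tuples with prefixItems
print(pydantic.TypeAdapter(tuple[int, str]).json_schema())
# {'maxItems': 2, 'minItems': 2,
#  'prefixItems': [{'type': 'integer'}, {'type': 'string'}], 'type': 'array'}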

This PR follows the tuple annotation conventions at https://docs.python.org/3/library/typing.html#annotating-tuples and divides the encoding into two cases:

  • tuple[T, ...] and tuple[T] are handled with ImmutableSequenceEncodable (variable length)
  • tuple[T1, T2, ...] or tuple[T, T, T] are handled with TupleEncodable, which creates a pydantic model for serialization, as sketched below.
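A hypothetical sketch of the model TupleEncodable might generate for tuple[str, str] (the item_0/item_1 naming matches the schema shown later in this thread; the exact create_model call is an assumption):

import pydantic

element_types = (str, str)  # element types of tuple[str, str]
TupleItems = pydantic.create_model(
    "TupleItems",
    **{f"item_{i}": (t, ...) for i, t in enumerate(element_types)},
)
print(TupleItems.model_json_schema()["properties"])
# {'item_0': {'title': 'Item 0', 'type': 'string'},
#  'item_1': {'title': 'Item 1', 'type': 'string'}}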

@datvo06 (Contributor, Author) commented Feb 13, 2026

Also added corresponding tests

@eb8680 (Contributor) left a comment

Thanks for taking a pass at this. I think it needs some revision to avoid exacerbating #553.

# required fields in the JSON schema.
#
# https://docs.python.org/3/library/typing.html#annotating-tuples
if (len(args) == 2 and args[1] is Ellipsis) or len(args) == 1:
@eb8680 (Contributor):

All of this new logic is specific to tuple. You should probably have separate Encodable.define patterns for Sequence, MutableSequence and tuple. The tuple pattern should return a SequenceEncodable when the type parameter is unspecified (ty is tuple) or variadic (tuple or tuple[T, ...]), but a TupleEncodable when it is a finitary tuple (typing.get_origin(ty) is tuple with a finite argument list: tuple[()], tuple[T], tuple[T1, T2], etc.).
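These distinctions can be checked directly with the stdlib (plain typing facts, not code from this PR):

import typing

assert typing.get_origin(tuple) is None                 # bare tuple: unspecified
assert typing.get_origin(tuple[int, str]) is tuple      # parameterized tuple
assert typing.get_args(tuple[int, ...]) == (int, ...)   # variadic form
assert typing.get_args(tuple[int, str]) == (int, str)   # finitary form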

@datvo06 (Contributor, Author):

I updated it to three separate dispatch patterns:

  • _encodable_sequence for immutable sequences.
  • _encodable_mutable_sequence for mutable sequences.
  • _encodable_tuple for tuple. When ty is unspecified or variadic, it returns a SequenceEncodable; otherwise it returns a TupleEncodable (see the sketch below).
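A rough sketch of that check, assuming the helper names above (an illustration, not the PR's exact code):

import typing

def _is_unspecified_or_variadic(ty: type) -> bool:
    # bare tuple carries no element types at all
    if ty is tuple:
        return True
    args = typing.get_args(ty)
    # tuple[T, ...] marks a homogeneous, variable-length tuple
    return len(args) == 2 and args[1] is Ellipsis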



@dataclass
class ImmutableSequenceEncodable[T](MutableSequenceEncodable[T]):
@eb8680 (Contributor):

This subtyping relationship should be reversed: collections.abc.MutableSequence is a subtype of collections.abc.Sequence, so ImmutableSequenceEncodable should just be SequenceEncodable and MutableSequenceEncodable should be a subtype of SequenceEncodable.
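A sketch of the corrected hierarchy (class bodies elided; only the subtyping direction matters here):

from dataclasses import dataclass

@dataclass
class SequenceEncodable[T]:
    # covers tuple and other immutable sequences
    ...

@dataclass
class MutableSequenceEncodable[T](SequenceEncodable[T]):
    # mirrors collections.abc, where MutableSequence is a subtype of Sequence
    ...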

assert decoded == value
assert isinstance(decoded, tuple)
# enc type produces an "items" JSON schema (OpenAI compatible)
Model = pydantic.create_model("Model", value=(encodable.enc, ...))
@eb8680 (Contributor):

I'm a little confused about the need for Model in all these tests. I wonder if it might be better to land #548 first.

@datvo06 (Contributor, Author):

I see. Encodable worked in tests, but in practice, when I tried running the tuple encodable, I got an error. Let me see if I can reproduce it.

@datvo06 (Contributor, Author):

Yeah, in conjunction with #566, this would also fail.

"""Reproduces the tuple[str] prefixItems schema bug (errors on master).

On master, pydantic generates {"prefixItems": [...], "type": "array"} for
tuple types, which OpenAI rejects with:
  "Invalid schema for function: array schema missing items"
"""

from effectful.handlers.llm import Template
from effectful.handlers.llm.completions import LiteLLMProvider
from effectful.ops.semantics import handler
from effectful.ops.types import NotHandled


@Template.define
def propose(tool_name: str, params: tuple[str, str]) -> str:
    """Propose how to call {tool_name} with {params}."""
    raise NotHandled


@Template.define
def haiku(topic: str) -> str:
    """Write a haiku about {topic}."""
    raise NotHandled


with handler(LiteLLMProvider()):
    print(haiku("cherry blossoms"))

Result:

    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/Users/nguyendat/Marc/effectful/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2378, in exception_type
    raise e
  File "/Users/nguyendat/Marc/effectful/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 444, in exception_type
    raise BadRequestError(
litellm.exceptions.BadRequestError: litellm.BadRequestError: OpenAIException - Invalid schema for function 'propose': In context=('properties', 'params'), array schema missing items.

That's because OpenAI doesn't accept the prefixItems in pydantic's tuple schema:

schema: {
  "properties": {
    "params": {
      "maxItems": 2,
      "minItems": 2,
      "prefixItems": [
        {
          "type": "string"
        },
        {
          "type": "string"
        }
      ],
      "title": "Params",
      "type": "array"
    }
  },
  "required": [
    "params"
  ],
  "title": "Debug",
  "type": "object"
}

@datvo06 (Contributor, Author) commented Feb 15, 2026:

With model_cls as a substitute for the tuple, the schema is accepted:

schema: {
  "$defs": {
    "TupleItems": {
      "additionalProperties": false,
      "properties": {
        "item_0": {
          "title": "Item 0",
          "type": "string"
        },
        "item_1": {
          "title": "Item 1",
          "type": "string"
        }
      },
      "required": [
        "item_0",
        "item_1"
      ],
      "title": "TupleItems",
      "type": "object"
    }
  },
  "properties": {
    "params": {
      "$ref": "#/$defs/TupleItems"
    }
  },
  "required": [
    "params"
  ],
  "title": "Debug",
  "type": "object"
}

model_instance = self.model_cls(
    **{f"item_{i}": v for i, v in enumerate(encoded_value)}
)
adapter = pydantic.TypeAdapter(self.model_cls)
@eb8680 (Contributor):

I'm confused about this. Why do we need adapter if model_cls is already a BaseModel? Alternatively, why do we need model_cls at all? model_cls feels like an unnecessary layer of indirection.

@datvo06 (Contributor, Author) commented Feb 15, 2026:

Right, we don't need TypeAdapter here at all, but we still need model_cls to avoid the serialization error. I'm removing the TypeAdapter here.
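The simplification, roughly (a sketch; self.model_cls and encoded_value come from the snippet above):

# model_cls is already a pydantic.BaseModel subclass, so it can validate
# and dump directly, with no TypeAdapter in between
model_instance = self.model_cls(
    **{f"item_{i}": v for i, v in enumerate(encoded_value)}
)
data = model_instance.model_dump()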

encoded_ty = typing.cast(
    type[typing.Any],
-   tuple[*(enc.enc for enc in element_encoders)],  # type: ignore
+   typing.Annotated[model_cls, pydantic.AfterValidator(_model_to_tuple)],
@eb8680 (Contributor):

Why is this necessary? Shouldn't this be happening in decode?

@datvo06 (Contributor, Author):

It was silly. I wanted to make minimal modifications to test_type_to_encodable_type_tuple(), which tested that the value was still a tuple. I've moved the conversion to decode now.
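Roughly, the decode-side conversion looks like this (a sketch; the method shape and the element_encoders attribute are assumptions based on the snippets in this thread):

def decode(self, value):
    # unpack the generated model's item_<i> fields back into a tuple
    return tuple(
        enc.decode(getattr(value, f"item_{i}"))
        for i, enc in enumerate(self.element_encoders)
    )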

encoded = encodable.encode(value)
# Encodes to a list (like list[str])
assert isinstance(encoded, list)
assert encoded == ["hello", "world", "foo"]
@eb8680 (Contributor):

There are a bunch of tests here (including existing ones untouched by this PR, to be fair) that rely on inlined gold values and will break immediately under any changes to Encodable. It would be better, if possible, to write parametrized tests that check equational laws of Encodable, which should be insensitive to implementation details.
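For instance, a round-trip law could be checked along these lines (a sketch; type_to_encodable stands in for the library's actual dispatch entry point):

import pytest

@pytest.mark.parametrize(
    "ty, value",
    [
        (tuple[str, ...], ("a", "b", "c")),
        (tuple[int, str], (1, "x")),
        (tuple[str], ("only",)),
    ],
)
def test_encode_decode_roundtrip(ty, value):
    encodable = type_to_encodable(ty)  # hypothetical entry point
    # law: decode is a left inverse of encode
    assert encodable.decode(encodable.encode(value)) == value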

@datvo06 (Contributor, Author):

I added both parameterized tests and OpenAI serialization tests.
Even though the coverage is not complete, it's our best effort so far: neither the OpenAI docs nor their tooling provides a complete local schema checker (a best-effort check is sketched below).
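Something like this best-effort check, based only on the constraints quoted earlier in the thread (the function name and traversal are illustrative):

def assert_openai_compatible(schema: dict) -> None:
    # OpenAI's subset: array schemas must use items, never prefixItems
    if schema.get("type") == "array":
        assert "items" in schema and "prefixItems" not in schema
    # recurse into nested properties, shared definitions, and array items
    for sub in schema.get("properties", {}).values():
        assert_openai_compatible(sub)
    for sub in schema.get("$defs", {}).values():
        assert_openai_compatible(sub)
    if isinstance(schema.get("items"), dict):
        assert_openai_compatible(schema["items"])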

Development

Successfully merging this pull request may close these issues:

  • Tools with tuple arguments cause llm calls to fail.