Commit 7aae28c

Merge pull request #92 from openai/docs-updates-for-entities-and-image-gen
[docs] Updates for entities and image generation
2 parents 494a96d + 6646f83 commit 7aae28c

4 files changed, with 226 additions and 4 deletions

docs/guides/add-annotations.md

Lines changed: 18 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -58,7 +58,14 @@ yield ThreadItemDoneEvent(
5858

5959
## Annotating with custom entities
6060

61-
Inline annotations are not yet supported for entity sources, but you can still attach `EntitySource` items as annotations so they appear in the Sources list below the message.
61+
You can attach `EntitySource` items as annotations to show entity references inline in assistant text and in the **Sources** list below the message.
62+
63+
Entity annotations support a few UI-focused fields:
64+
65+
- `icon`: Controls the icon shown for the entity in the default inline/hover UI.
66+
- `label`: Customizes what's shown in the default entity hover header (when you are not rendering a custom preview).
67+
- `inline_label`: Shows a label inline instead of an icon.
68+
- `interactive=True`: Wires the annotation to client-side callbacks (`ChatKitOptions.entities.onClick` and `ChatKitOptions.entities.onRequestPreview`).
6269

6370
```python
6471
from datetime import datetime
@@ -70,15 +77,22 @@ from chatkit.types import (
7077
    ThreadItemDoneEvent,
7178
)
7279

80+
text = "Here are the ACME account details for reference."
81+
7382
annotations = [
7483
    Annotation(
7584
        source=EntitySource(
7685
            id="customer_123",
7786
            title="ACME Corp",
7887
            description="Enterprise plan · 500 seats",
7988
            icon="suitcase",
89+
            label="Customer",
90+
            interactive=True,
91+
            # Free-form data object passed to your client-side entity callbacks
8092
            data={"href": "https://crm.example.com/customers/123"},
81-
        )
93+
        ),
94+
        # `index` controls where the inline marker is placed in the text.
95+
        index=text.index("ACME") + len("ACME"),
8296
    )
8397
]
8498

@@ -89,12 +103,12 @@ yield ThreadItemDoneEvent(
89103
        created_at=datetime.now(),
90104
        content=[
91105
            AssistantMessageContent(
92-
                text="Here are the ACME account details for reference.",
106+
                text=text,
93107
                annotations=annotations,
94108
            )
95109
        ],
96110
    )
97111
)
98112
```
99113

100-
Provide richer previews and navigation by handling [`entities.onRequestPreview`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#onrequestpreview) and [`entities.onClick`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#onclick) in ChatKit.js, using the `data` payload to pass entity information and deep link into your app.
114+
Provide richer previews and navigation by handling [`entities.onRequestPreview`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#onrequestpreview) and [`entities.onClick`](https://openai.github.io/chatkit-js/api/openai/chatkit/type-aliases/entitiesoption/#onclick) in ChatKit.js. These callbacks are only invoked for entity annotations with `interactive=True`; use the `data` payload to pass entity information and deep link into your app.
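
The `index` arithmetic in the example above can be wrapped in a small helper. This is a minimal sketch in plain Python: `marker_index_after` is a hypothetical name, not part of ChatKit, and it assumes `index` is a character offset into the message text, as `text.index("ACME") + len("ACME")` suggests.

```python
from typing import Optional


def marker_index_after(text: str, phrase: str) -> Optional[int]:
    """Return the offset just past the first occurrence of `phrase`, or None."""
    start = text.find(phrase)  # str.find returns -1 instead of raising ValueError
    if start == -1:
        return None
    return start + len(phrase)


text = "Here are the ACME account details for reference."
index = marker_index_after(text, "ACME")
```

Returning `None` when the phrase is missing lets the caller skip the inline marker rather than fail mid-stream, which `str.index` would do by raising `ValueError`.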

docs/guides/respond-to-user-message.md

Lines changed: 1 addition & 0 deletions
@@ -274,6 +274,7 @@ async def load_document(ctx: RunContextWrapper[AgentContext], document_id: str):
274274
- [Update the client during a response](update-client-during-response.md)
275275
- [Build interactive responses with widgets](build-interactive-responses-with-widgets.md)
276276
- [Add annotations in assistant messages](add-annotations.md)
277+
- [Stream generated images](stream-generated-images.md)
277278
- [Keep your app in sync with ChatKit](keep-your-app-in-sync-with-chatkit.md)
278279
- [Let your app draft and send messages](let-your-app-draft-and-send-messages.md)
279280
- [Handle feedback](handle-feedback.md)

docs/guides/stream-generated-images.md

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
1+
# Stream generated images
2+
3+
Stream generated images to the client while your agent is running, and persist them in a storage-friendly format.
4+
5+
This guide covers:
6+
7+
- Adding an image generation tool to your agent
8+
- Converting streamed base64 images into URLs so your datastore does not store raw base64 strings
9+
- Converting generated image thread items to model input for continued conversation
10+
- Streaming partial images (progressive previews)
11+
12+
## Add an image generation tool to your agent
13+
14+
To let the model generate images, add the Agents SDK image generation tool to your agent's tool list.
15+
16+
```python
17+
from agents import Agent
18+
from agents.tool import ImageGenerationTool
19+
20+
21+
agent = Agent(
22+
name="designer",
23+
instructions="Generate images when asked.",
24+
tools=[ImageGenerationTool(tool_config={"type": "image_generation"})],
25+
)
26+
```
27+
28+
Once enabled, `stream_agent_response` will translate image generation output into ChatKit thread items:
29+
30+
- A `GeneratedImageItem` is added when an image generation call starts.
31+
- It is updated (for partial images) and finalized when the result arrives.
32+
33+
## Avoid storing raw base64 in your datastore
34+
35+
By default, ChatKit stores generated images as a data URL (for example, `data:image/png;base64,...`) by using `ResponseStreamConverter.base64_image_to_url`.
36+
37+
That's convenient for demos, but it can bloat your persisted thread items. In production, you'll usually want to:
38+
39+
- Write the bytes to object storage / a file store
40+
- Persist only a URL (or a signed URL) on the `GeneratedImageItem`
41+
42+
### Override `ResponseStreamConverter.base64_image_to_url`
43+
44+
Subclass `ResponseStreamConverter` and override `base64_image_to_url`. This method is called for both:
45+
46+
- Final images
47+
- Partial images (when `partial_images` streaming is enabled)
48+
49+
```python
50+
import base64
51+
52+
from chatkit.agents import ResponseStreamConverter
53+
54+
55+
class MyResponseStreamConverter(ResponseStreamConverter):
56+
    async def base64_image_to_url(
57+
        self,
58+
        image_id: str,
59+
        base64_image: str,
60+
        partial_image_index: int | None = None,
61+
    ) -> str:
62+
        # `image_id` stays the same for the whole generation call (including partial updates).
63+
        # Use `partial_image_index` to derive distinct blob IDs for each partial image.
64+
        blob_id = (
65+
            image_id
66+
            if partial_image_index is None
67+
            else f"{image_id}-partial-{partial_image_index}"
68+
        )
69+
        # Replace `upload_blob(...)` with your app's storage call (S3, GCS, filesystem, etc).
70+
        # It should return a URL that your client can load later.
71+
        url = upload_blob(
72+
            blob_id,
73+
            base64.b64decode(base64_image),
74+
            "image/png",
75+
        )
76+
        return url
77+
```
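
The converter above leaves `upload_blob` to your app. As a rough illustration only, here is one possible local-filesystem stand-in. Everything in it is an assumption for demonstration: `BLOB_ROOT`, `blob_id_for`, and this `upload_blob` body are hypothetical and not part of ChatKit, and the `file://` URL it returns is only loadable on the same machine.

```python
import base64
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical storage root; a real app would use S3, GCS, or similar.
BLOB_ROOT = Path(tempfile.mkdtemp(prefix="chatkit-images-"))


def blob_id_for(image_id: str, partial_image_index: Optional[int] = None) -> str:
    """Mirror the blob-naming rule from the converter: one ID per partial image."""
    if partial_image_index is None:
        return image_id
    return f"{image_id}-partial-{partial_image_index}"


def upload_blob(blob_id: str, data: bytes, content_type: str) -> str:
    """Write the bytes to disk and return a URL for them (hypothetical helper)."""
    suffix = {"image/png": ".png", "image/jpeg": ".jpg"}.get(content_type, "")
    path = BLOB_ROOT / f"{blob_id}{suffix}"
    path.write_bytes(data)
    return path.as_uri()  # file:// URL, suitable for local experiments only


# Simulate one partial image arriving as base64 text.
encoded = base64.b64encode(b"not-a-real-png").decode("ascii")
url = upload_blob(blob_id_for("img_1", 0), base64.b64decode(encoded), "image/png")
```

In production the same two-function shape should still apply: derive a stable blob ID, then return whatever URL your storage layer hands back.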
78+
79+
### Pass your converter to `stream_agent_response`
80+
81+
Create your converter and pass it into `stream_agent_response`. The returned URL will be what gets persisted on the `GeneratedImageItem`.
82+
83+
```python
84+
from agents import Runner
85+
86+
from chatkit.agents import AgentContext, stream_agent_response
87+
88+
89+
async def respond(...):
90+
    agent_context = AgentContext(
91+
        thread=thread,
92+
        store=self.store,
93+
        request_context=context,
94+
        previous_response_id=thread.previous_response_id,
95+
    )
96+
    result = Runner.run_streamed(agent, input_items, context=agent_context)
97+
98+
    async for event in stream_agent_response(
99+
        agent_context,
100+
        result,
101+
        converter=MyResponseStreamConverter(),
102+
    ):
103+
        yield event
104+
```
105+
106+
## Convert generated image thread items to model input
107+
108+
On later turns, you'll often feed prior thread items (including generated images) back into the model as context.
109+
110+
By default, `ThreadItemConverter.generated_image_to_input` sends the generated image back to the model as:
111+
112+
- A short text preface
113+
- An `input_image` content part with `image_url=item.image.url`
114+
115+
If `item.image.url` is not publicly reachable by the model runtime (for example, it's a private intranet URL, or a localhost URL, or requires cookies), image understanding and image-to-image flows may fail.
116+
117+
Two common fixes:
118+
119+
- Convert the stored image back into a base64 `data:` URL when building model input
120+
- Generate a temporary public (signed) URL for the duration of the run
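
To make the second fix concrete, here is a minimal sketch of a time-limited signed URL built with the standard library's `hmac` module. It is an assumption-laden illustration, not ChatKit functionality: `SECRET_KEY`, `BASE_URL`, `create_signed_url`, and `verify_signed_url` are hypothetical, and most object stores (S3, GCS) offer their own presigned URLs that you would normally use instead.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical signing key
BASE_URL = "https://images.example.com"  # hypothetical image host


def _sign(image_id: str, expires_at: int) -> str:
    """HMAC-SHA256 over the image ID and expiry, hex encoded."""
    message = f"{image_id}:{expires_at}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_signed_url(
    image_id: str, expires_in_seconds: int, now: Optional[int] = None
) -> str:
    """Build a URL that carries its expiry time and a signature over both."""
    issued_at = int(time.time()) if now is None else now
    expires_at = issued_at + expires_in_seconds
    query = urlencode({"expires": expires_at, "sig": _sign(image_id, expires_at)})
    return f"{BASE_URL}/{image_id}?{query}"


def verify_signed_url(url: str, now: Optional[int] = None) -> bool:
    """Server-side check: signature must match and the URL must not be expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False
    image_id = parsed.path.lstrip("/")
    return hmac.compare_digest(signature, _sign(image_id, expires_at))
```

The expiry should comfortably cover the model run; the sketch also assumes image IDs are URL-safe and does no escaping.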
121+
122+
### Override `ThreadItemConverter.generated_image_to_input`
123+
124+
Override `generated_image_to_input` and replace `image_url` with something the image API can fetch.
125+
126+
```python
127+
import base64
128+
129+
from openai.types.responses import ResponseInputImageParam, ResponseInputTextParam
130+
from openai.types.responses.response_input_item_param import Message
131+
132+
from chatkit.agents import ThreadItemConverter
133+
from chatkit.types import GeneratedImageItem
134+
135+
136+
class MyThreadItemConverter(ThreadItemConverter):
137+
    async def generated_image_to_input(self, item: GeneratedImageItem):
138+
        if not item.image:
139+
            return None
140+
141+
        # Option A: rehydrate to a data URL (works when you can fetch bytes yourself).
142+
        # Replace `download_blob(...)` with your app's storage call to fetch the image bytes.
143+
        image_bytes = download_blob(item.image.id)
144+
        b64 = base64.b64encode(image_bytes).decode("utf-8")
145+
        image_url = f"data:image/png;base64,{b64}"
146+
147+
        # Option B: generate a temporary public URL instead:
148+
        # image_url = create_signed_url(item.image.id, expires_in_seconds=60)
149+
150+
        return Message(
151+
            type="message",
152+
            role="user",
153+
            content=[
154+
                ResponseInputTextParam(
155+
                    type="input_text",
156+
                    text="The following image was generated by the agent.",
157+
                ),
158+
                ResponseInputImageParam(
159+
                    type="input_image",
160+
                    detail="auto",
161+
                    image_url=image_url,
162+
                ),
163+
            ],
164+
        )
165+
```
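
Option A hard-codes `image/png` in the data URL. If your store may hold other formats, a small helper can pick the MIME type from the file's leading bytes. The sketch below is illustrative only: `sniff_image_mime` and `to_data_url` are hypothetical helpers, not ChatKit APIs, and they recognize just PNG, JPEG, and WebP signatures.

```python
import base64


def sniff_image_mime(data: bytes) -> str:
    """Guess the MIME type from well-known magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return "application/octet-stream"  # unknown; the caller may want to reject it


def to_data_url(data: bytes) -> str:
    """Encode raw image bytes as a base64 `data:` URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{sniff_image_mime(data)};base64,{encoded}"
```

Inside `generated_image_to_input`, `to_data_url(image_bytes)` could then stand in for the hand-built f-string.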
166+
167+
When building your model input, use your custom converter instead of `simple_to_agent_input`:
168+
169+
```python
170+
input_items = await MyThreadItemConverter().to_agent_input(items)
171+
```
172+
173+
## Stream partial images (progressive previews)
174+
175+
You can stream partial images so users see progressive previews as the image is being generated.
176+
177+
### Enable partial images in the tool config
178+
179+
Set `partial_images` in the tool config:
180+
181+
```python
182+
from agents.tool import ImageGenerationTool
183+
184+
image_tool = ImageGenerationTool(
185+
    tool_config={"type": "image_generation", "partial_images": 3},
186+
)
187+
```
188+
189+
### Show progress for partial images
190+
191+
Pass the same `partial_images` value to `ResponseStreamConverter` (or your subclass). ChatKit uses it to compute a `progress` value (between 0 and 1) for each partial image update.
192+
193+
```python
194+
async for event in stream_agent_response(
195+
    agent_context,
196+
    result,
197+
    converter=MyResponseStreamConverter(partial_images=3),
198+
):
199+
    yield event
200+
```
201+
202+
During the run, ChatKit will emit:
203+
204+
- `ThreadItemAddedEvent` for the initial `GeneratedImageItem`
205+
- `ThreadItemUpdatedEvent` with `GeneratedImageUpdated(image=..., progress=...)` for each partial image
206+
- `ThreadItemDoneEvent` when the final image arrives
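
The guide does not say how `progress` is derived from `partial_images`. Purely to build intuition, the sketch below shows one plausible mapping. It is an assumption, not ChatKit's actual formula: `partial_progress` is a hypothetical function, and it assumes `partial_image_index` is zero-based.

```python
def partial_progress(partial_image_index: int, partial_images: int) -> float:
    """Map a zero-based partial index to a fraction in [0, 1] (assumed formula)."""
    if partial_images <= 0:
        return 0.0  # no partials configured, so there is nothing to report
    fraction = (partial_image_index + 1) / partial_images
    return max(0.0, min(1.0, fraction))  # clamp in case extra partials arrive


# With partial_images=3, the three partial updates land at thirds of the bar.
progress_values = [partial_progress(i, 3) for i in range(3)]
```

A mapping like this reaches 1.0 on the last partial, before the final image arrives; an implementation could just as reasonably reserve 1.0 for the final image.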

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ nav:
5555
- Update the client during a response: guides/update-client-during-response.md
5656
- Build interactive responses with widgets: guides/build-interactive-responses-with-widgets.md
5757
- Add annotations in assistant messages: guides/add-annotations.md
58+
- Stream generated images: guides/stream-generated-images.md
5859
- Keep your app in sync with ChatKit: guides/keep-your-app-in-sync-with-chatkit.md
5960
- Let your app draft and send messages: guides/let-your-app-draft-and-send-messages.md
6061
- Handle feedback: guides/handle-feedback.md
