| 1 | +# Stream generated images |
| 2 | + |
| 3 | +Stream generated images to the client while your agent is running, and persist them in a storage-friendly format. |
| 4 | + |
| 5 | +This guide covers: |
| 6 | + |
| 7 | +- Adding an image generation tool to your agent |
| 8 | +- Converting streamed base64 images into URLs so your datastore does not store raw base64 strings |
| 9 | +- Converting generated image thread items to model input for continued conversation |
| 10 | +- Streaming partial images (progressive previews) |
| 11 | + |
| 12 | +## Add an image generation tool to your agent |
| 13 | + |
| 14 | +To let the model generate images, add the Agents SDK image generation tool to your agent's tool list. |
| 15 | + |
| 16 | +```python |
| 17 | +from agents import Agent |
| 18 | +from agents.tool import ImageGenerationTool |
| 19 | + |
| 20 | + |
| 21 | +agent = Agent( |
| 22 | +    name="designer", |
| 23 | +    instructions="Generate images when asked.", |
| 24 | +    tools=[ImageGenerationTool(tool_config={"type": "image_generation"})], |
| 25 | +) |
| 26 | +``` |
| 27 | + |
| 28 | +Once the tool is enabled, `stream_agent_response` translates image generation output into ChatKit thread items: |
| 29 | + |
| 30 | +- A `GeneratedImageItem` is added when an image generation call starts. |
| 31 | +- The item is updated as partial images arrive (when enabled) and finalized when the final result arrives. |
| 32 | + |
| 33 | +## Avoid storing raw base64 in your datastore |
| 34 | + |
| 35 | +By default, ChatKit stores generated images as a data URL (for example, `data:image/png;base64,...`) by using `ResponseStreamConverter.base64_image_to_url`. |
| 36 | + |
| 37 | +That's convenient for demos, but it can bloat your persisted thread items. In production, you'll usually want to: |
| 38 | + |
| 39 | +- Write the bytes to object storage / a file store |
| 40 | +- Persist only a URL (or a signed URL) on the `GeneratedImageItem` |
| 41 | + |
| 42 | +### Override `ResponseStreamConverter.base64_image_to_url` |
| 43 | + |
| 44 | +Subclass `ResponseStreamConverter` and override `base64_image_to_url`. This method is called for both: |
| 45 | + |
| 46 | +- Final images |
| 47 | +- Partial images (when `partial_images` streaming is enabled) |
| 48 | + |
| 49 | +```python |
| 50 | +import base64 |
| 51 | + |
| 52 | +from chatkit.agents import ResponseStreamConverter |
| 53 | + |
| 54 | + |
| 55 | +class MyResponseStreamConverter(ResponseStreamConverter): |
| 56 | +    async def base64_image_to_url( |
| 57 | +        self, |
| 58 | +        image_id: str, |
| 59 | +        base64_image: str, |
| 60 | +        partial_image_index: int | None = None, |
| 61 | +    ) -> str: |
| 62 | +        # `image_id` stays the same for the whole generation call (including partial updates). |
| 63 | +        # Use `partial_image_index` to derive distinct blob IDs for each partial image. |
| 64 | +        blob_id = ( |
| 65 | +            image_id |
| 66 | +            if partial_image_index is None |
| 67 | +            else f"{image_id}-partial-{partial_image_index}" |
| 68 | +        ) |
| 69 | +        # Replace `upload_blob(...)` with your app's storage call (S3, GCS, filesystem, etc). |
| 70 | +        # It should return a URL that your client can load later. |
| 71 | +        url = upload_blob( |
| 72 | +            blob_id, |
| 73 | +            base64.b64decode(base64_image), |
| 74 | +            "image/png", |
| 75 | +        ) |
| 76 | +        return url |
| 77 | +``` |
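The `upload_blob(...)` call above is a placeholder, not a ChatKit API. As a rough illustration only, here is a minimal sketch of one way it could look when writing to the local filesystem. The names `BLOB_DIR` and `BLOB_BASE_URL`, the example domain, and the extension table are all assumptions made for this sketch; a production app would more likely call its object storage client here.

```python
from pathlib import Path

# Hypothetical settings: point these at wherever your app serves static files from.
BLOB_DIR = Path("generated_images")
BLOB_BASE_URL = "https://example.com/generated_images"

# Map the content types this sketch knows about to file extensions.
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def upload_blob(blob_id: str, data: bytes, content_type: str) -> str:
    """Write image bytes to local disk and return the URL they would be served from."""
    # Keep only characters that are safe in a file name, so the ID cannot escape BLOB_DIR.
    safe_id = "".join(c if c.isalnum() or c in "-_" else "_" for c in blob_id)
    extension = _EXTENSIONS.get(content_type, ".bin")
    BLOB_DIR.mkdir(parents=True, exist_ok=True)
    (BLOB_DIR / f"{safe_id}{extension}").write_bytes(data)
    return f"{BLOB_BASE_URL}/{safe_id}{extension}"
```

This helper does blocking file I/O. Because `base64_image_to_url` is an `async` method, you may prefer to call it through `asyncio.to_thread(upload_blob, ...)` or use an async storage client so the event loop is not held up while large images are written.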
| 78 | + |
| 79 | +### Pass your converter to `stream_agent_response` |
| 80 | + |
| 81 | +Create your converter and pass it into `stream_agent_response`. The returned URL will be what gets persisted on the `GeneratedImageItem`. |
| 82 | + |
| 83 | +```python |
| 84 | +from agents import Runner |
| 85 | + |
| 86 | +from chatkit.agents import AgentContext, stream_agent_response |
| 87 | + |
| 88 | + |
| 89 | +async def respond(...): |
| 89 | +    agent_context = AgentContext( |
| 90 | +        thread=thread, |
| 91 | +        store=self.store, |
| 92 | +        request_context=context, |
| 93 | +        previous_response_id=thread.previous_response_id, |
| 94 | +    ) |
| 95 | +    result = Runner.run_streamed(agent, input_items, context=agent_context) |
| 96 | + |
| 97 | +    async for event in stream_agent_response( |
| 98 | +        agent_context, |
| 99 | +        result, |
| 100 | +        converter=MyResponseStreamConverter(), |
| 101 | +    ): |
| 102 | +        yield event |
| 104 | +``` |
| 105 | + |
| 106 | +## Convert generated image thread items to model input |
| 107 | + |
| 108 | +On later turns, you'll often feed prior thread items (including generated images) back into the model as context. |
| 109 | + |
| 110 | +By default, `ThreadItemConverter.generated_image_to_input` sends the generated image back to the model as: |
| 111 | + |
| 112 | +- A short text preface |
| 113 | +- An `input_image` content part with `image_url=item.image.url` |
| 114 | + |
| 115 | +If `item.image.url` is not publicly reachable by the model runtime (for example, it is a private intranet URL, a localhost URL, or a URL that requires cookies), image understanding and image-to-image flows may fail. |
| 116 | + |
| 117 | +Two common fixes: |
| 118 | + |
| 119 | +- Convert the stored image back into a base64 `data:` URL when building model input |
| 120 | +- Generate a temporary public (signed) URL for the duration of the run |
| 121 | + |
| 122 | +### Override `ThreadItemConverter.generated_image_to_input` |
| 123 | + |
| 124 | +Override `generated_image_to_input` and replace `image_url` with something the model runtime can fetch. |
| 125 | + |
| 126 | +```python |
| 127 | +import base64 |
| 128 | + |
| 129 | +from openai.types.responses import ResponseInputImageParam, ResponseInputTextParam |
| 130 | +from openai.types.responses.response_input_item_param import Message |
| 131 | + |
| 132 | +from chatkit.agents import ThreadItemConverter |
| 133 | +from chatkit.types import GeneratedImageItem |
| 134 | + |
| 135 | + |
| 136 | +class MyThreadItemConverter(ThreadItemConverter): |
| 137 | +    async def generated_image_to_input(self, item: GeneratedImageItem): |
| 138 | +        if not item.image: |
| 139 | +            return None |
| 140 | + |
| 141 | +        # Option A: rehydrate to a data URL (works when you can fetch bytes yourself). |
| 142 | +        # Replace `download_blob(...)` with your app's storage call to fetch the image bytes. |
| 143 | +        image_bytes = download_blob(item.image.id) |
| 144 | +        b64 = base64.b64encode(image_bytes).decode("utf-8") |
| 145 | +        image_url = f"data:image/png;base64,{b64}" |
| 146 | + |
| 147 | +        # Option B: generate a temporary public URL instead: |
| 148 | +        # image_url = create_signed_url(item.image.id, expires_in_seconds=60) |
| 149 | + |
| 150 | +        return Message( |
| 151 | +            type="message", |
| 152 | +            role="user", |
| 153 | +            content=[ |
| 154 | +                ResponseInputTextParam( |
| 155 | +                    type="input_text", |
| 156 | +                    text="The following image was generated by the agent.", |
| 157 | +                ), |
| 158 | +                ResponseInputImageParam( |
| 159 | +                    type="input_image", |
| 160 | +                    detail="auto", |
| 161 | +                    image_url=image_url, |
| 162 | +                ), |
| 163 | +            ], |
| 164 | +        ) |
| 165 | +``` |
| 166 | + |
| 167 | +When building your model input, use your custom converter instead of `simple_to_agent_input`: |
| 168 | + |
| 169 | +```python |
| 170 | +input_items = await MyThreadItemConverter().to_agent_input(items) |
| 171 | +``` |
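The override above hard-codes `image/png` in the data URL. If your stored images might be in another format, one option is to pick the MIME type from the bytes themselves. The following is a minimal sketch under that assumption; `sniff_image_mime` and `to_data_url` are hypothetical helper names, not part of ChatKit, and only PNG, JPEG, and WebP signatures are checked.

```python
import base64


def sniff_image_mime(data: bytes, default: str = "image/png") -> str:
    """Guess an image MIME type from its leading magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    # Unknown signature: fall back to the caller-provided default.
    return default


def to_data_url(data: bytes) -> str:
    """Encode raw image bytes as a base64 data URL with a matching MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{sniff_image_mime(data)};base64,{encoded}"
```

With these helpers, Option A in the override could build its URL as `image_url = to_data_url(image_bytes)` instead of formatting the string inline.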
| 172 | + |
| 173 | +## Stream partial images (progressive previews) |
| 174 | + |
| 175 | +You can stream partial images so users see progressive previews as the image is being generated. |
| 176 | + |
| 177 | +### Enable partial images in the tool config |
| 178 | + |
| 179 | +Set `partial_images` in the tool config: |
| 180 | + |
| 181 | +```python |
| 182 | +from agents.tool import ImageGenerationTool |
| 183 | + |
| 184 | +image_tool = ImageGenerationTool( |
| 185 | +    tool_config={"type": "image_generation", "partial_images": 3}, |
| 186 | +) |
| 187 | +``` |
| 188 | + |
| 189 | +### Show progress for partial images |
| 190 | + |
| 191 | +Pass the same `partial_images` value to `ResponseStreamConverter` (or your subclass). ChatKit uses it to compute a `progress` value (between 0 and 1) for each partial image update. |
| 192 | + |
| 193 | +```python |
| 194 | +async for event in stream_agent_response( |
| 195 | +    agent_context, |
| 196 | +    result, |
| 197 | +    converter=MyResponseStreamConverter(partial_images=3), |
| 198 | +): |
| 199 | +    yield event |
| 200 | +``` |
| 201 | + |
| 202 | +During the run, ChatKit will emit: |
| 203 | + |
| 204 | +- `ThreadItemAddedEvent` for the initial `GeneratedImageItem` |
| 205 | +- `ThreadItemUpdatedEvent` with `GeneratedImageUpdated(image=..., progress=...)` for each partial image |
| 206 | +- `ThreadItemDoneEvent` when the final image arrives |