[codex] Support raw image refs for multimodal rendering by eligotts · Pull Request #89 · PrimeIntellect-ai/renderers

eligotts · 2026-06-18T07:04:47Z

Summary

adds generic mmraw:v2 raw multimodal refs in renderers.mm_store, parsed as RawMMRef objects with a family, fingerprint, modality, hash, asset ID, and adapter-owned payload
emits strict prime_raw_mm_item envelopes instead of processed image payloads for Qwen-VL and Kimi K2.5 image rendering
keeps adapter-specific layout details in renderer-owned payloads (image_grid_thw for Qwen, grid_thws/media token metadata for Kimi)
supports materializing all raw image refs for retry paths after vLLM multimodal cache misses
keeps run-scoped image asset refs file-backed so that the downstream Prime-RL trainer materializes images with its own processor
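
For readers unfamiliar with the shape of such a ref, here is a minimal sketch of how a ref carrying the fields listed above could be wrapped in, and strictly parsed from, an envelope. Only the field list, the mmraw:v2 tag, and the prime_raw_mm_item name come from this description; the class name RawMMRefSketch, the envelope keys ("type", "version", "ref"), and both helper functions are hypothetical and are not the API of renderers.mm_store.

```python
from dataclasses import asdict, dataclass, field
from typing import Any

ENVELOPE_TYPE = "prime_raw_mm_item"  # name taken from the PR summary
REF_VERSION = "mmraw:v2"             # tag taken from the PR summary
# Field names follow the summary; the exact spelling is an assumption.
REQUIRED = ("family", "fingerprint", "modality", "hash", "asset_id", "payload")


@dataclass(frozen=True)
class RawMMRefSketch:
    """Hypothetical stand-in for RawMMRef; not the real class."""
    family: str        # renderer family, e.g. a Qwen-VL or Kimi adapter
    fingerprint: str   # layout fingerprint for the adapter configuration
    modality: str      # "image" in this PR
    hash: str          # content hash of the raw asset
    asset_id: str      # identifier of the file-backed asset
    payload: dict[str, Any] = field(default_factory=dict)  # adapter-owned


def to_envelope(ref: RawMMRefSketch) -> dict[str, Any]:
    # Hypothetical envelope layout; the real wire format is not shown here.
    return {"type": ENVELOPE_TYPE, "version": REF_VERSION, "ref": asdict(ref)}


def from_envelope(envelope: dict[str, Any]) -> RawMMRefSketch:
    # "Strict" is read here as: reject anything that is not exactly the
    # expected type, version, and field set.
    if envelope.get("type") != ENVELOPE_TYPE or envelope.get("version") != REF_VERSION:
        raise ValueError("not a raw multimodal envelope")
    body = envelope.get("ref")
    if not isinstance(body, dict) or set(body) != set(REQUIRED):
        raise ValueError("envelope fields do not match the expected set")
    return RawMMRefSketch(**body)
```

Keeping the payload as an opaque dictionary is one way to let each adapter own its layout details (image_grid_thw for Qwen, grid_thws and media token metadata for Kimi) without the generic ref type knowing about them.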

Companion PRs

Prime-RL: Support v1 raw multimodal image offload prime-rl#2836
Verifiers: [codex] Support raw image offload in v1 train client verifiers#1746

Notes

Draft/WIP: stacked with the Verifiers and Prime-RL raw image offload PRs.
Verifiers is expected to offload image content to file://.../assets/images/... refs before rendering.
This intentionally treats raw image refs as the supported path, not processed multimodal feature sidecars.
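
As an illustration of the offload step described in these notes, the sketch below writes image bytes to a content-addressed file and returns a file:// ref. It is not the Verifiers implementation: the function name offload_image, the run_dir argument, the SHA-256 file naming, and the default suffix are all assumptions, and the directory prefix that is elided in the path above is left to the caller.

```python
import hashlib
from pathlib import Path


def offload_image(image_bytes: bytes, run_dir: Path, suffix: str = ".png") -> str:
    """Write image bytes under <run_dir>/assets/images/ and return a file:// URI.

    Hypothetical helper: naming files by SHA-256 digest is an assumption,
    chosen so that identical images map to the same ref.
    """
    digest = hashlib.sha256(image_bytes).hexdigest()
    target = run_dir / "assets" / "images" / f"{digest}{suffix}"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        # Skip the write when the same content was already offloaded.
        target.write_bytes(image_bytes)
    # as_uri() requires an absolute path, hence resolve().
    return target.resolve().as_uri()
```

With a layout like this, the renderer only ever sees the URI, and the trainer can open the file and run its own processor on the raw bytes.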

Validation

End-to-end hosted-style smoke test through Prime-RL using /home/ubuntu/renderers, /home/ubuntu/verifiers, and /home/ubuntu/prime-rl-v1-raw-mm-offload. The run completed inference, env rollouts, train batch creation, and trainer step 0, and decoded strict trainer-bound raw image refs.

Note

Support raw image refs for multimodal rendering in Qwen and Qwen3-VL renderers

Reworks Qwen35Renderer and Qwen3VLRenderer to emit image descriptors (hash, image_grid_thw, placeholder counts) instead of pixel tensors, removing HF processor and per-instance image cache dependencies.
Adds materialize_image_refs to both renderers and RendererPool, converting descriptor-only image items into run-scoped image references at request time.
Introduces mm_store.py, a new module providing utilities for run-scoped image reference construction, on-disk asset offloading, and layout fingerprinting.
Updates generate() in client.py with a materialize_all_image_refs flag; when set, it materializes image refs before request dispatch and builds image-ref selectors for Qwen instead of base64-encoded tensor payloads.
Replaces image_cache_max with explicit image layout parameters (patch_size, merge_size, min/max_pixels, etc.) in Qwen renderer configs.
Risk: generate() raises NotImplementedError if materialize_all_image_refs=True and the renderer lacks materialize_image_refs; pixel tensors are no longer embedded in multi_modal_data.
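
The risk noted above, and the descriptor fields mentioned in the summary, can be sketched as follows. This is an illustration under stated assumptions rather than the code in client.py: prepare_items, placeholder_count, and both renderer classes are hypothetical, and the placeholder formula (grid product divided by merge_size squared) follows the convention commonly used by Qwen2-VL-style processors, which this PR may or may not compute identically.

```python
from typing import Any, Sequence


def placeholder_count(image_grid_thw: Sequence[int], merge_size: int = 2) -> int:
    # Assumed convention: t * h * w patches, merged in merge_size x merge_size
    # spatial blocks, give one placeholder token per merged block.
    t, h, w = image_grid_thw
    return (t * h * w) // (merge_size ** 2)


class DescriptorOnlyRenderer:
    """Hypothetical renderer that lacks materialize_image_refs."""


class MaterializingRenderer:
    """Hypothetical renderer that can turn descriptors into refs."""

    def materialize_image_refs(self, items: list[dict[str, Any]]) -> list[dict[str, Any]]:
        # Stand-in behavior: tag each descriptor with a made-up ref string.
        return [{**item, "ref": f"run-scoped:{item['hash']}"} for item in items]


def prepare_items(
    renderer: Any,
    items: list[dict[str, Any]],
    materialize_all_image_refs: bool = False,
) -> list[dict[str, Any]]:
    """Mimic the described guard: materialize before dispatch, or fail loudly."""
    if not materialize_all_image_refs:
        return items
    materialize = getattr(renderer, "materialize_image_refs", None)
    if materialize is None:
        raise NotImplementedError(
            f"{type(renderer).__name__} does not support materialize_image_refs"
        )
    return materialize(items)
```

For example, a descriptor with image_grid_thw of (1, 4, 6) and merge_size 2 yields 6 placeholders under the assumed formula, and calling prepare_items with the flag set on DescriptorOnlyRenderer raises NotImplementedError instead of silently sending descriptor-only items.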

Macroscope summarized 32d5a9d. (Automatic summaries will resume when the PR exits draft mode or review begins.)

Support raw image refs for multimodal rendering

32d5a9d

This was referenced Jun 18, 2026

[codex] Support raw image offload in v1 train client PrimeIntellect-ai/verifiers#1746

Draft

Support v1 raw multimodal image offload PrimeIntellect-ai/prime-rl#2836

Draft

Emit generic raw multimodal refs

4bc1766

eligotts wants to merge 2 commits into main from codex/raw-image-assets-renderers
