[train][multimodal][1/3] Add vision support to generate() in new inference stack#1494

Merged
SumanthRH merged 2 commits into NovaSky-AI:main from nithinvc:nithinc/train-vlm-generate
Apr 13, 2026
Conversation

@nithinvc
Contributor

@nithinvc nithinvc commented Apr 10, 2026

Summary

PR 1/3 for #1493 - multi-turn VLM generator

Adds multimodal generation support to the inference client, enabling RemoteInferenceClient.generate() to forward multi-modal features (image hashes, placeholder ranges, kwargs) from vLLM's render endpoint through to the generation endpoint.

  • Thread mm_features through RemoteInferenceClient.generate() and _generate_single(), conditionally attaching them as "features" in the HTTP payload to the vLLM server
  • Add mm_processor_cache_gb=0 to the vLLM CLI args to disable the multimodal processor cache. Required because otherwise vLLM won't return multi-modal features for repeated renders of the same image (/render is not idempotent).
  • Add unit test verifying mm_features are forwarded in the HTTP payload via mock server
  • Add GPU integration test (test_generate_with_multimodal_features_red_square) that exercises the full render -> generate round-trip with a VLM
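The payload threading described above can be sketched roughly as follows. This is a hypothetical illustration: the function and field names (build_generate_payload, prompt_token_ids) mirror the PR description, not the actual signatures in the skyrl-train codebase, where the logic lives in _generate_single().

```python
from typing import Any, Optional

def build_generate_payload(
    prompt_token_ids: list[int],
    sampling_params: dict[str, Any],
    mm_features: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Build the HTTP payload for the vLLM generate endpoint.

    Multi-modal features (image hashes, placeholder ranges, kwargs) returned
    by vLLM's /render endpoint are conditionally attached under "features",
    so text-only prompts keep the same payload shape as before.
    """
    payload: dict[str, Any] = {
        "prompt_token_ids": prompt_token_ids,
        "sampling_params": sampling_params,
    }
    if mm_features:  # only attach when the prompt actually carries images
        payload["features"] = mm_features
    return payload
```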

Test plan

  • Existing test_remote_inference_client.py tests pass: uv run pytest tests/backends/skyrl_train/inference_servers/test_remote_inference_client.py -v
  • New TestMultiModalGeneration test passes: verifies mm_features reach the server payload
  • GPU integration test passes (requires local vLLM): SKYRL_LOCAL_VLLM=1 uv run --isolated --extra dev --extra fsdp pytest tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_vlm_inference_generation.py -m vllm -v
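The mock-server check can be sketched like this: stub out the HTTP POST, run a generate call with mm_features, and assert that the features land in the payload. All names here are illustrative stand-ins for the real test in test_remote_inference_client.py.

```python
# Capture whatever the client would POST to the server.
captured: dict = {}

def fake_post(url: str, json: dict) -> dict:
    """Stand-in for the HTTP client; records the outgoing request."""
    captured["url"], captured["payload"] = url, json
    return {"text": "a red square"}  # canned generation response

def generate(prompt_ids: list[int], mm_features: list[dict], post=fake_post) -> dict:
    """Toy version of the client call that conditionally forwards features."""
    payload: dict = {"prompt_token_ids": prompt_ids}
    if mm_features:
        payload["features"] = mm_features
    return post("/generate", json=payload)

mm_features = [{"image_hash": "deadbeef", "placeholder_range": [5, 581]}]
generate([1, 2, 3], mm_features)
assert captured["payload"]["features"] == mm_features
```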



Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


@nithinvc nithinvc changed the title [train][multimodal][1/2] Add vision support to generate() in new inference stack [train][multimodal][1/3] Add vision support to generate() in new inference stack Apr 11, 2026
@SumanthRH
Member

SKYRL_LOCAL_VLLM=1 uv run --isolated --extra dev --extra fsdp pytest tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_vlm_inference_generation.py -m vllm -v

What is SKYRL_LOCAL_VLLM, @nithinvc? Tests should pass with vllm 0.19.0.

@nithinvc
Contributor Author

A temporary flag until vllm-project/vllm#38405 gets merged. It's been approved, but the auto-merger hasn't merged it yet (there's a test timeout unrelated to my changes). The tests will likely only pass with vllm 0.20.0 once that lands.

@SumanthRH
Member

Ok sounds good. let's get this in anyways

@SumanthRH SumanthRH merged commit 66d401a into NovaSky-AI:main Apr 13, 2026
5 of 7 checks passed