[skyrl][tinker] Use VLLMRenderer in SkyRL train backend #1496

nithinvc wants to merge 2 commits into NovaSky-AI:main
Conversation
Code Review
This pull request integrates VLLMRenderer and TensorList into the SkyRLTrainBackend to support multi-modal data processing. Key changes include updating _to_training_batch to utilize the new renderer, handling vision-related tensors (pixel_values, image_grid_thw), and adjusting the sequence of engine sleep calls. Feedback highlights a potential ValueError when mixing multi-modal and text-only inputs due to inconsistent batch sizes, concerns regarding the renderer's dependency on active inference engines after reordering sleep calls, and performance overhead from using asyncio.run in the training hot path.
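The batch-size concern above can be illustrated with a small standalone sketch (illustrative only, not code from this PR): if vision tensors are gathered only from multi-modal samples, a mixed batch yields fewer `pixel_values` entries than `input_ids` rows, which is the inconsistency that would surface as a `ValueError` during collation.

```python
# Illustrative sketch of the mixed-batch inconsistency flagged in the review.
# Field names mirror the PR; the batch structure here is hypothetical.
batch = [
    {"input_ids": [1, 2, 3], "pixel_values": [[0.1, 0.2]]},  # multi-modal sample
    {"input_ids": [4, 5]},                                   # text-only sample
]

input_ids = [sample["input_ids"] for sample in batch]
pixel_values = [sample["pixel_values"] for sample in batch if "pixel_values" in sample]

# One vision entry for two samples: downstream collation that expects either
# zero entries or one per sample would raise a ValueError on this batch.
mismatch = 0 < len(pixel_values) < len(input_ids)
print(mismatch)  # True
```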
```python
if self._renderer is None:
    self._ensure_inference_engines()
    self._renderer = VLLMRenderer(self._inference_engine_client, self._cfg.trainer.policy.model.path)
rendered_inputs = asyncio.run(self._renderer(prepared_batch.all_model_inputs))
```
Using asyncio.run inside _to_training_batch introduces significant overhead because it creates and tears down a new event loop for every call. Since _to_training_batch is called on every training step, this can impact performance.
Consider using a persistent event loop or an alternative approach to bridge the synchronous backend methods with the asynchronous renderer client, especially since this is a hot path during training.
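One way to avoid per-call event-loop churn is to keep a single loop alive on a background thread for the backend's lifetime and submit coroutines to it with `asyncio.run_coroutine_threadsafe`. A minimal sketch, where `AsyncBridge` and `render_async` are hypothetical stand-ins for the backend wiring and the renderer call:

```python
import asyncio
import threading

class AsyncBridge:
    """Runs one persistent event loop on a background thread and lets
    synchronous code block on coroutines submitted to it."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        # Submit to the persistent loop; block until the result is ready.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def close(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()

async def render_async(inputs):
    # Hypothetical stand-in for the async renderer client.
    return [f"rendered:{x}" for x in inputs]

bridge = AsyncBridge()
out = bridge.run(render_async(["a", "b"]))  # ["rendered:a", "rendered:b"]
bridge.close()
```

Compared with `asyncio.run`, this pays the loop-creation cost once instead of on every training step, at the price of owning the bridge's lifecycle (the `close` call) in the backend.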
This is good, and in general I think the entire backend should move to a single persistent event loop. However, the scope of this PR is limited to correctness for now, unless requested.
Summary
- Integrates the `VLLMRenderer` (landed in #1464) into the SkyRL train backend so that VLM training batches include image placeholder tokens and decoded vision tensors (`pixel_values`, `image_grid_thw`).
- With the new inference path enabled (`_SKYRL_USE_NEW_INFERENCE`), `_to_training_batch` lazily creates a `VLLMRenderer` and renders all `ModelInput`s through it.
- Extracts `pixel_values` and `image_grid_thw` from rendered outputs and adds them to the `TrainingInputBatch` as `TensorList` entries (one tensor per batch element, since patch counts vary per image).
- Updates `_pad_batch` to handle `TensorList` fields by cycling and cloning entries, matching the existing padding strategy for regular tensors.
- Reorders `forward_backward` and `forward` to call `_to_training_batch` before `_sleep_inference_engines`, since the renderer needs the inference servers to be initialized. Note that this does not wake the KV cache or model GPU memory, since that is explicitly done in `save_weights_for_sampler` via the dispatcher.

E2E Tinker VLM Classifier Curves
With #1484, we can now run Tinker vision-language recipes against SkyRL. Merging both closes #1200.
Example:

```shell
_SKYRL_USE_NEW_INFERENCE=1 uv run --extra tinker --extra fsdp -m skyrl.tinker.api \
  --base-model "Qwen/Qwen3-VL-8B-Instruct" \
  --backend fsdp \
  --backend-config '{"trainer.placement.policy_num_gpus_per_node": 8, "generator.inference_engine.num_engines": 8, "trainer.placement.colocate_all": true, "trainer.use_sample_packing": false}'
```

Cookbook:
```shell
TINKER_API_KEY=tml-dummy uv run --with tinker --with datasets --with torch python -m \
  tinker_cookbook.recipes.vlm_classifier.train \
  base_url=http://localhost:8000 \
  model_name="Qwen/Qwen3-VL-4B-Instruct" \
  dataset=caltech101
```

Train nll:
*(train nll curve image)*

Val nll:

*(val nll curve image)*

Val accuracy:

*(val accuracy curve image)*