
[train][multimodal][3/3] Trainer changes to extract multi-modal outputs from GeneratorOutput#1498

Merged
SumanthRH merged 3 commits into NovaSky-AI:main from nithinvc:nithinc/train-vlm-trainer
Apr 14, 2026

Conversation

Contributor

@nithinvc commented Apr 11, 2026

Summary

Final PR addressing #1493. It covers the trainer changes required for compatibility with the new generator.

  • Add VLM (vision-language model) training support by propagating pixel_values and image_grid_thw through the trainer pipeline
  • Extract vision inputs from generator output, wrap them in TensorList for variable-length handling, and pass them through to the training batch and forward pass
  • Update padding logic to handle TensorList types by cloning and concatenating entries instead of using tensor padding
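
The extraction step in the bullets above can be sketched roughly as follows. This is a minimal sketch, not SkyRL's implementation: `extract_vision_inputs` is a hypothetical helper, the `TensorList` here is a bare stand-in for the real class, and plain Python lists stand in for torch.Tensors.

```python
class TensorList:
    """Minimal stand-in for SkyRL's TensorList: holds one variable-length
    entry per sample instead of forcing a single padded tensor."""
    def __init__(self, tensors):
        self.tensors = list(tensors)

    def __len__(self):
        return len(self.tensors)


def extract_vision_inputs(generator_output: dict) -> dict:
    """Pull the vision fields out of the generator output, if present,
    and wrap them for the training batch. The field names (pixel_values,
    image_grid_thw) are the ones this PR propagates."""
    extras = {}
    for key in ("pixel_values", "image_grid_thw"):
        values = generator_output.get(key)
        if values is not None:
            extras[key] = TensorList(values)
    return extras


# Stand-in entries: in the real pipeline these are torch.Tensors whose
# first dimension (number of image patches) varies per sample.
out = {"pixel_values": [[0.1] * 4, [0.2] * 9],
       "image_grid_thw": [[1, 2, 2], [1, 3, 3]]}
vision = extract_vision_inputs(out)
```

The point of the wrapper is that per-sample vision tensors have different shapes, so they cannot be stacked into one batch tensor and must travel as a list through the training batch.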


@nithinvc nithinvc marked this pull request as ready for review April 11, 2026 01:50
Contributor

@devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.


Contributor

@devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.


Comment thread: skyrl/train/trainer.py
Member

@SumanthRH left a comment


LGTM

@SumanthRH
Member

@nithinvc I will get this PR in first, given the pending changes in #1486.

@SumanthRH merged commit 02be377 into NovaSky-AI:main on Apr 14, 2026
5 of 7 checks passed
Comment thread: skyrl/train/trainer.py
data_save_dir.mkdir(parents=True, exist_ok=True)
data.save(data_save_dir / f"{file_name}.pkl")

def pad_batch(self, training_input: TrainingInputBatch) -> TrainingInputBatch:
Member

@CharlieFRuan commented Apr 15, 2026


@nithinvc Out of curiosity, did you run into this pad_batch() codepath? My impression is that it is only needed for step-wise training, where the batch_size in TrainingInputBatch is not deterministic. Is that also the case for VLM training, or are you mostly supporting step-wise training for VLMs?

Contributor Author


No, not directly; this is for future step-wise support, which seemed straightforward to add in this PR. Agreed, non-step-wise training skips this codepath.

Member


Got it, thanks for clarifying!
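
To illustrate the step-wise case discussed in this thread, here is a minimal sketch of the pad-size computation. The helper name `pad_size_for_dp` is hypothetical; in the trainer this logic lives inside `pad_batch()`.

```python
def pad_size_for_dp(batch_size: int, dp_size: int) -> int:
    """Number of synthetic rows to append so the batch splits evenly
    across data-parallel ranks. Non-step-wise training produces fixed,
    divisible batch sizes and skips this; step-wise batches can be any
    size, so the remainder must be padded away."""
    remainder = batch_size % dp_size
    return 0 if remainder == 0 else dp_size - remainder
```

For example, a step-wise batch of 10 samples split across 4 data-parallel ranks needs 2 pad rows, while a batch of 8 needs none.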

CharlieFRuan added a commit that referenced this pull request Apr 16, 2026
… support

- Moves `pad_batch` out of `RayPPOTrainer` into a module-level function in
  `training_batch.py` so that dispatch-level callers can share it.
- Adds a `mode` kwarg: `train_batch` (callers own the full batch and want
  uids/trajectory_ids metadata extended with synthetic pad entries) vs
  `mini_batch` (callers pad a transient slice and must not touch parent
  metadata that would not correspond to the slice anyway).
- Asserts the batch lives on CPU. Both real callers already stage on CPU,
  and padding allocates and concatenates, which is not something we want
  to do on the GPU hot path.
- Allows `pad_size > batch_size` by cycling row 0 (regression: mini-batch
  padding can see `mb_size=1, dp_size=4` → `pad_size=3`, and the old
  `tensor[:pad_size]` silently returned a shorter slice).
- Handles `TensorList` fields (`pixel_values`, `image_grid_thw`) via
  cyclic cloning, matching the VLM path introduced in #1498.
- Adds `is_last_step` to the `TrainingInput` TypedDict (it's already
  used everywhere; this makes the schema match reality).
- Field-exhaustive unit tests mirror `test_generator_output_concatenation`:
  they enumerate `TrainingInput.__annotations__` and fail loudly if a new
  field is added without updating `pad_batch()`. Also covers the
  `pad_size > batch_size` edge case, the CPU-only assertion, both modes,
  and input immutability.
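
The cycling behavior described in the commit message can be sketched as follows. Assumptions: `pad_cyclic` is a hypothetical helper, and plain Python lists stand in for torch.Tensors and `TensorList` entries, with `list(...)` standing in for `.clone()`.

```python
def pad_cyclic(entries: list, pad_size: int) -> list:
    """Append pad_size synthetic entries by cycling from index 0.
    Unlike the old slice-based pad (`tensor[:pad_size]`), which silently
    under-pads when pad_size > batch_size, cyclic indexing always yields
    exactly pad_size new entries. Each pad entry is a copy, so mutating
    it cannot corrupt the original batch."""
    padded = list(entries)
    for i in range(pad_size):
        padded.append(list(entries[i % len(entries)]))
    return padded


# Regression case from the commit message: mb_size=1, dp_size=4
mini_batch = [[1, 2, 3, 4]]
padded = pad_cyclic(mini_batch, 3)  # old slice-based pad produced only 1 row
```

The same cyclic scheme covers both ordinary row tensors and `TensorList` fields: for the latter, whole variable-length entries are cloned rather than padded along a shared dimension, since no such dimension exists.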
