
[train][multimodal][3/3] Trainer changes to extract multi-modal outputs from GeneratorOutput#1498

Merged
SumanthRH merged 3 commits into NovaSky-AI:main from nithinvc:nithinc/train-vlm-trainer
Apr 14, 2026

Conversation

Contributor

@nithinvc commented Apr 11, 2026

Summary

Final PR addressing #1493. It covers the trainer changes required for compatibility with the new generator.

  • Add VLM (vision-language model) training support by propagating pixel_values and image_grid_thw through the trainer pipeline
  • Extract vision inputs from generator output, wrap them in TensorList for variable-length handling, and pass them through to the training batch and forward pass
  • Update padding logic to handle TensorList types by cloning and concatenating entries instead of using tensor padding
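
The extraction step in the bullets above can be sketched roughly as follows. This is a minimal sketch, not SkyRL's implementation: `extract_vision_inputs` is a hypothetical helper, the `TensorList` here is a bare stand-in for the real class, and plain Python lists stand in for torch.Tensors.

```python
class TensorList:
    """Minimal stand-in for SkyRL's TensorList: holds one variable-length
    entry per sample instead of forcing a single padded tensor."""
    def __init__(self, tensors):
        self.tensors = list(tensors)

    def __len__(self):
        return len(self.tensors)


def extract_vision_inputs(generator_output: dict) -> dict:
    """Pull the vision fields out of the generator output, if present,
    and wrap them for the training batch. The field names (pixel_values,
    image_grid_thw) are the ones this PR propagates."""
    extras = {}
    for key in ("pixel_values", "image_grid_thw"):
        values = generator_output.get(key)
        if values is not None:
            extras[key] = TensorList(values)
    return extras


# Stand-in entries: in the real pipeline these are torch.Tensors whose
# first dimension (number of image patches) varies per sample.
out = {"pixel_values": [[0.1] * 4, [0.2] * 9],
       "image_grid_thw": [[1, 2, 2], [1, 3, 3]]}
vision = extract_vision_inputs(out)
```

The point of the wrapper is that per-sample vision tensors have different shapes, so they cannot be stacked into one batch tensor and must travel as a list through the training batch.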


@nithinvc nithinvc marked this pull request as ready for review April 11, 2026 01:50
Contributor

@devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.


Contributor

@devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.


Comment thread: skyrl/train/trainer.py
Member

@SumanthRH left a comment


LGTM

@SumanthRH
Member

@nithinvc I will get this PR in first, given the pending changes in #1486.

@SumanthRH merged commit 02be377 into NovaSky-AI:main on Apr 14, 2026
5 of 7 checks passed
Comment thread: skyrl/train/trainer.py
data_save_dir.mkdir(parents=True, exist_ok=True)
data.save(data_save_dir / f"{file_name}.pkl")

def pad_batch(self, training_input: TrainingInputBatch) -> TrainingInputBatch:
Member

@CharlieFRuan commented Apr 15, 2026


@nithinvc Out of curiosity, did you run into this pad_batch() codepath? My impression is that it is only needed for step-wise training, where the batch_size in TrainingInputBatch is not deterministic. Is that also the case for VLM training, or are you mostly supporting step-wise training for VLMs?

Contributor Author


No, not directly; this is for future step-wise support, which seemed straightforward to add in this PR. Agreed, non-step-wise training skips this codepath.

Member


Got it, thanks for clarifying!
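
To illustrate the step-wise case discussed in this thread, here is a minimal sketch of the pad-size computation. The helper name `pad_size_for_dp` is hypothetical; in the trainer this logic lives inside `pad_batch()`.

```python
def pad_size_for_dp(batch_size: int, dp_size: int) -> int:
    """Number of synthetic rows to append so the batch splits evenly
    across data-parallel ranks. Non-step-wise training produces fixed,
    divisible batch sizes and skips this; step-wise batches can be any
    size, so the remainder must be padded away."""
    remainder = batch_size % dp_size
    return 0 if remainder == 0 else dp_size - remainder
```

For example, a step-wise batch of 10 samples split across 4 data-parallel ranks needs 2 pad rows, while a batch of 8 needs none.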

CharlieFRuan added a commit that referenced this pull request Apr 16, 2026
… support

- Moves `pad_batch` out of `RayPPOTrainer` into a module-level function in
  `training_batch.py` so that dispatch-level callers can share it.
- Adds a `mode` kwarg: `train_batch` (callers own the full batch and want
  uids/trajectory_ids metadata extended with synthetic pad entries) vs
  `mini_batch` (callers pad a transient slice and must not touch parent
  metadata that would not correspond to the slice anyway).
- Asserts the batch lives on CPU. Both real callers already stage on CPU,
  and padding allocates and concatenates, which is not something we want
  to do on the GPU hot path.
- Allows `pad_size > batch_size` by cycling row 0 (regression: mini-batch
  padding can see `mb_size=1, dp_size=4` → `pad_size=3`, and the old
  `tensor[:pad_size]` silently returned a shorter slice).
- Handles `TensorList` fields (`pixel_values`, `image_grid_thw`) via
  cyclic cloning, matching the VLM path introduced in #1498.
- Adds `is_last_step` to the `TrainingInput` TypedDict (it's already
  used everywhere; this makes the schema match reality).
- Field-exhaustive unit tests mirror `test_generator_output_concatenation`:
  they enumerate `TrainingInput.__annotations__` and fail loudly if a new
  field is added without updating `pad_batch()`. Also covers the
  `pad_size > batch_size` edge case, the CPU-only assertion, both modes,
  and input immutability.
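
The cycling behavior described in the commit message can be sketched as follows. Assumptions: `pad_cyclic` is a hypothetical helper, and plain Python lists stand in for torch.Tensors and `TensorList` entries, with `list(...)` standing in for `.clone()`.

```python
def pad_cyclic(entries: list, pad_size: int) -> list:
    """Append pad_size synthetic entries by cycling from index 0.
    Unlike the old slice-based pad (`tensor[:pad_size]`), which silently
    under-pads when pad_size > batch_size, cyclic indexing always yields
    exactly pad_size new entries. Each pad entry is a copy, so mutating
    it cannot corrupt the original batch."""
    padded = list(entries)
    for i in range(pad_size):
        padded.append(list(entries[i % len(entries)]))
    return padded


# Regression case from the commit message: mb_size=1, dp_size=4
mini_batch = [[1, 2, 3, 4]]
padded = pad_cyclic(mini_batch, 3)  # old slice-based pad produced only 1 row
```

The same cyclic scheme covers both ordinary row tensors and `TensorList` fields: for the latter, whole variable-length entries are cloned rather than padded along a shared dimension, since no such dimension exists.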
