[train][multimodal][3/3] Trainer changes to extract multi-modal outputs from GeneratorOutput #1498
Merged
SumanthRH merged 3 commits into NovaSky-AI:main (Apr 14, 2026)
Conversation
```python
data_save_dir.mkdir(parents=True, exist_ok=True)
data.save(data_save_dir / f"{file_name}.pkl")


def pad_batch(self, training_input: TrainingInputBatch) -> TrainingInputBatch:
```
Member
@nithinvc Out of curiosity, did you run into this pad_batch() codepath? I have the impression that this is only needed for step-wise training, in which the batch_size in TrainingInputBatch is not deterministic. Is that also the case for VLM training, or are you mostly supporting step-wise training for VLM training?
Contributor
Author
No, not directly. This is for future step-wise support, which seemed straightforward to add in this PR. Agreed, the non-step-wise training skips this codepath.
Member
got it, thanks for clarifying!
CharlieFRuan added a commit that referenced this pull request on Apr 16, 2026:
… support

- Moves `pad_batch` out of `RayPPOTrainer` into a module-level function in `training_batch.py` so that dispatch-level callers can share it.
- Adds a `mode` kwarg: `train_batch` (callers own the full batch and want uids/trajectory_ids metadata extended with synthetic pad entries) vs `mini_batch` (callers pad a transient slice and must not touch parent metadata that would not correspond to the slice anyway).
- Asserts the batch lives on CPU. Both real callers already stage on CPU, and padding allocates/concatenates; it is not something we want to do on the GPU hot path.
- Allows `pad_size > batch_size` by cycling row 0 (regression: mini-batch padding can see `mb_size=1, dp_size=4` → `pad_size=3`, and the old `tensor[:pad_size]` silently returned a shorter slice).
- Handles `TensorList` fields (`pixel_values`, `image_grid_thw`) via cyclic cloning, matching the VLM path introduced in #1498.
- Adds `is_last_step` to the `TrainingInput` TypedDict (it is already used everywhere; this makes the schema match reality).
- Adds field-exhaustive unit tests mirroring `test_generator_output_concatenation`: they enumerate `TrainingInput.__annotations__` and fail loudly if a new field is added without updating `pad_batch()`. Also covers the `pad_size > batch_size` edge case, the CPU-only assertion, both modes, and input immutability.
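The cyclic padding behavior described in the commit message can be sketched as follows. This is a simplified, hypothetical illustration: plain Python lists stand in for batch tensors, `pad_rows` is not the repo's actual API, and the exact row-selection policy is an assumption based on the bullet about cycling.

```python
def pad_rows(rows: list, pad_size: int) -> list:
    """Pad a batch (list of rows) with pad_size extra entries.

    Hypothetical sketch of the cyclic padding described above:
    pad_size may exceed len(rows) (e.g. mb_size=1, dp_size=4 ->
    pad_size=3), so we index existing rows cyclically instead of
    slicing rows[:pad_size], which would silently return fewer
    padding entries than requested.
    """
    if not rows:
        raise ValueError("cannot pad an empty batch")
    # Cycle through existing rows to build the padding entries;
    # a new list is returned, leaving the input untouched.
    padding = [rows[i % len(rows)] for i in range(pad_size)]
    return rows + padding
```

The cyclic indexing is what makes `pad_size > batch_size` safe: every requested padding slot gets a real (repeated) row rather than being dropped.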
Summary
Final PR addressing #1493. It covers the trainer changes required for compatibility with the new generator:
- Pass `pixel_values` and `image_grid_thw` through the trainer pipeline
- Wrap them in `TensorList` for variable-length handling, and pass them through to the training batch and forward pass
- Pad `TensorList` types by cloning and concatenating entries instead of using tensor padding
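The last bullet, padding `TensorList` fields by cloning rather than tensor padding, can be sketched as follows. This is a minimal, hypothetical illustration: nested Python lists stand in for variable-shape tensors (e.g. per-image `pixel_values`), and `pad_tensor_list` is an assumed helper name, not the repo's API.

```python
def pad_tensor_list(entries: list, pad_size: int) -> list:
    """Pad a TensorList-like field with pad_size extra entries.

    Entries have different shapes (e.g. per-image pixel_values),
    so they cannot be stacked and zero-padded like a regular
    tensor. Instead, existing entries are cloned cyclically and
    concatenated onto the end, mirroring the clone-and-concatenate
    approach described in the summary above.
    """
    if not entries:
        raise ValueError("cannot pad an empty TensorList")
    # Clone (copy) entries cyclically so pad_size may exceed
    # len(entries); the input list is left unmodified.
    clones = [list(entries[i % len(entries)]) for i in range(pad_size)]
    return entries + clones
```

Cloning sidesteps the shape mismatch entirely: each padding entry is a valid, full-shape element, so downstream concatenation over the `TensorList` still works.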