Conversation
|
This PR has merge conflicts with the base branch. Please rebase: git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🔴 PR merge requirementsWaiting for
This rule is failing.
|
There was a problem hiding this comment.
Code Review
This pull request introduces reinforcement learning (RL) training capabilities (Video GRPO / PPO) for diffusion models, specifically targeting the Wan 2.1 T2V 1.3B model. It adds the GenRLMethod training method, multiple reward functions (HPSv3, VideoAlign, and OCR), a custom text prompt dataset and sampler, and a callback to log sampled RL videos. The reviewer identified several critical and high-severity issues: a potential infinite hang in the dataloader due to process isolation with num_workers > 0, race conditions in reward scoring from concurrent thread execution on non-thread-safe PyTorch modules, a crash in statistics tracking when overwriting list history with a numpy array, a potential crash when prompts is None in advantage calculations, and a division-by-zero risk in the KL penalty calculation when std_dev_t is zero. Additionally, using tempfile.TemporaryDirectory was recommended for safer directory cleanup.
Co-authored-by: Davids048 <jundasu@ucsd.edu>
Co-authored-by: Davids048 <jundasu@ucsd.edu>
Co-authored-by: Davids048 <jundasu@ucsd.edu>
Adds vendored runtime code under fastvideo/train/methods/rl/reward/HPSv3 and fastvideo/train/methods/rl/reward/VideoAlign, replacing the broken gitlinks with normal tracked files. The provenance and porting rules live in the vendor package markers: fastvideo/train/methods/rl/reward/HPSv3/__init__.py, fastvideo/train/methods/rl/reward/HPSv3/hpsv3/__init__.py, and fastvideo/train/methods/rl/reward/VideoAlign/__init__.py. The FastVideo wrapper scripts hpsv3.py and videoalign.py depend on these vendored packages through explicit package imports. The copied runtime files are ported faithfully from upstream, with only import-path changes needed for package importability; this means some third-party code is unrelated to FastVideo internals but remains faithful to the source. Known follow-ups: this vendor code does not currently pass pre-commit. VideoAlign also still assumes the checkpoint artifacts already exist under its checkpoints path or the configured VIDEOALIGN_CHECKPOINT_PATH; the missing/downloaded checkpoint resolution is not addressed in this commit.
Resolve the default VideoAlign checkpoint path by downloading the KlingTeam/VideoReward Hugging Face snapshot and passing the returned local snapshot directory into VideoVLMRewardInference. Explicit checkpoint_path values still work as local path overrides. Make the VideoAlign FlashAttention fallback check for classic FlashAttention-2 metadata/API instead of only checking for a flash_attn namespace. FastVideo may have FlashAttention-4/CuTe installed, but Transformers' flash_attention_2 path requires classic FlashAttention-2. When classic FA2 is unavailable, warn and use SDPA for the VideoAlign reward model.
Pre-commit checks failedHi @Davids048, the pre-commit checks have failed. To fix them locally: # Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install
# Run all checks and auto-fix what's possible
pre-commit run --all-filesCommon fixes:
After fixing, commit and push the changes. The checks will re-run automatically. For future commits, |
|
This PR has merge conflicts with the base branch. Please rebase: git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease |
Summary
Integration/visibility PR from
py/add_rlintomainso maintainers can inspect the full diff, CI impact, and merge implications of the Add RL work before merging the smaller split GenRL PRs.Notes
py/add_rl; this PR shows the aggregate branch impact againstmain.Test plan