Skip to content

[feat] Add RL training support#1411

Open
Davids048 wants to merge 22 commits into
mainfrom
py/add_rl
Open

[feat] Add RL training support#1411
Davids048 wants to merge 22 commits into
mainfrom
py/add_rl

Conversation

@Davids048

@Davids048 Davids048 commented May 27, 2026

Copy link
Copy Markdown
Collaborator

Summary

Integration/visibility PR from py/add_rl into main so maintainers can inspect the full diff, CI impact, and merge implications of the Add RL work before merging the smaller split GenRL PRs.

Notes

  • Open PR, not draft.
  • Intended for review/impact visibility first; not a merge recommendation by itself.
  • The split PRs currently target py/add_rl; this PR shows the aggregate branch impact against main.

Test plan

  • CI will run on this PR.

@mergify mergify Bot added type: feat New feature or capability scope: training Training pipeline, methods, configs labels May 27, 2026
@mergify

mergify Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase PR has merge conflicts label May 27, 2026
@mergify

mergify Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for

  • #approved-reviews-by>=1
  • check-success=full-suite-passed
This rule is failing.
  • #approved-reviews-by>=1
  • check-success=full-suite-passed
  • check-success=fastcheck-passed
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model|skill|skills|infra)\]

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces reinforcement learning (RL) training capabilities (Video GRPO / PPO) for diffusion models, specifically targeting the Wan 2.1 T2V 1.3B model. It adds the GenRLMethod training method, multiple reward functions (HPSv3, VideoAlign, and OCR), a custom text prompt dataset and sampler, and a callback to log sampled RL videos. The reviewer identified several critical and high-severity issues: a potential infinite hang in the dataloader due to process isolation with num_workers > 0, race conditions in reward scoring from concurrent thread execution on non-thread-safe PyTorch modules, a crash in statistics tracking when overwriting list history with a numpy array, a potential crash when prompts is None in advantage calculations, and a division-by-zero risk in the KL penalty calculation when std_dev_t is zero. Additionally, using tempfile.TemporaryDirectory was recommended for safer directory cleanup.

Comment thread fastvideo/train/methods/rl/utils/data.py
Comment thread fastvideo/train/methods/rl/genrl.py Outdated
Comment thread fastvideo/train/methods/rl/utils/stat_tracking.py Outdated
Comment thread fastvideo/train/callbacks/log_rl_samples.py
Comment thread fastvideo/train/methods/rl/utils/advantages.py
Comment thread fastvideo/train/methods/rl/utils/pipeline.py Outdated
@mergify mergify Bot added the scope: model Model architecture (DiTs, encoders, VAEs) label Jun 4, 2026
@mergify mergify Bot removed the needs-rebase PR has merge conflicts label Jun 5, 2026
Davids048 added 4 commits June 5, 2026 01:45
Adds vendored runtime code under fastvideo/train/methods/rl/reward/HPSv3 and fastvideo/train/methods/rl/reward/VideoAlign, replacing the broken gitlinks with normal tracked files.

The provenance and porting rules live in the vendor package markers: fastvideo/train/methods/rl/reward/HPSv3/__init__.py, fastvideo/train/methods/rl/reward/HPSv3/hpsv3/__init__.py, and fastvideo/train/methods/rl/reward/VideoAlign/__init__.py.

The FastVideo wrapper scripts hpsv3.py and videoalign.py depend on these vendored packages through explicit package imports. The copied runtime files are ported faithfully from upstream, with only import-path changes needed for package importability; this means some third-party code is unrelated to FastVideo internals but remains faithful to the source.

Known follow-ups: this vendor code does not currently pass pre-commit. VideoAlign also still assumes the checkpoint artifacts already exist under its checkpoints path or the configured VIDEOALIGN_CHECKPOINT_PATH; the missing/downloaded checkpoint resolution is not addressed in this commit.
Resolve the default VideoAlign checkpoint path by downloading the KlingTeam/VideoReward Hugging Face snapshot and passing the returned local snapshot directory into VideoVLMRewardInference. Explicit checkpoint_path values still work as local path overrides.

Make the VideoAlign FlashAttention fallback check for classic FlashAttention-2 metadata/API instead of only checking for a flash_attn namespace. FastVideo may have FlashAttention-4/CuTe installed, but Transformers' flash_attention_2 path requires classic FlashAttention-2. When classic FA2 is unavailable, warn and use SDPA for the VideoAlign reward model.
@mergify

mergify Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Pre-commit checks failed

Hi @Davids048, the pre-commit checks have failed. To fix them locally:

# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files

Common fixes:

  • yapf: yapf -i <file> (formatting)
  • ruff: ruff check --fix <file> (linting)
  • codespell: codespell --write-changes <file> (spelling)

After fixing, commit and push the changes. The checks will re-run automatically.

For future commits, pre-commit will run automatically on changed files before each commit.

@hao-ai-lab hao-ai-lab deleted a comment from mergify Bot Jun 8, 2026
@alexzms alexzms self-requested a review June 8, 2026 20:14
@mergify

mergify Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase PR has merge conflicts label Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-rebase PR has merge conflicts scope: model Model architecture (DiTs, encoders, VAEs) scope: training Training pipeline, methods, configs type: feat New feature or capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants