[feat] Add RL training support by Davids048 · Pull Request #1411 · hao-ai-lab/FastVideo

Davids048 · 2026-05-27T17:09:36Z

Summary

Integration/visibility PR from py/add_rl into main so maintainers can inspect the full diff, CI impact, and merge implications of the Add RL work before merging the smaller split GenRL PRs.

Notes

Open PR, not draft.
Intended for review/impact visibility first; not a merge recommendation by itself.
The split PRs currently target py/add_rl; this PR shows the aggregate branch impact against main.

Test plan

CI will run on this PR.

mergify · 2026-05-27T17:10:28Z

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

mergify · 2026-05-27T17:10:33Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for

#approved-reviews-by>=1
check-success=full-suite-passed

This rule is failing.

#approved-reviews-by>=1
check-success=full-suite-passed
check-success=fastcheck-passed
check-success~=pre-commit
title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model|skill|skills|infra)\]
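
As a rough local check of the title rule above, the pattern can be tried in Python. This is only a sketch: it assumes Mergify's `~=` operator behaves like a regular-expression search, and the helper name `title_matches` is hypothetical, not part of Mergify or FastVideo.

```python
import re

# Pattern copied verbatim from the merge-protection rule above.
# The leading (?i) makes the whole match case-insensitive.
TITLE_RULE = re.compile(
    r"(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore"
    r"|kernel|new.?model|skill|skills|infra)\]"
)


def title_matches(title: str) -> bool:
    """Return True when the title starts with one of the accepted [tag] prefixes.

    Hypothetical helper for local checks; assumes `~=` means regex search.
    """
    return TITLE_RULE.search(title) is not None


if __name__ == "__main__":
    for candidate in ("[feat] Add RL training support", "Add RL training support"):
        print(candidate, "->", title_matches(candidate))
```

Under that assumption, this PR's title passes the rule because it begins with the `[feat]` tag.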

gemini-code-assist

Code Review

This pull request introduces reinforcement learning (RL) training capabilities (Video GRPO / PPO) for diffusion models, specifically targeting the Wan 2.1 T2V 1.3B model. It adds the GenRLMethod training method, multiple reward functions (HPSv3, VideoAlign, and OCR), a custom text prompt dataset and sampler, and a callback to log sampled RL videos. The reviewer identified several critical and high-severity issues: a potential infinite hang in the dataloader due to process isolation with num_workers > 0, race conditions in reward scoring from concurrent thread execution on non-thread-safe PyTorch modules, a crash in statistics tracking when overwriting list history with a numpy array, a potential crash when prompts is None in advantage calculations, and a division-by-zero risk in the KL penalty calculation when std_dev_t is zero. Additionally, using tempfile.TemporaryDirectory was recommended for safer directory cleanup.
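
Two of the review points (the zero `std_dev_t` division and the `tempfile.TemporaryDirectory` recommendation) can be illustrated with a minimal standalone sketch. The PR's actual KL formula and file handling are not shown in this conversation, so the same-variance Gaussian form, the `eps` floor, and the function names `kl_penalty` and `count_written_files` are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def kl_penalty(mean: float, ref_mean: float, std_dev_t: float, eps: float = 1e-6) -> float:
    """KL between two Gaussians that share std_dev_t, with the std floored at eps.

    Assumed form: (mean - ref_mean)^2 / (2 * std^2). Not taken from the PR.
    """
    safe_std = max(std_dev_t, eps)  # avoids dividing by zero when std_dev_t == 0
    return (mean - ref_mean) ** 2 / (2.0 * safe_std ** 2)


def count_written_files(payloads: list[bytes]) -> int:
    """Write payloads into a scratch directory that is removed on exit, even on error."""
    with tempfile.TemporaryDirectory(prefix="rl_scratch_") as tmp:
        for i, data in enumerate(payloads):
            (Path(tmp) / f"sample_{i:04d}.bin").write_bytes(data)
        # Count the files while the directory still exists.
        return sum(1 for _ in Path(tmp).iterdir())


if __name__ == "__main__":
    print(kl_penalty(1.0, 0.0, 1.0))   # ordinary case
    print(kl_penalty(1.0, 0.0, 0.0))   # std of zero is floored instead of raising
    print(count_written_files([b"a", b"b"]))
```

Flooring the standard deviation keeps the penalty finite but very large when the floor is hit, so a real implementation might prefer to skip the KL term for such steps instead.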

Co-authored-by: Davids048 <jundasu@ucsd.edu>

Adds vendored runtime code under fastvideo/train/methods/rl/reward/HPSv3 and fastvideo/train/methods/rl/reward/VideoAlign, replacing the broken gitlinks with normal tracked files.

The provenance and porting rules live in the vendor package markers: fastvideo/train/methods/rl/reward/HPSv3/__init__.py, fastvideo/train/methods/rl/reward/HPSv3/hpsv3/__init__.py, and fastvideo/train/methods/rl/reward/VideoAlign/__init__.py. The FastVideo wrapper scripts hpsv3.py and videoalign.py depend on these vendored packages through explicit package imports.

The copied runtime files are ported faithfully from upstream, with only the import-path changes needed for package importability; this means some third-party code is unrelated to FastVideo internals but remains faithful to the source.

Known follow-ups: this vendor code does not currently pass pre-commit. VideoAlign also still assumes the checkpoint artifacts already exist under its checkpoints path or the configured VIDEOALIGN_CHECKPOINT_PATH; resolution of missing or downloaded checkpoints is not addressed in this commit.

Resolve the default VideoAlign checkpoint path by downloading the KlingTeam/VideoReward Hugging Face snapshot and passing the returned local snapshot directory into VideoVLMRewardInference. Explicit checkpoint_path values still work as local path overrides.

Make the VideoAlign FlashAttention fallback check for classic FlashAttention-2 metadata/API instead of only checking for a flash_attn namespace. FastVideo may have FlashAttention-4/CuTe installed, but Transformers' flash_attention_2 path requires classic FlashAttention-2. When classic FA2 is unavailable, warn and use SDPA for the VideoAlign reward model.
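
The fallback described in this commit message could look roughly like the sketch below. It is not the PR's code: the function name `pick_attn_implementation`, the use of the installed distribution's major version as the "classic FlashAttention-2" signal, and the `flash-attn` distribution name are assumptions. The returned strings are the `attn_implementation` values that Transformers accepts.

```python
import warnings
from importlib import metadata


def pick_attn_implementation(dist_name: str = "flash-attn") -> str:
    """Return "flash_attention_2" only when a 2.x flash-attn distribution is installed.

    Hypothetical helper: checks package metadata instead of merely importing
    a `flash_attn` namespace, which another FlashAttention build could also provide.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None

    # Treat only a major version of 2 as classic FlashAttention-2 (assumption).
    if version is not None and version.split(".")[0] == "2":
        return "flash_attention_2"

    warnings.warn(
        "Classic FlashAttention-2 not found; falling back to SDPA for the reward model.",
        stacklevel=2,
    )
    return "sdpa"


if __name__ == "__main__":
    print(pick_attn_implementation())
```

A stricter check might also probe for the specific functions that the Transformers FlashAttention-2 path imports, which is closer to the "metadata/API" wording of the commit message.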

mergify · 2026-06-05T09:01:44Z

Pre-commit checks failed

Hi @Davids048, the pre-commit checks have failed. To fix them locally:

# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files

Common fixes:

yapf: yapf -i <file> (formatting)
ruff: ruff check --fix <file> (linting)
codespell: codespell --write-changes <file> (spelling)

After fixing, commit and push the changes. The checks will re-run automatically.

For future commits, pre-commit will run automatically on changed files before each commit.

mergify · 2026-06-12T04:20:29Z

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

mergify Bot added the "type: feat" (New feature or capability) and "scope: training" (Training pipeline, methods, configs) labels May 27, 2026

mergify Bot added the "needs-rebase" (PR has merge conflicts) label May 27, 2026

gemini-code-assist Bot reviewed May 27, 2026

mergify Bot added the "scope: model" (Model architecture: DiTs, encoders, VAEs) label Jun 4, 2026

jzhang38 and others added 12 commits June 5, 2026 01:07

first edit

c52de7e

improve ema

9f2fc2f

gen RL runing

ef83574

time profiling and log sampled videos

8b0ba24

sampled video looks correct

6e66321

mv to utils

5ae633a

[feat] GenRL: stabilize reward model compatibility (#1400)

2c19cbd

Co-authored-by: Davids048 <jundasu@ucsd.edu>

[feat] GenRL: keep repeated prompt samples on one rank (#1401)

cd69575

[feat] GenRL: add runtime and memory stability helpers (#1402)

e1be068

Co-authored-by: Davids048 <jundasu@ucsd.edu>

[feat] GenRL: fix PPO loop cadence and diagnostics (#1403)

1fbe5a8

Co-authored-by: Davids048 <jundasu@ucsd.edu>

[feat] GenRL: add Wan LoRA adapter support (#1404)

73e0fa7

[feat] GenRL: add explicit HPSv3 VideoAlign recipes (#1405)

9c2fa5a

Davids048 force-pushed the py/add_rl branch from ecaf0a5 to 9c2fa5a June 5, 2026 01:15

mergify Bot removed the "needs-rebase" (PR has merge conflicts) label Jun 5, 2026

Davids048 added 4 commits June 5, 2026 01:45

apply pre-commit formatting

298845c

patch formatting based on ruff hints

d858270

Davids048 added 4 commits June 6, 2026 20:58

fix spelling and markdown pre-commit issues

606b42b

apply yapf formatting to reward runtimes

83bb602

handle yapf and ruff issues

2afa837

fix mypy issues in RL integration files

0b9c325

fix mypy issues for vendored reward runtimes

b2c7553

hao-ai-lab deleted a comment from mergify Bot Jun 8, 2026

alexzms self-requested a review June 8, 2026 20:14

document vendored reward runtimes in agent memory

a520ece

mergify Bot added the "needs-rebase" (PR has merge conflicts) label Jun 12, 2026

Davids048 wants to merge 22 commits into main from py/add_rl

3 participants
