[docs] Add GenRL asset preparation recipe #1456

Open
Abecid wants to merge 8 commits into hao-ai-lab:py/add_rl from Abecid:abecid/genrl-repro-assets
@Abecid Abecid commented Jun 12, 2026

Purpose

Adds a reproducible GenRL HPSv3 + VideoAlign training recipe for Wan 2.1 T2V 1.3B on the modular fastvideo/train stack.

This PR makes the GenRL run reproducible without relying on a local Modal launcher or git submodules. Runtime assets are prepared through a public helper script, reward dependencies are pinned, and the vendored HPSv3/VideoAlign runtime code is aligned with the reward implementations used by the previous successful GenRL run.

Fixes: N/A

Changes

  • Added examples/train/prepare_genrl_assets.py to prepare:
    • GenRL filtered prompt JSONL files
    • KwaiVGI/VideoReward checkpoint under .cache/VideoReward
    • optional reward-model preflight via --check-rewards
  • Added examples/train/requirements-genrl.txt with the GenRL reward-stack dependency pins.
  • Moved the GenRL Wan config to examples/train/configs/rl/wan/genrl_hpsv3_videoalign.yaml.
  • Updated examples/train/README.md with a non-Modal reproduction path.
  • Vendored only the HPSv3 and VideoAlign runtime files needed by the GenRL reward wrappers.
  • Aligned vendored reward runtimes with the exact successful-run upstream revisions:
    • HPSv3: a2eb2ef2c7b5d91a566347a5825cf6d872122149
    • VideoAlign: aba26b658fec7d9fd30c295187b548ea673c8769
  • Added reward-head load checks so HPSv3/VideoAlign fail fast instead of scoring with randomly initialized reward heads.
  • Added prompt dataset validation for missing files, Git LFS pointer files, malformed JSONL, and non-object JSON entries.
  • Kept modal_train_genrl.py local-only and ignored; it is not part of this PR.
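
The reward-head load check listed above can be illustrated with a small sketch. It is not the code in this PR: the function name `check_reward_head_loaded`, the `rm_head.` prefix, and the key names are hypothetical, and the check works on plain key collections so it runs without PyTorch. The idea is to compare the parameter names the model expects with the names found in the checkpoint, and to raise before any scoring happens if a reward-head parameter would be left at its random initialization.

```python
from typing import Iterable


def check_reward_head_loaded(
    expected_keys: Iterable[str],
    checkpoint_keys: Iterable[str],
    head_prefix: str = "rm_head.",  # hypothetical prefix, not taken from the PR
) -> None:
    """Raise if any reward-head parameter is absent from the checkpoint."""
    available = set(checkpoint_keys)
    missing = sorted(
        key
        for key in expected_keys
        if key.startswith(head_prefix) and key not in available
    )
    if missing:
        raise RuntimeError(
            "Reward head was not loaded from the checkpoint; "
            f"missing keys: {missing}. Refusing to score with random weights."
        )


# Illustrative key names only.
expected = ["backbone.layer0.weight", "rm_head.0.weight", "rm_head.0.bias"]

check_reward_head_loaded(expected, expected)  # complete checkpoint: no error

try:
    check_reward_head_loaded(expected, ["backbone.layer0.weight"])
except RuntimeError as exc:
    print(exc)
```

In a PyTorch setting the two collections would typically be the model's `state_dict()` keys and the keys of the loaded checkpoint; how the actual wrappers obtain them is not shown in this description.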

Reproduction / Training Command

Install the GenRL reward-stack pins after the editable FastVideo install:

pip install -r examples/train/requirements-genrl.txt

Prepare prompts and reward checkpoints:

python examples/train/prepare_genrl_assets.py \
  --prompt-dir .cache/genrl_filtered_prompts \
  --genrl-cache-dir .cache/GenRL \
  --videoalign-dir .cache/VideoReward \
  --check-rewards
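
For orientation, the flags in the command above could be wired up roughly as follows. This is a sketch of a plausible command-line surface, not the script's actual parser: the flag names come from the command shown here, while the defaults, types, and help strings are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(
        description="Prepare GenRL prompts and reward checkpoints (sketch)."
    )
    # Defaults below are assumed from the example invocation.
    parser.add_argument("--prompt-dir", type=Path,
                        default=Path(".cache/genrl_filtered_prompts"),
                        help="Where the filtered prompt JSONL files are placed.")
    parser.add_argument("--genrl-cache-dir", type=Path,
                        default=Path(".cache/GenRL"),
                        help="Local cache for the GenRL source of the prompts.")
    parser.add_argument("--videoalign-dir", type=Path,
                        default=Path(".cache/VideoReward"),
                        help="Where the VideoReward checkpoint is stored.")
    parser.add_argument("--check-rewards", action="store_true",
                        help="Run the optional reward-model preflight.")
    return parser


args = build_parser().parse_args(
    ["--prompt-dir", ".cache/genrl_filtered_prompts", "--check-rewards"]
)
print(args.prompt_dir, args.videoalign_dir, args.check_rewards)
```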

Launch the 4xGPU GenRL HPSv3 + VideoAlign training run:

WANDB_MODE=online \
WANDB_ENTITY=<your-wandb-entity> \
NUM_GPUS=4 \
VIDEOALIGN_CHECKPOINT_PATH=.cache/VideoReward \
FORCE_QWENVL_VIDEO_READER=opencv \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
bash examples/train/run.sh \
  examples/train/configs/rl/wan/genrl_hpsv3_videoalign.yaml \
  --training.checkpoint.output_dir outputs/genrl_longcat

For the 41-step reproduction probe:

WANDB_MODE=online \
WANDB_ENTITY=<your-wandb-entity> \
NUM_GPUS=4 \
VIDEOALIGN_CHECKPOINT_PATH=.cache/VideoReward \
FORCE_QWENVL_VIDEO_READER=opencv \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
bash examples/train/run.sh \
  examples/train/configs/rl/wan/genrl_hpsv3_videoalign.yaml \
  --training.loop.max_train_steps 41 \
  --training.checkpoint.training_state_checkpointing_steps 20 \
  --training.checkpoint.output_dir outputs/genrl_longcat_repro_rewardfix_41

Verification

Reward parity was checked against the previous successful GenRL run source state:

  • FastVideo commit: 17aecbe2dd07245333a1c0ea85f89b2b7a4a1f88
  • HPSv3 submodule: a2eb2ef2c7b5d91a566347a5825cf6d872122149
  • VideoAlign submodule: aba26b658fec7d9fd30c295187b548ea673c8769
  • VideoReward checkpoint: checkpoint-11352

Fixed-video reward parity after syncing the vendored runtime:

hpsv3_general delta:    ~1e-6
hpsv3_percentile delta: ~1e-6
videoalign_mq delta:    small residual runtime drift, about -0.03 to +0.04
videoalign_ta delta:    small residual runtime drift, about -0.04 to +0.02
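
A parity check of this kind boils down to scoring the same fixed videos with both runtimes and looking at the per-video differences. The sketch below shows one way to compute a delta range and compare it to a tolerance. The helper names, the tolerance values, and the scores are made up for illustration; they are not measurements from the runs described here.

```python
from typing import Sequence, Tuple


def delta_range(reference: Sequence[float],
                candidate: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) of candidate - reference over paired scores."""
    if len(reference) != len(candidate):
        raise ValueError("score lists must have the same length")
    if not reference:
        raise ValueError("score lists must not be empty")
    deltas = [c - r for r, c in zip(reference, candidate)]
    return min(deltas), max(deltas)


def within_tolerance(reference: Sequence[float],
                     candidate: Sequence[float],
                     tol: float) -> bool:
    """True if every per-video delta has absolute value <= tol."""
    low, high = delta_range(reference, candidate)
    return max(abs(low), abs(high)) <= tol


# Synthetic scores for the same three videos under two runtimes.
ref_scores = [0.50, -1.25, 2.00]
new_scores = [0.52, -1.28, 2.00]

low, high = delta_range(ref_scores, new_scores)
print(f"delta range: {low:+.3f} to {high:+.3f}")
print("within 0.05:", within_tolerance(ref_scores, new_scores, tol=0.05))
print("within 1e-6:", within_tolerance(ref_scores, new_scores, tol=1e-6))
```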

Training reproduction run after the reward-runtime fix showed reward curves recovering compared with the earlier bad run.

Test Plan

python -m py_compile \
  examples/train/prepare_genrl_assets.py \
  fastvideo/train/methods/rl/utils/data.py \
  fastvideo/train/methods/rl/reward/hpsv3.py \
  fastvideo/train/methods/rl/reward/videoalign.py

python examples/train/prepare_genrl_assets.py --help

python -c "import yaml; from pathlib import Path; p=Path('examples/train/configs/rl/wan/genrl_hpsv3_videoalign.yaml'); yaml.safe_load(p.read_text()); print(f'parsed {p}')"

Test Results

python -m py_compile ...  # passed
python examples/train/prepare_genrl_assets.py --help  # passed
YAML parse check  # passed

pre-commit was not available in the local shell or the fastvideo conda env, so I could not run the full hook set locally.

Checklist

  • I ran pre-commit run --all-files and fixed all issues
  • I added or updated tests / validation for my changes
  • I updated documentation if needed
  • I considered GPU memory impact of my changes

@mergify (bot) added the labels "type: docs" (Documentation only) and "scope: training" (Training pipeline, methods, configs) on Jun 12, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a helper script prepare_genrl_assets.py to download and validate GenRL prompts and VideoReward checkpoints, updates the corresponding training configuration with setup instructions, and adds robust file existence and Git LFS pointer checks to the dataset loader. The review feedback suggests enhancing the asset preparation script by handling Git LFS command failures, improving prompt validation to prevent crashes on malformed JSON, allowing checkpoint detection in the root directory, and resolving a repository ID inconsistency for the VideoReward model.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +61 to +78
    prompt_count = 0
    with path.open(encoding="utf-8") as f:
        for line_no, raw_line in enumerate(f, start=1):
            line = raw_line.strip()
            if not line:
                continue
            if (
                prompt_count == 0
                and line_no == 1
                and line.startswith("version https://git-lfs.github.com")
            ):
                raise RuntimeError(
                    f"{path} is a Git LFS pointer, not real prompt JSON. "
                    "Install git-lfs and rerun this script."
                )
            item = json.loads(line)
            if item.get("prompt"):
                prompt_count += 1

medium

The prompt validation can bypass the Git LFS check if the JSONL file starts with empty lines, because the check only runs when line_no == 1. Additionally, if a line contains malformed JSON or a non-dictionary JSON value, json.loads or item.get raises an unhandled exception (JSONDecodeError or AttributeError, respectively) and the script crashes with a cryptic traceback.

Using a saw_content flag (similar to the dataset loader) and wrapping the JSON parsing in a try-except block with explicit type checks makes the validation much more robust and user-friendly.

    prompt_count = 0
    saw_content = False
    with path.open(encoding="utf-8") as f:
        for line_no, raw_line in enumerate(f, start=1):
            line = raw_line.strip()
            if not line:
                continue
            if not saw_content and line.startswith("version https://git-lfs.github.com"):
                raise RuntimeError(
                    f"{path} is a Git LFS pointer, not real prompt JSON. "
                    "Install git-lfs and rerun this script."
                )
            saw_content = True
            try:
                item = json.loads(line)
            except json.JSONDecodeError as e:
                raise RuntimeError(
                    f"Malformed JSON on line {line_no} in {path}: {e}"
                ) from e
            if not isinstance(item, dict):
                raise RuntimeError(
                    f"Expected a JSON object (dict) on line {line_no} in {path}, got {type(item).__name__}."
                )
            if item.get("prompt"):
                prompt_count += 1
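
To try the suggested validation in isolation, the loop can be wrapped in a function and run against temporary files. The wrapper name `count_prompts`, the `LFS_HEADER` constant, and the sample file contents are made up for this sketch; the checks themselves follow the suggestion above.

```python
import json
import tempfile
from pathlib import Path

LFS_HEADER = "version https://git-lfs.github.com"


def count_prompts(path: Path) -> int:
    """Count non-empty prompts, failing loudly on LFS pointers and bad JSON."""
    prompt_count = 0
    saw_content = False
    with path.open(encoding="utf-8") as f:
        for line_no, raw_line in enumerate(f, start=1):
            line = raw_line.strip()
            if not line:
                continue
            # Only the first non-empty line can identify an LFS pointer file.
            if not saw_content and line.startswith(LFS_HEADER):
                raise RuntimeError(
                    f"{path} is a Git LFS pointer, not real prompt JSON."
                )
            saw_content = True
            try:
                item = json.loads(line)
            except json.JSONDecodeError as e:
                raise RuntimeError(
                    f"Malformed JSON on line {line_no} in {path}: {e}"
                ) from e
            if not isinstance(item, dict):
                raise RuntimeError(
                    f"Expected a JSON object on line {line_no} in {path}, "
                    f"got {type(item).__name__}."
                )
            if item.get("prompt"):
                prompt_count += 1
    return prompt_count


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.jsonl"
    good.write_text(
        '\n{"prompt": "a cat"}\n{"prompt": ""}\n{"prompt": "a dog"}\n',
        encoding="utf-8",
    )
    print(count_prompts(good))  # the entry with an empty prompt is not counted

    pointer = Path(tmp) / "pointer.jsonl"
    pointer.write_text(
        "\n\nversion https://git-lfs.github.com/spec/v1\n", encoding="utf-8"
    )
    try:
        count_prompts(pointer)
    except RuntimeError as exc:
        print(exc)
```

The second file starts with two blank lines, which is the case the original `line_no == 1` condition would have missed.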

Comment thread examples/train/prepare_genrl_assets.py Outdated
Comment on lines +33 to +44
    if (genrl_dir / ".git").exists():
        _run(["git", "lfs", "install"], cwd=genrl_dir)
        _run(
            [
                "git",
                "lfs",
                "pull",
                "-I",
                "datasets/filtered_prompts/*",
            ],
            cwd=genrl_dir,
        )

medium

If git-lfs is not installed or configured on the system, running git lfs install or git lfs pull will fail with a generic subprocess.CalledProcessError and a cryptic traceback.

Wrapping these calls in a try-except block allows us to catch the error and provide a clear, actionable message to the user.

Suggested change
-    if (genrl_dir / ".git").exists():
-        _run(["git", "lfs", "install"], cwd=genrl_dir)
-        _run(
-            [
-                "git",
-                "lfs",
-                "pull",
-                "-I",
-                "datasets/filtered_prompts/*",
-            ],
-            cwd=genrl_dir,
-        )
+    if (genrl_dir / ".git").exists():
+        try:
+            _run(["git", "lfs", "install"], cwd=genrl_dir)
+            _run(
+                [
+                    "git",
+                    "lfs",
+                    "pull",
+                    "-I",
+                    "datasets/filtered_prompts/*",
+                ],
+                cwd=genrl_dir,
+            )
+        except subprocess.CalledProcessError as exc:
+            raise RuntimeError(
+                "Failed to pull Git LFS assets. Ensure that `git-lfs` is installed "
+                "on your system and available in your PATH."
+            ) from exc
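
One case the suggested change does not cover: if the executable itself cannot be found (for example, git is not installed), `subprocess.run` raises `FileNotFoundError` rather than `CalledProcessError`. The sketch below handles both. It assumes the script's `_run` helper is a thin wrapper around `subprocess.run(..., check=True)`, which is not shown in this thread; `run_checked` and `LFS_HINT` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional, Sequence

LFS_HINT = "Ensure that git-lfs is installed and available in your PATH."


def run_checked(cmd: Sequence[str],
                cwd: Optional[Path] = None,
                hint: str = "") -> None:
    """Run a command and convert launch or exit failures into RuntimeError."""
    try:
        subprocess.run(list(cmd), cwd=cwd, check=True)
    except FileNotFoundError as exc:
        # Raised when the executable is missing (also when cwd does not exist).
        raise RuntimeError(
            f"Could not launch {cmd[0]!r}. {hint}".strip()
        ) from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"Command {' '.join(cmd)!r} exited with status "
            f"{exc.returncode}. {hint}".strip()
        ) from exc


# Two stand-in failures: a missing binary and a non-zero exit status.
for cmd in (
    ["no-such-binary-for-this-sketch"],
    [sys.executable, "-c", "raise SystemExit(3)"],
):
    try:
        run_checked(cmd, hint=LFS_HINT)
    except RuntimeError as exc:
        print(exc)
```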

Comment on lines +113 to +125
def has_video_reward_checkpoint(root: Path) -> bool:
    model_config = root / "model_config.json"
    if not model_config.exists():
        return False
    for checkpoint in root.glob("checkpoint-*"):
        if (checkpoint / "model.pth").exists():
            return True
        if (
            (checkpoint / "adapter_model.safetensors").exists()
            and (checkpoint / "non_lora_state_dict.pth").exists()
        ):
            return True
    return False

medium

The has_video_reward_checkpoint function only checks for checkpoints nested inside a checkpoint-* subdirectory. If a user manually downloads or extracts the checkpoint directly into the root directory (so that model.pth or adapter_model.safetensors is in the root), it will not be detected.

Adding a fallback check for the root directory directly makes the checkpoint detection much more robust.

def has_video_reward_checkpoint(root: Path) -> bool:
    model_config = root / "model_config.json"
    if not model_config.exists():
        return False
    if (root / "model.pth").exists():
        return True
    if (
        (root / "adapter_model.safetensors").exists()
        and (root / "non_lora_state_dict.pth").exists()
    ):
        return True
    for checkpoint in root.glob("checkpoint-*"):
        if (checkpoint / "model.pth").exists():
            return True
        if (
            (checkpoint / "adapter_model.safetensors").exists()
            and (checkpoint / "non_lora_state_dict.pth").exists()
        ):
            return True
    return False
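
The same detection can be written with the weight-file test factored into a helper, which removes the duplication between the root and nested cases and makes both layouts easy to exercise in a temporary directory. This is a sketch: the helper name `_has_weights` is made up, and the file names are the ones used in the function above.

```python
import tempfile
from pathlib import Path


def _has_weights(directory: Path) -> bool:
    """True if the directory holds full weights or a LoRA adapter pair."""
    if (directory / "model.pth").exists():
        return True
    return (
        (directory / "adapter_model.safetensors").exists()
        and (directory / "non_lora_state_dict.pth").exists()
    )


def has_video_reward_checkpoint(root: Path) -> bool:
    if not (root / "model_config.json").exists():
        return False
    # Check the root itself first, then any checkpoint-* subdirectory.
    candidates = [root, *sorted(root.glob("checkpoint-*"))]
    return any(_has_weights(candidate) for candidate in candidates)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(has_video_reward_checkpoint(root))  # no model_config.json yet

    (root / "model_config.json").write_text("{}", encoding="utf-8")
    print(has_video_reward_checkpoint(root))  # config present, no weights

    nested = root / "checkpoint-11352"
    nested.mkdir()
    (nested / "model.pth").touch()
    print(has_video_reward_checkpoint(root))  # nested layout is detected
```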

Comment thread examples/train/prepare_genrl_assets.py

Labels

scope: training Training pipeline, methods, configs type: docs Documentation only
