[feat]: make VSA tile cache configurable for training by SolitaryThinker · Pull Request #1444 · hao-ai-lab/FastVideo

SolitaryThinker · 2026-06-09T21:42:42Z

What

Make the VSA per-step tile-buffer cache (cache_tile_buf) configurable for training, instead of forcing it off.

Why

#1434 fixed the #1423 activation-checkpointing OOM by hard-coding cache_tile_buf=False at both VSA training build sites. The cached padded buffer was pinned on the per-step attn_metadata, so under full activation checkpointing it survived into the backward recompute and inflated peak memory.

But the cache is purely a per-step speed optimization (reuse the padded QKVG tile scratch across VSA layers). On memory-rich setups — FSDP/SP-sharded large clusters, or runs that don't use full activation checkpointing — the OOM doesn't bite, and keeping the cache is a free speedup. Hard-coding False takes that off the table.

This exposes it as a config knob, defaulting to False to preserve #1434's OOM-safe training behavior. Opt in with True to keep the buffer-reuse speedup.

Changes

New framework: TrainingConfig.vsa_cache_tile_buf (YAML training.vsa.cache_tile_buf), parsed in config.py, wired into WanModel._build_attention_metadata (wan.py). Inherited by cosmos/hunyuan/matrixgame2/wan_causal.
Legacy: TrainingArgs.VSA_cache_tile_buf (--VSA-cache-tile-buf), wired into TrainingPipeline._build_attention_metadata (training_pipeline.py). Inherited by distillation + self-forcing.
Inference unchanged — the LTX2 build still omits the kwarg → default True.
Test: added a value-equivalence test asserting the cached and non-cached tile() paths produce bit-identical tilings, on a padded shape (raw_latent_shape=(5,4,4) → padded T=8) that actually exercises the zero-pad-position scatter. This pins the property that the flag is a pure performance knob, never a correctness change — and addresses the two MINOR test-depth suggestions from the [bugfix]: release VSA tile cache during training #1434 review.
Docs: documented cache_tile_buf in examples/train/configs/example.yaml.

Stacked on #1434

This is stacked on #1434 (which introduces the cache_tile_buf plumbing — main has no such field yet). The net-new change here is one commit; the diff against main shows #1434's commit too until it merges. Once #1434 lands, I'll rebase and this collapses to just the configurability commit. Please review/merge #1434 first.

Test plan

ruff / yapf / codespell clean on all changed files.
New equivalence test logic verified standalone (cached == uncached; padding present at (5,4,4)).
Existing cache_tile_buf plumbing tests from [bugfix]: release VSA tile cache during training #1434 still pass (structure unchanged).

mergify · 2026-06-09T21:43:22Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for

check-success=fastcheck-passed
check-success=full-suite-passed

This rule is failing.

check-success=fastcheck-passed
check-success=full-suite-passed
#approved-reviews-by>=1
check-success~=pre-commit
title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model|skill|skills|infra)\]

gemini-code-assist

Code Review

This pull request introduces a configurable option, cache_tile_buf, to control the reuse of the per-step padded Video Sparse Attention (VSA) tile buffer across attention layers. It defaults to False during training to prevent out-of-memory (OOM) issues under activation checkpointing, while remaining True by default for inference. The option is integrated into configuration files, CLI arguments, and metadata builders, and is supported by new unit tests. The review feedback suggests a minor improvement to simplify a redundant boolean expression in the configuration parsing logic.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-09T21:44:12Z

            run_name=str(tr.get("run_name", "") or ""),
        ),
        vsa_sparsity=float(vs.get("sparsity", 0.0) or 0.0),
+        vsa_cache_tile_buf=bool(vs.get("cache_tile_buf", False) or False),


The expression bool(vs.get("cache_tile_buf", False) or False) contains a redundant or False check. Since bool(None) and bool(False) both evaluate to False, simply calling bool() on the retrieved value is sufficient and more readable.

Suggested change

vsa_cache_tile_buf=bool(vs.get("cache_tile_buf", False) or False),

vsa_cache_tile_buf=bool(vs.get("cache_tile_buf", False)),

References

PEP 8 encourages writing simple and readable expressions. Redundant boolean operations like or False inside a bool() cast should be avoided to keep the code clean. ^(link)

mergify · 2026-06-09T22:03:32Z

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

PR hao-ai-lab#1434 hard-coded cache_tile_buf=False at both VSA training build sites to fix the hao-ai-lab#1423 activation-checkpointing OOM. That leaves memory-rich clusters (FSDP/SP-sharded, or runs without full activation checkpointing) unable to keep the per-step tile-buffer-reuse speedup. Expose it as a config knob, defaulting to False to preserve hao-ai-lab#1434's OOM-safe training behavior; opt in with True to keep the cache: - new framework: TrainingConfig.vsa_cache_tile_buf (training.vsa.cache_tile_buf), wired into WanModel._build_attention_metadata. - legacy: TrainingArgs.VSA_cache_tile_buf (--VSA-cache-tile-buf), wired into TrainingPipeline._build_attention_metadata (inherited by distillation / self-forcing). - inference is untouched (default True). Add a value-equivalence test (cached vs uncached tilings are bit-identical, on a padded shape that exercises the zero-pad scatter) and document the option in example.yaml.

alexzms

LGTM, clean and minimal. Verified the new equivalence test passes locally (cached vs uncached tile() bit-identical), so the flag stays a pure perf knob, and default false preserves the #1434 OOM-safe behavior.

mergify Bot added type: feat New feature or capability scope: training Training pipeline, methods, configs scope: attention Attention backends (VSA, STA, Flash, etc.) scope: infra CI, tests, Docker, build labels Jun 9, 2026

gemini-code-assist Bot reviewed Jun 9, 2026

View reviewed changes

mergify Bot added the needs-rebase PR has merge conflicts label Jun 9, 2026

SolitaryThinker requested a review from alexzms June 9, 2026 22:18

SolitaryThinker force-pushed the feat/vsa-cache-tile-buf-configurable branch from 327c5f7 to c9aeca7 Compare June 9, 2026 22:57

mergify Bot removed the needs-rebase PR has merge conflicts label Jun 9, 2026

alexzms approved these changes Jun 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feat]: make VSA tile cache configurable for training#1444

[feat]: make VSA tile cache configurable for training#1444
SolitaryThinker wants to merge 1 commit into
hao-ai-lab:mainfrom
SolitaryThinker:feat/vsa-cache-tile-buf-configurable

SolitaryThinker commented Jun 9, 2026

Uh oh!

mergify Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Uh oh!

mergify Bot commented Jun 9, 2026

Uh oh!

alexzms left a comment •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	vsa_cache_tile_buf=bool(vs.get("cache_tile_buf", False) or False),
	vsa_cache_tile_buf=bool(vs.get("cache_tile_buf", False)),

Conversation

SolitaryThinker commented Jun 9, 2026

What

Why

Changes

Stacked on #1434

Test plan

Uh oh!

mergify Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge Protections

🔴 PR merge requirements

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

mergify Bot commented Jun 9, 2026

Uh oh!

alexzms left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mergify Bot commented Jun 9, 2026 •

edited

Loading

alexzms left a comment •

edited

Loading