Skip to content

[feat]: make VSA tile cache configurable for training#1444

Open
SolitaryThinker wants to merge 1 commit into
hao-ai-lab:mainfrom
SolitaryThinker:feat/vsa-cache-tile-buf-configurable
Open

[feat]: make VSA tile cache configurable for training#1444
SolitaryThinker wants to merge 1 commit into
hao-ai-lab:mainfrom
SolitaryThinker:feat/vsa-cache-tile-buf-configurable

Conversation

@SolitaryThinker

Copy link
Copy Markdown
Collaborator

What

Make the VSA per-step tile-buffer cache (cache_tile_buf) configurable for training, instead of forcing it off.

Why

#1434 fixed the #1423 activation-checkpointing OOM by hard-coding cache_tile_buf=False at both VSA training build sites. The cached padded buffer was pinned on the per-step attn_metadata, so under full activation checkpointing it survived into the backward recompute and inflated peak memory.

But the cache is purely a per-step speed optimization (reuse the padded QKVG tile scratch across VSA layers). On memory-rich setups — FSDP/SP-sharded large clusters, or runs that don't use full activation checkpointing — the OOM doesn't bite, and keeping the cache is a free speedup. Hard-coding False takes that off the table.

This exposes it as a config knob, defaulting to False to preserve #1434's OOM-safe training behavior. Opt in with True to keep the buffer-reuse speedup.

Changes

  • New framework: TrainingConfig.vsa_cache_tile_buf (YAML training.vsa.cache_tile_buf), parsed in config.py, wired into WanModel._build_attention_metadata (wan.py). Inherited by cosmos/hunyuan/matrixgame2/wan_causal.
  • Legacy: TrainingArgs.VSA_cache_tile_buf (--VSA-cache-tile-buf), wired into TrainingPipeline._build_attention_metadata (training_pipeline.py). Inherited by distillation + self-forcing.
  • Inference unchanged — the LTX2 build still omits the kwarg → default True.
  • Test: added a value-equivalence test asserting the cached and non-cached tile() paths produce bit-identical tilings, on a padded shape (raw_latent_shape=(5,4,4) → padded T=8) that actually exercises the zero-pad-position scatter. This pins the property that the flag is a pure performance knob, never a correctness change — and addresses the two MINOR test-depth suggestions from the [bugfix]: release VSA tile cache during training #1434 review.
  • Docs: documented cache_tile_buf in examples/train/configs/example.yaml.

Stacked on #1434

This is stacked on #1434 (which introduces the cache_tile_buf plumbing — main has no such field yet). The net-new change here is one commit; the diff against main shows #1434's commit too until it merges. Once #1434 lands, I'll rebase and this collapses to just the configurability commit. Please review/merge #1434 first.

Test plan

  • ruff / yapf / codespell clean on all changed files.
  • New equivalence test logic verified standalone (cached == uncached; padding present at (5,4,4)).
  • Existing cache_tile_buf plumbing tests from [bugfix]: release VSA tile cache during training #1434 still pass (structure unchanged).

@mergify mergify Bot added type: feat New feature or capability scope: training Training pipeline, methods, configs scope: attention Attention backends (VSA, STA, Flash, etc.) scope: infra CI, tests, Docker, build labels Jun 9, 2026
@mergify

mergify Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for

  • check-success=fastcheck-passed
  • check-success=full-suite-passed
This rule is failing.
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
  • #approved-reviews-by>=1
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model|skill|skills|infra)\]

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable option, cache_tile_buf, to control the reuse of the per-step padded Video Sparse Attention (VSA) tile buffer across attention layers. It defaults to False during training to prevent out-of-memory (OOM) issues under activation checkpointing, while remaining True by default for inference. The option is integrated into configuration files, CLI arguments, and metadata builders, and is supported by new unit tests. The review feedback suggests a minor improvement to simplify a redundant boolean expression in the configuration parsing logic.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

run_name=str(tr.get("run_name", "") or ""),
),
vsa_sparsity=float(vs.get("sparsity", 0.0) or 0.0),
vsa_cache_tile_buf=bool(vs.get("cache_tile_buf", False) or False),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The expression bool(vs.get("cache_tile_buf", False) or False) contains a redundant or False check. Since bool(None) and bool(False) both evaluate to False, simply calling bool() on the retrieved value is sufficient and more readable.

Suggested change
vsa_cache_tile_buf=bool(vs.get("cache_tile_buf", False) or False),
vsa_cache_tile_buf=bool(vs.get("cache_tile_buf", False)),
References
  1. PEP 8 encourages writing simple and readable expressions. Redundant boolean operations like or False inside a bool() cast should be avoided to keep the code clean. (link)

@mergify

mergify Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase PR has merge conflicts label Jun 9, 2026
@SolitaryThinker SolitaryThinker requested a review from alexzms June 9, 2026 22:18
PR hao-ai-lab#1434 hard-coded cache_tile_buf=False at both VSA training build sites
to fix the hao-ai-lab#1423 activation-checkpointing OOM. That leaves memory-rich
clusters (FSDP/SP-sharded, or runs without full activation checkpointing)
unable to keep the per-step tile-buffer-reuse speedup.

Expose it as a config knob, defaulting to False to preserve hao-ai-lab#1434's
OOM-safe training behavior; opt in with True to keep the cache:
- new framework: TrainingConfig.vsa_cache_tile_buf (training.vsa.cache_tile_buf),
  wired into WanModel._build_attention_metadata.
- legacy: TrainingArgs.VSA_cache_tile_buf (--VSA-cache-tile-buf), wired into
  TrainingPipeline._build_attention_metadata (inherited by distillation /
  self-forcing).
- inference is untouched (default True).

Add a value-equivalence test (cached vs uncached tilings are bit-identical,
on a padded shape that exercises the zero-pad scatter) and document the
option in example.yaml.
@SolitaryThinker SolitaryThinker force-pushed the feat/vsa-cache-tile-buf-configurable branch from 327c5f7 to c9aeca7 Compare June 9, 2026 22:57
@mergify mergify Bot removed the needs-rebase PR has merge conflicts label Jun 9, 2026

@alexzms alexzms left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, clean and minimal. Verified the new equivalence test passes locally (cached vs uncached tile() bit-identical), so the flag stays a pure perf knob, and default false preserves the #1434 OOM-safe behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

scope: attention Attention backends (VSA, STA, Flash, etc.) scope: infra CI, tests, Docker, build scope: training Training pipeline, methods, configs type: feat New feature or capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants