[perf] Default Wan VAE decode to bf16 (lossless, faster) #1472

Open
Mister-Raggs wants to merge 1 commit into hao-ai-lab:main from Mister-Raggs:perf/wan-vae-bf16-decode
Mister-Raggs commented Jun 19, 2026

Contributor

Summary

Wan's pipeline configs left vae_precision="fp32" (the inherited PipelineConfig default), even though the DiT already runs in bf16 and the same AutoencoderKLWan runs in bf16 in Cosmos-Predict2.5 and in fp16 in Cosmos. fp32 VAE decode is the majority cost on bandwidth-limited devices and gives no measurable quality benefit here; it was never a Wan-specific requirement.

This sets WanT2V480PConfig.vae_precision to bf16. Every Wan variant derives from it (T2V/I2V, 720P, Wan2.2, LucyEdit, SelfForcing) with no override, so the single change covers them all. The two legacy wan_*.json configs are updated for consistency.
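
The inheritance argument can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not the actual FastVideo definitions: only the names PipelineConfig, WanT2V480PConfig, and the vae_precision field come from this PR, while the two variant class names and the helper resolved_precisions are assumed for illustration.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    # Base default that the Wan configs previously inherited unchanged.
    vae_precision: str = "fp32"


@dataclass
class WanT2V480PConfig(PipelineConfig):
    # The single override described in this PR.
    vae_precision: str = "bf16"


@dataclass
class WanI2V480PConfig(WanT2V480PConfig):
    # Hypothetical variant: no vae_precision override, so bf16 is inherited.
    pass


@dataclass
class WanT2V720PConfig(WanT2V480PConfig):
    # Hypothetical variant: likewise inherits bf16 from the Wan base config.
    pass


def resolved_precisions() -> dict[str, str]:
    """Map each config class name to the vae_precision it resolves to."""
    classes = (PipelineConfig, WanT2V480PConfig, WanI2V480PConfig, WanT2V720PConfig)
    return {cls.__name__: cls().vae_precision for cls in classes}


if __name__ == "__main__":
    for name, precision in resolved_precisions().items():
        print(f"{name}: {precision}")
```

A variant that explicitly sets its own vae_precision would not pick up the new default, which is why the "no override" observation matters for the claim that one change covers every variant.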

Evidence

Measured by decoding one real Wan latent fp32 vs bf16 in the same process (no denoise non-determinism, no codec noise):

  • MS-SSIM(bf16, fp32) = 0.9999 on the identical latent, i.e. effectively lossless.
  • ~1.2–1.3× faster VAE decode, which works out to roughly 5–10% end-to-end on decode-bound few-step models; on full-step models the change is free quality-wise.
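
The relationship between the decode speedup and the end-to-end figure follows Amdahl's law. The sketch below is only an illustration: the decode fractions of 30% and 40% are assumed values, not measurements from this PR, and the function name end_to_end_gain is hypothetical.

```python
def end_to_end_gain(decode_fraction: float, decode_speedup: float) -> tuple[float, float]:
    """Return (fraction of wall time saved, overall speedup) via Amdahl's law.

    decode_fraction: share of end-to-end time spent in VAE decode (0..1).
    decode_speedup:  factor by which the decode step alone gets faster.
    """
    if not 0.0 <= decode_fraction <= 1.0:
        raise ValueError("decode_fraction must be in [0, 1]")
    if decode_speedup <= 0.0:
        raise ValueError("decode_speedup must be positive")
    # Normalized new runtime: the untouched part plus the accelerated decode.
    new_time = (1.0 - decode_fraction) + decode_fraction / decode_speedup
    return 1.0 - new_time, 1.0 / new_time


if __name__ == "__main__":
    # Assumed decode shares for a decode-bound few-step model (hypothetical).
    for fraction in (0.3, 0.4):
        for speedup in (1.2, 1.3):
            saved, overall = end_to_end_gain(fraction, speedup)
            print(f"decode={fraction:.0%} speedup={speedup}x "
                  f"-> saved={saved:.1%}, overall={overall:.3f}x")
```

Under these assumed decode shares the saved wall time lands between about 5% and 9%, which is consistent with the range quoted above. On full-step models the decode share is much smaller, so the end-to-end gain shrinks accordingly.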

The random-latent control gave the same 0.9999, and the same VAE already ships in bf16 under Cosmos-Predict2.5, so the change is well-precedented.

Test plan

  • SSIM regression suite: the bf16 output scores MS-SSIM 0.9999 against fp32, well above the regression threshold, so existing references should pass.

Wan's pipeline configs left vae_precision at the inherited PipelineConfig
default of fp32, while the DiT already runs bf16 and the *same* AutoencoderKLWan
runs bf16 in Cosmos-Predict2.5 and fp16 in Cosmos. Measured on a real Wan latent
(decoded fp32 vs bf16 in one process):
  - MS-SSIM(bf16, fp32) = 0.9999 on the identical latent (no quality cost)
  - ~1.2-1.3x faster VAE decode (~5-10% end-to-end on decode-bound few-step
    models; free quality-wise on full-step)

fp32 here was the inherited default, not a Wan-specific requirement; this aligns
Wan with its sibling configs. Also updates the two legacy wan_*.json configs.
Copilot AI review requested due to automatic review settings June 19, 2026 22:59
mergify Bot added the labels "type: perf" (Performance improvement) and "scope: inference" (Inference pipeline, serving, CLI) on Jun 19, 2026

mergify Bot commented Jun 19, 2026

Contributor

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for

  • #approved-reviews-by>=1
  • check-success=full-suite-passed
  • check-success~=pre-commit
This rule is failing.
  • #approved-reviews-by>=1
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • check-success=fastcheck-passed
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model|skill|skills|infra)\]
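
The title condition can be checked locally before pushing. This is a minimal sketch that assumes Mergify's `~=` operator behaves like an unanchored regular-expression search (as Python's re.search does); the helper name title_ok is hypothetical.

```python
import re

# Pattern copied verbatim from the merge protection rule above.
TITLE_RULE = re.compile(
    r"(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore"
    r"|kernel|new.?model|skill|skills|infra)\]"
)


def title_ok(title: str) -> bool:
    """Return True if the PR title starts with an accepted [tag] prefix."""
    return TITLE_RULE.search(title) is not None


if __name__ == "__main__":
    for candidate in (
        "[perf] Default Wan VAE decode to bf16 (lossless, faster)",
        "Default Wan VAE decode to bf16",
        "[New Model] Example title",
    ):
        print(f"{candidate!r}: {title_ok(candidate)}")
```

This PR's title carries the [perf] prefix, so it satisfies the title condition under that assumption.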

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the VAE precision configuration for Wan models from 'fp32' to 'bf16' in 'fastvideo/configs/pipelines/wan.py', 'fastvideo/configs/wan_1.3B_t2v_pipeline.json', and 'fastvideo/configs/wan_14B_i2v_480p_pipeline.json' to improve performance while remaining effectively lossless. There are no review comments, so I have no feedback to provide.

Copilot AI left a comment

Pull request overview

This PR updates Wan pipeline configuration defaults so the Wan VAE runs in bf16 instead of inheriting the base PipelineConfig default of fp32, reducing VAE decode cost while maintaining effectively identical output quality.

Changes:

  • Set WanT2V480PConfig.vae_precision default to "bf16" (covers all Wan variants inheriting from it).
  • Update legacy Wan JSON pipeline configs to use "vae_precision": "bf16" for consistency.
  • Add an explanatory comment in wan.py documenting the rationale for bf16.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| fastvideo/configs/pipelines/wan.py | Changes Wan base pipeline config default vae_precision to bf16 and documents the rationale. |
| fastvideo/configs/wan_14B_i2v_480p_pipeline.json | Updates legacy I2V 480p JSON config to set vae_precision to bf16. |
| fastvideo/configs/wan_1.3B_t2v_pipeline.json | Updates legacy T2V JSON config to set vae_precision to bf16. |

Comment on lines +50 to +53
# bf16 VAE decode is effectively lossless (MS-SSIM 0.9999 vs fp32 on an
# identical latent) and faster; the same AutoencoderKLWan already runs bf16
# in Cosmos-Predict2.5 and fp16 in Cosmos. fp32 here was just the inherited
# PipelineConfig default, not a Wan-specific requirement.