[feat] Add Kandinsky-5 pipeline support by aryan5v · Pull Request #1471 · hao-ai-lab/FastVideo

aryan5v · 2026-06-19T00:46:28Z

Summary

Adds first-class Kandinsky-5 Lite T2V support through the normal FastVideo model-support path:

Kandinsky-5 pipeline config and default preset
basic/kandinsky5 composed pipeline wiring
Kandinsky-specific latent prep, denoising, and latent decode stages
registry/default-preset wiring for kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers
loader fixes needed by Kandinsky's CLIP text encoder layout and text-encoder CPU offload
transformer parity fixes in the existing Kandinsky5 DiT implementation

Validation

pre-commit run --files <12 implementation files>: passed locally
uv run pytest tests/local_tests/kandinsky5/test_kandinsky5_lite_transformer_parity.py -q -s -rs: passed on B200 GPU, 1 passed, 14 warnings
CUDA_VISIBLE_DEVICES=0 uv run python tests/local_tests/kandinsky5/run_kandinsky5_lite_pipeline_smoke.py: passed, one-step latent smoke generated successfully
High-quality generation validation on B200 GPU 4: W&B run fuczhqid, 768x512, 121 frames, 80 inference steps, guidance scale 5.0, output outputs/kandinsky5_validation/kandinsky5_red_motorcycle_best_quality_512x768_121f_80s.mp4

Fixed 2 file(s) based on 2 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

mergify · 2026-06-19T00:47:06Z

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

mergify · 2026-06-19T00:47:08Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for

#approved-reviews-by>=1
check-success=full-suite-passed
check-success~=pre-commit

This rule is failing.

#approved-reviews-by>=1
check-success=full-suite-passed
check-success~=pre-commit
check-success=fastcheck-passed
title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model|skill|skills|infra)\]
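
The title condition above is a regular expression, so it can be checked locally before pushing. The sketch below assumes the pattern is applied with Python-style search semantics; `TITLE_RULE` and `title_ok` are hypothetical names used only for illustration and are not part of Mergify or FastVideo.

```python
import re

# Pattern copied from the merge-protection rule above; (?i) makes it case-insensitive.
TITLE_RULE = re.compile(
    r"(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|"
    r"kernel|new.?model|skill|skills|infra)\]"
)


def title_ok(title: str) -> bool:
    """Return True when the title starts with one of the accepted [tag] prefixes."""
    return TITLE_RULE.search(title) is not None


print(title_ok("[feat] Add Kandinsky-5 pipeline support"))
print(title_ok("Add Kandinsky-5 pipeline support"))
```

Under that assumption, this PR's title satisfies the title condition, so the failing rule would be explained by the other conditions in the list.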

gemini-code-assist

Code Review

This pull request adds support for the Kandinsky-5.0 Lite text-to-video pipeline, introducing the necessary configurations, presets, pipeline stages (latent preparation, denoising, and decoding), and updates to the text encoding and model loading components. Feedback focuses on improving robustness and compatibility: preserving the original tensor dtype in _apply_rotary instead of hardcoding bfloat16, breaking early from the denoising loop on interruption, validating the shape of custom latents, verifying the lengths of text encoder precisions and max lengths, supporting asymmetric patch sizes for divisibility checks, and ensuring prompt_embeds contains at least two elements during input verification.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-19T00:48:25Z

 def _apply_rotary(x: torch.Tensor, rope: torch.Tensor) -> torch.Tensor:
-    orig_dtype = x.dtype
    x_ = x.reshape(*x.shape[:-1], -1, 1, 2).to(torch.float32)
    x_out = (rope * x_).sum(dim=-1)
-    return x_out.reshape(*x.shape).to(orig_dtype)
+    return x_out.reshape(*x.shape).to(torch.bfloat16)


Hardcoding torch.bfloat16 in _apply_rotary breaks compatibility when running the model in other precisions (such as float16 or float32). Additionally, since _apply_rotary is immediately followed by .type_as(query) in Kandinsky5Attention.forward, this hardcoded cast causes redundant casting and precision loss. Preserving the input tensor's original dtype is more generic and correct.

Suggested change

 def _apply_rotary(x: torch.Tensor, rope: torch.Tensor) -> torch.Tensor:
+    orig_dtype = x.dtype
     x_ = x.reshape(*x.shape[:-1], -1, 1, 2).to(torch.float32)
     x_out = (rope * x_).sum(dim=-1)
-    return x_out.reshape(*x.shape).to(torch.bfloat16)
+    return x_out.reshape(*x.shape).to(orig_dtype)

gemini-code-assist · 2026-06-19T00:48:25Z

+                if hasattr(self, "interrupt") and self.interrupt:
+                    continue


Using continue when self.interrupt is set will still iterate through all remaining timesteps in the loop, wasting CPU cycles. Replacing it with break will immediately terminate the denoising loop, which is the expected behavior for an interruption.

Suggested change

                 if hasattr(self, "interrupt") and self.interrupt:
-                    continue
+                    break
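
The difference between the two keywords can be seen in a small, framework-free sketch. The `run_loop` helper and its parameters are hypothetical and only stand in for the denoising loop; nothing here is FastVideo code. It counts how many iterations are entered and how many steps do real work.

```python
from typing import Tuple


def run_loop(num_steps: int, interrupt_at: int, use_break: bool) -> Tuple[int, int]:
    """Return (iterations_entered, steps_denoised) for a toy denoising loop."""
    interrupted = False
    entered = 0
    denoised = 0
    for step in range(num_steps):
        entered += 1
        if step == interrupt_at:
            interrupted = True  # stands in for self.interrupt being set
        if interrupted:
            if use_break:
                break  # leave the loop immediately
            continue  # skip the work but keep iterating
        denoised += 1  # stands in for one transformer forward pass
    return entered, denoised


print(run_loop(80, 10, use_break=False))
print(run_loop(80, 10, use_break=True))
```

Both variants do the same amount of denoising work after the interrupt; `break` simply avoids walking the remaining timesteps.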

gemini-code-assist · 2026-06-19T00:48:25Z

+        else:
+            latents = batch.latents.to(device=device, dtype=dtype)


When custom or pre-computed latents are provided via batch.latents, it is important to validate that their shape matches the expected latent shape to prevent runtime shape mismatch errors later in the pipeline.

Suggested change

         else:
+            if list(batch.latents.shape) != list(shape):
+                raise ValueError(f"Provided latents shape {list(batch.latents.shape)} does not match expected shape {list(shape)}.")
             latents = batch.latents.to(device=device, dtype=dtype)
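
The same check can be tried without tensors, since only the shapes are compared. This is a minimal sketch: `check_latent_shape` is a hypothetical helper, and the example shape values are made up for illustration rather than taken from the Kandinsky-5 config.

```python
from typing import Sequence


def check_latent_shape(actual: Sequence[int], expected: Sequence[int]) -> None:
    """Raise ValueError when user-supplied latents do not have the expected shape."""
    if list(actual) != list(expected):
        raise ValueError(
            f"Provided latents shape {list(actual)} does not match "
            f"expected shape {list(expected)}."
        )


# Illustrative 5-D shape (batch, channels, frames, height, width); values are assumptions.
expected_shape = (1, 16, 31, 64, 96)
check_latent_shape((1, 16, 31, 64, 96), expected_shape)  # passes silently
```

Failing early here gives a message that names both shapes, instead of a shape error surfacing later inside the transformer.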

gemini-code-assist · 2026-06-19T00:48:25Z

+    def __post_init__(self) -> None:
+        if len(self.text_encoder_configs) != 2:
+            raise ValueError(
+                f"Kandinsky5 pipeline requires exactly 2 text encoders (qwen and clip), "
+                f"but got {len(self.text_encoder_configs)} encoder(s)."
+            )


In addition to validating the number of text encoder configs, we should also validate that text_encoder_precisions and text_encoder_max_lengths have exactly 2 elements to prevent potential out-of-bounds index errors during loading or encoding.

    def __post_init__(self) -> None:
        if len(self.text_encoder_configs) != 2:
            raise ValueError(
                f"Kandinsky5 pipeline requires exactly 2 text encoders (qwen and clip), "
                f"but got {len(self.text_encoder_configs)} encoder(s)."
            )
        if len(self.text_encoder_precisions) != 2:
            raise ValueError(
                f"Kandinsky5 pipeline requires exactly 2 text encoder precisions, "
                f"but got {len(self.text_encoder_precisions)}."
            )
        if len(self.text_encoder_max_lengths) != 2:
            raise ValueError(
                f"Kandinsky5 pipeline requires exactly 2 text encoder max lengths, "
                f"but got {len(self.text_encoder_max_lengths)}."
            )
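
A compact way to express the three length checks is to loop over the field names. The sketch below uses a hypothetical `ToyPipelineConfig` dataclass; the field names follow the review comment, but the default values are placeholders and not the real Kandinsky-5 settings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyPipelineConfig:
    # Placeholder defaults for illustration only.
    text_encoder_configs: Tuple[str, ...] = ("qwen", "clip")
    text_encoder_precisions: Tuple[str, ...] = ("bf16", "bf16")
    text_encoder_max_lengths: Tuple[int, ...] = (256, 77)

    def __post_init__(self) -> None:
        # Every per-encoder field must line up with the two text encoders.
        for name in (
            "text_encoder_configs",
            "text_encoder_precisions",
            "text_encoder_max_lengths",
        ):
            count = len(getattr(self, name))
            if count != 2:
                raise ValueError(
                    f"{name} must have exactly 2 entries, but got {count}."
                )


config = ToyPipelineConfig()  # valid: all three fields have two entries
```

Constructing the config with a one-element tuple for any of the three fields raises `ValueError` at construction time rather than an `IndexError` during loading or encoding.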

gemini-code-assist · 2026-06-19T00:48:25Z

+        required_divisor = spatial_ratio * patch_size[1]
+        if height % required_divisor != 0 or width % required_divisor != 0:
+            raise ValueError(f"Kandinsky5 height/width must be divisible by {required_divisor}; "
+                             f"got height={height}, width={width}.")


Checking only patch_size[1] for both height and width divisibility assumes that height and width patch sizes are always identical. To support potential asymmetric patch sizes correctly, patch_size[1] should be used for height and patch_size[2] for width.

Suggested change

-        required_divisor = spatial_ratio * patch_size[1]
-        if height % required_divisor != 0 or width % required_divisor != 0:
-            raise ValueError(f"Kandinsky5 height/width must be divisible by {required_divisor}; "
-                             f"got height={height}, width={width}.")
+        required_divisor_h = spatial_ratio * patch_size[1]
+        required_divisor_w = spatial_ratio * patch_size[2]
+        if height % required_divisor_h != 0 or width % required_divisor_w != 0:
+            raise ValueError(f"Kandinsky5 height must be divisible by {required_divisor_h} and width by {required_divisor_w}; "
+                             f"got height={height}, width={width}.")
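
A short worked example shows when the two checks diverge. `check_resolution` is a hypothetical standalone helper, and the spatial ratio of 8 and the patch sizes used below are assumptions chosen for illustration, not values read from the Kandinsky-5 config.

```python
from typing import Tuple


def check_resolution(
    height: int, width: int, spatial_ratio: int, patch_size: Tuple[int, int, int]
) -> Tuple[int, int]:
    """Validate height/width against per-axis divisors; patch_size is (t, h, w)."""
    _, patch_h, patch_w = patch_size
    divisor_h = spatial_ratio * patch_h
    divisor_w = spatial_ratio * patch_w
    if height % divisor_h != 0 or width % divisor_w != 0:
        raise ValueError(
            f"height must be divisible by {divisor_h} and width by {divisor_w}; "
            f"got height={height}, width={width}."
        )
    return divisor_h, divisor_w


# Symmetric patches: both axes need a multiple of 16, so 512x784 is accepted.
print(check_resolution(512, 784, spatial_ratio=8, patch_size=(1, 2, 2)))
```

With an asymmetric patch size such as `(1, 2, 4)`, the width divisor becomes 32 and a width of 784 is rejected, which a check based only on `patch_size[1]` would miss.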

gemini-code-assist · 2026-06-19T00:48:25Z

+    def verify_input(self, batch: ForwardBatch, fastvideo_args: FastVideoArgs) -> VerificationResult:
+        result = VerificationResult()
+        result.add_check("latents", batch.latents, [V.is_tensor, V.with_dims(5)])
+        result.add_check("prompt_embeds", batch.prompt_embeds, V.list_not_empty)
+        return result


Since Kandinsky5DenoisingStage.forward accesses batch.prompt_embeds[0] and batch.prompt_embeds[1], we should validate that batch.prompt_embeds contains at least 2 elements in verify_input to prevent unhandled IndexError crashes.

Suggested change

     def verify_input(self, batch: ForwardBatch, fastvideo_args: FastVideoArgs) -> VerificationResult:
         result = VerificationResult()
         result.add_check("latents", batch.latents, [V.is_tensor, V.with_dims(5)])
-        result.add_check("prompt_embeds", batch.prompt_embeds, V.list_not_empty)
+        result.add_check("prompt_embeds", batch.prompt_embeds, [V.is_list, lambda x: len(x) >= 2])
         return result
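
The length requirement can also be written as a small named validator instead of an inline lambda. This is a sketch only: `min_length` is a hypothetical helper, not an existing FastVideo validator, and it folds the type check into the same callable so it returns `False` rather than raising when the value is not a sequence.

```python
from typing import Any, Callable


def min_length(n: int) -> Callable[[Any], bool]:
    """Build a validator that accepts lists or tuples with at least n items."""

    def check(value: Any) -> bool:
        return isinstance(value, (list, tuple)) and len(value) >= n

    return check


needs_two = min_length(2)
print(needs_two(["qwen_embeds"]))
print(needs_two(["qwen_embeds", "clip_embeds"]))
```

A named validator like this reports a failed check for a single-element `prompt_embeds`, so the stage never reaches the `prompt_embeds[1]` access.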

aryan5v and others added 2 commits June 18, 2026 15:38

[feat]: add kandinsky5 pipeline support

8bd8e6d

fix: apply CodeRabbit auto-fixes

02b48e1

Fixed 2 file(s) based on 2 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

mergify Bot added the labels type: feat (New feature or capability), scope: inference (Inference pipeline, serving, CLI), and scope: model (Model architecture: DiTs, encoders, VAEs) on Jun 19, 2026

mergify Bot added the needs-rebase (PR has merge conflicts) label on Jun 19, 2026

gemini-code-assist Bot reviewed Jun 19, 2026

aryan5v wants to merge 2 commits into hao-ai-lab:main from aryan5v:aryan/kandinsky5-draft-pr
