
Conversation

mhs4670go (Contributor) commented on Jan 28, 2026

This commit introduces quant wrappers for qwen-vl.

Qwen3VLForConditionalGeneration(
  (model): Qwen3VLModel(
    (visual): Qwen3VLVisionModel(
      (patch_embed): Qwen3VLVisionPatchEmbed(
        (proj): Conv3d(3, 1024, kernel_size=(2, 16, 16), stride=(2, 16, 16))
      )
      (pos_embed): Embedding(2304, 1024)
      (rotary_pos_emb): Qwen3VLVisionRotaryEmbedding()
      (blocks): ModuleList(
        (0-23): 24 x Qwen3VLVisionBlock(
          (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (attn): Qwen3VLVisionAttention(
            (qkv): Linear(in_features=1024, out_features=3072, bias=True)
            (proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (mlp): Qwen3VLVisionMLP(
            (linear_fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (linear_fc2): Linear(in_features=4096, out_features=1024, bias=True)
            (act_fn): GELUTanh()
          )
        )
      )
      (merger): Qwen3VLVisionPatchMerger(
        (norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
        (linear_fc1): Linear(in_features=4096, out_features=4096, bias=True)
        (act_fn): GELU(approximate='none')
        (linear_fc2): Linear(in_features=4096, out_features=2560, bias=True)
      )
      (deepstack_merger_list): ModuleList(
        (0-2): 3 x Qwen3VLVisionPatchMerger(
          (norm): LayerNorm((4096,), eps=1e-06, elementwise_affine=True)
          (linear_fc1): Linear(in_features=4096, out_features=4096, bias=True)
          (act_fn): GELU(approximate='none')
          (linear_fc2): Linear(in_features=4096, out_features=2560, bias=True)
        )
      )
    )
    (language_model): Qwen3VLTextModel(
      (embed_tokens): Embedding(151936, 2560)
      (layers): ModuleList(
        (0-35): 36 x Qwen3VLTextDecoderLayer(
          (self_attn): Qwen3VLTextAttention(
            (q_proj): Linear(in_features=2560, out_features=4096, bias=False)
            (k_proj): Linear(in_features=2560, out_features=1024, bias=False)
            (v_proj): Linear(in_features=2560, out_features=1024, bias=False)
            (o_proj): Linear(in_features=4096, out_features=2560, bias=False)
            (q_norm): Qwen3VLTextRMSNorm((128,), eps=1e-06)
            (k_norm): Qwen3VLTextRMSNorm((128,), eps=1e-06)
          )
          (mlp): Qwen3VLTextMLP(
            (gate_proj): Linear(in_features=2560, out_features=9728, bias=False)
            (up_proj): Linear(in_features=2560, out_features=9728, bias=False)
            (down_proj): Linear(in_features=9728, out_features=2560, bias=False)
            (act_fn): SiLUActivation()
          )
          (input_layernorm): Qwen3VLTextRMSNorm((2560,), eps=1e-06)
          (post_attention_layernorm): Qwen3VLTextRMSNorm((2560,), eps=1e-06)
        )
      )
      (norm): Qwen3VLTextRMSNorm((2560,), eps=1e-06)
      (rotary_emb): Qwen3VLTextRotaryEmbedding()
    )
  )
  (lm_head): Linear(in_features=2560, out_features=151936, bias=False)
)
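
For reference, the module tree above can be reproduced by loading the checkpoint and printing the model. A minimal sketch, assuming a Hugging Face checkpoint whose text config matches the dump (the checkpoint name below is an assumption):

from transformers import Qwen3VLForConditionalGeneration

# Checkpoint name is an assumption; any Qwen3-VL checkpoint with
# hidden_size=2560 and 36 decoder layers yields the tree shown above.
model = Qwen3VLForConditionalGeneration.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
print(model)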

TICO-DCO-1.0-Signed-off-by: seongwoo [email protected]

import tico

save_path = pathlib.Path("qwen3vl_text_attn.q.circle")
B, S, D = 1, 4, text_cfg.hidden_size

(Note to self) This line decides the sequence length.

# -------------------------------------------------------------------------
# 1. Replace layer-0’s self-attention with QuantQwen3VLTextAttention
# -------------------------------------------------------------------------
orig_attn = model.model.language_model.layers[0].self_attn

Use the 1st (idx = 0) layer.
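
For context, a minimal sketch that connects the quoted fragments: load the model, grab layer 0's attention, and swap in the quant wrapper. The checkpoint name, the wrapper's import path, and its constructor signature are assumptions (QuantQwen3VLTextAttention is introduced by this PR); calibration and the tico export to save_path are omitted.

import pathlib

import torch
from transformers import Qwen3VLForConditionalGeneration

# Import path is hypothetical; QuantQwen3VLTextAttention is added by this PR.
from tico.experimental.quantization.ptq.wrappers.qwen_vl import (
    QuantQwen3VLTextAttention,
)

# Checkpoint name is an assumption; any Qwen3-VL checkpoint matching the dump works.
model = Qwen3VLForConditionalGeneration.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
text_cfg = model.config.text_config

save_path = pathlib.Path("qwen3vl_text_attn.q.circle")
B, S, D = 1, 4, text_cfg.hidden_size  # S = 4 decides the sequence length

# -------------------------------------------------------------------------
# 1. Replace layer-0's self-attention with QuantQwen3VLTextAttention
# -------------------------------------------------------------------------
orig_attn = model.model.language_model.layers[0].self_attn
# Constructor signature is an assumption; the real wrapper may take extra args.
model.model.language_model.layers[0].self_attn = QuantQwen3VLTextAttention(orig_attn)

# Dummy activation of the shape the wrapped attention sees during calibration.
hidden_states = torch.randn(B, S, D)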

) # self.weight_obs.fake_quant(w) # type: ignore[assignment]

# 3) rms
rms = torch.ops.circle_custom.rms_norm(
mhs4670go (Contributor, Author) commented:

@stamalakhov Could I post a PR for this quant_rmsnorm.py to the main branch? I'll add test code and apply the naming convention according to this.

I copied this from your draft and made the changes below:

  1. Moved it to the ops/ directory, since it can also be used in the qwen wrapper.
  2. Called torch.ops.circle_custom.rms_norm directly so that CircleRMSNorm is exported (a rough sketch of this pattern follows below).

Do you have any concerns about these changes?
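
For illustration only, a rough sketch of change 2: a wrapper forward that calls the circle_custom op directly so the exported graph contains CircleRMSNorm rather than torch.rms_norm. The class name and the argument order of circle_custom.rms_norm are assumptions (mirroring torch.nn.functional.rms_norm); quant_rmsnorm.py in this PR is the authoritative version.

import torch
from torch import nn

class QuantRMSNormSketch(nn.Module):
    # Illustrative only; see quant_rmsnorm.py in this PR for the real wrapper.
    def __init__(self, module: nn.Module, eps: float = 1e-6):
        super().__init__()
        self.module = module  # the original Qwen3VLTextRMSNorm
        self.eps = getattr(module, "variance_epsilon", eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.module.weight
        # Call the custom op directly so torch.rms_norm never has to be
        # preserved during export; the (input, weight, eps) order is an assumption.
        return torch.ops.circle_custom.rms_norm(x, w, self.eps)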

stamalakhov (Contributor) replied:

> @stamalakhov Could I post a PR for this quant_rmsnorm.py to the main branch? I'll add test code and apply the naming convention according to this.

@mhs4670go
Yep. Sure. Thank you. I'll rebase onto it.

stamalakhov (Contributor) commented:

I missed using circle_custom.rms_norm to avoid preserving torch.rms_norm. Thank you.

# 2) quantize weights
w = self.module.weight
if self._mode is Mode.QUANT:
    if self.weight_obs is not None and w is not None:
mhs4670go (Contributor, Author) commented on Jan 29, 2026

> if self.weight_obs is not None and w is not None:

@stamalakhov Just out of curiosity, is there any case where w is None?

stamalakhov (Contributor) replied on Jan 29, 2026

> if self.weight_obs is not None and w is not None:
>
> @stamalakhov Just out of curiosity, is there any case where w is None?

@mhs4670go
No, there are no such cases. Right now the RMSNorm weights are always quantized; this check was added just in case, so I believe it can be removed.
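
If the extra check is dropped as suggested, the quoted block would reduce to something like this (a sketch against the snippet above, not the actual file; the fake_quant call is taken from the commented-out line quoted earlier):

# 2) quantize weights
w = self.module.weight
if self._mode is Mode.QUANT:
    # RMSNorm weights always exist, so only the observer needs guarding.
    if self.weight_obs is not None:
        w = self.weight_obs.fake_quant(w)  # type: ignore[assignment]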
