
Conversation

mhs4670go (Contributor) commented on Jan 28, 2026

This commit introduces quant wrappers for qwen-vl.

Qwen3VLForConditionalGeneration(
  (model): Qwen3VLModel(
    (visual): Qwen3VLVisionModel(
      (patch_embed): Qwen3VLVisionPatchEmbed(
        (proj): Conv3d(3, 1024, kernel_size=(2, 16, 16), stride=(2, 16, 16))
      )
      (pos_embed): Embedding(2304, 1024)
      (rotary_pos_emb): Qwen3VLVisionRotaryEmbedding()
      (blocks): ModuleList(
        (0-23): 24 x Qwen3VLVisionBlock(
          (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (attn): Qwen3VLVisionAttention(
            (qkv): Linear(in_features=1024, out_features=3072, bias=True)
            (proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (mlp): Qwen3VLVisionMLP(
            (linear_fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (linear_fc2): Linear(in_features=4096, out_features=1024, bias=True)
            (act_fn): GELUTanh()
          )
        )
      )
      (merger): Qwen3VLVisionPatchMerger(
        (norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
        (linear_fc1): Linear(in_features=4096, out_features=4096, bias=True)
        (act_fn): GELU(approximate='none')
        (linear_fc2): Linear(in_features=4096, out_features=2560, bias=True)
      )
      (deepstack_merger_list): ModuleList(
        (0-2): 3 x Qwen3VLVisionPatchMerger(
          (norm): LayerNorm((4096,), eps=1e-06, elementwise_affine=True)
          (linear_fc1): Linear(in_features=4096, out_features=4096, bias=True)
          (act_fn): GELU(approximate='none')
          (linear_fc2): Linear(in_features=4096, out_features=2560, bias=True)
        )
      )
    )
    (language_model): Qwen3VLTextModel(
      (embed_tokens): Embedding(151936, 2560)
      (layers): ModuleList(
        (0-35): 36 x Qwen3VLTextDecoderLayer(
          (self_attn): Qwen3VLTextAttention(
            (q_proj): Linear(in_features=2560, out_features=4096, bias=False)
            (k_proj): Linear(in_features=2560, out_features=1024, bias=False)
            (v_proj): Linear(in_features=2560, out_features=1024, bias=False)
            (o_proj): Linear(in_features=4096, out_features=2560, bias=False)
            (q_norm): Qwen3VLTextRMSNorm((128,), eps=1e-06)
            (k_norm): Qwen3VLTextRMSNorm((128,), eps=1e-06)
          )
          (mlp): Qwen3VLTextMLP(
            (gate_proj): Linear(in_features=2560, out_features=9728, bias=False)
            (up_proj): Linear(in_features=2560, out_features=9728, bias=False)
            (down_proj): Linear(in_features=9728, out_features=2560, bias=False)
            (act_fn): SiLUActivation()
          )
          (input_layernorm): Qwen3VLTextRMSNorm((2560,), eps=1e-06)
          (post_attention_layernorm): Qwen3VLTextRMSNorm((2560,), eps=1e-06)
        )
      )
      (norm): Qwen3VLTextRMSNorm((2560,), eps=1e-06)
      (rotary_emb): Qwen3VLTextRotaryEmbedding()
    )
  )
  (lm_head): Linear(in_features=2560, out_features=151936, bias=False)
)
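
For reference, the module tree above can be reproduced by loading the checkpoint and printing the model. A minimal sketch, assuming a Hugging Face checkpoint whose text config matches the dump (the checkpoint name below is an assumption):

from transformers import Qwen3VLForConditionalGeneration

# Checkpoint name is an assumption; any Qwen3-VL checkpoint with
# hidden_size=2560 and 36 decoder layers yields the tree shown above.
model = Qwen3VLForConditionalGeneration.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
print(model)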

TICO-DCO-1.0-Signed-off-by: seongwoo [email protected]

import tico

save_path = pathlib.Path("qwen3vl_text_attn.q.circle")
B, S, D = 1, 4, text_cfg.hidden_size

(Note to self) This line decides the sequence length.

# -------------------------------------------------------------------------
# 1. Replace layer-0’s self-attention with QuantQwen3VLTextAttention
# -------------------------------------------------------------------------
orig_attn = model.model.language_model.layers[0].self_attn

Use the 1st (idx = 0) layer.
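
For context, a minimal sketch that connects the quoted fragments: load the model, grab layer 0's attention, and swap in the quant wrapper. The checkpoint name, the wrapper's import path, and its constructor signature are assumptions (QuantQwen3VLTextAttention is introduced by this PR); calibration and the tico export to save_path are omitted.

import pathlib

import torch
from transformers import Qwen3VLForConditionalGeneration

# Import path is hypothetical; QuantQwen3VLTextAttention is added by this PR.
from tico.experimental.quantization.ptq.wrappers.qwen_vl import (
    QuantQwen3VLTextAttention,
)

# Checkpoint name is an assumption; any Qwen3-VL checkpoint matching the dump works.
model = Qwen3VLForConditionalGeneration.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
text_cfg = model.config.text_config

save_path = pathlib.Path("qwen3vl_text_attn.q.circle")
B, S, D = 1, 4, text_cfg.hidden_size  # S = 4 decides the sequence length

# -------------------------------------------------------------------------
# 1. Replace layer-0's self-attention with QuantQwen3VLTextAttention
# -------------------------------------------------------------------------
orig_attn = model.model.language_model.layers[0].self_attn
# Constructor signature is an assumption; the real wrapper may take extra args.
model.model.language_model.layers[0].self_attn = QuantQwen3VLTextAttention(orig_attn)

# Dummy activation of the shape the wrapped attention sees during calibration.
hidden_states = torch.randn(B, S, D)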

) # self.weight_obs.fake_quant(w) # type: ignore[assignment]

# 3) rms
rms = torch.ops.circle_custom.rms_norm(
mhs4670go (Contributor, Author) commented:

@stamalakhov Could I post a PR for this quant_rmsnorm.py to the main branch? I'll add test code and apply the naming convention according to this.

I copied this from your draft and made the changes below:

  1. Moved it to the ops/ directory, since it can also be used in the qwen wrapper.
  2. Called torch.ops.circle_custom.rms_norm directly so that CircleRMSNorm is exported (a rough sketch of this pattern follows below).

Do you have any concerns about these changes?
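
For illustration only, a rough sketch of change 2: a wrapper forward that calls the circle_custom op directly so the exported graph contains CircleRMSNorm rather than torch.rms_norm. The class name and the argument order of circle_custom.rms_norm are assumptions (mirroring torch.nn.functional.rms_norm); quant_rmsnorm.py in this PR is the authoritative version.

import torch
from torch import nn

class QuantRMSNormSketch(nn.Module):
    # Illustrative only; see quant_rmsnorm.py in this PR for the real wrapper.
    def __init__(self, module: nn.Module, eps: float = 1e-6):
        super().__init__()
        self.module = module  # the original Qwen3VLTextRMSNorm
        self.eps = getattr(module, "variance_epsilon", eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.module.weight
        # Call the custom op directly so torch.rms_norm never has to be
        # preserved during export; the (input, weight, eps) order is an assumption.
        return torch.ops.circle_custom.rms_norm(x, w, self.eps)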

stamalakhov (Contributor) replied:

> @stamalakhov Could I post a PR for this quant_rmsnorm.py to the main branch? I'll add test code and apply the naming convention according to this.

@mhs4670go
Yep. Sure. Thank you. I'll rebase onto it.

stamalakhov (Contributor) commented:

I missed using circle_custom.rms_norm to avoid preserving torch.rms_norm. Thank you.

# 2) quantize weights
w = self.module.weight
if self._mode is Mode.QUANT:
    if self.weight_obs is not None and w is not None:
mhs4670go (Contributor, Author) commented on Jan 29, 2026

> if self.weight_obs is not None and w is not None:

@stamalakhov Just out of curiosity, is there any case where w is None?

stamalakhov (Contributor) replied on Jan 29, 2026

> if self.weight_obs is not None and w is not None:
>
> @stamalakhov Just out of curiosity, is there any case where w is None?

@mhs4670go
No, there are no such cases. Right now the RMSNorm weights are always quantized; this check was added just in case, so I believe it can be removed.
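
If the extra check is dropped as suggested, the quoted block would reduce to something like this (a sketch against the snippet above, not the actual file; the fake_quant call is taken from the commented-out line quoted earlier):

# 2) quantize weights
w = self.module.weight
if self._mode is Mode.QUANT:
    # RMSNorm weights always exist, so only the observer needs guarding.
    if self.weight_obs is not None:
        w = self.weight_obs.fake_quant(w)  # type: ignore[assignment]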
