[DRAFT] Introduce quant wrappers for qwen-vl #449
base: main
Conversation
import tico

save_path = pathlib.Path("qwen3vl_text_attn.q.circle")
B, S, D = 1, 4, text_cfg.hidden_size
(Note to self) This line decides the sequence length.
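For context, a hedged sketch of how these shapes might drive the export; `text_cfg` and `wrapped_attn` stand in for objects built earlier in the script under review, and the wrapper's forward signature here is an assumption, not taken from this diff:

```python
import pathlib

import torch
import tico

B, S, D = 1, 4, text_cfg.hidden_size        # S fixes the exported sequence length
hidden_states = torch.randn(B, S, D)        # dummy activations of that shape
position_ids = torch.arange(S).unsqueeze(0)

# Hypothetical export call; the real wrapper may take different inputs.
circle_model = tico.convert(wrapped_attn, (hidden_states, position_ids))
circle_model.save(pathlib.Path("qwen3vl_text_attn.q.circle"))
```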
# -------------------------------------------------------------------------
# 1. Replace layer-0's self-attention with QuantQwen3VLTextAttention
# -------------------------------------------------------------------------
orig_attn = model.model.language_model.layers[0].self_attn
Use the 1st (idx = 0) layer.
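A minimal sketch of the swap under review, assuming QuantQwen3VLTextAttention is constructed directly from the original attention module (the constructor signature is an assumption):

```python
# Wrap layer 0's self-attention with the quant wrapper, keeping a handle to
# the original module in case it needs to be restored later.
layer0 = model.model.language_model.layers[0]
orig_attn = layer0.self_attn
layer0.self_attn = QuantQwen3VLTextAttention(orig_attn)  # hypothetical constructor
```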
)  # self.weight_obs.fake_quant(w)  # type: ignore[assignment]

# 3) rms
rms = torch.ops.circle_custom.rms_norm(
@stamalakhov Could I post a PR for this quant_rmsnorm.py to the main branch? I'll add test code and apply some naming conventions according to this.
I copied this from your draft and modified it as below:
- Moved it to the ops/ directory, because it can also be used in the qwen wrapper.
- Call torch.ops.circle_custom.rms_norm directly to export CircleRMSNorm.
Do you have any concerns about these changes?
> @stamalakhov Could I post a PR for this quant_rmsnorm.py to the main branch? I'll add test code and apply some naming conventions according to this.
@mhs4670go
Yep. Sure. Thank you. I'll rebase onto it.
I had missed using circle_custom.rms_norm to avoid preserving torch.rms_norm. Thank you.
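For illustration, a hedged sketch of the direct call being discussed; the argument layout and the `variance_epsilon` attribute are assumptions mirroring the hunk above, not a verified signature:

```python
# Calling the custom op directly keeps a single CircleRMSNorm node in the
# exported graph instead of a decomposed torch.rms_norm.
rms = torch.ops.circle_custom.rms_norm(
    x,                             # input activations
    w,                             # (fake-quantized) RMSNorm weight
    self.module.variance_epsilon,  # epsilon from the wrapped module
)
```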
# 2) quantize weights
w = self.module.weight
if self._mode is Mode.QUANT:
    if self.weight_obs is not None and w is not None:
`if self.weight_obs is not None and w is not None:`
@stamalakhov Just out of curiosity, is there any case where `w` is None?
> `if self.weight_obs is not None and w is not None:`
> @stamalakhov Just out of curiosity, is there any case where `w` is None?
@mhs4670go
No, there are no such cases. Right now the weights of RMSNorm are always quantized. This was added just in case, so I believe it can be removed.
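If the guard is simplified as suggested, the hunk above could reduce to something like this (a sketch based on this reply, not the final code):

```python
# 2) quantize weights -- RMSNorm always carries a weight, so only the
#    observer needs to be checked before fake-quantizing.
w = self.module.weight
if self._mode is Mode.QUANT and self.weight_obs is not None:
    w = self.weight_obs.fake_quant(w)  # type: ignore[assignment]
```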
This commit introduces quant wrappers for qwen-vl.
TICO-DCO-1.0-Signed-off-by: seongwoo <[email protected]>