-
Notifications
You must be signed in to change notification settings - Fork 269
[CK_Tile] Adding support for preshuffleQuant in AB quant Block Scale Gemm #3629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
ThomasNing
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Except the above comments.
| BQuantGroupSize, | ||
| ck_tile::QuantType::ABQuantGrouped>(arg_parser); | ||
| }; | ||
| lut[hash_multiple_strings({"bf8", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's reduce the build time a little bit. We only need to put the fp8 case in the example.
| } | ||
| else if constexpr(QuantMode == ck_tile::QuantType::TensorQuant) | ||
| { | ||
| bq_tensor_ptr = std::make_unique<ck_tile::HostTensor<BQDataType>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could I know why we need to delete this part? TensorQuant still need a 1,1 quant tensor.
| using BQuantGroupSize = remove_cvref_t<typename Problem::BQuantGroupSize>; | ||
|
|
||
| static_assert(QuantGroupSize::kM == 1, "only N/K blocks for BQuant preshuffle kernel!"); | ||
| static_assert(BQuantGroupSize::kM == 1, "only N/K blocks for BQuant preshuffle kernel!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is named as the BQuantGroupSize, why it has kM in here?
| using BQuantGroupSize = remove_cvref_t<typename Problem::BQuantGroupSize>; | ||
|
|
||
| static_assert(QuantGroupSize::kM == 1, "only N/K blocks for BQuant preshuffle kernel!"); | ||
| static_assert(BQuantGroupSize::kM == 1, "only N/K blocks for BQuant preshuffle kernel!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the previous file comment
| } | ||
| else | ||
| { | ||
| return nIter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could optimize the if else condition here.
| // Step 1: Create tensor view for AQ | ||
| const auto& aq_tensor_view = [&]() { | ||
| if constexpr(kQuantType == QuantType::AQuantGrouped && PreshuffleQuant) | ||
| if constexpr((kQuantType == QuantType::AQuantGrouped || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When calling the kernel, the AQuant could not do the preshuffle quant as it is the activation. So we could reduce the code of the support of AQuantGrouped + PreshuffleQuant
Proposed changes
Support for PreshuffleQuant in AB quant
Unified the group size definition for A and B quants as well.
Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed filesDiscussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered