feat: precision-driven quantization (FP16, RTN int4, static QDQ) via --precision flag #872
Merged
36 commits
b508d65 feat: add --precision fp16 to optimize, build, and export commands (DingmaomaoBJTU)
e82a099 refactor: integrate FP16 into quantize stage as post-processing (github-actions[bot])
ef1926a chore: remove spurious .data files (github-actions[bot])
6ee9e66 refactor: remove --precision from export/optimize, add fp16 to quantize (github-actions[bot])
e974a75 feat(build): extend --precision to accept all quantization values (github-actions[bot])
118187f fix: resolve CodeQL import warnings in fp16 module (github-actions[bot])
c1aab3b fix: resolve rebase conflicts with main (github-actions[bot])
ef8c779 feat: warn when calibration options are ignored in FP16 mode (github-actions[bot])
ae2521e fix: skip task/model_name validation for fp16_only quant configs (github-actions[bot])
ed2c591 fix: skip calibration validation for rtn and dynamic algorithms (github-actions[bot])
fce06ff feat: merge --rtn-bits into --precision (int4/w4a16 auto-selects RTN) (github-actions[bot])
9b0e8cc fix: build pipeline RTN routing and MatMulNBitsQuantizer model extrac… (github-actions[bot])
32211ac fix: resolve lint warnings (raw regex strings, unused variable) (github-actions[bot])
85a774f fix: resolve mypy type errors and remove duplicate imports (github-actions[bot])
8c86403 fix: address code review findings (github-actions[bot])
e14cd13 fix: address deep code review findings (github-actions[bot])
8b4dcb0 feat: support w4a32 precision (equivalent to int4) and w4a16 FP16 pos… (github-actions[bot])
011ad21 refactor: unify fp16/fp16_only into algorithm='fp16' + fp16_postprocess (github-actions[bot])
a79c67c refactor: replace fp16_postprocess with multi-pass pipeline (github-actions[bot])
ddeb3be fix: clean up intermediate pass files in multi-pass quantize stage (github-actions[bot])
7e17a71 refactor: move multi-pass precision logic into quantize_onnx (github-actions[bot])
c4cb818 chore: remove duplicate is_submodule assignment in build config valid… (github-actions[bot])
7f93d93 refactor: extract warn_ignored_calibration_options to shared cli utils (github-actions[bot])
6bb1e64 refactor: move convert_to_fp16 from optim to quant module (github-actions[bot])
8ae24dc chore: mark legacy mode field as deprecated in quant config (github-actions[bot])
bb51338 refactor: unify mode and algorithm fields in WinMLQuantizationConfig (github-actions[bot])
0125ec6 refactor: split quantizer into dispatch pattern and consolidate quant… (github-actions[bot])
df3d0a5 fix: type dispatch dict properly to satisfy mypy no-any-return (github-actions[bot])
cd9fcd3 refactor: remove multi-pass w4a16 from quantize_onnx, simplify to sin… (github-actions[bot])
b261de1 cleanup: remove remaining multi-pass references from build.py (github-actions[bot])
eca92a6 cleanup: remove thin wrapper and adopt main's add_pre_process_metadat… (github-actions[bot])
15c151f test: update e2e test — fp16 is now a valid precision for winml quantize (github-actions[bot])
16bf584 Add explanatory comment to empty except clause (CodeQL fix) (github-actions[bot])
b303514 Remove dead 'algorithm' key compat from from_dict() (github-actions[bot])
f98a24f Fix FP16 detection to use config.quant.mode; remove _is_weight_only w… (github-actions[bot])
96cc6af Fix test: w4a16 now raises ValueError (dead guard removed) (github-actions[bot])