I am using quantize_static for quantization, and I found that when per_channel=True and both the weight and activation types are INT16, the accuracy of the quantized model drops significantly, but it is normal when per_channel=False.
To reproduce
The issue can be reproduced using the relevant files in demo_1.zip. The reproduction commands and results are as follows:
@yihonglyu
Following your suggestion, I set reduce_range=True, and the accuracy did improve greatly. The results are as follows:
cosine similarity: 0.9986828
mean absolute error: 0.03800502
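For reference, the two comparison metrics reported above can be computed along these lines. This is a minimal NumPy sketch; `compare_outputs` and the sample data are illustrative, not taken from run.py:

```python
import numpy as np

def compare_outputs(float_out: np.ndarray, quant_out: np.ndarray):
    """Compare a float model's output with its quantized counterpart."""
    f = float_out.ravel().astype(np.float64)
    q = quant_out.ravel().astype(np.float64)
    cos_sim = np.dot(f, q) / (np.linalg.norm(f) * np.linalg.norm(q))
    mae = np.mean(np.abs(f - q))
    return cos_sim, mae

# Example: a small uniform perturbation keeps cosine similarity near 1
# and gives a mean absolute error equal to the perturbation size.
f = np.linspace(-1.0, 1.0, 1000).astype(np.float32)
q = f + 0.01
cos_sim, mae = compare_outputs(f, q)
```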
When executing A16W16 per_channel, a warning is generated:
C:\Users\yilyu\AppData\Local\miniconda3\envs\21000\lib\site-packages\onnxruntime\quantization\base_quantizer.py:232: RuntimeWarning: invalid value encountered in cast
This warning suggests that the scale might be too small, so the quantized values do not fit into the int16 range. If the accuracy is within acceptable limits, it could be beneficial to avoid using per_channel.
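NumPy can emit this RuntimeWarning when a float value that does not fit the target integer type is cast. A minimal sketch of the failure mode, with illustrative values (not ONNX Runtime's actual code):

```python
import numpy as np

INT16_MIN, INT16_MAX = np.iinfo(np.int16).min, np.iinfo(np.int16).max

# A per-channel scale can be tiny when one channel's weights are
# near-constant; dividing by it pushes values far outside int16.
value = np.float32(1.0)
tiny_scale = np.float32(1e-8)
q = value / tiny_scale            # ~1e8, far beyond int16's 32767

# Casting q to int16 directly is the kind of operation that triggers
# "RuntimeWarning: invalid value encountered in cast".
# Clipping first keeps the result well defined:
q_clipped = np.clip(q, INT16_MIN, INT16_MAX).astype(np.int16)
```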
…precision loss (#21645)
### Description
When the scale of the bias is too small, the quantized bias may exceed
the range of `int32`, leading to significant loss of precision.
Therefore, before converting quantized bias to `int32`, it needs to be
clipped within the range of `int32` to reduce the loss of quantization
precision.
### Motivation and Context
Fixes issue #21000.
Reproduction commands and results from the issue:

per_channel=True, A16W16:

python run.py --per_channel --weight_type int16 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

cosine similarity: 0.546422 ❌
mean absolute error: 1.7504646 ❌

per_channel=False, A16W16:

python run.py --weight_type int16 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

cosine similarity: 0.99867153 ✔️
mean absolute error: 0.038529985 ✔️

In addition, I compared the A8W8 case, and the above issue did not occur.
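One plausible reason per-channel quantization hurts here while per-tensor does not: each output channel gets its own scale from its own absolute maximum, so a near-dead channel yields a near-zero scale. An illustrative sketch assuming symmetric quantization (`symmetric_scales` is hypothetical, not the onnxruntime code):

```python
import numpy as np

def symmetric_scales(weights: np.ndarray, per_channel: bool, bits: int = 16):
    """Symmetric scale(s) for a [out_channels, ...] weight tensor."""
    qmax = 2 ** (bits - 1) - 1     # 32767 for int16
    if per_channel:
        absmax = np.abs(weights.reshape(weights.shape[0], -1)).max(axis=1)
    else:
        absmax = np.abs(weights).max()
    return absmax / qmax

# One nearly-dead channel drives its per-channel scale toward zero,
# while the per-tensor scale is governed by the global maximum.
w = np.array([[1.0, -0.8], [1e-6, 2e-6]], dtype=np.float32)
per_tensor = symmetric_scales(w, per_channel=False)
per_chan = symmetric_scales(w, per_channel=True)
```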
Urgency: No response
Platform: Linux
OS Version: Ubuntu 22.04
ONNX Runtime Installation: Released Package
ONNX Runtime Version or Commit ID: 1.18.0
ONNX Runtime API: Python
Architecture: X64
Execution Provider: Default CPU
Execution Provider Library Version: No response