I am using quantize_static for quantization, and I found that when per_channel=True and both the weight and activation types are INT16, the accuracy of the quantized model drops significantly, but it is normal when per_channel=False.
To reproduce
The issue can be reproduced using the relevant files in demo_1.zip. The reproduction commands and results are as follows:
@yihonglyu
Following your suggestion, I set reduce_range=True, and the accuracy did improve greatly. The results are as follows:
cosine similarity: 0.9986828
mean absolute error: 0.03800502
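For reference, the two comparison metrics reported above can be computed along these lines. This is a minimal NumPy sketch; `compare_outputs` and the sample data are illustrative, not taken from run.py:

```python
import numpy as np

def compare_outputs(float_out: np.ndarray, quant_out: np.ndarray):
    """Compare a float model's output with its quantized counterpart."""
    f = float_out.ravel().astype(np.float64)
    q = quant_out.ravel().astype(np.float64)
    cos_sim = np.dot(f, q) / (np.linalg.norm(f) * np.linalg.norm(q))
    mae = np.mean(np.abs(f - q))
    return cos_sim, mae

# Example: a small uniform perturbation keeps cosine similarity near 1
# and gives a mean absolute error equal to the perturbation size.
f = np.linspace(-1.0, 1.0, 1000).astype(np.float32)
q = f + 0.01
cos_sim, mae = compare_outputs(f, q)
```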
When executing A16W16 per_channel, a warning is generated:
C:\Users\yilyu\AppData\Local\miniconda3\envs\21000\lib\site-packages\onnxruntime\quantization\base_quantizer.py:232: RuntimeWarning: invalid value encountered in cast
This warning suggests that the scale might be too small, so the quantized values do not fit into the int16 range. If the accuracy is within acceptable limits, it could be beneficial to avoid using per_channel.
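NumPy can emit this RuntimeWarning when a float value that does not fit the target integer type is cast. A minimal sketch of the failure mode, with illustrative values (not ONNX Runtime's actual code):

```python
import numpy as np

INT16_MIN, INT16_MAX = np.iinfo(np.int16).min, np.iinfo(np.int16).max

# A per-channel scale can be tiny when one channel's weights are
# near-constant; dividing by it pushes values far outside int16.
value = np.float32(1.0)
tiny_scale = np.float32(1e-8)
q = value / tiny_scale            # ~1e8, far beyond int16's 32767

# Casting q to int16 directly is the kind of operation that triggers
# "RuntimeWarning: invalid value encountered in cast".
# Clipping first keeps the result well defined:
q_clipped = np.clip(q, INT16_MIN, INT16_MAX).astype(np.int16)
```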
…precision loss (#21645)
### Description
When the scale of the bias is too small, the quantized bias may exceed
the range of `int32`, leading to significant loss of precision.
Therefore, before converting quantized bias to `int32`, it needs to be
clipped within the range of `int32` to reduce the loss of quantization
precision.
### Motivation and Context
Fixes issue #21000.
Reproduction commands and results from the issue:

per_channel=True, A16W16:

python run.py --per_channel --weight_type int16 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

cosine similarity: 0.546422 ❌
mean absolute error: 1.7504646 ❌

per_channel=False, A16W16:

python run.py --weight_type int16 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

cosine similarity: 0.99867153 ✔️
mean absolute error: 0.038529985 ✔️

In addition, I compared the A8W8 case, and the above issue did not occur.
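One plausible reason per-channel quantization hurts here while per-tensor does not: each output channel gets its own scale from its own absolute maximum, so a near-dead channel yields a near-zero scale. An illustrative sketch assuming symmetric quantization (`symmetric_scales` is hypothetical, not the onnxruntime code):

```python
import numpy as np

def symmetric_scales(weights: np.ndarray, per_channel: bool, bits: int = 16):
    """Symmetric scale(s) for a [out_channels, ...] weight tensor."""
    qmax = 2 ** (bits - 1) - 1     # 32767 for int16
    if per_channel:
        absmax = np.abs(weights.reshape(weights.shape[0], -1)).max(axis=1)
    else:
        absmax = np.abs(weights).max()
    return absmax / qmax

# One nearly-dead channel drives its per-channel scale toward zero,
# while the per-tensor scale is governed by the global maximum.
w = np.array([[1.0, -0.8], [1e-6, 2e-6]], dtype=np.float32)
per_tensor = symmetric_scales(w, per_channel=False)
per_chan = symmetric_scales(w, per_channel=True)
```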
Urgency: No response
Platform: Linux
OS Version: Ubuntu 22.04
ONNX Runtime Installation: Released Package
ONNX Runtime Version or Commit ID: 1.18.0
ONNX Runtime API: Python
Architecture: X64
Execution Provider: Default CPU
Execution Provider Library Version: No response