
Quantized ONNX Model Still Has Float32 Input/Output Tensors #21138

Open
jenchun-potentialmotors opened this issue Jun 21, 2024 · 4 comments
Labels: quantization, stale

Comments

jenchun-potentialmotors commented Jun 21, 2024

Describe the issue

After quantization, the output ONNX model has faster inference and a smaller file size, but why are the input and output tensors still float32?
I expected them to be uint8, since the quantized ONNX file is about a quarter of the original size. I also tried onnxruntime 1.12.0, 1.13.1, and 1.18.0, and the result is the same in every version: the input and output tensors are all float32.

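For reference, the tensor types can be checked directly from an InferenceSession; a minimal sketch, assuming the quantized model was saved as mobilenet_quant.onnx (placeholder path):

import onnxruntime as ort

# Placeholder path for the quantized model produced by the example
sess = ort.InferenceSession("mobilenet_quant.onnx", providers=["CPUExecutionProvider"])

# Both loops print tensor(float) here, even though the weights inside the graph are quantized
for i in sess.get_inputs():
    print("input:", i.name, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.type)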

To reproduce

onnxruntime: 1.14.1
torch: 2.3.0
torchvision: 0.18.0

Following the official example reproduces the issue:
https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/notebooks/imagenet_v2/mobilenet.ipynb
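For completeness, here is a minimal sketch of the static quantization call along the lines of that notebook; the file names, the input name, and the random calibration reader are placeholders rather than the notebook's exact code:

import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class DummyCalibrationReader(CalibrationDataReader):
    # Feeds a few random NCHW float32 batches; the real example uses preprocessed ImageNet images.
    def __init__(self, input_name="input", num_batches=8):
        self.batches = iter(
            [{input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)} for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self.batches, None)

quantize_static(
    "mobilenetv2_float.onnx",          # placeholder: exported float32 model
    "mobilenetv2_quant.onnx",          # placeholder: quantized output
    DummyCalibrationReader(),
    quant_format=QuantFormat.QDQ,      # the default format
    activation_type=QuantType.QUInt8,  # U8S8: uint8 activations,
    weight_type=QuantType.QInt8,       #       int8 weights
)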

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1

ONNX Runtime API

Python

Architecture

X86

Execution Provider

Default CPU

Execution Provider Library Version

No response

@github-actions bot added the quantization label Jun 21, 2024
@hoangtv2000

This is the QDQ representation of the ONNX model. To perform integer-only arithmetic, you have to quantize your model to the QOperator representation. For more detail, see https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html.
I'm curious which method you used to quantize your model. Is it post-training quantization in ONNX Runtime?
If yes, you just have to change quant_format from QDQ to QOperator:

def quantize_static(
    model_input: Union[str, Path, onnx.ModelProto],
    model_output: Union[str, Path],
    calibration_data_reader: CalibrationDataReader,
    quant_format=QuantFormat.QDQ, # Change this to QuantFormat.QOperator
    op_types_to_quantize=None,
    per_channel=False,
    reduce_range=False,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    nodes_to_quantize=None,
    nodes_to_exclude=None,
    use_external_data_format=False,
    calibrate_method=CalibrationMethod.MinMax,
    extra_options=None,
):
...
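For example, the call could look roughly like this (the paths and the calibration reader are placeholders):

from onnxruntime.quantization import QuantFormat, quantize_static

quantize_static(
    "mobilenetv2_float.onnx",
    "mobilenetv2_qoperator.onnx",
    calibration_data_reader,             # e.g. the reader from the reproduction sketch above
    quant_format=QuantFormat.QOperator,  # fused integer operators instead of Q/DQ pairs
)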

@jenchun-potentialmotors (Author)

@hoangtv2000 Thank you for the comment!
Here are the details of my quantization strategy:

  • Post-training quantization: Yes
  • Method selection: Static
  • Representation format: QDQ
  • Data type selection: activations uint8, weights int8 (U8S8; see the sketch below)
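For clarity, my understanding of the shorthand above, expressed as quantize_static arguments (first letter = activation type, second letter = weight type):

from onnxruntime.quantization import QuantType

dtype_combos = {
    "U8U8": dict(activation_type=QuantType.QUInt8, weight_type=QuantType.QUInt8),
    "S8S8": dict(activation_type=QuantType.QInt8,  weight_type=QuantType.QInt8),
    "U8S8": dict(activation_type=QuantType.QUInt8, weight_type=QuantType.QInt8),  # the strategy listed above
}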

I tried your suggestion and quantized with QOperator. However, the quantized model's input and output remain float32.
I also tried different data types such as U8U8, S8S8, and U8S8, but the results were almost identical.
Although the quantized model with float32 input/output runs 2-3x as fast as the non-quantized model, I still do not understand why the quantized model's inputs and outputs are not int8.

Do you have any idea regarding this?
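One way to see where the float32 boundary comes from is to inspect the quantized graph itself. A minimal sketch, assuming the quantized model is saved as mobilenetv2_quant.onnx (placeholder path); as far as I can tell, quantize_static keeps the graph inputs and outputs in float32 in both formats and inserts QuantizeLinear/DequantizeLinear nodes at the boundary, so only the interior of the graph runs in integer arithmetic:

import onnx

model = onnx.load("mobilenetv2_quant.onnx")  # placeholder path

# The graph inputs/outputs still declare FLOAT; the int8 conversion happens inside the graph
for vi in list(model.graph.input) + list(model.graph.output):
    print(vi.name, onnx.TensorProto.DataType.Name(vi.type.tensor_type.elem_type))

# Expect a QuantizeLinear near the input and a DequantizeLinear near the output
print("first nodes:", [n.op_type for n in model.graph.node[:3]])
print("last nodes: ", [n.op_type for n in model.graph.node[-3:]])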

@github-actions bot (Contributor)

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions bot added the stale label Jul 25, 2024
@ChickenSellerRED

I'm facing the same problem. Have you solved it?
