
v2.12.0

@KodiaqQ released this 31 Jul 12:28

Post-training Quantization:

Features:

  • (OpenVINO, PyTorch, ONNX) Excluded comparison operators from the quantization scope for nncf.ModelType.TRANSFORMER.
  • (OpenVINO, PyTorch) Changed the representation of symmetrically quantized weights from an unsigned integer with a fixed zero-point to a signed data type without a zero-point in the nncf.compress_weights() method.
  • (OpenVINO) Extended pattern support of the AWQ algorithm as part of nncf.compress_weights(). This allows applying AWQ to a wider range of models.
  • (OpenVINO) Introduced the nncf.CompressWeightsMode.E2M1 mode option of nncf.compress_weights() for the new MXFP4 precision (Experimental); see the sketch after this list.
  • (OpenVINO) Added support for models with BF16 precision in the nncf.quantize() method.
  • (PyTorch) Added quantization support for torch.addmm.
  • (PyTorch) Added quantization support for torch.nn.functional.scaled_dot_product_attention.
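
A minimal sketch of the experimental E2M1 (MXFP4) weight compression path described above, assuming an OpenVINO IR on disk; the model path and the group_size value are illustrative choices, not prescribed by this release:

```python
import openvino as ov

import nncf

# Read an OpenVINO IR model (the path is an illustrative placeholder).
model = ov.Core().read_model("model.xml")

# Experimental MXFP4 weight compression via the new E2M1 mode.
# group_size=32 is an assumed example value; tune it for your model.
compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.E2M1,
    group_size=32,
)

ov.save_model(compressed_model, "model_e2m1.xml")
```

For the extended AWQ support, the same nncf.compress_weights() call can additionally take awq=True together with a calibration dataset passed via the dataset argument.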

Fixes:

  • (OpenVINO, PyTorch, ONNX) Fixed the FastBiasCorrection and BiasCorrection algorithms to correctly support transposed MatMul layers.
  • (OpenVINO) Fixed nncf.IgnoredScope() functionality for models with If operation.
  • (OpenVINO) Fixed patterns with PReLU operations.
  • Fixed a runtime error when importing NNCF without the Matplotlib package installed.

Improvements:

  • Reduced the amount of memory required for applying nncf.compress_weights() to OpenVINO models.
  • Improved logging when a non-empty nncf.IgnoredScope() is provided; see the sketch after this list.
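
As a usage reference for the ignored-scope items above, here is a minimal sketch of passing a non-empty nncf.IgnoredScope() to nncf.quantize(); the model path, node names, regex pattern, and calibration data are hypothetical placeholders:

```python
import openvino as ov

import nncf

model = ov.Core().read_model("model.xml")  # illustrative path

# Exclude nodes from quantization by exact name, regex pattern, or operation type
# (all values below are hypothetical examples, not taken from a real model).
ignored_scope = nncf.IgnoredScope(
    names=["final_matmul"],
    patterns=[".*embedding.*"],
    types=["Multiply"],
)


def transform_fn(data_item):
    # Convert a raw data item into the model's input format (model-specific).
    return data_item


calibration_items = [...]  # representative inputs for calibration (model-specific)
calibration_dataset = nncf.Dataset(calibration_items, transform_fn)

quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    model_type=nncf.ModelType.TRANSFORMER,
    ignored_scope=ignored_scope,
)
```

The improved logging mentioned above applies when such a non-empty ignored scope is supplied to the quantization call.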

Tutorials:

Compression-aware training:

Fixes:

  • (PyTorch) Fixed an issue with wrapping for operators without a patched state.

Requirements:

  • Updated the TensorFlow version to 2.15. This version requires Python 3.9-3.11.

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@Lars-Codes