Implemented Compression Methods

Each compression method has its own hyperparameters, organized as a dictionary and stored in a JSON file that is deserialized when training starts. Compression methods can be applied separately or together, producing sparse, quantized, or both sparse and quantized models. For more information about the configuration, refer to the samples.
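As a rough illustration of that flow, the sketch below deserializes a JSON configuration and collects the hyperparameter dictionary for each compression method. The key names (`compression`, `algorithm`, `params`, `sparsity_target`, `bits`, `mode`) and the helper `load_compression_settings` are assumptions made for this example, not the library's documented schema; the samples remain the reference for the real format.

```python
import json

# Hypothetical configuration text. All key names here are assumptions
# chosen for illustration; consult the samples for the actual schema.
CONFIG_TEXT = """
{
    "compression": [
        {"algorithm": "magnitude_sparsity", "params": {"sparsity_target": 0.5}},
        {"algorithm": "quantization", "params": {"bits": 8, "mode": "symmetric"}}
    ]
}
"""


def load_compression_settings(text):
    """Deserialize JSON text and return {algorithm name: hyperparameter dict}."""
    config = json.loads(text)
    return {
        entry["algorithm"]: entry.get("params", {})
        for entry in config["compression"]
    }


settings = load_compression_settings(CONFIG_TEXT)
for name, params in settings.items():
    print(name, params)
```

Listing two entries models the "applied together" case; a configuration with a single entry would model a method applied on its own.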

Quantization
- Symmetric and asymmetric quantization modes
- Signed and unsigned
- Per tensor/per channel
- Exports to OpenVINO-supported FakeQuantize ONNX nodes
- Arbitrary bitwidth
- Mixed-bitwidth quantization
- Automatic bitwidth assignment based on HAWQ
- Automatic quantization parameter selection and activation quantizer setup based on HW config preset
- AutoQ automatic bitwidth assignment mode, based on HAQ, which uses deep reinforcement learning to select the best mixed-precision configuration for a given quality metric and hardware type
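To make the symmetric/asymmetric distinction concrete, here is a minimal per-tensor sketch in plain Python. It is a simplification under stated assumptions, not the library's implementation: the symmetric range is taken as `[-max|x|, max|x|]` with `2**bits - 1` levels so that zero is representable, the asymmetric range as `[min(x), max(x)]` with `2**bits` levels, and the function names are hypothetical.

```python
def fake_quantize(values, low, high, levels):
    """Clamp to [low, high], snap to `levels` evenly spaced points, return floats."""
    if high <= low or levels < 2:
        raise ValueError("need high > low and at least two levels")
    step = (high - low) / (levels - 1)
    quantized = []
    for v in values:
        clamped = min(max(v, low), high)
        index = round((clamped - low) / step)  # integer grid position
        quantized.append(low + index * step)   # back to a float value
    return quantized


def quantize_symmetric(values, bits):
    # Assumed convention: range centred on zero, odd number of levels.
    scale = max(abs(v) for v in values)
    return fake_quantize(values, -scale, scale, 2 ** bits - 1)


def quantize_asymmetric(values, bits):
    # Assumed convention: range follows the observed minimum and maximum.
    return fake_quantize(values, min(values), max(values), 2 ** bits)


activations = [0.0, 0.1, 0.7, 1.3, 2.0]
sym = quantize_symmetric(activations, 4)
asym = quantize_asymmetric(activations, 4)
print(sym)
print(asym)
```

For non-negative data such as these activations, the asymmetric range spends all of its levels on `[0, 2]`, while the symmetric range spends about half of them on negative values that never occur, so its grid is coarser.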
Binarization
- XNOR, DoReFa weight binarization
- Scale/threshold based per-channel activation binarization
Sparsity
- Magnitude sparsity
- Regularization-based (RB) sparsity
Filter pruning
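For the magnitude sparsity entry listed above, the underlying idea can be sketched in a few lines: weights with the smallest absolute values are set to zero until a target fraction is reached. The function name `magnitude_sparsify` and the one-shot, flat-list treatment are assumptions for illustration; a training-time implementation would typically apply a mask and raise the sparsity level gradually.

```python
def magnitude_sparsify(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, -0.6, 0.01, 0.2]
sparse = magnitude_sparsify(weights, 0.5)
print(sparse)  # [0.8, 0.0, 0.3, -0.6, 0.0, 0.0]
```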
