
Tags: huggingface/optimum-quanto


v0.2.2

chore: release 0.2.2

v0.2.1

release: 0.2.1

v0.2.0

release: 0.2.0

New:
- requantize helper by @calmitchell617,
- StableDiffusion example by @thliang01,
- improved linear backward path,
- AWQ int4 kernels.
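The requantize helper restores a quantized model from serialized state without re-running quantization from the original float weights. A minimal concept sketch in NumPy (hypothetical helper names for illustration — not the library's actual API):

```python
import numpy as np

# Hypothetical names for illustration: serialize a quantized tensor as its
# integer data plus scale, then rebuild it later without touching the
# original float weights or re-running quantization.
def save_quantized(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return {"data": q, "scale": scale}  # what gets serialized

def load_quantized(state):
    # "Requantizing" a fresh model: restore the quantized weights directly
    # from the saved state instead of quantizing float weights again.
    return state["data"].astype(np.float32) * state["scale"]

w = np.linspace(-1.0, 1.0, 8, dtype=np.float32)
state = save_quantized(w)
w_hat = load_quantized(state)
```

The point of the helper is that only the integer data and scales need to travel with the checkpoint; the float weights are never required at load time.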

v0.1.0

release: 0.1.0

- group-wise quantization,
- safe serialization.
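Group-wise quantization assigns one scale per small group of weights instead of one per tensor, so an outlier only degrades precision within its own group. A minimal NumPy sketch of the idea (illustrative names, not the library API):

```python
import numpy as np

def quantize_groupwise(w, group_size=4):
    """Quantize a 1-D float array to int8 with one scale per group."""
    w = w.reshape(-1, group_size)                      # split into groups
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                          # avoid division by zero
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_groupwise(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

# One group of small values, one group containing large outliers:
w = np.array([0.1, -0.2, 0.05, 0.15, 10.0, -20.0, 5.0, 15.0], dtype=np.float32)
q, s = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, s)
# Per-group scales keep the small-magnitude group precise despite the outliers.
```

With a single per-tensor scale, the 20.0 outlier would force a coarse step of roughly 0.16 onto every weight; per-group scales confine that cost to the outlier's group.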

v0.0.13

release: 0.0.13

New features:

- new `QConv2d` quantized module,
- official support for `float8` weights.

Bug fixes:

- fix `QbitsTensor.to()`, which was not moving the inner tensors,
- prevent shallow `QTensor` copies that left inner tensors unmoved when
  loading weights.

v0.0.12

release: 0.0.12

0.0.11

chore: bump version

0.0.10

release: 0.0.10

New features:

- calibration streamline option to remove spurious quantize/dequantize ops,
- calibration debug mode.

0.0.9

release: 0.0.9

New features:

- weight and activation quantization parameters,
- `float8` activations.

0.0.8

release: 0.0.8

New features:

- weight-only quantization,
- integer matmul acceleration on CUDA.
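In weight-only quantization, weights are stored as low-bit integers plus scales while activations stay in floating point, and the weights are dequantized (or consumed directly by integer kernels on supported hardware) at matmul time. A hedged NumPy sketch with per-output-channel int8 weights (illustrative, not the library's implementation):

```python
import numpy as np

def quantize_weights(w):
    """Per-output-channel symmetric int8 quantization (weights only)."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)  # (out_features, in_features)
x = rng.standard_normal((2, 16)).astype(np.float32)  # activations stay float

q, scales = quantize_weights(w)
# Dequantize on the fly at matmul time; activations are never quantized.
y = x @ (q.astype(np.float32) * scales).T
y_ref = x @ w.T  # full-precision reference
```

Storing `q` and `scales` cuts weight memory roughly 4x versus float32, and the output stays close to the full-precision reference because only the weights carry quantization error.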

Bug fixes:

- actually use float16 weights,
- avoid float16 overflows,
- correct device placement,
- robust serialization.