Added int4 config for Mixtral-8x7B weight compression (#255)
Evaluated word perplexity of Mixtral-8x7B on wikitext and found an int4 config for weight compression with a negligible increase (0.4) in perplexity over the original model.
 
Mixtral-8x7B configuration | word_ppl on wikitext
-- | --
Torch CPU | 5.17
OV CPU | 5.17
sym_g128_r100 (default) | 5.98
sym_g128_r90 | 5.60
sym_g128_r80 (added) | 5.55
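
(Naming, as inferred from the config added below: `sym_gN_rM` denotes symmetric int4 weight compression with group size N and ratio M%, i.e. M% of the weight layers are compressed to int4, while NNCF keeps the remainder at a higher int8 backup precision.)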
ljaljushkin committed Feb 28, 2024
1 parent 8470250 commit 987e213
Showing 1 changed file with 1 addition and 0 deletions.
llm_bench/python/utils/nncf_utils.py
@@ -47,4 +47,5 @@ def get_compressed_path(output_dir: str, base_precision, option: str):
"open-llama-3b": {"mode": nncf.CompressWeightsMode.INT4_SYM, "group_size": 64, "all_layers": True},
"falcon-7b-instruct": {"mode": nncf.CompressWeightsMode.INT4_SYM, "group_size": 64, "all_layers": True},
"orca-mini-3b": {"mode": nncf.CompressWeightsMode.INT4_SYM, "group_size": 64, "all_layers": True},
"mixtral-8x7b-v0.1": {"mode": nncf.CompressWeightsMode.INT4_SYM, "group_size": 128, "ratio": 0.8},
}
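
For context, a minimal sketch of how a config like this is applied, assuming NNCF's `nncf.compress_weights` API; the model-loading step is a placeholder, not part of this commit:

```python
import nncf

# Placeholder: an OpenVINO or PyTorch Mixtral model loaded elsewhere
# (e.g. via optimum-intel); `load_model` is hypothetical.
model = load_model("mixtral-8x7b-v0.1")

# The added entry maps directly onto nncf.compress_weights arguments:
# symmetric int4 quantization, 128-element groups, and 80% of the
# weight layers in int4 (the rest stay at the int8 backup precision).
compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,
    ratio=0.8,
)
```

Lowering the ratio from 1.0 (the default) to 0.8 trades some compression for accuracy, consistent with the table above: sym_g128_r80 recovers most of the perplexity lost by the all-int4 default.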
