Added int4 config for Mixtral-8x7B weight compression (#255)
Evaluated word perplexity of Mixtral-8x7B on wikitext and found an int4 config for weight compression with a negligible increase (0.4) in perplexity over the original model.
 
Mixtral-8x7B configuration | word_ppl on wikitext
-- | --
Torch CPU | 5.17
OV CPU | 5.17
sym_g128_r100 (default) | 5.98
sym_g128_r90 | 5.60
sym_g128_r80 (added) | 5.55
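
(Naming, as inferred from the config added below: `sym_gN_rM` denotes symmetric int4 weight compression with group size N and ratio M%, i.e. M% of the weight layers are compressed to int4, while NNCF keeps the remainder at a higher int8 backup precision.)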
ljaljushkin committed Feb 28, 2024
1 parent 8470250 commit 987e213
Showing 1 changed file with 1 addition and 0 deletions.
llm_bench/python/utils/nncf_utils.py
@@ -47,4 +47,5 @@ def get_compressed_path(output_dir: str, base_precision, option: str):
"open-llama-3b": {"mode": nncf.CompressWeightsMode.INT4_SYM, "group_size": 64, "all_layers": True},
"falcon-7b-instruct": {"mode": nncf.CompressWeightsMode.INT4_SYM, "group_size": 64, "all_layers": True},
"orca-mini-3b": {"mode": nncf.CompressWeightsMode.INT4_SYM, "group_size": 64, "all_layers": True},
"mixtral-8x7b-v0.1": {"mode": nncf.CompressWeightsMode.INT4_SYM, "group_size": 128, "ratio": 0.8},
}
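
For context, a minimal sketch of how a config like this is applied, assuming NNCF's `nncf.compress_weights` API; the model-loading step is a placeholder, not part of this commit:

```python
import nncf

# Placeholder: an OpenVINO or PyTorch Mixtral model loaded elsewhere
# (e.g. via optimum-intel); `load_model` is hypothetical.
model = load_model("mixtral-8x7b-v0.1")

# The added entry maps directly onto nncf.compress_weights arguments:
# symmetric int4 quantization, 128-element groups, and 80% of the
# weight layers in int4 (the rest stay at the int8 backup precision).
compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,
    ratio=0.8,
)
```

Lowering the ratio from 1.0 (the default) to 0.8 trades some compression for accuracy, consistent with the table above: sym_g128_r80 recovers most of the perplexity lost by the all-int4 default.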
