forked from ggerganov/ggml
ggml : 4-bit Integer quantisation + many llama.cpp improvements (ggerganov#27)

* gq : attempt at n-bit quantization
* gq : add amax based method 3
* gq : progress on method 2
* gq : method 4 (AVX2)
* gq : method 4 (ARM)
* gq : method 4 (AVX2 attempt) + method 5 (no min)
* gq : method 5 (ARM)
* gpt-2 : model conversion for Q4_0 quantization
* ggml : Q4_0 quantization support (ggml_get_rows())
* gpt-2 : loading Q4_0 quantized model
* ggml : q4_0 quantization support
* ggml : q4_1 quantization support (seems to work for bigger models)
* gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models
* ggml : 4-bit quantization works (only scalar for now)
* gq : add method 6 (ARM)
* ggml : vectorized mad q4_0 (ARM)
* ggml : vectorized quantize_row_q4_0 (ARM)
* ggml : simplify mad q4_0 (ARM)
* ggml : minor indentations
* gpt-j : support for 4-bit quantized model inference
* ggml : GGML_ASSERT() instead of assert() where appropriate
* gpt : avoid ggml_transpose on model tensors (new models!)
* gpt-2 : minor
* gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)
* ggml : add ggml_compute_forward_rope_f16()
* gpt : fix memory usage computation
* ggml : fix ggml_is_contiguous() to take into account blck size
* whisper : add whisper-quantize tool
* whisper : add support for quantized models
* whisper : mem usage based on model format type
* gpt : seems not worth using FP16 for the KV cache
* gpt : support quantization of f16 model files
* ggml : fixes for rpi4
* whisper : add Q4_1 model sizes
* ggml : add WASM SIMD for Q4_0
* utils : print quantization histograms
* ggml : sync all changes from llama.cpp and whisper.cpp
* ggml : finalize the Q4_1 quantization for ARM_NEON
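The Q4_0 (absmax-based) and Q4_1 (min/max-based) schemes named in the log above can be sketched as follows. This is a minimal illustration, not the ggml implementation: the block size of 32 and the exact rounding/clamping rules are assumptions, and the real kernels pack two 4-bit values per byte and use the scalar/AVX2/ARM NEON/WASM SIMD paths listed in the commits.

```python
def quantize_q4_0(block):
    """Absmax scaling: one scale per block, values mapped to [-7, 7]."""
    amax = max(abs(v) for v in block)
    d = amax / 7 if amax else 0.0      # scale so the largest magnitude hits +-7
    inv = 1.0 / d if d else 0.0
    q = [max(-7, min(7, round(v * inv))) for v in block]
    return d, q                        # dequantize as d * q_i

def dequantize_q4_0(d, q):
    return [d * qi for qi in q]

def quantize_q4_1(block):
    """Min/max scaling: a scale plus a per-block minimum, values mapped to [0, 15]."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15 if hi > lo else 0.0
    inv = 1.0 / d if d else 0.0
    q = [max(0, min(15, round((v - lo) * inv))) for v in block]
    return d, lo, q                    # dequantize as lo + d * q_i

def dequantize_q4_1(d, lo, q):
    return [lo + d * qi for qi in q]
```

Q4_1 spends extra storage on the per-block minimum but represents asymmetric value ranges better, which is consistent with the log's note that it "seems to work for bigger models".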
Showing 20 changed files with 6,928 additions and 1,306 deletions.