
GGUF fp32/fp16 conversion to checkpoint #134

Open · wants to merge 1 commit into main
Conversation

@mergennachin commented Mar 12, 2024

Summary:

This conversion only works for fp32 and fp16 tensor types, so it does not provide much value on its own yet: convert_hf_checkpoint.py can already generate an equivalent .pth checkpoint directly, without the GGUF format indirection. What this PR does provide is the foundation and validation that the basic fp32 and fp16 paths work correctly. In the future, we will support running the quantized version of the GGUF graph in eager mode.

Test Plan:

  1. Setup:
     `pip install gguf`
     `git clone git@github.com:ggerganov/llama.cpp.git`
     `python scripts/download.py --repo_id [HF-dir]`
  2. Preparation: convert the existing HF model to fp16:
     `python llama.cpp/convert.py [HF-dir] --outtype f16`, which generates `[HF-dir]/ggml-model-f16.gguf`
  3. Convert the GGUF file to a checkpoint (a minimal sketch of this step is shown after the list):
     `python scripts/convert_from_gguf.py --gguf_file [HF-dir]/ggml-model-f16.gguf --checkpoint_file [HF-dir]/model_gguf.pth`
  4. Validate that it works:
     `python generate.py --checkpoint_path [HF-dir]/model_gguf.pth --device=cpu --prompt "Hello, my name is" --max_new_tokens 20`
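
For context, here is a minimal sketch of what such a GGUF-to-checkpoint conversion can look like using the `gguf` package's `GGUFReader`. The `remap_name` helper is a hypothetical placeholder: the real mapping between GGUF tensor names and the model's state-dict keys depends on the architecture and lives in `convert_from_gguf.py`.

```python
# Sketch only: convert an fp32/fp16 GGUF file into a PyTorch .pth checkpoint.
# Assumes `pip install gguf torch`; remap_name below is hypothetical.
import argparse

import torch
from gguf import GGUFReader


def remap_name(gguf_name: str) -> str:
    # Hypothetical placeholder: map GGUF tensor names such as
    # "blk.0.attn_q.weight" to the state-dict keys the model expects.
    return gguf_name


def convert(gguf_file: str, checkpoint_file: str) -> None:
    reader = GGUFReader(gguf_file)
    state_dict = {}
    for tensor in reader.tensors:
        # tensor.data is a numpy view into the mmap'd file; F32/F16 decode
        # to plain floats, which is why quantized types are out of scope here.
        data = tensor.data.copy()  # copy so torch owns writable memory
        # Note: GGUF records dimensions in reverse order relative to PyTorch,
        # so some tensors may additionally need a reshape or permute here.
        state_dict[remap_name(tensor.name)] = torch.from_numpy(data)
    torch.save(state_dict, checkpoint_file)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--gguf_file", required=True)
    parser.add_argument("--checkpoint_file", required=True)
    args = parser.parse_args()
    convert(args.gguf_file, args.checkpoint_file)
```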

@facebook-github-bot added the CLA Signed label on Mar 12, 2024
@malfet (Contributor) commented Mar 13, 2024

Why import gguf at all when one can decode the file in place using native PyTorch? See https://github.com/malfet/llm_experiments/blob/74a935344fbce5680dbd2dafc7dfd95231303444/run_llama.py#L447
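
For reference, decoding without the `gguf` dependency is feasible because the file's fixed header is simple to parse with the standard library alone. A minimal sketch under that assumption, covering only the top-level header (the linked run_llama.py goes further and parses the metadata key-value pairs and tensor infos as well):

```python
# Sketch only: read the fixed GGUF header without the gguf package.
# Layout (GGUF v2/v3): 4-byte magic b"GGUF", uint32 version,
# uint64 tensor_count, uint64 metadata_kv_count, all little-endian.
import struct


def read_gguf_header(path: str) -> tuple[int, int, int]:
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return version, tensor_count, kv_count


print(read_gguf_header("ggml-model-f16.gguf"))
```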
