
GGUF fp32/fp16 conversion to checkpoint #134

Open · wants to merge 1 commit into main
Conversation

@mergennachin commented Mar 12, 2024

Summary:

This conversion only works for fp32 and fp16 tensor types, so it does not provide much value on its own yet: convert_hf_checkpoint.py can already generate an equivalent .pth checkpoint directly, without the GGUF format indirection. What this PR does provide is the foundation and validation that the basic fp32 and fp16 paths work correctly. In the future, we will support running the quantized version of the GGUF graph in eager mode.

Test Plan:

  1. Setup:
     `pip install gguf`
     `git clone git@github.com:ggerganov/llama.cpp.git`
     `python scripts/download.py --repo_id [HF-dir]`
  2. Preparation: convert the existing HF model to fp16:
     `python llama.cpp/convert.py [HF-dir] --outtype f16`, which generates `[HF-dir]/ggml-model-f16.gguf`
  3. Convert the GGUF file to a checkpoint (a minimal sketch of this step is shown after the list):
     `python scripts/convert_from_gguf.py --gguf_file [HF-dir]/ggml-model-f16.gguf --checkpoint_file [HF-dir]/model_gguf.pth`
  4. Validate that it works:
     `python generate.py --checkpoint_path [HF-dir]/model_gguf.pth --device=cpu --prompt "Hello, my name is" --max_new_tokens 20`
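
For context, here is a minimal sketch of what such a GGUF-to-checkpoint conversion can look like using the `gguf` package's `GGUFReader`. The `remap_name` helper is a hypothetical placeholder: the real mapping between GGUF tensor names and the model's state-dict keys depends on the architecture and lives in `convert_from_gguf.py`.

```python
# Sketch only: convert an fp32/fp16 GGUF file into a PyTorch .pth checkpoint.
# Assumes `pip install gguf torch`; remap_name below is hypothetical.
import argparse

import torch
from gguf import GGUFReader


def remap_name(gguf_name: str) -> str:
    # Hypothetical placeholder: map GGUF tensor names such as
    # "blk.0.attn_q.weight" to the state-dict keys the model expects.
    return gguf_name


def convert(gguf_file: str, checkpoint_file: str) -> None:
    reader = GGUFReader(gguf_file)
    state_dict = {}
    for tensor in reader.tensors:
        # tensor.data is a numpy view into the mmap'd file; F32/F16 decode
        # to plain floats, which is why quantized types are out of scope here.
        data = tensor.data.copy()  # copy so torch owns writable memory
        # Note: GGUF records dimensions in reverse order relative to PyTorch,
        # so some tensors may additionally need a reshape or permute here.
        state_dict[remap_name(tensor.name)] = torch.from_numpy(data)
    torch.save(state_dict, checkpoint_file)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--gguf_file", required=True)
    parser.add_argument("--checkpoint_file", required=True)
    args = parser.parse_args()
    convert(args.gguf_file, args.checkpoint_file)
```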

@facebook-github-bot added the CLA Signed label on Mar 12, 2024
@malfet (Contributor) commented Mar 13, 2024

Why import gguf at all when one can decode the file in place using native PyTorch? See https://github.com/malfet/llm_experiments/blob/74a935344fbce5680dbd2dafc7dfd95231303444/run_llama.py#L447
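
For reference, decoding without the `gguf` dependency is feasible because the file's fixed header is simple to parse with the standard library alone. A minimal sketch under that assumption, covering only the top-level header (the linked run_llama.py goes further and parses the metadata key-value pairs and tensor infos as well):

```python
# Sketch only: read the fixed GGUF header without the gguf package.
# Layout (GGUF v2/v3): 4-byte magic b"GGUF", uint32 version,
# uint64 tensor_count, uint64 metadata_kv_count, all little-endian.
import struct


def read_gguf_header(path: str) -> tuple[int, int, int]:
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return version, tensor_count, kv_count


print(read_gguf_header("ggml-model-f16.gguf"))
```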
