GGUF (Breaking Change to Model Files) #633

Merged
merged 11 commits into main on Aug 25, 2023
Conversation

@abetlen (Owner) commented Aug 24, 2023

GGUF support for llama.cpp Closes #628

This currently works. To convert your old GGML v3 llama models, run:

python3 vendor/llama.cpp/convert-llama-ggmlv3-to-gguf.py --input <path-to-ggml> --output <path-to-gguf>
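
Once converted, the GGUF file loads the same way the old GGML file did. A minimal sketch, assuming the converted model was written to ./models/llama-2-7b.gguf (the path is a placeholder):

from llama_cpp import Llama

# placeholder path: wherever convert-llama-ggmlv3-to-gguf.py wrote its --output file
llm = Llama(model_path="./models/llama-2-7b.gguf")
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])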

TODO

  • Fix tests
  • Move convert script into package to make it easier for people to migrate
  • Add docs link to conversion script in llama.cpp
  • Fix detokenization bug (getting extra leading space)

@abetlen abetlen changed the title GGUF GGUF (Breaking Change) Aug 24, 2023
@abetlen abetlen changed the title GGUF (Breaking Change) GGUF (Breaking Change to Model Files) Aug 24, 2023
@rlancemartin rlancemartin mentioned this pull request Aug 25, 2023
@abetlen abetlen merged commit 915bbea into main Aug 25, 2023
15 checks passed
@sndani commented Aug 26, 2023

Hello, I'm excited about trying this out with the CodeLlama GGUF model.

I followed the macOS (Sonoma beta) instructions. How do I get the 'llama' shared library?

llama-cpp-python % python3 -m llama_cpp.server --model $MODEL --n_gpu_layers 1
Traceback (most recent call last):
....
File "... /llama-cpp-python/llama_cpp/llama_cpp.py", line 80, in
_lib = _load_shared_library(_lib_base_name)
File " ... /llama-cpp-python/llama_cpp/llama_cpp.py", line 71, in _load_shared_library
raise FileNotFoundError(
FileNotFoundError: Shared library with base name 'llama' not found

Thanks!

@abetlen (Owner, Author) commented Aug 26, 2023

@sndani try reinstalling with the --verbose flag; it's likely a build error, and cmake will report it. If the issue isn't resolvable, please open an issue and I'll take a look. Cheers

@sndani commented Aug 26, 2023

@abetlen thanks for the great work and for responding.
Yes, I did run
% CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir --verbose

It turns out cmake isn't building the libllama.so target under vendor/llama.cpp (although 'make clean' tries to delete it). The steps below are a dev workaround; I'll open an issue (or the next person who encounters this can) if it isn't just my environment.

% make clean
% cd vendor/llama.cpp
% LLAMA_METAL=on make libllama.so
% cd ../..
% pip install 'llama-cpp-python[server]'
% export LLAMA_CPP_LIB=./vendor/llama.cpp/libllama.so
% python3 -m llama_cpp.server --model $MODEL --n_gpu_layers 1
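
With the server started as above, you can query its OpenAI-compatible HTTP API. A minimal sketch, assuming the server is listening on the default localhost:8000 (adjust if you passed --host/--port):

import requests

# assumes the default host/port used by python3 -m llama_cpp.server
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"prompt": "Q: What is GGUF? A:", "max_tokens": 32},
)
print(resp.json()["choices"][0]["text"])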

@reddiamond1234 commented
My model is now a lot slower... is there any way to fix this?
