Simple autogenerated Python bindings for ggml

This folder contains:

- Scripts to generate full Python bindings from ggml headers (+ stubs for autocompletion in IDEs)
- Some barebones utils (see ggml/utils.py):
  - ggml.utils.init builds a context that's freed automatically when the pointer gets GC'd
  - ggml.utils.copy copies between same-shaped tensors (numpy or ggml), w/ automatic (de/re)quantization
  - ggml.utils.numpy returns a numpy view over a ggml tensor; if it's quantized, it returns a copy (requires allow_copy=True)
- Very basic examples (anyone wants to port llama2.c?)

Provided you set GGML_LIBRARY=.../path/to/libggml_shared.so (see instructions below), it's trivial to do some operations on quantized tensors:

# Make sure libllama.so is in your [DY]LD_LIBRARY_PATH, or set GGML_LIBRARY=.../libggml_shared.so

from ggml import lib, ffi
from ggml.utils import init, copy, numpy
import numpy as np

ctx = init(mem_size=12*1024*1024)
n = 256
n_threads = 4

a = lib.ggml_new_tensor_1d(ctx, lib.GGML_TYPE_Q5_K, n)
b = lib.ggml_new_tensor_1d(ctx, lib.GGML_TYPE_F32, n) # Can't both be quantized
sum = lib.ggml_add(ctx, a, b) # all zeroes for now. Will be quantized too!

gf = ffi.new('struct ggml_cgraph*')
lib.ggml_build_forward_expand(gf, sum)

copy(np.array([i for i in range(n)], np.float32), a)
copy(np.array([i*100 for i in range(n)], np.float32), b)

lib.ggml_graph_compute_with_ctx(ctx, gf, n_threads)

print(numpy(a, allow_copy=True))
#  0.    1.0439453   2.0878906   3.131836    4.1757812   5.2197266. ...
print(numpy(b))
#  0.  100.        200.        300.        400.        500.         ...
print(numpy(sum, allow_copy=True))
#  0.  105.4375    210.875     316.3125    421.75      527.1875     ...
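
The printed values of a are close to, but not exactly, the integers that were copied in, because storing into a quantized tensor is lossy. The sketch below is a toy illustration of that effect in plain Python. It is not ggml's Q5_K format: it assumes a simple per-block affine scheme (one minimum and one scale per block, 5-bit codes), and all function names are made up for this example.

```python
def quantize_block(values, bits=5):
    """Map a block of floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Avoid a zero scale when every value in the block is identical.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_block(codes, scale, lo):
    """Reconstruct approximate floats from the codes of one block."""
    return [lo + c * scale for c in codes]


def roundtrip(values, block_size=64, bits=5):
    """Quantize then dequantize, block by block (toy scheme, not Q5_K)."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        codes, scale, lo = quantize_block(block, bits)
        out.extend(dequantize_block(codes, scale, lo))
    return out


values = [float(i) for i in range(256)]
restored = roundtrip(values, block_size=64, bits=5)
worst = max(abs(a - b) for a, b in zip(values, restored))
print(f"largest round-trip error: {worst:.4f}")
```

With 64 distinct values per block and only 32 codes, neighbouring inputs have to share a code, so the restored values drift from the originals by up to half a quantization step.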

Prerequisites

You'll need a shared library of ggml to use the bindings.

Build libggml_shared.so or libllama.so

At the time of writing, the best option is to use the libggml_shared.so or libllama.so generated by ggerganov/llama.cpp, which you can build as follows:

git clone https://github.com/ggerganov/llama.cpp
# On a CUDA-enabled system add -DLLAMA_CUDA=1
# On a Mac add -DLLAMA_METAL=1
cmake llama.cpp \
  -B llama_build \
  -DCMAKE_C_FLAGS=-Ofast \
  -DLLAMA_NATIVE=1 \
  -DLLAMA_LTO=1 \
  -DBUILD_SHARED_LIBS=1 \
  -DLLAMA_MPI=1 \
  -DLLAMA_BUILD_TESTS=0 \
  -DLLAMA_BUILD_EXAMPLES=0
( cd llama_build && make -j )

# On Mac, this will be libggml_shared.dylib instead
export GGML_LIBRARY=$PWD/llama_build/libggml_shared.so
# Alternatively, you can just copy it to your system's lib dir, e.g. /usr/local/lib
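
To make the lookup order concrete, here is a minimal sketch of how a loader could pick library names: an explicit GGML_LIBRARY wins, otherwise it falls back to platform-specific file names. This is an illustration only and not necessarily how the ggml package resolves the library. The function name candidate_libraries is hypothetical, and libllama.dylib as a Mac fallback is an assumption.

```python
import os
import platform


def candidate_libraries(env=None, system=None):
    """Return library names/paths to try, most specific first (hypothetical helper)."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system

    explicit = env.get("GGML_LIBRARY")
    if explicit:
        # An explicit path overrides any search by file name.
        return [explicit]
    if system == "Darwin":
        # Assumption: macOS builds use the .dylib suffix for both libraries.
        return ["libggml_shared.dylib", "libllama.dylib"]
    return ["libggml_shared.so", "libllama.so"]


print(candidate_libraries())
```

Bare file names like these only resolve if the library sits in a directory the dynamic loader searches, which is why the instructions above suggest either exporting GGML_LIBRARY or copying the file to a system lib dir.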

(Optional) Regenerate the bindings and stubs

If you add or change any signatures of the C API, you'll want to regenerate the bindings (ggml/cffi.py) and stubs (ggml/__init__.pyi).

Luckily it's a one-liner using regenerate.py:

pip install -q cffi

python regenerate.py

By default it assumes llama.cpp was cloned in ../../../llama.cpp (alongside the ggml folder). You can override this with:

C_INCLUDE_DIR=$LLAMA_CPP_DIR python regenerate.py

You can also edit api.h to control which files should be included in the generated bindings (defaults to llama.cpp/ggml*.h)

In fact, if you only wanted to generate bindings for the current version of the ggml repo itself (instead of llama.cpp; you'd lose support for k-quants), you could run:

API=../../include/ggml/ggml.h python regenerate.py
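
If you drive regeneration from another Python script, the two overrides above can be passed through the environment. The sketch below only builds the command and environment; the helper name regenerate_command is hypothetical, and it assumes regenerate.py reads nothing beyond the C_INCLUDE_DIR and API variables mentioned in this README.

```python
import os
import sys


def regenerate_command(include_dir=None, api=None, base_env=None):
    """Build (argv, env) for running regenerate.py with optional overrides."""
    env = dict(os.environ if base_env is None else base_env)
    if include_dir is not None:
        env["C_INCLUDE_DIR"] = str(include_dir)  # where the headers live
    if api is not None:
        env["API"] = str(api)  # which header(s) to generate bindings for
    return [sys.executable, "regenerate.py"], env


cmd, env = regenerate_command(api="../../include/ggml/ggml.h", base_env={})
print(cmd[1], env)
# To actually run it from this folder:
#   import subprocess; subprocess.run(cmd, env=env, check=True)
```

Leaving both arguments unset keeps the script's defaults, so the same helper covers the plain python regenerate.py case.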

Develop

Run tests:

pytest

Alternatives

This example's goal is to showcase cffi-generated bindings that are trivial to use and update, but there are already alternatives in the wild:

- https://github.com/abetlen/ggml-python: these bindings seem to be hand-written and use ctypes. They come with high-quality API reference docs that can be used with these bindings too, but they don't expose Metal, CUDA, MPI or OpenCL calls, don't support transparent (de/re)quantization like this example does (see the ggml.utils module), and won't pick up your local changes.
- https://github.com/abetlen/llama-cpp-python: these expose the C++ llama.cpp interface, which this example cannot easily be extended to support (cffi only generates bindings of C libraries).
- pybind11 and nanobind are two alternatives to cffi that support binding C++ libraries, but neither seems to have an automatic generator (writing bindings by hand is rather time-consuming).
