Do the LLaMA thing, but now in Rust 🦀🚀🦙

(Image: A llama riding a crab, AI-generated)


(Gif showcasing language generation using llama-rs)

Llama-rs is a Rust port of the llama.cpp project. This allows running inference for Facebook's LLaMA model on a CPU with good performance.

Just like its C++ counterpart, it is powered by the ggml tensor library, achieving the same performance as the original code.

Getting started

Make sure you have a Rust toolchain set up.

  1. Clone the repository
  2. Build (cargo build --release)
  3. Run with cargo run --release -- <ARGS>

For example, you can try the following prompt:

cargo run --release -- -m /data/Llama/LLaMA/7B/ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is"

Q&A

  • Q: Why did you do this?

  • A: It was not my choice. Ferris appeared to me in my dreams and asked me to rewrite this in the name of the Holy crab.

  • Q: Seriously now

  • A: Come on! I don't want to get into a flame war. You know how it goes, something something memory something something cargo is nice, don't make me say it, everybody knows this already.

  • Q: I insist.

  • A: Sheesh! Okaaay. After seeing the huge potential of llama.cpp, the first thing I did was to see how hard it would be to turn it into a library to embed in my projects. I started digging into the code and realized the heavy lifting is done by ggml (a C library, easy to bind to Rust; see the FFI sketch after this Q&A), while the whole project was just around ~2k lines of C++ code (not so easy to bind). After a couple of (failed) attempts to build an HTTP server into the tool, I realized I'd be much more productive if I just ported the code to Rust, where I'm more comfortable.

  • Q: Is this the real reason?

  • A: Haha. Of course not. I just like collecting imaginary internet points, in the form of little stars, that people seem to give to me whenever I embark on pointless quests for rewriting X thing, but in Rust.
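
Since the Q&A above points out that ggml is a C library that is easy to bind from Rust, here is a minimal sketch of what such a hand-written FFI declaration might look like. The function names (ggml_init, ggml_free) come from ggml's C API, but the exact struct layout and signatures shown here are assumptions based on the C header of the time, and the example only links if ggml's C sources are actually compiled in (e.g. by a build.rs):

```rust
use std::os::raw::c_void;

/// Opaque handle to a ggml context; the real struct lives on the C side.
#[repr(C)]
pub struct ggml_context {
    _private: [u8; 0],
}

/// Mirror of ggml's init params (layout assumed from the C header).
#[repr(C)]
pub struct ggml_init_params {
    pub mem_size: usize,
    pub mem_buffer: *mut c_void,
}

extern "C" {
    pub fn ggml_init(params: ggml_init_params) -> *mut ggml_context;
    pub fn ggml_free(ctx: *mut ggml_context);
}

fn main() {
    // Allocate a small scratch context and release it again,
    // just to show the call pattern from Rust.
    unsafe {
        let ctx = ggml_init(ggml_init_params {
            mem_size: 16 * 1024 * 1024,
            mem_buffer: std::ptr::null_mut(),
        });
        assert!(!ctx.is_null());
        ggml_free(ctx);
    }
}
```

In llama-rs itself, bindings of this kind presumably live in the ggml_raw crate mentioned below, whose build.rs compiles and links ggml's C sources.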

Known issues / To-dos

Contributions welcome! Here are a few pressing issues:

  • The code only sets the right CFLAGS on Linux. The build.rs script in ggml_raw needs to be fixed; until then, inference will be very slow on every other OS.
  • The quantization code has not been ported (yet). For now, use llama.cpp to produce quantized models.
  • The code needs to be "library"-fied. It is nice as a showcase binary, but the real potential of this tool is to allow embedding in other services (see the sketch after this list).
  • No crates.io release yet. The name llama-rs is reserved, and I plan to publish soon-ish.
  • Anything from the original C++ code.
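
To make the "library"-fication item above more concrete, here is a rough sketch of the kind of embeddable API the crate could expose. Everything here (the Model and InferenceParams types and the callback-based infer method) is hypothetical and only illustrates the shape of the interface, not the actual code:

```rust
// Hypothetical public API for an embeddable llama-rs library.
// None of these types exist yet; they only sketch what "library"-fication could look like.

use std::error::Error;
use std::path::Path;

/// Sampling / generation settings a host application could tweak.
pub struct InferenceParams {
    pub n_predict: usize,
    pub temperature: f32,
    pub top_k: usize,
    pub top_p: f32,
}

impl Default for InferenceParams {
    fn default() -> Self {
        Self { n_predict: 128, temperature: 0.8, top_k: 40, top_p: 0.95 }
    }
}

/// A loaded ggml-format LLaMA model.
pub struct Model {
    // weights, vocabulary, ggml context, ...
}

impl Model {
    /// Load a quantized ggml model file from disk.
    pub fn load(_path: &Path) -> Result<Self, Box<dyn Error>> {
        // A real implementation would read the weights and build the compute graph.
        Ok(Self {})
    }

    /// Run inference, streaming each generated token to the caller.
    pub fn infer(
        &self,
        prompt: &str,
        params: &InferenceParams,
        mut on_token: impl FnMut(&str),
    ) -> Result<(), Box<dyn Error>> {
        // Placeholder: a real implementation would tokenize, evaluate and sample.
        let _ = (prompt, params);
        on_token("<generated text would stream here>");
        Ok(())
    }
}

fn main() -> Result<(), Box<dyn Error>> {
    // Example of how a host service (e.g. an HTTP server) might embed the library.
    let model = Model::load(Path::new("/data/Llama/LLaMA/7B/ggml-model-q4_0.bin"))?;
    let params = InferenceParams::default();
    model.infer("Tell me how cool the Rust programming language is", &params, |tok| {
        print!("{tok}");
    })?;
    Ok(())
}
```

A streaming callback like this would let an embedding service forward tokens to clients as they are produced instead of waiting for the full completion.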
