Update README.md on LLaMA-65B benchmark result.
Noeda committed Mar 18, 2023
1 parent f233f8a commit 25e3e12
Showing 1 changed file (README.md) with 7 additions and 4 deletions.
@@ -4,10 +4,9 @@ RLLaMA is a pure Rust implementation of [LLaMA large language model inference.](
 
 ## Supported features
 
-* Use either `f16` and `f32` weights.
-* LLaMA-7B, LLaMA-13B and LLaMA-30B are all confirmed working. LLaMA-65B
-  likely works but I haven't found a big enough computer to run it.
-* Multithreaded hand-optimized CPU inference
+* Uses either `f16` and `f32` weights.
+* LLaMA-7B, LLaMA-13B, LLaMA-30B, LLaMA-65B all confirmed working
+* Hand-optimized AVX2 implementation
 * OpenCL support for GPU inference.
 
 ## Performance
@@ -22,6 +21,7 @@ LLaMA-7B: AMD Ryzen 3950X: 1008ms / token f32 (pure Rust)
 LLaMA-13B: AMD Ryzen 3950X: 1029ms / token f16 (pure Rust)
 LLaMA-13B: AMD Ryzen 3950X: 1930ms / token f32 (pure Rust)
 LLaMA-30B: AMD Ryzen 5950X: 2112ms / token f16 (pure Rust)
+LLaMA-65B: AMD Ryzen 5950X: 4186ms / token f16 (pure Rust)
 OpenCL (all use f16):
@@ -181,10 +181,13 @@ LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X: 4098ms / token
 # I've been focusing on making the ordinary non-OpenCL CPU implementation
 # faster and I got some gains, most importantly from multithreading.
 # There is Float16 support now, so I've added f16/f32 to these tables:
+#
+# I also managed to run LLaMA-65B for the first time.
 LLaMA-7B: AMD Ryzen 3950X: 552ms / token f16
 LLaMA-7B: AMD Ryzen 3950X: 1008ms / token f32
 LLaMA-13B: AMD Ryzen 3950X: 1029ms / token f16
 LLaMA-13B: AMD Ryzen 3950X: 1930ms / token f32
 LLaMA-30B: AMD Ryzen 5950X: 2112ms / token f16
+LLaMA-65B: AMD Ryzen 5950X: 4186ms / token f16
 ```
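As background for the "Hand-optimized AVX2 implementation" bullet in the diff above: below is a minimal, hypothetical Rust sketch of the kind of AVX2/FMA kernel that feature implies — an f32 dot product, the inner loop that matrix-vector inference spends most of its time in. This is illustrative only; `dot_avx2` and its structure are assumptions, not rllama's actual code, and it assumes an x86_64 CPU with AVX2 and FMA support.

```rust
// Hypothetical sketch, NOT rllama's actual kernel: an AVX2/FMA dot
// product of the kind a hand-optimized CPU inference loop is built on.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Safety: the caller must ensure the CPU supports AVX2 and FMA.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = _mm256_setzero_ps();
    let chunks = a.len() / 8;
    for i in 0..chunks {
        // Load 8 lanes from each slice, then fused multiply-add into acc.
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        acc = _mm256_fmadd_ps(va, vb, acc);
    }
    // Horizontal sum of the 8 accumulator lanes.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    // Scalar tail for lengths not divisible by 8.
    for i in chunks * 8..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            let a: Vec<f32> = (0..64).map(|i| i as f32).collect();
            let b = vec![0.5f32; 64];
            // Safe to call: we just verified the required CPU features.
            println!("dot = {}", unsafe { dot_avx2(&a, &b) });
        }
    }
}
```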
