From 25e3e12d9d5f4941840f58984f161d6827935d17 Mon Sep 17 00:00:00 2001
From: Mikko Juola
Date: Sat, 18 Mar 2023 09:52:09 -0700
Subject: [PATCH] Update README.md on LLaMA-65B benchmark result.

---
 README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index f0dddaa..fb94f5e 100644
--- a/README.md
+++ b/README.md
@@ -4,10 +4,9 @@ RLLaMA is a pure Rust implementation of [LLaMA large language model inference.](

 ## Supported features

-  * Use either `f16` and `f32` weights.
-  * LLaMA-7B, LLaMA-13B and LLaMA-30B are all confirmed working. LLaMA-65B
-    likely works but I haven't found a big enough computer to run it.
-  * Multithreaded hand-optimized CPU inference
+  * Uses either `f16` or `f32` weights.
+  * LLaMA-7B, LLaMA-13B, LLaMA-30B, LLaMA-65B all confirmed working
+  * Hand-optimized AVX2 implementation
   * OpenCL support for GPU inference.

 ## Performance
@@ -22,6 +21,7 @@ LLaMA-7B:  AMD Ryzen 3950X: 1008ms / token     f32    (pure
 LLaMA-13B: AMD Ryzen 3950X: 1029ms / token     f16    (pure Rust)
 LLaMA-13B: AMD Ryzen 3950X: 1930ms / token     f32    (pure Rust)
 LLaMA-30B: AMD Ryzen 5950X: 2112ms / token     f16    (pure Rust)
+LLaMA-65B: AMD Ryzen 5950X: 4186ms / token     f16    (pure Rust)

 OpenCL (all use f16):

@@ -181,10 +181,13 @@ LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X:          4098ms / token
 # I've been focusing on making the ordinary non-OpenCL CPU implementation
 # faster and I got some gains, most importantly from multithreading.
 # There is Float16 support now, so I've added f16/f32 to these tables:
+#
+# I also managed to run LLaMA-65B for the first time.

 LLaMA-7B:  AMD Ryzen 3950X: 552ms / token     f16
 LLaMA-7B:  AMD Ryzen 3950X: 1008ms / token    f32
 LLaMA-13B: AMD Ryzen 3950X: 1029ms / token    f16
 LLaMA-13B: AMD Ryzen 3950X: 1930ms / token    f32
 LLaMA-30B: AMD Ryzen 5950X: 2112ms / token    f16
+LLaMA-65B: AMD Ryzen 5950X: 4186ms / token    f16
 ```
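
For context on the "Hand-optimized AVX2 implementation" and f16/f32 weights mentioned in the patched feature list: the sketch below is not RLLaMA's actual kernel, only an illustration of the kind of AVX2/F16C dot product such a pure-Rust CPU path typically relies on. The function name `dot_f16_f32`, the storage of f16 weights as raw `u16` bit patterns, and the f32 activation layout are assumptions made for this example.

```rust
// Illustrative sketch only -- not RLLaMA's actual code. It shows the general
// shape of an AVX2/F16C dot-product kernel over f16 weights: widen 8 packed
// half-precision weights to f32, then accumulate with fused multiply-add.
#[cfg(target_arch = "x86_64")]
mod avx2_sketch {
    use std::arch::x86_64::*;

    /// Dot product of f16 weights (stored as raw u16 bit patterns) against
    /// f32 activations. Caller must ensure the CPU supports avx2, fma and
    /// f16c, and that the slices are equal in length and a multiple of 8.
    #[target_feature(enable = "avx2,fma,f16c")]
    pub unsafe fn dot_f16_f32(weights: &[u16], activations: &[f32]) -> f32 {
        assert_eq!(weights.len(), activations.len());
        assert_eq!(weights.len() % 8, 0);
        let mut acc = _mm256_setzero_ps();
        for i in (0..weights.len()).step_by(8) {
            // Load 8 packed f16 values and widen them to 8 f32 lanes (F16C).
            let w_half = _mm_loadu_si128(weights.as_ptr().add(i) as *const __m128i);
            let w = _mm256_cvtph_ps(w_half);
            let x = _mm256_loadu_ps(activations.as_ptr().add(i));
            // acc += w * x across all 8 lanes in one fused multiply-add.
            acc = _mm256_fmadd_ps(w, x, acc);
        }
        // Horizontal sum of the 8 accumulator lanes down to a single f32.
        let hi = _mm256_extractf128_ps::<1>(acc);
        let lo = _mm256_castps256_ps128(acc);
        let sum4 = _mm_add_ps(hi, lo);
        let sum2 = _mm_add_ps(sum4, _mm_movehl_ps(sum4, sum4));
        let sum1 = _mm_add_ss(sum2, _mm_shuffle_ps::<0b01>(sum2, sum2));
        _mm_cvtss_f32(sum1)
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2")
            && is_x86_feature_detected!("fma")
            && is_x86_feature_detected!("f16c")
        {
            // Eight weights of 1.0 (f16 bit pattern 0x3C00) times activations 0..8.
            let weights = [0x3C00u16; 8];
            let activations: Vec<f32> = (0..8).map(|i| i as f32).collect();
            let dot = unsafe { avx2_sketch::dot_f16_f32(&weights, &activations) };
            println!("dot = {dot}"); // expected: 0 + 1 + ... + 7 = 28
        }
    }
}
```

Runtime feature detection plus a `#[target_feature]` function is the usual way to ship a single binary that uses AVX2/FMA/F16C only where the host CPU supports them; a real inference loop would call a kernel like this once per output row of each weight matrix.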