From 8acb9f32b8f18c811b579b4190f7eef611b6e91a Mon Sep 17 00:00:00 2001
From: Mikko Juola
Date: Sat, 11 Mar 2023 22:55:08 -0800
Subject: [PATCH] Update README.md for new discoveries.

---
 README.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index c4269a1..7945a44 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,11 @@ figured out how to multithread this.
 I've also managed to run LLaMA-13B which just barely fits in my 64-gig machine
 with 32-bit float weights everywhere.
 
-I have not tried the bigger models yet.
+LLaMA-30B technically runs, but my computer does not have enough memory to keep
+all the weights around, so generating a token takes minutes.
+
+I have not tried LLaMA-65B, but presumably if all the smaller models work it
+would run given a sufficiently chonky computer.
 
 This uses AVX2 intrinsics to speed up itself. Therefore, you need an x86-family
 CPU to run this.
@@ -32,7 +36,8 @@ decompress it.
 $ cd LLaMA
 $ cd 7B
 $ unzip consolidated.00.pth
-# Only necessary for LLaMA-7B, rllama currently expected .00, .01, .02 etc.in directories
+# For LLaMA-7B, rename consolidated to consolidated.00
+# For the larger models, the number is already there, so this step is not needed.
 $ mv consolidated consolidated.00
 ```
 
@@ -51,3 +56,6 @@ settings.
 # Future plans
 
 This is a hobby thing for me so don't expect updates or help.
+
+* Some other CPU implementations use quantization to reduce the size of weights
+* Put some of the operations on the OpenCL GPU
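
The new "Future plans" bullet mentions quantization as a way to shrink the weights. As a rough illustration of that idea only (this is not code from rllama; the `QuantizedBlock` type and the `quantize`/`dequantize` functions are made-up names for this sketch), here is a minimal 8-bit min/max quantization of f32 weights in Rust:

```rust
/// A block of weights stored as one byte each, plus a shared scale and offset.
/// Hypothetical type for illustration; not part of rllama.
struct QuantizedBlock {
    scale: f32,   // (max - min) / 255, the step between adjacent quantized levels
    offset: f32,  // the minimum value in the block
    data: Vec<u8>,
}

/// Map each f32 weight to a u8 in [0, 255] using the block's min/max range.
fn quantize(weights: &[f32]) -> QuantizedBlock {
    let min = weights.iter().copied().fold(f32::INFINITY, f32::min);
    let max = weights.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let data = weights
        .iter()
        .map(|&w| ((w - min) / scale).round() as u8)
        .collect();
    QuantizedBlock { scale, offset: min, data }
}

/// Recover approximate f32 weights from the quantized block.
fn dequantize(block: &QuantizedBlock) -> Vec<f32> {
    block
        .data
        .iter()
        .map(|&q| q as f32 * block.scale + block.offset)
        .collect()
}

fn main() {
    let weights = [-1.5f32, -0.25, 0.0, 0.75, 2.0];
    let q = quantize(&weights);
    let approx = dequantize(&q);
    // One byte per weight instead of four, at the cost of rounding error.
    println!("{:?}", approx);
}
```

Since the README says the weights are currently 32-bit floats, storing one byte per weight plus a small per-block header would cut the memory for the weights to roughly a quarter, which is the trade-off that bullet point alludes to.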