@@ -11,7 +11,11 @@ figured out how to multithread this.
I've also managed to run LLaMA-13B, which just barely fits in my 64-gig machine
with 32-bit float weights everywhere.

I have not tried the bigger models yet.

LLaMA-30B technically runs, but my computer does not have enough memory to keep
all the weights around, so generating a token takes minutes.

I have not tried LLaMA-65B, but presumably, if all the smaller models work, it
would run given a sufficiently chonky computer.
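
To put rough numbers on this (my own back-of-the-envelope sketch, not anything
rllama computes): with 32-bit floats, every billion parameters costs about 4 GB,
so 13B lands around 48 GiB and barely fits in 64 gigs, while 30B needs over
100 GiB and has to be paged in and out.

```rust
// Back-of-the-envelope f32 weight sizes; the parameter counts are the nominal
// model names, not the exact counts.
fn main() {
    let models = [("7B", 7.0_f64), ("13B", 13.0), ("30B", 30.0), ("65B", 65.0)];
    let bytes_per_param = 4.0; // 32-bit float
    for (name, billions) in models {
        let gib = billions * 1e9 * bytes_per_param / (1024.0 * 1024.0 * 1024.0);
        println!("LLaMA-{name}: ~{gib:.0} GiB of f32 weights");
    }
}
```

This prints roughly 26, 48, 112 and 242 GiB, which matches the behaviour above:
13B squeezes into a 64-gig machine, 30B does not.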
This uses AVX2 intrinsics to speed itself up. Therefore, you need an x86-family
CPU to run it.
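
If you want to check up front whether your CPU qualifies, and to see what "AVX2
intrinsics" means in practice, here is a small stand-alone Rust sketch (my own
illustration, not rllama's actual kernels). It detects AVX2 at runtime and, if
present, adds two 8-float arrays with 256-bit vector instructions. It only
builds on x86-64, which is exactly the point of this section.

```rust
use std::arch::x86_64::*;

// Add two 8-float blocks with 256-bit vector loads, one vector add, and a store.
// Caller must make sure AVX2 is available before calling this.
#[target_feature(enable = "avx2")]
unsafe fn add8(a: &[f32; 8], b: &[f32; 8], out: &mut [f32; 8]) {
    let va = _mm256_loadu_ps(a.as_ptr());
    let vb = _mm256_loadu_ps(b.as_ptr());
    _mm256_storeu_ps(out.as_mut_ptr(), _mm256_add_ps(va, vb));
}

fn main() {
    if is_x86_feature_detected!("avx2") {
        let a = [1.0_f32; 8];
        let b = [2.0_f32; 8];
        let mut out = [0.0_f32; 8];
        unsafe { add8(&a, &b, &mut out) };
        println!("AVX2 available: {:?}", out);
    } else {
        println!("No AVX2 on this CPU; rllama will not run here.");
    }
}
```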
@@ -32,7 +36,8 @@ decompress it.
$ cd LLaMA
$ cd 7B
$ unzip consolidated.00.pth
# Only necessary for LLaMA-7B: rllama currently expects .00, .01, .02 etc. suffixes on the directories
# For LLaMA-7B, rename consolidated to consolidated.00
# For the larger models, the number is there already, so no need to do this step.
$ mv consolidated consolidated.00
```
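
To make the ".00, .01, .02" expectation concrete: after this step, each model
directory should contain sibling directories named consolidated.00,
consolidated.01, and so on (LLaMA-7B has just consolidated.00; the larger
models are split into several numbered shards). The sketch below is only an
illustration of that naming scheme, not rllama's actual loading code, and the
"LLaMA/13B" path is a placeholder for wherever you unpacked the weights.

```rust
use std::path::PathBuf;

// Collect consolidated.00, consolidated.01, ... until one is missing.
fn find_shards(model_dir: &str) -> Vec<PathBuf> {
    let mut shards = Vec::new();
    for i in 0.. {
        let dir = PathBuf::from(model_dir).join(format!("consolidated.{:02}", i));
        if !dir.is_dir() {
            break;
        }
        shards.push(dir);
    }
    shards
}

fn main() {
    let shards = find_shards("LLaMA/13B");
    println!("found {} shard directories: {:?}", shards.len(), shards);
}
```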
@@ -51,3 +56,6 @@ settings.
# Future plans

This is a hobby thing for me so don't expect updates or help.

* Some other CPU implementations use quantization to reduce the size of weights
  (see the sketch after this list for the general idea)
* Put some of the operations on the OpenCL GPU
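
For context on the quantization bullet, here is a minimal sketch of the general
idea (not a plan for how rllama would do it): storing each block of weights as
8-bit integers plus one f32 scale takes roughly a quarter of the memory of plain
f32, at some cost in precision. Real schemes (4-bit, grouped scales, etc.) are
more involved.

```rust
// Symmetric 8-bit quantization of one block of f32 weights:
// a general illustration only, not rllama code.
fn quantize_block(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quants = weights.iter().map(|w| (w / scale).round() as i8).collect();
    (quants, scale)
}

fn dequantize_block(quants: &[i8], scale: f32) -> Vec<f32> {
    quants.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let block = [0.12_f32, -0.5, 0.03, 0.9, -0.88];
    let (quants, scale) = quantize_block(&block);
    println!("quantized: {:?}, scale: {}", quants, scale);
    println!("roundtrip: {:?}", dequantize_block(&quants, scale));
}
```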