From 8acb9f32b8f18c811b579b4190f7eef611b6e91a Mon Sep 17 00:00:00 2001
From: Mikko Juola
Date: Sat, 11 Mar 2023 22:55:08 -0800
Subject: [PATCH] Update README.md for new discoveries.

---
 README.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index c4269a1..7945a44 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,11 @@ figured out how to multithread this.
 I've also managed to run LLaMA-13B which just barely fits in my 64-gig machine
 with 32-bit float weights everywhere.
 
-I have not tried the bigger models yet.
+LLaMA-30B technically runs, but my computer does not have enough memory to keep
+all the weights around, so generating a token takes minutes.
+
+I have not tried LLaMA-65B, but presumably if all the smaller models work it
+would run given a sufficiently chonky computer.
 
 This uses AVX2 intrinsics to speed up itself. Therefore, you need an x86-family
 CPU to run this.
@@ -32,7 +36,8 @@ decompress it.
 $ cd LLaMA
 $ cd 7B
 $ unzip consolidated.00.pth
-# Only necessary for LLaMA-7B, rllama currently expected .00, .01, .02 etc.in directories
+# For LLaMA-7B, rename consolidated to consolidated.00
+# For the larger models, the number is already there, so this step is not needed.
 $ mv consolidated consolidated.00
 ```
 
@@ -51,3 +56,6 @@ settings.
 # Future plans
 
 This is a hobby thing for me so don't expect updates or help.
+
+* Some other CPU implementations use quantization to reduce the size of weights
+* Put some of the operations on the OpenCL GPU
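
The new "Future plans" bullet mentions quantization as a way to shrink the weights. As a rough illustration of that idea only (this is not code from rllama; the `QuantizedBlock` type and the `quantize`/`dequantize` functions are made-up names for this sketch), here is a minimal 8-bit min/max quantization of f32 weights in Rust:

```rust
/// A block of weights stored as one byte each, plus a shared scale and offset.
/// Hypothetical type for illustration; not part of rllama.
struct QuantizedBlock {
    scale: f32,   // (max - min) / 255, the step between adjacent quantized levels
    offset: f32,  // the minimum value in the block
    data: Vec<u8>,
}

/// Map each f32 weight to a u8 in [0, 255] using the block's min/max range.
fn quantize(weights: &[f32]) -> QuantizedBlock {
    let min = weights.iter().copied().fold(f32::INFINITY, f32::min);
    let max = weights.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let data = weights
        .iter()
        .map(|&w| ((w - min) / scale).round() as u8)
        .collect();
    QuantizedBlock { scale, offset: min, data }
}

/// Recover approximate f32 weights from the quantized block.
fn dequantize(block: &QuantizedBlock) -> Vec<f32> {
    block
        .data
        .iter()
        .map(|&q| q as f32 * block.scale + block.offset)
        .collect()
}

fn main() {
    let weights = [-1.5f32, -0.25, 0.0, 0.75, 2.0];
    let q = quantize(&weights);
    let approx = dequantize(&q);
    // One byte per weight instead of four, at the cost of rounding error.
    println!("{:?}", approx);
}
```

Since the README says the weights are currently 32-bit floats, storing one byte per weight plus a small per-block header would cut the memory for the weights to roughly a quarter, which is the trade-off that bullet point alludes to.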