Update README.md for new discoveries.

broken-opencl-code
Mikko Juola 3 years ago
parent 26d5309cf7
commit 8acb9f32b8

@@ -11,7 +11,11 @@ figured out how to multithread this.
I've also managed to run LLaMA-13B which just barely fits in my 64-gig machine
with 32-bit float weights everywhere.
I have not tried the bigger models yet.
LLaMA-30B technically runs but my computer does not have enough memory to keep
all the weights around so generating a token takes minutes.
I have not tried LLaMA-65B, but presumably, if all the smaller models work, it
would run given a sufficiently chonky computer.
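Those memory figures fall out of simple arithmetic: at 32-bit float precision every parameter costs 4 bytes. A rough sketch (nominal parameter counts only; real usage is higher because activations and other state also need memory):

```python
# Rough weight-storage cost of each LLaMA size at f32 precision.
# Nominal parameter counts; ignores activations and runtime overhead.
BYTES_PER_PARAM = 4  # 32-bit float

def weight_gib(params_billions: float) -> float:
    """Approximate weight storage in GiB for a given parameter count."""
    return params_billions * 1e9 * BYTES_PER_PARAM / 2**30

for name, size in [("LLaMA-7B", 7), ("LLaMA-13B", 13),
                   ("LLaMA-30B", 30), ("LLaMA-65B", 65)]:
    print(f"{name}: ~{weight_gib(size):.0f} GiB of weights")
```

This matches the observations above: ~48 GiB for 13B just barely fits in 64 gigs, and ~112 GiB for 30B does not.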
This uses AVX2 intrinsics to speed itself up, so you need an x86-family CPU to
run it.
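On Linux you can check for AVX2 support before building, since the kernel exposes CPU flags in `/proc/cpuinfo`. A small sketch (Linux-only assumption; other OSes expose flags differently, e.g. `sysctl machdep.cpu` on macOS):

```python
# Check for AVX2 support by scanning the "flags" lines of /proc/cpuinfo.
# Linux-specific; returns False if the file is missing (non-Linux).
def has_avx2(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    try:
        with open(cpuinfo_path) as f:
            return any("avx2" in line
                       for line in f if line.startswith("flags"))
    except OSError:
        return False

print("AVX2 supported:", has_avx2())
```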
@@ -32,7 +36,8 @@ decompress it.
$ cd LLaMA
$ cd 7B
$ unzip consolidated.00.pth
# Only necessary for LLaMA-7B; rllama currently expects .00, .01, .02 etc. in directory names
# For LLaMA-7B, rename consolidated to consolidated.00
# For the larger models, the number is there already so no need to do this step.
$ mv consolidated consolidated.00
```
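As a quick sanity check after the steps above, the directory layout can be verified with a few lines (a hypothetical helper for illustration, not part of rllama):

```python
# List the numbered shard directories (consolidated.00, consolidated.01, ...)
# inside a model directory, the naming rllama expects.
import os
import re

def shard_dirs(model_dir: str) -> list:
    """Return sorted consolidated.NN entries found in model_dir."""
    pat = re.compile(r"consolidated\.\d{2}$")
    return sorted(e for e in os.listdir(model_dir) if pat.match(e))

# e.g. after the rename, shard_dirs("LLaMA/7B") should report
# ["consolidated.00"]; larger models have several shards.
```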
@@ -51,3 +56,6 @@ settings.
# Future plans
This is a hobby thing for me, so don't expect updates or help.
* Some other CPU implementations use quantization to reduce the size of weights
* Put some of the operations on the OpenCL GPU
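For illustration, here is a minimal sketch of the symmetric 8-bit quantization idea those other implementations use: store one small integer per weight plus a shared scale, shrinking f32 weights roughly 4x. This is not rllama's code, just the technique in miniature:

```python
# Symmetric 8-bit quantization sketch: map a block of f32 weights to
# int8 values in [-127, 127] plus a single f32 scale factor.
# Illustrative only; real schemes quantize per-row or per-block.

def quantize(weights):
    """Return (int8-range values, scale) for a block of floats."""
    amax = max(abs(w) for w in weights) or 1.0
    return [round(w / amax * 127) for w in weights], amax / 127

def dequantize(q, scale):
    """Recover approximate floats from quantized values."""
    return [x * scale for x in q]

q, s = quantize([0.5, -1.0, 0.25, 0.0])
print(q)  # small ints in [-127, 127]; reconstruction error <= s / 2
```

Each weight drops from 4 bytes to 1, at the cost of a bounded rounding error per block.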
