Rust+OpenCL+AVX2 implementation of LLaMA inference code

AdeonLLaMA

This is my attempt at getting the LLaMA language model running in a pure Rust CPU implementation.

As of this writing, it can run LLaMA-7B at around 1 token per second, using roughly 1.5 threads' worth of CPU, because I haven't yet properly figured out how to multithread it.

It uses AVX2 intrinsics to speed up the inner compute loops.
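For a rough idea of what that means, here is a minimal sketch of the kind of inner loop AVX2/FMA intrinsics can accelerate: a dot product over f32 slices using Rust's std::arch intrinsics. This is an illustration only, with a made-up function name (dot_avx2), not the actual code from this repository.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Dot product of two equal-length f32 slices using 8-wide AVX2 registers.
/// Caller must ensure the CPU supports AVX2 and FMA (e.g. via is_x86_feature_detected!).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = _mm256_setzero_ps();
    let chunks = a.len() / 8;
    for i in 0..chunks {
        // Load 8 floats from each slice and accumulate a*b with a fused multiply-add.
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        acc = _mm256_fmadd_ps(va, vb, acc);
    }
    // Horizontal sum of the 8 accumulator lanes.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    // Scalar tail for lengths not divisible by 8.
    for i in chunks * 8..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
```

Most of the time in transformer inference goes into matrix-vector products built from loops like this, which is why vectorizing them matters.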

How to run

You will need the LLaMA-7B weights first. Refer to https://github.com/facebookresearch/llama/

Once you have the 7B weights and the tokenizer.model that comes with them, you can make it generate tokens:

cargo run --release -- --tokenizer-model /path/to/tokenizer.model --model-path /path/to/LLaMA/7B