Inference of LLaMA models using CPU only

This repository is intended as a minimal, hackable, and readable example of loading LLaMA (arXiv) models and running inference. To download the checkpoints and tokenizer, fill in this Google form.

Setup

In a conda env with PyTorch / CUDA available, run:

pip install -r requirements.txt

Then, in this repository, run:

pip install -e .

Download

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

CPU Inference

Place tokenizer.model and tokenizer_checklist.chk into the /tokenizer folder

Place the three files of the 7B model into the /model folder; the expected layout is shown below
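For reference, the layout would look roughly like this, assuming the standard LLaMA 7B release, whose three model files are consolidated.00.pth, params.json, and checklist.chk:

```
model/
├── consolidated.00.pth
├── params.json
└── checklist.chk
tokenizer/
├── tokenizer.model
└── tokenizer_checklist.chk
```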

Run it:

python example-cpu.py
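As a rough illustration of what such a script does, here is a minimal CPU-only generation sketch. It assumes this repo's llama package keeps the upstream LLaMA example API (ModelArgs, Transformer, Tokenizer, LLaMA) while dropping the GPU/distributed requirements; paths, the prompt, and the sampling hyperparameters are illustrative. See example-cpu.py for the actual script.

```python
# Minimal CPU inference sketch. API names are assumed to follow the
# upstream LLaMA example; file paths match the layout described above.
import json

import torch

from llama import LLaMA, ModelArgs, Tokenizer, Transformer

CKPT_PATH = "model/consolidated.00.pth"      # assumed 7B checkpoint name
PARAMS_PATH = "model/params.json"
TOKENIZER_PATH = "tokenizer/tokenizer.model"

# Load the checkpoint weights directly into CPU memory.
checkpoint = torch.load(CKPT_PATH, map_location="cpu")
with open(PARAMS_PATH) as f:
    params = json.load(f)

tokenizer = Tokenizer(model_path=TOKENIZER_PATH)
model_args = ModelArgs(max_seq_len=1024, max_batch_size=1, **params)
model_args.vocab_size = tokenizer.n_words

# Build the model in default float32 on the CPU; load_state_dict converts
# the fp16 checkpoint weights on the fly (roughly 28 GB of RAM for 7B).
model = Transformer(model_args)
model.load_state_dict(checkpoint, strict=False)

generator = LLaMA(model, tokenizer)
results = generator.generate(
    ["The capital of France is"], max_gen_len=64, temperature=0.8, top_p=0.95
)
print(results[0])
```

Note that everything runs in float32 here: half precision is not well supported by many CPU ops, so trading memory for compatibility is the simpler default on a CPU-only box.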

FAQ

See FAQ.md

Model Card

See MODEL_CARD.md

License

See the LICENSE file.