Running inference on LLaMA models using only the CPU

This repository is intended as a minimal, hackable and readable example of loading LLaMA (arXiv) models and running inference with them. To download the checkpoints and tokenizer, fill out this Google form.

Setup

In a conda environment with PyTorch / CUDA available, run:

pip install -r requirements.txt

Then, from the root of this repository, run:

pip install -e .

Download

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

CPU Inference

Place tokenizer.model and tokenizer_checklist.chk into the /tokenizer folder.
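The .chk files appear to be integrity checklists. Assuming they use the md5sum format (one "<md5>  <filename>" line per file, the same format md5sum -c reads), a minimal Python sketch like the one below can confirm the files were copied intact before anything is run. The helpers md5_of and verify_checklist are hypothetical and are not part of this repository.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(checklist: Path) -> dict[str, bool]:
    """Check every '<md5>  <filename>' entry of an md5sum-style checklist.

    ASSUMPTION: the .chk file follows md5sum's output format.
    File names are resolved relative to the checklist's own folder.
    Returns a mapping of file name -> True if the file exists and its hash matches.
    """
    results: dict[str, bool] = {}
    for line in checklist.read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, name = line.split(maxsplit=1)
        name = name.rstrip().lstrip("*")  # md5sum prefixes binary-mode entries with '*'
        target = checklist.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

For example, verify_checklist(Path("tokenizer") / "tokenizer_checklist.chk") returns a mapping from each listed file name to True or False; the same call should also work on the model's checklist if it uses the same format.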

Place the three files of the 7B model into the /model folder.
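Before running the example, a quick pre-flight check can catch misplaced files. The sketch below assumes that /tokenizer and /model are folders at the repository root and, as a labeled assumption, that the three 7B files are named consolidated.00.pth, params.json and checklist.chk as in the original LLaMA release. The helper missing_files is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Files this README says to place in the tokenizer folder.
TOKENIZER_FILES = ("tokenizer.model", "tokenizer_checklist.chk")

# ASSUMPTION: names of the "three files" of the 7B model, taken from the
# original LLaMA release layout; adjust if your copy differs.
MODEL_FILES = ("consolidated.00.pth", "params.json", "checklist.chk")


def missing_files(root: Path) -> list[str]:
    """Return expected files that are absent under root/tokenizer and root/model."""
    expected = [Path("tokenizer") / name for name in TOKENIZER_FILES]
    expected += [Path("model") / name for name in MODEL_FILES]
    # as_posix() keeps the reported paths identical across operating systems.
    return [p.as_posix() for p in expected if not (root / p).is_file()]
```

Calling missing_files(Path(".")) from the repository root should return an empty list once everything is in place.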

Run it:

python example-cpu.py

FAQ

1. The download.sh script doesn't work with the default bash on Mac OS X
2. Generations are bad!
3. CUDA out-of-memory errors
4. Other languages
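Out-of-memory errors, whether in GPU memory or in system RAM on the CPU path, usually start with whether the weights fit at all. A back-of-the-envelope estimate is the parameter count times the bytes per parameter; the sketch below applies it to a nominal 7 billion parameters. It is a rough lower bound, not a measurement of this repository's actual memory usage, and weight_memory_gib is a hypothetical helper.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Rough memory needed for the model weights alone, in GiB.

    Ignores activations, the attention (KV) cache and framework overhead,
    so actual peak usage will be higher.
    """
    return n_params * bytes_per_param / 2**30


# Nominal parameter count for the 7B model (the exact figure is slightly lower).
N_PARAMS_7B = 7e9

for dtype, nbytes in (("float32", 4), ("float16/bfloat16", 2)):
    print(f"{dtype}: ~{weight_memory_gib(N_PARAMS_7B, nbytes):.1f} GiB")
# prints:
# float32: ~26.1 GiB
# float16/bfloat16: ~13.0 GiB
```

Halving the bytes per parameter halves the weight footprint, which is why the storage precision matters as much as the model size when memory is tight.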

Model Card

See MODEL_CARD.md

License

See the LICENSE file.
