Run inference on LLaMA models using CPU only
This repository is intended as a minimal, hackable, and readable example for loading LLaMA (arXiv) models and running inference. To download the checkpoints and tokenizer, fill out this Google form.
Setup
In a conda env with PyTorch / CUDA available, run
pip install -r requirements.txt
Then, in this repository:
pip install -e .
Download
magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA
CPU Inference
Place tokenizer.model and tokenizer_checklist.chk into the /tokenizer folder
Place the three files of the 7B model into the /model folder
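The placement steps above can be sketched as a short shell snippet. The copy commands are illustrative only: the source paths, and the 7B checkpoint filenames shown in the comments, are assumptions, not verified against this repository.

```shell
# Create the folders that example-cpu.py expects.
mkdir -p tokenizer model

# Copy the downloaded files into place (paths below are examples):
# cp /path/to/LLaMA/tokenizer.model tokenizer/
# cp /path/to/LLaMA/tokenizer_checklist.chk tokenizer/
# cp /path/to/LLaMA/7B/consolidated.00.pth model/   # assumed filename
# cp /path/to/LLaMA/7B/params.json model/           # assumed filename
# cp /path/to/LLaMA/7B/checklist.chk model/         # assumed filename
```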
Run it:
python example-cpu.py
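A minimal sketch of the key idea that makes CPU-only loading possible (an assumption about how example-cpu.py works, not its actual code): torch.load accepts map_location="cpu", so checkpoints saved on GPU machines can be deserialized without CUDA being present. The toy checkpoint below stands in for a real model file.

```python
import io

import torch

# Save a toy "checkpoint" to an in-memory buffer (stands in for a
# consolidated model file on disk).
buf = io.BytesIO()
torch.save({"w": torch.ones(2, 2)}, buf)
buf.seek(0)

# map_location="cpu" forces every tensor onto the CPU, even if the
# checkpoint was written from CUDA tensors.
state = torch.load(buf, map_location="cpu")
print(state["w"].device)
```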
FAQ
- 1. The download.sh script doesn't work on default bash in macOS
- 2. Generations are bad!
- 3. CUDA out-of-memory errors
- 4. Other languages
Model Card
See MODEL_CARD.md
License
See the LICENSE file.