Inference of LLaMA models using only the CPU
This repository is intended as a minimal, hackable, and readable example of loading LLaMA (arXiv) models and running inference using only the CPU. It therefore requires no video card, but you will need 64 GB of RAM (128 GB is better) and a modern processor.
Setup
In a conda env with PyTorch available (no CUDA needed), run:
pip install -r requirements.txt
Then, in this repository:
pip install -e .
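
To confirm the environment is ready before downloading the weights, a quick sanity check (CUDA being unavailable is expected and fine, since this fork runs on the CPU):

import torch

print(torch.__version__)          # any recent PyTorch build works
print(torch.cuda.is_available())  # False is expected: this fork runs on the CPU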
Download tokenizer and models
magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA
CPU Inference
Place tokenizer.model and tokenizer_checklist.chk into the /tokenizer folder
Place the three files of the 7B model into the /model folder
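
To double-check that everything landed where the script expects it, a small sketch (the 7B file names are an assumption based on the original LLaMA release; adjust if yours differ):

import os

expected = [
    "tokenizer/tokenizer.model",
    "tokenizer/tokenizer_checklist.chk",
    "model/consolidated.00.pth",   # assumed 7B file names from the original release
    "model/params.json",
    "model/checklist.chk",
]
for path in expected:
    print(path, "OK" if os.path.exists(path) else "MISSING")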
Run it:
python example-cpu.py
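
example-cpu.py does all the work; for orientation, the loading and generation steps look roughly like the sketch below. This is a sketch only, assuming the fork keeps the upstream llama API (ModelArgs, Transformer, Tokenizer, LLaMA); exact names and defaults may differ.

import json
import torch
from llama import ModelArgs, Transformer, Tokenizer, LLaMA

# Sketch: assumes this fork keeps the upstream llama loading API
torch.set_default_tensor_type(torch.FloatTensor)  # fp32 on CPU instead of fp16 on GPU

checkpoint = torch.load("model/consolidated.00.pth", map_location="cpu")
with open("model/params.json") as f:
    params = json.load(f)

tokenizer = Tokenizer(model_path="tokenizer/tokenizer.model")
model_args = ModelArgs(max_seq_len=512, max_batch_size=1, **params)
model_args.vocab_size = tokenizer.n_words

model = Transformer(model_args)
model.load_state_dict(checkpoint, strict=False)  # rope frequencies are rebuilt, not loaded

generator = LLaMA(model, tokenizer)
print(generator.generate(["The capital of France is"], max_gen_len=32)[0])

Keeping all 7B weights in fp32 takes roughly 28 GB (7B parameters x 4 bytes) before activations, which is why the RAM figures above are as high as they are.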
Model Card
See MODEL_CARD.md
License
See the LICENSE file.