CPU-only inference for LLaMA models
This repository is intended as a minimal, hackable, and readable example of loading LLaMA (arXiv) models and running inference using only the CPU. No video card is needed.
Setup
In a conda env with PyTorch available (no CUDA required), run:
pip install -r requirements.txt
Then, in this repository:
pip install -e .
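As a quick sanity check that the environment is usable (this command is just a suggestion, not part of the repo's scripts):
python -c "import torch; print(torch.__version__)"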
Download tokenizer and models
magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA
CPU Inference
Place tokenizer.model and tokenizer_checklist.chk into the /tokenizer folder
Place the three files of the 7B model into the /model folder
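After both steps, the working tree should look roughly like this. The 7B file names shown (consolidated.00.pth, params.json, checklist.chk) follow the official LLaMA release; verify them against your download:

.
├── tokenizer/
│   ├── tokenizer.model
│   └── tokenizer_checklist.chk
└── model/
    ├── consolidated.00.pth
    ├── params.json
    └── checklist.chk

The .chk files appear to be md5sum checklists, so the download can be verified with, e.g., md5sum -c tokenizer_checklist.chk inside the /tokenizer folder.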
Run it:
python example-cpu.py
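For reference, here is a minimal sketch of what a script like example-cpu.py typically does. It assumes this repo keeps the names from Meta's reference implementation (ModelArgs, Transformer, Tokenizer, LLaMA in the llama package) with the model-parallel layers replaced by plain CPU-friendly modules; the actual script, paths, and prompt may differ:

import json
import torch
from llama import ModelArgs, Transformer, Tokenizer, LLaMA  # assumed API, per Meta's reference repo

torch.set_default_dtype(torch.float32)  # fp16 is slow or unsupported on most CPUs

# map_location="cpu" is the key trick: the checkpoint loads straight into RAM,
# so no video card (and no CUDA build of PyTorch) is required.
checkpoint = torch.load("model/consolidated.00.pth", map_location="cpu")
with open("model/params.json") as f:
    params = json.load(f)

model_args = ModelArgs(max_seq_len=512, max_batch_size=1, **params)
tokenizer = Tokenizer(model_path="tokenizer/tokenizer.model")
model_args.vocab_size = tokenizer.n_words  # params.json ships with vocab_size = -1

model = Transformer(model_args)
model.load_state_dict(checkpoint, strict=False)

generator = LLaMA(model, tokenizer)
print(generator.generate(["The capital of France is"], max_gen_len=32)[0])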
Model Card
See MODEL_CARD.md
License
See the LICENSE file.