Chat with Meta's LLaMA models at home made easy

This repository is a chat example with LLaMA (arXiv) models running on a typical home PC. You will just need an NVIDIA video card and some RAM to chat with the model.

This repo is heavily based on Meta's original repo: https://github.com/facebookresearch/llama

And on Venuatu's repo: https://github.com/venuatu/llama

Examples of chats are here:

https://github.com/facebookresearch/llama/issues/162

System requirements

  • Modern enough CPU
  • NVIDIA graphics card
  • 64 GB of RAM, or better 128 GB (192 or 256 GB would be perfect)

One may run with 32 GB of RAM, but inference will be slow (limited by the speed of your swap file reads).
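If you are not sure how much memory your machine has, a quick check like the one below prints total RAM. This sketch assumes the third-party psutil package (pip install psutil), which is not part of this repo's requirements:

import psutil  # third-party: pip install psutil

ram_gb = psutil.virtual_memory().total / 1024**3
print(f"Total RAM: {ram_gb:.1f} GB")
if ram_gb < 64:
    print("Below 64 GB: expect heavy swap usage and slow inference.")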

Conda Environment Setup Example for Windows 10+

Download and install Anaconda Python (https://www.anaconda.com) and run Anaconda Prompt:

conda create -n llama python=3.10
conda activate llama
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
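After the install, it is worth confirming that PyTorch can actually see your GPU. These are standard PyTorch calls, nothing specific to this repo:

import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))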

Setup

In a conda env with PyTorch / CUDA available, run:

pip install -r requirements.txt

Then, in this repository, run:

pip install -e .

Download tokenizer and models

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

or

magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce

Prepare model

First, you need to unshard the model checkpoints into a single file. Let's do this for the 30B model.

python merge-weights.py --input_dir D:\Downloads\LLaMA --model_size 30B

In this example, D:\Downloads\LLaMA is the root folder of the downloaded torrent with the weights.

This will create a merged.pth file in the root folder of this repo.
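For intuition, unsharding roughly means loading every consolidated.*.pth shard and joining the tensors back together. The sketch below is illustrative only, not the repo's merge-weights.py: the real script knows the correct split dimension for each layer type (column- vs row-parallel), while here a single concatenation rule and an example shard path are assumed.

import glob
import torch

# Illustrative unsharding sketch; the shard path and the dim=0 rule are assumptions.
shard_paths = sorted(glob.glob(r"D:\Downloads\LLaMA\30B\consolidated.*.pth"))
shards = [torch.load(p, map_location="cpu") for p in shard_paths]

merged = {}
for key in shards[0]:
    tensors = [shard[key] for shard in shards]
    if tensors[0].dim() == 1:
        merged[key] = tensors[0]                  # replicated (e.g. norm weights): keep one copy
    else:
        merged[key] = torch.cat(tensors, dim=0)   # assumed split dimension
torch.save(merged, "merged.pth")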

Place merged.pth, together with the corresponding (torrentroot)/30B/params.json of the model, into the /model folder of this repo.

Place the (torrentroot)/tokenizer.model file into the /tokenizer folder of this repo. Now you are ready to go.

Run the chat

python example-chat.py ./model ./tokenizer/tokenizer.model
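Under the hood, a chat script like this keeps appending the conversation to a prompt and samples the reply token by token. Below is a minimal sketch of such an autoregressive loop; model, tokenizer, and their methods (encode, decode, eos_id) are hypothetical stand-ins, not this repo's exact API:

import torch

def sample_reply(model, tokenizer, prompt, max_new_tokens=256, temperature=0.8):
    # Assumed interfaces: tokenizer.encode/decode map text <-> token ids,
    # model(tokens) returns logits of shape [batch, seq_len, vocab_size].
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(torch.tensor([tokens]))[0, -1]
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1).item()
        if next_token == tokenizer.eos_id:        # stop at end-of-sequence
            break
        tokens.append(next_token)
    return tokenizer.decode(tokens)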