There is a `.cargo/config.toml` inside this repository that will enable these features if you install manually from this Git repository instead.
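For context, a `.cargo/config.toml` applies build settings automatically whenever you run `cargo` inside the repository. A hypothetical sketch of what such a file can contain (illustrative only, not necessarily this repository's actual contents):

```toml
# Hypothetical .cargo/config.toml contents; check the file in the repository
# for what it actually enables. [build] rustflags apply to every cargo build.
[build]
rustflags = ["-C", "target-cpu=native"]
```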
# How to run
## LLaMA weights
You will need Rust. Make sure you can run `cargo` from a command line. In particular, this project uses unstable features, so you need nightly Rust: check that `cargo --version` reports a nightly toolchain.

Refer to https://github.com/facebookresearch/llama/ As of now, you need to be approved to get the weights.
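If `cargo --version` does not show nightly, and you manage toolchains with `rustup` (an assumption; adjust for your setup), you can install nightly and use it for this repository only:

```shell
# Install a nightly toolchain.
rustup toolchain install nightly
# Use nightly by default inside the current directory (run in the repo root).
rustup override set nightly
# Verify: the version string should contain "nightly".
cargo --version
```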
You will need to download the LLaMA-7B weights. Once you have the 7B weights and the `tokenizer.model` file that comes with them, you need to decompress the weights. For LLaMA-7B, make sure you have these files:

```shell
* 7B/consolidated.00.pth
* 7B/params.json
* tokenizer.model
```
The `consolidated.00.pth` is actually a zip file. You need to unzip it:
```shell
$ cd LLaMA
$ cd 7B
$ unzip consolidated.00.pth
# For LLaMA-7B, rename consolidated to consolidated.00
# For the larger models, the number is there already so no need to do this step.
$ mv consolidated consolidated.00
```
If you are using a larger model like LLaMA-13B, you can skip the last step of renaming the `consolidated` directory.

You should now be ready to generate some text:

```shell
cargo run --release -- \
    --tokenizer-model /path/to/tokenizer.model \
    --model-path /path/to/LLaMA/7B \
    --param-path /path/to/LLaMA/7B/params.json \
    --prompt "The meaning of life is"
```
By default, it will use the weights in the precision they are in the source files. You can use the `--f16` command line argument to cast the largest weight matrices to float16. Using OpenCL will also cast the weight matrices to float16.

You can use `--temperature`, `--top-p` and `--top-k` to adjust token sampler settings.

There is a `--repetition-penalty` setting. 1.0 means no penalty, and the value likely should be between 0 and 1: values smaller than 1.0 penalize a token that has already occurred by scaling its logit as `x*(repetition_penalty^num_occurrences)` before `softmax()` is applied to the output probabilities.

You can also use `--prompt-file` to read the prompt from a file instead of from the command line.

## Example

Run LLaMA-7B with some weights cast to 16-bit floats:

```shell
cargo run --release -- \
    --tokenizer-model /path/to/tokenizer.model \
    --model-path /path/to/LLaMA/7B \
    --param-path /path/to/LLaMA/7B/params.json \
    --f16 \
    --prompt "The meaning of life is"
```

Use `rllama --help` to see all the options.
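To make the `--repetition-penalty` formula described above concrete, here is a minimal standalone sketch (not rllama's actual implementation; the function names are illustrative) of scaling each logit by `penalty^num_occurrences` and then applying softmax:

```rust
// Scale each token's logit by penalty^num_occurrences, where
// num_occurrences counts how often the token appears in the output so far.
fn apply_repetition_penalty(logits: &mut [f32], generated: &[usize], penalty: f32) {
    for (token, logit) in logits.iter_mut().enumerate() {
        let occurrences = generated.iter().filter(|&&t| t == token).count() as i32;
        *logit *= penalty.powi(occurrences);
    }
}

// Numerically stable softmax over the (penalized) logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    let mut logits = vec![2.0, 2.0, 1.0];
    // Token 0 was already generated twice; with penalty 0.5 its logit is
    // scaled by 0.5^2 = 0.25, making it less likely to repeat.
    apply_repetition_penalty(&mut logits, &[0, 0], 0.5);
    let probs = softmax(&logits);
    println!("{:?}", probs); // probs ≈ [0.14, 0.63, 0.23]
}
```

With penalty 1.0 the scaling factor is always 1, which is why 1.0 means no penalty.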
# How to turn on OpenCL

Use the `opencl` Cargo feature.

```shell
cargo run --release --features opencl -- \
    --tokenizer-model /path/to/tokenizer.model \
    --model-path /path/to/LLaMA/7B \
    --param-path /path/to/LLaMA/7B/params.json \
    --prompt "The meaning of life is"
```