From b4d5cf91a772d213ac5c7a7c6d75828392ccedd6 Mon Sep 17 00:00:00 2001
From: Mikko Juola
Date: Mon, 13 Mar 2023 17:45:23 -0700
Subject: [PATCH] Mention in README.md that using OpenCL does not cast weights
 to 32-bit floats.

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d2231ca..f399f56 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,8 @@ cargo run --release -- --tokenizer-model /path/to/tokenizer.model --model-path /
 ```
 
 Right now it seems to use around ~25 gigabytes of memory for 7B and around ~50
-gigabytes for 13B. Internally all weights are cast to 32-bit floats.
+gigabytes for 13B. If you don't use OpenCL, then all parameters are cast to
+32-bit floats.
 
 You can use `--temperature`, `--top-p` and `--top-k` to adjust token sampler
 settings.