From 44e0abf0f18614aaf436014022ff568d9c809970 Mon Sep 17 00:00:00 2001
From: Mikko Juola
Date: Fri, 17 Mar 2023 23:43:04 -0700
Subject: [PATCH] Clarify that the OpenCL implementations all use f16.

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index c894600..0e230fa 100644
--- a/README.md
+++ b/README.md
@@ -7,12 +7,16 @@
 https://github.com/ggerganov/ggml that could run GPT-J 6B models.
 The current performance is as follows:
 ```
+Pure Rust implementations:
+
 LLaMA-7B:  AMD Ryzen 3950X:                        552ms / token    f16    (pure Rust)
 LLaMA-7B:  AMD Ryzen 3950X:                       1008ms / token    f32    (pure Rust)
 LLaMA-13B: AMD Ryzen 3950X:                       1029ms / token    f16    (pure Rust)
 LLaMA-13B: AMD Ryzen 3950X:                       1930ms / token    f32    (pure Rust)
 LLaMA-30B: AMD Ryzen 5950X:                       2112ms / token    f16    (pure Rust)
 
+OpenCL (all use f16):
+
 LLaMA-7B:  AMD Ryzen 3950X + OpenCL GTX 3090 Ti:   247ms / token           (OpenCL on GPU)
 LLaMA-7B:  AMD Ryzen 3950X + OpenCL Ryzen 3950X:   680ms / token           (OpenCL on CPU)
 LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: