From 882ff0525429de5fb1a292a1ed31b23ea55093f3 Mon Sep 17 00:00:00 2001
From: Mikko Juola
Date: Fri, 17 Mar 2023 23:33:04 -0700
Subject: [PATCH] Update README.md for new benchmarks.

---
 README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/README.md b/README.md
index 573adc8..6d395b5 100644
--- a/README.md
+++ b/README.md
@@ -143,9 +143,20 @@ LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1226ms / token
 # commit de5dd592777b3a4f5a9e8c93c8aeef25b9294364 (15 March 2023)
 # The matrix multiplication on GPU is now much faster. It didn't have that much
 # effect overall though, but I got modest improvement on LLaMA-7B GPU.
+
 LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 247ms / token
 LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 680ms / token
 LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti:
 LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1232ms / token
 LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X: 4098ms / token
+
+# commit 3d0afcf24309f28ec540ed7645c35400a865ad6f
+# I've been focusing on making the ordinary non-OpenCL CPU implementation
+# faster and I got some gains, most importantly from multithreading.
+# There is Float16 support now, so I've added f16/f32 to these tables:
+
+LLaMA-7B: AMD Ryzen 3950X: 552ms / token f16
+LLaMA-7B: AMD Ryzen 3950X: 1008ms / token f32
+LLaMA-13B: AMD Ryzen 3950X: 1029ms / token f16
+LLaMA-13B: AMD Ryzen 3950X: 1930ms / token f32
 ```