Update README.md for new benchmarks.

master
Mikko Juola 3 years ago
parent 3d0afcf243
commit 882ff05254

@ -143,9 +143,20 @@ LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1226ms / token
# commit de5dd592777b3a4f5a9e8c93c8aeef25b9294364 (15 March 2023)
# The matrix multiplication on GPU is now much faster. It didn't have that much
# effect overall though, but I got modest improvement on LLaMA-7B GPU.
LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 247ms / token
LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 680ms / token
LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: <ran out of GPU memory>
LLaMA-13B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 1232ms / token
LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X: 4098ms / token
# commit 3d0afcf24309f28ec540ed7645c35400a865ad6f
# I've been focusing on making the ordinary non-OpenCL CPU implementation
# faster and I got some gains, most importantly from multithreading.
# There is Float16 support now, so I've added f16/f32 to these tables:
LLaMA-7B: AMD Ryzen 3950X: 552ms / token f16
LLaMA-7B: AMD Ryzen 3950X: 1008ms / token f32
LLaMA-13B: AMD Ryzen 3950X: 1029ms / token f16
LLaMA-13B: AMD Ryzen 3950X: 1930ms / token f32
```

Loading…
Cancel
Save