Update README.md on LLaMA-65B benchmark result.

master
Mikko Juola 3 years ago
parent f233f8ad8f
commit 25e3e12d9d

@@ -4,10 +4,9 @@ RLLaMA is a pure Rust implementation of [LLaMA large language model inference.](
 ## Supported features
-* Use either `f16` and `f32` weights.
+* Uses either `f16` and `f32` weights.
-* LLaMA-7B, LLaMA-13B and LLaMA-30B are all confirmed working. LLaMA-65B
-likely works but I haven't found a big enough computer to run it.
-* Multithreaded hand-optimized CPU inference
+* LLaMA-7B, LLaMA-13B, LLaMA-30B, LLaMA-65B all confirmed working
+* Hand-optimized AVX2 implementation
 * OpenCL support for GPU inference.
 ## Performance
@@ -22,6 +21,7 @@ LLaMA-7B: AMD Ryzen 3950X: 1008ms / token f32 (pure
 LLaMA-13B: AMD Ryzen 3950X: 1029ms / token f16 (pure Rust)
 LLaMA-13B: AMD Ryzen 3950X: 1930ms / token f32 (pure Rust)
 LLaMA-30B: AMD Ryzen 5950X: 2112ms / token f16 (pure Rust)
+LLaMA-65B: AMD Ryzen 5950X: 4186ms / token f16 (pure Rust)
 OpenCL (all use f16):
@@ -181,10 +181,13 @@ LLaMA-30B: AMD Ryzen 5950X + OpenCL Ryzen 5950X: 4098ms / token
 # I've been focusing on making the ordinary non-OpenCL CPU implementation
 # faster and I got some gains, most importantly from multithreading.
 # There is Float16 support now, so I've added f16/f32 to these tables:
+#
+# I also managed to run LLaMA-65B for the first time.
 LLaMA-7B: AMD Ryzen 3950X: 552ms / token f16
 LLaMA-7B: AMD Ryzen 3950X: 1008ms / token f32
 LLaMA-13B: AMD Ryzen 3950X: 1029ms / token f16
 LLaMA-13B: AMD Ryzen 3950X: 1930ms / token f32
 LLaMA-30B: AMD Ryzen 5950X: 2112ms / token f16
+LLaMA-65B: AMD Ryzen 5950X: 4186ms / token f16
``` ```
