Clarify that the OpenCL implementations all use f16.

master
Mikko Juola 3 years ago
parent 58463458ee
commit 44e0abf0f1

@@ -7,12 +7,16 @@ https://github.com/ggerganov/ggml that could run GPT-J 6B models.
 The current performance is as follows:
 ```
+Pure Rust implementations:
 LLaMA-7B: AMD Ryzen 3950X: 552ms / token f16 (pure Rust)
 LLaMA-7B: AMD Ryzen 3950X: 1008ms / token f32 (pure Rust)
 LLaMA-13B: AMD Ryzen 3950X: 1029ms / token f16 (pure Rust)
 LLaMA-13B: AMD Ryzen 3950X: 1930ms / token f32 (pure Rust)
 LLaMA-30B: AMD Ryzen 5950X: 2112ms / token f16 (pure Rust)
+OpenCL (all use f16):
 LLaMA-7B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: 247ms / token (OpenCL on GPU)
 LLaMA-7B: AMD Ryzen 3950X + OpenCL Ryzen 3950X: 680ms / token (OpenCL on CPU)
 LLaMA-13B: AMD Ryzen 3950X + OpenCL GTX 3090 Ti: <I ran out of GPU memory :(>
