From 97bb730aee68ed41a3f2398c255749e90e3f7d65 Mon Sep 17 00:00:00 2001
From: randaller
Date: Sun, 5 Mar 2023 19:24:26 +0300
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4571398..bf34554 100755
--- a/README.md
+++ b/README.md
@@ -58,7 +58,7 @@ Running the model on Windows computer equipped with 12700k, fast nvme and 128 Gb
 | Model | RAM usage fp32 | RAM usage bf16 | fp32 inference | bf16 inference |
 | ------------- | ------------- | ------------- | ------------- | ------------- |
 | 7B | 44 Gb | 22 Gb | 170 seconds | 850 seconds |
-| 13B | 77 Gb, peak to 100 Gb | 380 seconds | can't handle to wait |
+| 13B | 77 Gb, peak 100 Gb | 38 Gb | 380 seconds | can't handle to wait |
 
 ### RAM usage optimization
 By default, torch uses Float32 precision when running on CPU, which leads, for example, to 44 GB of RAM usage for the 7B model. We can use Bfloat16 precision on CPU instead, which halves RAM consumption, down to 22 GB for the 7B model, but makes inference much slower.
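
A minimal sketch of the Bfloat16 trick the patched section describes, assuming a plain `torch.load` of a LLaMA checkpoint shard on CPU; the checkpoint path and the cast-everything approach are illustrative assumptions, not this repo's actual loading code:

```python
import torch

# Illustrative only: "consolidated.00.pth" is the conventional LLaMA
# shard name, not necessarily this repo's exact path or loader.
state_dict = torch.load("consolidated.00.pth", map_location="cpu")

# Cast every floating-point tensor to Bfloat16. Per the table above,
# this roughly halves RAM usage (22 GB instead of 44 GB for the 7B
# model), at the cost of noticeably slower CPU inference.
state_dict = {
    k: v.to(torch.bfloat16) if v.is_floating_point() else v
    for k, v in state_dict.items()
}

# Have newly created tensors (activations, buffers) default to
# Bfloat16 as well, so the forward pass stays in bf16 end to end.
torch.set_default_dtype(torch.bfloat16)
```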