From ae518062fa58dea4ad920917b7f63751c47f7356 Mon Sep 17 00:00:00 2001
From: randaller
Date: Sun, 19 Mar 2023 17:18:33 +0300
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 927b36b..afa7980 100644
--- a/README.md
+++ b/README.md
@@ -201,7 +201,7 @@ One with A100 might try to set 38Gb to a GPU0 and try to inference the model com
 One with 4*A100 might wish to use: {0: "38GiB", 1: "38GiB", 2: "38GiB", 3: "38GiB", "cpu":"128GiB"}.
 
-For me, with 6Gb for 3070ti, this works three times slower against pure CPU inference.
+For me, with 7Gb for 3070ti, for 7B model, this works at the same speed as pure CPU inference.
 
 ```
 python hf-inference-cuda-example.py
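For context, the memory map discussed in the patched README lines is the kind of dictionary that Hugging Face transformers/accelerate accept via the max_memory argument. Below is a minimal sketch of how such a map could be used when loading a model; the model path and memory values are illustrative assumptions, and this is not the repo's actual hf-inference-cuda-example.py.

```python
# Minimal sketch (assumptions, not the repo's script): load a LLaMA-style HF model
# with an explicit max_memory map so accelerate splits layers between GPU 0 and CPU RAM.
# Requires the accelerate package for device_map support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b-hf"  # illustrative placeholder path

# Example map for a single GPU with ~7 GiB usable plus CPU RAM; adjust to your hardware.
max_memory = {0: "7GiB", "cpu": "64GiB"}

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",      # let accelerate decide layer placement
    max_memory=max_memory,  # cap memory per device
    torch_dtype=torch.float16,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

With layers split across a small GPU and CPU, throughput is dominated by the slower device and by host-to-device transfers, which is consistent with the README's note that a 7 GB GPU split runs at roughly the same speed as pure CPU inference for the 7B model.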