From 7266efc685d6f589789cb5c8d9a905e9a1bb8f0a Mon Sep 17 00:00:00 2001
From: randaller
Date: Sun, 5 Mar 2023 19:06:15 +0300
Subject: [PATCH] Update README.md

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index f00f6d7..9b49a76 100755
--- a/README.md
+++ b/README.md
@@ -51,6 +51,14 @@ Run the example:
 python example-cpu.py
 ```
 
+### RAM usage optimization
+By default, torch uses Float32 precision when running on the CPU, which leads to about 44 GB of RAM usage for the 7B model. We can use Bfloat16 precision on the CPU too, which halves RAM consumption, down to about 22 GB for the 7B model, but makes inference much slower.
+
+Uncomment this line in example-cpu.py to enable Bfloat16 and save memory.
+```
+# torch.set_default_dtype(torch.bfloat16)
+```
+
 ### Model Card
 See [MODEL_CARD.md](MODEL_CARD.md)
 