Run the example:
```
python example-cpu.py
```
### Some measurements
Measurements taken on a Windows machine with an Intel Core i7-12700K, a fast NVMe SSD, and 128 GB of RAM.
| Model | RAM usage fp32 | RAM usage bf16 | fp32 inference | bf16 inference |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| 7B | 44 GB | 22 GB | 170 seconds | 850 seconds |
| 13B | 77 GB, peaking at 100 GB | n/a | 380 seconds | too slow to wait for |
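If you want to reproduce these RAM figures on your own hardware, one simple approach is to poll the process's resident set size. A minimal sketch, not part of this repo; it assumes the third-party `psutil` package is installed (`pip install psutil`):
```
import os
import psutil  # third-party: pip install psutil

# Resident set size of the current Python process, in GB.
def ram_gb() -> float:
    return psutil.Process(os.getpid()).memory_info().rss / 1024**3

# Call this before and after loading the model, or from a background
# thread during inference, to catch peaks like the 100 GB one above.
print(f"RAM in use: {ram_gb():.1f} GB")
```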
### RAM usage optimization
By default, torch uses Float32 precision when running on the CPU, which leads, for example, to 44 GB of RAM usage for the 7B model. We can use Bfloat16 precision on the CPU too, which halves RAM consumption, down to 22 GB for the 7B model, but makes inference much slower.
Uncomment this line in example-cpu.py to enable Bfloat16 and save memory.
```
torch.set_default_dtype(torch.bfloat16)
```
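To see why this halves memory, compare per-element sizes before and after the switch. A small illustrative sketch (the tensor shape is arbitrary, chosen only for the demonstration):
```
import torch

# Float32 is the default on CPU: 4 bytes per element.
x = torch.ones(1024, 1024)
print(x.dtype, x.element_size())  # torch.float32 4

# After switching the default dtype, newly created tensors use
# 2 bytes per element, which is why the 7B model drops from
# ~44 GB to ~22 GB of RAM.
torch.set_default_dtype(torch.bfloat16)
y = torch.ones(1024, 1024)
print(y.dtype, y.element_size())  # torch.bfloat16 2
```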
### Model Card
