|
|
|
The trained model will be saved into the `./trained` folder. Now you can launch the inference example:
|
|
|
|
```
python hf-inference-example.py
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Bfloat16 optimization
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
To save memory, you can enable bfloat16 processing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```
# to save memory, use bfloat16 on CPU
import torch
torch.set_default_dtype(torch.bfloat16)
```
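As a quick illustration of the saving (a standalone sketch, not one of the repository's scripts): bfloat16 stores each element in 2 bytes instead of float32's 4, halving tensor memory at the cost of reduced mantissa precision.

```python
import torch

# a float32 tensor: 4 bytes per element
fp32 = torch.zeros(1024, 1024, dtype=torch.float32)

# the same tensor converted to bfloat16: 2 bytes per element
bf16 = fp32.to(torch.bfloat16)

print(fp32.element_size())  # 4
print(bf16.element_size())  # 2
```

Note that bfloat16 keeps the same 8-bit exponent as float32, so the representable range is unchanged; only precision is reduced.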
|
|
|
|
|
|
|
|
|
|
|
|
## Reference
|
|
|
|
|
|
|
|
|
LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971
|
|