Merge branch 'main' of https://github.com/BlinkDL/RWKV-LM into main

main · BlinkDL · 3 years ago · commit 6ed3a3db09

@@ -2,7 +2,7 @@
## RWKV: RNN with Transformer-level LLM Performance
RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
So it's combining the best of RNN and transformer - **great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding** (using the final hidden state).
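To make the dual-mode idea concrete, here is a minimal, self-contained sketch. It uses a toy per-channel decay recurrence (not RWKV's actual time-mixing formulas) to show that a step-by-step "RNN" evaluation and a whole-sequence parallel "GPT"-style evaluation produce the same states:

```python
import torch

# Toy illustration of the dual-mode idea (NOT RWKV's real time-mixing math):
# a per-channel exponential-decay recurrence. In RNN mode, the state at t+1
# depends only on the state at t and the new input.
torch.manual_seed(0)
T, d = 16, 8
decay = torch.rand(d)            # stand-in for a learned per-channel time-decay
x = torch.randn(T, d)            # a toy sequence of 16 "token" vectors

# RNN mode: one token at a time, carrying a single fixed-size state.
state = torch.zeros(d)
rnn_states = []
for t in range(T):
    state = decay * state + x[t]
    rnn_states.append(state)
rnn_states = torch.stack(rnn_states)

# "GPT" mode: the same recurrence evaluated for all positions at once
# (a real implementation uses a parallel-friendly form / custom CUDA kernel).
idx = torch.arange(T)
expo = (idx.unsqueeze(1) - idx.unsqueeze(0)).clamp(min=0)      # exponent t - s
mask = (idx.unsqueeze(1) >= idx.unsqueeze(0)).float()          # causal mask
w = (decay.view(1, 1, d) ** expo.unsqueeze(-1)) * mask.unsqueeze(-1)
gpt_states = torch.einsum('tsd,sd->td', w, x)

assert torch.allclose(rnn_states, gpt_states, atol=1e-5)       # identical states
```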
@@ -12,9 +12,13 @@ So it's combining the best of RNN and transformer - **great performance, fast in
![RWKV-chat](RWKV-chat.png)
**You can run RWKV on low VRAM GPUs with this pip package:** https://github.com/harrisonvanderbyl/rwkvstic
## Join our Discord: https://discord.gg/bDSBUMeFpc :)
You are welcome to join the RWKV Discord https://discord.gg/bDSBUMeFpc to build upon it. We now have plenty of potential compute (A100 40GB GPUs), thanks to Stability and EleutherAI, so if you have interesting ideas I can run them.
Twitter: https://twitter.com/BlinkDL_AI
I am training RWKV-4 14B on the Pile (final release around Feb-15-2023): https://wandb.ai/blinkdl/RWKV-v4-Pile
@@ -31,12 +35,6 @@ I am doing image experiments too (For example: https://huggingface.co/BlinkDL/cl
Smooth training - no loss spikes! (lr & bsz change around 15G tokens)
![RWKV-loss](RWKV-loss.png)
![RWKV-eval](RWKV-eval.png)
All of the trained models will be open-source. Inference is very fast (only matrix-vector multiplications, no matrix-matrix multiplications) even on CPUs, so you can even run an LLM on your phone.
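As a rough illustration of why this is cheap (a toy sketch, not code from this repo): each generated token in RNN mode only touches the weights through matrix-vector products, so per-token cost is roughly O(d²) and does not grow with context length:

```python
import numpy as np

# Toy cost illustration (not RWKV code): generating one more token in RNN mode
# needs only matrix-VECTOR products against fixed weights, independent of how
# long the context already is.
d = 1024
W = np.random.randn(d, d).astype(np.float32)   # stand-in for one layer's weights

def rnn_token_step(x_t: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Per-token work: a matrix-vector product plus cheap elementwise ops."""
    h = W @ x_t                                # (d, d) @ (d,) -> (d,)
    return 0.9 * state + 0.1 * h               # toy state update

state = np.zeros(d, dtype=np.float32)
for _ in range(5):                             # 5 tokens -> 5 small mat-vec steps
    x_t = np.random.randn(d).astype(np.float32)
    state = rnn_token_step(x_t, state)
```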
@@ -116,16 +114,26 @@ https://github.com/harrisonvanderbyl/rwkv_chatbot
https://github.com/mrsteyk/RWKV-LM-deepspeed
https://github.com/wozeparrot/tinyrwkv
https://github.com/gururise/rwkv_gradio
https://github.com/huggingface/transformers/issues/17230
https://huggingface.co/spaces/Hazzzardous/RWKV-Instruct
https://github.com/ArEnSc/Production-RWKV
https://github.com/nlpodyssey/verbaflow (in Go)
https://github.com/nlpodyssey/rwkv (in Go)
https://github.com/mrsteyk/rwkvk-rs
https://github.com/resloved/RWKV-notebooks
https://colab.research.google.com/github/harrisonvanderbyl/rwkvstic/blob/master/notebooks/chatbot.ipynb
https://github.com/Pathos14489/RWKVDistributedInference
https://github.com/AXKuhta/rwkv-onnx-dml
@@ -174,7 +182,7 @@ You will be training the "GPT" version because it's parallelizable and faster to
**Fine-tuning RWKV-4 Pile models:** use 'prepare-data.py' in https://github.com/BlinkDL/RWKV-v2-RNN-Pile/tree/main/RWKV-v3 to tokenize .txt into train.npy data. Then set EXPRESS_PILE_MODE to True in train.py, and run it.
Read the inference code in src/model.py and try using the final hidden state (.xx .aa .bb) as a faithful sentence embedding for other tasks. You should probably begin with .xx and .aa/.bb (.aa divided by .bb).
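A possible starting point, sketched under the assumption of an RNN-mode `logits, state = model.forward(token, state)` call and per-layer `.xx` / `.aa` / `.bb` state fields as named above (check src/model.py for the actual state layout and adapt the attribute access):

```python
import torch

# Sketch of the suggestion above. The forward signature and the per-layer state
# attributes (.xx, .aa, .bb) are assumptions based on the text, not a verified
# API; adjust to match src/model.py.
def sentence_embedding(model, tokens, state=None):
    for tok in tokens:                          # feed the whole sentence
        _, state = model.forward(tok, state)    # assumed RNN-mode call
    parts = []
    for layer_state in state:                   # assumed: iterable of per-layer states
        parts.append(layer_state.xx)            # start with .xx ...
        parts.append(layer_state.aa / (layer_state.bb + 1e-8))  # ... and .aa / .bb
    return torch.cat(parts)                     # one fixed-size vector per sentence

# Two sentences can then be compared with e.g.
# torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=0).
```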
Colab for fine-tuning RWKV-4 Pile models: https://colab.research.google.com/github/resloved/RWKV-notebooks/blob/master/RWKV_v4_RNN_Pile_Fine_Tuning.ipynb
@@ -411,7 +419,7 @@ Multi-task training might help too. I will try this dataset format:
[TxtFirst] [Desc of Img (txt tokens)] [Img] [img tokens]
and sometimes
[ImgFirst] [img tokens] [Txt] [Desc of Img (txt tokens)]
... the order of the imgs should be randomized in the DataLoader, and [TxtFirst] [ImgFirst] [Img] [Txt] are special tokens
and do random sampling of the full dataset. So sometimes the model will see the img tokens first and then the corresponding txt tokens, which is a [img -> txt] task. And the model will see some partial imgs and partial txts. I think a char-level LM might help the model to write correct text on images.
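A minimal sketch of how such examples could be assembled (the special-token ids here are placeholders, not values used in this repo):

```python
import random

# Sketch of the dataset format described above. [TxtFirst], [ImgFirst], [Img]
# and [Txt] are special tokens; the integer ids below are placeholders.
TXT_FIRST, IMG_FIRST, IMG, TXT = 50000, 50001, 50002, 50003

def make_example(txt_tokens, img_tokens):
    """Randomly emit [TxtFirst] txt [Img] img, or [ImgFirst] img [Txt] txt."""
    if random.random() < 0.5:
        return [TXT_FIRST] + txt_tokens + [IMG] + img_tokens   # txt -> img task
    return [IMG_FIRST] + img_tokens + [TXT] + txt_tokens       # img -> txt task

# In the DataLoader: shuffle the (txt, img) pairs, build one example per pair,
# and sample windows from the resulting stream so the model also sees partial
# images and partial captions.
example = make_example([11, 22, 33], [7, 8, 9, 10])
```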
## How to sample a large dataset (for training)

Binary files changed: one file not shown; an image updated from 220 KiB (before) to 177 KiB (after).
