diff --git a/README.md b/README.md
index 44faa8e..e0e1e28 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,8 @@ So it's combining the best of RNN and transformer - **great performance, fast in
 **RWKV chatbot**: https://github.com/BlinkDL/ChatRWKV
 
+**HF space**: https://huggingface.co/spaces/yahma/rwkv-14b
+
 ![RWKV-chat](RWKV-chat.png)
 
 **You can run RWKV on low VRAM GPUs with this pip package:** https://github.com/harrisonvanderbyl/rwkvstic
 
@@ -18,9 +20,7 @@ So it's combining the best of RNN and transformer - **great performance, fast in
 You are welcome to join the RWKV discord https://discord.gg/bDSBUMeFpc to build upon it. We have plenty of potential compute (A100 40Gs) now (thanks to Stability and EleutherAI), so if you have interesting ideas I can run them.
 
-Twitter: https://twitter.com/BlinkDL_AI
-
-I am training RWKV-4 14B on the Pile (final release around Feb-15-2023): https://wandb.ai/blinkdl/RWKV-v4-Pile
+**Twitter**: https://twitter.com/BlinkDL_AI
 
 ![RWKV-eval2](RWKV-eval2.png)
 
@@ -65,38 +65,6 @@ You can find me (BlinkDL) in the EleutherAI Discord too: https://www.eleuther.ai
 ![RWKV-demo](RWKV-demo.png)
 
-## New ideas (just to record some new ideas)
-
-I have an idea to improve tokenization. We can hardcode some channels to have meanings. Example:
-
-Channel 0 = "space"
-
-Channel 1 = "capitalize first letter"
-
-Channel 2 = "capitalize all letters"
-
-Therefore:
-
-Embedding of "abc": [0, 0, 0, x0, x1, x2 , ..]
-
-Embedding of " abc": [1, 0, 0, x0, x1, x2, ..]
-
-Embedding of " Abc": [1, 1, 0, x0, x1, x2, ..]
-
-Embedding of "ABC": [0, 0, 1, x0, x1, x2, ...]
-
-......
-
-so they will share most of the embedding. And we can rapidly compute the output probability of all variations of "abc".
-
-Note: the above method is assuming that p(" xyz") / p("xyz") is the same for any "xyz", which can be wrong.
-
-Better: define emb_space emb_capitalize_first emb_capitalize_all to be a function of emb.
-
-Maybe the Best: let 'abc' ' abc' etc. to share the last 90% of their embeddings.
-
-At this moment, all our tokenizers spend too many items to represent all variations of 'abc' ' abc' ' Abc' etc. Moreover the model cannot discover that these are actually similar if some of these variations are rare in the dataset. My method can solve this. I plan to test this in a new version of RWKV.
-
 ## Quick start
 
 Use https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v4neo (latest code, compatible with v4).
@@ -108,37 +76,41 @@ prompt = f'\nQ & A\n\nQuestion:\n{qq}\n\nDetailed Expert Answer:\n' # let the mo
 **Cool Community RWKV Projects (check them!)**:
 
-https://pypi.org/project/rwkvstic/
+https://pypi.org/project/rwkvstic/ Easy pip package (with 8-bit quantization & offload for low VRAM GPUs)
+
+https://github.com/harrisonvanderbyl/rwkv_chatbot Chatbot using rwkvstic
 
-https://github.com/harrisonvanderbyl/rwkv_chatbot
+https://github.com/hizkifw/WebChatRWKVstic WebUI (WIP)
 
-https://github.com/mrsteyk/RWKV-LM-deepspeed
+https://github.com/gururise/rwkv_gradio RWKV Gradio demo
 
-https://github.com/wozeparrot/tinyrwkv
+https://github.com/mrsteyk/RWKV-LM-deepspeed Another training fork
 
-https://github.com/gururise/rwkv_gradio
+https://github.com/Blealtan/RWKV-LM-LoRA LoRA fine-tuning
 
-https://github.com/huggingface/transformers/issues/17230
+https://github.com/wozeparrot/tinyrwkv RWKV in tinygrad (a nice, simple DL framework)
 
-https://huggingface.co/spaces/Hazzzardous/RWKV-Instruct
+https://github.com/huggingface/transformers/issues/17230 RWKV HF package (WIP)
 
-https://github.com/ArEnSc/Production-RWKV
+https://github.com/ArEnSc/Production-RWKV RWKV HF package source
 
-https://github.com/nlpodyssey/verbaflow (in Go)
+https://github.com/nlpodyssey/verbaflow RWKV in Go
 
-https://github.com/nlpodyssey/rwkv (in Go)
+https://github.com/nlpodyssey/rwkv RWKV in Go
 
-https://github.com/mrsteyk/rwkvk-rs
+https://github.com/mrsteyk/rwkvk-rs RWKV in Rust
 
-https://github.com/resloved/RWKV-notebooks
+https://github.com/imxcstar/CSharp-RWKV-V4 RWKV in C#
 
-https://colab.research.google.com/github/harrisonvanderbyl/rwkvstic/blob/master/notebooks/chatbot.ipynb
+https://github.com/resloved/RWKV-notebooks RWKV Colab notebooks
 
-https://github.com/Pathos14489/RWKVDistributedInference
+https://colab.research.google.com/github/harrisonvanderbyl/rwkvstic/blob/master/notebooks/chatbot.ipynb RWKV chatbot Colab notebook
 
-https://github.com/AXKuhta/rwkv-onnx-dml
+https://github.com/Pathos14489/RWKVDistributedInference RWKV Distributed Inference
 
-https://github.com/josephrocca/rwkv-v4-web
+https://github.com/AXKuhta/rwkv-onnx-dml RWKV ONNX export
+
+https://github.com/josephrocca/rwkv-v4-web RWKV-v4 running in the browser (simple demo, greedy decoding)
 
 ### Inference
 
@@ -202,6 +174,38 @@ ss = json.dumps({"meta": meta, "text": text}, ensure_ascii=False)
 out.write(ss + "\n")
 ```
 
+## New ideas (just to record some new ideas)
+
+I have an idea to improve tokenization. We can hardcode some channels to have meanings. Example:
+
+Channel 0 = "space"
+
+Channel 1 = "capitalize first letter"
+
+Channel 2 = "capitalize all letters"
+
+Therefore:
+
+Embedding of "abc": [0, 0, 0, x0, x1, x2, ...]
+
+Embedding of " abc": [1, 0, 0, x0, x1, x2, ...]
+
+Embedding of " Abc": [1, 1, 0, x0, x1, x2, ...]
+
+Embedding of "ABC": [0, 0, 1, x0, x1, x2, ...]
+
+......
+
+so all these variations will share most of their embedding, and we can rapidly compute the output probability of every variation of "abc".
+
+Note: the above method assumes that p(" xyz") / p("xyz") is the same for any "xyz", which can be wrong.
+
+Better: define emb_space, emb_capitalize_first, and emb_capitalize_all as functions of emb.
+
+Maybe best: let 'abc', ' abc', etc. share the last 90% of their embeddings.
+
+At this moment, all our tokenizers spend too many vocabulary slots representing the variations of 'abc', ' abc', ' Abc', etc. Moreover, the model cannot discover that these are actually similar if some of the variations are rare in the dataset. My method can solve this, as in the sketch below. I plan to test it in a new version of RWKV.
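+
+A minimal sketch of the hardcoded-channel idea above (illustrative only: the class name, sizes, and token ids are made up, and none of this is in the RWKV code yet):
+
+```python
+import torch
+import torch.nn as nn
+
+class SharedCaseEmbedding(nn.Module):
+    # channel 0 = "space", channel 1 = "capitalize first letter",
+    # channel 2 = "capitalize all letters"; the remaining d_model - 3
+    # channels are a learned embedding shared by all variations.
+    def __init__(self, n_base_tokens, d_model):
+        super().__init__()
+        self.base = nn.Embedding(n_base_tokens, d_model - 3)
+
+    def forward(self, base_id, space, cap_first, cap_all):
+        flags = torch.tensor([space, cap_first, cap_all], dtype=torch.float)
+        return torch.cat([flags, self.base(torch.tensor(base_id))])
+
+emb = SharedCaseEmbedding(n_base_tokens=50000, d_model=768)
+e1 = emb(123, 0, 0, 0)  # "abc"
+e2 = emb(123, 1, 0, 0)  # " abc"
+e3 = emb(123, 1, 1, 0)  # " Abc"
+e4 = emb(123, 0, 0, 1)  # "ABC"
+# all four variations share the last d_model - 3 channels,
+# so rare variations reuse what the common one has learned
+```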
+
 ## How it works
 
 RWKV is inspired by Apple's AFT (https://arxiv.org/abs/2105.14103).
 
@@ -397,6 +401,10 @@ I believe RWKV is performant because W is like repeatedly applying a diagonal ma
 
 Moreover it's possible to turn it into a continuous ODE (a bit similar to State Space Models). I will write about it later.
 
+## Star History
+
+[![Star History Chart](https://api.star-history.com/svg?repos=BlinkDL/RWKV-LM&type=Date)](https://star-history.com/#BlinkDL/RWKV-LM&Date)
+
 ## Multimodal ideas
 
 I have an idea for [text --> 32x32 RGB image] using a LM (transformer, RWKV, etc.). Will test it soon.
diff --git a/RWKV-eval2.png b/RWKV-eval2.png
index 6c82115..447c88d 100644
Binary files a/RWKV-eval2.png and b/RWKV-eval2.png differ
diff --git a/RWKV-loss.png b/RWKV-loss.png
index 78be1de..13e8d43 100644
Binary files a/RWKV-loss.png and b/RWKV-loss.png differ
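A quick numerical sketch of the claim quoted in the last hunk above, that W acts like repeatedly applying a diagonal matrix (a toy illustration, not the actual RWKV kernel; the sizes and names are made up). Applying diag(w) over and over is just a per-channel exponential decay, so the same state can be computed serially, like an RNN, or in parallel over time, much as in State Space Models:

```python
import torch

T, d = 5, 8                  # timesteps, channels (illustrative sizes)
w = torch.rand(d)            # per-channel decay factors in [0, 1)
x = torch.randn(T, d)        # one input vector per timestep

# Serial (RNN-like) form: state <- diag(w) @ state + x[t]
state = torch.zeros(d)
for t in range(T):
    state = w * state + x[t]

# Parallel form: state = sum over t of w^(T-1-t) * x[t]
exponents = torch.arange(T - 1, -1, -1).unsqueeze(1)   # shape (T, 1)
state_parallel = ((w.unsqueeze(0) ** exponents) * x).sum(0)

assert torch.allclose(state, state_parallel, atol=1e-5)
```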