@@ -6,6 +6,12 @@ RWKV is a RNN with Transformer-level LLM performance, which can also be directly
So it combines the best of RNN and transformer: **great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings** (using the final hidden state).
I am training RWKV-4 14B on the Pile: https://wandb.ai/blinkdl/RWKV-v4-Pile

All of the trained models will be open-source. Inference is very fast (only matrix-vector multiplications, no matrix-matrix multiplications), even on CPUs, so you can run an LLM on your phone.
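
To see why inference stays matrix-vector only: at generation time the model carries a fixed-size hidden state and processes one token per step, so every weight matrix multiplies a single vector rather than a whole sequence of tokens. Below is a minimal NumPy sketch of that pattern using a generic recurrent cell, not the actual RWKV time-mix/channel-mix layers; all names and sizes are illustrative:

```python
import numpy as np

# Illustrative sizes only (not RWKV's real dimensions).
d_model, vocab = 256, 1000

rng = np.random.default_rng(0)
W_emb = rng.standard_normal((vocab, d_model)) * 0.02    # token embeddings
W_h   = rng.standard_normal((d_model, d_model)) * 0.02  # recurrent weights
W_out = rng.standard_normal((vocab, d_model)) * 0.02    # output head

def step(token_id, h):
    """One autoregressive step: every heavy op is a matrix-vector product."""
    x = W_emb[token_id]        # (d_model,) embedding lookup
    h = np.tanh(W_h @ h + x)   # (d_model, d_model) @ (d_model,) -> vector
    logits = W_out @ h         # (vocab, d_model)   @ (d_model,) -> vector
    return logits, h

h = np.zeros(d_model)
for tok in [1, 2, 3]:          # feed a prompt one token at a time
    logits, h = step(tok, h)
# h is the final hidden state, usable as a free sentence embedding
```

Because the per-step state is a fixed-size vector, memory stays constant in sequence length (the "infinite" ctx_len above), and the final hidden state doubles as the sentence embedding mentioned earlier.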