RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).
So it combines the best of RNN and transformer - **great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding** (using the final hidden state).
Because it runs as an RNN at inference time, you can feed tokens in one chunk at a time while carrying the state along, and get the same logits as evaluating the whole sequence at once, as the sketch below shows.
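A minimal sketch of that check, using the `rwkv` pip package that ChatRWKV is built on; the model path is a placeholder for any downloaded RWKV-4 checkpoint, and the token IDs are arbitrary examples:

```python
from rwkv.model import RWKV  # pip install rwkv

# Placeholder path: point this at any downloaded RWKV-4 checkpoint.
model = RWKV(model='/path/to/RWKV-4-Pile-1B5-20220903-8040', strategy='cpu fp32')

# GPT-style (parallel) evaluation: the whole token sequence at once.
out, state = model.forward([187, 510, 1563, 310, 247], None)
print(out.detach().cpu().numpy())   # logits for the next token

# RNN-style (sequential) evaluation: feed chunks and carry the state.
out, state = model.forward([187, 510], None)
out, state = model.forward([1563], state)
out, state = model.forward([310, 247], state)
print(out.detach().cpu().numpy())   # same result as above

# The final `state` summarizes the whole sequence -- this is the
# "free sentence embedding" mentioned above.
```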

**Hugging Face space**: https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio
You are welcome to join the RWKV Discord (https://discord.gg/bDSBUMeFpc) to build upon it. We have plenty of potential compute now (A100 40GB GPUs, thanks to Stability and EleutherAI), so if you have interesting ideas I can run them.