diff --git a/README.md b/README.md
index 57981fc..8a02ea1 100644
--- a/README.md
+++ b/README.md
@@ -190,7 +190,7 @@ out.write(ss + "\n")
 
 ## Towards RWKV-5 (just to record some new ideas)
 
-### List of some ideas
+### Some ideas
 
 1. Now time decay is like 0.999^T (0.999 is learnable). Change it to something like (0.999^T + 0.1) where 0.1 is learnable too. The 0.1 part will be kept forever.
 
@@ -198,6 +198,10 @@ out.write(ss + "\n")
 
 3. Inject some trainable and interpolable positional encoding?
 
+4. Aside from 2d rotation, we can try other Lie groups such as 3d rotation ( SO(3) ). Non-abelian RWKV lol.
+
+5. RWKV might be great on analog devices (search for Analog Matrix-vector multiplication & Photonic Matrix-vector multiplication).
+
 ### Misc
 
 I have an idea to improve tokenization. We can hardcode some channels to have meanings. Example:
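
Review note on idea 1 in the first hunk (not the tokenization example that the last context line introduces): a minimal sketch of what the proposed decay would look like numerically. It assumes a single scalar decay `w` (the 0.999) and a single scalar floor `c` (the 0.1). The names `decay_weights`, `w`, and `c` are hypothetical and do not come from the RWKV code, and nothing here is trained; it only shows that the weight approaches `c` instead of 0 as T grows.

```python
def decay_weights(w, c, steps):
    """Return the weights w**t + c for t = 0 .. steps-1.

    w: decay base (the learnable 0.999 in the README text).
    c: constant floor (the learnable 0.1); c = 0 gives the current plain decay.
    Both are plain floats here; in a model they would be learnable parameters.
    """
    return [w ** t + c for t in range(steps)]


# Current scheme: the weight of a token T steps back shrinks towards 0.
plain = decay_weights(0.999, 0.0, 10001)

# Proposed scheme: the same decay plus a floor that never vanishes.
floored = decay_weights(0.999, 0.1, 10001)
```

With these values, `plain[10000]` is close to zero while `floored[10000]` stays near 0.1, which is the "kept forever" part the README describes.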