From 09132dea52a201f67251a8f802d6a5f5b4a3695b Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 27 Jun 2022 20:42:15 +0800
Subject: [PATCH] Update README.md

---
 README.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 696ddab..b98c35e 100644
--- a/README.md
+++ b/README.md
@@ -101,10 +101,6 @@ I need a better CUDA kernel to (1) pull off maxK so there's need to clamp k to 6
 
 Removing the maxK limitation will also make it easy to clean the state of a KV-V channel, by using a huge K.
 
-Namely, this is my plan:
-
-![RWKV-v3-plan](RWKV-v3-plan.png)
-
 ## Explaining the code for RWKV v2+ GPT mode
 
 Note: this is for the latest v2+ model.
@@ -203,6 +199,12 @@ return rkv
 ```
 The self.value, self.receptance matrices are all initialized to zero.
 
+## Towards RWKV-3
+
+RWKV-3 will work under FP16.
+
+![RWKV-v3-plan](RWKV-v3-plan.png)
+
 ## From GPT to RWKV-2 (the formulas)
 
 Let F[t] be the system state at t.