From 09132dea52a201f67251a8f802d6a5f5b4a3695b Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 27 Jun 2022 20:42:15 +0800
Subject: [PATCH] Update README.md

---
 README.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 696ddab..b98c35e 100644
--- a/README.md
+++ b/README.md
@@ -101,10 +101,6 @@ I need a better CUDA kernel to (1) pull off maxK so there's need to clamp k to 6
 
 Removing the maxK limitation will also make it easy to clean the state of a KV-V channel, by using a huge K.
 
-Namely, this is my plan:
-
-![RWKV-v3-plan](RWKV-v3-plan.png)
-
 ## Explaining the code for RWKV v2+ GPT mode
 
 Note: this is for the latest v2+ model.
@@ -203,6 +199,12 @@ return rkv
 ```
 The self.value, self.receptance matrices are all initialized to zero.
 
+## Towards RWKV-3
+
+RWKV-3 will work under FP16.
+
+![RWKV-v3-plan](RWKV-v3-plan.png)
+
 ## From GPT to RWKV-2 (the formulas)
 
 Let F[t] be the system state at t.