From bcd4adb7818d01712f30504f03aaa536295ccfef Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 9 Aug 2021 15:52:13 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3d22b0a..3492e2c 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@
 alt="\begin{align*}
 \end{align*}
 ">
 
-* Here R, K, V is generated by linear transforms of input.
+* Here R, K, V is generated by linear transforms of input. Basically RWKV decomposes attention into R(target) * W(src -> target) * K(src). So I call R "receptance", and sigmoid means it's in 0~1 range.
 
 * The Time-mix is similar to AFT (https://arxiv.org/abs/2105.14103). There are two differences.
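
For readers skimming the patch, here is a minimal NumPy sketch of the R(target) * W(src -> target) * K(src) decomposition the new bullet describes, using the AFT-style exp-and-normalize weighting from the linked paper (https://arxiv.org/abs/2105.14103). Everything here is an illustrative assumption, not the repository's actual code: the function name `time_mix`, the toy shapes, and the uniform causal `W` (which is learned in the real model). The exact RWKV formula is the one in the README's equation image.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def time_mix(R, W, K, V):
    """Sketch: out[t] = sigmoid(R[t]) * sum_s W[t,s] exp(K[s]) V[s] / sum_s W[t,s] exp(K[s]).

    R, K, V: (T, C) arrays, linear transforms of the input (projections not shown).
    W: (T, T) causal src -> target weights (learned in the real model).
    Illustrative only; not the repository's actual implementation.
    """
    ek = np.exp(K)                 # AFT-style positive weighting of keys
    num = W @ (ek * V)             # weighted sum over src positions, per channel
    den = W @ ek                   # normalizer over the same weights
    return sigmoid(R) * num / den  # receptance gates each channel into the 0~1 range

# Toy usage: T=4 positions, C=2 channels, uniform causal weights.
T, C = 4, 2
rng = np.random.default_rng(0)
R, K, V = (rng.standard_normal((T, C)) for _ in range(3))
W = np.tril(np.ones((T, T)))
print(time_mix(R, W, K, V).shape)  # (4, 2)
```

The sigmoid on R is what makes "receptance" a per-channel gate in the 0~1 range, as the new bullet says.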