From 64fdb610567a3d50ae6e974866ae18f0ab0769e8 Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 9 Aug 2021 19:38:45 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0568cdb..b35b9ff 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ alt="\begin{align*}
 \end{align*}
 ">

-* Here R, K, V is generated by linear transforms of input. Basically RWKV decomposes attention into R(target) * W(src -> target) * K(src). So I call R "receptance", and sigmoid means it's in 0~1 range.
+* Here R, K, V are generated by linear transforms of input, and W is parameter. Basically RWKV decomposes attention into R(target) * W(src, target) * K(src). So we can call R "receptance", and sigmoid means it's in 0~1 range.

 * The Time-mix is similar to AFT (https://arxiv.org/abs/2105.14103). There are two differences.