From 6fe2f798a1bc9b32a6f8dfd4c2087a43b86c4846 Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 31 Oct 2022 15:29:34 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 05f862e..ad4efa1 100644
--- a/README.md
+++ b/README.md
@@ -109,7 +109,7 @@ kv / k is the memory mechanism. The token with high k can be remembered for a lo
 
 The R-gate is important for performance. k = info strength of this token (to be passed to future tokens). r = whether to apply the info to this token.
 
-## RWKV-3 improvements (used in the latest 1.5B run)
+## RWKV-3 improvements
 
 Use different trainable TimeMix factors for R / K / V in SA and FF layers. Example:
 ```python