Write out the formulas for "token at pos 2" and "token at pos 3" and you will get the idea.
kv / k is the memory mechanism: a token with a high k can be remembered for a long duration, if W is close to 1 in that channel.
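A minimal sketch of that kv / k mechanism in plain Python (the function name, exponentiation of k, and the per-channel scalar W are illustrative assumptions; the real RWKV kernel adds further terms and numerical-stability tricks):

```python
import math

def kv_over_k(keys, values, W):
    """Decayed kv / k accumulation for one channel.
    num and den are both decayed by W each step, so when W is close to 1
    a token with a high k keeps dominating the output for many steps."""
    num = 0.0  # running sum of exp(k) * v, decayed by W
    den = 0.0  # running sum of exp(k), decayed by W
    outs = []
    for k, v in zip(keys, values):
        num = W * num + math.exp(k) * v
        den = W * den + math.exp(k)
        outs.append(num / den)
    return outs
```

With W = 0.99 the first token's value (high k) still dominates the output two steps later; with W = 0.1 it is quickly forgotten.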
It's also using my SmallInitEmb trick https://github.com/BlinkDL/SmallInitEmb (applicable to all transformers), and a custom CUDA kernel https://github.com/BlinkDL/RWKV-CUDA .
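The gist of SmallInitEmb is to initialize the embedding with tiny values and normalize it right away; a NumPy sketch (the exact init range and details are in the linked repo, the numbers below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 1000, 64

# tiny uniform init instead of the usual ~1/sqrt(d)-scale init (range assumed)
emb = rng.uniform(-1e-4, 1e-4, size=(vocab, d))

def layernorm(x, eps=1e-12):
    # tiny eps so the near-zero embeddings are still scaled up to unit variance
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# LayerNorm applied immediately after the embedding lookup
x = layernorm(emb[[3, 7]])
```

The idea is that the embedding starts out carrying almost no signal, and the LayerNorm rescales it, which tends to make early training more stable.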