Update README.md

main
PENG Bo 4 years ago committed by GitHub
parent 780bed4e19
commit 4b1df60e94

@@ -10,7 +10,9 @@ Hence it can be 100x faster than GPT, and 100x more VRAM friendly.
The a b c d factors work together to build a time-decay curve: u, 1, w, w^2, w^3, ...
-Write out the formulas for "token at pos 2" and "token at pos 3" and you will get the idea.
+Write out the formulas for "token at pos 2" and "token at pos 3" and you will get the idea:
+* a and b: EMAs of kv and k.
+* c and d: a and b combined with self-attention.
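
The time-decay curve described above can be checked with a small scalar sketch. The recurrence below is an assumption inferred from the wording in this hunk (it is not copied from the repository): `a` and `b` are taken to be running decayed sums of `k*v` and `k`, and `c` and `d` add the current token with the extra factor `u`. The function names, and the requirement that `k` holds positive weights, are hypothetical choices made for illustration only.

```python
# Sketch only: a scalar recurrence ASSUMED from the README wording, not the repo's code.
# k: positive per-token weights (hypothetical, e.g. an exponentiated key), v: per-token values.

def decay_weights(t, u, w):
    """Weights seen from position t for tokens t, t-1, ..., 0: u, 1, w, w^2, ..."""
    return [u] + [w ** i for i in range(t)]

def rnn_outputs(k, v, u, w):
    """Run the assumed recurrence and return c/d at every position."""
    a = b = 0.0  # decayed running sums of k*v and k over past tokens
    outs = []
    for k_t, v_t in zip(k, v):
        c = a + u * k_t * v_t  # past state plus the current token weighted by u
        d = b + u * k_t
        outs.append(c / d)
        a = w * a + k_t * v_t  # decay the past by w, then absorb the current token
        b = w * b + k_t
    return outs

def direct_output(k, v, u, w, t):
    """Same quantity at position t, written as an explicit weighted average."""
    weights = decay_weights(t, u, w)
    num = sum(wt * k[t - i] * v[t - i] for i, wt in enumerate(weights))
    den = sum(wt * k[t - i] for i, wt in enumerate(weights))
    return num / den
```

Under these assumptions, `decay_weights(2, u, w)` and `decay_weights(3, u, w)` give the "token at pos 2" and "token at pos 3" weightings, and `rnn_outputs` agrees with `direct_output` at every position, which is the sense in which the four factors build the curve u, 1, w, w^2, w^3, ...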
The model:
