BlinkDL | fcd01f8851 | no message | 4 years ago
BlinkDL | 76e241b71e | saves vocab.json, and the model every X epoch | 4 years ago
PENG Bo | 689a6a924d | Update train.py | 4 years ago
PENG Bo | 34fa2ec81b | Update README.md | 4 years ago
PENG Bo | 58bdb908f9 | Update README.md | 4 years ago
PENG Bo | 3d8d0373b4 | Update README.md | 4 years ago
BlinkDL | 710d3e34b7 | better init for RWKV | 4 years ago
BlinkDL | 619ed00e4b | misc improvement | 4 years ago
PENG Bo | a36fc09fea | Update README.md | 4 years ago
PENG Bo | a91084efa9 | Update README.md | 4 years ago
BlinkDL | 3329161ed7 | rapid convergence using ZERO initialization | 4 years ago
BlinkDL | 7f391c5758 | + RWKV tiny-attn and now it's great for ctx 1024 or 2048 | 4 years ago
PENG Bo | a9f39c112c | Update README.md | 4 years ago
PENG Bo | 8fd4601dea | Update README.md | 4 years ago
BlinkDL | 9b903db103 | Merge branch 'main' of https://github.com/BlinkDL/RWKV-LM into main | 4 years ago
BlinkDL | 8aec414db2 | no message | 4 years ago
PENG Bo | 9e959d0b8a | Update README.md | 4 years ago
BlinkDL | 4ffd8f1b76 | + new comparison | 4 years ago
PENG Bo | 04852faf04 | Update README.md | 4 years ago
BlinkDL | ad627311f4 | clean init code | 4 years ago
BlinkDL | c675b47705 | misc improvements | 4 years ago
BlinkDL | ef29f4b9e8 | fixed nan loss | 4 years ago
BlinkDL | 4fd8716976 | improve RWKV time_w initialization | 4 years ago
PENG Bo | 1ea53a2f03 | Update README.md | 4 years ago
BlinkDL | a31a3b2e92 | + MHA_shift | 4 years ago
PENG Bo | 4096fff9ee | Update README.md | 4 years ago
PENG Bo | 12ba06216d | Update README.md | 4 years ago
PENG Bo | 639de69256 | Create CITATION.cff | 4 years ago
PENG Bo | 994170685b | Update README.md | 4 years ago
BlinkDL | 3b9005ea11 | RWKV: now faster and less params | 4 years ago
BlinkDL | 546114c6a5 | still use layernorm for everything | 4 years ago
PENG Bo | c68ea168b1 | Update README.md | 4 years ago
PENG Bo | 73a63e175f | Update README.md | 4 years ago
PENG Bo | 2df321d3f4 | Update README.md | 4 years ago
PENG Bo | 6e2ba61d95 | Update README.md | 4 years ago
PENG Bo | cd9b352b45 | Update README.md | 4 years ago
PENG Bo | d2b100c2ac | Update README.md | 4 years ago
PENG Bo | 8af6289d0c | Update README.md | 4 years ago
BlinkDL | fd098b1d2e | small update | 4 years ago
PENG Bo | 3b01c8c3cf | Update README.md | 4 years ago
BlinkDL | 65eda0f915 | no message | 4 years ago
BlinkDL | 3b60c5b266 | add wandb, and rename variables | 4 years ago
BlinkDL | 440bebff1a | fixed nan in large models | 4 years ago
PENG Bo | f80ff53595 | Update README.md | 4 years ago
BlinkDL | 62e2cb06d6 | fixing nan in large models | 4 years ago
BlinkDL | d699a69169 | misc improvements | 4 years ago
BlinkDL | 6266f481da | minor changes | 4 years ago
PENG Bo | 88297e7949 | Update README.md | 4 years ago
BlinkDL | 89eab46e60 | + info | 4 years ago
BlinkDL | e9fbd9bf70 | remove layernorm -> better RWKV | 4 years ago