| Author | Commit | Message | Date |
|---|---|---|---|
| BlinkDL | 0a0eae447d | +headQK (compatible with 2022-02-15 AI-Writer) | 4 years ago |
| BlinkDL | b48aa1d430 | no message | 4 years ago |
| BlinkDL | a19be54bf5 | no message | 4 years ago |
| BlinkDL | fcd01f8851 | no message | 4 years ago |
| BlinkDL | 76e241b71e | saves vocab.json, and the model every X epoch | 4 years ago |
| PENG Bo | 689a6a924d | Update train.py | 4 years ago |
| BlinkDL | 3329161ed7 | rapid convergence using ZERO initialization | 4 years ago |
| BlinkDL | 7f391c5758 | + RWKV tiny-attn and now it's great for ctx 1024 or 2048 | 4 years ago |
| BlinkDL | ad627311f4 | clean init code | 4 years ago |
| BlinkDL | c675b47705 | misc improvements | 4 years ago |
| BlinkDL | a31a3b2e92 | + MHA_shift | 4 years ago |
| BlinkDL | fd098b1d2e | small update | 4 years ago |
| BlinkDL | 3b60c5b266 | add wandb, and rename variables | 4 years ago |
| BlinkDL | 440bebff1a | fixed nan in large models | 4 years ago |
| BlinkDL | 62e2cb06d6 | fixing nan in large models | 4 years ago |
| BlinkDL | d699a69169 | misc improvements | 4 years ago |
| BlinkDL | 6266f481da | minor changes | 4 years ago |
| BlinkDL | 89eab46e60 | + info | 4 years ago |
| BlinkDL | e9fbd9bf70 | remove layernorm -> better RWKV | 4 years ago |
| BlinkDL | 55405c57d0 | better splitting of words | 4 years ago |
| BlinkDL | 01d6972f4f | now works for word-level LM | 4 years ago |
| BlinkDL | 447eae5841 | add MHA-plus model | 4 years ago |
| BlinkDL | aa4e2a68f4 | first commit | 4 years ago |