13 Commits (a31a3b2e926129b2d261f40be08b2c58737efc3f)

Author   SHA1        Message                          Date
BlinkDL  a31a3b2e92  + MHA_shift                      4 years ago
BlinkDL  fd098b1d2e  small update                     4 years ago
BlinkDL  3b60c5b266  add wandb, and rename variables  4 years ago
BlinkDL  440bebff1a  fixed nan in large models        4 years ago
BlinkDL  62e2cb06d6  fixing nan in large models       4 years ago
BlinkDL  d699a69169  misc improvements                4 years ago
BlinkDL  6266f481da  minor changes                    4 years ago
BlinkDL  89eab46e60  + info                           4 years ago
BlinkDL  e9fbd9bf70  remove layernorm -> better RWKV  4 years ago
BlinkDL  55405c57d0  better splitting of words        4 years ago
BlinkDL  01d6972f4f  now works for word-level LM      4 years ago
BlinkDL  447eae5841  add MHA-plus model               4 years ago
BlinkDL  aa4e2a68f4  first commit                     4 years ago