From 959115a7e65dbefd9583b86e8032a94be143c15b Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 9 Aug 2021 19:25:24 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3492e2c..0568cdb 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ Moreover we multiply the final output of Time-mix layer by γ(t). The reason for
 
 * The Channel-mix is similar to GeGLU (https://arxiv.org/abs/2002.05202) with an extra R factor.
 
-* Finally, we add extra time-mixing as in (https://github.com/BlinkDL/minGPT-tuned)
+* Finally, we add extra time-mixing as in (https://github.com/BlinkDL/minGPT-tuned). You can try reducing the amt of time-mixing in upper layers of deep models.
 
 ***
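The added README sentence suggests reducing the amount of time-mixing in the upper layers of deep models. Below is a minimal pure-Python sketch of one way that could look. It assumes two things that the patch does not state. First, that "extra time-mixing" means a token shift, in which the first `shift_channels` channels of each token are replaced by the previous token's values (zeros at position 0). Second, that the amount of mixing is reduced using a linear per-layer schedule. Both `time_shift` and `shift_channels_for_layer`, along with the `max_frac`/`min_frac` defaults, are hypothetical illustrations and are not taken from the minGPT-tuned code.

```python
from typing import List

Matrix = List[List[float]]  # T rows (tokens) x C columns (channels)


def time_shift(x: Matrix, shift_channels: int) -> Matrix:
    """Hypothetical token shift: channels [0, shift_channels) of token t take
    the values of token t-1 (zeros for t = 0); other channels are unchanged."""
    out = []
    for t, row in enumerate(x):
        prev = x[t - 1] if t > 0 else [0.0] * len(row)
        out.append(prev[:shift_channels] + row[shift_channels:])
    return out


def shift_channels_for_layer(layer: int, n_layer: int, n_embd: int,
                             max_frac: float = 0.5,
                             min_frac: float = 0.125) -> int:
    """Hypothetical linear schedule: shift max_frac of the channels in the
    bottom layer (layer 0), decaying to min_frac in the top layer."""
    if n_layer == 1:
        frac = max_frac
    else:
        frac = max_frac + (min_frac - max_frac) * layer / (n_layer - 1)
    return int(round(frac * n_embd))


if __name__ == "__main__":
    n_layer, n_embd = 4, 8
    print([shift_channels_for_layer(i, n_layer, n_embd) for i in range(n_layer)])
    x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    print(time_shift(x, 2))
```

With `n_layer=4` and `n_embd=8`, the schedule shifts 4, 3, 2 and 1 channels from the bottom layer to the top. Lower layers therefore mix in more information from the previous token, while upper layers keep more of each token's own channels. A linear ramp is only one possible choice. The README leaves the exact reduction open, so treat the fractions as tunable.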