From 959115a7e65dbefd9583b86e8032a94be143c15b Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 9 Aug 2021 19:25:24 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3492e2c..0568cdb 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ Moreover we multiply the final output of Time-mix layer by γ(t). The reason for
 
 * The Channel-mix is similar to GeGLU (https://arxiv.org/abs/2002.05202) with an extra R factor.
 
-* Finally, we add extra time-mixing as in (https://github.com/BlinkDL/minGPT-tuned)
+* Finally, we add extra time-mixing as in (https://github.com/BlinkDL/minGPT-tuned). You can try reducing the amount of time-mixing in the upper layers of deep models.
 
 ***