From cd9b352b453b0eb06ad5e8af7cf4fb1748c00485 Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Fri, 13 Aug 2021 11:48:22 +0800
Subject: [PATCH] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 4e1e71a..b2fba21 100644
--- a/README.md
+++ b/README.md
@@ -44,13 +44,13 @@
 when you train a GPT, the hidden representation of a token has to accomplish two
 
 1. predict the next token. sometimes this is easy (obvious next token).
 
-2. collect info so later token can use it. this is always hard.
+2. collect all prev ctx info so later token can use it. this is always hard.
 
-the time_shifted channels can focus on (2). So we have good propagation of info. It's like some kind of residual connection.
+the time_shifted channels can focus on (2). so we have good propagation of info. it's like some kind of residual connection.
 
 ***
 
-p.s. There is a MHA_pro model in this repo with strong performance. Give it a try :)
+p.s. There is another MHA_pro model in this repo with strong performance. Give it a try :)
 
 ***
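The "time_shifted channels" the patch describes can be sketched in a few lines. This is a minimal illustration, not the repo's exact code: the assumption here is that half of each token's channels are taken from the previous token's hidden state (a one-position shift along the sequence axis), so those channels are free to carry collected context forward while the rest stay on the current token. The function name `time_shift_mix` and the 50/50 channel split are illustrative choices.

```python
import torch
import torch.nn.functional as F

def time_shift_mix(x: torch.Tensor) -> torch.Tensor:
    """Mix each token's channels with the previous token's channels.

    x: (batch, seq_len, channels). Sketch of the time-shift idea: the
    first half of the channels comes from the previous token (position 0
    is zero-padded), the second half from the current token.
    """
    B, T, C = x.shape
    # pad spec (0, 0, 1, -1): leave the channel dim alone, then shift the
    # sequence dim right by one token, dropping the last position
    shifted = F.pad(x, (0, 0, 1, -1))
    return torch.cat([shifted[..., : C // 2], x[..., C // 2:]], dim=-1)

# tiny demo: 2 sequences, 3 tokens, 4 channels
x = torch.arange(2 * 3 * 4, dtype=torch.float32).reshape(2, 3, 4)
y = time_shift_mix(x)
```

Because the shifted half of a token's representation is literally the previous token's values, information propagates one step per layer with no extra parameters, which is the "residual connection"-like behavior the commit message alludes to.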