From 73a63e175f407825040bc4017e8260e6f2ef7e58 Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Fri, 13 Aug 2021 13:56:30 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 12ef871..18ed9f3 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ when you train a GPT, the hidden representation of a token has to accomplish two
 
 the time_shifted channels can focus on (2). so we have good propagation of info. it's like some kind of residual connection.
 
-you can use time_shift in usual QKV self-attention too. when i studied the weights, i found V really likes time_shifted channel. less so for Q. makes sense if you think abt it.
+you can use time_shift in usual QKV self-attention too. when i studied the weights, i found V really likes the time_shifted channel. less so for Q. makes sense if you think abt it.
 
 ***
 
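For context, the README text touched by this patch describes mixing time-shifted channels into the hidden state before QKV self-attention. Below is a minimal PyTorch sketch of one common way to implement such a time_shift; the module name `TimeShift`, the `(batch, time, channel)` layout, and the 50/50 channel split are illustrative assumptions here, not necessarily exactly what the repo does.

```python
import torch
import torch.nn as nn

class TimeShift(nn.Module):
    """Sketch of a time_shift mixer: half of the channels at position t
    are taken from position t-1 (the 50/50 split is an assumption)."""
    def __init__(self):
        super().__init__()
        # Pads the time dimension of a (B, T, C) tensor: one zero row at the
        # front, drop the last row, so row t now holds the state from t-1.
        # This only looks backward, so it stays causal.
        self.shift = nn.ZeroPad2d((0, 0, 1, -1))

    def forward(self, x):                  # x: (B, T, C)
        C = x.size(-1)
        shifted = self.shift(x)            # (B, T, C), row t = x[t-1]
        # first half of channels come from the previous token,
        # second half stay with the current token
        return torch.cat([shifted[..., :C // 2], x[..., C // 2:]], dim=-1)

# usage sketch: apply to the hidden state before projecting to Q/K/V,
# so V (and, per the README, less so Q) can draw on the previous token
x = torch.randn(2, 10, 64)                 # (batch, time, channels)
y = TimeShift()(x)
assert y.shape == x.shape
```

Because the shift is a fixed zero-padded slice rather than a learned op, it adds essentially no parameters or compute, which fits the README's framing of it as "some kind of residual connection" carrying the previous token's information forward.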