From c103f0caa3cb8371a2e77f814ede9505265c588d Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 27 Jun 2022 10:37:00 +0800
Subject: [PATCH] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index f57cbfa..b369909 100644
--- a/README.md
+++ b/README.md
@@ -61,7 +61,7 @@ And it's also using a number of my tricks, such as:
 
 * Better initilization: I init most of the matrices to ZERO (see RWKV_Init in https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v2-RNN/src/model.py).
 
-* You can transfer some parameters from a small model to a large model, for faster and better convergence (see https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/).
+* You can transfer some parameters from a small model to a large model (note: I sort & smooth them too), for faster and better convergence (see https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/).
 
 * My CUDA kernel: https://github.com/BlinkDL/RWKV-CUDA to speedup training.
 
@@ -173,7 +173,7 @@ rwkv = self.output(rwkv) # final output projection
 
 The self.key, self.receptance, self.output matrices are all initialized to zero.
 
-The time_mix, time_decay, time_first vectors are transferred from a smaller trained model.
+The time_mix, time_decay, time_first vectors are transferred from a smaller trained model (note: I sort & smooth them too).
 
 #### The GPT mode - FFN block
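
The patch above adds the note that transferred per-channel vectors (time_mix, time_decay, time_first) are sorted and smoothed, but does not spell out how. A minimal sketch of one plausible reading, assuming "sort" means ordering the small model's channel values and "smooth" means interpolating them onto the larger channel count with a small moving average (the function name `transfer_vector` and the window size are hypothetical, not from the repository):

```python
import numpy as np

def transfer_vector(small_vec, large_size, smooth_window=3):
    """Hypothetical sketch of 'sort & smooth' parameter transfer:
    sort the small model's per-channel vector, stretch the sorted
    values onto the large model's channel grid by linear
    interpolation, then apply a short moving-average filter."""
    small_sorted = np.sort(small_vec)
    # map sorted small-model values onto the larger channel count
    x_small = np.linspace(0.0, 1.0, len(small_sorted))
    x_large = np.linspace(0.0, 1.0, large_size)
    large_vec = np.interp(x_large, x_small, small_sorted)
    # simple moving-average smoothing across channels
    kernel = np.ones(smooth_window) / smooth_window
    return np.convolve(large_vec, kernel, mode="same")
```

Sorting removes channel-order noise from the small run before the values seed the large model; the interpolation keeps the overall value distribution while matching the new width.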