diff --git a/README.md b/README.md
index 7677451..bc37b2b 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v2-RNN
 
 RWKV is inspired by Apple's AFT (https://arxiv.org/abs/2105.14103).
 
-However it's also using a number of my tricks, such as:
+And it's also using a number of my tricks, such as:
 
 * SmallInitEmb: https://github.com/BlinkDL/SmallInitEmb (applicable to all transformers) which helps the embedding quality, and stabilizes Post-LN (which is what I am using).
 
@@ -50,7 +50,7 @@ However it's also using a number of my tricks, such as:
 
 * Extra R-gate in the FFN (applicable to all transformers). I am also using reluSquared from Primer.
 
-* Better initilization: I init most of the matrices to ZERO (see RWKV_Init in https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v2-RNN/src/model.py)
+* Better initialization: I init most of the matrices to ZERO (see RWKV_Init in https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v2-RNN/src/model.py).
 
 * You can transfer some parameters from a small model to a large model, for faster and better convergence (see https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/).
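
The tricks listed in the diff are compact enough to sketch in code. First, SmallInitEmb: the embedding matrix is initialized to tiny values and immediately followed by a LayerNorm, which helps embedding quality and stabilizes Post-LN training. A minimal sketch, assuming PyTorch; the module name is illustrative and the ±1e-4 range is one plausible choice of "tiny", not necessarily the exact value used in RWKV-LM:

```python
import torch
import torch.nn as nn

class SmallInitEmb(nn.Module):
    """Tiny-init embedding followed by LayerNorm (the SmallInitEmb idea)."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        # Initialize the embedding to very small values instead of the default N(0, 1).
        nn.init.uniform_(self.emb.weight, a=-1e-4, b=1e-4)
        # Extra LayerNorm right after the embedding.
        self.ln = nn.LayerNorm(d_model)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        return self.ln(self.emb(idx))
```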
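Next, the extra R-gate in the FFN combined with reluSquared from Primer. Another minimal PyTorch sketch; the key/value/receptance layer names mirror RWKV's naming convention, but this is an illustration of the idea, not the actual channel-mix code:

```python
import torch
import torch.nn as nn

class GatedFFN(nn.Module):
    """FFN with reluSquared activation and a sigmoid R-gate on the output."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.key = nn.Linear(d_model, d_hidden, bias=False)
        self.value = nn.Linear(d_hidden, d_model, bias=False)
        self.receptance = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # reluSquared (Primer): square the ReLU output.
        k = torch.square(torch.relu(self.key(x)))
        # Extra R-gate: element-wise sigmoid gate on the FFN output.
        return torch.sigmoid(self.receptance(x)) * self.value(k)
```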
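For the zero-initialization trick, a sketch of the general idea: most matrices start at exactly zero while the rest keep their usual init. Which layers get zeroed here (the hypothetical zero_names list) is an assumption for illustration; see RWKV_Init in the linked model.py for the actual logic:

```python
import torch.nn as nn

def zero_init(model: nn.Module, zero_names=("receptance", "output")) -> None:
    """Zero-initialize selected projection matrices; leave other layers as-is."""
    for name, m in model.named_modules():
        # zero_names is a hypothetical name filter, not the RWKV_Init criterion.
        if isinstance(m, nn.Linear) and any(k in name for k in zero_names):
            nn.init.zeros_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
```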
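Finally, the small-to-large parameter transfer. One simple scheme is to copy each tensor from the small model into the leading slice of the same-named tensor in the large model; this is an illustrative sketch, not the exact recipe from the linked Reddit post:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def transfer_params(small: nn.Module, large: nn.Module) -> None:
    """Copy small-model tensors into the leading slice of matching large-model tensors."""
    large_sd = large.state_dict()
    for name, p in small.state_dict().items():
        if name in large_sd and large_sd[name].dim() == p.dim():
            target = large_sd[name]
            # Only copy when the small tensor fits inside the large one.
            if all(t >= s for t, s in zip(target.shape, p.shape)):
                target[tuple(slice(0, s) for s in p.shape)].copy_(p)
```

If the extra rows and columns start near zero (e.g. via the zero-init above), the enlarged model initially behaves much like the small one, which is one plausible reason the transfer gives faster and better convergence.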