From 8fd4601dea44e0f06076f9991b26b3e12131b4c6 Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Tue, 17 Aug 2021 22:59:57 +0800
Subject: [PATCH] Update README.md

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index c1ad5e2..b198cb5 100644
--- a/README.md
+++ b/README.md
@@ -88,3 +88,11 @@ Blue: MHA_pro (MHA with various tweaks & RWKV-type-FFN) - slow - needs more VRAM
   url          = {https://doi.org/10.5281/zenodo.5196577}
 }
 ```
+
+# Initialization
+
+We initialize RWKV carefully to get fast convergence: orthogonal matrices with proper scaling, special time_w curves, and reduced initial output weights in higher layers. See model.py for details.
+
+Some learned time_w examples:
+
+![RWKV-time-w](RWKV-time-w.png)