diff --git a/README.md b/README.md
index c1ad5e2..b198cb5 100644
--- a/README.md
+++ b/README.md
@@ -88,3 +88,11 @@ Blue: MHA_pro (MHA with various tweaks & RWKV-type-FFN) - slow - needs more VRAM
   url          = {https://doi.org/10.5281/zenodo.5196577}
 }
 ```
+
+# Initialization
+
+We use careful initialization for RWKV to get fast convergence: orthogonal matrices with proper scaling, special time_w curves, and reduced initial output weights in higher layers. See model.py for details.
+
+Some learned time_w examples:
+
+![RWKV-time-w](RWKV-time-w.png)