From 8fd4601dea44e0f06076f9991b26b3e12131b4c6 Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Tue, 17 Aug 2021 22:59:57 +0800
Subject: [PATCH] Update README.md

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index c1ad5e2..b198cb5 100644
--- a/README.md
+++ b/README.md
@@ -88,3 +88,11 @@ Blue: MHA_pro (MHA with various tweaks & RWKV-type-FFN) - slow - needs more VRAM
 url = {https://doi.org/10.5281/zenodo.5196577}
 }
 ```
+
+# Initialization
+
+We use careful initialization for RWKV to get fast convergence - orthogonal matrices with proper scaling, special time_w curves, and reduce initial output weights in higher layers. Check model.py for details.
+
+Some learned time_w examples:
+
+![RWKV-time-w](RWKV-time-w.png)
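
To make two of the ideas in the added "Initialization" section concrete, here is a minimal sketch of a scaled orthogonal matrix and a per-layer output gain that shrinks with depth. It is not the code in model.py: the function names `orthogonal_matrix` and `output_gain`, the linear schedule, and the `base` and `min_gain` values are all assumptions chosen for illustration, and it uses only the Python standard library so it runs anywhere. The time_w curves are not covered, since the patch does not describe their shape.

```python
import math
import random


def orthogonal_matrix(n, gain=1.0, seed=0):
    """Return an n x n matrix (list of rows) with orthonormal rows, scaled by `gain`."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        # Start from a random Gaussian vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Modified Gram-Schmidt: remove the components along rows found so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip a (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return [[gain * a for a in row] for row in rows]


def output_gain(layer_id, n_layers, base=1.0, min_gain=0.1):
    """Hypothetical schedule: shrink the output scale linearly from `base`
    at the first layer to `min_gain` at the last layer."""
    if n_layers == 1:
        return base
    frac = layer_id / (n_layers - 1)
    return base + (min_gain - base) * frac


if __name__ == "__main__":
    n_layers = 4
    for layer_id in range(n_layers):
        g = output_gain(layer_id, n_layers)
        w = orthogonal_matrix(8, gain=g, seed=layer_id)
        row_norm = math.sqrt(sum(a * a for a in w[0]))
        print(f"layer {layer_id}: gain={g:.2f}, first-row norm={row_norm:.2f}")
```

With orthonormal rows, the matrix preserves the norm of its input up to the factor `gain`, so lowering the gain in higher layers directly lowers how much those layers contribute at the start of training. In a PyTorch model the same effect is usually obtained with `torch.nn.init.orthogonal_` and its `gain` argument.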