From 15db7d3e141fd75753f69c6ff6e6c06e8aaba4f7 Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Mon, 16 May 2022 00:31:00 +0800
Subject: [PATCH] Update README.md

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 4bbcda1..6fbb3ae 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,11 @@ Read the inference code in https://github.com/BlinkDL/RWKV-v2-RNN-Pile/blob/main
 
 See the release for a 27M params model on enwik8 with 0.72 BPC(dev). Run run.py in https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v2-RNN :)
 
+Fine-tuning & training:
+https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v2-RNN
+
+Note: change 1e-15 to 1e-9 in https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v2-RNN/src/model.py and https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v2-RNN/src/model_run.py; other changes may also be needed. To verify the result, compare its output with that of the latest code (https://github.com/BlinkDL/RWKV-v2-RNN-Pile).
+
 ## How it works
 
 RWKV is inspired by Apple's AFT (https://arxiv.org/abs/2105.14103).