From 041227cdad28f9ffccd95eb80ba658065311d6be Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Sat, 2 Jul 2022 01:13:50 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 0f287be..ce69065 100644
--- a/README.md
+++ b/README.md
@@ -39,6 +39,8 @@ See the release here for a 27M params model on enwik8 with 0.72 BPC(dev). Run ru
 
 ### Training / Fine-tuning
 
+Colab for fine-tuning: https://colab.research.google.com/drive/1BwceyZczs5hQr1wefmCREonEWhY-zeST
+
 Training: https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v2-RNN
 
 You will be training the "GPT" version because it's parallelizable and faster to train. I find RWKV can extrapolate, so training with ctxLen 768 can work for a ctxLen of 1000+. You can fine-tune the model with a longer ctxLen and it quickly adapts to the longer context.
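
The ctxLen remark in the patched paragraph describes a workflow: pretrain at a short context, then fine-tune the same weights at a longer one. Below is a minimal, hypothetical sketch of why that is possible; it is NOT code from the RWKV-LM repo. The toy GRU model only stands in for the real RWKV "GPT" trainer to show that a recurrent model with no fixed positional-embedding table accepts a longer context unchanged.

import torch
import torch.nn as nn

class ToyRecurrentLM(nn.Module):
    """Stand-in for RWKV: per-token recurrence, no positional embeddings,
    so nothing in the weights is tied to the training ctxLen."""
    def __init__(self, vocab_size=256, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        h, _ = self.rnn(self.embed(idx))
        return self.head(h)

model = ToyRecurrentLM()

# "Pretraining" step at ctxLen 768 (one dummy next-token batch here).
short_batch = torch.randint(0, 256, (2, 768))
loss = nn.functional.cross_entropy(
    model(short_batch[:, :-1]).flatten(0, 1), short_batch[:, 1:].flatten())
loss.backward()

# Fine-tuning step at a longer ctxLen: the same weights simply see longer
# sequences, which is the quick adaptation the README describes.
long_batch = torch.randint(0, 256, (2, 1024))
loss = nn.functional.cross_entropy(
    model(long_batch[:, :-1]).flatten(0, 1), long_batch[:, 1:].flatten())
loss.backward()

In the actual repo you would instead load the pretrained checkpoint and raise the context-length setting in the RWKV-v2-RNN training script before resuming training; the exact variable names there are not shown in this patch.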