diff --git a/README.md b/README.md
index 44faa8e..e0e1e28 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,8 @@ So it's combining the best of RNN and transformer - **great performance, fast in
 **RWKV chatbot**: https://github.com/BlinkDL/ChatRWKV
 
+**HF space**: https://huggingface.co/spaces/yahma/rwkv-14b
+
 ![RWKV-chat](RWKV-chat.png)
 
 **You can run RWKV on low VRAM GPUs with this pip package:** https://github.com/harrisonvanderbyl/rwkvstic
 
@@ -18,9 +20,7 @@ So it's combining the best of RNN and transformer - **great performance, fast in
 You are welcome to join the RWKV discord https://discord.gg/bDSBUMeFpc to build upon it. We have plenty of potential compute (A100 40Gs) now (thanks to Stability and EleutherAI), so if you have interesting ideas I can run them.
 
-Twitter: https://twitter.com/BlinkDL_AI
-
-I am training RWKV-4 14B on the Pile (final release around Feb-15-2023): https://wandb.ai/blinkdl/RWKV-v4-Pile
+**Twitter**: https://twitter.com/BlinkDL_AI
 
 ![RWKV-eval2](RWKV-eval2.png)
 
@@ -65,38 +65,6 @@ You can find me (BlinkDL) in the EleutherAI Discord too: https://www.eleuther.ai
 ![RWKV-demo](RWKV-demo.png)
 
-## New ideas (just to record some new ideas)
-
-I have an idea to improve tokenization. We can hardcode some channels to have meanings. Example:
-
-Channel 0 = "space"
-
-Channel 1 = "capitalize first letter"
-
-Channel 2 = "capitalize all letters"
-
-Therefore:
-
-Embedding of "abc": [0, 0, 0, x0, x1, x2 , ..]
-
-Embedding of " abc": [1, 0, 0, x0, x1, x2, ..]
-
-Embedding of " Abc": [1, 1, 0, x0, x1, x2, ..]
-
-Embedding of "ABC": [0, 0, 1, x0, x1, x2, ...]
-
-......
-
-so they will share most of the embedding. And we can rapidly compute the output probability of all variations of "abc".
-
-Note: the above method is assuming that p(" xyz") / p("xyz") is the same for any "xyz", which can be wrong.
-
-Better: define emb_space emb_capitalize_first emb_capitalize_all to be a function of emb.
-
-Maybe the Best: let 'abc' ' abc' etc. to share the last 90% of their embeddings.
-
-At this moment, all our tokenizers spend too many items to represent all variations of 'abc' ' abc' ' Abc' etc. Moreover the model cannot discover that these are actually similar if some of these variations are rare in the dataset. My method can solve this. I plan to test this in a new version of RWKV.
-
 ## Quick start
 
 Use https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v4neo (latest code, compatible with v4).
@@ -108,37 +76,41 @@ prompt = f'\nQ & A\n\nQuestion:\n{qq}\n\nDetailed Expert Answer:\n' # let the mo
 **Cool Community RWKV Projects (check them!)**:
 
-https://pypi.org/project/rwkvstic/
+https://pypi.org/project/rwkvstic/ Easy pip package (with 8-bit quantization & offload for low VRAM GPUs)
+
+https://github.com/harrisonvanderbyl/rwkv_chatbot Chatbot using rwkvstic
 
-https://github.com/harrisonvanderbyl/rwkv_chatbot
+https://github.com/hizkifw/WebChatRWKVstic WebUI (WIP)
 
-https://github.com/mrsteyk/RWKV-LM-deepspeed
+https://github.com/gururise/rwkv_gradio RWKV Gradio demo
 
-https://github.com/wozeparrot/tinyrwkv
+https://github.com/mrsteyk/RWKV-LM-deepspeed Another training fork
 
-https://github.com/gururise/rwkv_gradio
+https://github.com/Blealtan/RWKV-LM-LoRA LoRA fine-tuning
 
-https://github.com/huggingface/transformers/issues/17230
+https://github.com/wozeparrot/tinyrwkv RWKV in tinygrad (a nice, simple DL framework)
 
-https://huggingface.co/spaces/Hazzzardous/RWKV-Instruct
+https://github.com/huggingface/transformers/issues/17230 RWKV HF package (WIP)
 
-https://github.com/ArEnSc/Production-RWKV
+https://github.com/ArEnSc/Production-RWKV RWKV HF package source
 
-https://github.com/nlpodyssey/verbaflow (in Go)
+https://github.com/nlpodyssey/verbaflow RWKV in Go
 
-https://github.com/nlpodyssey/rwkv (in Go)
+https://github.com/nlpodyssey/rwkv RWKV in Go
 
-https://github.com/mrsteyk/rwkvk-rs
+https://github.com/mrsteyk/rwkvk-rs RWKV in Rust
 
-https://github.com/resloved/RWKV-notebooks
+https://github.com/imxcstar/CSharp-RWKV-V4 RWKV in C#
 
-https://colab.research.google.com/github/harrisonvanderbyl/rwkvstic/blob/master/notebooks/chatbot.ipynb
+https://github.com/resloved/RWKV-notebooks RWKV Colab notebooks
 
-https://github.com/Pathos14489/RWKVDistributedInference
+https://colab.research.google.com/github/harrisonvanderbyl/rwkvstic/blob/master/notebooks/chatbot.ipynb RWKV chatbot Colab notebook
 
-https://github.com/AXKuhta/rwkv-onnx-dml
+https://github.com/Pathos14489/RWKVDistributedInference RWKV Distributed Inference
 
-https://github.com/josephrocca/rwkv-v4-web
+https://github.com/AXKuhta/rwkv-onnx-dml RWKV ONNX export
+
+https://github.com/josephrocca/rwkv-v4-web RWKV-v4 running in the browser (simple demo, greedy decoding)
 
 ### Inference
 
@@ -202,6 +174,38 @@ ss = json.dumps({"meta": meta, "text": text}, ensure_ascii=False)
 out.write(ss + "\n")
 ```
 
+## New ideas (just to record some new ideas)
+
+I have an idea to improve tokenization. We can hardcode some channels to have meanings. Example:
+
+Channel 0 = "space"
+
+Channel 1 = "capitalize first letter"
+
+Channel 2 = "capitalize all letters"
+
+Therefore:
+
+Embedding of "abc": [0, 0, 0, x0, x1, x2, ...]
+
+Embedding of " abc": [1, 0, 0, x0, x1, x2, ...]
+
+Embedding of " Abc": [1, 1, 0, x0, x1, x2, ...]
+
+Embedding of "ABC": [0, 0, 1, x0, x1, x2, ...]
+
+......
+
+so all these variations will share most of their embedding, and we can rapidly compute the output probability of every variation of "abc".
+
+Note: the above method assumes that p(" xyz") / p("xyz") is the same for any "xyz", which can be wrong.
+
+Better: define emb_space, emb_capitalize_first, and emb_capitalize_all as functions of emb.
+
+Maybe best: let 'abc', ' abc', etc. share the last 90% of their embeddings.
+
+At this moment, all our tokenizers spend too many vocabulary slots representing the variations of 'abc', ' abc', ' Abc', etc. Moreover, the model cannot discover that these are actually similar if some of the variations are rare in the dataset. My method can solve this, as in the sketch below. I plan to test it in a new version of RWKV.
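+
+A minimal sketch of the hardcoded-channel idea above (illustrative only: the class name, sizes, and token ids are made up, and none of this is in the RWKV code yet):
+
+```python
+import torch
+import torch.nn as nn
+
+class SharedCaseEmbedding(nn.Module):
+    # channel 0 = "space", channel 1 = "capitalize first letter",
+    # channel 2 = "capitalize all letters"; the remaining d_model - 3
+    # channels are a learned embedding shared by all variations.
+    def __init__(self, n_base_tokens, d_model):
+        super().__init__()
+        self.base = nn.Embedding(n_base_tokens, d_model - 3)
+
+    def forward(self, base_id, space, cap_first, cap_all):
+        flags = torch.tensor([space, cap_first, cap_all], dtype=torch.float)
+        return torch.cat([flags, self.base(torch.tensor(base_id))])
+
+emb = SharedCaseEmbedding(n_base_tokens=50000, d_model=768)
+e1 = emb(123, 0, 0, 0)  # "abc"
+e2 = emb(123, 1, 0, 0)  # " abc"
+e3 = emb(123, 1, 1, 0)  # " Abc"
+e4 = emb(123, 0, 0, 1)  # "ABC"
+# all four variations share the last d_model - 3 channels,
+# so rare variations reuse what the common one has learned
+```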
+
 ## How it works
 
 RWKV is inspired by Apple's AFT (https://arxiv.org/abs/2105.14103).
 
@@ -397,6 +401,10 @@ I believe RWKV is performant because W is like repeatedly applying a diagonal ma
 
 Moreover it's possible to turn it into a continuous ODE (a bit similar to State Space Models). I will write about it later.
 
+## Star History
+
+[![Star History Chart](https://api.star-history.com/svg?repos=BlinkDL/RWKV-LM&type=Date)](https://star-history.com/#BlinkDL/RWKV-LM&Date)
+
 ## Multimodal ideas
 
 I have an idea for [text --> 32x32 RGB image] using a LM (transformer, RWKV, etc.). Will test it soon.
diff --git a/RWKV-eval2.png b/RWKV-eval2.png
index 6c82115..447c88d 100644
Binary files a/RWKV-eval2.png and b/RWKV-eval2.png differ
diff --git a/RWKV-loss.png b/RWKV-loss.png
index 78be1de..13e8d43 100644
Binary files a/RWKV-loss.png and b/RWKV-loss.png differ
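A quick numerical sketch of the claim quoted in the last hunk above, that W acts like repeatedly applying a diagonal matrix (a toy illustration, not the actual RWKV kernel; the sizes and names are made up). Applying diag(w) over and over is just a per-channel exponential decay, so the same state can be computed serially, like an RNN, or in parallel over time, much as in State Space Models:

```python
import torch

T, d = 5, 8                  # timesteps, channels (illustrative sizes)
w = torch.rand(d)            # per-channel decay factors in [0, 1)
x = torch.randn(T, d)        # one input vector per timestep

# Serial (RNN-like) form: state <- diag(w) @ state + x[t]
state = torch.zeros(d)
for t in range(T):
    state = w * state + x[t]

# Parallel form: state = sum over t of w^(T-1-t) * x[t]
exponents = torch.arange(T - 1, -1, -1).unsqueeze(1)   # shape (T, 1)
state_parallel = ((w.unsqueeze(0) ** exponents) * x).sum(0)

assert torch.allclose(state, state_parallel, atol=1e-5)
```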