Update README.md

3 years ago · 78579a00d2
parent e6d9e4979a
commit 78579a00d2
1 changed file with 32 additions and 32 deletions
--- a/README.md
+++ b/README.md
@@ -65,38 +65,6 @@ You can find me (BlinkDL) in the EleutherAI Discord too: https://www.eleuther.ai

 ![RWKV-demo](RWKV-demo.png)

-## New ideas (just to record some new ideas)
-
-I have an idea to improve tokenization. We can hardcode some channels to have meanings. Example:
-
-Channel 0 = "space"
-
-Channel 1 = "capitalize first letter"
-
-Channel 2 = "capitalize all letters"
-
-Therefore:
-
-Embedding of "abc":  [0, 0, 0, x0, x1, x2 , ..]
-
-Embedding of " abc":  [1, 0, 0, x0, x1, x2, ..]
-
-Embedding of " Abc":  [1, 1, 0, x0, x1, x2, ..]
-
-Embedding of "ABC": [0, 0, 1, x0, x1, x2, ...]
-
-......
-
-so they will share most of the embedding. And we can rapidly compute the output probability of all variations of "abc".
-
-Note: the above method is assuming that p(" xyz") / p("xyz") is the same for any "xyz", which can be wrong.
-
-Better: define emb_space emb_capitalize_first emb_capitalize_all to be a function of emb.
-
-Maybe the Best: let 'abc' ' abc' etc. to share the last 90% of their embeddings.
-
-At this moment, all our tokenizers spend too many items to represent all variations of 'abc' ' abc' ' Abc' etc. Moreover the model cannot discover that these are actually similar if some of these variations are rare in the dataset. My method can solve this. I plan to test this in a new version of RWKV.
-
 ## Quick start

 Use https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v4neo (latest code, compatible with v4).
@@ -206,6 +174,38 @@ ss = json.dumps({"meta": meta, "text": text}, ensure_ascii=False)
 out.write(ss + "\n")
 ```

+## New ideas (just to record some new ideas)
+
+I have an idea to improve tokenization. We can hardcode some channels to have meanings. Example:
+
+Channel 0 = "space"
+
+Channel 1 = "capitalize first letter"
+
+Channel 2 = "capitalize all letters"
+
+Therefore:
+
+Embedding of "abc":  [0, 0, 0, x0, x1, x2 , ..]
+
+Embedding of " abc":  [1, 0, 0, x0, x1, x2, ..]
+
+Embedding of " Abc":  [1, 1, 0, x0, x1, x2, ..]
+
+Embedding of "ABC": [0, 0, 1, x0, x1, x2, ...]
+
+......
+
+so they will share most of the embedding. And we can rapidly compute the output probability of all variations of "abc".
+
+Note: the above method is assuming that p(" xyz") / p("xyz") is the same for any "xyz", which can be wrong.
+
+Better: define emb_space emb_capitalize_first emb_capitalize_all to be a function of emb.
+
+Maybe the Best: let 'abc' ' abc' etc. to share the last 90% of their embeddings.
+
+At this moment, all our tokenizers spend too many items to represent all variations of 'abc' ' abc' ' Abc' etc. Moreover the model cannot discover that these are actually similar if some of these variations are rare in the dataset. My method can solve this. I plan to test this in a new version of RWKV.
+
 ## How it works

 RWKV is inspired by Apple's AFT (https://arxiv.org/abs/2105.14103).
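
The "New ideas" section moved by this commit describes reserving a few embedding channels for "space", "capitalize first letter" and "capitalize all letters", so that variants of a word share the rest of their embedding. A minimal sketch of that layout follows. It assumes NumPy, and the helpers `split_variant` and `embed` are hypothetical names for illustration, not part of RWKV-LM. It also assumes only the variants listed in the README matter; other mixed-case tokens such as "aBc" would collapse onto the same base and are out of scope here.

```python
import numpy as np

# Channel 0 = space, channel 1 = capitalize first letter,
# channel 2 = capitalize all letters (as listed in the README).
FLAG_CHANNELS = 3


def split_variant(token):
    """Split a token such as " Abc" into (lowercase base, flag vector)."""
    has_space = token.startswith(" ")
    word = token[1:] if has_space else token
    # Assumption: a single uppercase letter counts as "capitalize first",
    # not "capitalize all".
    cap_all = len(word) > 1 and word.isupper()
    cap_first = (not cap_all) and word[:1].isupper()
    flags = np.array([has_space, cap_first, cap_all], dtype=np.float32)
    return word.lower(), flags


def embed(token, base_table):
    """Concatenate the hardcoded flag channels with the shared base vector."""
    base, flags = split_variant(token)
    return np.concatenate([flags, base_table[base]])


# Hypothetical toy table: one shared vector (x0, x1, ...) per base word.
rng = np.random.default_rng(0)
base_table = {"abc": rng.standard_normal(5).astype(np.float32)}

for tok in ["abc", " abc", " Abc", "ABC"]:
    vec = embed(tok, base_table)
    print(repr(tok), vec[:FLAG_CHANNELS], vec[FLAG_CHANNELS:])
```

All four variants share the same trailing channels and differ only in the first three, which is the sharing the section describes. As the section itself notes, fixed flag channels assume p(" xyz") / p("xyz") is the same for every "xyz"; making the flag part a function of the base embedding would relax that assumption.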