From 13c6149205542eddf2d8941f777c416b01457b7a Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Tue, 31 Jan 2023 12:04:14 +0800
Subject: [PATCH] Update README.md

---
 README.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 4790699..13ddba3 100644
--- a/README.md
+++ b/README.md
@@ -91,11 +91,13 @@ Embedding of "ABC": [0, 0, 1, x0, x1, x2, ...]
 
 so they will share most of the embedding. And we can rapidly compute the output probability of all variations of "abc".
 
-Note: the above method is assuming that p(" xyz") / p("xyz") is the same for any "xyz", which can be wrong. A better method is to define emb_space emb_capitalize_first emb_capitalize_all to be a function of emb.
+Note: the above method assumes that p(" xyz") / p("xyz") is the same for any "xyz", which can be wrong.
 
-Why I think this is better: At this moment, all our tokenizers spend too many items to represent all variations of 'abc' ' abc' ' Abc' etc. Moreover the model cannot discover that these are actually similar if some of these variations are rare in the dataset.
+Better: define emb_space, emb_capitalize_first, and emb_capitalize_all to be functions of emb.
 
-I plan to test this in a new version of RWKV.
+Maybe the best: let 'abc', ' abc', etc. share the last 90% of their embeddings.
+
+At this moment, all our tokenizers spend too many items to represent all variations of 'abc', ' abc', ' Abc', etc. Moreover, the model cannot discover that these are actually similar if some of these variations are rare in the dataset. My method can solve this. I plan to test it in a new version of RWKV.
 
 ## Quick start
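The two schemes proposed in the patch body can be sketched in a few lines of NumPy. This is a minimal illustration, not RWKV code; all names here (`base`, `W_space`, `emb_shared_abc`, the toy dimension `d`) are hypothetical. The first variant derives the `' abc'` embedding as a function of the base `'abc'` embedding (here, a small affine correction); the second keeps only the first 10% of the vector variant-specific and shares the remaining 90%.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10                       # toy embedding size (real models use hundreds+)
base = rng.normal(size=d)    # base embedding of 'abc'

# Variant 1 ("Better"): emb_space as a function of emb.
# Here the "function" is a small learned-style affine correction.
W_space = rng.normal(size=(d, d)) * 0.1
emb_space_abc = base + W_space @ base          # embedding of ' abc'

# Variant 2 ("Maybe the best"): share the last 90% of the embedding.
k = max(1, d // 10)                            # first 10% is variant-specific
private = rng.normal(size=k)
emb_shared_abc = np.concatenate([private, base[k:]])  # ' abc' shares base[k:]
```

Either way, the variants stay close to the shared base vector, so rare forms like `' Abc'` inherit most of what the model learns about `'abc'`, which is the point of the patch.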