From 13b8784502cb85b1872721998ca16f6c18a20c94 Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Wed, 15 Mar 2023 03:00:17 +0800
Subject: [PATCH] Update README.md

---
 README.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/README.md b/README.md
index 49966d5..8015a7b 100644
--- a/README.md
+++ b/README.md
@@ -214,6 +214,18 @@ out.write(ss + "\n")
 
 5. RWKV might be great on analog devices (search for Analog Matrix-vector multiplication & Photonic Matrix-vector multiplication). RNN is very hardware-friendly. SNN RWKV is straightforward. I wonder if it can be optimized for quantum computation too.
 
+### Vision Tasks
+
+1. I find it's good to add a 2D positional encoding:
+```
+self.pos_emb_x = nn.Parameter(torch.zeros((1,args.my_pos_emb,args.n_embd)))
+self.pos_emb_y = nn.Parameter(torch.zeros((args.my_pos_emb,1,args.n_embd)))
+...
+x = x + pos_emb_x + pos_emb_y
+```
+
+2. In a language model, it's best to use [tokenShift of 1 token]. However, if the image size is N x N, you can try [tokenShift of N (or N-1, or N+1) tokens], because that is like mixing [the token above the current position (or the token above the to-be-predicted position)] with [the current token]. You can try different tokenShift styles for "ATT" & "FFN", or mix different tokenShift styles - such as mixing [token A] with [token A-1] and [token A-(N-1)], etc.
+
 ### Misc
 
 I have an idea to improve tokenization. We can hardcode some channels to have meanings. Example:
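
For reference, a minimal self-contained sketch of the 2D positional encoding added in item 1 of the patch above. The module name `PosEmb2D` and the concrete sizes (16x16 patch grid, 512-dim embeddings, batch of 2) are illustrative assumptions, not values taken from the repository:

```
import torch
import torch.nn as nn

class PosEmb2D(nn.Module):
    # Illustrative sketch, not the repository code.
    def __init__(self, my_pos_emb: int, n_embd: int):
        super().__init__()
        # One learnable vector per column (x) and per row (y); their
        # broadcast sum gives a full [H, W, C] positional grid.
        self.pos_emb_x = nn.Parameter(torch.zeros((1, my_pos_emb, n_embd)))
        self.pos_emb_y = nn.Parameter(torch.zeros((my_pos_emb, 1, n_embd)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast to [H, W, C], then flatten to match the token axis of x.
        pos = (self.pos_emb_x + self.pos_emb_y).reshape(1, -1, x.shape[-1])
        return x + pos

# Usage: 16x16 grid of patch tokens in raster order, 512-dim embeddings.
emb = PosEmb2D(my_pos_emb=16, n_embd=512)
tokens = torch.randn(2, 16 * 16, 512)
out = emb(tokens)  # same shape: [2, 256, 512]
```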
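
And a hedged sketch of what a tokenShift of N tokens means for an N x N image (item 2 of the patch): with tokens in raster order, shifting the sequence by N positions lines each token up with the token directly above it, which can then be mixed with the current token. The helper `token_shift`, the interpolation line, and all shapes are assumptions for illustration, not the repository's exact implementation:

```
import torch
import torch.nn as nn

def token_shift(x: torch.Tensor, shift: int) -> torch.Tensor:
    # x: [batch, T, channels]. Shift the time axis forward by `shift`
    # positions, padding the first `shift` positions with zeros.
    return nn.functional.pad(x, (0, 0, shift, 0))[:, : x.shape[1], :]

N = 16                               # image is N x N patches (assumed)
x = torch.randn(2, N * N, 512)       # raster-ordered patch tokens
mix = torch.rand(512)                # stand-in for learned mixing weights

above = token_shift(x, N)            # token directly above each position
prev  = token_shift(x, 1)            # ordinary language-model shift of 1 token
x_att = x * mix + above * (1 - mix)  # e.g. use the "row above" style for ATT
x_ffn = x * mix + prev * (1 - mix)   # ...and the 1-token style for FFN
```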