diff --git a/README.md b/README.md
index bdcdf1f..3ad3f8e 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,10 @@ Moreover we multiply the final output of Time-mix layer by γ(t). The reason for
 
 ***
 
+p.s. There is a MHA_pro model in this repo with strong performance. Give it a try :)
+
+***
+
 We also propose a new sampling method (as in src/utils.py):
 
 (1) Find the max probability p_max after softmax.