From 2bc3912594c341a2570aa2c9a50ee655f117cb3a Mon Sep 17 00:00:00 2001
From: PENG Bo <33809201+BlinkDL@users.noreply.github.com>
Date: Wed, 13 Apr 2022 16:06:12 +0800
Subject: [PATCH] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 53343fa..aceca13 100644
--- a/README.md
+++ b/README.md
@@ -32,12 +32,12 @@ The pseudocode (execution from top to bottom):
 
 I propose a simple new method to find better LR schedules. The method is cost-efficient and practical for large LMs. The takeaway is we can model the loss curve dynamics (phenomenology) w.r.t. the LR, and a nice closed-form LR curve can be directly computed from it using variantional method. Moreover we can predict the final loss with reasonable accuracy.
 
-![better_lr_schedule](Research/better_lr_schedule.png)
-
-UPDATE: In "Conclusion 1.", remember to use the best-fitting regime (ignore the initial steps where our approximations break down) to fit the parameters.
+UPDATE: In "Conclusion 1.", remember to use the best-fitting regime (ignore the initial steps where our approximations break down) to fit the parameters.
 Try this: exponential LR decay from lr to 0.2 * lr in 3 hrs.
 In the last three plots, black = predicted loss curve of the new LR schedule, blue = original (unoptimized) real loss curve, orange = new LR schedule.
 
+![better_lr_schedule](Research/better_lr_schedule.png)
+
 # The top-p-x sampling method
 
 We propose a new sampling method called top-p-x: