Update README.md

main
PENG Bo 4 years ago committed by GitHub
parent 31a0f3944c
commit 2bc3912594
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -32,12 +32,12 @@ The pseudocode (execution from top to bottom):
I propose a simple new method to find better LR schedules. The method is cost-efficient and practical for large LMs. The takeaway is that we can model the loss curve dynamics (phenomenology) w.r.t. the LR, and a nice closed-form LR curve can be computed directly from that model using a variational method. Moreover, we can predict the final loss with reasonable accuracy.
UPDATE: In "Conclusion 1.", remember to use the best-fitting regime (ignore the initial steps where our approximations break down) to fit the parameters. Try this: exponential LR decay from lr to 0.2 * lr in 3 hrs.
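The "exponential LR decay from lr to 0.2 * lr in 3 hrs" suggestion can be sketched as a small wall-clock schedule. This is only an illustration, not code from this repository: the function name `exp_decay_lr`, its parameters, and the choice to hold the LR at the floor once the 3 hours are over are all assumptions.

```python
import math


def exp_decay_lr(lr_init, elapsed_hours, total_hours=3.0, final_ratio=0.2):
    """Decay lr_init exponentially to final_ratio * lr_init over total_hours.

    Hypothetical helper: the name, the arguments, and the clamping behaviour
    are assumptions made for this sketch.
    """
    # Fraction of the decay window that has elapsed, clamped to [0, 1] so the
    # LR stays at the floor after the window ends (an assumption).
    progress = min(max(elapsed_hours / total_hours, 0.0), 1.0)
    # lr(t) = lr_init * final_ratio ** progress, written via exp/log.
    return lr_init * math.exp(math.log(final_ratio) * progress)


if __name__ == "__main__":
    lr_init = 1e-3  # placeholder value, not taken from the README
    for hours in (0.0, 1.0, 2.0, 3.0):
        print(f"{hours:.1f} h -> lr = {exp_decay_lr(lr_init, hours):.6g}")
```

In a training loop one would measure the elapsed time since the start of the run and assign the returned value to the optimizer's learning rate at each step.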
In the last three plots, black = predicted loss curve of the new LR schedule, blue = original (unoptimized) real loss curve, orange = new LR schedule.
![better_lr_schedule](Research/better_lr_schedule.png)
# The top-p-x sampling method
We propose a new sampling method called top-p-x:
