Cleanup and Removal of Deprecated Optimizers #1367
Koratahiu wants to merge 1 commit into Nerogar:master from
Conversation
Any idea why it could have been added in the first place?
I can see a mention from Nero of "Adam being the default" from nearly 3 years ago. Maybe that's why it was added, for completeness' sake? (Speculation.) Here are some quotes from Nero, madmen, and surgo from 2 years ago.
Even then, Adam (without the W) was regarded as bad. Safe to remove, imo. Do we need a migration for this, so that AdamW becomes the default after these are removed?
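The migration question above could be handled with a small remap when loading saved configs. A minimal sketch, assuming hypothetical names (`migrate_optimizer` and the `"optimizer"` config key are illustrative, not OneTrainer's actual code):

```python
# Hypothetical sketch: when an old config still names a removed
# optimizer, fall back to AdamW as the new default so the config
# keeps loading. Names here are assumptions, not the PR's code.
def migrate_optimizer(config: dict, removed: set) -> dict:
    if config.get("optimizer") in removed:
        config["optimizer"] = "ADAMW"  # proposed default after removal
    return config
```

Usage would be a single call during config deserialization, e.g. `migrate_optimizer(loaded_config, {"ADAM", "DADAPT_ADAM"})`.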
It’s true that it has a unique geometry (L^2 norm) with rotational invariance, which is optimal for embeddings like CLIP tokens.
Unrelated, but theoretically, we should apply weight decay oppositely to how it's implemented in the original Adam (scaling it by the inverse square root of the second moment).

Summary of Changes
This PR removes the following optimizers and their associated configurations, UI elements, and logic:
- DADAPT_ADA_GRAD, DADAPT_ADAM, DADAPT_ADAN, DADAPT_LION, and DADAPT_SGD (superseded by Prodigy)
- ADAGRAD and ADAGRAD_8BIT (very outdated and unstable)
- TIGER and YOGI (Tiger is SignSGD with tweaked momentum, and YOGI, what's that?)

To Consider
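A side note on the Tiger characterization in the list of removals: "SignSGD with tweaked momentum" means the update keeps an exponential moving average of the gradient and steps by its sign only. A rough single-scalar sketch (the beta constant is illustrative, not the removed implementation):

```python
# Sketch of a Tiger-style signed update: gradient EMA, then a step
# of fixed magnitude in the direction of its sign, plus weight decay.
def tiger_step(w, g, m, lr=1e-3, beta=0.965, wd=0.01):
    m = beta * m + (1 - beta) * g   # gradient EMA (the "momentum")
    sign = (m > 0) - (m < 0)        # elementwise sign, here for a scalar
    w = w - lr * (sign + wd * w)    # signed step with decoupled decay
    return w, m
```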