🚀 The feature, motivation and pitch
At present, ExecuTorch supports only SGD as an optimizer, with Adam listed as planned.
Given that AdamW is the default optimizer used in most modern PyTorch training workflows, is AdamW support also expected as part of the ExecuTorch optimizer roadmap?
Are there any known design or semantic considerations in existing AdamW implementations, such as decoupled weight decay (as opposed to Adam's L2-regularization-style decay) or per-parameter optimizer state handling, that would influence how AdamW support is approached in ExecuTorch?
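For context on the weight-decay question: the defining difference between AdamW and Adam-with-L2 is that AdamW applies decay directly to the parameter, so it never enters the moment estimates. A minimal scalar sketch of one AdamW step (illustrative only, not ExecuTorch code; the function name and signature are my own):

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW step on a scalar parameter (illustrative sketch).

    The weight-decay term is decoupled: it shrinks the parameter
    directly and is never folded into the gradient, so the moment
    estimates m and v see only the raw gradient. With Adam + L2,
    the decay term would instead be added to `grad` before the
    moment updates.
    """
    # Decoupled weight decay (the defining difference from Adam + L2).
    p = p - lr * weight_decay * p
    # Standard Adam first/second moment updates on the raw gradient.
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad * grad
    # Bias correction for step t (1-indexed).
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    # Adaptive update.
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

Note the optimizer-state implication: each parameter carries two moment buffers (m, v) plus a step count, which is the state-handling aspect the question refers to.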
Thanks!
Alternatives
No response
Additional context
No response
RFC (Optional)
No response