As of now, SGD is the only optimizer ExecuTorch supports, with Adam listed as planned.
Given that AdamW is the default optimizer used in most modern PyTorch training workflows, is AdamW support also expected as part of the ExecuTorch optimizer roadmap?
Are there any known design or semantic considerations in existing AdamW implementations (e.g., weight decay behavior or optimizer state handling) that influence how AdamW support is approached in ExecuTorch?
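For context on the weight decay question: the usual distinction is that Adam folds an L2 penalty into the gradient (so decay is rescaled by the adaptive second-moment estimate), while AdamW applies decoupled weight decay directly to the parameters. Below is a minimal scalar sketch of that difference, written from the standard update rules rather than any ExecuTorch code; the function names are my own.

```python
import math

def adam_l2_step(p, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, weight_decay=1e-2):
    # Adam with L2 regularization: decay is folded into the gradient,
    # so it passes through the adaptive moment estimates.
    g = grad + weight_decay * p
    m = betas[0] * m + (1 - betas[0]) * g
    v = betas[1] * v + (1 - betas[1]) * g * g
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

def adamw_step(p, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    # AdamW: decoupled weight decay is applied to the parameter directly,
    # outside the adaptive update, so it is not rescaled by sqrt(v_hat).
    p = p - lr * weight_decay * p
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad * grad
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# Same start, same gradients: the two variants drift apart because the
# decay term interacts differently with the second-moment normalization.
p1 = p2 = 1.0
m1 = v1 = m2 = v2 = 0.0
for t in range(1, 101):
    p1, m1, v1 = adam_l2_step(p1, 0.1, m1, v1, t)
    p2, m2, v2 = adamw_step(p2, 0.1, m2, v2, t)
print(p1, p2)
```

This is also why an AdamW port can't simply reuse an Adam kernel with a pre-added decay term: the optimizer state (m, v) sees different gradients in the two formulations.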
Thanks!
cc @JacobSzwejbka