princeton-nlp/MeZO
A memory-efficient zeroth-order optimizer that fine-tunes language models using only forward passes, reducing memory usage by up to 12x compared with backpropagation-based fine-tuning.

MeZO adapts classical zeroth-order stochastic gradient descent (ZO-SGD) to operate in place for language model fine-tuning. This makes it possible to train a 30-billion-parameter model on a single 80GB GPU, whereas fine-tuning with Adam fits only a 2.7-billion-parameter model on the same hardware. The method achieves performance comparable to backpropagation-based fine-tuning across multiple tasks and supports both full-parameter tuning and parameter-efficient techniques such as LoRA and prefix tuning. Because it needs only loss values rather than gradients, it can also optimize non-differentiable objectives such as accuracy or F1 score.
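To make the in-place, forward-only idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: `perturb`, `zo_sgd_step`, and `toy_loss` are hypothetical names, the quadratic toy objective stands in for a language model loss, and the sketch assumes that the random direction is regenerated from a stored seed so that no copy of the perturbation or of the parameters has to be kept in memory.

```python
import random


def perturb(params, seed, scale):
    """Add scale * z to params in place; z ~ N(0, I) is rebuilt from seed."""
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] += scale * rng.gauss(0.0, 1.0)


def zo_sgd_step(params, loss_fn, seed, lr=0.05, eps=1e-3):
    """One in-place zeroth-order SGD step using two loss evaluations."""
    perturb(params, seed, eps)          # theta + eps * z
    loss_plus = loss_fn(params)
    perturb(params, seed, -2.0 * eps)   # theta - eps * z
    loss_minus = loss_fn(params)
    perturb(params, seed, eps)          # back to theta (up to float rounding)

    # Scalar finite-difference estimate of the slope along direction z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)

    # theta <- theta - lr * projected_grad * z, regenerating the same z.
    perturb(params, seed, -lr * projected_grad)


def toy_loss(params):
    """Hypothetical stand-in objective: squared distance to a fixed target."""
    target = [3.0, -2.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))


theta = [0.0, 0.0]
initial_loss = toy_loss(theta)
for step in range(500):
    zo_sgd_step(theta, toy_loss, seed=step)  # a fresh direction each step
final_loss = toy_loss(theta)
print(initial_loss, final_loss)
```

The only state beyond the parameters themselves is a seed and two scalar losses, which is where the memory saving over backpropagation comes from in this sketch. `loss_fn` is only ever called, never differentiated, so any scalar score could in principle be passed in its place.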