AGI-Arena/MARS
MARS is an optimizer framework that combines variance reduction with preconditioned updates to accelerate training of large language models.

MARS (Make Variance Reduction Shine) is a unified optimization framework that targets high gradient variance in large-model training. It combines two components: scaled stochastic recursive momentum, which yields a variance-reduced gradient estimate, and a preconditioned update that approximates a second-order Newton step. The implementation includes CUDA kernels and supports both pretraining (GPT-2 XL, FineWeb-Edu) and fine-tuning workflows for large language models.
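
The two components described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's API: the names `mars_step` and `run_demo` are hypothetical, the hyperparameter values are illustrative, the preconditioner is assumed to be an Adam-style diagonal one (weight decay omitted), and the unit-norm clipping of the corrected gradient is an assumed stabilization step. The key idea shown is that the gradient is evaluated on the same mini-batch at the current and previous iterates, and their scaled difference corrects the gradient estimate before it enters the momentum.

```python
import math
import random


def mars_step(x, x_prev, grad_fn, batch, state,
              lr=0.05, beta1=0.95, beta2=0.99, gamma=0.025, eps=1e-8):
    """One illustrative MARS-style update (hypothetical helper, not the repo API)."""
    g = grad_fn(x, batch)            # gradient at the current iterate
    g_prev = grad_fn(x_prev, batch)  # same mini-batch, previous iterate

    # Variance-reduced gradient: add a scaled gradient-difference correction.
    scale = gamma * beta1 / (1.0 - beta1)
    c = [gi + scale * (gi - gpi) for gi, gpi in zip(g, g_prev)]

    # Assumed stabilization: clip the corrected gradient to unit norm.
    norm = math.sqrt(sum(ci * ci for ci in c))
    if norm > 1.0:
        c = [ci / norm for ci in c]

    # Exponential moving averages of c and c^2 (Adam-style moments).
    state["t"] += 1
    t = state["t"]
    state["m"] = [beta1 * mi + (1 - beta1) * ci
                  for mi, ci in zip(state["m"], c)]
    state["v"] = [beta2 * vi + (1 - beta2) * ci * ci
                  for vi, ci in zip(state["v"], c)]

    # Bias-corrected, diagonally preconditioned step.
    new_x = []
    for xi, mi, vi in zip(x, state["m"], state["v"]):
        m_hat = mi / (1 - beta1 ** t)
        v_hat = vi / (1 - beta2 ** t)
        new_x.append(xi - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_x


def run_demo(steps=1000, seed=0):
    """Minimize a noisy quadratic 0.5 * ||x - target||^2.

    Returns the distance to the target before and after optimization.
    """
    rng = random.Random(seed)
    target = [1.0, 2.0]

    def grad_fn(x, noise):
        # Exact gradient plus per-batch noise; the "batch" here is the noise.
        return [xi - ti + ni for xi, ti, ni in zip(x, target, noise)]

    def dist(x):
        return math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x, target)))

    x = [5.0, -3.0]
    x_prev = list(x)  # first step has no previous iterate, so the correction is zero
    state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
    start = dist(x)
    for _ in range(steps):
        noise = [rng.gauss(0.0, 0.1) for _ in x]
        x, x_prev = mars_step(x, x_prev, grad_fn, noise, state), x
    return start, dist(x)
```

Calling `run_demo()` returns the starting and final distances to the optimum of the toy problem. Because both gradient evaluations share one mini-batch, the additive noise cancels in the difference term, which is the intuition behind the variance reduction. Note that this exact form costs two gradient evaluations per step.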