jiaweizzhao/GaLore
A PyTorch optimizer that reduces LLM training memory by projecting gradients into low-rank space.

Velocity (7d): +2.1 ★/day
Trend: steady
GaLore provides memory-efficient full-parameter learning for LLMs by projecting gradients into a low-rank subspace during training. It integrates with existing optimizers such as AdamW, AdamW8bit, and Adafactor with minimal code changes. The method achieves results comparable to or better than LoRA-style adapters while still updating all parameters, and it has been extended with quantized variants such as Q-GaLore.
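To give a rough sense of where the memory saving comes from, the sketch below counts optimizer-state elements for a single weight matrix. It is a back-of-the-envelope illustration, not the library's code: the function names, the layer shape, and the rank are hypothetical, and it assumes an Adam-style optimizer whose two moment buffers are kept in the projected space, with the projection applied along the shorter side of the matrix.

```python
def adam_state_elements(m: int, n: int) -> int:
    """Elements held by Adam's two moment buffers for a full m x n gradient."""
    return 2 * m * n


def low_rank_state_elements(m: int, n: int, rank: int) -> int:
    """Elements held when the moments live in a rank-`rank` projected space.

    Assumption: the gradient is projected along its shorter side, so the
    projector is min(m, n) x rank and each moment buffer is rank x max(m, n).
    """
    short, long = min(m, n), max(m, n)
    projector = short * rank      # projection matrix stored alongside the state
    moments = 2 * rank * long     # first and second moments in low-rank space
    return projector + moments


# Hypothetical layer shape and rank, chosen only for illustration.
m, n, rank = 4096, 11008, 128
full = adam_state_elements(m, n)
low = low_rank_state_elements(m, n, rank)
print(f"full Adam state: {full:,} elements")
print(f"low-rank state:  {low:,} elements ({low / full:.1%} of full)")
```

The weights themselves stay full-size in both cases; only the optimizer state shrinks, which is why the approach can still update every parameter.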