luyug/GradCache
A memory-efficient training technique that uses gradient caching to scale contrastive learning batch sizes far beyond GPU/TPU memory limits.

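Gradient Cache enables training contrastive learning models with batch sizes far larger than accelerator memory would normally allow. It splits each batch into sub-batches, encodes all of them without building a computation graph, computes the contrastive loss over the full batch of representations, and caches the gradient of the loss with respect to each representation. It then re-encodes one sub-batch at a time and backpropagates the cached gradients into the model, so peak memory depends on the sub-batch size rather than the full batch size, at the cost of an extra forward pass. The library supports both PyTorch and JAX/Flax. The technique was developed for dense passage retrieval (DPR) and embedding training, which are core components of RAG pipelines and LLM application stacks.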
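The gradient caching idea described above can be illustrated with a minimal NumPy sketch. This is not the GradCache API: the function names (`encode`, `encode_backward`, `contrastive_loss_and_grads`, `grad_cache_step`) are hypothetical, the encoder is a toy linear layer with tanh, and gradients are written by hand instead of relying on PyTorch or JAX autograd. It assumes an InfoNCE-style loss with in-batch negatives, where row i of the queries matches row i of the passages.

```python
import numpy as np


def encode(W, x):
    """Toy encoder (assumption): one linear layer followed by tanh."""
    return np.tanh(x @ W)


def encode_backward(W, x, grad_out):
    """Gradient w.r.t. W, given the gradient w.r.t. the encoder output."""
    out = encode(W, x)                      # recompute the forward pass for this chunk
    grad_pre = grad_out * (1.0 - out ** 2)  # tanh'(z) = 1 - tanh(z)^2
    return x.T @ grad_pre


def contrastive_loss_and_grads(q, p):
    """InfoNCE with in-batch negatives. Returns loss, dL/dq, dL/dp."""
    n = q.shape[0]
    scores = q @ p.T
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), np.arange(n)]).mean()
    d_scores = (probs - np.eye(n)) / n      # gradient of the loss w.r.t. the scores
    return loss, d_scores @ p, d_scores.T @ q


def grad_cache_step(Wq, Wp, xq, xp, chunk_size):
    """One training step's loss and weight gradients, computed chunk by chunk."""
    n = xq.shape[0]
    chunks = [slice(i, i + chunk_size) for i in range(0, n, chunk_size)]

    # 1) Graph-free forward pass: only the representations are kept.
    q = np.concatenate([encode(Wq, xq[c]) for c in chunks])
    p = np.concatenate([encode(Wp, xp[c]) for c in chunks])

    # 2) Full-batch loss; cache the gradients w.r.t. the representations.
    loss, q_cache, p_cache = contrastive_loss_and_grads(q, p)

    # 3) Re-encode one chunk at a time and push the cached gradients
    #    through the encoder, accumulating the weight gradients.
    grad_Wq = np.zeros_like(Wq)
    grad_Wp = np.zeros_like(Wp)
    for c in chunks:
        grad_Wq += encode_backward(Wq, xq[c], q_cache[c])
        grad_Wp += encode_backward(Wp, xp[c], p_cache[c])
    return loss, grad_Wq, grad_Wp
```

Under these assumptions, calling `grad_cache_step` with a small `chunk_size` should give the same loss and weight gradients as calling it with `chunk_size` equal to the batch size, because the loss is always computed over the full batch of representations. Only the encoder activations are limited to one chunk at a time.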