← all repositories

haoliuhl/ringattention

A distributed attention mechanism for training large language models with context lengths up to tens of millions of tokens.

773 stars Python Language ModelsML Frameworks
ringattention
Velocity · 7d
+0.7
★ / day
Trend
steady
star history

This repository provides a JAX implementation of Ring Attention with Blockwise Parallel Transformers, enabling near-infinite context training for large language models. The approach distributes attention and feedforward computation across multiple devices while overlapping communication with computation, allowing training without additional overhead. It handles blockwise computing of attention and feedforward networks to efficiently scale to very long sequences.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.