kyegomez/LongNet
Implementation of LongNet, a transformer architecture using dilated attention to scale to 1 billion token sequences.

Velocity · 7d
+0.7
★ / day
Trend
→steady
star history
This repository provides a plug-and-play implementation of dilated attention from the LongNet research paper. The architecture enables transformers to process extremely long sequences by using dilated attention mechanisms that scale efficiently to 1,000,000,000 tokens. It serves as a direct replacement for standard attention layers in transformer models.