← all repositories

kyegomez/LongNet

Implementation of LongNet, a transformer architecture using dilated attention to scale to 1 billion token sequences.

720 stars Python Language ModelsML Frameworks
LongNet
Velocity · 7d
+0.7
★ / day
Trend
steady
star history

This repository provides a plug-and-play implementation of dilated attention from the LongNet research paper. The architecture enables transformers to process extremely long sequences by using dilated attention mechanisms that scale efficiently to 1,000,000,000 tokens. It serves as a direct replacement for standard attention layers in transformer models.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.