lucidrains/MEGABYTE-pytorch
A PyTorch implementation of MEGABYTE, a multiscale transformer for predicting million-byte sequences across multiple granularities.

Velocity (7d): +0.6 ★/day
Trend: steady
Implements the MEGABYTE architecture, which uses a hierarchy of transformers to model a sequence at multiple scales. A global transformer handles the coarse representation, while local transformers refine the finer details. Supports variable-depth hierarchies, flash attention, and training and generation on arbitrary byte sequences.
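To make the coarse/fine split concrete, here is a minimal sketch of how a byte sequence might be grouped into fixed-size patches for a two-level hierarchy. It is an illustration only and does not use the repository's API: the helper `to_patches`, the `patch_size` parameter, and the zero `pad_id` are all hypothetical names and choices made for this example.

```python
def to_patches(data: bytes, patch_size: int, pad_id: int = 0) -> list[list[int]]:
    """Split a byte string into fixed-size patches of byte ids.

    Hypothetical helper for illustration; the last patch is right-padded
    with `pad_id` so that every patch has the same length.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    ids = list(data)  # iterating over bytes yields ints in 0..255
    remainder = len(ids) % patch_size
    if remainder:
        ids.extend([pad_id] * (patch_size - remainder))
    return [ids[i:i + patch_size] for i in range(0, len(ids), patch_size)]


patches = to_patches(b"hello world", patch_size=4)

# In a two-level hierarchy, the coarse (global) level would see one
# position per patch, and the fine (local) level would see the byte
# positions inside each patch.
num_global_positions = len(patches)
num_local_positions = len(patches[0])
```

Under these assumptions the 11-byte input is padded to 12 bytes, so the coarse level works over 3 positions instead of 11, and each fine-level pass covers only 4 bytes. Deeper hierarchies would repeat the same grouping on the patches themselves.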