lucidrains/MEGABYTE-pytorch

A PyTorch implementation of MEGABYTE, a multiscale transformer for predicting million-byte sequences across multiple granularities.

654 stars · Python · Language Models · ML Frameworks
Velocity · 7d: +0.6 ★ / day
Trend: steady

Implementation of the MEGABYTE architecture that uses hierarchical transformers to model sequences at multiple scales. The global transformer handles coarse representations while local transformers refine finer details. Supports variable depth hierarchies, flash attention, and training/generation on arbitrary byte sequences.
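
To make the two scales concrete, here is a minimal sketch of how a byte sequence could be grouped into fixed-size patches before it reaches a hierarchical model. It is an illustration only, not the repository's API: the function `to_patches` and the parameters `patch_size` and `pad_id` are hypothetical names, and padding with 0 is an assumption made for simplicity (0 is also a valid byte value, so a real pipeline would likely reserve a separate padding id).

```python
def to_patches(data: bytes, patch_size: int, pad_id: int = 0) -> list[list[int]]:
    """Split a byte string into fixed-size patches, right-padding the last one.

    Hypothetical helper for illustration; pad_id=0 is an assumed convention.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    ids = list(data)  # each byte becomes an integer in [0, 255]
    remainder = len(ids) % patch_size
    if remainder:
        ids.extend([pad_id] * (patch_size - remainder))
    return [ids[i:i + patch_size] for i in range(0, len(ids), patch_size)]


patches = to_patches(b"hello world", patch_size=4)

# Coarse scale: a global model would see one position per patch.
global_len = len(patches)
# Fine scale: a local model would see the bytes inside a single patch.
local_len = len(patches[0])
```

With an 11-byte input and a patch size of 4, the sequence is padded to 12 bytes, so the coarse level has 3 positions and each fine level has 4. Deeper hierarchies would repeat the same grouping on the patches themselves.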
