mit-han-lab/lite-transformer
A lightweight transformer architecture with Long-Short Range Attention for efficient NLP tasks.

Velocity (7d): +0.3 ★/day · Trend: steady
This repository implements Lite Transformer, an architecture from MIT’s Han Lab presented in a paper at ICLR 2020. The architecture introduces Long-Short Range Attention (LSRA) to improve efficiency over standard transformers. The implementation builds on fairseq and includes custom CUDA-accelerated convolution layers (lightconv and dynamicconv). It supports standard translation benchmarks, including IWSLT14 and WMT.
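To make the idea behind Long-Short Range Attention more concrete, here is a minimal pure-Python sketch. It is not the repository's code. It assumes the layer splits each token's channels into two halves, sends one half through global self-attention (long-range context) and the other through a softmax-normalized convolution along the sequence (short-range context), and concatenates the results. The function names (`global_attention`, `local_conv`, `lsra`) are hypothetical, and learned projections, multiple heads, and the CUDA kernels are all omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections (no learned weights),
    so this only illustrates the all-pairs, long-range mixing pattern.
    x is a list of tokens, each a list of floats.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def local_conv(x, kernel):
    """Convolution along the sequence with a softmax-normalized kernel.

    Assumption: one kernel shared by all channels and zero padding at the
    sequence edges. Each output token only sees its nearby neighbours.
    """
    w = softmax(kernel)
    half = len(w) // 2
    n, d = len(x), len(x[0])
    out = []
    for t in range(n):
        row = [0.0] * d
        for k, wk in enumerate(w):
            s = t + k - half  # source position for this kernel tap
            if 0 <= s < n:
                for j in range(d):
                    row[j] += wk * x[s][j]
        out.append(row)
    return out


def lsra(x, kernel):
    """Hypothetical two-branch layer: attention on one half of the
    channels, convolution on the other half, then concatenation."""
    split = len(x[0]) // 2
    attn_half = [row[:split] for row in x]
    conv_half = [row[split:] for row in x]
    a = global_attention(attn_half)
    c = local_conv(conv_half, kernel)
    return [ra + rc for ra, rc in zip(a, c)]
```

Calling `lsra(tokens, [0.0, 0.0, 0.0])` on a list of equal-length token vectors returns a list of the same shape. The attention branch costs time quadratic in sequence length, while the convolution branch is linear, so running attention on only half of the channels makes it cheaper.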