Is lite-transformer open source?

Yes — mit-han-lab/lite-transformer is an open-source project tracked on heatdrop.

What language is lite-transformer written in?

mit-han-lab/lite-transformer is primarily written in Python.

How popular is lite-transformer?

mit-han-lab/lite-transformer has 609 stars on GitHub.

Where can I find lite-transformer?

mit-han-lab/lite-transformer is on GitHub at https://github.com/mit-han-lab/lite-transformer.


mit-han-lab/lite-transformer

A transformer that knows when to skim and when to study

The code behind MIT's ICLR 2020 paper, which splits attention into long-range and short-range specialists, trading full self-attention for speed.

★609 stars · Python · Language Models · ML Frameworks




What it does

Lite Transformer replaces the standard transformer’s uniform attention with Long-Short Range Attention (LSRA): the input is split along the channel dimension into two parallel branches, one using self-attention to capture long-range dependencies, the other using lightweight or dynamic convolution to capture local context. The idea is that not every token pair needs the full quadratic treatment. It’s built as a fork of fairseq, with CUDA kernels for the lightweight and dynamic convolution layers that you’ll need to compile yourself.
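A toy, pure-Python sketch of the two-branch idea, to make the data flow concrete. It is not the repo’s code: the function names are hypothetical, attention uses identity query/key/value projections with a single head, and one fixed, odd-length kernel stands in for the learned lightweight/dynamic convolution. Channels are split in half; the first half goes through attention over the whole sequence, the second through a convolution over a small window of neighbors, and the two results are concatenated.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = channels


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(x: Matrix) -> Matrix:
    """Single-head dot-product self-attention with identity Q/K/V projections
    (a simplification): every token attends to every token, O(n^2) work."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


def local_conv(x: Matrix, kernel: List[float]) -> Matrix:
    """Depthwise 1D convolution with one shared fixed kernel and zero padding:
    each output token mixes only its len(kernel) nearest neighbors, O(n * k)."""
    n, d = len(x), len(x[0])
    half = len(kernel) // 2  # centers an odd-length kernel on the current token
    out = []
    for t in range(n):
        row = []
        for c in range(d):
            acc = 0.0
            for j, w in enumerate(kernel):
                s = t + j - half
                if 0 <= s < n:  # positions outside the sequence count as zero
                    acc += w * x[s][c]
            row.append(acc)
        out.append(row)
    return out


def long_short_block(x: Matrix, kernel: List[float]) -> Matrix:
    """Hypothetical LSRA-style block: split channels, run both branches, concatenate."""
    split = len(x[0]) // 2
    left = [row[:split] for row in x]   # channels for the attention (long-range) branch
    right = [row[split:] for row in x]  # channels for the convolution (local) branch
    a = global_attention(left)
    c = local_conv(right, kernel)
    return [ra + rc for ra, rc in zip(a, c)]


if __name__ == "__main__":
    x = [[1.0, 0.0, 2.0, 0.0],
         [0.0, 1.0, 0.0, 2.0],
         [1.0, 1.0, 4.0, 4.0]]
    y = long_short_block(x, kernel=[0.25, 0.5, 0.25])
    print(len(y), len(y[0]))  # 3 4
```

Each output row keeps the input width: the first two channels are attention-weighted mixes of every token, while the last two mix only a token and its immediate neighbors.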

The interesting bit

The long-short split is the core bet: attention takes in the big picture, relating tokens across the whole sequence, while convolution does the “close reading” of each token’s immediate neighbors. The README is admirably concrete about compute budgets (models at 90M, 360M, and 527M multiply-adds, with BLEU scores listed for each), which makes the efficiency claim checkable rather than hand-wavy.
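To see why moving channels from attention to convolution saves compute, here is a rough multiply-add count for the token-mixing step alone. It is a back-of-the-envelope sketch under stated assumptions (one head; projections, softmax and feed-forward layers ignored; a hypothetical even channel split and example sizes), not the accounting used in the paper or README.

```python
def attention_mult_adds(n: int, d: int) -> int:
    # scores = Q @ K^T: n*n dot products of length d; output = weights @ V: same again
    return 2 * n * n * d


def conv_mult_adds(n: int, d: int, k: int) -> int:
    # depthwise convolution: each of the n*d outputs is a sum of k products
    return n * d * k


def mixing_mult_adds(n: int, d: int, k: int, attention_share: float) -> int:
    # send attention_share of the d channels to attention, the rest to convolution
    d_att = int(d * attention_share)
    return attention_mult_adds(n, d_att) + conv_mult_adds(n, d - d_att, k)


if __name__ == "__main__":
    n, d, k = 100, 256, 7  # hypothetical sequence length, model width, kernel size
    full = mixing_mult_adds(n, d, k, 1.0)
    split = mixing_mult_adds(n, d, k, 0.5)
    print(f"full attention: {full:,}")    # full attention: 5,120,000
    print(f"half/half split: {split:,}")  # half/half split: 2,649,600
```

The attention term grows with n² while the convolution term grows with n·k, so for long sequences the split approaches halving the mixing cost. For short sentences the projection and feed-forward costs left out here can exceed both terms, so whole-model figures like the README’s are the ones to compare.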

Key highlights

Pretrained checkpoints for WMT'14 En-Fr, WMT'16 En-De, CNN/DailyMail summarization, and WikiText-103 language modeling
Distributed training setup included (multi-node, 16+ GPUs)
Language modeling lives on a separate language-model branch
Requires PyTorch ≥1.0, Python ≥3.6, and NCCL for training
Custom CUDA modules need manual build via setup.py in two fairseq/modules subdirectories
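Given the manual build steps and version floors listed above, a quick pre-flight check can save a failed compile. The script below is a hypothetical helper, not part of the repo. It checks the stated minimums (Python 3.6, PyTorch 1.0) and NCCL availability, plus, as an added assumption, a visible CUDA GPU, using standard PyTorch calls; it does not build or test the custom kernels.

```python
import importlib
import re
import sys
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep only the leading numeric part, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version: str, minimum: Tuple[int, ...]) -> bool:
    # tuple comparison: (1, 0, 0) >= (1, 0) is True, (0, 4, 1) >= (1, 0) is False
    return parse_version(version) >= minimum


def check_environment() -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    # this script itself needs Python 3.6+ (f-strings), so this mostly documents the floor
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 required")
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems  # the remaining checks need torch
    if not at_least(torch.__version__, (1, 0)):
        problems.append(f"PyTorch >= 1.0 required, found {torch.__version__}")
    if not torch.cuda.is_available():
        problems.append("no CUDA GPU visible (assumed needed for the custom conv kernels)")
    dist = importlib.import_module("torch.distributed")
    if not (dist.is_available() and dist.is_nccl_available()):
        problems.append("NCCL backend unavailable (needed for training)")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("-", issue)
    print("all checks passed" if not issues else f"{len(issues)} problem(s) found")
```

An empty problem list only means the prerequisites named here are met; it says nothing about whether the 2020-era kernels will compile against a newer CUDA or PyTorch release.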

Caveats

The repo appears dormant (2020 paper, 609 stars, no recent activity visible); upstream fairseq has likely moved on
Installation involves multiple manual compilation steps that may bit-rot with newer CUDA/PyTorch versions
The README offers no pointers (open issues, compatibility notes) for judging whether the code still works with current tooling

Verdict

Worth studying if you’re designing efficient attention variants or benchmarking against 2020-era efficiency baselines. Skip it if you need a maintained, batteries-included training stack — fairseq itself, or modern alternatives like Hugging Face or Megatron, will be less archaeologically demanding.
