google-research/long-range-arena

A shooting gallery for attention mechanisms that claim to be efficient

Google Research built a standardized proving ground so you can stop arguing about whether your O(n log n) attention actually works.

788 stars · Python · ML Frameworks · LLMOps · Eval
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

Long Range Arena (LRA) is a benchmark suite that pits efficient Transformer variants against each other on six tasks, ranging from hierarchical ListOps to pixel-level image classification, while measuring accuracy, speed, and memory. It ships with JAX/Flax implementations of the vanilla Transformer and several efficient variants, plus a leaderboard of published results.

The interesting bit

The benchmark enforces an “apples-to-apples” mode: to appear on the official leaderboard, your model may have at most ~10% more parameters than the base Transformer, and you can’t touch embedding sizes or layer counts. It’s a deliberate attempt to stop the “we beat BERT on one task with 3× the compute” paper mill.
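To make the parameter rule concrete, here is a minimal sketch of a budget check. It assumes parameters are described as a nested dict whose leaves are shape tuples; `count_params`, `within_budget`, the example shapes, and the exact 10% threshold are illustrative assumptions, not the benchmark's own tooling.

```python
from math import prod


def count_params(tree):
    """Count scalar parameters in a nested dict whose leaves are shape tuples."""
    if isinstance(tree, dict):
        return sum(count_params(child) for child in tree.values())
    return prod(tree)  # leaf: a shape tuple such as (64, 64)


def within_budget(candidate, baseline, slack=0.10):
    """True if candidate has at most (1 + slack) times the baseline's parameters."""
    return count_params(candidate) <= (1 + slack) * count_params(baseline)


# Hypothetical shapes, chosen only to exercise the check.
baseline = {"embed": (1000, 64), "attn": {"qkv": (64, 192), "out": (64, 64)}}
small_addition = {**baseline, "proj": (64, 32)}   # adds 2,048 parameters
large_addition = {**baseline, "proj": (64, 256)}  # adds 16,384 parameters

print(within_budget(small_addition, baseline))  # True: about 2.5% over baseline
print(within_budget(large_addition, baseline))  # False: about 20% over baseline
```

A check like this only covers the parameter count; the fixed embedding size and layer count would need to be verified separately.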

Key highlights

  • Six diverse tasks: ListOps, text classification, document retrieval, pixel-level images, Pathfinder, and Path-X (the last one breaks most models: every listed variant other than the external entries is marked “FAIL” on it as of Nov 2020)
  • Includes implementations of Linformer, Performer, Reformer, BigBird, Longformer, and others in JAX/Flax
  • Two comparison modes: strict “apples-to-apples” for leaderboard inclusion, or “free-for-all” if you just want numbers for your paper
  • External submissions accepted but segregated; the authors verify all official results internally
  • Datasets available via Google Cloud Storage; some (like document retrieval) require manual assembly from original sources due to redistribution limits
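For readers unfamiliar with ListOps, the sketch below evaluates a small nested expression to show what "hierarchical" means here: the model has to produce the single digit that this kind of evaluator computes. It handles only a subset of operators (MAX, MIN, and SM for sum modulo 10); the token spellings and the `evaluate` helper are assumptions for illustration, not code from the repository.

```python
# Subset of ListOps operators; each maps a list of digits to a single digit.
OPS = {
    "MAX": max,
    "MIN": min,
    "SM": lambda xs: sum(xs) % 10,  # sum modulo 10
}


def evaluate(expr):
    """Evaluate a bracketed prefix expression such as '[MAX 4 3 [MIN 2 3 ] 1 0 ]'."""
    tokens = expr.replace("[", " [ ").replace("]", " ] ").split()
    stack = [[]]  # one frame per open bracket, plus an outer frame for the result
    for tok in tokens:
        if tok == "[":
            stack.append([])
        elif tok == "]":
            op, *args = stack.pop()
            stack[-1].append(OPS[op](args))
        elif tok in OPS:
            stack[-1].append(tok)
        else:
            stack[-1].append(int(tok))
    return stack[0][0]


print(evaluate("[MAX 4 3 [MIN 2 3 ] 1 0 ]"))  # 4
print(evaluate("[SM 5 [MIN 6 2 ] 9 ]"))       # 6
```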

Caveats

  • The V2 release added “xformer models” but the README still contains struck-through text about a planned update, suggesting the project may be only partially maintained
  • Path-X remains unsolved by all core benchmarked models; it’s unclear if this is a meaningful hardness signal or a dataset artifact
  • Document retrieval setup is awkward—you need to download raw AAN data yourself and match IDs
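The ID-matching step might look roughly like the sketch below. Everything about the data layout is assumed for illustration: `pairs` as `(label, id_a, id_b)` tuples, `docs` as a dict from ID to text, and the made-up IDs. Check the repository's instructions for the real file formats.

```python
def join_pairs(pairs, docs):
    """Attach document text to each labeled ID pair; collect IDs with no text."""
    matched, missing = [], set()
    for label, id_a, id_b in pairs:
        if id_a in docs and id_b in docs:
            matched.append((docs[id_a], docs[id_b], label))
        else:
            missing.update(i for i in (id_a, id_b) if i not in docs)
    return matched, missing


# Hypothetical inputs: two labeled pairs, one of which references an absent document.
docs = {"doc-001": "first paper text", "doc-002": "second paper text"}
pairs = [(1, "doc-001", "doc-002"), (0, "doc-001", "doc-003")]

matched, missing = join_pairs(pairs, docs)
print(len(matched))  # 1
print(missing)       # {'doc-003'}
```

Reporting the missing IDs, rather than silently dropping those pairs, makes it easier to tell an incomplete download from a formatting mismatch.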

Verdict

Worth your time if you’re publishing (or evaluating) a new efficient attention mechanism and need credible, standardized comparisons. Skip it if you just want plug-and-play fast attention for a product—the code is research infrastructure, not a library.
