facebookresearch/LAMA

How much does BERT actually know? This probe finds out.

LAMA is a standardized benchmark for extracting and comparing factual knowledge across pretrained language models.

1.4k stars · Python · Language Models · LLMOps · Eval
Velocity (7d): +0.5 ★ / day · Trend: steady

What it does

LAMA provides a consistent interface for testing whether pretrained language models (BERT, GPT, RoBERTa, ELMo, Transformer-XL) contain factual and commonsense knowledge. It uses cloze-style probes (fill-in-the-[MASK]) to check whether a model can complete statements like “The theory of relativity was developed by [MASK].” The package also lets you encode sentences as embeddings and compare model outputs side by side.
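To make the cloze idea concrete, here is a minimal sketch of a single-slot probe. It does not use LAMA's own code: `fill_mask` and `toy_score_fn` are hypothetical names, and the scores are made-up numbers standing in for a real masked language model's output.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"


def fill_mask(
    template: str,
    score_fn: Callable[[str], Dict[str, float]],
    top_k: int = 3,
) -> List[str]:
    """Return the top_k candidate tokens for the single [MASK] slot."""
    if template.count(MASK) != 1:
        raise ValueError("expected exactly one [MASK] slot")
    scores = score_fn(template)
    # Rank candidate tokens from most to least likely.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


def toy_score_fn(template: str) -> Dict[str, float]:
    """Hypothetical stand-in for a model; the probabilities are invented."""
    table = {
        "The theory of relativity was developed by [MASK].": {
            "Newton": 0.21,
            "Einstein": 0.62,
            "Darwin": 0.05,
        },
    }
    return table.get(template, {})


predictions = fill_mask(
    "The theory of relativity was developed by [MASK].", toy_score_fn
)
print(predictions)
```

In a real probe, `score_fn` would wrap a pretrained model and return its distribution over the vocabulary at the masked position; the ranking step would stay the same.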

The interesting bit

The project treats language models as implicit knowledge bases and asks whether they can be queried like one. It ships with a unified vocabulary (the intersection of all supported models' vocabularies), so comparisons are less confounded by tokenization differences. The “Negated-LAMA” variant also tests whether models handle negation; spoiler: often poorly.
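The vocabulary-intersection idea can be sketched in a few lines. This is an illustration under stated assumptions, not LAMA's implementation: the helper names `common_vocab` and `rank_within` are hypothetical, and the vocabularies and scores are toy values.

```python
from typing import Dict, Iterable, List, Set


def common_vocab(vocabs: Iterable[Set[str]]) -> Set[str]:
    """Tokens that every model's vocabulary contains."""
    vocab_list = list(vocabs)
    if not vocab_list:
        return set()
    return set.intersection(*vocab_list)


def rank_within(scores: Dict[str, float], allowed: Set[str]) -> List[str]:
    """Rank candidates by score, ignoring tokens outside the shared vocabulary."""
    kept = {tok: s for tok, s in scores.items() if tok in allowed}
    return sorted(kept, key=kept.get, reverse=True)


# Toy vocabularies (invented for illustration, far smaller than real ones).
vocab_a = {"Paris", "London", "Rome", "##ville"}
vocab_b = {"Paris", "London", "Rome", "Berlin"}
vocab_c = {"Paris", "London", "Berlin"}

shared = common_vocab([vocab_a, vocab_b, vocab_c])

# Made-up scores from one model; "##ville" is specific to that model's tokenizer.
scores = {"Paris": 0.5, "##ville": 0.3, "London": 0.2}
ranking = rank_within(scores, shared)
print(sorted(shared), ranking)
```

Restricting each model's ranking to the shared set means no model is rewarded or penalized for tokens the others cannot produce.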

Key highlights

  • Supports five major model families through a single CLI
  • Includes pre-built datasets and a script that downloads ~55 GB of models
  • Can encode sentences for downstream tasks or run interactive [MASK] completion
  • Provides unified cased/lowercased vocabularies for fair cross-model comparison
  • Extensible to negated probes and LAMA-UHN variants for harder evaluation

Caveats

  • Requires significant disk space (~55 GB) and manual model setup
  • Single-token [MASK] gaps only; multi-word answers are out of scope
  • Code targets Python 3.7 and older model versions; may need tweaks to work with current releases of the transformers library
  • CC-BY-NC 4.0 license restricts commercial use
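The single-token caveat has a practical consequence: probes whose answer is not one vocabulary token have to be dropped before evaluation. A rough sketch of such a filter follows; `keep_single_token`, the probe list, and the vocabulary are all hypothetical and only illustrate the constraint.

```python
from typing import List, Set, Tuple

Probe = Tuple[str, str]  # (cloze template, expected answer)


def keep_single_token(probes: List[Probe], vocab: Set[str]) -> List[Probe]:
    """Keep probes with exactly one [MASK] slot and a one-token answer."""
    return [
        (template, answer)
        for template, answer in probes
        if template.count("[MASK]") == 1 and answer in vocab
    ]


# Toy vocabulary: "New York" is not a single entry, so that probe is dropped.
vocab = {"Florence", "Paris", "New", "York"}
probes = [
    ("Dante was born in [MASK].", "Florence"),
    ("The Statue of Liberty is located in [MASK].", "New York"),
    ("The capital of France is [MASK].", "Paris"),
]

usable = keep_single_token(probes, vocab)
print(usable)
```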

Verdict

Researchers studying knowledge extraction or model comparison should grab this. If you just need a quick BERT inference snippet, Hugging Face pipelines are lighter.
