beir-cellar/beir

One benchmark to stress-test them all: retrieval models meet 17 datasets

BEIR gives retrieval researchers a single Python framework to evaluate dense, sparse, lexical, and reranking models across diverse IR tasks without dataset wrangling.

★ 2.2k stars | Python | RAG · Search | LLMOps · Eval | Data Tooling

What it does

BEIR is a Python toolkit that bundles 17 preprocessed information-retrieval datasets with a common evaluation harness. You bring a model (Sentence-BERT, a Hugging Face encoder, a LoRA-adapted model served through vLLM, or a Cohere API call) and BEIR handles downloading corpora, running retrieval, and computing NDCG, MAP, Recall, Precision, and MRR at standard cutoffs. It is essentially glue code, but glue code that saves you from writing the same evaluation boilerplate for the seventeenth time.
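To make the "bring a model, get rankings" flow concrete, here is a minimal pure-Python sketch. It mirrors the dictionary shapes that BEIR-style tooling passes around as far as I understand them (corpus as doc id to title/text, queries as query id to text, qrels as query id to relevance judgments, results as query id to scored doc ids). The documents, the `overlap_score` scorer, and the `retrieve` helper are all invented for illustration and are not BEIR code.

```python
from collections import Counter

# Toy data in the corpus / queries / qrels layout (contents are made up).
corpus = {
    "d1": {"title": "Vitamin D", "text": "Vitamin D supports bone health"},
    "d2": {"title": "Caffeine", "text": "Caffeine can disturb sleep"},
    "d3": {"title": "Calcium", "text": "Calcium and vitamin D build bone"},
}
queries = {"q1": "vitamin d bone", "q2": "caffeine sleep"}
qrels = {"q1": {"d1": 1, "d3": 1}, "q2": {"d2": 1}}


def tokenize(text):
    """Lowercase whitespace tokenizer, good enough for a toy example."""
    return text.lower().split()


def overlap_score(query, doc):
    """Hypothetical scorer: count occurrences of query terms in title + text."""
    doc_terms = Counter(tokenize(doc["title"] + " " + doc["text"]))
    return float(sum(doc_terms[term] for term in set(tokenize(query))))


def retrieve(corpus, queries, top_k=2):
    """Return {query_id: {doc_id: score}} holding the top_k non-zero matches."""
    results = {}
    for qid, query in queries.items():
        scored = {did: overlap_score(query, doc) for did, doc in corpus.items()}
        # Sort by descending score, breaking ties by doc id for determinism.
        ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        results[qid] = {did: score for did, score in ranked if score > 0}
    return results


results = retrieve(corpus, queries, top_k=2)
for qid, doc_scores in results.items():
    judged = [did for did in doc_scores if qrels[qid].get(did, 0) > 0]
    print(qid, doc_scores, "relevant hits:", judged)
```

In a real run the toy scorer would be replaced by an actual lexical or dense model, and the results dictionary would be handed to the evaluation step together with the qrels.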

The interesting bit

The “heterogeneous” in BEIR's self-description is not marketing fluff. The datasets span scientific fact-checking, FAQ retrieval, biomedical search, and web passage ranking, so a model that aces one domain can still embarrass itself on another. BEIR exposes that variance deliberately, making it harder to cherry-pick a leaderboard win.
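A small arithmetic sketch shows why per-domain reporting matters. The numbers below are invented purely for illustration (they are not measured BEIR results), and the domain labels and the `summarize` helper are hypothetical. Two imaginary models share the same average score, yet one of them collapses on a single domain.

```python
from statistics import mean, pstdev

# Invented nDCG@10-style scores for two imaginary models; NOT real results.
scores = {
    "model_a": {"science_facts": 0.70, "faq": 0.25, "biomedical": 0.30, "web_passages": 0.43},
    "model_b": {"science_facts": 0.45, "faq": 0.40, "biomedical": 0.41, "web_passages": 0.42},
}


def summarize(per_domain):
    """Mean, population standard deviation, and weakest domain for one model."""
    values = list(per_domain.values())
    worst = min(per_domain, key=per_domain.get)
    return {"mean": mean(values), "spread": pstdev(values), "worst": worst}


summary = {name: summarize(per_domain) for name, per_domain in scores.items()}
for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.3f} spread={stats['spread']:.3f} worst={stats['worst']}")
```

Reporting only the mean would rank these two models as equal; the spread and the weakest domain are what a multi-dataset benchmark brings to the surface.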

Key highlights

17 benchmark datasets ready to download and load via GenericDataLoader
Supports lexical, dense, sparse, and reranking architectures in one framework
Built-in metrics: NDCG@k, MAP@k, Recall@k, Precision@k, and MRR for k up to 1000
Wrappers for SBERT, Hugging Face transformers (with Flash Attention 2), vLLM with LoRA, and third-party APIs like Cohere
Python 3.9+, pip-installable, with Colab notebooks and a Hugging Face hub presence
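For readers who want to see what the listed metrics actually compute, here is a from-scratch sketch for a single query. It is not BEIR's implementation (to my knowledge BEIR delegates scoring to pytrec_eval), and the function names, the tie-breaking rule, and the toy judgments are assumptions made for this example. NDCG here uses linear gain with a log2 rank discount.

```python
import math


def ranked_ids(doc_scores, k):
    """Doc ids by descending score (ties broken by id), truncated to k."""
    ordered = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:k]]


def precision_at_k(rels, doc_scores, k):
    """Fraction of the top-k slots filled by relevant documents."""
    hits = sum(1 for d in ranked_ids(doc_scores, k) if rels.get(d, 0) > 0)
    return hits / k


def recall_at_k(rels, doc_scores, k):
    """Fraction of all relevant documents that appear in the top k."""
    n_relevant = sum(1 for r in rels.values() if r > 0)
    hits = sum(1 for d in ranked_ids(doc_scores, k) if rels.get(d, 0) > 0)
    return hits / n_relevant if n_relevant else 0.0


def mrr_at_k(rels, doc_scores, k):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, d in enumerate(ranked_ids(doc_scores, k), start=1):
        if rels.get(d, 0) > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(rels, doc_scores, k):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(rels.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids(doc_scores, k)))
    ideal = sorted(rels.values(), reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


# Toy judgments and system scores for one query (made up).
rels = {"d1": 1, "d3": 1}
doc_scores = {"d2": 0.9, "d1": 0.8, "d3": 0.1, "d4": 0.05}

for name, fn in [("P@3", precision_at_k), ("Recall@3", recall_at_k),
                 ("MRR@3", mrr_at_k), ("NDCG@3", ndcg_at_k)]:
    print(name, round(fn(rels, doc_scores, 3), 4))
```

A benchmark run averages these per-query values over every query in a dataset, once per cutoff k.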

Caveats

The README is enthusiastic but thin on dataset documentation; you will need the wiki or the original papers to understand what each dataset actually measures
Some newer paths (vLLM, LoRA) require extra dependencies—peft, accelerate, vllm, faiss-cpu—that are not in the base install
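One way to avoid a surprise ImportError halfway through an experiment is to check for the optional packages up front. The sketch below uses only the standard library; the `OPTIONAL_DEPS` mapping and the `missing_optional_deps` helper are hypothetical names for this example, and the package list is taken from the caveat above rather than from BEIR's packaging metadata. Note that faiss-cpu is imported as `faiss`.

```python
from importlib.util import find_spec

# pip distribution name -> importable module name.
OPTIONAL_DEPS = {
    "peft": "peft",
    "accelerate": "accelerate",
    "vllm": "vllm",
    "faiss-cpu": "faiss",
}


def missing_optional_deps(deps=OPTIONAL_DEPS):
    """Return the pip names whose modules cannot be found in this environment."""
    return sorted(pip_name for pip_name, module in deps.items()
                  if find_spec(module) is None)


missing = missing_optional_deps()
if missing:
    print("Missing optional packages: pip install " + " ".join(missing))
else:
    print("All optional packages found.")
```

The check only confirms that a module can be located; it does not verify versions or GPU support.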

Verdict

If you are building or comparing retrieval models and need a sanity check across domains, BEIR is the closest thing to a standard yardstick. If you only care about one narrow retrieval task, it is overkill—just use that task’s native scripts.

Frequently asked

What is beir-cellar/beir? BEIR gives retrieval researchers a single Python framework to evaluate dense, sparse, lexical, and reranking models across diverse IR tasks without dataset wrangling.
Is beir open source? Yes, beir-cellar/beir is open source, released under the Apache-2.0 license.
What language is beir written in? beir-cellar/beir is primarily written in Python.
How popular is beir? beir-cellar/beir has 2.2k stars on GitHub.
Where can I find beir? beir-cellar/beir is on GitHub at https://github.com/beir-cellar/beir.