Is bluebert open source?

Yes — ncbi-nlp/bluebert is an open-source project tracked on heatdrop.

What language is bluebert written in?

ncbi-nlp/bluebert is primarily written in Python.

How popular is bluebert?

ncbi-nlp/bluebert has 597 stars on GitHub.

Where can I find bluebert?

ncbi-nlp/bluebert is on GitHub at https://github.com/ncbi-nlp/bluebert.

ncbi-nlp/bluebert

BERT went to med school and actually paid attention

A BERT variant pre-trained on 4 billion words of PubMed abstracts and clinical notes, because general-domain language models struggle with medical jargon.

★ 597 stars · Python · Language Models · Domain Apps

What it does

BlueBERT is a BERT checkpoint with continued pretraining on biomedical text: roughly 4 billion words of PubMed abstracts, optionally combined with MIMIC-III clinical notes. The NCBI team offers base and large variants, each with or without the clinical data mix, plus fine-tuning scripts for five NLP tasks: sentence similarity, NER, relation extraction, document classification, and natural language inference.

The interesting bit

The real value isn’t the architecture; it’s the data curation and the explicit comparison against general-domain baselines. The authors preprocessed PubMed with a deliberately light touch (lowercasing, stripping non-ASCII characters, NLTK tokenization), then published both the corpus and the exact pretraining commands. You can reproduce from scratch or grab the HuggingFace weights.
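To make the preprocessing steps concrete, here is a minimal standard-library sketch of that kind of pipeline: lowercase, collapse line breaks, replace non-ASCII runs with a space, then tokenize. It is not the authors' script. In particular, the regex tokenizer below is a crude stand-in for NLTK tokenization, and the function name `preprocess` is hypothetical.

```python
import re

# Assumption: a simple regex tokenizer standing in for NLTK tokenization.
# It splits text into runs of word characters and single punctuation marks.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def preprocess(text: str) -> str:
    """Lowercase, flatten line breaks, drop non-ASCII, and tokenize."""
    text = text.lower()
    # Collapse line breaks so each abstract becomes one line of text.
    text = re.sub(r"[\r\n]+", " ", text)
    # Replace every run of non-ASCII characters with a single space.
    text = re.sub(r"[^\x00-\x7F]+", " ", text)
    # Tokenize and re-join with single spaces.
    return " ".join(_TOKEN_RE.findall(text))


if __name__ == "__main__":
    sample = "Patients received 5 mg/kg of TNF-α inhibitor.\nNo adverse events."
    print(preprocess(sample))
```

Swapping the regex for a real NLTK tokenizer would change how contractions and some punctuation are split, so treat this output as illustrative rather than byte-compatible with the published corpus.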

Key highlights

Four model variants: Base/Large × PubMed-only/PubMed+MIMIC-III
Weights hosted on both NCBI FTP and HuggingFace Hub
Fine-tuning scripts included for 5 biomedical NLP tasks (STS, NER, RE, classification, NLI)
Preprocessed ~4B-word PubMed corpus available for download
Evaluated on 10 benchmark datasets (the BLUE benchmark) against ELMo and general-domain BERT
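As a rough illustration of the four-variant grid and the HuggingFace route, the sketch below builds one Hub identifier per variant and loads it with `transformers`. The `bionlp/bluebert_...` naming pattern is an assumption based on how the checkpoints appear to be published; verify the exact identifiers on the Hub before relying on them. The helper names `model_id`, `all_model_ids`, and `load` are hypothetical.

```python
from itertools import product

# Assumption: size suffixes follow the original BERT naming convention.
SIZES = {"base": "L-12_H-768_A-12", "large": "L-24_H-1024_A-16"}
# Assumption: corpus tags as they seem to appear in the Hub model names.
CORPORA = {"pubmed": "pubmed", "pubmed+mimic": "pubmed_mimic"}


def model_id(size: str, corpus: str) -> str:
    """Build the assumed Hub identifier for one variant."""
    return f"bionlp/bluebert_{CORPORA[corpus]}_uncased_{SIZES[size]}"


def all_model_ids() -> list[str]:
    """Enumerate the Base/Large x PubMed/PubMed+MIMIC-III grid."""
    return [model_id(size, corpus) for size, corpus in product(SIZES, CORPORA)]


def load(size: str = "base", corpus: str = "pubmed"):
    """Download tokenizer and encoder (needs `transformers` and network access)."""
    # Imported lazily so the helpers above work without the dependency.
    from transformers import AutoModel, AutoTokenizer

    name = model_id(size, corpus)
    return AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name)


if __name__ == "__main__":
    for name in all_model_ids():
        print(name)
```

Calling `load("base", "pubmed+mimic")` would fetch the clinical-notes variant, assuming the identifier resolves; the returned encoder has no task head, so fine-tuning still requires adding one.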

Caveats

Code appears to be thin wrappers around Google’s original BERT scripts; not a standalone framework
Last meaningful update was 2020 (HuggingFace migration); repository looks dormant
Clinical use requires the usual MIMIC-III credentialing dance

Verdict

Worth a look if you’re doing biomedical NLP and need a battle-tested starting point. Skip it if you want a modern, maintained library—this is essentially a model zoo with glue scripts, not a framework.
