BioNeMo: NVIDIA’s recipe book for training biology transformers
Because training a 15B-parameter protein BERT on a laptop is a bad idea, BioNeMo wires TransformerEngine and Megatron-FSDP to biological data for GPU clusters.

What it does
BioNeMo is NVIDIA’s collection of training recipes and reusable Python libraries for digital biology. It bundles model implementations—protein BERTs such as ESM2, single-cell models like Geneformer, and general LLMs including Llama 3 and Qwen—with GPU-optimized data loaders and scaling primitives. The goal is to move biological foundation model training off a lone workstation and onto clusters without every lab rewriting the same FASTA I/O and mixed-precision boilerplate.
The interesting bit
The project treats molecular data as a first-class citizen in large-scale distributed training. Rather than slapping Hugging Face wrappers on biology models, it integrates NVIDIA TransformerEngine, Megatron-FSDP, and low-precision formats such as FP8 and NVFP4 directly into native PyTorch recipes, and even ships sparse-autoencoder dashboards for interpreting what codon models learn.
Key highlights
- Pre-built recipes for ESM2, CodonFM, Geneformer, Evo2, Llama 3, Mixtral, and Qwen with TransformerEngine acceleration.
- A public support matrix tracking which recipes are active versus work-in-progress across PyTorch, Hugging Face Accelerate, and Megatron-FSDP backends.
- Reusable sub-packages for biological I/O (
bionemo-noodlesfor FASTA), single-cell dataset loading (bionemo-scdl), and molecular co-design (bionemo-moco). - Low-precision training benchmarks included, such as ESM2 15B at up to 2,367 TFLOPS/GPU on NVIDIA B300.
- Sparse autoencoder tooling and interactive feature dashboards for model interpretability on biological sequences.
Caveats
- Several recipes and models are marked work-in-progress in the support matrix, including Geneformer, Evo2, and context parallelism for some architectures.
- The framework is clearly aimed at GPU clusters; the README warns that running recipes on a Colab T4 may be too slow or run out of memory.
Verdict
BioNeMo is for computational biology teams already running on NVIDIA GPU clusters who want a head start on training or fine-tuning foundation models. If your work stays on CPU or single-GPU notebooks, this is overkill and likely a resource headache.