NVIDIA-BioNeMo/bionemo-recipes

BioNeMo: NVIDIA’s recipe book for training biology transformers

Because training a 15B-parameter protein BERT on a laptop is a bad idea, BioNeMo wires TransformerEngine and Megatron-FSDP to biological data for GPU clusters.

★771 stars Python Domain Apps ML Frameworks Language Models

View on GitHub ↗ Homepage ↗

Velocity · 7d

+0.3

★ / day

Trend

→steady

star history

What it does

BioNeMo is NVIDIA’s collection of training recipes and reusable Python libraries for digital biology. It bundles model implementations—protein BERTs such as ESM2, single-cell models like Geneformer, and general LLMs including Llama 3 and Qwen—with GPU-optimized data loaders and scaling primitives. The goal is to move biological foundation model training off a lone workstation and onto clusters without every lab rewriting the same FASTA I/O and mixed-precision boilerplate.

The interesting bit

The project treats molecular data as a first-class citizen in large-scale distributed training. Rather than slapping Hugging Face wrappers on biology models, it integrates NVIDIA TransformerEngine, Megatron-FSDP, and low-precision formats such as FP8 and NVFP4 directly into native PyTorch recipes, and even ships sparse-autoencoder dashboards for interpreting what codon models learn.

Key highlights

Pre-built recipes for ESM2, CodonFM, Geneformer, Evo2, Llama 3, Mixtral, and Qwen with TransformerEngine acceleration.
A public support matrix tracking which recipes are active versus work-in-progress across PyTorch, Hugging Face Accelerate, and Megatron-FSDP backends.
Reusable sub-packages for biological I/O (bionemo-noodles for FASTA), single-cell dataset loading (bionemo-scdl), and molecular co-design (bionemo-moco).
Low-precision training benchmarks included, such as ESM2 15B at up to 2,367 TFLOPS/GPU on NVIDIA B300.
Sparse autoencoder tooling and interactive feature dashboards for model interpretability on biological sequences.

Caveats

Several recipes and models are marked work-in-progress in the support matrix, including Geneformer, Evo2, and context parallelism for some architectures.
The framework is clearly aimed at GPU clusters; the README warns that running recipes on a Colab T4 may be too slow or run out of memory.

Verdict

BioNeMo is for computational biology teams already running on NVIDIA GPU clusters who want a head start on training or fine-tuning foundation models. If your work stays on CPU or single-GPU notebooks, this is overkill and likely a resource headache.