explosion/curated-transformers
A PyTorch library providing state-of-the-art transformer models (BERT, RoBERTa, Llama, Falcon, etc.) composed from reusable building blocks.

Curated Transformers is a PyTorch library that implements encoder and decoder transformer architectures, including major LLMs such as Falcon, Llama, GPT-NeoX, and Dolly v2. Every model is composed from the same set of reusable components, so a feature implemented once works across all supported architectures. Examples are 4/8-bit quantization via bitsandbytes and use of the PyTorch meta device, which avoids allocating and initializing parameters before the real weights are loaded. The library provides consistent type annotations and targets production use: it serves as the transformer implementation for spaCy 3.7.
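
To make the meta device idea concrete, here is a minimal sketch in plain PyTorch. It does not use the Curated Transformers API. `build_block` is a hypothetical stand-in for one reusable building block, and the "checkpoint" is simply the state dict of a second, normally initialized copy. The sketch assumes a PyTorch version that supports the `device` argument on `nn.Linear` and `Module.to_empty`.

```python
import torch
from torch import nn


def build_block(width: int, device: str) -> nn.Sequential:
    """Hypothetical feed-forward building block (not the library's own class)."""
    return nn.Sequential(
        nn.Linear(width, 4 * width, device=device),
        nn.GELU(),
        nn.Linear(4 * width, width, device=device),
    )


torch.manual_seed(0)

# Stand-in for a checkpoint: a normally initialized block on the CPU.
reference = build_block(8, device="cpu")
checkpoint = reference.state_dict()

# 1. Construct on the meta device: parameters have shapes and dtypes
#    but no storage, so nothing is allocated or initialized.
block = build_block(8, device="meta")
meta_param_count = sum(p.is_meta for p in block.parameters())

# 2. Allocate uninitialized storage on the target device.
block = block.to_empty(device="cpu")

# 3. Fill the storage with the checkpoint weights.
block.load_state_dict(checkpoint)

x = torch.randn(2, 8)
outputs_match = torch.allclose(block(x), reference(x))
```

Because the block is first built without storage, the only initialization work done is copying the checkpoint weights. The random initialization that a regular constructor would perform, and that loading would immediately overwrite, is skipped.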