facebookresearch/lingua
A minimal, fast PyTorch-based library for training and running inference on large language models, designed for research experimentation.

Velocity (7d): +7.9 ★/day · Trend: steady
Meta Lingua is a lean research codebase for LLM development that supports end-to-end training, inference, and evaluation of language models. Its PyTorch components are easy to modify, so researchers can experiment with new architectures, losses, and data pipelines. The library also includes tools for downloading and preparing data from sources such as the FineWeb and DCLM datasets, and for setting up tokenizers.