bespokelabsai/curator
Python library for bulk inference and scalable synthetic data curation for LLM post-training.

Velocity (7d): +2.9 ★/day · Trend: steady
The repository provides a framework for generating and curating synthetic datasets for language model post-training. It supports bulk inference for large-scale data extraction and structured dataset generation, and includes tools for building instruction-tuning data and for integrating with fine-tuning workflows that use methods such as LoRA.
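The bulk-inference pattern described above can be illustrated with a minimal, library-agnostic sketch. It does not use Curator's actual API: `fake_model`, `InstructionPair`, `parse`, and `generate_dataset` are hypothetical names introduced only for illustration, and the model call is a deterministic stub rather than a real LLM request. The idea is to fan prompts out in parallel, parse each raw response into a structured row, and drop rows that fail to parse.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class InstructionPair:
    """One structured row of an instruction-tuning dataset (hypothetical schema)."""
    instruction: str
    response: str


def fake_model(topic: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a JSON string."""
    return json.dumps({
        "instruction": f"Explain {topic}.",
        "response": f"A short explanation of {topic}.",
    })


def parse(raw: str) -> Optional[InstructionPair]:
    """Turn a raw model response into a structured row, or None if malformed."""
    try:
        data = json.loads(raw)
        return InstructionPair(data["instruction"], data["response"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return None


def generate_dataset(
    topics: List[str],
    model: Callable[[str], str] = fake_model,
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Run the model over all topics in parallel and keep only parseable rows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so rows line up with topics.
        raw_outputs = list(pool.map(model, topics))
    parsed = (parse(raw) for raw in raw_outputs)
    return [asdict(row) for row in parsed if row is not None]


if __name__ == "__main__":
    for row in generate_dataset(["LoRA", "tokenization"]):
        print(json.dumps(row))
```

In a real pipeline the stub would be replaced by an actual inference client, and the resulting rows would typically be written to JSONL or a dataset object before fine-tuning.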