Is curator open source?

Yes. bespokelabsai/curator is open source, released under the Apache-2.0 license.

What language is curator written in?

bespokelabsai/curator is primarily written in Python.

How popular is curator?

bespokelabsai/curator has 1.7k stars on GitHub.

Where can I find curator?

bespokelabsai/curator is on GitHub at https://github.com/bespokelabsai/curator.

bespokelabsai/curator

Subclass an LLM, Get a Synthetic Data Factory

Bespoke Curator turns bulk LLM inference into repeatable, fault-tolerant Python pipelines for generating structured synthetic datasets.

★1.7k stars Python Data Tooling Language Models LLMOps · Eval

What it does

Bespoke Curator is a Python library for orchestrating large-scale synthetic data generation and structured extraction. You subclass curator.LLM, implement a prompt method and a parse method, and the library handles the rest: parallel execution across providers, response caching, retries, and fault recovery. It outputs HuggingFace-compatible datasets and supports chaining steps into larger pipelines.
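The subclassing pattern described above can be sketched without installing anything. This example does not import Curator: `SketchLLM`, `fake_model`, and `Shouter` are hypothetical stand-ins that only mimic the shape of the workflow (a prompt method, a parse method, and a response cache keyed on the prompt). The real library's class names, signatures, and caching behavior may differ.

```python
import hashlib
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a provider call; it just upper-cases the prompt."""
    return prompt.upper()


class SketchLLM:
    """Illustrative base class, NOT curator.LLM: subclasses supply prompt() and parse()."""

    def __init__(self, model: Callable[[str], str]):
        self.model = model
        self.cache = {}   # prompt hash -> raw response text
        self.calls = 0    # number of real (uncached) model calls

    def prompt(self, row: dict) -> str:
        raise NotImplementedError

    def parse(self, row: dict, response: str) -> dict:
        raise NotImplementedError

    def __call__(self, rows: list) -> list:
        out = []
        for row in rows:
            text = self.prompt(row)
            key = hashlib.sha256(text.encode()).hexdigest()
            if key not in self.cache:  # only call the model for unseen prompts
                self.cache[key] = self.model(text)
                self.calls += 1
            out.append(self.parse(row, self.cache[key]))
        return out


class Shouter(SketchLLM):
    def prompt(self, row: dict) -> str:
        return f"topic: {row['topic']}"

    def parse(self, row: dict, response: str) -> dict:
        return {"topic": row["topic"], "answer": response}


shouter = Shouter(fake_model)
rows = shouter([{"topic": "tea"}, {"topic": "tea"}, {"topic": "rain"}])
```

Three input rows produce three output rows, but the repeated prompt is served from the cache, so the stand-in model is called only twice. That is the kind of saving response caching is meant to provide on reruns.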

The interesting bit

Instead of treating prompts as one-off scripts, Curator frames them as batch data jobs. It bakes in the unglamorous but essential infrastructure: async dispatch, structured output validation via Pydantic, and batch API integrations that the README claims can halve token costs. That lets you focus on the data logic rather than the request plumbing.
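As a rough illustration of what "async dispatch plus structured validation" involves, the sketch below fans out requests with a concurrency cap and validates each response against a schema. It is an assumption-laden toy, not Curator's implementation: `fake_request` and `run_batch` are hypothetical names, and a standard-library dataclass stands in for a Pydantic model so the example has no dependencies.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class Poem:
    """Target schema; a dataclass stands in for a Pydantic model here."""
    title: str
    lines: int

    def __post_init__(self):
        # Minimal type check, standing in for real schema validation.
        if not isinstance(self.title, str) or not isinstance(self.lines, int):
            raise TypeError("Poem fields have the wrong types")


async def fake_request(topic: str) -> str:
    """Hypothetical provider call that returns a JSON string."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return json.dumps({"title": f"Ode to {topic}", "lines": len(topic)})


async def run_batch(topics, max_concurrency=2):
    limit = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def one(topic: str) -> Poem:
        async with limit:
            raw = await fake_request(topic)
        return Poem(**json.loads(raw))  # raises if the payload does not fit

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(one(t) for t in topics))


poems = asyncio.run(run_batch(["tea", "rain", "moss"]))
```

The semaphore keeps the number of simultaneous requests bounded, and constructing `Poem` from the parsed JSON rejects malformed payloads early instead of letting them leak into the dataset.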

Key highlights

Structured output parsing via Pydantic models, with first-class support for typed extraction.
Built-in resilience: caching, automatic retries, and fault recovery for long-running generation jobs.
Broad backend support through LiteLLM and vLLM, plus batch APIs for OpenAI, Anthropic, and Gemini.
Code execution backends (local, Ray, Docker, e2b) for running generated code inside the pipeline.
Proven track record: used to build public datasets like OpenThoughts2-1M and Bespoke-Stratos-17k.
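The "automatic retries" highlight refers to a familiar pattern that can be sketched in a few lines. This is a generic retry-with-exponential-backoff helper, not code from Curator: `TransientError`, `with_retries`, and `flaky` are hypothetical names, and the delay is set to zero so the example runs instantly.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying."""


def with_retries(fn, max_attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call fn(); on TransientError, wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


attempts = {"n": 0}


def flaky():
    """Simulated call that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("simulated rate limit")
    return "ok"


result = with_retries(flaky)
```

In a long-running generation job, a wrapper like this lets a brief rate limit or network blip cost a short wait rather than the whole run.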

Verdict

Data engineers and researchers who need to turn raw LLM outputs into curated, structured training corpora will find this saves them from writing yet another ad-hoc request loop. If you only need a single chat completion, it is overkill.
