← all repositories

magpie-align/magpie

Research-grade pipeline for generating high-quality synthetic alignment data by prompting aligned LLMs with their native pre-query templates.

magpie
Velocity · 7d
+1.2
★ / day
Trend
steady
star history

Magpie generates training data for LLM alignment and fine-tuning by exploiting the prompt templates of aligned LLMs to produce both user queries and model responses without manual annotation. It requires no seed questions or prompt engineering, making synthetic data generation more scalable. The project provides generated datasets (1M+ examples) from models like Llama-3.3, QwQ, and Skywork-o1, specifically targeting supervised fine-tuning workflows.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.