
teknium1/GPTeacher

A collection of modular training datasets generated by GPT-4 for fine-tuning instruction-following language models.

1.7k stars · Python · Data Tooling · Language Models
Velocity (7d): +1.4 ★/day · Trend: steady

GPTeacher provides several instruction datasets for fine-tuning LLMs. The General-Instruct dataset contains ~20,000 examples, including chain-of-thought reasoning and logic puzzles. The Roleplay-Instruct dataset, which also comes in a V2 release 2.5x larger, simulates diverse character interactions. The Code-Instruct dataset offers ~5,350 coding tasks across multiple languages. The Toolformer dataset prepares models for tool-use scenarios. All datasets follow Alpaca’s instruction-input-output format, so they work with standard fine-tuning pipelines.
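As a rough illustration of the instruction-input-output layout mentioned above, the sketch below turns records of that shape into prompt/target pairs. The sample records, the `build_prompt` and `to_training_pairs` helpers, and the `### Instruction:` / `### Input:` / `### Response:` section markers are assumptions made for this example; they are not taken from the GPTeacher files, and a real pipeline may use a different template.

```python
import json

# Hypothetical sample records in the instruction/input/output layout.
# They are illustrative only and do not come from the GPTeacher datasets.
SAMPLE_JSON = """
[
  {"instruction": "Translate the sentence to French.",
   "input": "Good morning.",
   "output": "Bonjour."},
  {"instruction": "Name a prime number greater than 10.",
   "input": "",
   "output": "11"}
]
"""


def build_prompt(record: dict) -> str:
    """Render one record as a prompt; the input section is skipped when empty."""
    parts = [f"### Instruction:\n{record['instruction']}"]
    if record.get("input", "").strip():
        parts.append(f"### Input:\n{record['input']}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)


def to_training_pairs(records: list) -> list:
    """Return (prompt, target) tuples, one per record."""
    return [(build_prompt(r), r["output"]) for r in records]


records = json.loads(SAMPLE_JSON)
pairs = to_training_pairs(records)

for prompt, target in pairs:
    print(prompt + target)
    print("---")
```

Keeping the prompt and the target separate makes it easier to mask the prompt tokens out of the loss during fine-tuning, if the training setup supports that.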
