chaoswork/sft_datasets
A curated repository of open-source datasets for supervised fine-tuning of large language models.

Velocity (7d): +0.5 ★/day · Trend: steady
This repository organizes and catalogs open-source SFT datasets for fine-tuning large language models, with a particular focus on Chinese-language data. It covers datasets for instruction following, mathematical reasoning, dialogue generation, and multi-task NLP. Each entry records the dataset's size, language, task type, and generation method, along with download links to sources such as Hugging Face.
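The fields each catalog entry records map naturally onto a small record type. The sketch below is hypothetical: the class name `SFTDatasetEntry`, the helper `filter_entries`, the field names, and the example rows are illustrative assumptions, not the repository's actual format or data. It shows how such a catalog could be filtered by language and task type.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SFTDatasetEntry:
    """One catalog row (hypothetical schema mirroring the documented fields)."""
    name: str
    size: int          # number of examples (unit assumed for illustration)
    language: str      # e.g. "zh" or "en"
    task_type: str     # e.g. "instruction", "math", "dialogue", "multi-task"
    generation: str    # e.g. "human-written" or "model-generated"
    url: str           # download link, e.g. a Hugging Face dataset page


def filter_entries(entries: List[SFTDatasetEntry], language: str,
                   task_type: str) -> List[SFTDatasetEntry]:
    """Return entries matching both language and task type, largest first."""
    hits = [e for e in entries if e.language == language and e.task_type == task_type]
    return sorted(hits, key=lambda e: e.size, reverse=True)


# Placeholder rows for illustration only; these are not real catalog entries.
catalog = [
    SFTDatasetEntry("dataset_a", 1000, "zh", "instruction", "model-generated", "https://example.com/dataset_a"),
    SFTDatasetEntry("dataset_b", 5000, "zh", "instruction", "human-written", "https://example.com/dataset_b"),
    SFTDatasetEntry("dataset_c", 2000, "zh", "math", "model-generated", "https://example.com/dataset_c"),
    SFTDatasetEntry("dataset_d", 3000, "en", "instruction", "human-written", "https://example.com/dataset_d"),
]

print([e.name for e in filter_entries(catalog, "zh", "instruction")])  # ['dataset_b', 'dataset_a']
```

Sorting by size is one reasonable default when choosing among candidate SFT sources. Filtering on `generation` (for example, to separate human-written from model-generated data) would follow the same pattern.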