
chaoswork/sft_datasets

A curated repository of open-source datasets for supervised fine-tuning of large language models.

Star velocity (7 days): +0.5 stars/day. Trend: steady.

This repository catalogs open-source SFT datasets for fine-tuning large language models, with a focus on Chinese-language data. It covers datasets for instruction following, mathematical reasoning, dialogue generation, and multi-task NLP. Each entry records the dataset's size, language, task type, and generation method, and links to download sources such as Hugging Face.
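
The per-entry fields described above could be modeled roughly as follows. This is a minimal sketch, not the repository's actual format: the `DatasetEntry` class, its field names, the `filter_entries` helper, and every sample value (including the placeholder URLs) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog row (hypothetical schema, not taken from the repository)."""
    name: str
    size: int                # number of examples
    language: str            # e.g. "zh" or "en"
    task_type: str           # e.g. "instruction following"
    generation_method: str   # e.g. "human-written" or "model-generated"
    url: str                 # download link


def filter_entries(
    entries: List[DatasetEntry],
    language: Optional[str] = None,
    task_type: Optional[str] = None,
) -> List[DatasetEntry]:
    """Return entries matching every filter that is not None."""
    return [
        e for e in entries
        if (language is None or e.language == language)
        and (task_type is None or e.task_type == task_type)
    ]


# Made-up sample rows; names, sizes, and URLs are placeholders.
entries = [
    DatasetEntry("sample-zh-instruct", 1000, "zh", "instruction following",
                 "model-generated", "https://example.com/sample-zh-instruct"),
    DatasetEntry("sample-zh-math", 500, "zh", "mathematical reasoning",
                 "human-written", "https://example.com/sample-zh-math"),
    DatasetEntry("sample-en-dialogue", 2000, "en", "dialogue generation",
                 "model-generated", "https://example.com/sample-en-dialogue"),
]

chinese_math = filter_entries(entries, language="zh",
                              task_type="mathematical reasoning")
```

With a structure like this, selecting Chinese math datasets is a single call to `filter_entries`, and further filters (for example, by generation method) could be added in the same way.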
