yaodongC/awesome-instruction-dataset

A curated list of open-source instruction-tuning and RLHF datasets for training text and multi-modal instruction-following LLMs.

1.1k stars · Data Tooling · Learning
Velocity (7d): +1.0 ★ / day · Trend: steady

This repository aggregates publicly available datasets for training and fine-tuning instruction-following large language models. It categorizes datasets by modality (text, visual), generation method (human-generated, self-instruct, mixed), language, and task type. The collection includes resources for training models such as Alpaca, LLaMA, ChatGPT, and GPT-4, as well as red-teaming and RLHF datasets used in LLM alignment pipelines.