jianzhnie/awesome-instruction-datasets

A curated list of instruction datasets and RLHF datasets used to fine-tune and train large language models.

Velocity (7d): +0.6 ★ / day. Trend: steady.

This repository aggregates publicly available instruction datasets and human preference datasets for training and fine-tuning LLMs, including Alpaca, GuanacoDataset, OpenAssistant OASST1, and Anthropic HH-RLHF. The instruction datasets are meant for supervised fine-tuning; the preference datasets are meant for RLHF training.
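
To make the two dataset categories concrete, here is a minimal sketch of what one record of each kind typically looks like. It assumes the field names used by the Alpaca release (`instruction`, `input`, `output`) and by Anthropic HH-RLHF (`chosen`, `rejected`); other datasets in the list may use different schemas. The helper functions, the prompt template, and the sample records are hypothetical and only illustrate the shapes.

```python
def format_sft_example(record: dict) -> str:
    """Flatten an Alpaca-style record into one training string.

    The blank-line template used here is an assumption for illustration,
    not the template of any particular training script.
    """
    prompt = record["instruction"]
    if record.get("input"):  # "input" is optional context and may be empty
        prompt += "\n\n" + record["input"]
    return prompt + "\n\n" + record["output"]


def is_valid_preference(record: dict) -> bool:
    """A preference pair needs two distinct, non-empty responses."""
    chosen = record.get("chosen", "")
    rejected = record.get("rejected", "")
    return bool(chosen) and bool(rejected) and chosen != rejected


# Instruction-tuning record (supervised fine-tuning): one prompt, one target.
sft_record = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}

# Preference record (RLHF): the same dialogue with a preferred and a
# dispreferred completion. The content is made up for this sketch.
pref_record = {
    "chosen": "Human: Hi\n\nAssistant: Hello! How can I help?",
    "rejected": "Human: Hi\n\nAssistant: Go away.",
}

sft_text = format_sft_example(sft_record)
pref_ok = is_valid_preference(pref_record)
```

An instruction record gives the model a single target to imitate, while a preference record only says which of two responses is better, which is the signal a reward model is trained on.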