
Zjh-819/LLMDataHub

A curated hub of high-quality datasets for LLM instruction finetuning and training.

Velocity · 7d: +2.9 ★ / day
Trend: steady

LLMDataHub aggregates open-source training corpora for large language models, covering alignment datasets, domain-specific datasets, pretraining corpora, and multimodal datasets. For each dataset it provides a link, size, language, usage guidance, and a description, to help researchers and developers train LLMs such as Alpaca, Vicuna, and ChatGLM.
