The hottest AI & LLM repositories on GitHub — measured, ranked, and explained.


Zjh-819/LLMDataHub

A curated hub of high-quality datasets for LLM instruction finetuning and training.

★3.4k stars · Data Tooling · Language Models · Learning


LLMDataHub

Not currently ranked — collecting fresh signals.


LLMDataHub aggregates open-source training corpora for large language models, covering alignment datasets, domain-specific datasets, pretraining corpora, and multimodal datasets. For each dataset it lists a link, the size, the language, usage notes, and a short description, to help researchers and developers train LLMs such as Alpaca, Vicuna, and ChatGLM.

Frequently asked

What is Zjh-819/LLMDataHub? A curated hub of high-quality datasets for LLM instruction finetuning and training.
Is LLMDataHub open source? Yes. Zjh-819/LLMDataHub is open source, released under the MIT license.
How popular is LLMDataHub? Zjh-819/LLMDataHub has 3.4k stars on GitHub.
Where can I find LLMDataHub? Zjh-819/LLMDataHub is on GitHub at https://github.com/Zjh-819/LLMDataHub.
