
niderhoff/nlp-datasets

An alphabetically organized collection of public-domain and freely available text datasets for NLP research and model training.

Velocity (7d): +1.6 ★/day · Trend: steady

This repository provides an alphabetical list of free text datasets for Natural Language Processing tasks. It covers diverse sources, including web archives, blog posts, product reviews, academic papers, email corpora, and conversational data, in English and several other languages. The collection serves as a reference for researchers and developers seeking training data for language models and NLP applications.
