
double22a/speech_dataset

A curated list of Chinese and English speech recognition datasets with durations and download links.

460 stars · Data Tooling · Velocity (7d): +0.2 ★/day · Trend: steady

This repository aggregates and documents publicly available speech datasets for automatic speech recognition (ASR) research and development. It catalogs datasets across multiple languages, including Mandarin Chinese and English, and records metadata such as duration in hours and source URL for each one. The listed datasets include well-known resources such as LibriSpeech, Common Voice, AISHELL, and WenetSpeech, so the list serves as a reference index for speech ML practitioners.
