buriburisuri/speech-to-text-wavenet

A TensorFlow implementation adapting DeepMind's WaveNet generative audio model for end-to-end sentence-level English speech recognition.

4k stars · Python · Image · Video · Audio
Velocity (7d): +1.1 ★/day · Trend: steady

This repository implements an end-to-end speech recognition system based on DeepMind’s WaveNet architecture. It stacks dilated convolutions and trains with a Connectionist Temporal Classification (CTC) loss on MFCC features extracted from the audio of the VCTK dataset. The model maps speech audio directly to text transcriptions, with no intermediate phoneme stage.
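
The two building blocks named above, dilated convolution and CTC decoding, can be sketched in plain Python. This is an illustration only, not the repository's code: the function names, the left (causal) zero padding, the alphabet, and the choice of index 0 as the CTC blank are all assumptions made for the example.

```python
from typing import List, Sequence

# Hypothetical label set: index 0 is assumed to be the CTC blank symbol.
ALPHABET = "_abcdefghijklmnopqrstuvwxyz "
BLANK = 0


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D dilated convolution with left zero padding (assumed), same output length."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` steps apart, looking back in time.
            idx = t - (k - 1 - j) * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames seen by one output frame of a dilated stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def ctc_greedy_decode(frame_labels: Sequence[int]) -> str:
    """Collapse repeated per-frame labels, then drop blanks (greedy CTC decoding)."""
    chars = []
    previous = None
    for label in frame_labels:
        if label != previous and label != BLANK:
            chars.append(ALPHABET[label])
        previous = label
    return "".join(chars)


if __name__ == "__main__":
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
    print(receptive_field(2, [1, 2, 4, 8]))
    print(ctc_greedy_decode([0, 3, 3, 0, 1, 1, 1, 0, 20, 20]))
```

Doubling the dilation at each layer grows the receptive field exponentially with depth. In the decoder, collapsing repeats before dropping blanks lets a blank between two identical labels produce a doubled letter. Together these let the network emit characters directly from frame-level features.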
