buriburisuri/speech-to-text-wavenet
A TensorFlow implementation adapting DeepMind's WaveNet generative audio model for end-to-end sentence-level English speech recognition.

Velocity (7d): +1.1 ★/day · Trend: steady
This repository implements an end-to-end speech recognition system based on DeepMind’s WaveNet architecture. It uses dilated convolutions and a Connectionist Temporal Classification (CTC) loss, and trains on MFCC features extracted from the audio in the VCTK dataset. The model maps speech audio directly to text transcriptions, with no intermediate phoneme stage.
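To make the two core ideas concrete, here is a minimal pure-Python sketch of a dilated 1D convolution and of greedy CTC decoding. It is not taken from the repository: the function names, the kernel size of 3, the dilation rates, and the blank index 0 are all assumptions chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded ('same' length) 1D dilated convolution over a list of numbers.

    Kernel taps are spaced `dilation` samples apart, so a small kernel
    covers a wide stretch of the input.
    """
    k = len(kernel)
    span = (k - 1) * dilation            # distance between first and last tap
    pad_left = span // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (span - pad_left)
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(signal))
    ]


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output frame of a stacked dilated network."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence the way CTC defines it:
    merge consecutive repeats, then drop blank symbols."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Assumed stack: kernel size 3 with dilations doubling per layer.
print(receptive_field(3, [1, 2, 4, 8, 16]))

# Per-frame argmax labels (0 = blank); a blank between repeats keeps both.
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))
```

In this sketch the receptive field grows with the sum of the dilation rates while the parameter count grows only with the number of layers, which is the usual motivation for dilated stacks. The CTC collapse rule is what lets the network emit one label per frame without a frame-level alignment between audio and text.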