Is lip-reading-deeplearning open source?

Yes — astorfi/lip-reading-deeplearning is open source, released under the Apache-2.0 license.

What language is lip-reading-deeplearning written in?

astorfi/lip-reading-deeplearning is primarily written in Python.

How popular is lip-reading-deeplearning?

astorfi/lip-reading-deeplearning has 1.9k stars on GitHub.

Where can I find lip-reading-deeplearning?

astorfi/lip-reading-deeplearning is on GitHub at https://github.com/astorfi/lip-reading-deeplearning.

astorfi/lip-reading-deeplearning

Teaching neural networks to read lips by listening

A 2017 TensorFlow implementation that matches audio and video streams using coupled 3D CNNs, with lip reading as the demo application.

★1.9k stars Python Computer Vision Image · Video · Audio

What it does

This repo implements a coupled 3D convolutional neural network that learns whether a 0.3-second audio clip matches a 0.3-second video of lip motion. The visual stream takes a cube of 9 grayscale mouth-region frames, each 60×100 pixels (9×60×100). The audio stream takes a cube of 15 time steps of 40 MFEC features, with the features and their first- and second-order derivatives as three channels (15×40×3). A dlib-based lip-tracking utility extracts mouth regions from arbitrary videos as a preprocessing step.
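The two cube shapes follow from the 0.3-second window. Below is a minimal sketch of that arithmetic, assuming 30 fps video and non-overlapping 20 ms audio windows. The window length and all variable names are assumptions made for illustration, and random arrays stand in for real features.

```python
import numpy as np

CLIP_SECONDS = 0.3
VIDEO_FPS = 30                 # frame rate the pipeline standardizes to
AUDIO_WINDOW_SECONDS = 0.02    # assumed non-overlapping window length

# 0.3 s of video at 30 fps -> 9 frames
num_frames = round(CLIP_SECONDS * VIDEO_FPS)
# 0.3 s of audio in 20 ms windows -> 15 time steps
num_windows = round(CLIP_SECONDS / AUDIO_WINDOW_SECONDS)

rng = np.random.default_rng(0)
# Placeholder for 9 grayscale mouth crops of 60x100 pixels
visual_cube = rng.random((num_frames, 60, 100), dtype=np.float32)
# Placeholder for 40 MFEC features per step, with 3 channels
# (static features, first derivative, second derivative)
audio_cube = rng.random((num_windows, 40, 3), dtype=np.float32)

print(visual_cube.shape, audio_cube.shape)
```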

The interesting bit

The architecture treats both modalities as spatio-temporal volumes: 3D convolutions run across stacked video frames and stacked audio spectrogram windows. The paper’s claimed edge is an adaptive “online pair selection” scheme for training. The code implements only a simpler hard-threshold version, which the README discloses upfront.
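To make the pair-selection idea concrete, here is a hedged sketch of one plausible reading of hard thresholding: keep every genuine pair, and keep only the impostor pairs whose embedding distance falls below a fixed threshold, then apply a standard contrastive loss. The function names, the threshold, the margin, and the selection criterion itself are assumptions for illustration; the repository's actual rule may differ.

```python
import numpy as np


def contrastive_loss(distances, labels, margin=1.0):
    """Standard contrastive loss; labels: 1 = genuine pair, 0 = impostor pair."""
    distances = np.asarray(distances, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # Genuine pairs are pulled together (penalize any distance)
    genuine_term = labels * distances ** 2
    # Impostor pairs are pushed apart until they clear the margin
    impostor_term = (1.0 - labels) * np.maximum(0.0, margin - distances) ** 2
    return float(np.mean(genuine_term + impostor_term))


def select_pairs_hard_threshold(distances, labels, threshold):
    """Hypothetical selection rule: all genuine pairs plus 'hard' impostors."""
    distances = np.asarray(distances)
    labels = np.asarray(labels)
    return (labels == 1) | (distances < threshold)


# Toy batch: distances between audio and video embeddings
distances = np.array([0.2, 0.9, 0.4, 1.5])
labels = np.array([1, 0, 0, 1])

keep = select_pairs_hard_threshold(distances, labels, threshold=0.5)
loss = contrastive_loss(distances[keep], labels[keep], margin=1.0)
print(keep, loss)
```

An adaptive scheme would adjust the threshold during training instead of fixing it; that part is not sketched here.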

Key highlights

TensorFlow 1.x-era implementation of the IEEE Access 2017 paper by Torfi et al.
Includes VisualizeLip.py for dlib-based mouth extraction and bounding-box visualization
Audio features rely on the author’s companion SpeechPy package
The processing pipeline standardizes video to 30 fps and extracts the audio track via FFmpeg
Training and evaluation scripts are thin wrappers: train.py and test.py
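The FFmpeg step in the highlights above can be sketched as two commands, one to resample the video to 30 fps and one to pull out the audio track. This is an illustrative sketch, not the repository's own invocation: the helper name, file names, mono output, and 16 kHz sample rate are all assumptions.

```python
def build_ffmpeg_commands(src, video_out, audio_out, fps=30, sample_rate=16000):
    """Return (video_cmd, audio_cmd) argument lists for FFmpeg.

    Hypothetical helper: mono audio and the 16 kHz default are assumed values.
    """
    # -r sets the output frame rate; -an drops the audio stream
    video_cmd = ["ffmpeg", "-y", "-i", src, "-r", str(fps), "-an", video_out]
    # -vn drops the video stream; -ac 1 downmixes to mono; -ar sets the sample rate
    audio_cmd = [
        "ffmpeg", "-y", "-i", src,
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        audio_out,
    ]
    return video_cmd, audio_cmd


video_cmd, audio_cmd = build_ffmpeg_commands("clip.mp4", "clip_30fps.mp4", "clip.wav")
print(" ".join(video_cmd))
print(" ".join(audio_cmd))
```

With FFmpeg installed, each list can be passed to `subprocess.run(cmd, check=True)`.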

Caveats

The input pipeline is entirely bring-your-own: you must prepare your own dataset and run your own feature extraction, because the code assumes “utterance-based extracted features” are already in place
The adaptive pair-selection method from the paper is not implemented — only basic hard thresholding
README is vague on dataset specifics, hardware requirements, and how to actually wire your data into the network
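Since the data wiring is left to the user, here is a hedged sketch of how index-aligned feature cubes might be turned into genuine and impostor pairs for a matching network. The helper name and the shift-based impostor scheme are assumptions for illustration and are not taken from the repository.

```python
import numpy as np


def make_pairs(visual, audio, rng):
    """Build genuine (label 1) and impostor (label 0) audio-video pairs.

    Assumes visual[i] and audio[i] come from the same 0.3-second clip.
    """
    n = visual.shape[0]
    if n < 2 or audio.shape[0] != n:
        raise ValueError("need at least two aligned clips")
    # A nonzero circular shift guarantees no clip keeps its own audio
    shift = int(rng.integers(1, n))
    impostor_audio = np.roll(audio, shift, axis=0)
    pair_visual = np.concatenate([visual, visual])
    pair_audio = np.concatenate([audio, impostor_audio])
    labels = np.concatenate(
        [np.ones(n, dtype=np.int64), np.zeros(n, dtype=np.int64)]
    )
    return pair_visual, pair_audio, labels


rng = np.random.default_rng(0)
visual = rng.random((4, 9, 60, 100))   # placeholder mouth-region cubes
audio = rng.random((4, 15, 40, 3))     # placeholder speech feature cubes
pair_visual, pair_audio, labels = make_pairs(visual, audio, rng)
print(pair_visual.shape, pair_audio.shape, labels)
```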

Verdict

Worth a look if you’re reproducing classic audio-visual matching baselines or studying 3D CNN architectures for multimodal fusion. Skip it if you need a batteries-included lip-reading toolkit or modern PyTorch code — this is research scaffolding from the TF 1.x era, not a product.
