amanvirparhar/chaplin
Real-time silent speech recognition tool that reads lips via webcam and converts the mouthed words to text, using an Auto-AVSR visual speech recognition model with LLM post-processing.

Chaplin is a visual speech recognition (VSR) system that captures video from a webcam, processes lip movements using the Auto-AVSR model trained on the Lip Reading Sentences 3 dataset, and converts silent mouthing into text. A local LLM (qwen3:4b via ollama) then corrects the raw VSR output, and the corrected text is typed at the cursor. The system runs entirely locally with no cloud dependencies; users press a key to toggle recording on and off while mouthing words.
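
The LLM correction step could look roughly like the sketch below. It is not the project's actual code: it assumes ollama is listening on its default local endpoint (`http://localhost:11434/api/generate`), and the prompt wording and the function names `build_correction_request` and `correct_with_ollama` are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3:4b"  # model named in the description above


def build_correction_request(raw_vsr_text: str) -> dict:
    """Build the JSON payload asking the LLM to clean up raw VSR output."""
    # Hypothetical prompt; the project's real prompt is not shown here.
    prompt = (
        "The following text is raw lip-reading output and may contain "
        "misrecognized words. Return only the corrected sentence.\n\n"
        f"{raw_vsr_text.strip().lower()}"
    )
    # "stream": False asks ollama for a single JSON reply instead of chunks.
    return {"model": MODEL, "prompt": prompt, "stream": False}


def correct_with_ollama(raw_vsr_text: str, timeout: float = 30.0) -> str:
    """Send the raw text to the local ollama server and return its reply."""
    payload = json.dumps(build_correction_request(raw_vsr_text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        # A non-streaming reply carries the generated text in "response".
        return json.loads(reply.read())["response"].strip()
```

Calling `correct_with_ollama("HELLO WORLD")` requires a running ollama server with the model already pulled. Keeping payload construction in its own function means that part can be checked without any network access.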