Teaching Alexa sign language with a webcam and TensorFlow.js
A browser-based hack that lets you train custom gestures to control an Amazon Echo—no cloud ML required.

What it does
This project runs entirely in your browser. You hold up gestures to your webcam, label each one on the fly with a voice command (“lights on,” “what’s the weather”), and the page turns recognized gestures into synthesized speech that an Echo within earshot can hear and act on. It uses TensorFlow.js and a k-nearest-neighbors (KNN) image classifier, with no model uploads, no API keys, and no backend.
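To make the "train on the fly" idea concrete, here is a minimal sketch of a KNN classifier in TypeScript. It is not the project's code: `TinyKnn`, `addExample`, and `predict` are illustrative names, and it assumes each webcam frame has already been reduced to a fixed-length feature vector (that step is left out). "Training" is just storing labeled examples, which is why new gestures can be added instantly.

```typescript
// Sketch only: a tiny k-nearest-neighbors classifier over numeric feature vectors.
// Assumption: each webcam frame has already been turned into a fixed-length
// feature vector; that step is not shown here.

type Example = { label: string; features: number[] };
type Prediction = { label: string; confidence: number };

function squaredDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("feature vectors must have equal length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

class TinyKnn {
  private examples: Example[] = [];
  private k: number;

  constructor(k: number = 3) {
    if (k < 1) throw new Error("k must be at least 1");
    this.k = k;
  }

  // "Training" is only storing the labeled example.
  addExample(label: string, features: number[]): void {
    this.examples.push({ label, features: features.slice() });
  }

  // Majority vote among the k closest stored examples.
  // Confidence is the winning label's share of those votes.
  predict(features: number[]): Prediction | null {
    if (this.examples.length === 0) return null;
    const nearest = this.examples
      .map((ex) => ({ label: ex.label, dist: squaredDistance(ex.features, features) }))
      .sort((x, y) => x.dist - y.dist)
      .slice(0, this.k);

    const votes: Record<string, number> = {};
    for (const n of nearest) {
      votes[n.label] = (votes[n.label] ?? 0) + 1;
    }
    // Ties go to the label of the single closest example.
    let best = nearest[0].label;
    for (const label of Object.keys(votes)) {
      if (votes[label] > votes[best]) best = label;
    }
    return { label: best, confidence: votes[best] / nearest.length };
  }
}

// Usage with made-up 2-D "features" standing in for real image features.
const knn = new TinyKnn(3);
knn.addExample("lights on", [0, 0]);
knn.addExample("lights on", [0, 1]);
knn.addExample("lights on", [1, 0]);
knn.addExample("what's the weather", [10, 10]);
knn.addExample("what's the weather", [10, 11]);
knn.addExample("what's the weather", [11, 10]);
console.log(knn.predict([0.2, 0.1]));
```

In a real page the feature vectors would come from the image model and would be much longer than two numbers; the voting logic stays the same.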
The interesting bit
The clever part isn’t the model (it’s a basic KNN classifier from Google’s deeplearn.js era). It’s the interaction loop: webcam → browser → synthesized speech → Echo’s microphone. The project hijacks the Echo’s existing audio input rather than touching Amazon’s APIs at all. That makes it a hack, not an integration—and a surprisingly effective one for a proof of concept.
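The last hop of that loop, from a classified gesture to spoken audio, can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `GestureSpeaker`, the confidence and streak thresholds, and the phrase table are all hypothetical. The `speak` function is injected so the logic can run outside a browser; in a page it could wrap the Web Speech API, for example `speechSynthesis.speak(new SpeechSynthesisUtterance(phrase))`.

```typescript
// Sketch only: turn a stream of per-frame predictions into spoken phrases.
// All names and thresholds here are hypothetical.

type Speak = (phrase: string) => void;

class GestureSpeaker {
  private phrases: Record<string, string>;
  private speak: Speak;
  private minConfidence: number;
  private minStreak: number;
  private lastLabel: string | null = null;
  private streak = 0;

  constructor(
    phrases: Record<string, string>,
    speak: Speak,
    minConfidence: number = 0.8,
    minStreak: number = 3
  ) {
    this.phrases = phrases;
    this.speak = speak;
    this.minConfidence = minConfidence;
    this.minStreak = minStreak;
  }

  // Call once per classified frame. Returns true if a phrase was spoken.
  onPrediction(label: string, confidence: number): boolean {
    // A low-confidence frame resets the streak.
    if (confidence < this.minConfidence) {
      this.lastLabel = null;
      this.streak = 0;
      return false;
    }
    if (label === this.lastLabel) {
      this.streak += 1;
    } else {
      this.lastLabel = label;
      this.streak = 1;
    }
    // Speak exactly once, when the streak first reaches the threshold,
    // so a held gesture does not repeat the command on every frame.
    if (this.streak !== this.minStreak) return false;
    const phrase = this.phrases[label];
    if (phrase === undefined) return false;
    this.speak(phrase);
    return true;
  }
}

// Usage: collect phrases in an array instead of speaking them aloud.
const spokenLog: string[] = [];
const speaker = new GestureSpeaker(
  { lights: "Alexa, lights on", weather: "Alexa, what's the weather" },
  (phrase) => spokenLog.push(phrase)
);
for (let frame = 0; frame < 5; frame++) {
  speaker.onPrediction("lights", 0.95);
}
console.log(spokenLog);
```

Requiring a few consecutive confident frames is one simple way to keep a noisy classifier from firing stray commands at the Echo; other debouncing schemes would work too.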
Key highlights
- Runs 100% client-side in Chrome/Firefox; webcam and mic permissions are all you need
- Train custom gestures and labels on the fly—no pre-baked sign language dataset
- Uses the older deeplearn.js KNN image classifier (deeplearn.js was the predecessor of TensorFlow.js; the README notes a newer TensorFlow.js version exists)
- Live demo hosted on GitHub Pages; local dev server via budo on port 9966
- Covered by the BBC, The Verge, Mashable, and Fast Company in 2018
Caveats
- The “Coming Soon” TensorFlow blog post link is still empty; the project appears to have been dormant since 2018
- Requires an actual Echo nearby; without one, you’re just making your computer talk to itself
- KNN classifiers are memory-hungry (every training example is kept and compared against at prediction time) and don’t generalize well to new lighting/angles
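The memory caveat comes down to simple arithmetic, since a KNN classifier keeps every training example. The sketch below estimates that footprint. The numbers are hypothetical (5 gestures, 200 examples each, 1000-value feature vectors stored as 32-bit floats) and are not measurements from the project.

```typescript
// Sketch only: rough storage estimate for a KNN classifier's stored examples.
// Assumption: each example is one feature vector of 32-bit floats (4 bytes each).
function knnMemoryBytes(
  numExamples: number,
  featureLength: number,
  bytesPerValue: number = 4
): number {
  return numExamples * featureLength * bytesPerValue;
}

// Hypothetical session: 5 gestures x 200 examples, 1000 values per example.
const totalExamples = 5 * 200;
const estimatedBytes = knnMemoryBytes(totalExamples, 1000);
console.log(estimatedBytes / 1_000_000 + " MB"); // prints "4 MB"
```

Both storage and per-prediction work grow linearly with the number of examples, so adding examples to cover more lighting conditions and angles costs more of each.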
Verdict
Grab this if you want a tangible, 20-minute demo of in-browser ML with a satisfying physical payoff. Skip it if you need production sign language recognition—this is gesture classification with a speech proxy, not true ASL understanding.