hthuwal/sign-language-gesture-recognition

Teaching CNNs to read sign language, one frame at a time

A classic two-stage pipeline that extracts visual features with Inception v3, then lets an RNN figure out the temporal story.

538 stars · Python · Computer Vision · ML Frameworks
Velocity (7 days): +0.2 ★/day · Trend: steady

What it does

This repo implements a sign-language gesture recognizer for video. It slices each video into frames, retrains Google’s Inception v3 on those frames, then feeds the per-frame outputs (either softmax probabilities or raw pool-layer features) as a sequence into an RNN (LSTM) that classifies the whole gesture. The work accompanies a published paper on Argentinian Sign Language.
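The hand-off between the two stages can be sketched like this: every frame becomes one feature vector, and each video becomes an ordered list of those vectors for the LSTM to consume. This is a minimal sketch, assuming the RNN expects a fixed number of time steps; the helper name `to_fixed_length` and its subsample-or-zero-pad policy are illustrative assumptions, not the repository's code.

```python
from typing import List, Sequence

Vector = List[float]


def to_fixed_length(frames: Sequence[Vector], num_steps: int) -> List[Vector]:
    """Turn a variable-length list of per-frame vectors into exactly num_steps vectors.

    Hypothetical helper: long videos are subsampled at evenly spaced frames,
    short videos are padded at the end with all-zero vectors.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not frames:
        raise ValueError("video has no frames")
    dim = len(frames[0])
    if len(frames) >= num_steps:
        # Integer arithmetic picks evenly spaced indices and keeps temporal order.
        return [list(frames[(i * len(frames)) // num_steps]) for i in range(num_steps)]
    padding = [[0.0] * dim for _ in range(num_steps - len(frames))]
    return [list(f) for f in frames] + padding


# A 5-frame "video" of 2-dim features, squeezed to 3 steps and stretched to 7.
clip = [[float(i), float(i) * 10] for i in range(5)]
print(to_fixed_length(clip, 3))       # [[0.0, 0.0], [1.0, 10.0], [3.0, 30.0]]
print(len(to_fixed_length(clip, 7)))  # 7
```

Zero-padding keeps every batch the same shape; recording each video's true length (or using a masking layer) would let the LSTM ignore the padded steps.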

The interesting bit

The two-stage design is deliberately modular: you choose what the RNN sees for each frame, either the output of the final classification (softmax) layer or the pool layer just before it. It’s a snapshot of how temporal video understanding was commonly tackled before end-to-end transformer models took over.
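The swap comes down to where each frame vector is read from. Below is a minimal sketch with toy sizes: the pool layer yields a feature vector (2048 dimensions in Inception v3, 4 here), and the retrained final layer maps it to logits whose softmax is an n-class distribution. The weights, the `frame_representation` helper, and the `use_pool` switch are hypothetical, chosen only to show the two options side by side.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_representation(pool: Vector, weights: Matrix, bias: Vector, use_pool: bool) -> Vector:
    """Return what the RNN would see for one frame (hypothetical helper).

    use_pool=True  -> the pre-classification pool vector, unchanged.
    use_pool=False -> softmax over the final layer's n class logits.
    weights has one row per class; each row has len(pool) entries.
    """
    if use_pool:
        return list(pool)
    logits = [sum(w * x for w, x in zip(row, pool)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy sizes: a 4-dim "pool" vector and a 3-class final layer.
pool = [0.5, 1.0, 0.0, 2.0]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
b = [0.0, 0.0, 0.0]

print(len(frame_representation(pool, W, b, use_pool=True)))  # 4
probs = frame_representation(pool, W, b, use_pool=False)
print(len(probs), round(sum(probs), 6))                       # 3 1.0
```

The trade-off: the pool vector keeps richer, class-agnostic detail, while the softmax output is much smaller (n numbers per frame) but has already committed to the frame-level classes.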

Key highlights

  • Frame extraction with optional hand-segmentation preprocessing (dataset-specific, but removable)
  • Retrains Inception v3 via TensorFlow Hub’s standard retrain script
  • Two intermediate representations: 2048-dim pool vectors or n-class softmax distributions
  • RNN training/evaluation scripts with pickled feature dumps as input
  • Tested on a dummy 3-class dataset in Google Colab
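Because the two stages communicate through pickled feature dumps, the RNN scripts can be rerun without recomputing CNN features. The sketch below shows one plausible layout, assuming a list of `(frame_vectors, label)` pairs per dump; the record format and the `dump_features` / `load_features` names are hypothetical, not the repository's actual file format.

```python
import io
import pickle
from typing import BinaryIO, List, Tuple

Vector = List[float]
Example = Tuple[List[Vector], str]  # (one vector per frame, gesture label); assumed layout


def dump_features(examples: List[Example], stream: BinaryIO) -> None:
    """Serialize every video's frame vectors and label into one pickle."""
    pickle.dump(examples, stream, protocol=pickle.HIGHEST_PROTOCOL)


def load_features(stream: BinaryIO) -> Tuple[List[List[Vector]], List[str]]:
    """Load the dump and split it into RNN input sequences and labels."""
    examples: List[Example] = pickle.load(stream)
    sequences = [frames for frames, _ in examples]
    labels = [label for _, label in examples]
    return sequences, labels


# Round-trip through an in-memory buffer; a real run would use open(path, "wb") / open(path, "rb").
data = [([[0.1, 0.9], [0.2, 0.8]], "hello"), ([[0.7, 0.3]], "thanks")]
buf = io.BytesIO()
dump_features(data, buf)
buf.seek(0)
seqs, labels = load_features(buf)
print(labels)  # ['hello', 'thanks']
```

Unpickling can execute arbitrary code, so only load feature dumps you produced yourself.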

Caveats

  • Dependencies include tflearn, which is effectively unmaintained
  • OpenCV must be built from source; pip’s version lacks video support
  • The hand-segmentation step is hardcoded for the Argentinian dataset and needs manual removal for other data

Verdict

Worth a look if you’re studying classical video-classification pipelines or need a reproducible baseline for sign-language research. Skip it if you want a modern, end-to-end trainable model you can drop your own data into without surgery.
