When more features hurt: a FER2013 reality check
A straightforward CNN benchmark that tests whether throwing HOG and face landmarks at a neural net actually helps or backfires.

What it does
Trains CNNs to classify facial expressions from the FER2013 dataset, with a twist: it systematically compares raw pixels against raw pixels plus dlib face landmarks, HOG features, and sliding-window HOG. There’s also real-time webcam prediction via OpenCV if you train a model first.
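The sliding-window HOG idea can be sketched in plain Python. This is a simplified HOG of our own, not the repo's implementation. It uses central-difference gradients, unsigned 8-bin orientation histograms, and one L2 normalization per window. The window size 24, step 6, cell size 8 and bin count are illustrative assumptions. `extra_features` is a hypothetical helper that assumes landmarks arrive as (x, y) pixel tuples, such as the 68 points dlib's standard shape predictor returns.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image, indexed as img[row][col]


def cell_histogram(img: Image, top: int, left: int, size: int, bins: int) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(top, top + size):
        for x in range(left, left + size):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold into [0, 180)
            b = int(angle / (180.0 / bins)) % bins  # % guards a 180.0 rounding edge
            hist[b] += math.hypot(gx, gy)
    return hist


def hog(img: Image, top: int, left: int, win: int, cell: int = 8, bins: int = 8) -> List[float]:
    """Simplified HOG for one win x win window: cell histograms, concatenated, L2-normalized."""
    feats: List[float] = []
    for cy in range(top, top + win - cell + 1, cell):
        for cx in range(left, left + win - cell + 1, cell):
            feats.extend(cell_histogram(img, cy, cx, cell, bins))
    norm = math.sqrt(sum(v * v for v in feats)) or 1.0  # flat windows stay all-zero
    return [v / norm for v in feats]


def sliding_window_hog(img: Image, win: int = 24, step: int = 6,
                       cell: int = 8, bins: int = 8) -> List[float]:
    """Slide a win x win window by `step` pixels and concatenate each window's HOG.

    The default sizes are illustrative assumptions, not the repo's settings.
    """
    h, w = len(img), len(img[0])
    feats: List[float] = []
    for top in range(0, h - win + 1, step):
        for left in range(0, w - win + 1, step):
            feats.extend(hog(img, top, left, win, cell, bins))
    return feats


def extra_features(img: Image, landmarks: Sequence[Tuple[float, float]]) -> List[float]:
    """Hypothetical helper: landmark coordinates scaled to [0, 1], then sliding-window HOG."""
    h, w = len(img), len(img[0])
    coords = [c for x, y in landmarks for c in (x / w, y / h)]
    return coords + sliding_window_hog(img)
```

For a 48×48 FER2013 image, these defaults give 5 × 5 = 25 window positions. Each window has 3 × 3 cells of 8 bins, for 1,800 HOG values in total; 68 landmarks add 136 coordinates. Per-window normalization keeps one high-contrast region from dominating the whole vector, which is the usual motivation for windowed HOG.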
The interesting bit
The author actually ran the ablation study most people skip. Adding HOG features lowered accuracy for the shallower Model A, possibly through overfitting or because the network could not make use of the extra information. Batch normalization, meanwhile, delivered up to a 50% relative improvement on Model B. The best configuration (Model B with landmarks, HOG, and sliding-window HOG) reached 75.1% on 5 emotions, just 0.1 points short of the 75.2% literature benchmark.
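The two headline numbers measure different things. Assuming "relative" carries its usual meaning, the batch-normalization gain is a ratio against the accuracy without it, not a difference in percentage points. The benchmark gap, by contrast, is a plain difference of the two accuracies quoted above:

```latex
\Delta_{\text{rel}} = \frac{a_{\text{BN}} - a_{\text{no-BN}}}{a_{\text{no-BN}}},
\qquad
\Delta_{\text{rel}} = 0.5 \iff a_{\text{BN}} = 1.5\, a_{\text{no-BN}}

75.2\% - 75.1\% = 0.1\ \text{percentage points}
```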
Key highlights
- Two CNN architectures tested: a simpler Model A and a deeper-convolution Model B
- Optional SVM baseline included (performs worse, as expected)
- Full pipeline: FER2013 preprocessing, hyperparameter optimization with Hyperopt, image/video inference
- Code supports Python 2.7 and 3.6; dependencies include TensorFlow, TFLearn, dlib, OpenCV 3
- Training on all 7 emotions drops accuracy to 61.4%; the dataset is genuinely messy
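The 5-emotion versus 7-emotion results depend on which rows survive preprocessing. Below is a minimal sketch of that filtering step. It assumes the standard fer2013.csv layout: an `emotion` id from 0 to 6, a space-separated `pixels` string of 48×48 values, and a `Usage` split column. `load_fer2013` is a hypothetical helper, not the repo's preprocessing script.

```python
import csv
from typing import Iterable, List, Sequence, Tuple

# Emotion ids as distributed in fer2013.csv (columns: emotion, pixels, Usage).
FER2013_LABELS = {0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy",
                  4: "Sad", 5: "Surprise", 6: "Neutral"}
SIDE = 48  # FER2013 images are 48x48 grayscale


def load_fer2013(lines: Iterable[str], keep: Sequence[int],
                 usage: str = "Training") -> Tuple[List[List[List[int]]], List[int]]:
    """Hypothetical helper: parse one split, keeping only the emotion ids in `keep`.

    Kept labels are remapped to 0..len(keep)-1 in the order `keep` lists them,
    so a 5-emotion subset becomes a 5-class problem.
    """
    unknown = set(keep) - set(FER2013_LABELS)
    if unknown:
        raise ValueError(f"not FER2013 emotion ids: {sorted(unknown)}")
    remap = {old: new for new, old in enumerate(keep)}
    images: List[List[List[int]]] = []
    labels: List[int] = []
    for row in csv.DictReader(lines):
        emotion = int(row["emotion"])
        if row["Usage"] != usage or emotion not in remap:
            continue  # wrong split, or an emotion outside the chosen subset
        pixels = [int(p) for p in row["pixels"].split()]
        if len(pixels) != SIDE * SIDE:
            continue  # skip malformed rows instead of crashing
        images.append([pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)])
        labels.append(remap[emotion])
    return images, labels
```

A call such as `load_fer2013(open(path, newline=""), keep=[0, 3, 4, 5, 6])` uses a placeholder `path`. The subset shown is a hypothetical choice, not necessarily the five emotions the author used. Remapping labels keeps the classifier's output layer sized to the subset rather than to all seven ids.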
Caveats
- FER2013 images are unaligned, some labels are wrong, and some samples are not faces at all; the author documents this honestly
- Requires manually downloading the dataset and configuring its path in parameters.py
- Real-time inference needs a trained model file; the README provides no pretrained weights
Verdict
Worth a look if you’re building a FER baseline and want evidence on whether feature engineering still matters. Skip if you need a drop-in, pretrained emotion classifier; this is a training-and-experimentation repo, not a product.