maelfabien/Multimodal-Emotion-Recognition

Job-interview emotion decoder reads your face, voice, and words

A French Employment Agency partnership produced a Flask app that scores candidate emotions from video, audio, and text simultaneously.

1.1k stars · Jupyter Notebook · Domain: Apps, Computer Vision
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This project is a real-time web app that ingests job-candidate interviews through three channels (facial video, vocal audio, and written text) and runs each through a separate deep-learning pipeline. The per-modality results are fused into an ensemble prediction served via a Flask interface. It was built in partnership with the French Employment Agency, which tells you exactly where they expected the most value: high-volume hiring.
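
Because the README does not spell out how the three pipelines are combined, the following is only a minimal sketch of one common late-fusion scheme: a weighted average of per-modality class probabilities. The label set, the `late_fusion` and `predict_label` names, the weights, and the example scores are all hypothetical, and the sketch assumes every modality emits probabilities over the same labels, which may not hold for the actual project.

```python
from typing import Dict, List, Optional

# Hypothetical label set, for illustration only (not taken from the repository).
LABELS = ["angry", "happy", "neutral", "sad"]


def late_fusion(
    probs_by_modality: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-modality class probabilities.

    Assumption: each modality outputs one probability per entry in LABELS.
    """
    if not probs_by_modality:
        raise ValueError("need at least one modality")
    if weights is None:
        # Default to equal weighting across modalities.
        weights = {name: 1.0 for name in probs_by_modality}
    total = sum(weights[name] for name in probs_by_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(LABELS)
    for name, probs in probs_by_modality.items():
        if len(probs) != len(LABELS):
            raise ValueError(f"{name}: expected {len(LABELS)} probabilities")
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total
    return fused


def predict_label(fused: List[float]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


# Made-up scores standing in for the outputs of the three pipelines.
example = {
    "video": [0.10, 0.60, 0.20, 0.10],
    "audio": [0.20, 0.30, 0.40, 0.10],
    "text": [0.10, 0.50, 0.30, 0.10],
}
fused = late_fusion(example)
label = predict_label(fused)
```

Averaging keeps each pipeline independently trainable and replaceable, which is the usual argument for late fusion; other schemes, such as a learned meta-classifier over the concatenated scores, are equally plausible here.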

The interesting bit

The audio pipeline uses a Time-Distributed CNN: a rolling window slides across the log-mel spectrogram, the same CNN extracts local features from each window, and the resulting feature sequence feeds into LSTMs for temporal context. It is the kind of architecture that sounds over-engineered until you remember recruiters do not review spectrograms by hand.
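
To make the rolling-window step concrete, here is a small sketch in plain Python. It only shows how a spectrogram might be cut into overlapping windows before a shared CNN is applied to each one. The `rolling_windows` helper, the window length, and the hop size are assumptions for illustration; the values the project actually uses are not stated here.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one time step: one value per mel bin


def rolling_windows(
    spectrogram: Sequence[Frame], window: int, hop: int
) -> List[List[Frame]]:
    """Split a time-major spectrogram into overlapping windows.

    Each returned window holds `window` consecutive frames, and successive
    windows start `hop` frames apart. Trailing frames that do not fill a
    whole window are dropped.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    n_frames = len(spectrogram)
    return [
        list(spectrogram[start:start + window])
        for start in range(0, n_frames - window + 1, hop)
    ]


# Toy "log-mel spectrogram": 10 frames x 3 mel bins, where every value in a
# frame equals its frame index so the windows are easy to inspect.
toy_spectrogram = [[float(t)] * 3 for t in range(10)]

# Hypothetical settings: 4-frame windows with a hop of 2 frames.
windows = rolling_windows(toy_spectrogram, window=4, hop=2)

# In a time-distributed setup, the same CNN would be applied to every window,
# and the resulting sequence of feature vectors would go to the LSTM layers.
```

With these toy settings the windows start at frames 0, 2, 4, and 6, so neighbouring windows share half their frames. That overlap is what lets the recurrent layers see a smooth progression of local features.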

Key highlights

  • Three independent modalities: text (CNN-LSTM with Word2Vec embeddings), audio (time-distributed CNN-LSTM on spectrograms), video (FER2013-trained facial model)
  • Pre-trained weights and processed datasets provided via Google Drive for each modality
  • Colab notebooks available for audio and video pipelines
  • Flask web app with live webcam or file upload support
  • Academic paper linked (Overleaf, read-only)

Caveats

  • The README is enthusiastic but thin on ensemble mechanics and actual accuracy numbers; the “Accuracy_Speech” chart exists but no value is quoted
  • The README explicitly calls FER2013 “quite challenging”, citing empty and mislabeled images
  • The web app code lives in a separate repository, not this one

Verdict

Worth a look if you are building multimodal classifiers and want a worked example of late-fusion architecture with downloadable weights. Skip it if you need production-grade inference code or reproducible benchmarks.
