x4nth055/emotion-recognition-using-speech

A kitchen-sink approach to feeling out speech

This repo wraps sklearn classifiers and Keras RNNs into a single toolkit for detecting emotions from audio, complete with four baked-in datasets and a microphone test script.

687 stars · Python · ML Frameworks · Other AI
Star velocity (7d): +0.3 ★/day · Trend: steady

What it does

The project trains machine learning models to classify emotions from speech audio. It bundles four datasets (RAVDESS, TESS, EMO-DB, plus a custom noisy set), extracts standard audio features via librosa (MFCC, chromagram, mel spectrogram, contrast, tonnetz), and wraps both sklearn classifiers and Keras RNNs behind a unified Python interface. A test.py script lets you speak into a microphone for live prediction.

The interesting bit

The author did the tedious data-wrangling so you don’t have to: datasets are pre-formatted, features are extracted and cached on first run, and grid-search results are pickled for reuse. The “glue code” is the value here: it lowers the barrier from “I want to try speech emotion recognition” to actually running rec.train() in a few lines.
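The "compute once, reuse afterwards" pattern behind that convenience can be sketched in a few lines. This is a generic illustration, not the repository's code: `load_or_compute` and the cache path are hypothetical names introduced here.

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_compute(cache_path: Path, compute: Callable[[], T]) -> T:
    """Return the pickled result if it exists; otherwise compute and cache it."""
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)

    result = compute()  # the expensive step, e.g. feature extraction or grid search
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

The first call pays the full cost and every later call reads the pickle. The usual trade-off applies: a stale cache has to be deleted by hand when the inputs change.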

Key highlights

  • Supports 9 emotion labels across 4 datasets, with optional class balancing
  • 8 sklearn classifiers + RNNs (LSTM/GRU via Keras), plus regressor variants for 3- and 5-emotion subsets
  • Automatic audio conversion to 16,000 Hz mono via ffmpeg if your files don’t match
  • Pre-computed grid search results in grid/ folder; histogram plotting for model comparison
  • Live microphone testing with python test.py
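The audio conversion highlight can be reproduced by hand if you want to pre-process files yourself. The sketch below builds a standard ffmpeg invocation (`-ac 1` for mono, `-ar` for the sample rate, `-y` to overwrite). The helper names are made up for this example, and the exact command the repository runs may differ.

```python
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that resamples src to mono at sample_rate Hz."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]


def convert_to_mono_16k(src: str, dst: str) -> None:
    """Run the conversion; requires ffmpeg on PATH and raises on failure."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Keeping the command builder separate from the `subprocess.run` call makes it easy to inspect or log the command before anything is executed.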

Caveats

  • The README shows train scores of 1.0 against test scores of ~0.81–0.89, a gap that points to overfitting the README does not address
  • The “custom” dataset is explicitly described as “unbalanced noisy”, so its quality is unclear
  • librosa and scikit-learn versions are pinned to releases from 2018–2021; compatibility with current Python/package versions is untested
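Given the pinned dependencies, it may be worth checking what is actually installed before running anything. A minimal sketch using only the standard library follows; `installed_versions` is a hypothetical helper, and the package names are the PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found
```

Calling `installed_versions(["librosa", "scikit-learn"])` and comparing the result with the repository's requirements file shows quickly whether a dedicated virtual environment is needed.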

Verdict

Good for students or researchers who need a working baseline fast and don’t mind dated dependencies. Skip it if you need production-grade robustness or modern deep-learning architectures; the RNN implementation is basic (128-unit LSTM stacks) and the evaluation metrics are thin.
