A CNN that guesses if you're angry—and whether you're male
A student-style Keras notebook that classifies emotion and gender from short audio clips using LibROSA and a small CNN.

What it does
This Jupyter Notebook project trains a convolutional neural network to sort audio clips into ten classes: five emotions (angry, calm, fearful, happy, sad), each split by speaker gender. It uses LibROSA to extract features from 3-second audio clips, doubling the sampling rate to squeeze out more data when training examples are scarce. The author tested MLPs and LSTMs first, found them wanting, and settled on a CNN that reportedly hits “a little more than 70%” validation accuracy.
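The ten-way labelling described above can be sketched in a few lines. This is a minimal illustration, not the notebook's code: the label strings, the `female_angry` naming pattern, and the helper names `encode` and `decode` are all assumptions made for the example.

```python
from itertools import product

# Assumed label names; the notebook's actual strings may differ.
GENDERS = ("female", "male")
EMOTIONS = ("angry", "calm", "fearful", "happy", "sad")

# 2 genders x 5 emotions = 10 joint classes, e.g. "female_angry".
CLASS_NAMES = [f"{g}_{e}" for g, e in product(GENDERS, EMOTIONS)]
CLASS_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}


def encode(gender: str, emotion: str) -> int:
    """Map a (gender, emotion) pair to its integer class id (hypothetical helper)."""
    return CLASS_INDEX[f"{gender}_{emotion}"]


def decode(class_id: int) -> tuple[str, str]:
    """Split a class id back into its gender and emotion parts (hypothetical helper)."""
    gender, emotion = CLASS_NAMES[class_id].split("_")
    return gender, emotion
```

Keeping the joint label decodable means gender and emotion can be scored separately from the same predictions, which is presumably how a gender figure and an emotion figure can both be reported from one ten-class model.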
The interesting bit
The README is refreshingly honest about the slog: “lot of trail and error methods, tuning etc.” [sic]. It also makes a slightly unusual claim, that the model distinguishes male from female voices with “100% accuracy”, while being upfront that emotion detection itself is fuzzier. The live test with “This coffee sucks” spoken in an angry tone is a nice touch of real-world validation, though it’s a single anecdote.
Key highlights
- Trains on RAVDESS (~1,500 clips, 24 actors) and SAVEE (~500 clips, 4 male actors)
- Uses LibROSA for feature extraction; clips padded or truncated to 3 seconds
- CNN chosen after MLP and LSTM “under-performed with very low accuracies”
- Outputs 10 classes: female/male × angry, calm, fearful, happy, sad
- Includes a live-voice test with screenshots of predicted vs. actual
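The pad-or-truncate step in the highlights can be sketched with NumPy alone. The sample rate of 22,050 Hz (librosa's default when loading audio), the choice to zero-pad at the end of the clip, and the function name `fix_clip_length` are assumptions for illustration; the README does not spell out how the notebook does it.

```python
import numpy as np

SAMPLE_RATE = 22050   # assumed; librosa's default sample rate when loading audio
CLIP_SECONDS = 3.0    # clip length stated in the write-up


def fix_clip_length(samples: np.ndarray,
                    sample_rate: int = SAMPLE_RATE,
                    seconds: float = CLIP_SECONDS) -> np.ndarray:
    """Return a 1-D array of exactly sample_rate * seconds samples."""
    target = int(round(sample_rate * seconds))
    if len(samples) >= target:
        return samples[:target]               # truncate long clips
    padding = target - len(samples)
    return np.pad(samples, (0, padding))      # zero-pad short clips at the end
```

Fixing the length before feature extraction gives every clip a feature array of the same shape, which a CNN with a fixed input size needs. In a librosa pipeline, `librosa.load(path, duration=3.0)` handles the truncation side and `librosa.util.fix_length` does both.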
Caveats
- The “100%” gender accuracy and “more than 70%” emotion accuracy come from the README without published test-set methodology or confidence intervals
- SAVEE is all-male, so the combined training data skews male
- No code or model weights are visible in the README; it’s essentially a write-up with images
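To show why the missing confidence intervals matter, here is a small sketch of a Wilson score interval for an accuracy figure. The counts are hypothetical: the README gives no test-set size, so 140 correct out of 200 is made up purely to match a 70% accuracy.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical numbers: 140 correct out of 200 held-out clips (70%).
low, high = wilson_interval(140, 200)
print(f"95% interval: {low:.2f} to {high:.2f}")
```

With these made-up counts the interval runs from roughly 0.63 to 0.76, so a headline “70%” on a few hundred clips is compatible with a fairly wide range of true accuracies.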
Verdict
Worth a skim if you’re starting out in audio ML and want to see a worked Keras example with real datasets. Skip it if you need production-ready code or reproducible benchmarks; this is a learning project, not a library.