xuanjihe/speech-emotion-recognition
TensorFlow implementation of 3-D Convolutional Recurrent Neural Networks for classifying emotions in speech audio on the IEMOCAP dataset.

This repository provides a neural network model for speech emotion recognition (SER) that classifies utterances into emotion categories. Audio features (mel-spectrograms) pass through 3-D convolutional layers and a bidirectional RNN; the resulting frame-level features are then aggregated by max, mean, or attention pooling into a single utterance-level prediction. The model was developed for the IEMOCAP benchmark dataset, and its recognition rates are reported in a 2018 IEEE Signal Processing Letters paper.
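
To make the pooling step concrete, here is a minimal NumPy sketch of how frame-level features might be collapsed into one utterance-level vector. It is an illustration under stated assumptions, not the repository's TensorFlow code: the function name `pool_frames`, the single scoring vector `u`, and the dot-product scoring are all hypothetical, and the actual attention layer may use a different parameterization.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def pool_frames(frames, mode="attention", u=None):
    """Collapse frame-level features of shape (T, D) into one (D,) vector.

    `u` is a hypothetical learned scoring vector of shape (D,); it is
    only needed for attention pooling.
    """
    if mode == "max":
        return frames.max(axis=0)
    if mode == "mean":
        return frames.mean(axis=0)
    if mode == "attention":
        if u is None:
            raise ValueError("attention pooling needs a scoring vector u")
        # One scalar score per frame, normalized to weights that sum to 1.
        weights = softmax(frames @ u)
        # Weighted sum of frames: frames with higher scores contribute more.
        return weights @ frames
    raise ValueError(f"unknown pooling mode: {mode}")


# Toy input: 5 frames of 4-dimensional RNN outputs (random stand-ins).
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 4))
u = rng.normal(size=4)

utterance_vec = pool_frames(frames, "attention", u)
uniform_vec = pool_frames(frames, "attention", np.zeros(4))
```

With an all-zero scoring vector every frame gets the same weight, so attention pooling reduces to mean pooling; a trained `u` instead lets emotionally salient frames dominate the utterance representation.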