Emotion Recognition Using MFCC, Spectrograms and Deep Neural Networks
Keywords:
Deep Learning, Emotion Recognition, Feature Extraction, Spectrograms, Neural Networks.
Abstract
Speech recognition technologies are essential modern tools, and existing systems differ in their feature extraction and classification methods. This research examines several feature extraction approaches and the potential benefit of combining them to improve accuracy.
The proposed method was developed in three stages, each using a distinct technique. The first stage employed MFCC (Mel-Frequency Cepstral Coefficients) and achieved 78.62% accuracy. However, because MFCC extraction segments the signal into short frames, MFCC alone may lose important temporal and visual cues, which limits emotion detection.
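To illustrate the framing step that MFCC extraction relies on, the following is a minimal sketch of splitting a signal into short overlapping frames. The function name `frame_signal` and the parameters `frame_len` and `hop_len` are hypothetical and are not taken from the paper; real pipelines commonly use frames of about 25 ms with a 10 ms hop, but the values used here are toy numbers chosen for readability.

```python
def frame_signal(samples, frame_len, hop_len):
    """Split a 1-D sample sequence into overlapping fixed-length frames.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    # Stop once a full frame no longer fits; trailing samples are dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


# Toy signal of 10 samples, frames of 4 samples, hop of 2 samples.
signal = list(range(10))
frames = frame_signal(signal, frame_len=4, hop_len=2)
```

Each frame is later summarized independently, which is one way to see why context spanning several frames can be lost when only per-frame coefficients are kept.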
In the second stage, spectrograms were used. By preserving the energy distribution across frequencies, they improved emotion recognition and achieved 93.20% accuracy. The third stage applied Feature-Level Fusion, combining the MFCC and spectrogram outputs. This hybrid model, evaluated using a Random Forest classifier, reached 97% accuracy, with F1-score, Precision, and Recall also at 97%.
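Feature-level fusion, in its simplest form, concatenates the per-utterance feature vectors from each representation into one vector before classification. The sketch below assumes that each stage has already produced one fixed-length vector per utterance; the names `standardize_columns` and `fuse_features`, the per-column z-score scaling, and the toy values are illustrative assumptions and are not details reported in the paper.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each column of a list of equal-length feature vectors."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse_features(mfcc_rows, spec_rows):
    """Feature-level fusion: scale each view, then concatenate per utterance."""
    if len(mfcc_rows) != len(spec_rows):
        raise ValueError("both views must describe the same utterances")
    mfcc_scaled = standardize_columns(mfcc_rows)
    spec_scaled = standardize_columns(spec_rows)
    return [m + s for m, s in zip(mfcc_scaled, spec_scaled)]


# Toy values standing in for real features (not data from the paper).
mfcc_rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
spec_rows = [[10.0, 0.0, 5.0], [20.0, 0.0, 5.0], [30.0, 0.0, 7.0]]
fused = fuse_features(mfcc_rows, spec_rows)
```

The fused rows would then be passed to a classifier such as the Random Forest mentioned above. Scaling each view before concatenation is a design choice made here so that neither representation dominates through sheer magnitude; tree-based classifiers are largely insensitive to it, so it could be omitted.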
The results show that fusing the acoustic (MFCC) and visual (spectrogram) representations improves performance over either individual model, raising accuracy from 78.62% and 93.20%, respectively, to 97%. The proposed approach is therefore more effective for emotion recognition from speech than either representation alone.