Speech Emotion Recognition Using Machine Learning

Student: Timothy Ademola (Project, 2025)
Department of Computer Science
Federal University, Dutsin-Ma, Katsina State


Abstract

In recent years, the development of Automatic Speech Emotion Recognition (SER) systems has gained significant attention within the field of Human-Computer Interaction (HCI). These systems aim to bridge the gap between human emotional intelligence and machine understanding by enabling computers to recognize and interpret human emotions from speech signals. This project presents a robust and scalable SER model utilizing both traditional machine learning and deep learning techniques. Speech features such as Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Mel Energy Spectrum Dynamic Coefficients (MEDC) are extracted to capture emotional cues embedded in speech. The system is trained and evaluated on benchmark emotional speech datasets, including the Berlin Emotional Speech Database, RAVDESS, IEMOCAP, and CREMA-D. Classifiers such as Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks are employed to distinguish between emotions such as anger, happiness, sadness, fear, and neutrality. The proposed methodology achieves high classification accuracy, demonstrating its applicability in real-world scenarios such as call centers, virtual assistants, and mental health monitoring. This research contributes to the advancement of intelligent emotion-aware systems and enhances natural interaction between humans and machines.
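
To make the described pipeline concrete, the following is a minimal sketch of one of the approaches the abstract names: extracting utterance-level MFCC features and classifying them with an SVM. It assumes the librosa and scikit-learn libraries; the file paths, labels, and hyperparameters are placeholders, not the project's actual configuration.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

def extract_mfcc(path, sr=16000, n_mfcc=13):
    """Load a speech file and return its mean MFCC vector over time."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)                                # utterance-level feature

# Placeholder dataset: (wav_path, emotion_label) pairs, e.g. drawn from RAVDESS.
samples = [
    ("data/angry_01.wav", "anger"),
    ("data/happy_01.wav", "happiness"),
    # ... many more files per emotion class in a real experiment
]

X = np.array([extract_mfcc(path) for path, _ in samples])
y = np.array([label for _, label in samples])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=10)   # RBF-kernel SVM, a common SER baseline
clf.fit(scaler.transform(X_train), y_train)

print(classification_report(y_test, clf.predict(scaler.transform(X_test))))
```

A CNN or LSTM variant would instead keep the full MFCC time series (or a spectrogram) rather than averaging over frames, so the network can learn temporal emotional cues directly.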

Keywords
Speech Emotion Recognition; Machine Learning; SER; Emotion Detection; Speech Processing; Affective Computing; Artificial Intelligence; Deep Learning; Natural Language Processing; Human-Computer Interaction