Speech Emotion Classification

Detect emotions from speech using advanced deep learning models

🎯 Overview

This repository contains a deep learning model for speech emotion classification. The model detects and classifies emotions in audio recordings by combining two kinds of acoustic features, Mel-frequency cepstral coefficients (MFCCs) and mel-spectrograms, in a single neural network. Measured results are listed under Performance Metrics.

🌟 Key Features

Multi-modal Architecture: Combines CNN and MLP branches for comprehensive feature analysis
Real-time Processing: Capable of processing and analyzing speech in real-time
Reported Accuracy: ~42% test accuracy on the 8-class RAVDESS task (see Performance Metrics)
Cross-platform Compatibility: Runs seamlessly on Windows, macOS, and Linux
Hugging Face Integration: Easy model sharing and deployment via Hugging Face Hub

📊 Dataset

The model was trained on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, which contains high-quality recordings of professional actors expressing different emotions. The dataset includes 8 distinct emotions:

😌 Neutral: Emotionless speech
😌 Calm: Calm and relaxed emotion
😊 Happy: Joyful and cheerful emotion
😢 Sad: Melancholic and sorrowful emotion
😡 Angry: Irritated and mad emotion
😱 Fearful: Scared and apprehensive emotion
😤 Disgust: Revolted and repulsed emotion
😮 Surprised: Astonished and amazed emotion

📈 Performance Metrics

Metric	Value
Test Accuracy	~42.13%
Precision (weighted)	~72.53%
Recall (weighted)	~42.13%
F1-Score (weighted)	~40.90%
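
Weighted recall and test accuracy are the same number in the table above. This follows from the definition: weighting each class's recall by its support and summing gives the overall fraction of correct predictions. The sketch below reproduces the weighted metrics in plain Python on a small made-up label set (the labels and the `weighted_metrics` helper are illustrative assumptions, not the repository's evaluation code). It also shows how weighted precision can sit well above weighted recall when predictions concentrate on one class.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall and F1, plus plain accuracy."""
    n = len(y_true)
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support[label]
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support[label] / n  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels (made up): the classifier over-predicts "sad".
y_true = ["happy", "happy", "sad", "sad", "angry", "angry"]
y_pred = ["happy", "sad", "sad", "sad", "sad", "angry"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

In this toy case, "happy" and "angry" are predicted rarely but always correctly, which lifts precision, while the missed samples pull recall (and accuracy) down. A similar pattern may explain the gap between precision and recall reported above, though only a per-class report or confusion matrix on the real test set can confirm that.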

🛠️ Installation

Prerequisites

Python 3.7 or higher
pip package manager

Setup

Clone the repository:

git clone https://github.com/your-username/speech_emotion_classification.git
cd speech_emotion_classification

Create a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required dependencies:

pip install -r requirements.txt

Or install the dependencies manually:

pip install tensorflow numpy librosa scikit-learn huggingface_hub pandas matplotlib seaborn

🚀 Usage

1. Load and Use the Model

import librosa
import numpy as np
from tensorflow import keras

# Load the pre-trained model
model = keras.models.load_model('./path/to/model.keras')

# Load an audio file (sr=None keeps the file's native sampling rate)
audio_path = 'path/to/audio.wav'
y, sr = librosa.load(audio_path, sr=None)

# Extract features
mfcc_features = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape: (13, frames)
spectrogram_features = librosa.feature.melspectrogram(y=y, sr=sr)  # shape: (n_mels, frames)

# Normalize and reshape features according to your preprocessing pipeline.
# ASSUMPTION: the two lines below are placeholders so the snippet runs end to end.
# The real shapes and scaling must match how the model was trained
# (compare against model.input_shape).
mfcc_features_reshaped = np.mean(mfcc_features, axis=1)[np.newaxis, :]
spec_features_reshaped = librosa.power_to_db(spectrogram_features)[np.newaxis, ..., np.newaxis]

# Make prediction
# For multi-modal models, pass both feature arrays: [mfcc_features_reshaped, spec_features_reshaped]
predictions = model.predict([mfcc_features_reshaped, spec_features_reshaped])

# Get emotion with highest probability (predictions has shape (1, 8))
emotion_labels = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful', 'disgust', 'surprised']
predicted_emotion = emotion_labels[int(np.argmax(predictions[0]))]

print(f"Predicted emotion: {predicted_emotion}")
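
The snippet above leaves normalization and reshaping to the training pipeline. As a rough sketch of what that step might look like, the helpers below pad or truncate the time axis to a fixed number of frames, standardize the values, and add batch and channel dimensions. The function names, the 128-frame target, and the choice to average MFCCs over time are all assumptions for illustration; the repository's real preprocessing may differ.

```python
import numpy as np

def fix_length(features, n_frames):
    """Zero-pad or truncate the last (time) axis to exactly n_frames."""
    current = features.shape[-1]
    if current >= n_frames:
        return features[..., :n_frames]
    pad_width = [(0, 0)] * (features.ndim - 1) + [(0, n_frames - current)]
    return np.pad(features, pad_width, mode="constant")

def standardize(features, eps=1e-8):
    """Scale the whole array to zero mean and unit variance."""
    return (features - features.mean()) / (features.std() + eps)

def to_model_inputs(mfcc, mel_db, n_frames=128):
    """Return (mfcc_vector, spectrogram_image) with a leading batch axis."""
    # MFCC branch: average over time, giving shape (1, n_mfcc)
    mfcc_vec = standardize(mfcc).mean(axis=1)[np.newaxis, :]
    # Spectrogram branch: fixed width, giving shape (1, n_mels, n_frames, 1)
    spec = standardize(fix_length(mel_db, n_frames))[np.newaxis, ..., np.newaxis]
    return mfcc_vec.astype("float32"), spec.astype("float32")

# Stand-in arrays with librosa-like shapes: (n_mfcc, frames) and (n_mels, frames)
mfcc_in, spec_in = to_model_inputs(np.random.rand(13, 90), np.random.rand(128, 90))
print(mfcc_in.shape, spec_in.shape)
```

Whatever scheme is used, the same one has to be applied at training and inference time; otherwise the model sees inputs from a different distribution than it was fitted on.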

2. Train Your Own Model

python auto_train.py

3. Test the Model

python test_prediction_pipeline.py

🏗️ Architecture

The model uses a two-branch, multi-modal architecture:

MFCC Branch: Processes Mel-frequency cepstral coefficients using dense neural network layers
Spectrogram Branch: Processes mel-spectrogram features using convolutional layers
Fusion Layer: Combines both feature representations before final classification
Output Layer: Softmax layer for emotion classification across 8 emotional states
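
The four components above can be sketched with the Keras functional API. This is a minimal illustration, not the trained model's actual definition: the layer sizes, the input shapes (13 MFCCs, a 128x128 mel-spectrogram), and the `build_two_branch_model` name are assumptions chosen only to make the example concrete.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_two_branch_model(n_mfcc=13, n_mels=128, n_frames=128, n_classes=8):
    # MFCC branch: dense layers over a fixed-length MFCC vector
    mfcc_in = keras.Input(shape=(n_mfcc,), name="mfcc")
    x = layers.Dense(64, activation="relu")(mfcc_in)
    x = layers.Dense(32, activation="relu")(x)

    # Spectrogram branch: convolutional layers over a single-channel image
    spec_in = keras.Input(shape=(n_mels, n_frames, 1), name="mel_spectrogram")
    s = layers.Conv2D(16, 3, activation="relu", padding="same")(spec_in)
    s = layers.MaxPooling2D(2)(s)
    s = layers.Conv2D(32, 3, activation="relu", padding="same")(s)
    s = layers.GlobalAveragePooling2D()(s)

    # Fusion layer: concatenate both representations
    fused = layers.Concatenate()([x, s])
    fused = layers.Dense(64, activation="relu")(fused)

    # Output layer: softmax over the emotion classes
    out = layers.Dense(n_classes, activation="softmax")(fused)
    return keras.Model(inputs=[mfcc_in, spec_in], outputs=out)

model = build_two_branch_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

The input order `[mfcc, spectrogram]` matches the order used in the prediction call in the Usage section.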

📁 Project Structure

speech_emotion_classification/
├── app.py                 # Streamlit web application
├── auto_train.py          # Automated training script
├── debug_labels.py        # Label debugging utilities
├── driver.py              # Main execution script
├── push_to_hub.py         # Hugging Face model upload script
├── split_model.py         # Model splitting utilities
├── test_*.py              # Test files
├── requirements.txt       # Project dependencies
├── README.md              # This file
└── ...

🧪 Evaluation

To evaluate the model on custom audio files:

python test_prediction_pipeline.py

This will run the model on the test dataset and provide detailed performance metrics.

🤗 Hugging Face Integration

The model can be easily shared and deployed using Hugging Face Hub:

python push_to_hub.py

🚧 Limitations

Performance may vary with different accents and languages
Audio quality (noise, clarity) can significantly affect accuracy
Emotions expressed in speech can be culturally dependent
Requires clear audio with minimal background noise for best results
Shorter audio clips (5-10 seconds) typically work better than longer recordings
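
One way to work around the clip-length limitation is to split a long recording into short windows, classify each window, and average the class probabilities. The sketch below shows only the splitting and averaging steps with NumPy. The helper names, the 5-second window, and the 1-second minimum for the last window are assumptions, not part of this repository.

```python
import numpy as np

def split_into_chunks(y, sr, chunk_seconds=5.0, min_seconds=1.0):
    """Split a 1-D waveform into consecutive chunks of at most chunk_seconds."""
    chunk_len = int(chunk_seconds * sr)
    min_len = int(min_seconds * sr)
    chunks = [y[start:start + chunk_len] for start in range(0, len(y), chunk_len)]
    # Drop a trailing fragment that is too short to carry useful signal
    return [c for c in chunks if len(c) >= min_len]

def average_predictions(probs_per_chunk):
    """Average per-chunk class probabilities into one distribution."""
    return np.mean(np.stack(probs_per_chunk), axis=0)

# Example: 12 seconds of silence at 16 kHz gives chunks of 5 s, 5 s and 2 s
sr = 16000
chunks = split_into_chunks(np.zeros(12 * sr), sr)
print([len(c) / sr for c in chunks])
```

Each chunk would then go through the same feature extraction and `model.predict` call as a single short clip before the results are averaged.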

🛡️ Ethical Considerations

This model should not be used to make critical decisions about individuals without their explicit consent
Results should be interpreted with caution and not treated as definitive psychological assessments
Consider privacy implications when processing audio of individuals
Use responsibly and ethically, with appropriate consent when analyzing personal speech
Be aware of potential bias in the training data and its impact on model predictions

🧪 Reproducibility

To ensure reproducible results:

Set random seeds:

import numpy as np
import tensorflow as tf
import random

np.random.seed(42)
tf.random.set_seed(42)
random.seed(42)

Use the same training data and preprocessing pipeline

🤝 Contributing

Contributions are welcome! Here's how you can contribute:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Please make sure to update tests as appropriate and follow the existing code style.

Development Setup

git clone https://github.com/your-username/speech_emotion_classification.git
cd speech_emotion_classification
pip install -r requirements.txt
pip install -r requirements-dev.txt  # For development dependencies

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use this model in your research, please cite:

@software{speech_emotion_classification,
  author = {AI Research Team},
  title = {Speech Emotion Classification Model},
  year = {2025},
  url = {https://github.com/your-username/speech_emotion_classification}
}

🆘 Support

If you have any questions or encounter issues:

Check the Issues page
Open a new issue if your problem hasn't been addressed
For feature requests, please open an issue with the "enhancement" tag

🙏 Acknowledgments

The RAVDESS dataset creators for providing the high-quality emotional speech data
The TensorFlow team for providing an excellent deep learning framework
The Librosa team for audio processing capabilities
The Hugging Face team for model sharing capabilities

Rayyan9477/speech_emotion_classification
