A collection of 8 English speech datasets for speech emotion recognition (SER)


standing-o/Combined_Dataset_for_Speech_Emotion_Recognition


Combined Dataset for Speech Emotion Recognition (SER)

  • This collection combines 8 English speech emotion datasets.
  • It is intended to help you build a generalized deep learning model for SER.
  • Most of the datasets also include transcripts of the speech, which can be used for multimodal modeling.
  • Nov. 2, 2023 ~ Nov. 12, 2023

     

Requirements

  • Each dataset can be downloaded from the links below, and its files must be placed in the matching folder under dataset/ (see Project Structure).
  • We use pandas profiling (ydata-profiling) for simple EDA.
conda install -c conda-forge ydata-profiling
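
For orientation, here is a minimal sketch of how an EDA report like report.html could be produced with ydata-profiling. It assumes the combined CSV is named speech_dataset.csv as in the Project Structure; the function name build_eda_report and the report title are hypothetical, and the actual SpeechEDA.ipynb notebook may differ.

```python
from pathlib import Path


def build_eda_report(csv_path, out_path="report.html"):
    """Profile a CSV file and write the result as an HTML report."""
    # Imported lazily so this module can be loaded before the packages are installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # minimal=True turns off the most expensive computations for a quick first look.
    report = ProfileReport(df, title="Speech Dataset EDA", minimal=True)
    report.to_file(out_path)
    return Path(out_path)
```

Calling build_eda_report("speech_dataset.csv") would write report.html next to the notebook.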

     

Collection of Dataset

  • A detailed description of each dataset is provided here.
    • This Jupyter notebook generates a single DataFrame containing all data paths and features.
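
As a rough illustration of the path-collection step, the sketch below walks the dataset folders and writes one row per audio file. It is not the notebook's code: the folder names come from the Project Structure, while the list of audio suffixes, the column names (dataset, path) and both function names are assumptions made for this example.

```python
import csv
from pathlib import Path

# Folder names follow the Project Structure section of this README.
DATASET_DIRS = ["crema-d", "meld", "mlend", "ravdess", "savee", "tess", "esd", "jl-corpus"]
# Assumed set of media suffixes; adjust to the files you actually downloaded.
AUDIO_SUFFIXES = {".wav", ".mp3", ".mp4", ".flac"}


def collect_audio_paths(root):
    """Return one dict per audio file found under root/<dataset>/."""
    rows = []
    root = Path(root)
    for name in DATASET_DIRS:
        folder = root / name
        if not folder.is_dir():
            continue  # this dataset has not been downloaded yet
        for path in sorted(folder.rglob("*")):
            if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
                rows.append({"dataset": name, "path": path.as_posix()})
    return rows


def write_index(rows, out_csv):
    """Write the collected rows to a CSV file with a header line."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["dataset", "path"])
        writer.writeheader()
        writer.writerows(rows)
```

Per-dataset features such as emotion or gender would then be added as extra columns on top of this index.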

CREMA-D

License: Open Database License, https://opendatacommons.org/licenses/odbl/1-0/

  • Number of samples: 7442
  • 48 male and 43 female actors between the ages of 20 and 74.
  • A variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified).
  • Emotions (6 classes): Anger, Disgust, Fear, Happy, Neutral, Sad
  • In addition, it contains Gender, Age and Emotion Level features.
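
CREMA-D encodes its labels in the file name, for example 1001_DFA_ANG_XX.wav (actor ID, sentence code, emotion code, emotion level). The sketch below parses that pattern. The function name and the output keys are hypothetical and not taken from the notebook; gender and age are not part of the file name and would have to be joined from the dataset's demographics table.

```python
from pathlib import Path

# Emotion and level codes as used in CREMA-D file names.
CREMA_EMOTIONS = {
    "ANG": "Anger", "DIS": "Disgust", "FEA": "Fear",
    "HAP": "Happy", "NEU": "Neutral", "SAD": "Sad",
}
CREMA_LEVELS = {"LO": "Low", "MD": "Medium", "HI": "High", "XX": "Unspecified"}


def parse_crema_d(filename):
    """Split a CREMA-D file name into actor, sentence, emotion and level."""
    parts = Path(filename).stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected CREMA-D file name: {filename}")
    actor_id, sentence, emotion, level = parts
    return {
        "actor_id": actor_id,
        "sentence": sentence,
        "emotion": CREMA_EMOTIONS[emotion],
        "emotion_level": CREMA_LEVELS[level],
    }
```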

MELD

Citation [1]: S. Poria, D. Hazarika, N. Majumder, G. Naik, R. Mihalcea, E. Cambria. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation. (2018)

Citation [2]: Chen, S.Y., Hsu, C.C., Kuo, C.C. and Ku, L.W. EmotionLines: An Emotion Corpus of Multi-Party Conversations. arXiv preprint arXiv:1802.08379 (2018).

  • Data splits: Train, Test, Dev (we only used the train split).
  • MELD has 1400+ dialogues and 13,000+ utterances from the TV series 'Friends'.
  • Emotions (7 classes): Anger, Disgust, Sadness, Joy, Neutral, Surprise and Fear
  • Sentiment: Positive, Negative and Neutral

MLEnd

The MLEnd datasets have been created by students at the School of Electronic Engineering and Computer Science, Queen Mary University of London.

  • 31 nationalities and 42 unique languages, 154 speakers
  • Each audio recording corresponds to one English numeral (from "zero" to "billion") that is read using different intonations
  • Number of samples: 32654
  • Emotions (4 classes): Neutral, Bored, Excited and Question
  • It also contains a Nationality feature.

RAVDESS

Citation: "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo is licensed under CC BY-NC-SA 4.0.

  • 24 professional actors (12 female, 12 male), two lexically-matched statements in a neutral North American accent.
  • Number of samples: 1440
  • Emotions (7 classes): Calm, Happy, Sad, Angry, Fearful, Surprise and Disgust
  • Emotional Intensity: Normal, Strong
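
RAVDESS file names consist of seven two-digit fields, for example 03-01-06-01-02-01-12.wav (modality, vocal channel, emotion, intensity, statement, repetition, actor). The third field also includes code 01 for the neutral expression, and odd actor numbers are male while even ones are female. A minimal parsing sketch under that convention; the function name and output keys are hypothetical.

```python
from pathlib import Path

# Third field of the file name: emotion code.
RAVDESS_EMOTIONS = {
    "01": "Neutral", "02": "Calm", "03": "Happy", "04": "Sad",
    "05": "Angry", "06": "Fearful", "07": "Disgust", "08": "Surprise",
}
# Fourth field of the file name: emotional intensity.
RAVDESS_INTENSITY = {"01": "Normal", "02": "Strong"}


def parse_ravdess(filename):
    """Extract emotion, intensity, actor and gender from a RAVDESS file name."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"unexpected RAVDESS file name: {filename}")
    _modality, _channel, emotion, intensity, _statement, _repetition, actor = parts
    actor_number = int(actor)
    return {
        "emotion": RAVDESS_EMOTIONS[emotion],
        "intensity": RAVDESS_INTENSITY[intensity],
        "actor": actor_number,
        # Odd-numbered actors are male, even-numbered actors are female.
        "gender": "male" if actor_number % 2 == 1 else "female",
    }
```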

SAVEE

License: Data files © Original Authors | Authors: Philip Jackson and Sanaul Haq

  • Four native English male speakers, aged from 27 to 31 years
  • A total of 120 utterances per speaker, 15 TIMIT sentences per emotion: 3 common, 2 emotion-specific and 10 generic sentences
  • Emotions (7 classes): Anger, Disgust, Fear, Happiness, Sadness, Surprise and Neutral

TESS

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/

  • 200 target words were spoken in the carrier phrase "Say the word _" by two actresses (aged 26 and 64 years).
  • Emotions (7 classes): Anger, Disgust, Fear, Happiness, Pleasant Surprise, Sadness, and Neutral
  • It contains Age and Gender features.

ESD

Citation: Kun Zhou, Berrak Sisman, Rui Liu and Haizhou Li, "Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset", ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  • 350 parallel utterances spoken by 10 native English and 10 native Mandarin speakers.
    • We only used the English recordings.
  • Emotions (5 classes): Neutral, Happy, Angry, Sad and Surprise

JL Corpus

License: CC0: Public Domain, https://creativecommons.org/publicdomain/zero/1.0/

Citation: Jesin James, Li Tian, Catherine Watson, "An Open Source Emotional Speech Corpus for Human Robot Interaction Applications", in Proc. Interspeech, 2018.

  • Emotional speech corpus with balanced New Zealand English vowels, encompassing 5 primary and 5 secondary emotions.
  • Emotions (10 classes): Angry, Anxious, Apologetic, Assertive, Concerned, Encouraging, Excited, Happy, Neutral and Sad
  • In addition, it contains a Gender feature.
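
The eight datasets name the same emotions differently (for example Anger vs. Angry, or Happy vs. Happiness vs. Joy), so a combined model usually needs one shared label set. The sketch below shows one possible harmonization. The mapping is an assumption made for illustration, not the scheme used in the notebook, and labels without an obvious counterpart (such as Bored or Apologetic) are passed through in lowercase.

```python
# Hypothetical mapping from dataset-specific labels to a shared vocabulary.
CANONICAL_LABELS = {
    "anger": "angry", "angry": "angry",
    "happy": "happy", "happiness": "happy", "joy": "happy",
    "sad": "sad", "sadness": "sad",
    "fear": "fear", "fearful": "fear",
    "surprise": "surprise", "pleasant surprise": "surprise",
    "disgust": "disgust",
    "neutral": "neutral",
}


def harmonize_label(label):
    """Map a dataset-specific emotion label to the shared vocabulary."""
    key = label.strip().lower()
    # Unknown labels are kept (lowercased) rather than silently dropped.
    return CANONICAL_LABELS.get(key, key)
```

Whether Pleasant Surprise (TESS) should be merged with Surprise, or Calm (RAVDESS) with Neutral, is a modeling decision that depends on the target task.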

     

Project Structure

├── dataset                      # Not contained in this repository
│   ├── crema-d                  # CREMA-D dataset
│   ├── meld                     # MELD dataset
│   ├── mlend                    # MLEnd dataset
│   ├── ravdess                  # RAVDESS dataset
│   ├── savee                    # SAVEE dataset
│   ├── tess                     # TESS dataset
│   ├── esd                      # Emotional Voice Conversion: Theory, Databases and ESD dataset
│   └── jl-corpus                # JL Corpus Dataset
├── MakeEngSpeechDataset.ipynb   # Build a DataFrame covering all 8 datasets
├── SpeechEDA.ipynb              # EDA using Pandas Profiling
├── speech_dataset.csv           # Main Dataset
└── report.html                  # EDA Report
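
Once speech_dataset.csv exists, a quick class-balance check can be done with the standard library alone. This is a sketch under assumptions: the column names passed in (for example dataset and emotion) are hypothetical and must be replaced with the actual headers of the CSV, and count_by is a name introduced here.

```python
import csv
from collections import Counter


def count_by(csv_path, columns):
    """Count rows of a CSV file grouped by the given column names."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # The group key is the tuple of values in the requested columns.
            counts[tuple(row[c] for c in columns)] += 1
    return counts
```

For example, count_by("speech_dataset.csv", ["dataset", "emotion"]).most_common(5) would list the five largest dataset/emotion groups, assuming those two columns exist.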
