CoughAgainstCovid Official Dataset Repository

This is the official repository for accessing the data used by the project CoughAgainstCovid.

Data Description

Due to privacy constraints, we are not allowed to release the original raw audio waveforms. Instead, we release spectrograms, which are 2D time-frequency representations of the audio. To create the spectrograms from the raw audio waveforms, we used the following transforms:

ToTensor
Resample (44.1 kHz to 16 kHz)
Background Noise (From ESC-50 Dataset)
Spectrogram (n_fft=512, win_length=512, hop_length=160)
MelScale (n_mels=64, f_min=125, f_max=7500)
AmplitudeToDB
ToNumpy

We share the resulting 2D NumPy arrays (.npy files) for all the audio recordings collected.
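To get a feel for what the parameters above imply, the following sketch works out the frame timing and the expected array shape for a recording of a given length. It is only an illustration: the transform names match those in torchaudio, so the frame count assumes torchaudio-style centred framing (one frame per hop, plus one), and `expected_shape` is a hypothetical helper, not part of this repository. The released arrays may carry an extra channel axis or a different orientation.

```python
# Parameters taken from the transform list above.
SAMPLE_RATE = 16_000  # Hz, after resampling from 44.1 kHz
N_FFT = 512
WIN_LENGTH = 512
HOP_LENGTH = 160
N_MELS = 64

# Timing implied by the parameters (plain arithmetic, no assumptions).
window_ms = 1000 * WIN_LENGTH / SAMPLE_RATE  # length of each analysis window
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE     # step between consecutive frames
linear_bins = N_FFT // 2 + 1                 # frequency bins before MelScale


def expected_shape(duration_seconds):
    """Estimate (n_mels, n_frames) for a clip of the given duration.

    Assumption: centred framing, so n_frames = 1 + n_samples // hop_length.
    """
    n_samples = int(round(duration_seconds * SAMPLE_RATE))
    n_frames = 1 + n_samples // HOP_LENGTH
    return (N_MELS, n_frames)


print(f"window: {window_ms} ms, hop: {hop_ms} ms, linear bins: {linear_bins}")
print("1.0 s clip ->", expected_shape(1.0))
```

Under these assumptions, each spectrogram column covers a 32 ms window and columns are spaced 10 ms apart, so a clip yields roughly 100 columns per second of audio.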

Accessing/Downloading the Data

To download/access the spectrograms:

Fill out the form and attach the signed doc file. You will receive a text file with the links within 10-15 minutes.
Download the text file, rename it (for example, to links.txt), and save it at a location where you can access it.
Run prepare.py to download and unzip the data. (The script should take 1-2 hours, depending on the download speed.)

# Run prepare.py to download the data and unzip it at ~/data (wget is used for the download)
python prepare.py -lp path_to_links_file -od ~/data

Args:
    links_path (lp): Path to the text file with the links to the zip files.
    output_dir (od): Path to the output directory. If it does not exist, it will be created

Running this script will download the data and unzip it to the output directory. The spectrograms should then be present at output_dir/spectrograms/.
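Once prepare.py has finished, a quick sanity check is to list the .npy files under the output directory. The sketch below is not part of this repository: `list_spectrograms` is a hypothetical helper, and it searches recursively because the exact layout inside spectrograms/ is not documented here.

```python
from pathlib import Path


def list_spectrograms(output_dir):
    """Return all .npy files found under <output_dir>/spectrograms, sorted."""
    spec_dir = Path(output_dir).expanduser() / "spectrograms"
    if not spec_dir.is_dir():
        raise FileNotFoundError(f"Expected directory not found: {spec_dir}")
    # rglob also picks up files in subfolders (the folder layout is an assumption).
    return sorted(spec_dir.rglob("*.npy"))


if __name__ == "__main__":
    try:
        files = list_spectrograms("~/data")
        print(f"Found {len(files)} spectrogram files")
    except FileNotFoundError as err:
        print(err)
```

Each listed file can then be loaded with `numpy.load`.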

Metadata Details

We provide a metadata file (attributes.csv) that contains supplementary information about the patients. The table below lists the columns present in the CSV file.

Attribute	Column Name in CSV	Description
Patient Id	patient_id	Unique Identifier
Patient Age	enroll_patient_age	Continuous
Health Worker	enroll_health_worker	Discrete
Temperature	enroll_patient_temperature	Continuous
Travel History	enroll_travel_history	Discrete
Presence of Cough	enroll_cough	Discrete
Presence of Shortness of Breath	enroll_shortness_of_breath	Discrete
Presence of Fever	enroll_fever	Discrete
Days with Cough	enroll_days_with_cough	Continuous
Days with Shortness of Breath (SOB)	enroll_days_with_shortness_of_breath	Continuous
Days with Fever	enroll_days_with_fever	Continuous
Contact with Covid Confirmed Case	enroll_contact_with_confirmed_covid_case	Discrete
Comorbidities	enroll_comorbidities	Discrete
Patient Respiratory Rate	enroll_patient_respiratory_rate	Continuous
Smoking Habits	enroll_habits	Discrete
Cough Relief Measures	enroll_cough_relief_measures	Discrete
State	testresult_state	Discrete
Test Facility	testresult_facility	Discrete
Test Time	testresult_end_time	DateTime
Covid Result	testresult_covid_test_result	Discrete
Covid Test Type	testresult_diagnostics_test_type	Discrete
Audio Recording (aaaaaa sound)	aaaaa_recording	File Name
Audio Recording (oooooo sound)	ooooo_recording	File Name
Audio Recording (eeeeee sound)	eeeee_recording	File Name
Audio Recording (a sound)	a_sound	File Name
Audio Recording (e sound)	e_sound	File Name
Audio Recording (o sound)	o_sound	File Name
Audio Recording (Cough Sound 1)	cough_1	File Name
Audio Recording (Cough Sound 2)	cough_2	File Name
Audio Recording (Cough Sound 3)	cough_3	File Name
Audio Recording (Breathing)	breathing	File Name
Audio Recording (1 to 10 Counting)	audio_1_to_10	File Name
Audio Recording (Room)	room_sound	File Name
Audio Recording (Room Recording)	room_recording	File Name

We collected cough sounds for all 7169 patients, along with some other sounds. A recording of one of these other sounds exists for a patient only if its file name is present in this metadata file.
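One way to find out which recordings exist for each patient is to read attributes.csv and keep only the recording columns that hold a file name. The sketch below uses the column names from the table above. `available_recordings` and `recordings_by_patient` are hypothetical helpers, the sample rows and file names are made up for illustration, and treating an empty cell or the text "nan" as "no recording" is an assumption about how missing values are stored.

```python
import csv
import io

# Recording columns as listed in the metadata table.
RECORDING_COLUMNS = [
    "aaaaa_recording", "ooooo_recording", "eeeee_recording",
    "a_sound", "e_sound", "o_sound",
    "cough_1", "cough_2", "cough_3",
    "breathing", "audio_1_to_10", "room_sound", "room_recording",
]


def available_recordings(row):
    """Return {column: file name} for the recordings present in one CSV row."""
    found = {}
    for column in RECORDING_COLUMNS:
        value = (row.get(column) or "").strip()
        # Assumption: a missing recording is an empty cell or the text "nan".
        if value and value.lower() != "nan":
            found[column] = value
    return found


def recordings_by_patient(csv_file):
    """Map each patient_id to the recordings available for that patient."""
    reader = csv.DictReader(csv_file)
    return {row["patient_id"]: available_recordings(row) for row in reader}


# Made-up rows standing in for attributes.csv; for the real file, use
# open("attributes.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "patient_id,cough_1,cough_2,breathing\n"
    "p001,file_a,,file_b\n"
    "p002,file_c,file_d,nan\n"
)
demo = recordings_by_patient(sample)
for patient_id, recordings in demo.items():
    print(patient_id, sorted(recordings))
```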

The dataset paper will be released soon.

About

Releases

Packages

Contributors 2

Languages

License

WadhwaniAI/cough-against-covid-data

Folders and files

Latest commit

History

Repository files navigation

CoughAgainstCovid Official Dataset Repository

Data Description

Accessing/Downloading the Data

Metadata Details

Dataset Paper will be Released Soon.

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages