This is the official repository for accessing the data used by the project CoughAgainstCovid.
Due to privacy constraints, we are not allowed to release the original raw audio waveforms. Instead, we release spectrograms, which are 2D time-frequency representations of the audio. To create the spectrograms from the raw audio waveforms, we used the following transforms:
- ToTensor
- Resample (44.1 kHz to 16 kHz)
- Background Noise (From ESC-50 Dataset)
- Spectrogram (n_fft=512, win_length=512, hop_length=160)
- MelScale (n_mels=64, f_min=125, f_max=7500)
- AmplitudeToDB
- ToNumpy
We share the 2D numpy arrays (npy files) for all the audio sounds collected.
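The Spectrogram and MelScale settings listed above determine the size of each array. The sketch below works out the time resolution and the approximate number of frames for a clip of a given duration. It assumes centered framing, so that the frame count is `1 + num_samples // hop_length` (the default behaviour of torchaudio's `Spectrogram`); the README does not state which padding mode was used, and `spectrogram_dims` is a hypothetical helper, not part of this repository.

```python
SAMPLE_RATE = 16_000  # Hz, after resampling
N_FFT = 512           # FFT size (win_length is also 512)
HOP_LENGTH = 160      # samples between consecutive frames
N_MELS = 64           # mel bands produced by MelScale


def spectrogram_dims(duration_s):
    """Estimate spectrogram dimensions for a clip of the given duration.

    Assumption: centered framing, i.e. frames = 1 + num_samples // hop_length.
    """
    num_samples = int(round(duration_s * SAMPLE_RATE))
    return {
        "hop_ms": 1000 * HOP_LENGTH / SAMPLE_RATE,  # time step between frames
        "window_ms": 1000 * N_FFT / SAMPLE_RATE,    # duration of one analysis window
        "linear_bins": N_FFT // 2 + 1,              # frequency bins before MelScale
        "mel_bins": N_MELS,                         # frequency bins after MelScale
        "frames": 1 + num_samples // HOP_LENGTH,    # estimated number of time frames
    }


print(spectrogram_dims(2.0))
```

Under these assumptions, each frame advances by 10 ms and covers a 32 ms window, so a 2-second recording yields roughly 201 frames of 64 mel bands. Which axis of the stored array holds time and which holds frequency should be checked against the actual files.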
To download and access the spectrograms:
- Fill the form and attach the signed doc file. You will receive a text file with the links in 10-15 mins.
- Download the text file, rename it (to say links.txt) and save it at a location where you can access it.
- Run prepare.py to download and unzip the data. (This script should take 1-2 hours, depending on the download speed.)
```bash
# Run prepare.py to download the data (using wget) and unzip it at ~/data
python prepare.py -lp path_to_links_file -od ~/data
```
Args:
- links_path (lp): Path to the text file with the links to the zip files.
- output_dir (od): Path to the output directory. If it does not exist, it will be created.
Running this script will download the data and unzip it to the output directory. The spectrograms should then be available at output_dir/spectrograms/.
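Once the download finishes, a quick check like the following can confirm that the files are in place and load as 2D arrays. This is a minimal sketch: it assumes the spectrograms are stored with a `.npy` extension somewhere under output_dir/spectrograms/, and the helper names `list_spectrograms` and `load_spectrogram` are hypothetical, not part of prepare.py.

```python
from pathlib import Path

import numpy as np


def list_spectrograms(output_dir):
    """Return sorted paths of all .npy files under output_dir/spectrograms/."""
    spec_dir = Path(output_dir).expanduser() / "spectrograms"
    # rglob also finds files in subfolders, in case the archive is nested (assumption).
    return sorted(spec_dir.rglob("*.npy"))


def load_spectrogram(path):
    """Load one spectrogram and check that it is a 2D array."""
    spec = np.load(path)
    if spec.ndim != 2:
        raise ValueError(f"expected a 2D array, got shape {spec.shape} for {path}")
    return spec
```

For example, `files = list_spectrograms("~/data")` followed by `load_spectrogram(files[0]).shape` shows how many files were extracted and the shape of the first one.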
We provide a metadata file (attributes.csv) that contains supplementary information about the patients. The table below lists the attributes present in the CSV file.
Attribute | Column Name in CSV | Description |
---|---|---|
Patient Id | patient_id | Unique Identifier |
Patient Age | enroll_patient_age | Continuous |
Health Worker | enroll_health_worker | Discrete |
Temperature | enroll_patient_temperature | Continuous |
Travel History | enroll_travel_history | Discrete |
Presence of Cough | enroll_cough | Discrete |
Presence of Shortness of Breath | enroll_shortness_of_breath | Discrete |
Presence of Fever | enroll_fever | Discrete |
Days with Cough | enroll_days_with_cough | Continuous |
Days with Shortness of Breath (SOB) | enroll_days_with_shortness_of_breath | Continuous |
Days with Fever | enroll_days_with_fever | Continuous |
Contact with Covid Confirmed Case | enroll_contact_with_confirmed_covid_case | Discrete |
Comorbidities | enroll_comorbidities | Discrete |
Patient Respiratory Rate | enroll_patient_respiratory_rate | Continuous |
Smoking Habits | enroll_habits | Discrete |
Cough Relief Measures | enroll_cough_relief_measures | Discrete |
State | testresult_state | Discrete |
Test Facility | testresult_facility | Discrete |
Test Time | testresult_end_time | DateTime |
Covid Result | testresult_covid_test_result | Discrete |
Covid Test Type | testresult_diagnostics_test_type | Discrete |
Audio Recording (aaaaaa sound) | aaaaa_recording | File Name |
Audio Recording (oooooo sound) | ooooo_recording | File Name |
Audio Recording (eeeeee sound) | eeeee_recording | File Name |
Audio Recording (a sound) | a_sound | File Name |
Audio Recording (e sound) | e_sound | File Name |
Audio Recording (o sound) | o_sound | File Name |
Audio Recording (Cough Sound 1) | cough_1 | File Name |
Audio Recording (Cough Sound 2) | cough_2 | File Name |
Audio Recording (Cough Sound 3) | cough_3 | File Name |
Audio Recording (Breathing) | breathing | File Name |
Audio Recording (1 to 10 Counting) | audio_1_to_10 | File Name |
Audio Recording (Room) | room_sound | File Name |
Audio Recording (Room Recording) | room_recording | File Name |
While we collected cough sounds for all 7169 patients, we also collected some other sounds. A recording of one of these sounds exists for a patient only if its file name is present in this metadata file.
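Because not every recording exists for every patient, it helps to check the metadata before looking for a file. The sketch below reads attributes.csv and collects, per patient, the recording columns that contain a file name. It assumes that a missing recording appears as an empty cell or a NaN-like marker, which the README does not specify; `available_recordings` is a hypothetical helper.

```python
import csv

# Recording columns listed in the table above.
RECORDING_COLUMNS = [
    "aaaaa_recording", "ooooo_recording", "eeeee_recording",
    "a_sound", "e_sound", "o_sound",
    "cough_1", "cough_2", "cough_3",
    "breathing", "audio_1_to_10", "room_sound", "room_recording",
]

# Assumption: these values mean "no recording" for that column.
MISSING = {"", "nan", "none", "null"}


def available_recordings(csv_path):
    """Map each patient_id to {column name: file name} for recordings that exist."""
    result = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            present = {}
            for col in RECORDING_COLUMNS:
                value = (row.get(col) or "").strip()
                if value.lower() not in MISSING:
                    present[col] = value
            result[row["patient_id"]] = present
    return result
```

Calling `available_recordings("attributes.csv")` returns a dictionary keyed by patient_id. How the stored file names map to the `.npy` files under output_dir/spectrograms/ (for instance, whether the extension differs) should be verified against the downloaded data.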