Fork of sprocket specialized in aligning normal and electrolarynx speech.
- Prepare WAV files. Use `test_prepare_*_files.py` to rename, resample to 16 kHz, and copy files from the source to the working folder, i.e. under `example/data/wav/`.
  - Also use `test_cut_wavs.py` and `test_stretch_audio.py` to cut initial transients and pre-stretch WAV files if needed.
- Run `initialize.py` steps 1, 2, and 3.
- Modify the `*.yml` and `*.list` files if needed.
  - Add `fake` in the YML file if needed.
  - Use the `f0` and `npow` histograms under `conf/figure` to set reasonable thresholds.
  - See `example/conf/speaker/nasal_tsai_mhint_20211128_cut[0.10].yml` for an example.
- Run `run_sprocket.py` steps 1, 2, 3, and 6. The results will be stored in `example/data/pair/*/aligned/`.
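
The threshold step above is done by reading the `f0` and `npow` histograms by eye. As a rough numeric counterpart, the sketch below suggests an f0 range from percentiles of the voiced frames. The function name `suggest_f0_range` and the 1st/99th percentile cut-offs are assumptions made for illustration; they are not part of this repository, and the result should still be checked against the histogram.

```python
import math


def suggest_f0_range(f0_values, lower_pct=1.0, upper_pct=99.0):
    """Suggest (min_f0, max_f0) in Hz from per-frame f0 values.

    Frames with f0 <= 0 are treated as unvoiced and ignored.
    The percentile defaults are illustrative, not values taken from the repo.
    """
    voiced = sorted(v for v in f0_values if v > 0)
    if not voiced:
        raise ValueError("no voiced frames found")

    def percentile(pct):
        # Nearest-rank percentile on the sorted voiced values.
        rank = max(1, math.ceil(pct / 100.0 * len(voiced)))
        return voiced[rank - 1]

    return percentile(lower_pct), percentile(upper_pct)
```

Trimming a small fraction of frames at each end keeps isolated octave errors from widening the range; a wider or narrower cut may suit a given speaker better.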

- `example/src/yml.py`, `example/src/extract_features.py`, `sprocket/speech/feature_extrator.py`, `sprocket/speech/analyzer.py`: Add the ability to provide fake f0 (e.g. 100 Hz electrolarynx excitations) and to median-filter the extracted f0 by specifying the kernel size.
- `sprocket/speech/extfrm.py`: Log the number of non-silent frames extracted.
- `example/src/estimate_twf_and_jnt.py`: Median-filter the power and also save the joint feature vectors in the HDF5 file format. Useful for outputting the aligned WAV files.
- `example/run_sprocket.py`: Add an extra step 6, which outputs the aligned WAV files. Also disable steps 4 and 5.
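
To make the fake-f0 and median-filter ideas above concrete, here is a minimal standard-library sketch. The names `fake_f0` and `median_filter` are hypothetical, and padding the edges by repeating the first and last value is an assumption; the code in this repository may handle edges differently.

```python
from statistics import median


def fake_f0(num_frames, f0_hz=100.0):
    """Return a constant f0 contour, e.g. for a 100 Hz electrolarynx excitation."""
    return [float(f0_hz)] * num_frames


def median_filter(values, kernel_size):
    """Median-filter a sequence using an odd kernel size.

    Edges are padded by repeating the first and last value (an assumption).
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    values = list(values)
    if not values:
        return []
    half = kernel_size // 2
    padded = [values[0]] * half + values + [values[-1]] * half
    # Each output frame is the median of the window centred on it.
    return [median(padded[i:i + kernel_size]) for i in range(len(values))]
```

A larger kernel removes longer spurious jumps in the contour but also flattens genuine pitch movement, so the kernel size is a trade-off to tune per recording.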

- `example/file_utils.py`: Add utilities to rename and resample files. Used by `test_prepare_{nasal, normal}_files.py`.
- `example/src/output_aligned.py`: Take the results of sprocket step 3 (aligning) and resynthesize aligned WAVs. Used in sprocket step 6.
- `example/test_prepare_{nasal, normal}_files.py`: Prepare nasal and normal files.
- `example/test_stretch_audio.py`: Pre-stretch faster speech files (usually normal speech) to roughly match the slower ones. This depends on the `pysox` package.
- `example/test_cut_wavs.py`: Cut the initial part of WAV files; useful to remove the initial transients.
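
The cutting and pre-stretching steps can be sketched with Python's standard `wave` module. This is not the code in `test_cut_wavs.py` or `test_stretch_audio.py`: the function names are hypothetical, and the sketch only computes the tempo factor rather than applying it (the repository uses `pysox` for the stretching itself).

```python
import wave


def cut_wav_start(src_path, dst_path, cut_seconds):
    """Copy src_path to dst_path, dropping the first cut_seconds of audio."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never skip past the end of the file.
        skip = min(int(round(cut_seconds * src.getframerate())), src.getnframes())
        src.setpos(skip)
        frames = src.readframes(src.getnframes() - skip)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)  # the frame count in the header is fixed on close


def wav_duration(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def tempo_factor(fast_path, slow_path):
    """Tempo factor (< 1 slows down) that maps fast_path onto slow_path's duration."""
    return wav_duration(fast_path) / wav_duration(slow_path)
```

For example, `cut_wav_start("in.wav", "out.wav", 0.10)` drops the first 0.10 s, and a factor of 0.8 from `tempo_factor` means the faster file should be played at 80% of its original tempo to roughly match the slower one.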