Speech Alignment

Before running an Align example, refer to the ASR recipe to obtain a trained ASR model first. To run an example:

source env.sh
bash examples/align/$dataset_name/run.sh

Core Stages (stages 1-3 are the ASR stages, stage 4 is the Align stage):

(1) Data preparation

Before you run examples/asr/$dataset_name/run.sh, download the corresponding dataset and store it in examples/asr/$dataset_name/data. The script examples/asr/$dataset_name/local/prepare_data.py generates the desired csv file describing the dataset.
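
The exact csv layout is defined by prepare_data.py for each dataset. As a rough illustration only, the sketch below assumes a tab-separated file with wav_filename, wav_length_ms, transcript and speaker columns; the real scripts may differ, so check the prepare_data.py for your dataset.

# Minimal sketch of the kind of csv that a prepare_data.py script produces.
# Column names and the tab separator are assumptions, not the toolkit's spec.
import csv
import os
import wave

def write_data_csv(wav_paths, transcripts, speakers, csv_path):
    """Write one row per utterance: audio path, duration in ms, text, speaker."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["wav_filename", "wav_length_ms", "transcript", "speaker"])
        for wav, text, spk in zip(wav_paths, transcripts, speakers):
            with wave.open(wav) as w:
                length_ms = int(w.getnframes() / w.getframerate() * 1000)
            writer.writerow([os.path.abspath(wav), length_ms, text, spk])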

(2) Data normalization

With the generated csv file, compute the cmvn file first, like this:

$ python athena/cmvn_main.py examples/asr/$dataset_name/configs/mpc.json examples/asr/$dataset_name/data/all.csv
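
For intuition, the cmvn step gathers per-dimension mean and variance statistics over the feature frames of the whole dataset, which are later used to normalize the features during training. The sketch below only illustrates that idea; it is not the athena/cmvn_main.py implementation.

# Illustrative sketch of cepstral mean and variance statistics computation.
import numpy as np

def compute_cmvn(feature_matrices):
    """feature_matrices: iterable of (num_frames, num_dims) numpy arrays."""
    total_sum = None
    total_sq_sum = None
    total_frames = 0
    for feats in feature_matrices:
        if total_sum is None:
            total_sum = np.zeros(feats.shape[1])
            total_sq_sum = np.zeros(feats.shape[1])
        total_sum += feats.sum(axis=0)
        total_sq_sum += (feats ** 2).sum(axis=0)
        total_frames += feats.shape[0]
    mean = total_sum / total_frames
    var = total_sq_sum / total_frames - mean ** 2
    return mean, var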

(3) Acoustic model training

You can train a transformer model using the json file examples/asr/$dataset_name/configs/transformer.json, or a mtl_transformer_ctc model using the json file examples/asr/$dataset_name/configs/mtl_transformer.json.
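
Training is launched by passing one of these json files to the toolkit's training entry point. The script name below (athena/main.py) is an assumption based on the ASR recipe, so verify it against your checkout before running:

$ python athena/main.py examples/asr/$dataset_name/configs/mtl_transformer.json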

(4) CTC alignment

After training a transformer model, you can use the provided CTC alignment method to align speech and labels at the frame level. To use it, run:

$ python athena/alignment.py examples/align/$dataset_name/configs/mtl_transformer.align.json
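
For intuition, a frame-level CTC alignment can be obtained by picking the most likely CTC label (including blank) at every frame and collapsing consecutive identical labels into spans. The sketch below illustrates that output format under a simple best-path assumption; the actual athena/alignment.py may instead force-align against the reference transcript.

# Simplified sketch of frame-level CTC alignment via best-path (greedy) decoding.
import numpy as np

def best_path_alignment(log_probs, blank_id=0):
    """log_probs: (num_frames, num_labels) CTC output log-probabilities.
    Returns one label id per frame; blank_id marks frames assigned to blank."""
    return np.argmax(log_probs, axis=1)

def collapse_alignment(frame_labels, blank_id=0):
    """Collapse the frame-level alignment into (label, start_frame, end_frame) spans."""
    spans = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            if frame_labels[start] != blank_id:
                spans.append((int(frame_labels[start]), start, i - 1))
            start = i
    return spans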

(5) Alignment result

The CTC alignment result for one utterance is shown below; note that the CTC alignment output is time-delayed relative to the underlying acoustic events.
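
This delay matters mainly when converting frame indices to timestamps. The sketch below assumes a hypothetical 10 ms frame shift; the real value comes from the feature extraction settings in the json config.

# Convert a frame-level span (start_frame, end_frame) into seconds, assuming a
# 10 ms frame shift. This value is an assumption, not read from any config.
FRAME_SHIFT_SECONDS = 0.01

def span_to_time(start_frame, end_frame, frame_shift=FRAME_SHIFT_SECONDS):
    return start_frame * frame_shift, (end_frame + 1) * frame_shift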