Please refer to the results table for supported tasks/examples. To run an ASR example, execute the following commands from your Athena root directory:
source env.sh
bash examples/asr/$dataset_name/run.sh
Before you run examples/asr/$dataset_name/run.sh, download the corresponding dataset and store it in examples/asr/$dataset_name/data. The script examples/asr/$dataset_name/local/prepare_data.py generates the CSV file describing the dataset.
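As a quick pre-flight check before launching run.sh, the sketch below verifies that the paths mentioned above exist. The helper name `missing_inputs` and its `root` parameter are hypothetical and not part of Athena; only the directory layout is taken from this README.

```python
from pathlib import Path


def missing_inputs(dataset_name, root="."):
    """Return the expected example paths that do not exist yet.

    Hypothetical helper (not part of Athena). The layout
    examples/asr/<dataset_name>/... follows the paths named in this README.
    """
    base = Path(root) / "examples" / "asr" / dataset_name
    required = [
        base / "run.sh",                     # the example entry point
        base / "data",                       # where the downloaded dataset goes
        base / "local" / "prepare_data.py",  # generates the CSV file
    ]
    return [str(path) for path in required if not path.exists()]


# Run from the Athena root directory; an empty list means nothing is missing.
print(missing_inputs("aishell"))
```

An empty list means the three paths are in place. The check does not validate the dataset contents themselves.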
With the generated CSV file, first compute the CMVN file:
$ python athena/cmvn_main.py examples/asr/$dataset_name/configs/mpc.json examples/asr/$dataset_name/data/all.csv
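For readers unfamiliar with CMVN (cepstral mean and variance normalization), here is a minimal pure-Python sketch of the idea: collect per-dimension statistics over feature frames, then normalize with them. It is an illustration only and makes no claim about how athena/cmvn_main.py computes or stores its statistics. The function names and the toy frames are assumptions.

```python
import math


def compute_cmvn(frames):
    """Return per-dimension (mean, variance) over a list of feature frames."""
    if not frames:
        raise ValueError("need at least one frame")
    count = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / count for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / count for d in range(dim)]
    return mean, var


def apply_cmvn(frames, mean, var, eps=1e-10):
    """Shift and scale every frame to zero mean and unit variance per dimension."""
    return [
        [(f[d] - mean[d]) / math.sqrt(var[d] + eps) for d in range(len(mean))]
        for f in frames
    ]


# Toy 2-dimensional "features" (illustrative values, not real speech data).
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mean, var = compute_cmvn(frames)
normalized = apply_cmvn(frames, mean, var)
```

In practice the statistics are gathered once over the training set and reused at training and inference time, which is why this step comes before training.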
You can perform unsupervised pretraining using the JSON file examples/asr/$dataset_name/mpc.json, or skip this step.
You can train a transformer model using the JSON file examples/asr/$dataset_name/configs/transformer.json, or train an mtl_transformer_ctc model using the JSON file examples/asr/$dataset_name/configs/mtl_transformer.json.
You can train an rnnlm model on the transcripts using the JSON file examples/asr/$dataset_name/rnnlm.json. You must first prepare the CSV file for it.
Currently, we provide a simple but not very effective way to decode an mtl_transformer_ctc model. To use it, run
$ python athena/inference.py examples/asr/$dataset_name/configs/$model_name_decode.json
To score the decoding output, for example on AISHELL, run
bash examples/asr/aishell/local/run_score.sh inference.log score_aishell examples/asr/aishell/data/vocab
Language | Task | Model Name | Training Data | Hours of Speech | Error Rate |
---|---|---|---|---|---|
English | ASR | Transformer | LibriSpeech Dataset | 960 h | 3.1% (WER) |
English | ASR | Transformer | GigaSpeech Dataset | 10000 h | 11.7% (WER) |
Mandarin | ASR | Transformer | HKUST Dataset | 151 h | 21.64% (CER) |
Mandarin | ASR | Conformer | HKUST Dataset | 151 h | 21.33% (CER) |
Mandarin | ASR | Transformer | AISHELL Dataset | 178 h | 5.13% (CER) |
Mandarin | ASR | Conformer | AISHELL Dataset | 178 h | 4.95% (CER) |
Mandarin | ASR | Conformer | MISP2021 Challenge Task2 | 120 h | 49% (CER) |
Mandarin | AV-ASR | Conformer-AV | MISP2021 Challenge Task2 | 120 h | 61% (CER) |
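
The WER and CER figures in the table are edit-distance error rates: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. The sketch below shows that computation in plain Python. It is an illustration of the metric only and is not the code used by run_score.sh. The function names and sample strings are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                       # deletion
                cur[j - 1] + 1,                    # insertion
                prev[j - 1] + (ref_tok != hyp_tok),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# WER: tokens are words.
wer = error_rate("the cat sat".split(), "the cat sit down".split())

# CER: tokens are characters (for Mandarin, one token per character).
cer = error_rate(list("hello"), list("hallo"))

print(f"WER = {wer:.2%}, CER = {cer:.2%}")
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.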