This repository contains the code for "N-Best-ASR-Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses", accepted at ACL-IJCNLP 2021.
Spoken Language Understanding (SLU) systems parse speech into semantic structures like dialog acts and slots. This involves the use of an Automatic Speech Recognizer (ASR) to transcribe speech into multiple text alternatives (hypotheses). Transcription errors, common in ASRs, negatively impact downstream SLU performance. Approaches to mitigate such errors involve using richer information from the ASR, either in the form of N-best hypotheses or word-lattices. We hypothesize that transformer models learn better with a simpler utterance representation: the concatenation of the N-best ASR alternatives, where each alternative is separated by a special delimiter [SEP]. In our work, we test this hypothesis by using concatenated N-best ASR alternatives as the input to transformer encoder models, namely BERT and XLM-RoBERTa, and achieve performance equivalent to the prior state-of-the-art model on the DSTC2 dataset. We also show that our approach significantly outperforms the prior state-of-the-art in the low-data regime. Additionally, this methodology is accessible to users of third-party ASR APIs that do not provide word-lattice information.
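As a minimal illustration of this input representation (not the repository's code; the model name and hypothesis strings below are placeholders), the N-best alternatives can simply be joined with the [SEP] delimiter before tokenization:

```python
from transformers import AutoTokenizer

# Two hypothetical ASR alternatives for the same user turn.
n_best = [
    "yes want a to book a table",
    "yes book table",
]

# Join the alternatives with the [SEP] delimiter described above.
utterance = " [SEP] ".join(n_best)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(utterance)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```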
We conduct our experiments on DSTC2, a benchmark SLU dataset that provides ASR alternatives. The original data can be obtained here.
- Data preprocessing:
Note that you should replace `<input_dir>` with the path to the original DSTC2 data and `<output_dir>` with your own output directory.
```bash
python helpers/process_dstc2_with_SEP.py \
    --data_dir <input_dir> \
    --out_dir <output_dir>
```
The preprocessed data is saved in `dstc2_data/processed_data/*`, where each line is a data sample of the form:

```
[SYS] Hello you are here to book a table [USR] yes want a to book a table [SEP] yes book table
```
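For illustration only (this helper is not part of the repository), a line in this format can be split back into the previous system utterance and the user's N-best ASR alternatives:

```python
def parse_sample(line: str):
    """Split a preprocessed line into (system utterance, list of ASR alternatives)."""
    # Everything between [SYS] and [USR] is the previous system utterance;
    # the rest is the user turn, with alternatives delimited by [SEP].
    system_part, user_part = line.split("[USR]", 1)
    system_utterance = system_part.replace("[SYS]", "").strip()
    alternatives = [alt.strip() for alt in user_part.split("[SEP]")]
    return system_utterance, alternatives

sys_utt, alts = parse_sample(
    "[SYS] Hello you are here to book a table "
    "[USR] yes want a to book a table [SEP] yes book table"
)
print(sys_utt)  # "Hello you are here to book a table"
print(alts)     # ["yes want a to book a table", "yes book table"]
```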
We build our experiments on top of this repository as the base code.
The data preprocessing step mentioned above converts the DSTC2 data into the desired input format.
We build two classifiers: the first for act-slot pairs and the second for values.
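As an illustrative sketch of this two-classifier setup (not the paper's exact architecture; the model name, label counts, and pooling choice are placeholder assumptions), a shared transformer encoder can feed two separate classification heads:

```python
import torch.nn as nn
from transformers import AutoModel

class TwoHeadSLUClassifier(nn.Module):
    def __init__(self, model_name="bert-base-uncased",
                 num_act_slot_labels=30, num_value_labels=100):
        super().__init__()
        # Shared encoder over the concatenated N-best input.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One head for act-slot pairs, one head for values.
        self.act_slot_head = nn.Linear(hidden, num_act_slot_labels)
        self.value_head = nn.Linear(hidden, num_value_labels)

    def forward(self, input_ids, attention_mask):
        # Use the first token's representation as a pooled summary (an assumption).
        pooled = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        return self.act_slot_head(pooled), self.value_head(pooled)
```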
Run command:
```bash
./run/train_eval_N_Best_ASR_Transformer_STC.sh
```
Parameters (an example invocation is sketched after this list):

- `--pre_trained_model`: pre-trained model to use, one of `"bert"`, `"roberta"`, `"xlm-roberta"`.
- `--add_l2_loss`: flag to add an MSE loss between the ASR and transcript hidden states.
- `--tod_pre_trained_model`: path to a TOD pre-trained checkpoint. Note: if passed, this overrides `--pre_trained_model`.
- `--add_segment_ids`: flag to add segment IDs.
- `--without_system_act`: flag to remove the previous system act (in our case, the previous system utterance).
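A hypothetical example combining these flags (the flag values are placeholders, and it is an assumption that the wrapper script forwards arguments to the underlying training entry point; you may instead need to edit them inside the script):

```bash
# Hypothetical invocation; verify how the wrapper script consumes arguments.
./run/train_eval_N_Best_ASR_Transformer_STC.sh \
    --pre_trained_model "xlm-roberta" \
    --add_segment_ids
```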
Parameter for the sample complexity experiments:

- `--coverage`: fraction of the training data to use; a stratified subset of this size is picked as the training set. Coverage lies in (0, 1], where coverage = 1 uses the whole training set and coverage < 1 uses that fraction of the samples. In our work we test sample complexity at coverage values of {0.05, 0.10, 0.20, 0.50} (a sampling sketch follows below).
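For illustration only (this is not the repository's implementation), coverage-based stratified sampling can be sketched with scikit-learn; `labels` is a placeholder for whatever field the split is stratified on:

```python
from sklearn.model_selection import train_test_split

def sample_training_subset(samples, labels, coverage, seed=42):
    """Return a stratified subset containing `coverage` of the training data."""
    if coverage >= 1.0:
        return samples
    subset, _ = train_test_split(
        samples,
        train_size=coverage,   # fraction of samples to keep
        stratify=labels,       # preserve label distribution in the subset
        random_state=seed,
    )
    return subset
```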
Results of N-Best-ASR Transformer:
Model | F1 score (%) | Acc. (%) |
---|---|---|
N-Best-ASR BERT | 87.4 | 81.9 |
N-Best-ASR XLM-R | 87.8 | 81.8 |
Results are averaged over 5 runs, each with a different random seed.
If you use our work, please cite:
```bibtex
@inproceedings{ganesan-etal-2021-n,
    title = "N-Best {ASR} Transformer: Enhancing {SLU} Performance using Multiple {ASR} Hypotheses",
    author = "Ganesan, Karthik and
      Bamdev, Pakhi and
      B, Jaivarsan and
      Venugopal, Amresh and
      Tushar, Abhinav",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-short.14",
    doi = "10.18653/v1/2021.acl-short.14",
    pages = "93--98",
}
```