SLT.KIT

This repository contains a toolkit for speech translation. It provides a Docker container with a ready to use pipeline containing the following components:

  • a neural speech recognition system
  • a sentence segmentation system
  • an attention-based translation system

The speech recognition system processes the audio files and produces a transcription in the source language. Afterwards, the sentence segmentation system adds punctuation and recases the output. Finally, the machine translation system translates the result. We provide pipelines to train these models, as well as pre-trained models for all components, for the task of translating English lectures into German.
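
The data flow through the three stages can be pictured as a simple chain of filters. The sketch below is only an illustration: asr_stub, segment_stub and translate_stub are hypothetical placeholders, not commands shipped with SLT.KIT, and the transcript is an invented example sentence. The stubs do no real recognition or translation; they only mimic the shape of each stage's input and output.

```shell
# Placeholder stages: none of these are SLT.KIT commands.
# They only mimic how text moves from one component to the next.
sl=en
tl=de

asr_stub() {
    # Stand-in for speech recognition: emits lowercase, unpunctuated text.
    printf '%s\n' "this is a lecture about speech translation"
}

segment_stub() {
    # Stand-in for segmentation: recase the first letter and add a full stop.
    awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) "." }'
}

translate_stub() {
    # Stand-in for MT: only tags each line with the language pair.
    sed "s/^/[$sl->$tl] /"
}

# ASR output feeds segmentation, whose output feeds translation.
asr_stub | segment_stub | translate_stub
```

In the real toolkit each stage is a trained model, but the order of the stages is the same as in this chain.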

The system uses the following software:

Requirements:

Installation

    git clone https://github.com/isl-mt/SLT.KIT.git
    cd SLT.KIT
    docker build --build-arg CUDA=$CUDAVERSION -t slt.kit -f Dockerfile.ST-Baseline .
    # CUDAVERSION must be set to 8.0, 9.0, or 9.1
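
Since only three CUDA versions are listed, it can help to check the value before starting a long build. The following is a minimal sketch; the helper name is_supported_cuda and the default of 9.0 are assumptions made for illustration, and the script only prints the build command instead of running it.

```shell
# Hypothetical helper: accept only the CUDA versions listed in this README.
is_supported_cuda() {
    case "$1" in
        8.0|9.0|9.1) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed default for this sketch; set CUDAVERSION yourself in practice.
CUDAVERSION="${CUDAVERSION:-9.0}"

if is_supported_cuda "$CUDAVERSION"; then
    # Dry run: print the build command rather than executing it.
    echo "docker build --build-arg CUDA=$CUDAVERSION -t slt.kit -f Dockerfile.ST-Baseline ."
else
    echo "unsupported CUDA version: $CUDAVERSION" >&2
fi
```

Dropping the echo and the surrounding quotes turns the dry run into the actual build.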

Run

  • Starting the docker container (e.g. source language English (en) and target language German (de))
    docker run -ti --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$gpuid slt.kit
    export sl=en
    export tl=de
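
As an alternative to exporting sl and tl by hand inside the container, the variables can be passed at start-up with additional -e options of docker run. The sketch below only prints such a command; the function name slt_run_cmd is hypothetical, and it assumes that setting the variables this way is equivalent to the exports above for the scripts in the image.

```shell
# Hypothetical helper: print (not execute) a docker run command that
# sets the GPU id and the language pair when the container starts.
slt_run_cmd() {
    gpu="$1"   # value for NVIDIA_VISIBLE_DEVICES
    src="$2"   # source language, e.g. en
    tgt="$3"   # target language, e.g. de
    printf 'docker run -ti --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=%s -e sl=%s -e tl=%s slt.kit\n' \
        "$gpu" "$src" "$tgt"
}

slt_run_cmd 0 en de
```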

File Structure

  • The general file structure used by all models and systems is described in File structure

System

  • This repository contains different systems that can be used to do speech translation
    • Cascaded systems: Systems that combine an ASR, a sentence segmentation/punctuation, and an MT component

      • ctc-tedlium2.smallTED: Combination of the ctc-tedlium2 ASR system and the smallTED system for sentence segmentation and MT
      • ctc-tedlium2.midSize: Combination of the ctc-tedlium2 ASR system and the midSize system for sentence segmentation and MT
    • ASR systems: Systems to transcribe the audio

      • ctc-tedlium2: Simple LSTM network trained with the CTC loss that outputs BPE units
      • las-tedlium2: Attention-based ASR system
    • Sentence segmentation/MT

      • ted: System trained on the TED corpus
      • midSize: System trained on the TED and EPPS corpora
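
The cascaded system names listed above appear to follow the pattern ASR name, dot, segmentation/MT name. This is an inference from the two names given here, not a documented rule. Under that assumption, a name can be split into its parts with plain shell parameter expansion; split_system is a hypothetical helper.

```shell
# Hypothetical helper: split a cascaded system name of the assumed
# form <asr>.<mt> into its two components.
split_system() {
    system="$1"
    asr="${system%%.*}"   # text before the first dot
    mt="${system#*.}"     # text after the first dot
    printf 'asr=%s mt=%s\n' "$asr" "$mt"
}

split_system ctc-tedlium2.smallTED
split_system ctc-tedlium2.midSize
```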

Test sets

  • English to German
    • dev2010
    • tst2010
    • tst2013
    • tst2014
    • tst2015

Results

The results reported here were generated by combining the output of the three ASR systems (CTC 300, CTC 10k, and the attention-based ASR system) with ROVER and translating the combined transcript with the MT system trained on the TED corpus.

English to German

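| SET | BLEU | TER | BEER | CharacTER | BLEU(ci) | TER(ci) |
|---------|-------|-------|-------|-----------|----------|---------|
| dev2010 | 13.98 | 71.78 | 45.88 | 78.50 | 15.05 | 69.68 |
| tst2010 | 14.08 | 71.66 | 44.40 | 77.66 | 15.12 | 69.36 |
| tst2013 | 13.73 | 72.81 | 44.02 | 71.45 | 14.61 | 70.78 |
| tst2014 | 13.28 | 74.34 | 42.43 | 78.38 | 14.01 | 72.62 |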

Furthermore, results for the MT system can be found here.
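
The ROVER combination of ASR outputs mentioned in the Results section aligns the hypotheses of several recognizers and then votes on each word. The toy sketch below shows only the voting step and assumes the hypotheses are already aligned word by word; it is not the ROVER tool itself, the function name vote is hypothetical, and the three input sentences are invented for illustration.

```shell
# Toy majority vote over pre-aligned hypotheses, one hypothesis per line.
# Real ROVER first aligns the hypotheses; that step is omitted here.
vote() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            # Remember each distinct word per position in first-seen order.
            if (!((i, $i) in count)) words[i, ++n[i]] = $i
            count[i, $i]++
        }
        if (NF > max) max = NF
    }
    END {
        for (i = 1; i <= max; i++) {
            best = ""; bestc = 0
            for (j = 1; j <= n[i]; j++) {
                w = words[i, j]
                # Strict comparison: on a tie the first-seen word wins.
                if (count[i, w] > bestc) { bestc = count[i, w]; best = w }
            }
            printf "%s%s", (i > 1 ? " " : ""), best
        }
        printf "\n"
    }'
}

# Three invented hypotheses standing in for three ASR systems.
printf '%s\n' "the cat sat" "the cat sad" "a cat sat" | vote
```

Each position is decided independently, so an error made by one system is outvoted when the other two agree.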