A simple Python package to easily use Meta's Massively Multilingual Speech (MMS) project.
The official MMS code uses `subprocess` to call another Python script, which is inconvenient and can lead to several issues. This package addresses those problems by wrapping the project in an API that is easy to integrate with other projects.
- You will need `ffmpeg` for audio processing.
- Install `easymms` from PyPI:

  ```shell
  pip install easymms
  ```

  or from source:

  ```shell
  pip install git+https://github.com/abdeladim-s/easymms
  ```
- If you want to use the Alignment model:
  - You will need `perl` to use `uroman`. Check the Perl website for installation instructions on different platforms.
  - You will need a nightly version of `torchaudio`:

    ```shell
    pip install -U --pre torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
    ```

  - You might need `sox` as well. (A quick way to verify these prerequisites is sketched just after this list.)
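Since these prerequisites are easy to miss, here is a minimal sanity check using only the Python standard library. Which tools are strictly required comes from the list above; treat this as a convenience sketch, not an official check shipped with `easymms`:

```python
# Minimal prerequisite check for the Alignment model (standard library only).
# Treat this as a convenience sketch, not an official easymms check.
import shutil
import importlib.util

for tool in ("ffmpeg", "perl", "sox"):
    print(f"{tool}: {'found' if shutil.which(tool) else 'MISSING'}")

if importlib.util.find_spec("torchaudio") is None:
    print("torchaudio: MISSING (install the nightly build, see above)")
else:
    import torchaudio
    print(f"torchaudio: {torchaudio.__version__}")  # expect a nightly/dev build
```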
Note: `fairseq` has not yet included the MMS project in its released PyPI version, so until the next release you will need to install `fairseq` from source:

```shell
pip uninstall fairseq && pip install git+https://github.com/facebookresearch/fairseq
```
Also note that you may run into issues with `fairseq` when running the code in interactive environments like Jupyter notebooks. Please use normal Python files or use the Colab notebook provided above.
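With that in mind, here is a minimal script skeleton for the ASR API shown below. The `__main__` guard is a general safeguard for multiprocessing-based libraries and an assumption here, not a documented `easymms` requirement:

```python
# run_asr.py -- minimal script skeleton for running easymms outside notebooks.
# The __main__ guard is a common multiprocessing safeguard; whether fairseq
# strictly requires it is an assumption.
from easymms.models.asr import ASRModel

def main():
    asr = ASRModel(model='/path/to/mms/model')
    transcriptions = asr.transcribe(['path/to/media_file'], lang='eng', align=False)
    print(transcriptions)

if __name__ == '__main__':
    main()
```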
You will first need to download the model weights; you can find and download all the supported models from here.
```python
from easymms.models.asr import ASRModel

# Load the downloaded MMS checkpoint
asr = ASRModel(model='/path/to/mms/model')
files = ['path/to/media_file_1', 'path/to/media_file_2']
# Transcribe the files in English, without timestamp alignment
transcriptions = asr.transcribe(files, lang='eng', align=False)
for i, transcription in enumerate(transcriptions):
    print(f">>> file {files[i]}")
    print(transcription)
```
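Since `transcribe` accepts a list of files, you can batch a whole directory. A short sketch using the standard library; the `audio/` directory and `.wav` extension are assumptions:

```python
# Sketch: transcribe every .wav file in a directory (paths are assumptions).
from pathlib import Path
from easymms.models.asr import ASRModel

asr = ASRModel(model='/path/to/mms/model')
wav_files = [str(p) for p in sorted(Path('audio').glob('*.wav'))]
for path, text in zip(wav_files, asr.transcribe(wav_files, lang='eng', align=False)):
    print(f"{path}: {text}")
```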
To get segment-level timestamps as well, set `align=True`:

```python
from easymms.models.asr import ASRModel

asr = ASRModel(model='/path/to/mms/model')
files = ['path/to/media_file_1', 'path/to/media_file_2']
# With align=True, each transcription is a list of timestamped segments
transcriptions = asr.transcribe(files, lang='eng', align=True)
for i, transcription in enumerate(transcriptions):
    print(f">>> file {files[i]}")
    for segment in transcription:
        print(f"{segment['start_time']} -> {segment['end_time']}: {segment['text']}")
    print("----")
```
You can also use the alignment model directly to align an existing transcript with an audio file:

```python
from easymms.models.alignment import AlignmentModel

align_model = AlignmentModel()
# Align a list of transcript segments with the audio
transcriptions = align_model.align('path/to/wav_file.wav',
                                   transcript=["segment 1", "segment 2"],
                                   lang='eng')
for transcription in transcriptions:
    for segment in transcription:
        print(f"{segment['start_time']} -> {segment['end_time']}: {segment['text']}")
```
For text-to-speech:

```python
from easymms.models.tts import TTSModel

# Load the English TTS model
tts = TTSModel('eng')
res = tts.synthesize("This is a simple example")
tts.save(res)
```
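To synthesize several sentences, you can reuse one loaded model in a loop. This sketch only uses the `synthesize` and `save` calls shown above; how `save` names its output file (and whether repeated calls overwrite it) is something to check in the API reference:

```python
# Sketch: batch TTS over several sentences, reusing one loaded model.
# Only synthesize() and save() from the example above are used; check the
# API reference for how save() names its output file.
from easymms.models.tts import TTSModel

tts = TTSModel('eng')
sentences = ["This is the first example.", "This is the second example."]
for sentence in sentences:
    res = tts.synthesize(sentence)
    tts.save(res)
```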
Language identification (LID) support is coming soon.
You can check the API reference documentation for more details.
The models are released under the CC-BY-NC 4.0 license, so this project follows the same license.
This project is not endorsed or certified by Meta AI; it simply makes the MMS project easier to use.
All credit goes to the authors and to Meta for open-sourcing the models.
Please check their paper *Scaling Speech Technology to 1,000+ Languages* and their blog post.