Skip to content

Latest commit

 

History

History
86 lines (56 loc) · 2.9 KB

README.md

File metadata and controls

86 lines (56 loc) · 2.9 KB

VoiceKit NodeJS examples

You will need nodejs and npm to run this examples. To work correctly all scripts should be run from this directory (voicekit-examples/nodejs).

Usage

Install the requirements

$ npm install

Basic recognition examples

Run basic (non-streaming) speech recognition example:

$ node recognize.js -e MPEG_AUDIO -r 16000 -c 1 ../audio/sample_1.mp3

To disable automatic punctuation and get up to 3 recognition alternatives:

$ node recognize.js -e MPEG_AUDIO -r 16000 -c 1 --no-automatic-punctuation --max-alternatives 3 ../audio/sample_1.mp3

Run streaming speech recognition with interim results and disabled voice activity detection (VAD):

$ node recognize_stream.js -e MPEG_AUDIO -r 16000 -c 1 --interim-results --no-perform-vad ../audio/sample_1.mp3

Specify longer silence timeout for voice activity detection (you will probably need longer audio to actually see the difference):

$ node recognize_stream.js -e MPEG_AUDIO -r 16000 -c 1 --interim-results --silence-duration-threshold 1.2 ../audio/sample_1.mp3

Return just the first recognized utterance and halt (you will probably need longer audio to actually see the difference):

$ node recognize_stream.js -e MPEG_AUDIO -r 16000 -c 1 --interim-results --single-utterance ../audio/sample_1.mp3

Basic synthesis examples

To run basic speech synthesis and save result to wav:

$ node synthesize_stream.js -r 48000 -e LINEAR16 "И мысли тоже тяжелые и медлительные, падают неторопливо и редко одна за другой, точно песчинки в разленившихся песочных часах." output_1.wav

You can also specify alternative stress with 0 sign after a vowel:

$ node synthesize_stream.js -r 48000 -e LINEAR16 "За0мок - замо0к." output_2.wav

Feel free to use arabic numerals and named entities:

$ node synthesize_stream.js -r 48000 -e LINEAR16 "Газета Times, 03 января 2009 года - Канцлер на грани ради второго спасения банков." output_3.wav

For now, LINEAR16 does not support samples rates other than 48kHz. Use RAW_OPUS (@discordjs/opus required) to specify different sample rates:

$ node synthesize_stream.js -r 16000 -e RAW_OPUS "Привет, мир." output_4.wav

SSML

It is possible to use SSML:

$ node synthesize_stream.js -r 48000 -e LINEAR16 --ssml "<speak><p><s>Оригинальная мысль?</s><s>Нет ничего легче.</s></p><break time='300ms'/><p><s>Библиотеки просто набиты ими.</s></p></speak>" output_5.wav

Voice Selection

You can specify voice name:

$ node synthesize_stream.js -r 48000 -e LINEAR16 --voice alyona "Привет! Меня зовут Алёна." output_6.wav