Uses N-API to bind to the Universal Windows Platform speech synthesis API, the native speech synthesizer on Windows 10 and newer.

- Speech is returned as a `Uint8Array`, in WAVE format
- Recognizes voice packages installed via the Windows Speech settings
- Supports SSML (when the `enableSsml` option is set to `true`)
- Provides markers and metadata tracks
- Addon binaries are pre-bundled; no install-time postprocessing is required
- Supports Windows 10 and newer, on both x64 and arm64
```sh
npm install @echogarden/windows-media-tts
```
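Since the native addon only runs on Windows, cross-platform code may want to guard against loading it elsewhere. A minimal sketch (the guard itself is not part of the library):

```ts
// Only load the addon on Windows; the package targets Windows 10 and newer,
// on x64 and arm64.
if (process.platform !== 'win32') {
  throw new Error('@echogarden/windows-media-tts requires Windows 10 or newer')
}

const { getVoiceList, synthesize } = await import('@echogarden/windows-media-tts')
```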
```ts
import { getVoiceList, synthesize } from '@echogarden/windows-media-tts'

const voices = getVoiceList()

console.log(voices)
```
Prints:

```
[
  {
    id: 'HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech_OneCore\\Voices\\Tokens\\MSTTS_V110_enUS_DavidM',
    displayName: 'Microsoft David',
    description: 'Microsoft David - English (United States)',
    language: 'en-US',
    gender: 'male'
  },
  {
    id: 'HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech_OneCore\\Voices\\Tokens\\MSTTS_V110_enUS_ZiraM',
    displayName: 'Microsoft Zira',
    description: 'Microsoft Zira - English (United States)',
    language: 'en-US',
    gender: 'female'
  },

  // ...
]
```
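The returned entries can be used to pick a voice programmatically, for example by language. A small sketch that only relies on the fields shown above:

```ts
import { getVoiceList, synthesize } from '@echogarden/windows-media-tts'

// Pick the first English (United States) voice, falling back to the first voice overall.
const voices = getVoiceList()
const voice = voices.find(voice => voice.language === 'en-US') ?? voices[0]

const { audioData } = synthesize('Hello!', { voiceName: voice.displayName })
```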
To synthesize speech and write it to a WAVE file:

```ts
import { writeFile } from 'node:fs/promises'
import { synthesize } from '@echogarden/windows-media-tts'

const text = `
Hello World! How are you doing today?
`

const { audioData } = synthesize(text, {
  voiceName: 'Microsoft Zira',
  speakingRate: 0.9,
  audioPitch: 1.0,
  enableSsml: false
})

await writeFile('hello-world.wav', audioData)
```
The `voiceName` property will match any of the `id`, `displayName` or `description` properties of the target voice.
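For example, based on the voice list shown earlier, these calls should all select the same voice (the registry id will differ between machines):

```ts
import { synthesize } from '@echogarden/windows-media-tts'

// Match by display name, by description, or by id (the registry token path
// reported by getVoiceList).
synthesize('Hello!', { voiceName: 'Microsoft Zira' })
synthesize('Hello!', { voiceName: 'Microsoft Zira - English (United States)' })
synthesize('Hello!', {
  voiceName: 'HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech_OneCore\\Voices\\Tokens\\MSTTS_V110_enUS_ZiraM'
})
```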
SSML input can be passed when the `enableSsml` option is set to `true`:

```ts
const ssml = `
Hello World! <mark name="a" />How are you <mark name="b" />doing today?
`

const { audioData, markers } = synthesize(ssml, {
  voiceName: 'Microsoft David',
  speakingRate: 0.8,
  audioPitch: 1.0,
  enableSsml: true
})

// ...

console.log(markers)
```
Markers show as:

```
[ { name: 'a', time: 1.6964375 }, { name: 'b', time: 2.2614375 } ]
```
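The markers can be used to drive things like highlighting or cue points. A small sketch, assuming the `time` values are seconds from the start of the audio:

```ts
import { synthesize } from '@echogarden/windows-media-tts'

const { markers } = synthesize('Hello <mark name="a" />World!', {
  voiceName: 'Microsoft David',
  enableSsml: true
})

// Log each mark with the time at which it is reached in the synthesized audio.
for (const marker of markers) {
  console.log(`Reached mark '${marker.name}' at ${marker.time.toFixed(3)}s`)
}
```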
Notes:

- The wrapping SSML `<speak>` tag is added automatically, and includes the language name of the selected voice (or of the default voice, if none was selected). This is crucial for the synthesis engine to accept the input
- The Windows synthesis engine is very strict about the validity of the SSML markup. If the SSML has errors, like tags that aren't properly closed, synthesis will fail without a helpful error message (see the sketch below)
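For illustration, a sketch of richer SSML input, assuming the engine accepts standard SSML elements such as `<break>` and `<prosody>` (note that every tag must be properly closed):

```ts
import { synthesize } from '@echogarden/windows-media-tts'

// The <speak> wrapper is added automatically, so only the inner markup is needed.
const ssml = `
Hello World! <break time="500ms"/>
<prosody rate="slow">How are you doing today?</prosody>
`

const { audioData } = synthesize(ssml, {
  voiceName: 'Microsoft David',
  enableSsml: true
})
```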
The Windows synthesis API reports the start time of each sentence and word in a `TimedMetadataTracks` object. This object is converted to a plain JavaScript object and exposed via the `timedMetadataTracks` property of the result:
```ts
const { audioData, timedMetadataTracks } = synthesize(...)
```
Note: the timing values appear to be correct. However, the cue identifiers seem to be empty, cue durations are 0, and no text offset information is included. In practice, this makes it harder to reliably match the cues to the words and sentences in the original text.
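Since the exact shape of the converted object isn't specified here, it can help to dump it and see which tracks and cue fields are actually populated before relying on any of them. A minimal sketch:

```ts
import { synthesize } from '@echogarden/windows-media-tts'

const { timedMetadataTracks } = synthesize('Hello World!', {
  voiceName: 'Microsoft Zira'
})

// Print the full converted structure, including nested tracks and cues.
console.dir(timedMetadataTracks, { depth: null })
```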
The library is bundled with pre-built addons, so recompilation shouldn't be needed.

If you still want to compile them yourself, for a modification or a fork:

- Install the Visual Studio 2022 build tools
- In the `addons` directory, run `npm install`, which installs the necessary build tools. Then run `npm run build-x64`
- To cross-compile for arm64, go to `Visual Studio Installer -> Individual components` and ensure `MSVC v143 - VS 2022 C++ ARM64 build tools (latest)` is checked. Then run `npm run build-arm64`
- The resulting binaries are written to the `addons/bin` directory
License: MIT