Added serverless API #282

Open · wants to merge 5 commits into master
4 changes: 4 additions & 0 deletions README.md
@@ -73,6 +73,10 @@ It is possible to make minor changes to the generated speech.

In addition, it is required to use `--rate=-50%` instead of `--rate -50%` (note the equal sign joining the option and its value); otherwise the `-50%` would be interpreted as just another argument.
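This is standard behavior for Python's `argparse`, which edge-tts appears to use for its CLI. A minimal sketch reproducing it (the parser below is illustrative, not edge-tts's actual one):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--rate")

# With the equal sign, argparse binds the value to the flag:
ns = parser.parse_args(["--rate=-50%"])
print(ns.rate)  # -50%

# Without it, "-50%" looks like another option string, so parsing fails:
try:
    parser.parse_args(["--rate", "-50%"])
except SystemExit:
    print("parse failed: -50% was treated as another flag")
```

Values such as `-50%` do not match argparse's negative-number pattern, so a bare `-50%` token is always read as a new flag rather than as the option's value.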

### Deploying to Serverless API

See the [examples/serverless-api](examples/serverless-api) folder for more information on how to deploy to [Cerebrium](https://www.cerebrium.ai).

### Note on the `edge-playback` command

`edge-playback` is just a wrapper around `edge-tts` that plays back the generated speech. It takes the same arguments as the `edge-tts` command.
33 changes: 33 additions & 0 deletions examples/serverless-api/README.md
@@ -0,0 +1,33 @@
# Serverless Edge TTS API

This project demonstrates how to run Edge TTS as a serverless API using [Cerebrium](https://www.cerebrium.ai).

## Overview

The `main.py` file contains a function `run` that takes a text input and an optional voice parameter to generate audio and subtitles using Edge TTS. This example specifically streams the output.

## Installation

1. `pip install cerebrium`
2. `cerebrium login`
3. Make sure you are in the `examples/serverless-api` folder and run `cerebrium deploy`

## Usage

Once deployed, you should be able to make a curl request similar to the one below. You can find this URL on your Cerebrium dashboard.
```bash
curl --location 'https://api.cortex.cerebrium.ai/v4/p-xxxxxx/serverless-api/run' \
--header 'Authorization: Bearer <AUTH_TOKEN>' \
--header 'Content-Type: application/json' \
--data '{"text": "Tell me something"}'
```

The `run` function takes two parameters:

- `text` (str): The text to be converted to speech
- `voice` (str, optional): The voice to use for TTS (default: "en-GB-SoniaNeural")

It returns a dictionary containing:

- `audio_data`: The generated audio as a base64-encoded string
- `subtitles`: The generated subtitles in WebVTT format
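Given the response fields above, a client can unpack the payload in a few lines. A minimal sketch (the `save_response` helper and the `output.mp3` path are illustrative, not part of the API):

```python
import base64


def save_response(payload: dict, audio_path: str = "output.mp3") -> str:
    """Write the base64-encoded audio to disk and return the WebVTT subtitles."""
    with open(audio_path, "wb") as f:
        f.write(base64.b64decode(payload["audio_data"]))
    return payload["subtitles"]
```

The returned subtitles string can be saved alongside the audio as a `.vtt` file for players that support WebVTT.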
19 changes: 19 additions & 0 deletions examples/serverless-api/cerebrium.toml
@@ -0,0 +1,19 @@
[cerebrium.deployment]
name = "serverless-api"
python_version = "3.11"
docker_base_image_url = "debian:bookworm-slim"
include = "[./*, main.py, cerebrium.toml]"
exclude = "[.*]"

[cerebrium.hardware]
cpu = 2
memory = 6.0
compute = "CPU"

[cerebrium.scaling]
min_replicas = 0
max_replicas = 5
cooldown = 30

[cerebrium.dependencies.pip]
"edge-tts" = "latest"
49 changes: 49 additions & 0 deletions examples/serverless-api/main.py
@@ -0,0 +1,49 @@

"""
This module provides a serverless API for text-to-speech conversion using Edge TTS.

It includes functionality to generate audio and subtitles from input text,
utilizing the edge_tts library. The main function, `run`, is designed to be
used in a serverless environment, returning an asynchronous generator that
yields audio data and subtitles.

Dependencies:
- edge_tts: For text-to-speech conversion
- typing: For type hinting

Usage:
The main entry point is the `run` function, which takes text input
and an optional voice parameter to generate audio and subtitles.
"""

import base64
from typing import AsyncGenerator, Dict

import edge_tts


async def run(
    text: str, voice: str = "en-GB-SoniaNeural"
) -> AsyncGenerator[Dict[str, str], None]:
    """
    Asynchronously generate audio and subtitles for the given text using the specified voice.

    Args:
        text (str): The text to be converted to speech.
        voice (str): The voice model to use, defaults to "en-GB-SoniaNeural".

    Yields:
        Dict[str, str]: A dictionary containing the base64-encoded audio
        ("audio_data") and the WebVTT subtitles ("subtitles").
    """
    communicate = edge_tts.Communicate(text, voice)
    submaker = edge_tts.SubMaker()
    audio_data = bytearray()

    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            audio_data.extend(chunk["data"])
        elif chunk["type"] == "WordBoundary":
            submaker.create_sub((chunk["offset"], chunk["duration"]), chunk["text"])

    yield {
        # Base64 keeps the binary audio safe for JSON transport, as described
        # in the README.
        "audio_data": base64.b64encode(bytes(audio_data)).decode("ascii"),
        "subtitles": submaker.generate_subs(),
    }
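Outside of Cerebrium, the `run` async generator can be driven locally with `asyncio`. A small sketch of the consumption pattern (the `collect` helper is illustrative; the commented call requires edge-tts and network access to the Edge TTS service):

```python
import asyncio


async def collect(agen):
    """Drain an async generator into a list; run() above yields a single dict."""
    return [item async for item in agen]

# Example usage, assuming run() is importable from main.py:
# result = asyncio.run(collect(run("Hello world")))[0]
# result["audio_data"] holds the audio string described in the README,
# and result["subtitles"] holds the WebVTT text.
```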