Added serverless API
milo157 committed Oct 22, 2024
1 parent 3a21044 commit 2018772
Showing 4 changed files with 77 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -73,6 +73,10 @@ It is possible to make minor changes to the generated speech.

In addition, you must write `--rate=-50%` rather than `--rate -50%` (which lacks the equals sign); otherwise the `-50%` is interpreted as a separate argument.
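
The effect of the equals sign can be reproduced with a small `argparse` sketch. This assumes the command line is parsed with `argparse`-style rules; `parse_rate` and the `demo` program name are hypothetical and not part of `edge-tts`.

```python
import argparse


def parse_rate(argv):
    """Return the parsed --rate value, or None if argparse rejects argv."""
    parser = argparse.ArgumentParser(prog="demo")  # hypothetical program name
    parser.add_argument("--rate", default="+0%")
    try:
        return parser.parse_args(argv).rate
    except SystemExit:
        # argparse exits when --rate is not followed by a usable value.
        return None


# With the equals sign, the value is bound directly to --rate.
print(parse_rate(["--rate=-50%"]))
# Without it, "-50%" looks like another option, so --rate has no value.
print(parse_rate(["--rate", "-50%"]))
```

The second call fails because a token that starts with `-` and is not a plain negative number is treated as an option name rather than as a value.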

### Deploying to Serverless API

See the [examples/serverless-api](examples/serverless-api) folder for more information on how to deploy to [Cerebrium](https://www.cerebrium.ai).

### Note on the `edge-playback` command

`edge-playback` is just a wrapper around `edge-tts` that plays back the generated speech. It takes the same arguments as the `edge-tts` command.
33 changes: 33 additions & 0 deletions examples/serverless-api/README.md
@@ -0,0 +1,33 @@
# Serverless Edge TTS API

This project demonstrates how to run Edge TTS as a serverless API using [Cerebrium](https://www.cerebrium.ai).

## Overview

The `main.py` file contains a function `run` that takes a text input and an optional voice parameter to generate audio and subtitles using Edge TTS.

## Installation

1. Run `pip install cerebrium`.
2. Run `cerebrium login`.
3. Make sure you are in the `serverless-api` folder and run `cerebrium deploy`.

## Usage

Once deployed, you should be able to make a curl request similar to:
```
curl --location 'https://api.cortex.cerebrium.ai/v4/p-xxxxxx/serverless-api/run' \
--header 'Authorization: Bearer <AUTH_TOKEN>' \
--header 'Content-Type: application/json' \
--data '{"text": "Tell me something"}'
```
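
The same request can be built from Python. The following is a minimal sketch using only the standard library; the endpoint and token are the placeholders from the curl command above, and `build_request` and `send` are hypothetical helper names rather than part of Edge TTS or Cerebrium.

```python
import json
import urllib.request

# Placeholder values copied from the curl example; replace with your own.
ENDPOINT = "https://api.cortex.cerebrium.ai/v4/p-xxxxxx/serverless-api/run"
AUTH_TOKEN = "<AUTH_TOKEN>"


def build_request(text, voice=None, endpoint=ENDPOINT, token=AUTH_TOKEN):
    """Build a POST request mirroring the curl command (hypothetical helper)."""
    payload = {"text": text}
    if voice is not None:
        payload["voice"] = voice  # optional; omitted to use the default voice
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_request("Tell me something"))` needs a real deployment and a valid token, so the sketch keeps building the request separate from sending it.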

The `run` function takes two parameters:

- `text` (str): The text to be converted to speech
- `voice` (str, optional): The voice to use for TTS (default: "en-GB-SoniaNeural")

It returns a dictionary containing:

- `audio_data`: The generated audio as a base64-encoded string
- `subtitles`: The generated subtitles in WebVTT format
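
A client can turn that dictionary back into files. This is a minimal sketch that assumes `audio_data` is base64-encoded as described above; `save_outputs` and the default file names are hypothetical. How the platform wraps the dictionary in the HTTP response is not shown here, so the sketch starts from the dictionary itself.

```python
import base64


def save_outputs(result, audio_path="speech.mp3", subtitles_path="speech.vtt"):
    """Decode `audio_data` and write audio and subtitles to disk.

    Returns the number of audio bytes written.
    """
    audio_bytes = base64.b64decode(result["audio_data"])
    with open(audio_path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    with open(subtitles_path, "w", encoding="utf-8") as subtitles_file:
        subtitles_file.write(result["subtitles"])
    return len(audio_bytes)
```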
19 changes: 19 additions & 0 deletions examples/serverless-api/cerebrium.toml
@@ -0,0 +1,19 @@
[cerebrium.deployment]
name = "serverless-api"
python_version = "3.11"
docker_base_image_url = "debian:bookworm-slim"
include = "[./*, main.py, cerebrium.toml]"
exclude = "[.*]"

[cerebrium.hardware]
cpu = 2
memory = 12.0
compute = "CPU"

[cerebrium.scaling]
min_replicas = 0
max_replicas = 5
cooldown = 30

[cerebrium.dependencies.pip]
"edge-tts" = "latest"
21 changes: 21 additions & 0 deletions examples/serverless-api/main.py
@@ -0,0 +1,21 @@

import base64

import edge_tts

async def run(text: str, voice: str = "en-GB-SoniaNeural"):
    # Stream the synthesized speech, collecting audio bytes and word timings.
    communicate = edge_tts.Communicate(text, voice)
    submaker = edge_tts.SubMaker()
    audio_data = bytearray()

    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            audio_data.extend(chunk["data"])
        elif chunk["type"] == "WordBoundary":
            submaker.create_sub((chunk["offset"], chunk["duration"]), chunk["text"])
    subtitles = submaker.generate_subs()
    return {
        "audio_data": base64.b64encode(audio_data).decode("ascii"),  # JSON-safe, as the README documents
        "subtitles": subtitles,
    }
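
Because `run` is a coroutine, calling it from ordinary synchronous code requires an event loop. The sketch below shows one way to do that with `asyncio.run`; `fake_run` is a hypothetical stand-in with the same signature and return keys so the example works without `edge-tts` or network access, and `call_handler` is likewise a hypothetical helper.

```python
import asyncio


async def fake_run(text: str, voice: str = "en-GB-SoniaNeural"):
    # Hypothetical stand-in for `run`: same signature and keys, no network.
    return {"audio_data": "", "subtitles": f"{voice}: {text}"}


def call_handler(handler, **params):
    """Drive an async handler to completion from synchronous code."""
    return asyncio.run(handler(**params))


result = call_handler(fake_run, text="Tell me something")
print(sorted(result))
```

To exercise the real function, pass `run` imported from `main.py` instead of `fake_run`; that requires `edge-tts` to be installed and network access.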
