The python library for real-time communication

jbowles/fastrtc

This branch is 56 commits behind freddyaboulton/fastrtc:main.

FastRTC

The Real-Time Communication Library for Python.

Turn any Python function into a real-time audio and video stream over WebRTC or WebSockets.

Installation

pip install fastrtc

To use built-in pause detection (see ReplyOnPause) and text-to-speech (see Text To Speech), install the vad and tts extras:

pip install "fastrtc[vad,tts]"

Key Features

  • 🗣️ Automatic Voice Detection and Turn Taking - Built in, so you only need to write the logic for responding to the user.
  • 💻 Automatic UI - Use the .ui.launch() method to launch the built-in, WebRTC-enabled Gradio UI.
  • 🔌 Automatic WebRTC Support - Use the .mount(app) method to mount the stream on a FastAPI app and get a WebRTC endpoint for your own frontend!
  • ⚡️ WebSocket Support - Use the .mount(app) method to mount the stream on a FastAPI app and get a WebSocket endpoint for your own frontend!
  • 📞 Automatic Telephone Support - Use the fastphone() method of the stream to launch the application and get a free temporary phone number!
  • 🤖 Completely Customizable Backend - A Stream can be mounted on a FastAPI app, so you can extend it to fit your production application. See the Talk To Claude demo for an example of how to serve a custom JS frontend.

Docs

https://fastrtc.org

Examples

See the Cookbook for examples of how to use the library.

🗣️👀 Gemini Audio Video Chat

Stream BOTH your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation!

Demo | Code

🗣️ Google Gemini Real Time Voice API

Talk to Gemini in real time using Google's voice API.

Demo | Code

🗣️ OpenAI Real Time Voice API

Talk to ChatGPT in real time using OpenAI's voice API.

Demo | Code

🤖 Hello Computer

Say "computer" before asking your question!

Demo | Code

🤖 Llama Code Editor

Create and edit HTML pages with just your voice! Powered by SambaNova Systems.

Demo | Code

🗣️ Talk to Claude

Use the Anthropic and Play.Ht APIs to have an audio conversation with Claude.

Demo | Code

🎵 Whisper Transcription

Have Whisper transcribe your speech in real time!

Demo | Code

📷 Yolov10 Object Detection

Run the Yolov10 model on a user webcam stream in real time!

Demo | Code

🗣️ Kyutai Moshi

Kyutai's Moshi is a novel speech-to-speech model for modeling human conversations.

Demo | Code

🗣️ Hello Llama: Stop Word Detection

A code editor built with Llama 3.3 70b that is triggered by the phrase "Hello Llama". Build a Siri-like coding assistant in 100 lines of code!

Demo | Code

Usage

This is a shortened version of the official usage guide. A Stream can be run in three ways:

  • .ui.launch(): Launch a built-in UI for easily testing and sharing your stream. Built with Gradio.
  • .fastphone(): Get a free temporary phone number to call into your stream. Hugging Face token required.
  • .mount(app): Mount the stream on a FastAPI app. Perfect for integrating with your existing production system.

Quickstart

Echo Audio

from fastrtc import Stream, ReplyOnPause
import numpy as np

def echo(audio: tuple[int, np.ndarray]):
    # The function will be passed the audio until the user pauses
    # Implement any iterator that yields audio
    # See "LLM Voice Chat" for a more complete example
    yield audio

stream = Stream(
    handler=ReplyOnPause(echo),
    modality="audio",
    mode="send-receive",
)
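
Because a handler is an ordinary Python generator, it can be tried out on synthetic audio before it is wrapped in ReplyOnPause. The sketch below is illustrative only: echo_in_chunks and its chunk_samples parameter are hypothetical names, and it assumes the incoming array can be viewed with shape (1, num_samples), matching the arrays yielded in the "LLM Voice Chat" example.

```python
import numpy as np


def echo_in_chunks(audio, chunk_samples=4800):
    """Yield the received audio back in fixed-size pieces (hypothetical handler)."""
    sample_rate, samples = audio
    # Assumption: a single-channel signal that can be viewed as (1, num_samples).
    samples = samples.reshape(1, -1)
    for start in range(0, samples.shape[1], chunk_samples):
        yield sample_rate, samples[:, start:start + chunk_samples]


# Exercise the handler with ten fake samples instead of a live microphone.
fake_audio = (24000, np.arange(10, dtype=np.int16).reshape(1, -1))
chunks = list(echo_in_chunks(fake_audio, chunk_samples=4))
```

Under the same assumptions, the function could then be passed to the stream as ReplyOnPause(echo_in_chunks).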

LLM Voice Chat

from fastrtc import (
    ReplyOnPause, AdditionalOutputs, Stream,
    audio_to_bytes, aggregate_bytes_to_16bit
)
import gradio as gr
import numpy as np
from groq import Groq
import anthropic
from elevenlabs import ElevenLabs

groq_client = Groq()
claude_client = anthropic.Anthropic()
tts_client = ElevenLabs()


# See "Talk to Claude" in Cookbook for an example of how to keep 
# track of the chat history.
def response(
    audio: tuple[int, np.ndarray],
):
    prompt = groq_client.audio.transcriptions.create(
        file=("audio-file.mp3", audio_to_bytes(audio)),
        model="whisper-large-v3-turbo",
        response_format="verbose_json",
    ).text
    # Named "message" so the local variable does not shadow the response() function
    message = claude_client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    response_text = " ".join(
        block.text
        for block in message.content
        if getattr(block, "type", None) == "text"
    )
    iterator = tts_client.text_to_speech.convert_as_stream(
        text=response_text,
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_multilingual_v2",
        output_format="pcm_24000",
    )
    for chunk in aggregate_bytes_to_16bit(iterator):
        audio_array = np.frombuffer(chunk, dtype=np.int16).reshape(1, -1)
        yield (24000, audio_array)

stream = Stream(
    modality="audio",
    mode="send-receive",
    handler=ReplyOnPause(response),
)
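
The final loop in the example above turns raw PCM bytes into int16 arrays, which only works when every chunk holds a whole number of 2-byte samples. A streaming API may split the byte stream at arbitrary points, which is presumably why the chunks are passed through aggregate_bytes_to_16bit first. The sketch below illustrates that idea in plain Python. align_to_16bit is a hypothetical helper written for this example; it is not fastrtc's implementation.

```python
from typing import Iterable, Iterator


def align_to_16bit(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Re-chunk a byte stream so every yielded chunk has an even length."""
    leftover = b""
    for chunk in chunks:
        buffer = leftover + chunk
        # Keep only whole 2-byte samples; carry any odd byte to the next chunk.
        usable = len(buffer) - (len(buffer) % 2)
        if usable:
            yield buffer[:usable]
        leftover = buffer[usable:]
    # A trailing odd byte cannot form a sample, so it is dropped.


# Three unevenly split chunks become two chunks of one int16 sample each.
aligned = list(align_to_16bit([b"\x01", b"\x02\x03", b"\x04"]))
```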

Webcam Stream

from fastrtc import Stream
import numpy as np


def flip_vertically(image):
    return np.flip(image, axis=0)


stream = Stream(
    handler=flip_vertically,
    modality="video",
    mode="send-receive",
)
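
A video handler like the one above is a plain function from one frame to another, so it can be checked on a small array without a webcam. The sketch below assumes each frame arrives as a NumPy array laid out as (height, width, channels); under that assumption, axis=0 indexes the rows, so flipping along it turns the image upside down.

```python
import numpy as np


def flip_vertically(image):
    return np.flip(image, axis=0)


# A tiny fake frame: 2 rows, 3 columns, 3 color channels (assumed layout).
frame = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
flipped = flip_vertically(frame)

# The top row of the output is the bottom row of the input.
top_row_matches = np.array_equal(flipped[0], frame[1])
```

With the same layout, axis=1 would mirror the frame left to right instead.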

Object Detection

from fastrtc import Stream
import gradio as gr
import cv2
from huggingface_hub import hf_hub_download

# YOLOv10 is not part of fastrtc. For the implementation, run:
# git clone https://huggingface.co/spaces/fastrtc/object-detection
# The relative import requires this file to sit in the same package as inference.py.
from .inference import YOLOv10

model_file = hf_hub_download(
    repo_id="onnx-community/yolov10n", filename="onnx/model.onnx"
)
model = YOLOv10(model_file)

def detection(image, conf_threshold=0.3):
    image = cv2.resize(image, (model.input_width, model.input_height))
    new_image = model.detect_objects(image, conf_threshold)
    return cv2.resize(new_image, (500, 500))

stream = Stream(
    handler=detection,
    modality="video", 
    mode="send-receive",
    additional_inputs=[
        gr.Slider(minimum=0, maximum=1, step=0.01, value=0.3)
    ]
)

Running the Stream

Run the stream using one of the following methods:

Gradio

stream.ui.launch()

Telephone (Audio Only)

```py
stream.fastphone()
```

FastAPI

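from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()
stream.mount(app)

# Optional: Add routes
@app.get("/")
async def _():
    # Read the page inside a context manager so the file handle is closed
    with open("index.html") as f:
        return HTMLResponse(content=f.read())

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000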
