Skip to content

luisleao/twilio-gemini-stream

Repository files navigation

Call Gemini: Generative AI Phone Calling

Wouldn't it be neat if you could build an app that allowed you to chat with Google Gemini on the phone?

Twilio gives you a superpower called Media Streams. Media Streams provides a Websocket connection to both sides of a phone call. You can get audio streamed to you, process it, and send audio back.

This app serves as a demo exploring two services:

These service combine to create a voice application that is remarkably better at transcribing, understanding, and speaking than traditional IVR systems.

Features:

  • 🏁 Returns responses with low latency, typically 1 second by utilizing streaming.
  • ❗️ Allows the user to interrupt the Google Gemini assistant and ask a different question.
  • 📔 Maintains chat history with Google Gemini.

Setting up for Development

Prerequisites

If you're hosting the app locally, we also recommend using a tunneling service like ngrok so that Twilio can forward audio to your app.

1. Start Ngrok

Start an ngrok tunnel for port 3000:

ngrok http 3000

Ngrok will give you a unique URL, like abc123.ngrok.io. Copy the URL without http:// or https://. You'll need this URL in the next step.

2. Configure Environment Variables

Copy .env.example to .env and configure the env variables.

3. Install Dependencies with NPM

Install the necessary packages:

npm install

4. Start Your Server in Development Mode

Run the following command:

npm run dev

This will start your app using nodemon so that any changes to your code automatically refreshes and restarts the server.

5. Configure an Incoming Phone Number

Connect a phone number using the Twilio Console.

You can also use the Twilio CLI:

twilio phone-numbers:update +1[your-twilio-number] --voice-url=https://your-server.ngrok.io/incoming

This configuration tells Twilio to send incoming call audio to your app when someone calls your number. The app responds to the incoming call webhook with a Stream TwiML verb that will connect an audio media stream to your websocket server.

Testing with Jest

Repeatedly calling the app can be a time consuming way to test your tool function calls. This project contains example unit tests that can help you test your functions without relying on the Google Gemini to call them.

Simple example tests are available in the /test directory. To run them, simply run npm run test.

Deploy via Fly.io

Fly.io is a hosting service similar to Heroku that simplifies the deployment process. Given Twilio Media Streams are sent and received from us-east-1, it's recommended to choose Fly's Ashburn, VA (IAD) region.

Deploying to Fly.io is not required to try the app, but can be helpful if your home internet speed is variable.

Modify the app name fly.toml to be a unique value (this must be globally unique).

Deploy the app using the Fly.io CLI:

fly launch

fly deploy

Import your secrets from your .env file to your deployed app:

fly secrets import < .env

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published