Gemini 2.0 Flash Multimodal Live API Client written in vanilla JS

This project is a web application that demonstrates the capabilities of the Gemini 2.0 Flash Multimodal Live API. It lets users interact with the API in real time using text, audio, video, and screen sharing. It was made possible by Google's original React-based repository.

Inspiration

This repository was inspired by this issue

It aims to demonstrate the Gemini 2.0 Flash Multimodal Live API using a simple web client built with vanilla JavaScript, HTML, and CSS. It was created to address the complexity of Google's original React-based repository. That implementation, while functional, introduced unnecessary dependencies and a build process that obscured how simple it is to interact with the API. This project removes the need for React, TypeScript, SCSS, and a complex build chain, which makes it more accessible and easier to understand for a wider range of developers.

Features

Text Input: Send text messages to the Gemini API and receive responses.
Audio Input: Use your microphone to send audio input to the API.
Audio Output: Hear the API's responses through your speakers.
Video Input: Send video from your webcam to the API. Motion detection is used to avoid sending static frames, which improves performance.
Screen Sharing: Share your screen with the API.
Real-time Interaction: Experience low-latency interactions with the API made possible by optimizations.
Function Calling: Utilize function calling capabilities for enhanced interactions.
Visual Feedback: Audio visualizers provide real-time feedback on audio input and output levels.
Error Handling: Robust error handling and logging for debugging.
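
The motion detection mentioned under Video Input can be approximated by comparing consecutive frames. The sketch below is not taken from this repository: the function name `hasMotion` and both thresholds are illustrative assumptions. It counts how many pixels changed noticeably between two RGBA buffers of the kind found in the `data` property returned by a canvas `getImageData()` call.

```javascript
// Hypothetical helper, not the repository's actual implementation.
// prev and curr are RGBA byte arrays of equal length (4 bytes per pixel).
function hasMotion(prev, curr, pixelThreshold = 30, changedRatio = 0.01) {
  if (!prev || prev.length !== curr.length) {
    return true; // no comparable previous frame: treat as changed
  }
  const pixelCount = curr.length / 4;
  let changed = 0;
  for (let i = 0; i < curr.length; i += 4) {
    // Sum of absolute per-channel differences; alpha is ignored.
    const diff =
      Math.abs(curr[i] - prev[i]) +
      Math.abs(curr[i + 1] - prev[i + 1]) +
      Math.abs(curr[i + 2] - prev[i + 2]);
    if (diff > pixelThreshold) changed++;
  }
  // Report motion when enough of the frame has changed.
  return changed / pixelCount >= changedRatio;
}
```

In a browser, one way to use this would be to draw each video frame onto a small canvas, read its pixels, and only send the frame when `hasMotion` returns true.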

Getting Started

Prerequisites

A modern web browser that supports the Web Audio API, WebRTC, and WebSockets.
A Google AI Studio API key.
Python 3.0 or higher.

(Note: I believe this is the minimum version that supports `python -m http.server`. If you can serve the files another way, you can skip this.)

(A local server is needed because I got a security error when I opened index.html directly in the browser. If you know a better approach, please create a PR for this part.)
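
To check the browser prerequisite quickly, a small feature-detection helper can be run in the developer console. This is only a sketch: `missingFeatures` is a hypothetical name, and the checks cover the Web Audio API, WebSockets, and the media capture calls usually grouped with WebRTC, which may not match exactly what the application needs.

```javascript
// Hypothetical helper: returns the names of required browser features
// that are missing from the given global object (window in a browser).
function missingFeatures(env) {
  const missing = [];
  if (!env.AudioContext && !env.webkitAudioContext) {
    missing.push('Web Audio API');
  }
  if (!env.WebSocket) {
    missing.push('WebSockets');
  }
  const media = env.navigator && env.navigator.mediaDevices;
  if (!media || typeof media.getUserMedia !== 'function') {
    missing.push('getUserMedia (microphone/camera capture)');
  }
  if (!media || typeof media.getDisplayMedia !== 'function') {
    missing.push('getDisplayMedia (screen sharing)');
  }
  return missing;
}
```

Calling `missingFeatures(window)` in the console returns an empty array when all of the checked features are present.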

Installation

Clone this repository to your local machine.
Navigate to the project directory.
Create a config.js file inside the js/config directory.
Add your Google AI Studio API key to config.js following the instructions in js/config/README.md.
Run `python -m http.server` in the project root, then open the served address in your browser. It serves at localhost:8000 by default; to use another port, append it to the command, for example `python -m http.server 3000` for localhost:3000.

Usage

Click the "Connect" button to establish a connection to the Gemini API.
Enter text messages in the input field and click "Send" to interact with the API.
Click the microphone button to start sending audio input.
Click the camera button to start sending video from your webcam.
Click the screen share button to share your screen.
Observe the logs in the logs container for real-time updates and debugging information.
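
Behind the "Connect" and "Send" buttons, a Live API client exchanges JSON messages over a WebSocket. The sketch below shows how such messages could be built. The field names (`setup`, `clientContent`, `turns`, `turnComplete`) are assumed from the Live API message format and should be checked against the official reference; the helper names and the default model string are assumptions and may differ from what this repository uses. The endpoint URL and API key handling are left out on purpose.

```javascript
// Hypothetical helpers; names and defaults are illustrative only.
function buildSetupMessage(model = 'models/gemini-2.0-flash-exp') {
  // The first message on a new connection describes the session.
  return { setup: { model } };
}

function buildTextMessage(text) {
  // One complete user turn containing a single text part.
  return {
    clientContent: {
      turns: [{ role: 'user', parts: [{ text }] }],
      turnComplete: true,
    },
  };
}

// Serializes a text turn and sends it over any object that exposes
// send(string), such as an open browser WebSocket.
function sendText(socket, text) {
  socket.send(JSON.stringify(buildTextMessage(text)));
}
```

With a real connection, the setup message would be sent once from the socket's `onopen` handler, and `sendText` would be called each time the user clicks "Send".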

Project Structure

The project is organized into several directories:

js/: Contains the JavaScript code for the application.
- audio/: Handles audio input, output, and processing.
- config/: Contains the configuration file, including API keys and settings.
- core/: Implements core functionalities like the WebSocket client and worklet registry.
- tools/: Defines tools for function calling with the Gemini API. You can add your own tools here. Read more in js/tools/README.md.
- utils/: Provides utility functions for logging, error handling, and data conversion.
- video/: Manages video input and screen sharing.
css/: Contains the CSS stylesheet for the application.
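
To give a rough idea of what a function-calling tool in tools/ might look like, here is a minimal sketch. The real interface is described in js/tools/README.md and is not reproduced here; the object shape, the `get_current_time` tool, and `dispatchToolCall` are all hypothetical. The `declaration` part is modeled on the name/description/parameters layout of Gemini function declarations.

```javascript
// Hypothetical tool shape; see js/tools/README.md for the real interface.
const getTimeTool = {
  declaration: {
    name: 'get_current_time',
    description: 'Returns the current time as an ISO 8601 string.',
    parameters: {
      type: 'OBJECT',
      properties: {
        label: { type: 'STRING', description: 'Optional label echoed back.' },
      },
    },
  },
  // Runs the tool with the arguments supplied by the model.
  execute(args) {
    return { time: new Date().toISOString(), label: args.label || null };
  },
};

// Looks up a tool by the name in a function call and runs it.
function dispatchToolCall(tools, call) {
  const tool = tools.find((t) => t.declaration.name === call.name);
  if (!tool) {
    return { error: `Unknown tool: ${call.name}` };
  }
  return tool.execute(call.args || {});
}
```

Keeping the declaration next to the code that executes it means a new tool can be added in one place: append another object to the array passed to `dispatchToolCall`.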

Important Notes

Ensure that your API key is kept secure and not exposed publicly (add config.js to .gitignore).
Refer to the js/config/README.md file for detailed instructions on configuring your API key.
The application is designed for modern web browsers and may not be compatible with older browsers.

Contributing

Contributions to this project are welcome. Please feel free to submit pull requests or open issues on the repository.

License

This project is licensed under the MIT License.

ViaAnthroposBenevolentia/gemini-2-live-api-demo
