feat: Pure Python audio chat app with Multimodal Live API #1551

Merged Jan 8, 2025 (25 commits)

Conversation

freddyaboulton
Contributor

Description

The multimodal-live-app requires knowledge of HTML and JavaScript.

Adding a pure-Python web app built with Gradio that supports audio input/output streaming will help developers get started building with the Multimodal Live API without leaving their preferred language.

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
    • Your account is listed in CODEOWNERS for the file(s).
  • Make your Pull Request title follow the https://www.conventionalcommits.org/ specification.
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

@freddyaboulton freddyaboulton requested a review from a team as a code owner December 17, 2024 20:47
Contributor

@code-review-assist code-review-assist bot left a comment

Hi @freddyaboulton and reviewers,

Gemini is currently reviewing this pull request; a full review should follow in a few minutes. In the meantime, here's a quick summary to help everyone get up to speed:

This PR adds a pure-Python web application built with Gradio that enables audio input/output streaming for the multimodal API. The goal is to provide a more accessible entry point for developers who prefer Python, eliminating the need for HTML and JavaScript knowledge to get started with the multimodal API. The main changes involve creating a new app.py file (gemini/gradio-voice/app.py) which contains a complete Gradio application for voice interaction with the Gemini API. The requirements.txt file (gemini/gradio-voice/requirements.txt) has also been updated to include the necessary libraries: gradio_webrtc, librosa, and python-dotenv.

The app.py file implements a WebSocket connection to the Gemini API, handles audio encoding and decoding, and manages the real-time audio streaming between the user and the Gemini model. The Gradio interface provides a simple user experience for interacting with the voice chat functionality. The API key is entered through a password-type (masked) textbox.
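As a rough illustration of the "audio encoding and decoding" step mentioned above, here is a minimal sketch that packs 16-bit PCM samples into a base64 payload inside a JSON text message and unpacks them again. The function names and the message layout (`realtime_input`, `media_chunks`) are assumptions made for this example; the actual app.py in the PR may structure this differently.

```python
import base64
import json
import struct


def encode_audio_chunk(samples: list[int], sample_rate: int = 16000) -> str:
    """Pack signed 16-bit PCM samples into a JSON text message."""
    # Little-endian signed 16-bit integers, one per sample.
    pcm_bytes = struct.pack(f"<{len(samples)}h", *samples)
    message = {
        # Hypothetical message layout, used here for illustration only.
        "realtime_input": {
            "media_chunks": [
                {
                    "mime_type": f"audio/pcm;rate={sample_rate}",
                    "data": base64.b64encode(pcm_bytes).decode("ascii"),
                }
            ]
        }
    }
    return json.dumps(message)


def decode_audio_chunk(message_text: str) -> list[int]:
    """Reverse encode_audio_chunk: JSON text back to PCM samples."""
    message = json.loads(message_text)
    chunk = message["realtime_input"]["media_chunks"][0]
    pcm_bytes = base64.b64decode(chunk["data"])
    count = len(pcm_bytes) // 2  # two bytes per 16-bit sample
    return list(struct.unpack(f"<{count}h", pcm_bytes[: count * 2]))
```

Base64 is used because raw audio bytes cannot be placed directly in a JSON string; the receiving side decodes the payload before playback.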

If there's anything unclear or missing from this summary, please let me know. I'll provide a more detailed review shortly.

And now, a little haiku to lighten the mood:

Python's gentle grace,
Streams of audio, flowing free,
Gemini speaks back.

@freddyaboulton freddyaboulton changed the title from "Add a" to "Add a pure-python audio chat app to help python developers get started" Dec 17, 2024
Contributor

@code-review-assist code-review-assist bot left a comment

This pull request introduces a Python-based Gradio web application that facilitates real-time voice chat with Gemini. This is a great addition, as it allows developers to interact with Gemini using a familiar language and a user-friendly interface. The code is generally well-structured, but there are some areas where improvements can be made to enhance robustness, error handling, and user experience.

gemini/gradio-voice/app.py Outdated
gemini/gradio-voice/app.py Outdated
gemini/gradio-voice/app.py Outdated
gemini/gradio-voice/app.py Outdated
gemini/gradio-voice/requirements.txt Outdated
Collaborator

@holtskinner holtskinner left a comment

Please also resolve the spelling errors and respond to the Gemini review comments. Deferring to Zack/Katie for further review.

gemini/gradio-voice/app.py Outdated
@holtskinner holtskinner changed the title from "Add a pure-python audio chat app to help python developers get started" to "feat: Pure Python audio chat app with Multimodal Live API" Dec 18, 2024
@freddyaboulton
Contributor Author

Thank you @holtskinner ! I will address the Gemini comments today/tomorrow. I think there is something wrong with the spellcheck as WebRTC is a valid term. How can I configure it to ignore that phrase?

Contributor

Hi @freddyaboulton, regarding your question about the spellcheck flagging WebRTC as an error: that's a valid point. Many spellcheckers don't recognize technical terms. The best way to handle this depends on your spellchecking tool. If it's a built-in IDE feature, look for options to add custom dictionaries or ignore specific words/phrases. If it's a separate tool like aspell or hunspell, you can usually add WebRTC to a custom dictionary file. For example, with aspell, you might create a personal dictionary file (e.g., .aspell.en.pws) and add WebRTC to it. The exact method will vary based on your setup. Let me know if you need help with the specific tool you are using.
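To make the aspell suggestion above concrete, here is a minimal sketch that adds a word to an aspell personal word list. It assumes the usual aspell personal dictionary layout (a `personal_ws-1.1 <lang> <count>` header followed by one word per line); the helper name `add_to_aspell_wordlist` is made up for this example, and the spellchecker actually used by this repository's CI may be configured in a different way.

```python
from pathlib import Path


def add_to_aspell_wordlist(word: str, path: Path, lang: str = "en") -> list[str]:
    """Add `word` to an aspell personal word list, creating the file if needed."""
    words: list[str] = []
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        # The first line is the header; the rest are one word per line.
        words = [line.strip() for line in lines[1:] if line.strip()]
    if word not in words:
        words.append(word)
    # Rewrite the file with an updated word count in the header.
    header = f"personal_ws-1.1 {lang} {len(words)}"
    path.write_text("\n".join([header, *words]) + "\n", encoding="utf-8")
    return words
```

For example, `add_to_aspell_wordlist("WebRTC", Path.home() / ".aspell.en.pws")` would append the term once and leave the file unchanged on repeated calls, apart from rewriting the header.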

@freddyaboulton
Contributor Author

Hi @holtskinner @katiemn @ZackAkil - I switched the demo to use start_stream! However, I ran into a couple of issues I had to solve manually in my locally installed version of the genai package. See here: googleapis/python-genai#35
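The thread does not show how the demo calls start_stream. Assuming it consumes an async iterator of raw audio bytes (an assumption about the SDK, not something confirmed in this conversation), the input side could be sketched as an async generator that drains a queue of microphone chunks. The names `audio_chunks` and `collect` are hypothetical, and `collect` is only a stand-in consumer so the sketch runs without the genai package.

```python
import asyncio
from collections.abc import AsyncIterator


async def audio_chunks(queue: "asyncio.Queue[bytes | None]") -> AsyncIterator[bytes]:
    """Yield audio chunks from a queue until a None sentinel arrives."""
    while True:
        chunk = await queue.get()
        if chunk is None:  # sentinel: the caller has stopped recording
            return
        yield chunk


async def collect(chunks: list[bytes]) -> list[bytes]:
    """Stand-in consumer: fills the queue, then drains the generator."""
    queue: asyncio.Queue = asyncio.Queue()
    for chunk in chunks:
        queue.put_nowait(chunk)
    queue.put_nowait(None)
    return [chunk async for chunk in audio_chunks(queue)]
```

In a real app the queue would be filled by the audio callback while a streaming call iterates over `audio_chunks(queue)`; the sentinel gives the producer a clean way to end the stream.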

gemini/gradio-voice/app.py Outdated
@holtskinner
Collaborator

@freddyaboulton Thanks for making the adjustments! Can you please resolve the linter errors? https://github.com/GoogleCloudPlatform/generative-ai/actions/runs/12584981682/job/35075770729?pr=1551

@holtskinner holtskinner assigned freddyaboulton and unassigned ZackAkil and katiemn Jan 2, 2025
gemini/gradio-voice/app.py Outdated
gemini/gradio-voice/app.py Outdated
gemini/gradio-voice/app.py Outdated
@freddyaboulton
Contributor Author

Hi @holtskinner ! Thanks for the patience over the holidays. I've updated the demo based on your suggestions and fixed the lint!

@freddyaboulton
Contributor Author

Hi @holtskinner ! Thanks for adding the dropdowns for region and voice! I moved the directory and added a README as well as addressing the other comments.

@holtskinner holtskinner merged commit d5266c2 into GoogleCloudPlatform:main Jan 8, 2025
4 of 5 checks passed
4 participants