This is a proof of concept for using OpenAI's Realtime API to chain tools, call o1-preview and o1-mini, structure output responses, and glimpse the future of AI-assistant-powered engineering.
See the video where we use and discuss this POC.
This codebase is a v0 proof of concept. It's buggy, but it contains the core ideas for realtime personal AI assistants and AI agents.
- Install uv, the hyper modern Python package manager.
- Set up the environment

  ```sh
  cp .env.sample .env
  ```

  Then add your `OPENAI_API_KEY` to `.env`.

- Update `personalization.json` to fit your setup

- Install dependencies

  ```sh
  uv sync
  ```

- Run the realtime assistant

  ```sh
  uv run main
  ```
Here are some voice commands you can try with the assistant:
- "Hey Ada, how are you?"
- "What's the current time?"
- "Generate a random number."
- "Open ChatGPT, Claude, and Hacker News."
- "Create a new CSV file called user analytics with 10 mock rows."
- "Update the user analytics file, add 20 additional mock rows, use a reasoning model."
The `main.py` script serves as the entry point for the application. It sets up the WebSocket connection, handles audio input/output, and manages the interaction between the user and the AI assistant.
The application uses environment variables (loaded from a `.env` file) and a `personalization.json` file to customize the assistant's behavior and store API keys.
Several functions are defined to handle various tasks:

- `get_current_time()`: Returns the current time.
- `get_random_number()`: Generates a random number between 1 and 100.
- `open_browser()`: Opens specified URLs in the browser.
- `create_file()`: Creates a new file with generated content.
- `update_file()`: Updates an existing file's content.
- `delete_file()`: Deletes a specified file.
These functions can be called by the AI assistant to perform actions based on user requests.
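For illustration, here is a minimal sketch of two such tools plus a dispatch table that maps the model's function-call names to local callables. The `FUNCTION_MAP` name is hypothetical; the POC's actual wiring may differ.

```python
import random
from datetime import datetime

def get_current_time() -> dict:
    # Small, JSON-serializable payloads are easy to hand back to the model.
    return {"current_time": datetime.now().strftime("%H:%M:%S")}

def get_random_number() -> dict:
    return {"random_number": random.randint(1, 100)}

# Hypothetical dispatch table: when the Realtime API emits a function-call
# event with a tool name, look up and invoke the matching local function.
FUNCTION_MAP = {
    "get_current_time": get_current_time,
    "get_random_number": get_random_number,
}
```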
The `AsyncMicrophone` class manages asynchronous audio input, allowing for real-time speech capture and processing.
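A rough sketch of what such a class might look like, assuming PyAudio's callback API. The internals here are illustrative, not the POC's exact implementation.

```python
import pyaudio

CHUNK = 1024   # frames per buffer
RATE = 24000   # PCM16 sample rate commonly used with the Realtime API

class AsyncMicrophone:
    """Sketch: capture raw PCM16 audio via a PyAudio callback without blocking."""

    def __init__(self):
        self.p = pyaudio.PyAudio()
        self.buffer = bytearray()
        self.stream = self.p.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK,
            stream_callback=self._callback,  # runs on PyAudio's audio thread
        )

    def _callback(self, in_data, frame_count, time_info, status):
        self.buffer.extend(in_data)  # accumulate audio off the main loop
        return (None, pyaudio.paContinue)

    def drain(self) -> bytes:
        # Hand the accumulated audio to the caller and reset the buffer.
        data = bytes(self.buffer)
        self.buffer.clear()
        return data
```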
The application establishes a WebSocket connection with the OpenAI Realtime API. It handles various events such as `response.created`, `response.done`, and function calls, enabling real-time interaction with the AI assistant.
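A hedged sketch of the connection and event loop, assuming the `websockets` library (the POC may use a different client, and the model name in the URL is illustrative):

```python
import asyncio
import json
import os

import websockets

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def run_session():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: newer releases of `websockets` call this keyword `additional_headers`.
    async with websockets.connect(REALTIME_URL, extra_headers=headers) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.created":
                pass  # the model has started producing a response
            elif event["type"] == "response.done":
                pass  # the response (text, audio, and/or function calls) is complete

asyncio.run(run_session())
```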
Audio data is captured, encoded in base64, and sent over the WebSocket. The `play_audio()` function handles playback of the assistant's audio responses.
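For example, appending microphone audio to the session's input buffer might look like this. The event shape follows the Realtime client-events reference; the helper name is hypothetical.

```python
import base64
import json

async def send_audio_chunk(ws, pcm16_bytes: bytes):
    # The Realtime API expects base64-encoded PCM16 audio inside
    # `input_audio_buffer.append` client events.
    await ws.send(json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_bytes).decode("utf-8"),
    }))
```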
The `timeit_decorator` is used to log execution times of functions. Runtime information is logged to `runtime_time_table.jsonl` for performance analysis.
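A decorator of this shape might look roughly like this (a sketch, not the POC's exact code):

```python
import functools
import json
import time

def timeit_decorator(func):
    # Append one JSON line per call: {"function": ..., "seconds": ...}.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        record = {"function": func.__name__, "seconds": time.perf_counter() - start}
        with open("runtime_time_table.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")
        return result
    return wrapper
```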
The `main()` function initializes the application, sets up the WebSocket connection, and manages the main event loop for user interaction.
The codebase includes various utility functions for tasks such as structured output prompts, chat prompts, and audio encoding/decoding.
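As one example, a structured-output helper might lean on the OpenAI SDK's `parse` method with a pydantic schema. The schema and helper name here are hypothetical; only the `client.beta.chat.completions.parse` call is the SDK's documented API.

```python
from openai import OpenAI
from pydantic import BaseModel

class FileContent(BaseModel):  # hypothetical schema for create_file()/update_file()
    file_name: str
    file_content: str

client = OpenAI()

def structured_output_prompt(prompt: str) -> FileContent:
    # Structured Outputs: the SDK parses the reply into the pydantic model.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",  # any Structured Outputs-capable model
        messages=[{"role": "user", "content": prompt}],
        response_format=FileContent,
    )
    return completion.choices[0].message.parsed
```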
Up for a challenge? Here are some ideas on how to improve the experience:
- Organize code.
- Add interruption handling. The current version prevents it for simplicity.
- Add transcript logging.
- Make `personalization.json` a pydantic type.
- Let tools run in parallel.
- Fix audio randomly cutting out near the end.
- https://openai.com/index/introducing-the-realtime-api/
- https://openai.com/index/introducing-structured-outputs-in-the-api/
- https://platform.openai.com/docs/guides/realtime/events
- https://platform.openai.com/docs/api-reference/realtime-client-events/response-create
- https://platform.openai.com/playground/realtime
- https://github.com/Azure-Samples/aoai-realtime-audio-sdk/blob/main/README.md
- https://docs.astral.sh/uv/