Python: draft initial implementation of Realtime API #10127

eavanvalkenburg · 2025-01-08T16:04:34Z

Motivation and Context

Implements the OpenAI Realtime API with Semantic Kernel

Description

Implements a separate Service Client class with its own ExecutionSettings, but still based on ChatCompletionClientBase.
It only supports streaming operations, with additional public methods for sending data to the conversation.
Whether this is the right way forward is still to be decided.

TODO:

lots of comments
tests
cleanup

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

python/samples/concepts/audio/audio_player.py

python/pyproject.toml

python/samples/concepts/audio/audio_recorder_stream.py

...ernel/connectors/ai/open_ai/prompt_execution_settings/open_ai_realtime_execution_settings.py

python/semantic_kernel/connectors/ai/open_ai/services/open_ai_realtime_base.py

python/semantic_kernel/connectors/ai/realtime_client_base.py

python/semantic_kernel/connectors/ai/open_ai/services/open_ai_realtime_base.py

python/semantic_kernel/connectors/ai/open_ai/services/open_ai_realtime_utils.py

python/semantic_kernel/connectors/ai/realtime_client_base.py

python/tests/unit/contents/test_audio_content.py

markwallace-microsoft · 2025-01-09T15:58:44Z

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| semantic_kernel/connectors/ai | | | | |
| chat_completion_client_base.py | 124 | 2 | 98% | 395, 405 |
| function_calling_utils.py | 52 | 10 | 81% | 161–186 |
| realtime_client_base.py | 22 | 4 | 82% | 102, 109–110, 114 |
| semantic_kernel/connectors/ai/open_ai/services | | | | |
| open_ai_realtime.py | 37 | 14 | 62% | 35–37, 84–98, 127, 147 |
| semantic_kernel/connectors/ai/open_ai/services/realtime | | | | |
| open_ai_realtime_base.py | 181 | 130 | 28% | 79–117, 127–163, 170–203, 211–384, 393–397, 403, 412, 416, 420 |
| open_ai_realtime_webrtc.py | 122 | 83 | 32% | 61–63, 66–74, 84–129, 134–141, 144–171, 179–188, 192–211 |
| open_ai_realtime_websocket.py | 58 | 29 | 50% | 46–64, 67–73, 83–86, 91–94 |
| utils.py | 40 | 32 | 20% | 38–44, 54, 69–126 |
| semantic_kernel/contents | | | | |
| audio_content.py | 25 | 2 | 92% | 81, 86 |
| binary_content.py | 115 | 16 | 86% | 81, 120, 138–139, 180–184, 192–198 |
| function_call_content.py | 107 | 3 | 97% | 197, 225–226 |
| semantic_kernel/contents/utils | | | | |
| data_uri.py | 101 | 4 | 96% | 44–45, 68, 133 |
| TOTAL | 17721 | 2182 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 3049 | 4 💤 | 0 ❌ | 0 🔥 | 1m 20s ⏱️ |

python/semantic_kernel/connectors/ai/open_ai/services/realtime/open_ai_realtime_base.py

python/semantic_kernel/connectors/ai/open_ai/services/realtime/open_ai_realtime_websocket.py

python/semantic_kernel/connectors/ai/open_ai/services/realtime/open_ai_realtime_base.py

docs/decisions/00XX-realtime-api-clients.md

moonbox3 · 2025-01-27T04:25:19Z

docs/decisions/00XX-realtime-api-clients.md

+
+# Content and Events
+
+## Considered Options - Content and Events


Should we call out whether the “control” versus “content” distinction is a fundamental part of real-time interaction or just an implementation detail? For example, OpenAI distinguishes control events (`input_audio_buffer.committed`) from content events (`conversation.item.create`), while Google appears to treat everything as part of a unified content stream (`BidiGenerateContent*`).

This distinction might influence our decision in a few ways:

If the distinction is inherent to real-time systems, separating control from content may result in a cleaner, more flexible design.

However, if it’s just a specific quirk of OpenAI’s API, enforcing it could complicate support for providers like Google that don’t make the same distinction.

On the other hand, ignoring OpenAI’s finer-grained controls might limit the ability to fully utilize other features in the future.

I think it would make sense to call this out explicitly in the doc, since it would provide additional context for why we’re choosing one approach over the other.
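
One way to picture the trade-off in this comment is a small sketch that keeps the control/content split as a tag on a shared event type rather than as two separate APIs. Everything here is an assumption made for illustration: `EventCategory`, `RealtimeEvent`, `classify`, `split_events`, and the one-element `OPENAI_CONTROL_EVENTS` set are hypothetical names, and the only event names used are the ones quoted in the comment above.

```python
from dataclasses import dataclass
from enum import Enum


class EventCategory(Enum):
    CONTROL = "control"
    CONTENT = "content"


# Assumption: a hand-picked, incomplete set used only for this illustration.
OPENAI_CONTROL_EVENTS = frozenset({"input_audio_buffer.committed"})


@dataclass(frozen=True)
class RealtimeEvent:
    """Hypothetical provider-neutral event wrapper."""

    provider: str
    name: str
    category: EventCategory


def classify(provider: str, name: str) -> RealtimeEvent:
    """Tag an event as control or content.

    Providers that expose a single unified stream never match the control
    set, so all of their events fall through to CONTENT.
    """
    is_control = provider == "openai" and name in OPENAI_CONTROL_EVENTS
    category = EventCategory.CONTROL if is_control else EventCategory.CONTENT
    return RealtimeEvent(provider, name, category)


def split_events(events: list[RealtimeEvent]) -> tuple[list[RealtimeEvent], list[RealtimeEvent]]:
    """Return (control, content) lists, preserving the original order."""
    control = [e for e in events if e.category is EventCategory.CONTROL]
    content = [e for e in events if e.category is EventCategory.CONTENT]
    return control, content


if __name__ == "__main__":
    events = [
        classify("openai", "input_audio_buffer.committed"),
        classify("openai", "conversation.item.create"),
    ]
    control, content = split_events(events)
    print([e.name for e in control], [e.name for e in content])
```

With a tag like this, callers that care about the finer-grained controls can filter on the category, while a provider with a unified stream simply produces content events only. Whether that is preferable to two separate types is exactly the open question raised above.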

eavanvalkenburg requested a review from a team as a code owner January 8, 2025 16:04

eavanvalkenburg marked this pull request as draft January 8, 2025 16:04

markwallace-microsoft added the python (Pull requests for the Python Semantic Kernel) label Jan 8, 2025

TaoChenOSU reviewed Jan 8, 2025

eavanvalkenburg force-pushed the realtime branch from f83002c to 1b2eaaf Compare January 9, 2025 15:55

markwallace-microsoft added the documentation label Jan 10, 2025

eavanvalkenburg force-pushed the realtime branch 4 times, most recently from 20f5270 to 2afa19f Compare January 23, 2025 10:04

moonbox3 reviewed Jan 27, 2025

markwallace-microsoft removed the documentation label Jan 31, 2025

eavanvalkenburg and others added 16 commits January 31, 2025 15:51

draft initial implementation of Realtime API

bee1817

major update

9cd8c9e

updated note

9778a5c

reverted some changes

7cb5683

WIP ADR

f924f4e

small updates

a3df625

webrtc WIP

ab8c082

updated ADR

7754aab

webrtc working!

284475b

added dependency

5924ba6

added dep

d25a03a

added nd

edae188

renamed

bf4d61d

changed import

9427d63

binary content fix

26b17d4

restructured

0eaa574

eavanvalkenburg added 8 commits January 31, 2025 15:52

fix import

3dacf8c

small optimization in code

6395419

updates to the ADR

ccd84b4

import improvements

3d4ad22

updated code and ADR

e68bd82

wip on redoing the api

d451023

WIP

cf18596

removed built-in audio players, split for websocket and rtc

ad2ec58

eavanvalkenburg force-pushed the realtime branch from 9268d10 to ad2ec58 Compare January 31, 2025 14:53

add image event import

f94b631

eavanvalkenburg mentioned this pull request Jan 31, 2025

Python: Realtime API #10073

Open

5 tasks
