[Platform] Introduce `Speech` support #943

Guikingone · 2025-11-22T18:04:39Z

Q	A
Bug fix?	no
New feature?	yes
Docs?	yes
Issues	--
License	MIT

OskarStark · 2025-11-23T09:30:23Z

To me, we should maybe introduce capabilities for platforms as well, rather than having a Voice component. As far as I understand, I cannot use the Voice component standalone, right?

I don't think a dedicated component is the way to go here

Guikingone · 2025-11-23T09:32:20Z

We can introduce it via the Platform, which could be easier. The voice can be used without agents, but it will require the Platform at least.

Will update the PR to match this approach 👍🏻

OskarStark · 2025-11-23T09:33:36Z

I agree, Agent scope is not needed 👍🏻

chr-hertel · 2025-11-23T10:29:49Z

Hi @Guikingone, i agree that we lack some kind of guidance on how voices work - but same goes for other binary stuff like creating images or videos.

so two things i would like to understand:

- what's the high-level goal here - like what do you want to build?
- why is it an extra component and not part of Platform?

btw, "speech" is more common than "voice" isn't it?
btw2, have you seen the demo around audio and video?

Guikingone · 2025-11-23T10:35:36Z

> what's the high-level goal here - like what do you want to build?

The main goal is to let an agent/platform "listen" and respond to input through voice / speech ("voice" is used as sugar here and could be renamed to "speech"). This creates a workflow where you submit voice, call the platform that transforms it to speech / text (depending on the situation you're in), and return the result to the user without friction.

> why is it an extra component and not part of Platform?

It is now part of Platform, I just pushed an update on it following the comment from @OskarStark.

> btw, "speech" is more common than "voice" isn't it?

Agreed, could be renamed to Speech.

> btw2, have you seen the demo around audio and video?

Yes, the goal is to ease it with a "built-in" approach / API that stays transparent for the user.

chr-hertel · 2025-11-23T10:50:15Z

just realized we should rename the "audio" demo to "speech" as well - and i'm def not really happy with that solution there.

can we make it as easy as the structured output - like with a listener?

i like that starting point:

$result = $platform->invoke('eleven_multilingual_v2', new Text('Hello world'), [
    'voice' => 'Dslrhjl3ZpzrctukrQSN', // Brad (https://elevenlabs.io/app/voice-library?voiceId=Dslrhjl3ZpzrctukrQSN)
]);

echo $result->asVoice();

what would be the return type here? would it be the same as asBinary() or asDataUri()?

Guikingone · 2025-11-23T14:42:20Z

> can we make it as easy as the structured output - like with a listener?

Could be something to explore, the API is not locked for now.

> what would be the return type here? would it be the same as asBinary() or asDataUri()?

My first approach was to do the same thing as asBinary to ease the usage.
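For context on the two return shapes being compared, here is a minimal, self-contained sketch of how raw binary output differs from a data URI. It is not code from this PR: `toDataUri()` is a hypothetical helper and the `audio/mpeg` MIME type is an assumption made for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of the PR): wraps raw audio bytes in a
// base64 data URI, which is roughly what an asDataUri()-style accessor yields.
function toDataUri(string $binary, string $mimeType = 'audio/mpeg'): string
{
    return sprintf('data:%s;base64,%s', $mimeType, base64_encode($binary));
}

// Stand-in bytes; a real result would carry the audio returned by the platform.
$binary = "ID3";

// The binary form is the bytes themselves, ready to be written to a file.
// The data URI form is a printable string that can be embedded in HTML.
echo strlen($binary), PHP_EOL;
echo toDataUri($binary), PHP_EOL;
```

Returning the raw bytes keeps the accessor closest to what the provider sends back, while a data URI is mostly convenient for embedding in a page.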


This PR was merged into the main branch.

Discussion
----------

[Demo][Website] Rename audio demo to speech

Q	A
Bug fix?	no
New feature?	no
Docs?	
Issues	
License	MIT

Following a discussion of #943

Commits
-------

ffc2b64 Rename audio demo to speech

Guikingone · 2025-11-25T12:56:56Z

Well, it might seem weird, but here we go: STT, TTS and STS are working like a charm ... 👀

Guikingone · 2025-12-18T16:44:45Z

Friendly ping for @OskarStark and @chr-hertel (when you have time, I have other PRs to finish in parallel 😅 ):

I know that this PR was controversial and not optimal when started, here's the latest version (a better one IMO):

- Speech options are now part of platforms (only EL for now) to ease the configuration for everyone.
- Once the speech options are defined, this triggers a decoration of the platform using a SpeechPlatformInterface (which requires a custom implementation, more on this choice later). The decorator receives both the platform and the options, so both can be accessed independently.
- Each platform that supports speech must define an implementation of SpeechPlatformInterface; for EL, it's defined via this PR. The interface exposes two methods, generate and listen (same behavior as before). This part can probably be unified as a single behavior for every platform in the listener, could be something to explore.
- Once defined, the decorated platform is injected using tags in SpeechListener, a subscriber that listens to Invocation and Result events (again, nothing new here).
- The previous API around speech bag and speech is kept (as we can have multiple platforms for speech, multiple generations, etc.).
- The OptionsResolver is now a dependency of the ElevenLabs bridge to ease the configuration validation.
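To make the shape of this concrete, here is a rough, self-contained sketch of a wrapper that receives both the platform and the speech options. It is not the PR's code: only the names `SpeechPlatformInterface`, `generate` and `listen` come from the description above, while every signature, the `PlatformInterface` stand-in, `FakePlatform`, and the `tts_model` / `stt_model` / `voice` option keys are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

// Stand-in for the real platform contract; the actual signature may differ.
interface PlatformInterface
{
    public function invoke(string $model, string $input, array $options = []): string;
}

// Method names come from the discussion; parameter and return types are assumed.
interface SpeechPlatformInterface
{
    // Text to speech: returns the generated audio (as a string in this sketch).
    public function generate(string $text): string;

    // Speech to text: returns the transcription.
    public function listen(string $audio): string;
}

// Fake platform that echoes its arguments so the sketch runs without network access.
final class FakePlatform implements PlatformInterface
{
    public function invoke(string $model, string $input, array $options = []): string
    {
        return sprintf('[%s] %s', $model, $input);
    }
}

// Hypothetical wrapper holding both the platform and the speech options.
final class SpeechPlatform implements SpeechPlatformInterface
{
    /** @param array{tts_model: string, stt_model: string, voice?: string} $options */
    public function __construct(
        private readonly PlatformInterface $platform,
        private readonly array $options,
    ) {
    }

    public function generate(string $text): string
    {
        return $this->platform->invoke($this->options['tts_model'], $text, [
            'voice' => $this->options['voice'] ?? null,
        ]);
    }

    public function listen(string $audio): string
    {
        return $this->platform->invoke($this->options['stt_model'], $audio);
    }
}

$speech = new SpeechPlatform(new FakePlatform(), [
    'tts_model' => 'tts-model',
    'stt_model' => 'stt-model',
]);

echo $speech->generate('Hello world'), PHP_EOL;
echo $speech->listen('audio-bytes'), PHP_EOL;
```

Keeping the options next to the wrapped platform means a listener can call `generate` or `listen` without knowing which models or voice were configured.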

My first idea was to work around the client, but as it's not injected in the container, bad luck. The platform is easier to decorate, and we can keep the public API "as is" without new methods and so on.

Let me know if something is not clear or still controversial / not optimized, I'm pretty sure it can be improved / eased 🙏🏻

Guikingone · 2026-01-02T18:32:29Z

Hi @OskarStark / @chr-hertel, hope you're fine (and happy new year 😄 ) 👋🏻

Here's the latest version of this PR. I reworked it from the ground up and removed all the extra complexity: the main logic uses a SpeechConfiguration class and the SpeechListener, that's it (can't trim it further to be fair, the best I can do to remove code is closing the PR 😅). The configuration has been updated for both ElevenLabs and Cartesia: same keys, same behavior.

Events are used, with no decoration on the platform, just plain platforms injected through tags. I also improved the MessageBag API by introducing the replace and latestAs methods (more information in the tests), which are used to support the feature.

The documentation is updated, as well as the agents files and examples.

The CI is failing due to twig (no changes on this part via this PR, weird). When you have time to review it / give feedback, I can dive into more details if needed 🙂

Guikingone force-pushed the agent/voice_provider branch from 2c573eb to 8dd5cd5 Compare November 23, 2025 09:30

Guikingone changed the title ~~[Voice] Introduce the component~~ [Platform] Introduce VoiceProviders and VoiceListeners Nov 23, 2025

Guikingone changed the title ~~[Platform] Introduce VoiceProviders and VoiceListeners~~ [Platform] Introduce Speech support via Platform Nov 23, 2025

OskarStark reviewed Nov 23, 2025

src/agent/composer.json (outdated)

Guikingone force-pushed the agent/voice_provider branch from 79ddf87 to f011c3e Compare November 23, 2025 17:41

chr-hertel mentioned this pull request Nov 23, 2025

[Demo][Website] Rename audio demo to speech #958

Merged

Guikingone force-pushed the agent/voice_provider branch from dcae952 to be04280 Compare November 24, 2025 14:32

OskarStark changed the title ~~[Platform] Introduce Speech support via Platform~~ [Platform] Introduce Speech support Nov 24, 2025

Guikingone force-pushed the agent/voice_provider branch from be04280 to b319521 Compare November 25, 2025 12:49

Guikingone force-pushed the agent/voice_provider branch 3 times, most recently from 120f391 to 1963409 Compare November 26, 2025 12:42

Guikingone marked this pull request as ready for review November 26, 2025 12:44

Guikingone requested review from Nyholm and chr-hertel as code owners November 26, 2025 12:44

carsonbot added Feature New feature Platform Issues & PRs about the AI Platform component Status: Needs Review labels Nov 26, 2025

Guikingone marked this pull request as draft November 26, 2025 12:46

Guikingone marked this pull request as ready for review November 26, 2025 13:00

Guikingone force-pushed the agent/voice_provider branch from be85dda to 74bd8cb Compare November 26, 2025 13:00

Guikingone force-pushed the agent/voice_provider branch 5 times, most recently from 1ce622e to a4821e8 Compare December 18, 2025 16:28

Guikingone force-pushed the agent/voice_provider branch 4 times, most recently from 5382e82 to 1bb192e Compare December 22, 2025 13:11

Guikingone mentioned this pull request Dec 23, 2025

[Platform] Exposing clients to the container #1265

Closed

Guikingone force-pushed the agent/voice_provider branch from 1bb192e to 1328590 Compare December 24, 2025 18:28

Guikingone force-pushed the agent/voice_provider branch 2 times, most recently from 7767b3e to d67d33b Compare January 2, 2026 18:15

Guikingone force-pushed the agent/voice_provider branch 6 times, most recently from 42dc667 to 5e79d24 Compare January 12, 2026 08:05

Guikingone force-pushed the agent/voice_provider branch 5 times, most recently from 75fd1df to 0c00d21 Compare January 16, 2026 10:36

Guikingone mentioned this pull request Jan 18, 2026

[Platform] ElevenLabs definitions rework #1273

Closed

Guikingone added 2 commits January 18, 2026 19:29

feat(platform): add Speech

104bc84

deps

537c382

Guikingone force-pushed the agent/voice_provider branch from c8a8615 to 537c382 Compare January 18, 2026 18:29

4 participants
