@Guikingone
Contributor

@Guikingone Guikingone commented Nov 22, 2025

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         | yes
| Issues        | --
| License       | MIT

@OskarStark
Contributor

To me, we should maybe introduce capabilities to platforms as well, rather than having a Voice component. As far as I understand, I cannot use the Voice component standalone, right?

I don't think a dedicated component is the way to go here.

@Guikingone
Contributor Author

We can introduce it via the Platform; that could be easier. Voice can be used without agents, but it will at least require the Platform.

Will update the PR to match this approach 👍🏻

@OskarStark
Contributor

I agree, Agent scope is not needed 👍🏻

@Guikingone Guikingone changed the title from "[Voice] Introduce the component" to "[Platform] Introduce VoiceProviders and VoiceListeners" Nov 23, 2025
@chr-hertel
Member

Hi @Guikingone, I agree that we lack some guidance on how voices work, but the same goes for other binary output like images or videos.

So, two things I would like to understand:

  • what's the high-level goal here - like what do you want to build?
  • why is it an extra component and not part of Platform?

btw, "speech" is more common than "voice", isn't it?
btw2, have you seen the demo around audio and video?

@Guikingone
Contributor Author

Guikingone commented Nov 23, 2025

what's the high-level goal here - like what do you want to build?

The main goal is to give an agent/platform the ability to "listen" and answer inputs via voice/speech ("voice" is just sugar here; it could be renamed to "speech"). This creates a workflow where you submit voice, call the platform that transforms it into speech or text (depending on the situation you're in), and return the result to the user without friction.

why is it an extra component and not part of Platform?

It is now part of Platform; I just pushed an update following @OskarStark's comment.

btw, "speech" is more common than "voice" isn't it?

Agreed, could be renamed to Speech.

btw2, have you seen the demo around audio and video?

Yes, the goal is to simplify it with a "built-in" approach/API that stays transparent for the user.

@Guikingone Guikingone changed the title from "[Platform] Introduce VoiceProviders and VoiceListeners" to "[Platform] Introduce Speech support via Platform" Nov 23, 2025
@chr-hertel
Member

Just realized we should rename the "audio" demo to "speech" as well, and I'm definitely not really happy with the solution there.

Can we make it as easy as the structured output, like with a listener?

I like this starting point:

$result = $platform->invoke('eleven_multilingual_v2', new Text('Hello world'), [
    'voice' => 'Dslrhjl3ZpzrctukrQSN', // Brad (https://elevenlabs.io/app/voice-library?voiceId=Dslrhjl3ZpzrctukrQSN)
]);

echo $result->asVoice();

What would be the return type here? Would it be the same as asBinary() or asDataUri()?

@Guikingone
Contributor Author

Can we make it as easy as the structured output, like with a listener?

That could be worth exploring; the API is not locked yet.

What would be the return type here? Would it be the same as asBinary() or asDataUri()?

My first approach was to return the same thing as asBinary() to keep usage simple.
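To make the asBinary()/asDataUri() question concrete, here is a hypothetical result object: `SpeechResult` is an illustrative name, not a Symfony AI class, and the default MIME type `audio/mpeg` is an assumption. In this sketch asVoice() simply mirrors asBinary(), as proposed above, while asDataUri() wraps the same bytes in an RFC 2397 data URI.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical result holding generated audio (not a Symfony AI class).
 */
final class SpeechResult
{
    public function __construct(
        private readonly string $bytes,
        private readonly string $mimeType = 'audio/mpeg', // assumed default
    ) {
    }

    /** Raw audio bytes, e.g. to write to a file or stream to a client. */
    public function asBinary(): string
    {
        return $this->bytes;
    }

    /** Same bytes as asBinary(), following the "mirror asBinary" approach. */
    public function asVoice(): string
    {
        return $this->asBinary();
    }

    /** Base64 data URI (RFC 2397), e.g. for an HTML <audio src="..."> tag. */
    public function asDataUri(): string
    {
        return sprintf('data:%s;base64,%s', $this->mimeType, base64_encode($this->bytes));
    }
}

$result = new SpeechResult('abc');
echo $result->asDataUri(), PHP_EOL; // data:audio/mpeg;base64,YWJj
```

Returning raw bytes keeps `echo $result->asVoice();` usable for streaming, while the data URI form is handier when the audio is embedded in a page.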

OskarStark added a commit that referenced this pull request Nov 24, 2025
This PR was merged into the main branch.

Discussion
----------

[Demo][Website] Rename audio demo to speech

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | no
| Docs?         |
| Issues        |
| License       | MIT

Following a discussion in #943

Commits
-------

ffc2b64 Rename audio demo to speech
@OskarStark OskarStark changed the title from "[Platform] Introduce Speech support via Platform" to "[Platform] Introduce Speech support" Nov 24, 2025
@Guikingone
Contributor Author

Well, it might seem weird, but here we go: speech-to-text (STT), text-to-speech (TTS) and speech-to-speech (STS) are all working like a charm... 👀

@Guikingone Guikingone force-pushed the agent/voice_provider branch 3 times, most recently from 120f391 to 1963409 on November 26, 2025 12:42
@Guikingone Guikingone marked this pull request as ready for review November 26, 2025 12:44
@carsonbot carsonbot added Feature New feature Platform Issues & PRs about the AI Platform component Status: Needs Review labels Nov 26, 2025
@Guikingone Guikingone marked this pull request as draft November 26, 2025 12:46
@Guikingone Guikingone marked this pull request as ready for review November 26, 2025 13:00
@Guikingone Guikingone force-pushed the agent/voice_provider branch 5 times, most recently from 1ce622e to a4821e8 on December 18, 2025 16:28
@Guikingone
Contributor Author

Friendly ping to @OskarStark and @chr-hertel (when you have time; I have other PRs to finish in parallel 😅):

I know this PR was controversial and not optimal when it started; here's the latest version (a better one, IMO):

  • Speech options are now part of platforms (only ElevenLabs, "EL", for now) to simplify configuration for everyone.
  • Once the speech options are defined, they trigger a decoration of the platform through a SpeechPlatformInterface implementation (a custom one is required; more on this choice later) that receives both the platform and the options. This way, both can be accessed independently.
  • Each platform that supports speech must provide its own SpeechPlatformInterface implementation; the one for EL is added in this PR. The interface exposes two methods, generate and listen (same behavior as before). This part could probably be unified into a single behavior for every platform in the listener; that is worth exploring.
  • The decorated platform is then injected via tags into SpeechListener, a subscriber that listens to Invocation and Result events (again, nothing new here).
  • The previous API around the speech bag and speech is kept (since we can have multiple platforms for speech, multiple generations, etc.).
  • The OptionsResolver component is now a dependency of the ElevenLabs bridge to simplify configuration validation.
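The decoration approach in the list above can be sketched roughly as follows. Everything here is hypothetical and simplified: `PlatformInterface`, the generate()/listen() signatures, the `stt_model`/`tts_model` option keys, the two event classes and the listener method names are illustrative stand-ins, not the PR's actual code or the Symfony AI API. The listener turns incoming audio into text before the invocation and turns the text result into audio afterwards.

```php
<?php

declare(strict_types=1);

// Hypothetical minimal platform contract (stand-in, not the Symfony AI interface).
interface PlatformInterface
{
    public function invoke(string $model, string $input): string;
}

// Hypothetical speech contract exposing the two methods described in the PR.
interface SpeechPlatformInterface
{
    public function listen(string $audio): string;  // speech-to-text
    public function generate(string $text): string; // text-to-speech
}

// Decorator: receives both the wrapped platform and the speech options.
final class SpeechPlatform implements SpeechPlatformInterface
{
    /** @param array{stt_model: string, tts_model: string} $options hypothetical keys */
    public function __construct(
        private PlatformInterface $platform,
        private array $options,
    ) {
    }

    public function listen(string $audio): string
    {
        return $this->platform->invoke($this->options['stt_model'], $audio);
    }

    public function generate(string $text): string
    {
        return $this->platform->invoke($this->options['tts_model'], $text);
    }
}

// Hypothetical mutable events fired before and after an invocation.
final class InvocationEvent
{
    public function __construct(public string $input, public bool $isAudio) {}
}

final class ResultEvent
{
    public function __construct(public string $output) {}
}

// Listener: converts audio input to text, then the text result back to audio.
final class SpeechListener
{
    public function __construct(private SpeechPlatformInterface $speech) {}

    public function onInvocation(InvocationEvent $event): void
    {
        if ($event->isAudio) {
            $event->input = $this->speech->listen($event->input);
            $event->isAudio = false;
        }
    }

    public function onResult(ResultEvent $event): void
    {
        $event->output = $this->speech->generate($event->output);
    }
}

// Fake provider so the sketch runs offline: "transcribes" and "speaks" by prefixing.
final class FakePlatform implements PlatformInterface
{
    public function invoke(string $model, string $input): string
    {
        return match ($model) {
            'stt' => substr($input, strlen('AUDIO:')),
            'tts' => 'AUDIO:' . $input,
            default => strtoupper($input), // plain text model
        };
    }
}

$platform = new FakePlatform();
$listener = new SpeechListener(new SpeechPlatform($platform, ['stt_model' => 'stt', 'tts_model' => 'tts']));

$invocation = new InvocationEvent('AUDIO:hello', true);
$listener->onInvocation($invocation);                                     // input is now 'hello'
$result = new ResultEvent($platform->invoke('llm', $invocation->input)); // output is 'HELLO'
$listener->onResult($result);
echo $result->output, PHP_EOL; // AUDIO:HELLO
```

Because the conversion happens in the listener, callers keep using the ordinary invoke() path and only the event payloads change, which matches the goal of keeping the public API as is.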

My first idea was to work around the client, but since it's not injected into the container (bad luck), I went with the platform: it is easier to decorate, and we can keep the public API "as is" without new methods and so on.

Let me know if something is unclear or still controversial/not optimized; I'm pretty sure it can be improved or simplified 🙏🏻

@Guikingone Guikingone force-pushed the agent/voice_provider branch 4 times, most recently from 5382e82 to 1bb192e on December 22, 2025 13:11
@Guikingone Guikingone force-pushed the agent/voice_provider branch 2 times, most recently from 7767b3e to d67d33b on January 2, 2026 18:15
@Guikingone
Contributor Author

Hi @OskarStark / @chr-hertel, hope you're fine (and happy new year 😄 ) 👋🏻

Here's the latest version of this PR. I reworked it from the ground up and removed all the extra complexity: the main logic uses a SpeechConfiguration class and the SpeechListener, that's it (I can't trim it further, to be fair; the only way left to remove code would be closing the PR 😅). The configuration has been updated for both ElevenLabs and Cartesia: same keys, same behavior.

Events are used: no decoration of the platform, just plain platforms injected through tags. I also extended the MessageBag API with replace and latestAs methods (more information in the tests), which are used to support the feature.
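The replace and latestAs methods are only named here, so the following is a hypothetical sketch of how a speech listener might use them, with semantics guessed from the method names (the tests in the PR remain the real reference). latestAs() is assumed to return the most recent message of a given class, and replace() to swap one message for another in place. `MessageBagSketch`, `AudioMessage` and `TextMessage` are illustrative names, not Symfony AI classes.

```php
<?php

declare(strict_types=1);

// Hypothetical message types for the sketch (not Symfony AI classes).
final class AudioMessage
{
    public function __construct(public readonly string $audio) {}
}

final class TextMessage
{
    public function __construct(public readonly string $text) {}
}

// Hypothetical bag; replace()/latestAs() semantics are guessed from their names.
final class MessageBagSketch
{
    /** @var list<object> */
    private array $messages;

    public function __construct(object ...$messages)
    {
        $this->messages = array_values($messages);
    }

    /**
     * Return the most recent message that is an instance of $class, or null.
     *
     * @template T of object
     * @param class-string<T> $class
     * @return T|null
     */
    public function latestAs(string $class): ?object
    {
        foreach (array_reverse($this->messages) as $message) {
            if ($message instanceof $class) {
                return $message;
            }
        }

        return null;
    }

    /** Swap $old for $new at the same position; returns false if $old is absent. */
    public function replace(object $old, object $new): bool
    {
        $index = array_search($old, $this->messages, true); // identity comparison
        if (false === $index) {
            return false;
        }
        $this->messages[$index] = $new;

        return true;
    }

    /** @return list<object> */
    public function all(): array
    {
        return $this->messages;
    }
}

// Example: swap the latest audio input for its (fake) transcript.
$bag = new MessageBagSketch(new TextMessage('system prompt'), new AudioMessage('AUDIO:hello'));
$audio = $bag->latestAs(AudioMessage::class);
if (null !== $audio) {
    $bag->replace($audio, new TextMessage(substr($audio->audio, strlen('AUDIO:'))));
}
echo $bag->all()[1]->text, PHP_EOL; // hello
```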

The documentation has been updated, along with the agents files and examples.

The CI is failing due to Twig (this PR makes no changes to that part, weird). When you have time to review it or give feedback, I can dive into more details if needed 🙂

@Guikingone Guikingone force-pushed the agent/voice_provider branch 6 times, most recently from 42dc667 to 5e79d24 on January 12, 2026 08:05
@Guikingone Guikingone force-pushed the agent/voice_provider branch 5 times, most recently from 75fd1df to 0c00d21 on January 16, 2026 10:36
@Guikingone Guikingone force-pushed the agent/voice_provider branch from c8a8615 to 537c382 on January 18, 2026 18:29
4 participants