
user_speech_committed event is never fired using RealtimeModel/MultimodalAgent #1142

Open
zacharyw opened this issue Nov 27, 2024 · 2 comments
Labels
question Further information is requested

Comments

@zacharyw

Hello - I'm not sure if this is a bug, or just something I'm doing wrong.

I am creating a model:

model = openai.realtime.RealtimeModel(
    instructions=data['globalPrompt'],
    voice='shimmer',
    temperature=0.8,
    # max_response_output_tokens=float('inf'),
    modalities=['audio'],
    turn_detection=openai.realtime.ServerVadOptions(
        threshold=0.9, prefix_padding_ms=200, silence_duration_ms=500
    ),
)

agent = MultimodalAgent(model=model)

I have event handlers defined for when speech is committed:

@agent.on("user_speech_committed")
def on_user_speech_committed(msg: llm.ChatMessage):
    logger.info(f"User speech committed: {msg}")

@agent.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent speech committed: {msg}")

During a conversation, the agent_speech_committed event is fired normally and the msg param contains the AI's response.

However, the user_speech_committed event is never picked up.

In addition, in the debug logs, I can see a user conversation item being created with audio, but the transcription is blank:

DEBUG livekit.plugins.openai.realtime - conversation item created {"type": "conversation.item.created", "event_id": "event_AYCyhuvHvIlGLSpkTu6MH", "previous_item_id": "item_AYCyaCV55zc87azG5Z4cz", "item": {"id": "item_AYCyhNzFR1Nobo2kKvcCW", "object": "realtime.item", "type": "message", "status": "completed", "role": "user", "content": [{"type": "input_audio", "transcript": null}]}, "pid": 1181, "job_id": "AJ_XvWaLLkWk3Hv"}

I'm not sure whether that's related to the event not firing.

@zacharyw zacharyw added the question Further information is requested label Nov 27, 2024
@longcw
Collaborator

longcw commented Nov 27, 2024

The transcription is expected to be empty when the conversation item is created. The transcription should be included in a message sent later by the realtime API, and the user_speech_committed event will be emitted when the agent receives the transcription.
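The flow longcw describes can be illustrated with a minimal, self-contained sketch (plain Python, not LiveKit internals): the conversation item is created with an empty transcript, and the user-speech event is only emitted once the transcription result arrives from the realtime API. The class and method names here are invented for illustration only.

```python
# Illustration only: a deferred-emission pattern, not the actual LiveKit code.
# The conversation item is created first with no transcript; the
# "user_speech_committed" event fires only after the transcript arrives.

class SpeechEventEmitter:
    def __init__(self):
        self._handlers = {}   # event name -> list of callbacks
        self._pending = set() # item ids committed but awaiting a transcript

    def on(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def conversation_item_created(self, item_id):
        # transcript is still empty at this point, so nothing is emitted yet
        self._pending.add(item_id)

    def transcription_completed(self, item_id, transcript):
        # the transcript arrives in a later message; only now is the event emitted
        if item_id in self._pending:
            self._pending.discard(item_id)
            for handler in self._handlers.get("user_speech_committed", []):
                handler(transcript)

emitter = SpeechEventEmitter()
received = []
emitter.on("user_speech_committed", received.append)

emitter.conversation_item_created("item_123")
assert received == []  # no event yet: the item was created with a null transcript

emitter.transcription_completed("item_123", "Hello, hello.\n")
assert received == ["Hello, hello.\n"]  # event fires once the transcript arrives
```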

There should be a debug log for committed user speech, for example:

2024-11-27 22:28:40,980 - DEBUG livekit.agents - committed user speech {"user_transcript": "Hello, hello.\n", "pid": 686607, "job_id": "AJ_2QWC7zGGTTk9"}

If it's not there, could you share more logs for debugging?

@zacharyw
Author

Hmm, I restarted my Docker container without changing anything, and now the event is being picked up; I'm seeing events trigger on both sides. Sorry for the errant issue.

I will say, though, that the transcription is radically different from the actual audio that the AI picked up and responded to. I imagine this is due to discrepancies between the realtime model and the Whisper model used to generate the transcript?

I'm not sure if there's anything I can do to improve that, though.
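For anyone hitting the same discrepancy: the user-side transcript in this setup comes from a separate transcription pass, distinct from what the realtime model itself "heard". From my reading of livekit-plugins-openai, the transcription model is configurable via an `input_audio_transcription` option on `RealtimeModel`; the `InputTranscriptionOptions` name and its parameters are my assumption and may differ by plugin version, so check your installed version before relying on them.

```python
# Hedged sketch: assumes livekit-plugins-openai exposes an
# `input_audio_transcription` option taking an InputTranscriptionOptions
# value. Verify these names against your installed plugin version.
from livekit.plugins import openai

model = openai.realtime.RealtimeModel(
    instructions=data['globalPrompt'],
    voice='shimmer',
    temperature=0.8,
    modalities=['audio'],
    # The user-side transcript is produced by this model, separately from
    # the realtime model's own understanding of the audio.
    input_audio_transcription=openai.realtime.InputTranscriptionOptions(
        model='whisper-1',
    ),
    turn_detection=openai.realtime.ServerVadOptions(
        threshold=0.9, prefix_padding_ms=200, silence_duration_ms=500
    ),
)
```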
