
user_speech_committed event is never fired using RealtimeModel/MultimodalAgent #1142

Open
zacharyw opened this issue Nov 27, 2024 · 2 comments
Labels
question Further information is requested

Comments

@zacharyw

Hello - I'm not sure if this is a bug, or just something I'm doing wrong.

I am creating a model:

model = openai.realtime.RealtimeModel(
    instructions=data['globalPrompt'],
    voice='shimmer',
    temperature=0.8,
    # max_response_output_tokens=float('inf'),
    modalities=['audio'],
    turn_detection=openai.realtime.ServerVadOptions(
        threshold=0.9, prefix_padding_ms=200, silence_duration_ms=500
    ),
)

agent = MultimodalAgent(model=model)

I have event handlers defined for when speech is committed:

@agent.on("user_speech_committed")
def on_user_speech_committed(msg: llm.ChatMessage):
    logger.info(f"User speech committed: {msg}")

@agent.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent speech committed: {msg}")

During a conversation, the agent_speech_committed event is fired normally and the msg param contains the AI's response.

However, the user_speech_committed event is never picked up.

In addition, in the debug logs, I can see a user conversation item being created with audio, but the transcription is blank:

DEBUG livekit.plugins.openai.realtime - conversation item created {"type": "conversation.item.created", "event_id": "event_AYCyhuvHvIlGLSpkTu6MH", "previous_item_id": "item_AYCyaCV55zc87azG5Z4cz", "item": {"id": "item_AYCyhNzFR1Nobo2kKvcCW", "object": "realtime.item", "type": "message", "status": "completed", "role": "user", "content": [{"type": "input_audio", "transcript": null}]}, "pid": 1181, "job_id": "AJ_XvWaLLkWk3Hv"}

I'm not sure whether that's related to the event not firing.

@zacharyw zacharyw added the question Further information is requested label Nov 27, 2024
@longcw
Collaborator

longcw commented Nov 27, 2024

The transcription is expected to be empty when the conversation item is created. The transcription should be included in a message sent later by the realtime API, and the user_speech_committed event will be emitted when the agent receives the transcription.
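The flow longcw describes can be illustrated with a minimal, self-contained sketch (plain Python, not LiveKit internals): the conversation item is created with an empty transcript, and the user-speech event is only emitted once the transcription result arrives from the realtime API. The class and method names here are invented for illustration only.

```python
# Illustration only: a deferred-emission pattern, not the actual LiveKit code.
# The conversation item is created first with no transcript; the
# "user_speech_committed" event fires only after the transcript arrives.

class SpeechEventEmitter:
    def __init__(self):
        self._handlers = {}   # event name -> list of callbacks
        self._pending = set() # item ids committed but awaiting a transcript

    def on(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def conversation_item_created(self, item_id):
        # transcript is still empty at this point, so nothing is emitted yet
        self._pending.add(item_id)

    def transcription_completed(self, item_id, transcript):
        # the transcript arrives in a later message; only now is the event emitted
        if item_id in self._pending:
            self._pending.discard(item_id)
            for handler in self._handlers.get("user_speech_committed", []):
                handler(transcript)

emitter = SpeechEventEmitter()
received = []
emitter.on("user_speech_committed", received.append)

emitter.conversation_item_created("item_123")
assert received == []  # no event yet: the item was created with a null transcript

emitter.transcription_completed("item_123", "Hello, hello.\n")
assert received == ["Hello, hello.\n"]  # event fires once the transcript arrives
```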

There should be a debug log for committed user speech, for example:

2024-11-27 22:28:40,980 - DEBUG livekit.agents - committed user speech {"user_transcript": "Hello, hello.\n", "pid": 686607, "job_id": "AJ_2QWC7zGGTTk9"}

If it's not there, could you share more logs for debugging?

@zacharyw
Author

Hmm, I restarted my Docker container without changing anything, and now the event is being picked up; I'm seeing events trigger on both sides. Sorry for the errant issue.

I will say, though, that the transcription is radically different from the actual audio that the AI picked up and responded to. I imagine this is due to discrepancies between the realtime model and the Whisper model used to generate the transcript?

I'm not sure if there's anything I can do to improve that, though.
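For anyone hitting the same discrepancy: the user-side transcript in this setup comes from a separate transcription pass, distinct from what the realtime model itself "heard". From my reading of livekit-plugins-openai, the transcription model is configurable via an `input_audio_transcription` option on `RealtimeModel`; the `InputTranscriptionOptions` name and its parameters are my assumption and may differ by plugin version, so check your installed version before relying on them.

```python
# Hedged sketch: assumes livekit-plugins-openai exposes an
# `input_audio_transcription` option taking an InputTranscriptionOptions
# value. Verify these names against your installed plugin version.
from livekit.plugins import openai

model = openai.realtime.RealtimeModel(
    instructions=data['globalPrompt'],
    voice='shimmer',
    temperature=0.8,
    modalities=['audio'],
    # The user-side transcript is produced by this model, separately from
    # the realtime model's own understanding of the audio.
    input_audio_transcription=openai.realtime.InputTranscriptionOptions(
        model='whisper-1',
    ),
    turn_detection=openai.realtime.ServerVadOptions(
        threshold=0.9, prefix_padding_ms=200, silence_duration_ms=500
    ),
)
```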
