Getting a "WARNING livekit.agents - The realtime API returned a text content part, which is not supported" warning during conversations #1143

zacharyw · 2024-11-27T14:27:27Z

Hello - I'm not sure if this is a bug or something I'm doing wrong.

Using `RealtimeModel` with `MultimodalAgent`, I am attempting to start a conversation and seed it with some conversation history so the user can pick up where they left off in an earlier conversation.

I am setting the modality to "audio" to try to ensure text responses aren't used, but I sometimes get this warning and no audio output is produced. It seems to happen somewhat randomly, but the more conversation history there is, the more likely it seems to happen.

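Here is how I've set things up:

```python
import logging

from livekit.agents import llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai

logger = logging.getLogger(__name__)

# `ctx` is the JobContext passed to the agent entrypoint, and `data` is the
# payload returned from my API (global prompt plus prior messages).
model = openai.realtime.RealtimeModel(
    instructions=data['globalPrompt'],
    voice='shimmer',
    temperature=0.8,
    # max_response_output_tokens=float('inf'),
    modalities=['audio'],  # intended to keep responses audio-only
    turn_detection=openai.realtime.ServerVadOptions(
        threshold=0.9, prefix_padding_ms=200, silence_duration_ms=500
    ),
)

agent = MultimodalAgent(model=model)
agent.start(ctx.room)
logger.info("starting agent")

session = model.sessions[0]
# Add messages to the conversation history if needed
for message in data.get('messages', []):
    logger.info(f"role: {message['role']}, content: {message['content']}")
    session.conversation.item.create(
        llm.ChatMessage(
            role='assistant' if message['role'] == 'system' else 'user',
            content=message['content'],
        )
    )

# Ask the model to respond with the seeded history in context
session.response.create()
```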

`messages` is an array of messages returned from my API; each one contains just `content` (a string) and `role` (a string). Setting the role to "assistant" instead of "system" seems to reduce the frequency of this issue, but it could be a placebo effect.

This code is based on the example code from the integration guide: https://docs.livekit.io/agents/openai/multimodalagent/

Interestingly, this example code sets both the audio and text modalities, even though text doesn't really seem to work at all.


longcw · 2024-11-27T14:39:50Z

First, the Realtime model most likely doesn't support the system role. Second, this looks more like a bug on the model side: if the chat context is initialized with some text messages, there is a chance the model responds in text mode (and assistant-role messages are even more likely to trigger text mode than user-role messages).

A PR that just got merged, #1121, tries to recover to audio mode by deleting the text response and appending an empty audio message to the chat history.
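A minimal sketch of that recovery idea, assuming a simplified in-memory history where each item is either text or audio. `Item` and `recover_audio_mode` are hypothetical names for illustration only; this is not the code from #1121 and not a LiveKit API.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a conversation item (not a LiveKit class)."""
    role: str          # "user" or "assistant"
    kind: str          # "text" or "audio"
    content: str = ""  # text or transcript; empty for the placeholder audio item


def recover_audio_mode(history: list[Item]) -> list[Item]:
    """If the latest assistant reply came back as text, drop it and append an
    empty assistant audio item so the history ends on an audio turn.

    Returns a new list; the input history is left unmodified.
    """
    if history and history[-1].role == "assistant" and history[-1].kind == "text":
        return history[:-1] + [Item(role="assistant", kind="audio", content="")]
    return list(history)
```

Returning a new list rather than editing in place keeps the original history available for logging when the warning fires.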

zacharyw · 2024-11-27T15:03:21Z

Oh, thank you! Looks like I need to get onto the cutting edge of changes then. Out of curiosity, could you explain why setting the modalities to audio and text is required, even though text is an undesirable state to get into?

longcw · 2024-11-27T15:07:57Z

There is no way to make the API audio-only. According to the documentation (https://platform.openai.com/docs/api-reference/realtime-client-events/session/update), we can only set the modalities to ["text"] to disable audio, not the other way around.
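Put another way, only two settings are accepted: text only, or text plus audio. A hypothetical validation helper (`check_modalities` is an illustrative name, not part of LiveKit or the OpenAI SDK) could reject the audio-only setting before a session is created, assuming the two allowed sets described in this reply:

```python
# The two modality sets the API accepts, per the session.update docs as
# described in this thread. Order does not matter, so compare as sets.
ALLOWED_MODALITIES = (frozenset({"text"}), frozenset({"text", "audio"}))


def check_modalities(modalities: list[str]) -> list[str]:
    """Return the modalities unchanged if allowed, else raise ValueError."""
    if frozenset(modalities) not in ALLOWED_MODALITIES:
        raise ValueError(
            f"unsupported modalities {modalities!r}; "
            "use ['text'] or ['text', 'audio']"
        )
    return list(modalities)
```

With this check, `['audio', 'text']` passes, while the `['audio']` setting from the original snippet raises `ValueError`.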

zacharyw added the question Further information is requested label Nov 27, 2024
