Hello - I'm not sure if this is a bug or something I'm doing wrong.
Using `RealtimeModel` with `MultimodalAgent`, I am attempting to start a conversation and seed it with some conversation history so the user can pick up where they left off.
I am setting the modality to "audio" to try and ensure text responses aren't being used, but I sometimes get this warning/error, and no audio output is produced. It seems to happen somewhat randomly: the more convo history there is, the more likely it seems to happen.
Some code for how I've set things up:
model = openai.realtime.RealtimeModel(
    instructions=data['globalPrompt'],
    voice='shimmer',
    temperature=0.8,
    # max_response_output_tokens=float('inf'),
    modalities=['audio'],
    turn_detection=openai.realtime.ServerVadOptions(
        threshold=0.9, prefix_padding_ms=200, silence_duration_ms=500
    ),
)
agent = MultimodalAgent(model=model)
agent.start(ctx.room)
logger.info("starting agent")
session = model.sessions[0]
# Add messages to conversation history if needed
for message in data.get('messages', []):
    logger.info(f"role: {message['role']}, content: {message['content']}")
    session.conversation.item.create(
        llm.ChatMessage(role='assistant' if message['role'] == 'system' else 'user', content=message['content'])
    )
session.response.create()
`messages` is an array of messages returned from my API; each one contains just `content` (a string) and `role` (a string). Setting the role to "assistant" instead of "system" seems to reduce the frequency of this issue, but it could be a placebo effect.
First, the Realtime model most likely doesn't support the system role. Second, this looks more like a bug on the model side: there is a chance it responds in text mode if some text chat context has been initialized (the assistant role has an even higher chance of triggering text mode than the user role).
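If the system role really is unsupported, one option is to normalize the stored history before seeding it. The sketch below is only an illustration: `normalize_history`, `SUPPORTED_ROLES`, and the default `role_map` are hypothetical names, not part of the LiveKit or OpenAI APIs, and unlike the snippet in the original post it drops unknown roles instead of treating them as "user".

```python
# Assumption: only these two roles are safe to seed into a realtime conversation.
SUPPORTED_ROLES = {"user", "assistant"}


def normalize_history(messages, role_map=None):
    """Return (role, content) pairs restricted to the supported roles.

    `messages` is a list of dicts with "role" and "content" keys, as
    described in the original post. `role_map` renames roles before
    filtering; by default "system" is rewritten to "assistant".
    """
    if role_map is None:
        role_map = {"system": "assistant"}  # hypothetical default mapping

    normalized = []
    for message in messages:
        role = role_map.get(message["role"], message["role"])
        content = (message.get("content") or "").strip()
        if role not in SUPPORTED_ROLES or not content:
            continue  # skip unsupported roles and empty messages
        normalized.append((role, content))
    return normalized
```

Each resulting pair could then be passed to `llm.ChatMessage(role=..., content=...)` in the seeding loop from the original post.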
PR #1121 was just merged; it tries to recover audio mode by deleting the text response and appending empty audio to the chat history.
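To make the recovery idea concrete, here is a toy model of it on a plain in-memory list. This is not the code from the PR: `Item`, `recover_audio_mode`, the `"-audio"` id suffix, and the choice of the assistant role for the empty audio entry are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for one entry in the chat history."""
    id: str
    role: str
    kind: str  # "text" or "audio"
    text: str = ""
    audio: bytes = b""


def recover_audio_mode(history, text_item_id):
    """Return a new history with the text reply removed and an empty
    assistant audio item appended in its place.

    The input list is left unmodified.
    """
    kept = [item for item in history if item.id != text_item_id]
    kept.append(
        Item(id=f"{text_item_id}-audio", role="assistant", kind="audio", audio=b"")
    )
    return kept
```

The intent, as I understand the description, is that the most recent assistant turn in the history is audio rather than text, which may nudge the model back toward audio responses.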
Oh thank you - looks like I need to get onto the cutting edge of changes then. Out of curiosity could you explain why setting the modalities to audio and text is required despite text being an undesirable state to get into?
This code is based on the example code from the integration guide: https://docs.livekit.io/agents/openai/multimodalagent/
Interestingly, that example sets both the audio and text modalities, even though text doesn't really seem to work at all.