
_audio_in_queue getting stuck and not receiving the InputAudioRawFrame coming from DailyInput #721

Open
keenborder786 opened this issue Nov 15, 2024 · 19 comments

keenborder786 commented Nov 15, 2024

Description

This is a bug. At some point during a call, the following happens:

  • DailyInputTransport._audio_in_task_handler works as expected
  • InputAudioRawFrame objects are pushed from DailyInputTransport
  • However, the InputAudioRawFrame objects are never consumed from _audio_in_queue: _audio_task_handler in BaseInputTransport never receives them, so nothing reaches the VAD or the downstream frames in the pipeline, and the bot goes deaf.

Environment

  • pipecat-ai version: 0.0.47
  • python version: 3.12
  • OS: Mac OS 14.4

Issue description

The issue is that, at some point, the _audio_in_queue stops receiving InputAudioRawFrame objects, even though they are still being sent from Daily. We are not entirely sure why this happens. When it does, the InputAudioRawFrame objects continue arriving from Daily but are not pushed into the asyncio queue, which effectively stalls the entire pipeline.

You can observe in the logs that when this occurs, the User-Started-Speaking log statement is also not printed. Additionally, when a participant leaves, Pipecat cancels all tasks. Strangely, only after this do we start seeing the User-Started-Speaking and User-Stopped-Speaking statements that should have been logged earlier. The main task for the Pipecat bot is also left hanging.

My suspicion is that a task is not executing properly, which blocks consumption of the queue.
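The suspected stall pattern can be sketched in isolation. This is a minimal, hypothetical reproduction, not pipecat's actual internals: a producer keeps putting frames into an asyncio.Queue while the consumer task is blocked on an await that never completes, so the backlog grows and nothing downstream ever runs.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int) -> None:
    # Plays the role of DailyInputTransport: keeps pushing audio frames.
    for i in range(n):
        await queue.put(f"frame-{i}")
        await asyncio.sleep(0)  # yield to the event loop


async def stuck_consumer(queue: asyncio.Queue, consumed: list, blocker: asyncio.Event) -> None:
    # Plays the role of BaseInputTransport._audio_task_handler: after one
    # frame it awaits something that never completes, so it never calls
    # queue.get() again and the backlog grows unboundedly.
    consumed.append(await queue.get())
    await blocker.wait()  # never set -> consumer is stuck from here on


async def demo() -> tuple[int, int]:
    queue: asyncio.Queue = asyncio.Queue()
    consumed: list = []
    blocker = asyncio.Event()
    consumer_task = asyncio.create_task(stuck_consumer(queue, consumed, blocker))
    await producer(queue, 10)
    await asyncio.sleep(0.05)  # give the consumer every chance to run
    consumer_task.cancel()
    try:
        await consumer_task
    except asyncio.CancelledError:
        pass
    return len(consumed), queue.qsize()


consumed_count, backlog = asyncio.run(demo())
print(consumed_count, backlog)  # 1 frame consumed, 9 stranded in the queue
```

From the outside this looks exactly like the symptom above: frames keep arriving, the queue keeps filling, and no User-Started-Speaking ever fires because the consumer coroutine is parked on a single await.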

Repro steps

The bug is difficult to reproduce because it occurs randomly. However, it is most often observed later in the call, when the user asks a question and a tool is invoked. Additionally, we noticed that when this bug happens, the TTS response is cut off abruptly mid-utterance.
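Since the bug is hard to reproduce on demand, one way to catch it in the wild would be a small queue watchdog. The sketch below is self-contained and hypothetical; attaching it to the real queue would mean reaching into the private `transport._input._audio_in_queue` attribute, which may break across pipecat versions. It flags a stall when frames sit in the queue without draining for several consecutive checks.

```python
import asyncio


async def queue_watchdog(queue: asyncio.Queue, interval: float,
                         stalled: asyncio.Event, threshold: int = 3) -> None:
    # Flags a stall when the queue holds frames and its depth does not
    # shrink for `threshold` consecutive checks, i.e. frames arrive but
    # nothing consumes them.
    last_size = queue.qsize()
    not_draining = 0
    while not stalled.is_set():
        await asyncio.sleep(interval)
        size = queue.qsize()
        not_draining = not_draining + 1 if (size > 0 and size >= last_size) else 0
        last_size = size
        if not_draining >= threshold:
            stalled.set()


async def demo() -> bool:
    queue: asyncio.Queue = asyncio.Queue()
    for i in range(5):  # frames arrive but nothing ever consumes them
        queue.put_nowait(f"frame-{i}")
    stalled = asyncio.Event()
    watchdog = asyncio.create_task(queue_watchdog(queue, 0.01, stalled))
    await asyncio.wait_for(stalled.wait(), timeout=1.0)
    watchdog.cancel()
    return stalled.is_set()


print(asyncio.run(demo()))  # prints: True
```

In a real deployment the stall event could trigger a log line (or restart the input task) so the deaf-bot state is at least visible instead of silently hanging.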

#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import aiohttp
import os
import sys
import argparse
import requests
from functools import partial
from pipecat.audio.vad.silero import SileroVADAnalyzer, VADParams
from pipecat.frames.frames import LLMMessagesFrame, TextFrame, EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
from pipecat.services.deepgram import DeepgramSTTService
from openai.types.chat import ChatCompletionToolParam
from loguru import logger
import random
from dotenv import load_dotenv


load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
PARTICIPANT_ID = [""] # this will get filled in later
DEEPGRAM_API_KEY = ""
DAILY_API_KEY = ""
CARTESIA_API_KEY = ""
DAILY_ROOM_NAME = ""
TO = ""
FROM_DAILY_CALLER_ID = ""
QUESTION_AFFIRMATION = [
    "Excellent question!",
    "That's a solid question!",
    "Good question!",
    "Great question!",
    "I am glad you asked that!",
]
VOICEMAIL_EXAMPLES = [
    "We are sorry, there is no one available to take your call...",
    "Please leave a message for ",
    "Please leave your name and phone and I'll get back to you as soon as I can",
    "Your call has been forwarded to voicemail, the person you are trying to reach is not available",
    "The person you're trying to reach is not available",
    "Hey, it's (user's name) leave a message",
    "Hey you reached (user's name), please leave a message",
    "Hi/Hey I'm not available please leave a message",
    "The number you're trying to reach...",
    "I'll get back to you as soon as possible",
    "This voicemail is not receiving any messages",
    "My VM is full so If you need to reach me, please text me",
    "Leave a message and I'll call you back",
    "You've reached my cell phone, I'm not available",
    "We are sorry, there is no one available to take your call...",
]

async def configure(aiohttp_session: aiohttp.ClientSession):
    (url, token, _) = await configure_with_args(aiohttp_session)
    return (url, token)


async def configure_with_args(
    aiohttp_session: aiohttp.ClientSession, parser: argparse.ArgumentParser | None = None
):
    if not parser:
        parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
    parser.add_argument(
        "-u", "--url", type=str, required=False, help="URL of the Daily room to join"
    )
    parser.add_argument(
        "-k",
        "--apikey",
        type=str,
        required=False,
        help="Daily API Key (needed to create an owner token for the room)",
    )

    args, unknown = parser.parse_known_args()

    url = f"https://biglysales-team.daily.co/{DAILY_ROOM_NAME}"
    key = DAILY_API_KEY

    if not url:
        raise Exception(
            "No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
        )

    if not key:
        raise Exception(
            "No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
        )

    daily_rest_helper = DailyRESTHelper(
        daily_api_key=key,
        daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
        aiohttp_session=aiohttp_session,
    )

    # Create a meeting token for the given room with an expiration 1 hour in
    # the future.
    expiry_time: float = 60 * 60

    token = await daily_rest_helper.get_token(url, expiry_time)

    return (url, token, args)


async def cancel_transport_input_task(transport_input) -> None:
    transport_input._audio_task.cancel()
    try:
        # Awaiting a cancelled task re-raises CancelledError in the awaiter,
        # so it has to be suppressed here.
        await transport_input._audio_task
    except asyncio.CancelledError:
        pass

def get_cartesia_static_response(text: str, voice_id: str, model_id: str, **kwargs) -> bytes:
    """
    Makes an API call to Cartesia to generate TTS audio bytes.

    Args:
        text (str): The transcript text.
        voice_id (str): The ID of the voice to be used.
        model_id (str): The model ID for the TTS request.
        **kwargs: Additional parameters like output format.

    Returns:
        bytes: The audio bytes returned by the API.

    Raises:
        ValueError: If the API request fails.
    """
    try:
        output_format = kwargs.get("output_format")
        response = requests.post(
            "https://api.cartesia.ai/tts/bytes",
            headers={"X-API-Key": CARTESIA_API_KEY, "Cartesia-Version": "2024-06-10", "Content-Type": "application/json"},
            json={
                "model_id": model_id,
                "transcript": text,
                "voice": {"mode": "id", "id": voice_id},
                "output_format": output_format,
            },
        )
        response.raise_for_status()
        logger.info("Cartesia TTS response cached.")
        return response.content
    except Exception as error:
        logger.opt(exception=True).error("Error Occurred while getting Audio Bytes from Cartesia")
        raise error


async def say_agent_response(
    end_response,
    transport,
    audio_end_buffer: float,
    cancel_input_audio: bool = True,
) -> None:
    """
    Sends the agent's audio response via the specified transport and adds a buffer delay after playback.
    Optionally cancels any ongoing audio input tasks before playing the response.

    Args:
        end_response (bytes): The audio response to be played.
        transport (Union[DailyTransport, FastAPIWebsocketTransport]): The transport handling audio I/O.
        audio_end_buffer (float): Time (in seconds) to wait after playing the audio.
        cancel_input_audio (bool, optional): Whether to cancel ongoing audio input tasks. Defaults to True.

    Retries:
        Retries up to 3 times in case of an exception, with a 2-second delay between attempts.
    """
    try:
        if cancel_input_audio and transport:
            logger.info("Canceling the Audio Input Task")
            await cancel_transport_input_task(transport._input)
        await transport._output.write_raw_audio_frames(end_response)
        await asyncio.sleep(audio_end_buffer)
    except AttributeError:
        logger.info("Audio Input Transport already cancelled")
    except Exception as error:
        logger.opt(exception=True).error("Error Occurred while Uttering Agent Response.")
        raise error
    
async def get_meeting_dates(
    function_name: str,
    tool_call_id: str,
    _args,
    llm,
    context: OpenAILLMContext,
    result_callback,
) -> None:
    logger.info("Invoking `get_meeting_dates` tool with argument {_args}", _args=_args)
    await result_callback("""
    "1- 8AM Eastern on November 16 2024\n\n2- 9AM Eastern on November 16 2024\n\n3- 10AM Eastern on November 16 2024\n\n4- 11AM Eastern on November 16 2024\n\n5- 12PM Eastern on November 16 2024\n\n6- 1PM Eastern on November 16 2024\n\n7- 2PM Eastern on November 16 2024\n\n8- 3PM Eastern on November 16 2024\n\n9- 4PM Eastern on November 16 2024\n\n10- 5PM Eastern on November 16 2024\n\n11- 6PM Eastern on November 16 2024\n\n12- 7PM Eastern on November 16 2024\n\n"
    """)


async def transfer_call(
    function_name: str,
    tool_call_id: str,
    _args,
    llm,
    context: OpenAILLMContext,
    result_callback,
) -> None:
    await result_callback("CALL TRANSFERRED")


async def voicemail(
    function_name: str,
    tool_call_id: str,
    _args,
    llm,
    context,
    result_callback,
    transport
) -> None:
    logger.info("Invoking `voicemail` tool with argument {_args}", _args=_args)
    await transport.stop_dialout(PARTICIPANT_ID[0])
    await result_callback("VOICEMAIL DETECTED")


async def get_knowledge_base(
    function_name: str, tool_call_id: str, _args, llm, context: OpenAILLMContext, result_callback
) -> None:
    logger.info("Invoking `get_knowledge_base` tool with argument {_args}", _args=_args)

    kb_call_reason = _args.get("reason", "GENERAL_QUESTION")
    if kb_call_reason == "GENERAL_QUESTION":
        await llm.push_frame(TextFrame(random.choice(QUESTION_AFFIRMATION)))
    await result_callback("""Sure! Here are some random facts that could be associated with a solar panel company: 1. **History**: The company was founded in 2008 by a group of renewable energy enthusiasts who wanted to make solar power more accessible to homeowners and businesses. 2. **Headquarters**: Their headquarters is a net-zero energy building powered entirely by their own solar panels, showcasing their commitment to sustainability. 3. **Products**: They produce three main types of solar panels: monocrystalline, polycrystalline, and thin-film, catering to different customer needs and budgets. 4. **Innovation**: The company holds patents for advanced solar cell technology that increases efficiency by 20% compared to industry standards. 5. **Global Reach**: They have installed solar systems in over 40 countries and have manufacturing plants on three continents.6. **Community Impact**: For every 100 solar panels sold, they donate a panel to a school or community center in underprivileged areas. 7. **Workforce**: The company employs over 5,000 people, 40% of whom are in research and development roles. 8. **Recognition**: They won the “Green Energy Innovator of the Year” award in 2022 for their work on solar panels made from recycled materials. 9. **Sustainability**: Their panels are designed to last 25+ years and are 95% recyclable at the end of their life cycle. 10. **Customer Perks**: They offer a 25-year warranty and real-time monitoring systems that allow users to track energy production via an app. 11. **Mission Statement**: "Empowering the world with clean energy, one panel at a time." 12. **Energy Production**: The combined output of all their installations generates enough electricity to power over 2 million homes annually. 13. **R&D Efforts**: They are actively working on integrating solar panels into everyday items like backpacks and electric vehicles. 14. 
**Solar Farms**: The company has partnered with governments to develop large-scale solar farms, including one that spans over 10,000 acres. 15. **Future Goals**: By 2030, they aim to make solar power the most affordable energy source worldwide.  Would you like these customized for a specific scenario?""")


async def end_call(
    function_name: str,
    tool_call_id: str,
    _args,
    llm,
    context: OpenAILLMContext,
    result_callback,
    voice_provider: str,
    voice_id: str,
    transport: DailyTransport,
) -> None:
    logger.info("Invoking `end_call` tool with argument {_args}", _args=_args)
    end_call_sentence = _args.get("end_call_sentence", "Thank you for your time have a great day")
    estimated_time_end_call_sentence = 15
    logger.info(
        "Estimated End Call Sentence Time is {estimated_time_end_call_sentence}", estimated_time_end_call_sentence=estimated_time_end_call_sentence
    )
    if voice_provider == "cartesia":
        await say_agent_response(
            get_cartesia_static_response(
                end_call_sentence,
                voice_id,
                "sonic-english",
                output_format={"container": "raw", "encoding": "pcm_s16le", "sample_rate": 16000},
            ),
            transport,
            estimated_time_end_call_sentence,
            True,
        )
    await transport.stop_dialout(PARTICIPANT_ID[0])
    await result_callback("CALL ENDED BY ASSISTANT")



async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)
        stt = DeepgramSTTService(
            api_key=DEEPGRAM_API_KEY,
            audio_passthrough=False,
        )
        transport = DailyTransport(
            room_url,
            token,
            "Test-Bot",
            DailyParams(
                api_key=DAILY_API_KEY,
                audio_in_enabled=True,
                audio_out_enabled=True,
                vad_audio_passthrough=True,
                vad_enabled=True,
                transcription_enabled=False,
                vad_analyzer=SileroVADAnalyzer(params=VADParams(start_secs=0.2)),
            ),
        )

        tts = CartesiaTTSService(
            api_key=CARTESIA_API_KEY,
            voice_id="7360f116-6306-4e9a-b487-1235f35a0f21",
        )

        llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
        llm.register_function('get_knowledge_base', get_knowledge_base)
        llm.register_function('end_call', partial(end_call, transport=transport, voice_provider='cartesia', voice_id='7360f116-6306-4e9a-b487-1235f35a0f21'))
        llm.register_function('transfer_call', transfer_call)
        llm.register_function('voicemail', partial(voicemail, transport=transport))
        llm.register_function('get_meeting_dates', get_meeting_dates)

        tools = [
            ChatCompletionToolParam(
        type="function",
        function={
            "name": "get_knowledge_base",
            "description": """Always use this tool:
                - When the user has questions about the company
                - If the user is not showing interest and you need to convince them using information about the company
                - When the user has any general query or question
            """,
            "strict": True,  # type: ignore[typeddict-unknown-key]
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "User Question/Concern about Company",
                    },
                    "reason": {
                        "type": "string",
                        "enum": ["CONVINCE", "GENERAL_QUESTION"],
                        "description": "Reason why you are using this tool. It must be one of: `CONVINCE`, `GENERAL_QUESTION`.\
                            Use `CONVINCE` when you are convincing the user; otherwise use `GENERAL_QUESTION`",
                    },
                },
                "additionalProperties": False,
                "required": ["question", "reason"],
            },
        },
    ),
    ChatCompletionToolParam(
        type="function",
        function={
            "name": "end_call",
            "description": "Use this tool to end the call",
            "strict": True,  # type: ignore[typeddict-unknown-key]
            "parameters": {
                "type": "object",
                "properties": {
                    "end_call_sentence": {
                        "type": "string",
                        "description": "The end-call sentence that needs to be uttered by the AI",
                    }
                },
                "additionalProperties": False,
                "required": ["end_call_sentence"],
            },
        },
    ),
    ChatCompletionToolParam(
        type="function",
        function={
            "name": "transfer_call",
            "description": "Use this tool to transfer the call",
            "strict": True,  # type: ignore[typeddict-unknown-key]
            "parameters": {
                "type": "object",
                "properties": {
                    "transfer_number_sentence": {
                        "type": "string",
                        "description": "The sentence the AI needs to speak before transferring the call",
                    },
                    "transfer_number": {
                        "type": "string",
                        "description": "The number to which we need to transfer the call",
                    },
                },
                "additionalProperties": False,
                "required": ["transfer_number", "transfer_number_sentence"],
            },
        },
    ),
    ChatCompletionToolParam(
        type="function",
        function={
            "name": "voicemail",
            "description": f"Use this tool if you reach voicemail. Here are some examples: {'\n'.join(VOICEMAIL_EXAMPLES)}",
            "parameters": {},
        },
    ),
    ChatCompletionToolParam(
        type="function",
        function={
            "name": "get_meeting_dates",
            "description": "Use this tool to get the meeting dates in order to schedule a meeting with the user. The output of this\
                        tool is a list of slots which you should recommend to the user. The slots are given as a numbered list in ascending order\
                            (earliest slot to latest slot)",
            "parameters": {},
        },
    )]
        messages = [
            {
                "role": "system",
                "content": """Main Prompt""",
            },
        ]

        context = OpenAILLMContext(messages, tools)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                stt,
                context_aggregator.user(),  # User responses
                llm,  # LLM
                tts,  # TTS
                transport.output(),  # Transport bot output
                context_aggregator.assistant(),  # Assistant spoken responses
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                report_only_initial_ttfb=True,
            ),
        )

        @transport.event_handler("on_call_state_updated")
        async def on_call_state_updated(transport, state: str) -> None:
            logger.info("Call State Updated, state: {state}", state=state)

            async def _dialout_retry_handler() -> None:
                for i in range(3):
                    logger.info("Attempting a Dial-Out, Attempt: {attempt}", attempt=i + 1)
                    await transport.start_dialout(
                        {"phoneNumber": TO, "callerId": FROM_DAILY_CALLER_ID, "video": False}
                    )
                    await asyncio.sleep(15)
                    current_participant_count = transport.participant_counts()
                    if current_participant_count["present"] >= 2:
                        return
                raise Exception("Unable to perform a dial-out for the Daily room")

            async def _dialout_task_exception(dialout_task: asyncio.Task) -> None:
                if dialout_task.exception():
                    await task.queue_frames([EndFrame()])
                else:
                    logger.info("Dial-out completed successfully.")

            def _handle_dialout_completion(dialout_task: asyncio.Task) -> None:
                asyncio.create_task(_dialout_task_exception(dialout_task))

            if state == "joined":
                dialout_task = transport.input().get_event_loop().create_task(_dialout_retry_handler())
                dialout_task.add_done_callback(_handle_dialout_completion)

        # Event handler for on_first_participant_joined
        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant) -> None:
            logger.info("First Participant Joined with ID {participant_id}", participant_id=participant["id"])
            PARTICIPANT_ID[0] = participant["id"]
            await transport.capture_participant_transcription(participant["id"])

       

        # Event handler for on_dialout_answered
        @transport.event_handler("on_dialout_answered")
        async def on_dialout_answered(transport, cdata) -> None:
            logger.info("Dial-Out Answered with data as follow: {data}", data=cdata)
            await task.queue_frames([LLMMessagesFrame(messages)])

        # Event handler for on_participant_left
        @transport.event_handler("on_participant_left")
        async def on_participant_left(transport, participant, reason) -> None:
            logger.info("Following Participant Left {participant}", participant=participant)
            await task.queue_frames([EndFrame()])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

Expected behavior

  • The InputAudioRawFrame objects should be pushed into the asyncio queue

Actual behavior

  • The InputAudioRawFrame objects are not being pushed into the asyncio queue

Logs

  • The bug happens at 2024-11-13T20:21:00.339: as you can see, the TTS service is in the middle of speaking, and after that we don't receive anything else.
2024-11-13T20:21:00.339 app[e82de19f7d4528] mia [info] 2024-11-13 20:21:00.339 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:21:00.338 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.processors:run_tts:120 | Generating TTS: [Could you please]

2024-11-13T20:21:00.339 app[e82de19f7d4528] mia [info] 2024-11-13 20:21:00.339 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:21:00.339 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.processors.metrics.frame_processor_metrics:start_tts_usage_metrics:79 | CustomCartesiaTTSService#0 usage characters: 16

2024-11-13T20:21:00.340 app[e82de19f7d4528] mia [info] 2024-11-13 20:21:00.339 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:21:00.339 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:59 | CustomCartesiaTTSService#0 processing time: 0.0005865097045898438

2024-11-13T20:21:00.340 app[e82de19f7d4528] mia [info] 2024-11-13 20:21:00.340 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:21:00.339 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.processors:report_metrics:769 | Following metric data was reported: CustomCartesiaTTSService#0 - ProcessingMetricsData - 0.0005865097045898438 for following call 773c7908-c060-4357-9ee2-2f830d054ec2

2024-11-13T20:21:00.490 app[e82de19f7d4528] mia [info] 2024-11-13 20:21:00.490 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:21:00.489 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.processors:report_metrics:769 | Following metric data was reported: CustomCartesiaTTSService#0 - TTFBMetricsData - 0.1501784324645996 for following call 773c7908-c060-4357-9ee2-2f830d054ec2

2024-11-13T20:21:01.458 app[e82de19f7d4528] mia [info] 2024-11-13 20:21:01.457 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:21:01.457 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_output:_bot_stopped_speaking:328 | Bot stopped speaking

2024-11-13T20:21:22.000 app[e82de19f7d4528] mia [info] 2024-11-13 20:21:22.000 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:21:21.999 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.services.tts_api:get_cartesia_static_response:53 | Cartesia TTS response cached.

2024-11-13T20:21:43.835 app[e82de19f7d4528] mia [info] 2024-11-13 20:21:43.834 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:21:43.834 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.services.tts_api:get_cartesia_static_response:53 | Cartesia TTS response cached.

2024-11-13T20:22:03.781 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.781 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:03.780 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.services.daily:on_participant_left:535 | Participant left faf7b96c-a3bf-4cd3-a59b-93d99f63965e

2024-11-13T20:22:03.782 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.781 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDERR: {"timestamp":"2024-11-13T20:22:03.781217Z","level":"ERROR","fields":{"message":"no subscription for consumer: ConsumerId(\"47df4197-7642-4f7a-aeb0-33f5859551cb\")"},"target":"daily_core::call_manager::events::from_sfu::soup_consumer_closed"}

2024-11-13T20:22:03.782 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.782 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:03.781 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.transport_handlers:on_participant_left:296 | Following Participant Left {'info': {'isOwner': False, 'permissions': {'hasPresence': True, 'canSend': ['customAudio', 'screenAudio', 'screenVideo', 'camera', 'microphone', 'customVideo'], 'canAdmin': ['participants', 'streaming', 'transcription']}, 'isLocal': False, 'userName': '+15612544007', 'joinedAt': 1731529142}, 'id': 'faf7b96c-a3bf-4cd3-a59b-93d99f63965e'}

2024-11-13T20:22:03.782 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.782 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:03.781 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.transport_handlers:on_participant_left:300 | Attempting to Stop Recording

2024-11-13T20:22:03.783 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.782 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:03.782 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.transport_handlers:on_participant_left:302 | Recording has been stopped

2024-11-13T20:22:03.783 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.783 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:03.782 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.transport_handlers:on_participant_left:306 | The Participants in Call Now: {'local': {'info': {'userName': 'groundupsolar-BOT', 'joinedAt': 1731529137, 'isLocal': True, 'isOwner': False, 'permissions': {'hasPresence': True, 'canAdmin': ['participants', 'transcription'], 'canSend': ['camera', 'microphone', 'screenAudio', 'screenVideo', 'customVideo', 'customAudio']}}, 'id': '8a737641-a975-4524-9a4a-24dcd0ceb506', 'media': {'customAudio': {}, 'screenVideo': {'subscribed': 'unsubscribed', 'state': 'off', 'offReasons': ['user']}, 'customVideo': {}, 'screenAudio': {'subscribed': 'unsubscribed', 'state': 'off', 'offReasons': ['user']}, 'microphone': {'subscribed': 'unsubscribed', 'state': 'playable', 'track': {'id': '457d03f5-d3e8-4246-87c7-b4737529c1c3'}}, 'camera': {'offReasons': ['user'], 'state': 'off', 'subscribed': 'unsubscribed'}}}}

2024-11-13T20:22:03.818 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.817 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:03.817 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:122 | User stopped speaking

2024-11-13T20:22:03.832 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.831 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDERR: {"timestamp":"2024-11-13T20:22:03.831744Z","level":"ERROR","fields":{"message":"Failed to close consumer: ConsumerNoLongerExists(ConsumerId(\"47df4197-7642-4f7a-aeb0-33f5859551cb\"))"},"target":"daily_core::call_manager::events::subscription::common"}

2024-11-13T20:22:03.889 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.888 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:03.887 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:115 | User started speaking

2024-11-13T20:22:03.889 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.889 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDERR: tasks cancelled error:

2024-11-13T20:22:03.889 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:03.889 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:03.889 | ERROR | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | sentry_sdk.integrations.logging:sentry_patched_callhandlers:98 | tasks cancelled error:

2024-11-13T20:22:04.102 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.102 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.101 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.services.deepgram:_disconnect:201 | DeepgramSTTService#0: Disconnected from Deepgram

2024-11-13T20:22:04.107 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.106 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.106 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.services.openai:_stream_chat_completions:165 | Generating chat: [{"role": "system", "content": "\n[Identity]\nYou are calling potential leads interested in purchasing solar panels. Your goal is to verify if the customer qualifies and then schedule a meeting if they qualify. A Potential Customer who might be interested in Solar Panels\n\n[Style]\n\n[Style]\n- You Respond in a natural, human-like way.\n- You speak in the Hemingway style.\n- You do not overly flatter people.\n- You love to be brief and concise.\n- You are not wordy nor verbose.\n- You ask only one question at a time.\n- Be creative when talking to the end user meaning do not repeat the same sentences.\n\n\n[Input]\n#QUALIFYING_QUESTION_GRAPH\n {\"Do you have a single family home?\": {\"Yes\": \"What is your average utility bill?\", \"No\": \"NOT_QUALIFY_FINAL_RESPONSE_<Unfortunately, you do not qualify for our Solar Panel Program. Thank you for your time. Have a great day.>\"}, \"What is your average utility bill?\": {\"More than $150. You must get a exact number. Re-ask this question if they don't provide a number.\": \"Does your roof get direct sunlight?\", \"Less than $150\": \"NOT_QUALIFY_FINAL_RESPONSE_<Unfortunately, you do not qualify for our Solar Panel Program. Thank you for your time. Have a great day.>\"}, \"Does your roof get direct sunlight?\": {\"Yes\": \"please confirm your property address\", \"No\": \"NOT_QUALIFY_FINAL_RESPONSE_<Unfortunately, you do not qualify for our Solar Panel Program. Thank you for your time. 
Have a great day.>\"}, \"Please confirm your property address\": {\"User must provide a valid address\": \"QUALIFY_MEETING_SCHEDULE_<>\"}}\n - This is a Qualifying Question JSON in the following format:\n {\n \"Q_1\":{\n \"Q_1_User_Answer_1\":\"Q_2\",\n \"Q_1_User_Answer_2\":\"Q_3\"\n },\n \"Q_2\":{\n \"Q_2_User_Answer_1\":\"<QUALIFY/NOT_QUALIFY>_<ACTION>_<ACTION_INPUT>\",\n \"Q_2_User_Answer_2\":\"Q_5\"\n },\n \"Q_3\":{\n \"Q_3_User_Answer_1\":\"<QUALIFY/NOT_QUALIFY>_<ACTION>_<ACTION_INPUT>\",\n \"Q_3_User_Answer_2\":\"<QUALIFY/NOT_QUALIFY>_<ACTION>_<ACTION_INPUT>\"\n }\n }\n - You need to ask the Qualifying Question from the User and Depending on their on response using `QUALIFYING_QUESTION_GRAPH`, ask the next Qualifying Question \n unless you reached `<QUALIFY/NOT_QUALIFY>_<ACTION>_<ACTION_INPUT>`\n **Take Note of following when navigating**\n - If you reach QUALIFY_TRANSFER_CALL_<ACTION_INPUT>:\n - <ACTION_INPUT> is the transfer number. You need to invoke `transfer_call` where `transfer_number` is coming from <ACTION_INPUT> and generate `transfer_number_sentence` as per conversation context.\n - If you reach QUALIFY_SEND_SMS_<ACTION_INPUT>:\n - <ACTION_INPUT> is a string for SMS content. You need to invoke `send_sms` where `send_content` is coming from <ACTION_INPUT>.\n - If you reach NOT_QUALIFY_FINAL_RESPONSE_<ACTION_INPUT>:\n - <ACTION_INPUT> is a string for end response. You need to invoke `end_call` where `end_call_sentence` is coming from <ACTION_INPUT>.\n - If you reach NOT_QUALIFY_SEND_SMS_<ACTION_INPUT>:\n - <ACTION_INPUT> is a string for SMS content. You need to invoke `send_sms` where `send_content` is coming from <ACTION_INPUT>.\n - If you reach NOT_QUALIFY_TRANSFER_CALL_<ACTION_INPUT>:\n - <ACTION_INPUT> is the transfer number. 
You need to invoke `transfer_call` where `transfer_number` is coming from <ACTION_INPUT> and generate `transfer_number_sentence` as per conversation context.\n\n[Instructions when Doing Tasks]\n- Do NOT REPEAT Yourself.\n- DO NOT ASK PERMISSIONS from the USER.\n- DO NOT APOLOGISE to the USER.\n- DO NOT ASK ANY EXTRA QUESTIONS that are not listed in the `QUALIFYING_QUESTION_GRAPH`.\n- If user is not interested in the call then try convincing them using information from `get_knowledge_base`.\n- If user have any questions about the company then answer them using information from `get_knowledge_base`.\n- Always invoke `company_rest_api_endpoint` using \n- Whenever you need to look up something using your knowledge base, use `get_knowledge_base`\n- After every answer from the user given to the `QUALIFYING_QUESTION_GRAPH`, first give a relevant affirmation and then move to the next qualifying question.\n- Provide a direct answer to the user's question first.\n- Always first answer the user question about the company using `get_knowledge_base` before moving back to doing your tasks.\nAfter booking the meeting, obtain a picture of the electric bill, specifically the last 12 months usage, so we can calculate their average monthly bill and the amount of savings we can provide them.\n\n\n[Tasks]\n1. Say, \"Hi, Can I please speak to Tom\"\n<wait for user to respond>\n2. Say, \"This is Mike with Ground Up Solar calling on a recorded line. You had recently inquired about solar panels for your home. Ground up Solar can reduce your monthly payment 50% or more from day one with no money down. Are you still in the market?\"\n<wait for user to respond>\n3. Now start navigating through `QUALIFYING_QUESTION_GRAPH`.\n4. Now Only if you reach `QUALIFY_MEETING_SCHEDULE` then you need to schedule a meeting using the following rules:\n a. Use `get_meeting_dates` tool to bring in opening meeting dates. As soon as tool has been executed then immediately move to step b.\n b. 
Now, you need to recommend a single slot to the user from the the slots gotten from `get_meeting_dates` and wait for user to respond to the proposed time slot. If user agrees to the proposed slot then immediately move to step c without saying anything else. Otherwise if user DOES NOT agree to the proposed slot, ask the user the day and time that works best for them and wait for them to respond.\n **Instructions for Step b**\n - Today Date is 2024-11-13 3:18PM\n - If proposed slot is today then say today instead of uttering the slot as it is otherwise state the slot as you got it from `get_meeting_dates` without changing a word.\n - If proposed slot is tomorrow then say tomorrow instead of uttering the slot as it is otherwise state the slot as you got it from `get_meeting_dates` without changing a word.\n - DO NOT recommend multiple times at once. Recommend only one time at a time and then wait for user response.\n - DO NOT say the Year Part.\n - DO NOT say EST.\n - You must confirm both the Day and Time of the Meeting from the user.\n - Start by first recommending the earliest times available.\n c. Confirm the User has explicitly agreed to the proposed meeting time, if not then move back to step b otherwise move to step d.\n d. Ask the user to text a picture of both front and back of their power bills from the last 12 months to the phone number I am calling you from. This will allow us to prepare an accurate quote of how much money we can save you for our meeting with solar expert. And Then move to step e. \n e. End the call using `end_call` tool after user has agreed to a meeting time while invoking `end_call_sentence` with the conversation context.\n", "name": "system"}, {"role": "assistant", "content": "Hi, can I please speak to Tom?"}, {"role": "user", "content": "Yeah. This is Tom. Who's this?"}, {"role": "assistant", "content": "This is Mike with Ground Up Solar calling on a recorded line. You had recently inquired about solar panels for your home. 
Ground Up Solar can reduce your monthly payment by 50% or more from day one with no money down. Are you still in the market?"}, {"role": "user", "content": "Yeah. I'm still in the market."}, {"role": "assistant", "content": "Great to hear that. Do you have a single-family home?"}, {"role": "user", "content": "So how does this work? Would I get a loan or something?"}, {"role": "assistant", "content": "I am glad you asked that!"}, {"role": "assistant", "tool_calls": [{"id": "call_oDUludYQZl83X7RtOOogXvRG", "function": {"name": "get_knowledge_base", "arguments": "{\"question\": \"How does the solar panel financing work?\", \"reason\": \"GENERAL_QUESTION\"}"}, "type": "function"}]}, {"role": "tool", "content": "\"CompanyFacts\\n \\n\\n CompanyHistory\\n GroundUp Solar was created to provide affordable solar energy solutions to homeowners\\n\\n WhyUs\\n We offer some of the most advanced solar panels in the market We hold over 25 patents for highefficiency solar cells ensuring the largest cost savings\\n\\n FAQS\\n 1 How much does it cost \\n We offer the best prices in the south The cost of a solar system depends on the size location and type of installation Schedule an appointment to have a no obligation quote\\n\\n 2 Will installing solar panels increase my property value \\n Yes homes with solar systems typically sell faster and for more money than homes without panels\\n\\n 3 What areas do you cover \\n we have the best prices on solar in the southern United States\\n\\n 4 Can solar panels completely power my home \\n Yes a properly sized solar system can meet 100 of your home\\u2019s energy needs\\n\\n 5 Do you cover Florida \\n Yes we have the best prices on solar in Florida\\n\\n25 Can solar panels completely power my home \\n Yes a properly sized solar system can meet 100 of your home\\u2019s energy needs\\n\\n 26 How do I get started with solar installation \\n To get started schedule a consultation with a solar expert\\n\\n 27 Do you cover North Arkansas \\n 
Yes we have the best prices on solar in Arkansas\\n\\n 28 Do you cover Mississippi \\n Yes we have the best prices on solar in Mississippi\\n\\n 29 Do you cover West Virginia \\n Yes we have the best prices on solar in West Virginia\\n\\n 30 Do you work in North Carolina \\n Yes we have the best prices on solar in North Carolina\\n\\n 31 How can you do that \\n As soon as your credit is approved we can start the process\\n\\n \\n\\n PricingRules\\n None\\n\\n16 What types of panels do you provide \\n We provide all three main types of panels monocrystalline polycrystalline and thin film\\n\\n 17 Do you charge for the estimate \\n No We provide a no obligation quote\\n\\n 18 \\n \\n\\n 19 \\n \\n\\n 20 Do you do generators \\n Yes We provide generators and power walls as well as panels Our company offers topnotch generator services to ensure you have a reliable backup power source\\n\\n 21 Will solar panels damage my roof \\n Solar panels will not damage your roof In fact they can help protect the roof from weathering\\n\\n 22 \\n \\n\\n 23 Do I need to replace my roof before installing solar panels \\n No If your roof needs to be replaced our partners can help with that just let me know\\n\\n 24 Will installing solar panels increase my property value \\n Yes homes with solar systems typically sell faster and for more money than homes without panels\"", "tool_call_id": "call_oDUludYQZl83X7RtOOogXvRG"}, {"role": "assistant", "content": "We offer financing options that allow you to start with no money down. As soon as your credit is approved, we can begin the process. Now, do you have a single-family home?"}, {"role": "user", "content": "Goddamn it."}, {"role": "assistant", "content": "Do you have a single-family home?"}, {"role": "user", "content": "Now I want to do this. Yeah. Did you go deaf on me? I do have a single family home."}, {"role": "assistant", "content": "Great! 
What's your average utility bill?"}, {"role": "user", "content": "I don't know."}, {"role": "assistant", "content": "Could you please"}, {"role": "user", "content": "A few 100"}]

2024-11-13T20:22:04.161 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.160 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.160 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:122 | User stopped speaking

2024-11-13T20:22:04.336 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.336 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.335 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:115 | User started speaking

2024-11-13T20:22:04.337 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.336 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.336 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:44 | CustomOpenAILLMService#0 TTFB: 0.23023509979248047

2024-11-13T20:22:04.337 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.337 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.337 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.processors.metrics.frame_processor_metrics:stop_processing_metrics:59 | CustomOpenAILLMService#0 processing time: 0.23050856590270996

2024-11-13T20:22:04.337 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.337 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.337 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.processors:report_metrics:769 | Following metric data was reported: CustomOpenAILLMService#0 - TTFBMetricsData - 0.23023509979248047 for following call 773c7908-c060-4357-9ee2-2f830d054ec2

2024-11-13T20:22:04.338 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.337 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.337 | INFO | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | bigly_pipecat.processors:report_metrics:769 | Following metric data was reported: CustomOpenAILLMService#0 - ProcessingMetricsData - 0.23050856590270996 for following call 773c7908-c060-4357-9ee2-2f830d054ec2

2024-11-13T20:22:04.385 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.384 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.384 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:122 | User stopped speaking

2024-11-13T20:22:04.821 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.820 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.820 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:115 | User started speaking

2024-11-13T20:22:04.933 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:04.933 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:04.932 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:122 | User stopped speaking

2024-11-13T20:22:05.533 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:05.532 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:05.532 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:115 | User started speaking

2024-11-13T20:22:05.674 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:05.674 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:05.673 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:122 | User stopped speaking

2024-11-13T20:22:05.865 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:05.864 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:05.864 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:115 | User started speaking

2024-11-13T20:22:05.909 app[e82de19f7d4528] mia [info] 2024-11-13 20:22:05.909 | INFO | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-13 20:22:05.908 | DEBUG | CALL-ID=773c7908-c060-4357-9ee2-2f830d054ec2 | pipecat.transports.base_input:_handle_interruptions:122 | User stopped speaking

We also added some logging statements to the Pipecat source, which led us to the conclusion that no InputAudioRawFrames are being placed in _audio_in_queue. The logs below show that InputAudioRawFrames keep arriving from Daily but are not pushed into the pipeline:

2024-11-15 03:22:24.047 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.services.daily:_audio_in_task_handler:666 | Getting Daily Audio InputAudioRawFrame#3168(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:22:24.047 | INFO     | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-15 03:22:24.047 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.services.daily:_audio_in_task_handler:666 | Getting Daily Audio InputAudioRawFrame#3168(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:22:24.047 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.base_input:_audio_task_handler:165 | Pushing the Audio Frame in PipeCat Pipeline InputAudioRawFrame#3168(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:22:24.047 | INFO     | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-15 03:22:24.047 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.base_input:_audio_task_handler:165 | Pushing the Audio Frame in PipeCat Pipeline InputAudioRawFrame#3168(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:22:24.067 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.services.daily:_audio_in_task_handler:666 | Getting Daily Audio InputAudioRawFrame#3169(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:22:24.067 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.base_input:_audio_task_handler:165 | Pushing the Audio Frame in PipeCat Pipeline InputAudioRawFrame#3169(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:22:24.067 | INFO     | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-15 03:22:24.067 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.services.daily:_audio_in_task_handler:666 | Getting Daily Audio InputAudioRawFrame#3169(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:22:24.067 | INFO     | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-15 03:22:24.067 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.base_input:_audio_task_handler:165 | Pushing the Audio Frame in PipeCat Pipeline InputAudioRawFrame#3169(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:22:24.087 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.services.daily:_audio_in_task_handler:666 | Getting Daily Audio InputAudioRawFrame#3170(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
<Crash Happens>

2024-11-15 03:23:50.137 | INFO     | CALL-ID= | bigly_pipecat.bot_spawner:stream_output:62 | STDOUT: 2024-11-15 03:23:50.137 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.services.daily:_audio_in_task_handler:666 | Getting Daily Audio InputAudioRawFrame#7344(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-15 03:23:50.157 | INFO     | CALL-ID=dce7c7e1-dc7b-4dd9-83b6-d51f35656f38 | pipecat.transports.services.daily:_audio_in_task_handler:666 | Getting Daily Audio InputAudioRawFrame#7345(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
@aconchillo
Copy link
Contributor

My suspicion is that a task is not executing properly, which is causing a blockage in the queue.

Usually when this happens it's because the asyncio event loop is blocked, and what blocks it is typically some synchronous (blocking) call. It can also be that an exception is being raised but not surfaced properly. There's this bug reported: #684

If you are running a custom Pipecat (i.e. from the repo) would it be possible for you to re-raise the exception in the line mentioned in that bug report?
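To illustrate the failure mode, here is a toy sketch (made-up names, not actual Pipecat code): a single synchronous call inside one coroutine starves every other task on the loop, so a queue consumer (think `_audio_task_handler`) never gets a chance to run while the producer holds the loop.

```python
import asyncio
import time

def blocking_work():
    # Stand-in for a synchronous call (e.g. CPU-bound model inference).
    time.sleep(0.05)

async def consumer(queue: asyncio.Queue, seen_qsizes: list):
    # Stand-in for a queue-draining task like _audio_task_handler.
    while True:
        item = await queue.get()
        if item is None:
            return
        seen_qsizes.append(queue.qsize())

async def producer(queue: asyncio.Queue, offload: bool):
    for i in range(3):
        await queue.put(i)
        if offload:
            await asyncio.to_thread(blocking_work)  # loop stays free
        else:
            blocking_work()  # blocks the event loop; consumer starves
    await queue.put(None)

async def run(offload: bool) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    seen: list = []
    await asyncio.gather(producer(queue, offload), consumer(queue, seen))
    return seen
```

Without offloading, the consumer only runs after the producer has finished, so items pile up in the queue (it observes queue sizes 3, 2, 1); with `asyncio.to_thread` the loop stays responsive and each item is drained as it arrives (queue size 0 every time).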

Repro steps

The bug is difficult to reproduce as it occurs randomly. However, it is most often observed later in the call when the user asks a question, and a tool is invoked. Additionally, we noticed that when this bug happens, the TTS response is abruptly cut off in the middle.

Does this happen with any specific function call or any of them?

@keenborder786
Copy link
Author

@aconchillo What I have usually noticed is that it happens when the bot uses the get_meeting_dates or get_knowledge_base tool. I feel like I should add exception handling for those tools as well.

@keenborder786
Copy link
Author

@aconchillo We noticed it sometimes also occurs at the very start of the call. There is no specific pattern!

@keenborder786
Copy link
Author

keenborder786 commented Nov 18, 2024

@aconchillo No, we are NOT using a forked Pipecat.

@rusg77
Copy link

rusg77 commented Nov 19, 2024

@aconchillo
SileroOnnxModel.__call__ is blocking the event loop.

Hard to reproduce. Any suggestions on how to fix it?
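One common mitigation (sketched here with hypothetical names, not the actual Pipecat API) is to run the synchronous ONNX forward pass in a thread pool so the event loop stays free while inference runs:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical wrapper, not the actual Pipecat API: `model` stands in for
# the synchronous Silero ONNX callable.
class AsyncVAD:
    def __init__(self, model):
        self._model = model
        # A dedicated single-worker pool keeps inference calls serialized.
        self._executor = ThreadPoolExecutor(max_workers=1)

    async def confidence(self, audio_chunk: bytes) -> float:
        loop = asyncio.get_running_loop()
        # Run the blocking forward pass off the event loop.
        return await loop.run_in_executor(self._executor, self._model, audio_chunk)
```

The trade-off is a small per-chunk hop into the executor, which should be negligible next to the inference time itself.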

@keenborder786
Copy link
Author

@aconchillo Also, as you pointed out, no exception is being caught there either, so we never get to know!

@rusg77
Copy link

rusg77 commented Nov 22, 2024

@aconchillo Finally, good logs.
How to reproduce: interrupt the bot until it gets stuck.
I'm attaching the full log, but now it's clear what's happening.

Flow:

  1. UserStartedSpeakingFrame#2 is triggered - https://github.com/rusg77/pipecat/blob/2b543057357d868f6029c8bccf9471305c4ea444/src/pipecat/processors/audio/vad/silero.py#L89
  2. StartInterruptionFrame pushed. https://github.com/rusg77/pipecat/blob/2b543057357d868f6029c8bccf9471305c4ea444/src/pipecat/processors/audio/vad/silero.py#L64
  3. All frame processors are starting interruption https://github.com/rusg77/pipecat/blob/2b543057357d868f6029c8bccf9471305c4ea444/src/pipecat/processors/frame_processor.py#L156
  4. Here I added more detailed log messages: https://github.com/rusg77/pipecat/blob/2b543057357d868f6029c8bccf9471305c4ea444/src/pipecat/processors/frame_processor.py#L191
2024-11-21 22:29:33.803 | DEBUG    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:_handle_interruptions:122 | User started speaking
2024-11-21 22:29:33.803 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:193 | DailyInputTransport _start_interruption: 10
2024-11-21 22:29:33.803 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:195 | DailyInputTransport _start_interruption: 20
2024-11-21 22:29:33.803 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:198 | DailyInputTransport _start_interruption: 30
2024-11-21 22:29:33.803 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__create_push_task:221 | DailyInputTransport __create_push_task: 10
2024-11-21 22:29:33.804 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__create_push_task:223 | DailyInputTransport __create_push_task: 20
2024-11-21 22:29:33.804 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__create_push_task:225 | DailyInputTransport __create_push_task: 30
2024-11-21 22:29:33.804 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:201 | DailyInputTransport _start_interruption: 40
2024-11-21 22:29:33.804 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__internal_push_frame:210 | DailyInputTransport Pushing StartInterruptionFrame#2 from DailyInputTransport#0 to DeepgramSTTService#0
2024-11-21 22:29:33.804 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:193 | DeepgramSTTService _start_interruption: 10
2024-11-21 22:29:33.804 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:195 | DeepgramSTTService _start_interruption: 20
2024-11-21 22:29:33.804 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:198 | DeepgramSTTService _start_interruption: 30
2024-11-21 22:29:33.805 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__create_push_task:221 | DeepgramSTTService __create_push_task: 10
2024-11-21 22:29:33.805 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__create_push_task:223 | DeepgramSTTService __create_push_task: 20
2024-11-21 22:29:33.805 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__create_push_task:225 | DeepgramSTTService __create_push_task: 30
2024-11-21 22:29:33.805 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:201 | DeepgramSTTService _start_interruption: 40
2024-11-21 22:29:33.805 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__internal_push_frame:210 | DeepgramSTTService Pushing StartInterruptionFrame#2 from DeepgramSTTService#0 to OpenAIUserContextAggregator#0
2024-11-21 22:29:33.805 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:193 | OpenAIUserContextAggregator _start_interruption: 10
2024-11-21 22:29:33.806 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:195 | OpenAIUserContextAggregator _start_interruption: 20
2024-11-21 22:29:33.806 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__internal_push_frame:212 | OpenAIUserContextAggregator Done pushing from OpenAIUserContextAggregator#0 to OpenAILLMService#0
2024-11-21 22:29:33.806 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__internal_push_frame:210 | OpenAILLMService Pushing LLMFullResponseEndFrame#2 from OpenAILLMService#0 to CartesiaTTSService#0
2024-11-21 22:29:33.807 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__internal_push_frame:212 | OpenAILLMService Done pushing LLMFullResponseEndFrame#2 from OpenAILLMService#0 to CartesiaTTSService#0
2024-11-21 22:29:33.818 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.services.daily:read_next_audio_frame:271 | len(audio): 640
  5. For some reason (a new frame?) OpenAIUserContextAggregator gets stuck on await self.__push_frame_task, which blocks the entire pipeline
  6. After this event you can see in the log that the audio queue starts to grow
2024-11-21 22:29:33.819 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:69 | Pushing audio frame queue size 0
2024-11-21 22:29:33.819 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:71 | Done pushing audio frame InputAudioRawFrame#1341(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-21 22:29:33.838 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.services.daily:read_next_audio_frame:271 | len(audio): 640
2024-11-21 22:29:33.838 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:67 | Pushing audio frame InputAudioRawFrame#1342(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-21 22:29:33.839 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:69 | Pushing audio frame queue size 1
2024-11-21 22:29:33.839 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:71 | Done pushing audio frame InputAudioRawFrame#1342(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-21 22:29:33.858 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.services.daily:read_next_audio_frame:271 | len(audio): 640
2024-11-21 22:29:33.858 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:67 | Pushing audio frame InputAudioRawFrame#1343(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-21 22:29:33.859 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:69 | Pushing audio frame queue size 2
2024-11-21 22:29:33.859 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:71 | Done pushing audio frame InputAudioRawFrame#1343(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-21 22:29:33.878 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.services.daily:read_next_audio_frame:271 | len(audio): 640
2024-11-21 22:29:33.878 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:67 | Pushing audio frame InputAudioRawFrame#1344(pts: None, size: 640, frames: 320, sample_rate: 16000, channels: 1)
2024-11-21 22:29:33.879 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.transports.base_input:push_audio_frame:69 | Pushing audio frame queue size 3

https://github.com/rusg77/pipecat/blob/2b543057357d868f6029c8bccf9471305c4ea444/src/pipecat/transports/base_input.py#L69

  7. Once I hang up, an EndFrame is triggered and the pipeline finally cancels await self.__push_frame_task
2024-11-21 22:29:52.150 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__internal_push_frame:210 | DailyInputTransport Pushing EndFrame#0 from DailyInputTransport#0 to DeepgramSTTService#0
2024-11-21 22:29:52.150 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:_start_interruption:198 | OpenAIUserContextAggregator _start_interruption: 30
2024-11-21 22:29:52.150 | TRACE    | CALL-ID=76a510df-9ab8-4fa1-bc31-2ad7f49382c3 | pipecat.processors.frame_processor:__create_push_task:221 | OpenAIUserContextAggregator __create_push_task: 10

@GavinVS
Copy link

GavinVS commented Nov 25, 2024

So I'm also experiencing this - spent some time debugging and basically found two issues:

First, in _start_interruption() (frame_processor.py, line 222), the two awaits need to be wrapped in error handling. Otherwise it's possible for them to throw an error and we exit the method without restarting the handler loop.

Second (and I think this is actually the cause of the issue), in __push_frame_task_handler() and maybe also __input_frame_task_handler() in the same file, the code depends on catching the CancelledError to break the loop. However, if any downstream (nested) awaits catch the CancelledError, it won't be propagated up to here and the loop won't break. Then _start_interruption() hangs forever because it's sitting there in await self.__cancel_push_task()

I checked, and some of the processors definitely catch the CancelledError; one example is CartesiaTTSService, line 257. At this point, it becomes a race condition: if we happen to be inside that task when the interruption starts, the CancelledError won't break the outer loop.
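A minimal, self-contained illustration of that race (hypothetical names, not the Pipecat code): a nested await that swallows CancelledError keeps the outer handler loop alive after cancel().

```python
import asyncio

async def worker(swallow: bool) -> str:
    # Stand-in for a nested await (e.g. a TTS service) that may
    # catch CancelledError instead of letting it propagate.
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        if not swallow:
            raise  # correct: let the cancellation reach the caller
        # buggy path: swallow the cancellation and return normally
    return "survived"

async def handler(swallow: bool) -> str:
    # Loop that relies on CancelledError to break, like
    # __push_frame_task_handler.
    try:
        while True:
            result = await worker(swallow)
            return f"loop kept running: {result}"
    except asyncio.CancelledError:
        return "loop exited on cancel"

async def demo(swallow: bool) -> str:
    task = asyncio.create_task(handler(swallow))
    await asyncio.sleep(0)  # let the handler reach its inner await
    task.cancel()
    return await task
```

With swallow=True the handler never sees the CancelledError, so the real pipeline-side await would wait forever; catching CancelledError for cleanup is fine, but it must be re-raised afterwards.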

@jamsea
Copy link
Contributor

jamsea commented Nov 26, 2024

I'm deeply investigating this. For demos WITHOUT LLM function calls I can't re-create it. My working theory is that some error related to function calling in openai.py isn't handled as expected when there are a lot of interruptions, causing things to get into a weird state.

@ajobi-uhc
Copy link

ajobi-uhc commented Nov 26, 2024

I have reproduced this as well, using WebSockets, NOT the Daily transport layer. I believe it has to do with the base input transport layer: the audio_task_handler appears to be blocked, since we see no logs from it while the issue is occurring, which suggests the task has been canceled or is just hanging. No function calls are being used in my repro.

@vipyne
Copy link
Member

vipyne commented Nov 26, 2024

@ajobi-uhc could you share your repro code? or logs?

@ajobi-uhc
Copy link

ajobi-uhc commented Nov 26, 2024

Can't share repro code very easily yet, but these are the logs while the bug is active (i.e. the bot is not answering but the pipeline is running). See how we keep adding frames via the push_audio_frame task without the queue ever being drained. I have logging in the audio_task_handler, but nothing was being logged, as the task (I believe) is somehow blocked.

app-1  | 19:53:09 | INFO     | _handle_vad                    | User started speaking 1732650789.7765822
app-1  | 19:53:09 | DEBUG    | _handle_interruptions          | User started speaking
app-1  | 19:53:09 | DEBUG    | process_frame                  | > Frame: StartInterruptionFrame#7
app-1  | 19:53:09 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 92, Full: False
app-1  | 19:53:10 | DEBUG    | _bot_stopped_speaking          | Bot stopped speaking
app-1  | 19:53:10 | INFO     | process_frame                  | Bot stopped speaking.
app-1  | 19:53:11 | INFO     | websocket_handler              | Pushing audio frame
app-1  | 19:53:12 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 469, Full: False
app-1  | 19:53:15 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 845, Full: False
app-1  | 19:53:18 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 1221, Full: False
app-1  | 19:53:21 | INFO     | websocket_handler              | Pushing audio frame
app-1  | 19:53:21 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 1596, Full: False
app-1  | 19:53:24 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 1972, Full: False
app-1  | 19:53:27 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 2349, Full: False
app-1  | 19:53:30 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 2724, Full: False
app-1  | 19:53:31 | INFO     | websocket_handler              | Pushing audio frame
app-1  | 19:53:33 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 3100, Full: False
app-1  | 19:53:36 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 3475, Full: False
app-1  | 19:53:39 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 3851, Full: False
app-1  | 19:53:41 | INFO     | websocket_handler              | Pushing audio frame
app-1  | 19:53:42 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 4226, Full: False
app-1  | 19:53:45 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 4601, Full: False
app-1  | 19:53:48 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 4976, Full: False
app-1  | 19:53:51 | INFO     | websocket_handler              | Pushing audio frame
app-1  | 19:53:52 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 5352, Full: False
app-1  | 19:53:55 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 5729, Full: False
app-1  | 19:53:58 | INFO     | push_audio_frame               | Pushing audio frame to queue. Queue size before push: 6104, Full: False

Here are the relevant functions where I have the logging in place.

Where input raw audio frames are pushed: https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/transports/network/websocket_server.py#L109

And where the input raw audio frame is pushed into the queue: https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/transports/base_input.py#L74

I have logging throughout the audio task handler, even before it tries to get from the audio_in_queue, and none of it fires once the bug is active: https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/transports/base_input.py#L169

Replicating the bug is very hard: you have to interrupt the bot just as it's in the middle of completing a speech generation, which is in line with what @GavinVS mentioned as a possible cause of the error.
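The symptom in the logs above can be reproduced in isolation with a minimal asyncio sketch (this is an illustration of the failure mode, not pipecat's actual code): if the consumer task never reaches `queue.get()`, an unbounded queue just keeps growing while the producer reports ever-larger sizes, exactly like the `Queue size before push` lines.

```python
import asyncio


async def demo() -> tuple[int, bool]:
    queue: asyncio.Queue = asyncio.Queue()  # unbounded, like _audio_in_queue
    blocker = asyncio.Event()               # never set: stands in for whatever blocks the handler

    async def consumer():
        await blocker.wait()                # handler never reaches queue.get()
        while True:
            queue.get_nowait()

    async def producer():
        for _ in range(100):                # transport keeps pushing frames
            queue.put_nowait(b"\x00" * 320)
            await asyncio.sleep(0)

    task = asyncio.create_task(consumer())
    await producer()
    size, stuck = queue.qsize(), not task.done()
    task.cancel()
    return size, stuck


size, stuck = asyncio.run(demo())
print(f"queue size: {size}, consumer stuck: {stuck}")
```

No exception is ever raised here, which matches the reports: the pipeline keeps "running" while the bot is effectively deaf.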

@jamsea jamsea self-assigned this Nov 27, 2024
@jamsea
Contributor

jamsea commented Nov 27, 2024

@ajobi-uhc @GavinVS I've tried randomly killing Cartesia when it's in the middle of processing https://github.com/pipecat-ai/pipecat/pull/750/files#diff-3d731f119bd0e348b74be27b419995a1a34ad733cb8dd12ab75f3a0fc52592c5R227-R229 Passing nothing vs pushing an ErrorFrame here doesn't make a difference. I can also still see the LLM generating text so I don't think killing Cartesia simulates the issue like I thought it would.

I've also made a bot with no input transport that just continuously interrupts itself, and I'm not running into the issue https://github.com/pipecat-ai/pipecat/pull/750/files#diff-89c69c8c75fd520b80f636375afbc8fda2f6ecffcc3872087447213e1d7f49cc

The main thing I'm trying to do now is consistently reproduce the interruption issue that causes the queue to keep filling up. It's not necessarily wrong for the queue to keep filling up if the bot's in an interrupt state, but it is wrong that it never gets out of that state. Looking at logs where the bot gets "stuck" there's always an equal number of StartInterruption and StopInterruption frames in the right order, so that doesn't seem to be an issue either.

I'm still investigating this every day 👀
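One way to confirm whether the handler task is still alive when the bot gets "stuck" is to periodically dump the live asyncio tasks from inside the running loop. A small helper like this (the `"audio"` filter string is an assumption based on pipecat's handler names):

```python
import asyncio


def dump_tasks(needle: str = "audio") -> list[str]:
    """Return status lines for live asyncio tasks whose coroutine name
    contains `needle`. Calling this periodically from the running event
    loop shows whether the input transport's audio handler task is still
    alive, has finished, or was silently cancelled."""
    lines = []
    for task in asyncio.all_tasks():
        name = task.get_coro().__qualname__
        if needle in name:
            lines.append(f"{name}: done={task.done()} cancelled={task.cancelled()}")
    return lines
```

If the handler simply disappears from this list while the transport is still pushing frames, that would point at a swallowed cancellation rather than a deadlock.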

@ajobi-uhc

How are you replicating the issue on your end, @jamsea? Do you have some protocol to get it to trigger? I've literally just been saying "hello" at the right times, when I expect the bot to be in the middle of a TTS generation, and that (usually) triggers the bug. Hoping there's an easier way.

@jamsea
Contributor

jamsea commented Nov 27, 2024

@ajobi-uhc I can only replicate this when using an LLM that's calling a function. Specifically, LLM function calls that make HTTP requests (hardcoded function call responses are all fine). Even then, it only happens less than 10% of the time for me. When I'm just talking to the WebSocket transport bot I can never get it stuck.

I agree, though; the LLM function call route is probably just causing the error for me because it's adding a bit of indeterminism to the timing when trying to interrupt the bot.

The main time sink has been just getting a consistent repro. Getting the bug to reproduce consistently will speed everything else up.

@ajobi-uhc

@jamsea is there any update on this? It's getting quite urgent for us, as users routinely seem to hit the bug in prod every 10 minutes. Any new findings that could help me debug on our side?

@aconchillo
Contributor

> @jamsea is there any update on this? It's getting quite urgent for us, as users routinely seem to hit the bug in prod every 10 minutes. Any new findings that could help me debug on our side?

I will start looking at this later today. I'd really like to know what's going on and have a fix ready for the next Pipecat release. Bear with me. 🙏

@ajobi-uhc

Still getting this issue, but while debugging another problem I believe I found a possible cause. To recap: if you interrupt at the right time, just as the AI is generating something, the whole pipeline gets "blocked", and the audio_task_handler that processes the audio_in_queue is blocked.

Another issue is that if the user has an unusual cadence in their speech, stopping for a second and then resuming, there's a race condition where the received audio from the TTS generation is played even though the user is currently speaking. So debugging this surfaced two issues:

  1. Under the race condition, resume-processing-frames never triggers because the required frame isn't received, and the pipeline stays in a paused state, effectively blocking everything.
  2. The receive_task_handler never actually checks whether we are currently interrupted; it just plays the received audio, which means the bot talks over the user.

In my fix I've also added a check for StartInterruptionFrame in resume_processing_pipeline, and a check for self._started in the receive task handler so frames are only pushed when the bot is allowed to speak.
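A runnable sketch of the second guard described above (all class, frame, and attribute names here are assumptions modeled on the comment, not pipecat's actual internals): received TTS audio is only played while no interruption is active, instead of being pushed unconditionally.

```python
# Hypothetical frame types standing in for pipecat's control frames.
class StartInterruptionFrame: ...
class StopInterruptionFrame: ...


class AudioFrame:
    def __init__(self, data: bytes):
        self.data = data


class SketchOutput:
    """Toy output transport illustrating the _started guard."""

    def __init__(self):
        self._started = True  # may the bot currently speak?
        self.played: list[AudioFrame] = []

    def process_control(self, frame):
        if isinstance(frame, StartInterruptionFrame):
            self._started = False  # user is speaking: stop playback
        elif isinstance(frame, StopInterruptionFrame):
            self._started = True   # interruption over: resume playback

    def receive_audio(self, frame: AudioFrame):
        # Guard: drop received TTS audio while interrupted, rather than
        # playing it over the user.
        if self._started:
            self.played.append(frame)


out = SketchOutput()
out.receive_audio(AudioFrame(b"a"))
out.process_control(StartInterruptionFrame())
out.receive_audio(AudioFrame(b"b"))  # dropped: user is speaking
out.process_control(StopInterruptionFrame())
out.receive_audio(AudioFrame(b"c"))
print(len(out.played))  # → 2
```

Without the `if self._started` check, the frame pushed mid-interruption would be played, which is the talk-over behavior described in point 2.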

@tombigly

I just had a call crash. Issue seems to remain.
