I would appreciate comments from anyone with good suggestions.

I am developing an AI agent with Semantic Kernel. Initially I was satisfied with function calling and reasoning tasks. Recently, however, I have been building features that handle multiple tasks from a single instruction, such as "information gathering → action → document creation → result reporting." In addition, as the number of plugins has grown, selecting the right function during function calling has become progressively harder. To maintain accuracy, I am therefore working on a multi-agent implementation and on nested reasoning within plugins (executing several reasoning tasks together inside a plugin). Accuracy has improved with this approach, but the following issue has emerged.
Here's one extreme example. Consider plugins in the following state: plugins that package multiple reasoning tasks in order to control a specific flow of thought. (In reality it is more complex, with PluginA containing 10–50 reasoning steps.)

```python
import asyncio
from typing import cast

from semantic_kernel.connectors.ai import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, AzureChatPromptExecutionSettings
from semantic_kernel.contents import ChatHistory
from semantic_kernel.functions import kernel_function
from semantic_kernel.kernel import Kernel

kernel = Kernel()
kernel.add_service(service=AzureChatCompletion(service_id="default"))
service = cast(AzureChatCompletion, kernel.get_service(service_id="default"))


class PluginA:
    @kernel_function(description="dummy")
    async def run(self, args: str) -> str:
        chat_history = ChatHistory()
        chat_history.add_user_message(args)
        settings = service.get_prompt_execution_settings_class()(service_id=service.service_id)
        if isinstance(settings, AzureChatPromptExecutionSettings):
            # Exclude this plugin so the nested call cannot recurse into itself.
            settings.function_choice_behavior = FunctionChoiceBehavior.Auto(
                filters={"excluded_plugins": [PluginA.__name__]}
            )
        r = await service.get_chat_message_contents(chat_history, settings, kernel=kernel)
        # ... further nested reasoning steps
        return r[0].content


async def main():
    kernel.add_plugin(PluginA(), plugin_name=PluginA.__name__)
    # kernel.add_plugin(PluginB(), plugin_name="etc...")
    # ...
    settings = service.get_prompt_execution_settings_class()(service_id=service.service_id)
    chat_history = ChatHistory()
    chat_history.add_user_message("dummy")
    if isinstance(settings, AzureChatPromptExecutionSettings):
        settings.function_choice_behavior = FunctionChoiceBehavior.Auto(auto_invoke=True)
    async for chunk in service.get_streaming_chat_message_contents(chat_history, settings, kernel=kernel):
        if chunk:
            print(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```

The problem is that in such cases the reasoning progress inside the plugin cannot be returned as a stream. This increases user waiting time and degrades the UX. What approach would you take in this situation?
I would recommend storing the plugins in a vector search. Adding @westey-m.
Narrowing down the set of available functions for function calling using RAG might help to increase accuracy. See this sample on how to achieve that:
https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/Optimization/PluginSelectionWithFilters.cs#L104
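A rough, framework-free Python sketch of that idea. The "embedding" below is a toy bag-of-words cosine similarity, just so the snippet runs without any model or vector store; in a real system you would use an actual embedding model and vector search, and all plugin/function names here are invented for illustration:

```python
# Toy sketch of RAG-style function selection: rank function descriptions by
# similarity to the user request and only expose the top-k to function calling.
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_functions(request: str, descriptions: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the names of the top_k functions most similar to the request."""
    q = embed(request)
    ranked = sorted(descriptions, key=lambda name: cosine(q, embed(descriptions[name])), reverse=True)
    return ranked[:top_k]


# Hypothetical plugin functions and descriptions, for illustration only.
functions = {
    "SearchPlugin-web_search": "search the web for up to date information",
    "DocPlugin-create_document": "create a word document report from gathered information",
    "MailPlugin-send_mail": "send an email to a recipient",
}
included = select_functions("search the web for recent news", functions, top_k=1)
print(included)  # the single most relevant function name
```

The selected names could then be fed back into the real settings, e.g. `FunctionChoiceBehavior.Auto(filters={"included_functions": included})`, so the model only ever sees a small, relevant subset of functions.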
This sounds like a complex system though, so multiple strategies may have to be considered to speed things up, e.g. using GPT-4o-mini instead of GPT-4o.
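For the perceived-latency side of the question (surfacing reasoning progress from inside a plugin), one possible pattern is to have the plugin push progress messages onto an `asyncio.Queue` that the outer loop drains concurrently. This is a minimal, Semantic-Kernel-free sketch under that assumption; the function names and the fixed three steps are invented stand-ins for the nested reasoning calls:

```python
# Sketch: a plugin-like coroutine reports intermediate steps via a queue while
# it works, so the caller can stream progress to the user instead of waiting
# silently for the final answer. In a real system each step would be an LLM
# call whose streamed chunks are forwarded into the queue.
import asyncio

_DONE = object()  # sentinel marking the end of the progress stream


async def plugin_run(args: str, progress: asyncio.Queue) -> str:
    # Stand-in for the nested reasoning steps inside a plugin like PluginA.run.
    for step in range(1, 4):
        await asyncio.sleep(0)  # simulate an awaited model call
        await progress.put(f"step {step}/3 done")
    await progress.put(_DONE)
    return f"final answer for: {args}"


async def run_with_progress(args: str) -> tuple[list[str], str]:
    progress: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(plugin_run(args, progress))
    seen: list[str] = []
    while True:
        item = await progress.get()
        if item is _DONE:
            break
        seen.append(item)  # in a UI, this is where you would stream to the user
    return seen, await task


steps, answer = asyncio.run(run_with_progress("dummy"))
print(steps)
print(answer)
```

The same queue (or a callback passed into the plugin) could carry streamed chunks from the nested chat-completion calls, so the user sees each reasoning stage as it happens rather than a single long pause.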