Initial Checks
Description
While working on mirrored tests in test_agent.py and test_streaming.py for #3523, I discovered that the sync and streaming versions of the test_early_strategy_with_external_tool_call test behave differently.
Streaming version: link;
Sync version: link;
Both tests use the same input and both pass, yet the resulting behavior differs.
In the streaming version, the deferred tool call is handled as the final output in this part.
In the sync version, however, the output tool becomes the final output in _agent_graph.
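To make the discrepancy concrete, here is a toy model of the two behaviors described above. It is not Pydantic AI code: `ToolCall`, `resolve_sync_like`, `resolve_streaming_like`, and the tool names are all hypothetical and only illustrate how the same list of tool calls could end in two different final outputs depending on which kind of call is given priority.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical stand-in for a tool call returned by the model."""
    name: str
    kind: str  # "output" for an output tool, "external" for a deferred/external tool


def _first_of_kind(calls: Sequence[ToolCall], kind: str) -> Optional[ToolCall]:
    # Return the first call of the requested kind, or None if there is none.
    return next((call for call in calls if call.kind == kind), None)


def resolve_sync_like(calls: Sequence[ToolCall]) -> Optional[ToolCall]:
    # Assumption: mirrors the sync behavior described above,
    # where the output tool becomes the final output.
    return _first_of_kind(calls, "output") or _first_of_kind(calls, "external")


def resolve_streaming_like(calls: Sequence[ToolCall]) -> Optional[ToolCall]:
    # Assumption: mirrors the streaming behavior described above,
    # where the deferred tool call is handled as the final output.
    return _first_of_kind(calls, "external") or _first_of_kind(calls, "output")


# Same input for both paths (hypothetical tool names).
calls = [
    ToolCall(name="final_result", kind="output"),
    ToolCall(name="external_tool", kind="external"),
]

sync_result = resolve_sync_like(calls)
streaming_result = resolve_streaming_like(calls)
print(sync_result.name, streaming_result.name)
```

With identical input, the two resolvers disagree on the final output, which is the kind of mismatch the two tests currently encode.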
Example Code
https://gist.github.com/Danipulok/04ccb71269064e0bae8963f6e99884ff
Python, Pydantic AI & LLM client version
Python 3.13.4
pydantic==2.12.4
pydantic-ai==1.25.1