Merged
…umable agents (#1011)

* feat(adk-middleware): robust streaming function call arguments support

Refactor the streaming function call dispatch in EventTranslator to fix several correctness issues and add opt-in support for Gemini 3's stream_function_call_arguments mode.

Closes #990

## Breaking Changes

None. All changes are backwards-compatible. The new `streaming_function_call_arguments` parameter defaults to False.

## What changed

### Streaming FC dispatch split into two explicit modes (#990)

The `is_streaming_fc` condition previously used `func_call.name and will_continue`, which was too broad: it matched partial events that should be skipped (no accumulated args yet). The dispatch is now split:

- **Mode A** (Gemini 3+ `stream_function_call_arguments`): only active when `streaming_function_call_arguments=True` is passed to EventTranslator/ADKAgent. Triggers on `partial_args`, first chunk (name + will_continue + no args), or end chunk.
- **Mode B** (accumulated args / progressive SSE): triggers on `has_args` with `will_continue`, existing tracking, or named FC in a partial event. This is the original behavior.

### Name-based dedup replaced with single-use tracking (#990)

`_completed_streaming_fc_names` (a permanent set) suppressed repeat invocations of the same tool. Replaced with `_last_completed_streaming_fc_name` (Optional[str]) that clears after the non-partial event is filtered. TOOL_CALL_RESULT suppression uses the same single-use mechanism with None guards.

### ADK aggregator workarounds for stream_function_call_arguments

The predictive_state_updates example includes monkey-patches for two ADK bugs that prevent streaming FC args from working out of the box. Filed upstream as google/adk-python#4311; workaround code is annotated so it can be removed when the fix ships.

### Examples bumped to google-adk>=1.23.0

The examples pyproject.toml now pins `google-adk>=1.23.0` to test against the latest ADK. The library minimum remains `>=1.16.0`.

### Dojo e2e test comment updated

The predictive_state_updates e2e test remains skipped but the comment now explains that the demo works without Vertex AI (falls back to Gemini 2.5 Flash) and documents what credentials enable full streaming.

## Test results

530 passed, 33 skipped (full suite). 3 new tests in test_lro_filtering.py:

- test_mode_a_streaming_fc_with_flag_enabled
- test_mode_a_first_chunk_skipped_without_flag
- test_same_tool_called_twice_not_suppressed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): remap confirmed FC ids to streaming ids in EventTranslator

With PROGRESSIVE_SSE_STREAMING (default since ADK 1.22.0), partial and confirmed events for the same function call carry different ADK-generated ids. The EventTranslator was emitting TOOL_CALL_START/END with the partial id, but ToolCallResultEvent used the confirmed id. This caused _start_new_execution's tool_call_id tracking to never match, so backend tool results were never marked as processed, breaking replay filtering (test_skip_summarization_replay_scenario).

Fix: when the confirmed (non-partial) event is filtered out (because the streaming path already emitted it), record a mapping from the confirmed id to the streaming id. _translate_function_response then remaps the ToolCallResultEvent's tool_call_id to match.

Also adds two live integration tests for streaming function call arguments via Gemini 3 Pro Preview (skipped without Vertex AI creds):

- test_streaming_fc_emits_incremental_tool_call_args
- test_streaming_fc_tool_call_ids_consistent_across_result

Test results: 565 passed, 0 failed.

Addresses #990

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Updated files.json.
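
The confirmed-to-streaming id remap described in the EventTranslator fix above can be pictured with a minimal sketch. `ToolCallIdRemapper`, `record_filtered_confirmed`, and `resolve` are hypothetical names invented for illustration, and the id strings are made up; the actual logic lives inside EventTranslator and may be structured differently.

```python
from typing import Dict


class ToolCallIdRemapper:
    """Illustrative only: maps confirmed FC ids back to the ids used while streaming."""

    def __init__(self) -> None:
        # confirmed (non-partial) id -> id already emitted on the streaming path
        self._confirmed_to_streaming: Dict[str, str] = {}

    def record_filtered_confirmed(self, confirmed_id: str, streaming_id: str) -> None:
        # Called when the confirmed event is filtered out because the
        # streaming path already emitted TOOL_CALL_START/END.
        self._confirmed_to_streaming[confirmed_id] = streaming_id

    def resolve(self, tool_call_id: str) -> str:
        # Ids that were never remapped pass through unchanged.
        return self._confirmed_to_streaming.get(tool_call_id, tool_call_id)


remapper = ToolCallIdRemapper()
remapper.record_filtered_confirmed("adk-confirmed-1", "adk-partial-1")
result_id = remapper.resolve("adk-confirmed-1")
```

With a mapping like this, a tool result that arrives under the confirmed id can be reported under the same id the client already saw in TOOL_CALL_START/END.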
* fix(adk-middleware): prevent duplicate TOOL_CALL emission for client-side tools with ResumabilityConfig

With ResumabilityConfig, ADK emits client-side function calls from up to three sources with potentially different IDs: the LRO event, a confirmed non-partial event, and the ClientProxyTool execution. This caused the frontend to render tool call results (e.g., HITL task lists) multiple times.

The fix introduces three layers of deduplication:

- client_tool_names: EventTranslator skips all function calls for tools owned by ClientProxyTool, regardless of ID
- client_emitted_tool_call_ids: shared set for ID-based dedup between proxy and translator
- translator.emitted_tool_call_ids: proxy skips if translator already emitted (fallback for non-resumable flows)

Adds 12 regression tests covering LRO, confirmed, partial, mixed tool call scenarios, and the full end-to-end resumable HITL flow.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): clear invocation_id after completed HITL resume and flex step count

- Clear stored invocation_id after a resumable run completes successfully, preventing subsequent new runs from erroneously attempting HITL resumption with a stale ID (which produced no output)
- Update HITL example prompt to respect user-requested step count instead of always generating exactly 10 steps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Updating files.json for latest agent changes.

* docs(adk-middleware): add ResumabilityConfig usage guide for HITL workflows

Document ADKAgent.from_app() with ResumabilityConfig for human-in-the-loop workflows, including requirements (google-adk >= 1.16.0), how it works, and a comparison table vs direct ADKAgent usage.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(adk-middleware): extract Gemini 3 workarounds into reusable module

Move thought-signature repair callback from the predictive_state_updates example into src/ag_ui_adk/workarounds.py so the middleware auto-injects it as a before_model_callback when streaming_function_call_arguments=True.

The aggregator patch (apply_aggregator_patch) is also extracted but NOT auto-applied; it conflicts with the event translator's Mode A streaming. The example still applies it explicitly when needed.

Also switches the example model from gemini-3-pro-preview to gemini-3-flash-preview (required for stream_function_call_arguments).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(adk-middleware): use gemini-3-flash-preview consistently

Update USAGE.md and integration test to reference gemini-3-flash-preview instead of gemini-3-pro-preview.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): strengthen HITL instruction for disabled step handling

Clarify that disabled steps are permanently deleted from the plan so the agent answers "No" when asked whether a disabled step is included.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(dojo): update files.json for latest agent changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): fix multi-turn HITL sessions producing empty responses

Two issues caused the second message in a session to produce no output:

1. The HITL instruction used "ALWAYS call generate_task_steps" which made the model call the tool even on greetings, creating a false HITL pause that blocked subsequent messages. Changed to only call on actual task requests.
2. The invocation_id was stored on every run for HITL resumption but only cleared when a previous stored ID already existed. On a normal first run (no HITL pause), the ID was stored but never cleared, causing the second run to attempt resumption of a completed invocation, producing no output. Fixed by also clearing when the ID was newly stored this run and there's no LRO tool pause.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(adk-middleware): auto-apply aggregator patch when streaming FC args enabled

- ADKAgent.__init__ now calls apply_aggregator_patch() when streaming_function_call_arguments=True, so callers no longer need to apply it manually.
- Remove manual apply_aggregator_patch() from predictive_state_updates example.
- Fix streaming FC args integration test: filter TOOL_CALL_ARGS assertions by tool name to exclude the synthetic confirm_changes tool call, and remove unnecessary mode="ANY" from FunctionCallingConfig.
- Update workarounds docstring and test to reflect auto-apply behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(adk-middleware): add streaming_function_call_arguments to from_app()

The parameter was missing from the from_app() classmethod signature and cls() call, so callers using App-based construction couldn't enable streaming function call arguments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): decouple is_long_running_tool from translator event emission

Client tool dedup filtering in translate_lro_function_calls causes no TOOL_CALL_END to be emitted for client tools, leaving is_long_running_tool as False. This clears the stored invocation_id after the run, breaking SequentialAgent HITL resumption. Set the flag directly from has_lro_function_call before calling the translator.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): bypass client_tool_names filter for streaming FC args

When streaming_function_call_arguments=True, client tool partial chunks were filtered out by client_tool_names before reaching Mode A detection. This caused all streaming chunks to be dropped, with only a single bulk emission from ClientProxyTool. Skip the client_tool_names filter on partial events when streaming FC args is enabled so the translator can stream args incrementally.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): handle nameless streaming FC chunks via deferred flush

ADK's populate_client_function_call_id assigns a fresh adk-<uuid> to every partial event and never propagates the tool name to partial chunks. This broke Mode A streaming detection which required func_call.name on the first chunk.

Buffer TOOL_CALL_ARGS/END events when the name is unknown and flush them (START + buffered ARGS + END) when the complete (non-partial) event supplies the real tool name. Map confirmed→streaming IDs so function responses use the correct tool_call_id.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): stream nameless FC chunks immediately with name inference

Replace the buffering/deferred-flush approach (which batched all events and defeated streaming) with immediate emission. The first nameless chunk defers only TOOL_CALL_START until partial_args arrive, then infers the tool name via json_path matching against client_tool_schemas and emits START + ARGS in real time. Subsequent chunks stream ARGS immediately.

Name inference strategy:

- Single client tool: use it directly (unambiguous)
- Multiple tools: match partial_args json_paths against tool argument schemas (client_tool_schemas: Dict[str, Set[str]])
- Fallback: empty string (protocol-valid)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): add confirmed FC id to emitted_tool_call_ids for ClientProxyTool dedup

After streaming completes, the complete event's confirmed FC id (which differs from the streaming id) is mapped but not added to emitted_tool_call_ids. ClientProxyTool receives the confirmed id and doesn't find it in the set, so it emits duplicate TOOL_CALL events. Add the confirmed id to emitted_tool_call_ids when recording the confirmed→streaming id mapping.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): suppress confirmed FC duplicate when streaming resolved_name is falsy

When streaming FC args complete with an empty/falsy resolved_name (tool not in client_tool_names), _last_completed_streaming_fc_name was never set, so the confirmed event's FC passed through all filters causing duplicate TOOL_CALL emissions. Use _pending_streaming_completion_id to unconditionally suppress the first confirmed FC after streaming completes and map its ID to the streaming ID for consistent function responses.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(adk-middleware): stream FC args for opted-in LRO/HITL tools

Allow specific LRO tools to stream their args incrementally via TOOL_CALL_ARGS events while still pausing for user input. Tools opt in via `stream_tool_call=True` on PredictStateMapping. The streaming END is deferred until the confirmed LRO event arrives, at which point the PredictState CustomEvent and TOOL_CALL_END are emitted together.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): skip client_tool_names filter for non-resumable agents and filter backend tools from streaming FC

Two fixes:

1. Add is_resumable flag to EventTranslator so the client_tool_names filter in translate_lro_function_calls only applies for resumable agents (where ClientProxyTool handles emission). Non-resumable agents (agentic chat, haiku) now correctly emit tool call events via the LRO path.
2. Filter out backend tools (e.g. google_search) from streaming FC args when streaming_function_call_arguments is enabled, so ADK can execute them server-side without interference.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(adk-middleware): persist FunctionCall on early return and emit terminal events for synthetic tool results

Two fixes:

1. When the aggregator monkey-patch (apply_aggregator_patch) is active globally, ADK yields FunctionCall events as partial/streaming events that aren't persisted to the session. Non-resumable agents that return early at LRO detection leave the session without the FunctionCall, causing "No function call event found" on the next run. Fix: manually persist the FunctionCall event before early return.
2. When confirm_changes (synthetic) tool results have no trailing messages, _handle_tool_result_submission returned without yielding any events, producing an empty SSE stream that triggers INCOMPLETE_STREAM on the client. Fix: emit RUN_STARTED + RUN_FINISHED for a valid terminal stream.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(adk-middleware): remove all streaming function call arguments logic

Remove Mode A (Gemini 3+ stream_function_call_arguments) and Mode B (accumulated args delta) streaming infrastructure. The upstream ADK bug (google/adk-python#4311) makes this unreliable; a resurrection document is included for when the fix lands.

- Delete workarounds.py (aggregator patch, thought-signature repair)
- Remove streaming state vars and methods from EventTranslator
- Remove stream_tool_call from PredictStateMapping
- Simplify predictive_state_updates example
- Add STREAMING_FC_ARGS_RECONSTRUCTION.md for future re-implementation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* deprecate(adk-middleware): warn on non-resumable HITL flow, recommend from_app() with ResumabilityConfig

The fire-and-forget HITL path via ADKAgent(adk_agent=...) is now deprecated for human-in-the-loop workflows. A DeprecationWarning is emitted at runtime when the old-style early-return is triggered. The direct constructor remains fully supported for agents without client-side tools.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
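
The runtime DeprecationWarning mentioned in the final commit of the list above could look roughly like the sketch below. It uses only the standard `warnings` module; the function name `warn_non_resumable_hitl` and the message text are assumptions made for illustration and are not taken from the middleware source.

```python
import warnings


def warn_non_resumable_hitl() -> None:
    """Hypothetical helper: warn when the old-style HITL early return is hit."""
    warnings.warn(
        "Non-resumable HITL flow is deprecated; consider ADKAgent.from_app() "
        "with ResumabilityConfig.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )


# Capture the warning so its effect can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_non_resumable_hitl()
```

DeprecationWarning is hidden by Python's default filters in many contexts, so callers who want to see it typically run with `-W default` or configure `warnings.simplefilter` themselves.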
* chore: release core packages
* chore: release mastra sdk
BREAKING CHANGES
No description provided.