
feat: Add support for OpenAI Responses API#4791

Open
wzh4464 wants to merge 5 commits into Aider-AI:main from wzh4464:add-responses-api-support

Conversation

wzh4464 commented Jan 23, 2026

Summary

This PR adds support for OpenAI's Responses API alongside the existing Chat Completions API, enabling aider to work with newer models like gpt-5.1-codex and gpt-5.2-codex that only support the Responses API.

Motivation

Several issues (#4039, #4707) report that Codex and GPT-5 models fail with parameter errors when using aider. The root cause is that these models only support the Responses API (/v1/responses), not the Chat Completions API (/v1/chat/completions).

Changes

Core Implementation

  • Added wire_api field to ModelSettings (default: "chat")
  • Created ResponsesAPIWrapper to convert Responses API format to Chat Completions format for compatibility
    • Handles ResponseOutputMessage objects with nested content arrays
    • Extracts text from content[].text fields
  • Created StreamingResponsesAPIWrapper for streaming responses
    • Fixed in commit 4ee8c9b: Updated to handle Responses API event-based streaming format
    • Correctly processes OUTPUT_TEXT_DELTA events with delta attribute
  • Modified send_completion() to route requests based on the AIDER_WIRE_API environment variable (see the sketch after this list)
  • Updated simple_send_with_retries() to handle both API formats
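
A minimal sketch of the wire_api routing described above. This is not the PR's exact code: the helper name and the message splitting are illustrative, the precedence (environment variable over per-model setting) is assumed, and the exact litellm.responses() parameters should be checked against the litellm version in use.

import os

import litellm


def send_completion_sketch(model_name, messages, stream=False, **kwargs):
    # Assumed precedence: the env var overrides any per-model wire_api setting.
    wire_api = os.environ.get("AIDER_WIRE_API", "chat")

    if wire_api == "responses":
        # The Responses API takes `input` and `instructions` rather than `messages`.
        system = [m for m in messages if m.get("role") == "system"]
        others = [m for m in messages if m.get("role") != "system"]
        if system:
            kwargs["instructions"] = "\n\n".join(
                str(m.get("content", "")) for m in system if m.get("content")
            )
        return litellm.responses(model=model_name, input=others, stream=stream, **kwargs)

    # Default: Chat Completions, unchanged from existing behavior.
    return litellm.completion(model=model_name, messages=messages, stream=stream, **kwargs)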

Dependencies

  • Fixed grpcio version conflict: 1.76.0 → 1.67.0
    • Resolves incompatibility with litellm 1.80.10 which requires grpcio<1.68.0
    • The newer litellm version is needed for litellm.responses() API support
  • Added fastapi: Required by litellm.responses() API
  • Added orjson: Required by litellm for fast JSON serialization

Streaming Fix (commit 4ee8c9b)

The initial implementation incorrectly assumed streaming responses would have a simple chunk.output[].text structure. Testing revealed that Responses API uses an event-based streaming format:

  • Events like OUTPUT_TEXT_DELTA contain a delta attribute with the text
  • The fix checks for event.delta first, then falls back to the nested output structure (sketched after this list)
  • This resolves "Empty response received from LLM" errors in streaming mode
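
A minimal sketch of the delta-first extraction described in this list. The delta and output attribute names follow the PR description; the function name and fallback details are illustrative.

def extract_text_from_event(event):
    # OUTPUT_TEXT_DELTA events carry the incremental text in `delta`.
    delta = getattr(event, "delta", None)
    if delta:
        return delta

    # Fallback: the nested output -> message -> content[].text shape.
    for item in getattr(event, "output", None) or []:
        for part in getattr(item, "content", None) or []:
            text = getattr(part, "text", None)
            if text:
                return text
        text = getattr(item, "text", None)
        if text:
            return text
    return None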

Usage

# Use Responses API for Codex models
export AIDER_WIRE_API="responses"
aider --model gpt-5.1-codex

# Or use Chat Completions API (default)
export AIDER_WIRE_API="chat"
aider --model gpt-4

Testing

Tested successfully with:

  • Model: gpt-5.1-codex
  • API: ChatAnywhere (https://api.chatanywhere.org/v1)
  • Test cases:
    • Non-streaming: Adding text to files ✅
    • Streaming: Real-time code generation ✅
    • Git repo with repo-map: Architecture analysis ✅
  • Code formatting: Passed pre-commit hooks (isort, black, flake8, codespell) ✅

Fixes

  • Fixes #4039 (Codex-mini temperature parameter issue)
  • Fixes #4707 (gpt-5.2-chat temperature parameter issue)

Notes

  • The implementation uses litellm.responses() which was added in recent litellm versions
  • Responses are wrapped to maintain compatibility with existing code expecting Chat Completions format
  • Environment variable (AIDER_WIRE_API) allows flexible switching between API types
  • Model settings can also specify wire_api per-model in YAML configuration

Known Issue (Not in Scope)

During testing in git repositories, we encountered a tree-sitter compatibility issue (AttributeError: 'tree_sitter.Query' object has no attribute 'captures' at repomap.py:289). This is a separate issue affecting the main branch with tree-sitter 0.25.x that occurs during initial repo scanning. It is already being addressed in PR #4369 and is not related to the Responses API changes in this PR.

CLA

I have read and agree to the Individual Contributor License Agreement.

This commit adds support for OpenAI's Responses API alongside the existing
Chat Completions API, allowing aider to work with newer models like
gpt-5.1-codex and gpt-5.2-codex that only support the Responses API.

Key changes:
- Added `wire_api` field to ModelSettings (default: "chat")
- Created ResponsesAPIWrapper to convert Responses API format to Chat Completions format
- Added StreamingResponsesAPIWrapper for streaming responses
- Modified send_completion() to route requests based on AIDER_WIRE_API env var
- Updated simple_send_with_retries() to handle both API formats
- Fixed grpcio version conflict (1.76.0 -> 1.67.0) for litellm compatibility

Usage:
  export AIDER_WIRE_API="responses"  # For Codex models
  aider --model gpt-5.1-codex

Fixes Aider-AI#4039 (Codex-mini temperature parameter issue)
Fixes Aider-AI#4707 (gpt-5.2-chat temperature parameter issue)
Related to Aider-AI#4591 (gpt-5-codex vision support)

The implementation uses litellm.responses() which was added in recent
litellm versions to support the new Responses API endpoint.
Copilot AI review requested due to automatic review settings January 23, 2026 06:45
CLAassistant commented Jan 23, 2026

CLA assistant check
All committers have signed the CLA.

Copilot AI left a comment

Pull request overview

This PR adds support for OpenAI's Responses API (/v1/responses) to enable compatibility with newer models like gpt-5.1-codex and gpt-5.2-codex that only support this API format, addressing issues #4039 and #4707 where these models failed with parameter errors.

Changes:

  • Introduced ResponsesAPIWrapper and StreamingResponsesAPIWrapper classes to convert Responses API format to Chat Completions format for compatibility
  • Modified send_completion() to route requests to either litellm.responses() or litellm.completion() based on AIDER_WIRE_API environment variable or model's wire_api setting
  • Updated simple_send_with_retries() to handle both API response formats
  • Downgraded grpcio from 1.76.0 to 1.67.0 to resolve compatibility issue with litellm 1.80.10

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 8 comments.

File / Description:

  • requirements.txt: Downgraded grpcio from 1.76.0 to 1.67.0 for litellm compatibility
  • aider/models.py: Added ResponsesAPIWrapper classes; modified send_completion() and simple_send_with_retries() to support dual API routing via the wire_api setting


aider/models.py Outdated
Comment on lines 1165 to 1187
# Handle both chat completions and responses API formats
wire_api = os.environ.get("AIDER_WIRE_API", getattr(self, "wire_api", "chat"))

if wire_api == "responses":
    # Responses API format: has 'output' instead of 'choices'
    if not hasattr(response, "output") or not response.output:
        return None
    # Extract text content from output items
    for item in response.output:
        if hasattr(item, "text") and item.text:
            res = item.text
            break
        elif isinstance(item, dict) and "text" in item:
            res = item["text"]
            break
    else:
        return None
else:
    # Chat completions API format: has 'choices'
    if not hasattr(response, "choices") or not response.choices:
        return None
    res = response.choices[0].message.content

Copilot AI Jan 23, 2026

The response handling logic in simple_send_with_retries is inconsistent with the wrapper design. In send_completion, the responses API result is wrapped with ResponsesAPIWrapper which converts it to have a choices attribute (lines 1133-1134). However, this function duplicates the conversion logic by checking wire_api == "responses" and directly accessing the raw output attribute (lines 1168-1181).

Since the wrapper already provides a uniform choices interface, this duplication is unnecessary and could lead to inconsistent behavior if the wrapper logic changes. The code should simply use response.choices[0].message.content for both API types, as the wrapper ensures compatibility.

Suggested change (replace the wire_api branching quoted above with the wrapper's unified interface):

# Use the unified choices interface provided by send_completion / ResponsesAPIWrapper
if not hasattr(response, "choices") or not response.choices:
    return None
res = response.choices[0].message.content

aider/models.py Outdated
other_messages = [msg for msg in messages if msg.get("role") != "system"]

if system_messages:
    kwargs["instructions"] = system_messages[0].get("content", "")

Copilot AI Jan 23, 2026

When converting messages for the responses API, only the first system message is used (line 1123). If there are multiple system messages in the conversation, all subsequent ones are silently dropped. This could lead to loss of important context.

Consider either concatenating all system messages or documenting this limitation. Alternatively, raise a warning if multiple system messages are detected and being dropped.

Suggested change (concatenate all system messages instead of keeping only the first):

# Concatenate all system message contents to avoid dropping context
instructions_parts = [
    str(msg.get("content", ""))
    for msg in system_messages
    if msg.get("content")
]
if instructions_parts:
    kwargs["instructions"] = "\n\n".join(instructions_parts)

aider/models.py Outdated
Comment on lines 73 to 82
for item in self._responses.output:
    if hasattr(item, "text") and item.text:
        choice.message.content = item.text
        choice.delta.content = item.text
        break
    elif isinstance(item, dict):
        if "text" in item:
            choice.message.content = item["text"]
            choice.delta.content = item["text"]
            break

Copilot AI Jan 23, 2026

The wrapper only extracts the first text item from the response output (using break at line 77 or 82). If the responses API returns multiple output items, subsequent items are silently ignored. This could lead to incomplete responses if the API returns content in multiple parts.

Consider either concatenating all text items or documenting why only the first item is used.
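
A minimal sketch of the concatenation alternative suggested here; this is illustrative only and not code from the PR.

def collect_output_text(output_items):
    # Gather every text fragment instead of stopping at the first one.
    parts = []
    for item in output_items:
        text = item.get("text") if isinstance(item, dict) else getattr(item, "text", None)
        if text:
            parts.append(text)
    return "".join(parts) if parts else None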

if hasattr(item, "text"):
    mock_chunk.choices[0].delta.content = item.text
    break


Copilot AI Jan 23, 2026

The StreamingResponsesAPIWrapper initializes finish_reason to None but never updates it based on the chunk data. The code in base_coder.py checks for finish_reason == "length" to detect when the response is truncated (lines 1907-1911 in base_coder.py). Without populating this field from the responses API chunks, this check will never work properly for responses API streaming.

Consider extracting and setting the finish_reason from the chunk if it's available.

Suggested change (propagate finish_reason onto the mock chunk):

# Propagate finish_reason from the underlying chunk if available
finish_reason = None
# Check for finish_reason on delta first (most similar to chat completions)
if hasattr(chunk, "delta") and hasattr(chunk.delta, "finish_reason"):
    finish_reason = chunk.delta.finish_reason
# Fallback: finish_reason directly on the chunk
elif hasattr(chunk, "finish_reason"):
    finish_reason = chunk.finish_reason
# Fallback: look for finish_reason on output items
elif hasattr(chunk, "output"):
    for item in chunk.output:
        if hasattr(item, "finish_reason"):
            finish_reason = item.finish_reason
            break
        if isinstance(item, dict) and "finish_reason" in item:
            finish_reason = item["finish_reason"]
            break
mock_chunk.choices[0].finish_reason = finish_reason

aider/models.py Outdated
kwargs["instructions"] = system_messages[0].get("content", "")

# For responses API, input can be a list of messages
kwargs["input"] = other_messages if other_messages else messages

Copilot AI Jan 23, 2026

When no system messages exist, the fallback kwargs["input"] = other_messages if other_messages else messages could result in passing the same messages list that might contain system messages. This happens because:

  1. If other_messages is empty (all messages were system), it falls back to messages which includes the system messages
  2. This could cause the responses API to receive system messages in the input field even though they were filtered out

Consider using kwargs["input"] = other_messages unconditionally, or handle the empty case explicitly to avoid confusion.

Suggested change:

kwargs["input"] = other_messages

aider/models.py Outdated
Comment on lines 72 to 82
# Extract content from output items
for item in self._responses.output:
    if hasattr(item, "text") and item.text:
        choice.message.content = item.text
        choice.delta.content = item.text
        break
    elif isinstance(item, dict):
        if "text" in item:
            choice.message.content = item["text"]
            choice.delta.content = item["text"]
            break

Copilot AI Jan 23, 2026

The wrapper initializes tool_calls in the mock message object but never populates it from the responses API output (line 53). However, the code in send_completion can add function tools to the request (lines 1086-1089). If the responses API supports function calling and returns tool call data, it won't be extracted, potentially breaking function calling for responses API.

Verify whether the responses API supports function/tool calling, and if so, extract and populate the tool_calls field from the response output.

Suggested change (also extract tool_calls from the output items):

# Extract content and tool_calls from output items
for item in self._responses.output:
    # Try to extract tool_calls if present on this item
    if getattr(choice.message, "tool_calls", None) is None:
        # Attribute-style tool_calls/tool_call
        if hasattr(item, "tool_calls") and getattr(item, "tool_calls"):
            choice.message.tool_calls = getattr(item, "tool_calls")
        elif hasattr(item, "tool_call") and getattr(item, "tool_call"):
            # Normalize single tool_call to a list for compatibility
            single_call = getattr(item, "tool_call")
            choice.message.tool_calls = (
                single_call if isinstance(single_call, list) else [single_call]
            )
        # Dict-style tool_calls/tool_call
        elif isinstance(item, dict):
            if "tool_calls" in item and item["tool_calls"]:
                choice.message.tool_calls = item["tool_calls"]
            elif "tool_call" in item and item["tool_call"]:
                single_call = item["tool_call"]
                choice.message.tool_calls = (
                    single_call if isinstance(single_call, list) else [single_call]
                )
    # Extract text content, preserving existing behavior
    if hasattr(item, "text") and getattr(item, "text", None):
        choice.message.content = item.text
        choice.delta.content = item.text
        break
    elif isinstance(item, dict) and "text" in item and item["text"]:
        choice.message.content = item["text"]
        choice.delta.content = item["text"]
        break

wzh4464 and others added 3 commits January 23, 2026 15:02
- Added fastapi==0.128.0 (required by litellm.responses())
- Added orjson==3.11.5 (required by litellm for fast JSON serialization)
- Maintained grpcio==1.67.0 for litellm 1.80.10 compatibility
- Updated related dependency constraints

These dependencies are necessary to avoid runtime import errors when using
the Responses API functionality.
The OpenAI Responses API returns ResponseOutputMessage objects with a nested
structure: output -> message -> content[0] -> text

Updated both ResponsesAPIWrapper and StreamingResponsesAPIWrapper to:
1. Check for message type items in output
2. Extract text from nested content items
3. Include fallback for simpler response formats

This fixes the 'Empty response received from LLM' error when using
Responses API with Codex models.
The Responses API uses event-based streaming with OUTPUT_TEXT_DELTA
events that have a 'delta' attribute containing the text, rather than
nested output structures. Updated StreamingResponsesAPIWrapper to
check for event.delta first.

This fixes the "Empty response received from LLM" error when using
streaming mode with gpt-5.1-codex and gpt-5.2-codex models.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 13 comments.



aider/models.py Outdated
Comment on lines 1190 to 1212
# Handle both chat completions and responses API formats
wire_api = os.environ.get("AIDER_WIRE_API", getattr(self, "wire_api", "chat"))

if wire_api == "responses":
    # Responses API format: has 'output' instead of 'choices'
    if not hasattr(response, "output") or not response.output:
        return None
    # Extract text content from output items
    for item in response.output:
        if hasattr(item, "text") and item.text:
            res = item.text
            break
        elif isinstance(item, dict) and "text" in item:
            res = item["text"]
            break
    else:
        return None
else:
    # Chat completions API format: has 'choices'
    if not hasattr(response, "choices") or not response.choices:
        return None
    res = response.choices[0].message.content


Copilot AI Jan 23, 2026

The ResponsesAPIWrapper creates a hybrid object that has both a choices attribute (chat completions format) and falls back to the original response object for other attributes like output via getattr. This means wrapped responses support both interfaces. In simple_send_with_retries, when wire_api is "responses", the code checks response.output - which works via getattr but is confusing because send_completion already wrapped it with choices. This dual-interface approach makes the code harder to understand and maintain. Consider either: (1) using only the choices interface and having simple_send_with_retries always use response.choices regardless of wire_api, or (2) not wrapping at all and handling the format conversion at the point of consumption.

Suggested change (rely on the normalized chat-completions-style interface for both API types):

# Use the normalized chat-completions-style interface
if not hasattr(response, "choices") or not response.choices:
    return None
res = response.choices[0].message.content

aider/models.py Outdated
Comment on lines 40 to 43
def _convert_to_chat_format(self):
    """Convert responses API output to chat completions choices format"""
    if not hasattr(self._responses, "output"):
        return

Copilot AI Jan 23, 2026

If the response object doesn't have an "output" attribute, _convert_to_chat_format returns early without initializing self.choices. This means ResponsesAPIWrapper instances might not have a choices attribute, which will cause AttributeErrors when the code tries to access response.choices elsewhere (e.g., in base_coder.py line 1850). The wrapper should always initialize self.choices to at least an empty list to ensure a consistent interface.
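
A minimal, self-contained sketch of the fix this comment describes; the class name is illustrative, and only the choices/output attribute names come from the excerpt above.

class ChoicesAlwaysPresentSketch:
    """Illustrative wrapper showing choices initialized unconditionally."""

    def __init__(self, responses_obj):
        self._responses = responses_obj
        self.choices = []  # guarantee the attribute exists even when there is no output
        self._convert_to_chat_format()

    def _convert_to_chat_format(self):
        if not hasattr(self._responses, "output"):
            return  # choices stays [] instead of being left undefined
        # ...extraction of output items would populate self.choices here...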

Comment on lines 70 to 97
choice = MockChoice()

# Extract content from output items
for item in self._responses.output:
    # Handle ResponseOutputMessage type
    if hasattr(item, "type") and item.type == "message":
        if hasattr(item, "content") and item.content:
            # Extract text from content items
            for content_item in item.content:
                if hasattr(content_item, "text") and content_item.text:
                    choice.message.content = content_item.text
                    choice.delta.content = content_item.text
                    break
            if choice.message.content:
                break
    # Fallback: direct text attribute
    elif hasattr(item, "text") and item.text:
        choice.message.content = item.text
        choice.delta.content = item.text
        break
    # Fallback: dict format
    elif isinstance(item, dict):
        if "text" in item:
            choice.message.content = item["text"]
            choice.delta.content = item["text"]
            break

self.choices = [choice]

Copilot AI Jan 23, 2026

The wrapper creates mock objects with content initialized to None, but doesn't validate that content was successfully extracted before creating the choices list. If no valid content is found in any output item (all branches fail or items are empty), choice.message.content and choice.delta.content will remain None. While this might be acceptable for some edge cases, it could lead to unexpected None values propagating through the codebase. Consider initializing content to an empty string instead of None for better consistency with the chat completions API behavior.

self.github_copilot_token_to_open_ai_key(kwargs["extra_headers"])

res = litellm.completion(**kwargs)
# Use responses API or chat completions API based on wire_api setting

Copilot AI Jan 23, 2026

The code doesn't validate the wire_api value. If AIDER_WIRE_API or the model's wire_api setting contains an invalid value (e.g., "foo"), the code will silently default to using the chat completions API because the if condition on line 1141 only checks for "responses". This could lead to confusion if users mistype the value. Consider adding validation to raise a clear error for invalid wire_api values, or at least log a warning when an unrecognized value is encountered.

Suggested change (warn when wire_api has an unrecognized value):

# Use responses API or chat completions API based on wire_api setting
# Validate wire_api to catch misconfigurations (e.g., typos)
valid_wire_api_values = {None, "", "responses"}
if wire_api not in valid_wire_api_values:
    sys.stderr.write(
        f"Warning: Unrecognized wire_api value '{wire_api}'. "
        "Falling back to chat completions API.\n"
    )

aider/models.py Outdated
Comment on lines 81 to 94
                choice.delta.content = content_item.text
                break
        if choice.message.content:
            break
# Fallback: direct text attribute
elif hasattr(item, "text") and item.text:
    choice.message.content = item.text
    choice.delta.content = item.text
    break
# Fallback: dict format
elif isinstance(item, dict):
    if "text" in item:
        choice.message.content = item["text"]
        choice.delta.content = item["text"]

Copilot AI Jan 23, 2026

In ResponsesAPIWrapper, both message.content and delta.content are set to the same value (lines 80-81, 87-88, 93-94). The delta attribute is typically used for streaming responses, while message is for non-streaming. Setting both for a non-streaming response wrapper is unconventional and could cause confusion. For non-streaming responses, only message.content should be set, with delta.content remaining None or unset to match the standard chat completions API behavior.

Suggested change (keep only message.content in the non-streaming wrapper, dropping the delta.content assignments):

                break
        if choice.message.content:
            break
# Fallback: direct text attribute
elif hasattr(item, "text") and item.text:
    choice.message.content = item.text
    break
# Fallback: dict format
elif isinstance(item, dict):
    if "text" in item:
        choice.message.content = item["text"]

Comment on lines 119 to 164
def __next__(self):
    event = next(self._stream)

    # Wrap each event to look like chat completions format
    class MockChoice:
        def __init__(self):
            self.delta = type(
                "obj",
                (object,),
                {
                    "content": None,
                    "function_call": None,
                    "reasoning_content": None,
                    "reasoning": None,
                },
            )()
            self.finish_reason = None

    class MockChunk:
        def __init__(self):
            self.choices = [MockChoice()]

    mock_chunk = MockChunk()

    # Handle Responses API event stream format
    # Check for OUTPUT_TEXT_DELTA events (have delta attribute with text)
    if hasattr(event, "delta") and event.delta:
        mock_chunk.choices[0].delta.content = event.delta
    # Fallback for other formats
    elif hasattr(event, "output"):
        for item in event.output:
            # Handle ResponseOutputMessage type
            if hasattr(item, "type") and item.type == "message":
                if hasattr(item, "content") and item.content:
                    for content_item in item.content:
                        if hasattr(content_item, "text") and content_item.text:
                            mock_chunk.choices[0].delta.content = content_item.text
                            break
                    if mock_chunk.choices[0].delta.content:
                        break
            # Fallback: direct text attribute
            elif hasattr(item, "text"):
                mock_chunk.choices[0].delta.content = item.text
                break

    return mock_chunk

Copilot AI Jan 23, 2026

If an event in the streaming response has neither a "delta" attribute nor an "output" attribute, mock_chunk.choices[0].delta.content will remain None. While this might be intentional for some event types, the code doesn't handle StopIteration when the underlying stream ends. The next method should properly handle stream termination and raise StopIteration when appropriate to prevent infinite loops or unexpected behavior in consuming code.
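
A minimal sketch of explicit stream-termination handling. Note that the wrapper's existing next(self._stream) call already propagates StopIteration naturally; this sketch only makes that assumption about the underlying litellm stream explicit, and the class name is illustrative.

class StreamTerminationSketch:
    """Illustrative iterator wrapper; shows only how exhaustion is surfaced."""

    def __init__(self, responses_stream):
        self._stream = responses_stream

    def __iter__(self):
        return self

    def __next__(self):
        try:
            event = next(self._stream)
        except StopIteration:
            # Propagate termination so `for chunk in wrapper:` loops end cleanly.
            raise
        return event  # the real wrapper would build a mock chat chunk from the event here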

aider/models.py Outdated
kwargs["instructions"] = system_messages[0].get("content", "")

# For responses API, input can be a list of messages
kwargs["input"] = other_messages if other_messages else messages

Copilot AI Jan 23, 2026

When there are no non-system messages, the code sets kwargs["input"] = messages (which includes system messages). However, if system messages were found, they're already extracted into kwargs["instructions"], resulting in system messages being sent twice - once in "instructions" and once in "input". The logic should ensure system messages are excluded from "input" when they've been extracted to "instructions".

Suggested change (avoid sending system messages twice):

# Avoid sending system messages twice: once as instructions and again as input.
if system_messages:
    # When system messages are present, only send non-system messages as input
    kwargs["input"] = other_messages
else:
    # When there are no system messages, preserve existing behavior
    kwargs["input"] = other_messages if other_messages else messages

Comment on lines 33 to 165
class ResponsesAPIWrapper:
    """Wrapper to convert Responses API format to Chat Completions format"""

    def __init__(self, responses_obj):
        self._responses = responses_obj
        self._convert_to_chat_format()

    def _convert_to_chat_format(self):
        """Convert responses API output to chat completions choices format"""
        if not hasattr(self._responses, "output"):
            return

        # Create a mock choices structure
        class MockChoice:
            def __init__(self):
                self.message = type(
                    "obj",
                    (object,),
                    {
                        "content": None,
                        "tool_calls": None,
                        "reasoning_content": None,
                        "reasoning": None,
                    },
                )()
                self.delta = type(
                    "obj",
                    (object,),
                    {
                        "content": None,
                        "function_call": None,
                        "reasoning_content": None,
                        "reasoning": None,
                    },
                )()
                self.finish_reason = None

        choice = MockChoice()

        # Extract content from output items
        for item in self._responses.output:
            # Handle ResponseOutputMessage type
            if hasattr(item, "type") and item.type == "message":
                if hasattr(item, "content") and item.content:
                    # Extract text from content items
                    for content_item in item.content:
                        if hasattr(content_item, "text") and content_item.text:
                            choice.message.content = content_item.text
                            choice.delta.content = content_item.text
                            break
                    if choice.message.content:
                        break
            # Fallback: direct text attribute
            elif hasattr(item, "text") and item.text:
                choice.message.content = item.text
                choice.delta.content = item.text
                break
            # Fallback: dict format
            elif isinstance(item, dict):
                if "text" in item:
                    choice.message.content = item["text"]
                    choice.delta.content = item["text"]
                    break

        self.choices = [choice]

        # Copy other attributes
        if hasattr(self._responses, "id"):
            self.id = self._responses.id
        if hasattr(self._responses, "usage"):
            self.usage = self._responses.usage

    def __getattr__(self, name):
        """Fallback to original responses object for other attributes"""
        return getattr(self._responses, name)


class StreamingResponsesAPIWrapper:
    """Wrapper for streaming responses API to mimic chat completions stream"""

    def __init__(self, responses_stream):
        self._stream = responses_stream

    def __iter__(self):
        return self

    def __next__(self):
        event = next(self._stream)

        # Wrap each event to look like chat completions format
        class MockChoice:
            def __init__(self):
                self.delta = type(
                    "obj",
                    (object,),
                    {
                        "content": None,
                        "function_call": None,
                        "reasoning_content": None,
                        "reasoning": None,
                    },
                )()
                self.finish_reason = None

        class MockChunk:
            def __init__(self):
                self.choices = [MockChoice()]

        mock_chunk = MockChunk()

        # Handle Responses API event stream format
        # Check for OUTPUT_TEXT_DELTA events (have delta attribute with text)
        if hasattr(event, "delta") and event.delta:
            mock_chunk.choices[0].delta.content = event.delta
        # Fallback for other formats
        elif hasattr(event, "output"):
            for item in event.output:
                # Handle ResponseOutputMessage type
                if hasattr(item, "type") and item.type == "message":
                    if hasattr(item, "content") and item.content:
                        for content_item in item.content:
                            if hasattr(content_item, "text") and content_item.text:
                                mock_chunk.choices[0].delta.content = content_item.text
                                break
                        if mock_chunk.choices[0].delta.content:
                            break
                # Fallback: direct text attribute
                elif hasattr(item, "text"):
                    mock_chunk.choices[0].delta.content = item.text
                    break

        return mock_chunk


Copilot AI Jan 23, 2026

The new ResponsesAPIWrapper and StreamingResponsesAPIWrapper classes, as well as the responses API integration in send_completion and simple_send_with_retries, lack test coverage. Given that the repository has comprehensive test coverage for other model functionality (see tests/basic/test_models.py and tests/basic/test_sendchat.py), tests should be added to verify: (1) ResponsesAPIWrapper correctly converts responses API format to chat completions format, (2) StreamingResponsesAPIWrapper properly handles streaming events, (3) send_completion routes to the correct API based on wire_api setting, and (4) simple_send_with_retries correctly extracts responses from both API formats.
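
A minimal sketch of the kind of unit test this comment asks for, assuming ResponsesAPIWrapper is importable from aider.models and accepts any object exposing an output list; the fake objects below are illustrative.

from types import SimpleNamespace

from aider.models import ResponsesAPIWrapper  # assumed import path


def test_responses_wrapper_exposes_choices():
    # Fake a Responses API result shaped like output -> message -> content[0].text
    content = SimpleNamespace(text="hello from responses API")
    message = SimpleNamespace(type="message", content=[content])
    fake_response = SimpleNamespace(output=[message], id="resp_123")

    wrapped = ResponsesAPIWrapper(fake_response)

    assert wrapped.choices, "wrapper should always expose a choices list"
    assert wrapped.choices[0].message.content == "hello from responses API"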

remove_reasoning: Optional[str] = None # Deprecated alias for reasoning_tag
system_prompt_prefix: Optional[str] = None
accepts_settings: Optional[list] = None
wire_api: str = "chat" # "chat" for Chat Completions API, "responses" for Responses API

Copilot AI Jan 23, 2026

The new AIDER_WIRE_API environment variable and wire_api model setting are not documented. Users need to know: (1) the purpose of this setting, (2) valid values ("chat" or "responses"), (3) when to use each option, and (4) that the environment variable overrides the model setting. Consider adding documentation in the appropriate user-facing documentation files or adding docstring documentation to the ModelSettings class explaining the wire_api field.
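
A minimal sketch of the field-level documentation this comment asks for; the dataclass is trimmed to the new field, and the docstring wording is an assumption rather than text from the PR.

from dataclasses import dataclass


@dataclass
class ModelSettingsDocSketch:
    wire_api: str = "chat"
    """Which OpenAI wire protocol to use for this model.

    Valid values are "chat" (requests go to /v1/chat/completions, the default) and
    "responses" (requests go to /v1/responses, needed by the gpt-5.x codex models).
    When set, the AIDER_WIRE_API environment variable overrides this field.
    """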

pypandoc
litellm
fastapi # Required by litellm.responses() API
orjson # Required by litellm for fast JSON serialization

Copilot AI Jan 23, 2026

The comment states that orjson is "Required by litellm for fast JSON serialization", but this is not entirely accurate. While orjson may be used by litellm if available, it's not clear if it's strictly required or just optional. The comment should clarify whether this is a hard requirement for litellm.responses() support or if it's optional. If it's optional, the comment should indicate what functionality is lost without it.

Suggested change:

orjson  # Optional: used by litellm for faster JSON serialization; without it, standard json is used and may be slower

- Always initialize self.choices in ResponsesAPIWrapper for consistent interface
- Concatenate multiple system messages instead of dropping them
- Simplify simple_send_with_retries to use unified choices interface
- Add finish_reason extraction in StreamingResponsesAPIWrapper for truncation detection
- Add wire_api value validation with warning for invalid values
- Initialize content to empty string instead of None
- Remove delta.content from non-streaming wrapper (only for streaming)
- Properly handle StopIteration in streaming wrapper
- Improve dependency comments

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
wzh4464 force-pushed the add-responses-api-support branch from 13d0fe8 to 0a4044d on January 23, 2026 at 08:14

Development

Successfully merging this pull request may close these issues:

  • Error when using azure/gpt-5.2-chat
  • Codex-mini don't work with litellm
