Problem
Our code currently blocks all multimodal inputs for Cohere with this error (cohere.py:311):
    raise RuntimeError('Cohere does not yet support multi-modal inputs.')

However, Cohere launched Command A Vision on July 31, 2025, which supports text + image processing. The Cohere SDK (v5.18.0) already includes the necessary types (ImageUrlContent, ImageUrl), but we're not using them.
Evidence
Model availability:
- Model ID: command-a-vision-07-2025
- Announced: July 31, 2025
- Capabilities: Text + Image processing (enterprise-focused OCR, document analysis, chart interpretation)
- Source: https://docs.cohere.com/changelog/2025-07-31-command-a-vision
SDK support verified:
    # Current SDK v5.18.0 has these types:
    from cohere import ImageUrlContent, ImageUrl, UserChatMessageV2

Current blocking code (cohere.py:308-311):
    elif isinstance(part, UserPromptPart):
        if isinstance(part.content, str):
            yield UserChatMessageV2(role='user', content=part.content)
        else:
            raise RuntimeError('Cohere does not yet support multi-modal inputs.')

Proposed Implementation
Replace the error with proper multimodal handling:
    # NOTE: pydantic-ai's ImageUrl (message type) and Cohere's ImageUrl (SDK type)
    # share a name. CohereImageUrl is an illustrative alias, e.g.
    # `from cohere import ImageUrl as CohereImageUrl`.
    elif isinstance(part, UserPromptPart):
        if isinstance(part.content, str):
            yield UserChatMessageV2(role='user', content=part.content)
        else:
            content_blocks = []
            for item in part.content:
                if isinstance(item, str):
                    content_blocks.append({'type': 'text', 'text': item})
                elif isinstance(item, ImageUrl):
                    content_blocks.append(ImageUrlContent(
                        type='image_url',
                        image_url=CohereImageUrl(url=item.url, detail='auto')
                    ))
                elif isinstance(item, BinaryContent):
                    if item.is_image:
                        content_blocks.append(ImageUrlContent(
                            type='image_url',
                            image_url=CohereImageUrl(url=item.data_uri, detail='auto')
                        ))
                    else:
                        raise RuntimeError('Only images are supported for binary content in Cohere.')
                elif isinstance(item, DocumentUrl):
                    raise RuntimeError('DocumentUrl is not supported in Cohere.')
                else:
                    raise RuntimeError(f'Unsupported content type for Cohere: {type(item).__name__}')
            yield UserChatMessageV2(role='user', content=content_blocks)

Implementation Notes
- Send ImageUrl directly (no download needed - API accepts URLs)
- Convert BinaryContent images to data URIs
- Support the detail parameter ('auto', 'low', 'high') for controlling token usage
- Max 20 images per request, 20MB total data
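
The notes above could be sketched with standard-library code only. This is a minimal illustration, not the actual implementation: the helper names (to_data_uri, image_block, check_image_limits) are hypothetical, the blocks are plain dicts rather than Cohere SDK objects, and the limits are copied from the notes and should be re-checked against Cohere's documentation. Whether the 20MB cap applies to raw or base64-encoded bytes is not stated here, so the sketch assumes raw bytes.

```python
import base64
from typing import Literal

# Limits quoted in the notes above; assumptions to verify against Cohere's docs.
MAX_IMAGES_PER_REQUEST = 20
MAX_TOTAL_IMAGE_BYTES = 20 * 1024 * 1024  # "20MB", read here as 20 MiB of raw bytes

Detail = Literal['auto', 'low', 'high']


def to_data_uri(data: bytes, media_type: str) -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode('ascii')
    return f'data:{media_type};base64,{encoded}'


def image_block(url: str, detail: Detail = 'auto') -> dict:
    """Build a plain-dict image block shaped like the proposal's ImageUrlContent."""
    if detail not in ('auto', 'low', 'high'):
        raise ValueError(f'Unsupported detail value: {detail!r}')
    return {'type': 'image_url', 'image_url': {'url': url, 'detail': detail}}


def check_image_limits(images: list[bytes]) -> None:
    """Raise ValueError if the binary images exceed the assumed per-request limits."""
    if len(images) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(f'Too many images: {len(images)} > {MAX_IMAGES_PER_REQUEST}')
    total = sum(len(image) for image in images)
    if total > MAX_TOTAL_IMAGE_BYTES:
        raise ValueError(f'Images too large: {total} bytes > {MAX_TOTAL_IMAGE_BYTES}')
```

A caller would run check_image_limits over the binary images in a prompt before building blocks with image_block(to_data_uri(data, media_type)). Images passed by URL are not counted by this sketch, since their size is unknown without downloading them.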
Testing
Add tests for:
- ImageUrl with public URLs
- BinaryContent with base64-encoded images
- Error handling for unsupported types (DocumentUrl, non-image BinaryContent)
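
As a rough sketch of what those test cases could look like, the following uses stand-in dataclasses and a dict-based mapping function that mirrors the branching of the proposed implementation. Everything named here (FakeImageUrl, FakeBinaryContent, FakeDocumentUrl, map_user_content) is hypothetical; real tests would import the pydantic-ai and Cohere types and exercise the model class instead.

```python
from dataclasses import dataclass


# Stand-ins for the pydantic-ai message types (assumption: real tests would
# import ImageUrl, BinaryContent and DocumentUrl from the library).
@dataclass
class FakeImageUrl:
    url: str


@dataclass
class FakeBinaryContent:
    data_uri: str
    is_image: bool


@dataclass
class FakeDocumentUrl:
    url: str


def map_user_content(items):
    """Mirror the proposed branching, emitting plain dicts instead of SDK objects."""
    blocks = []
    for item in items:
        if isinstance(item, str):
            blocks.append({'type': 'text', 'text': item})
        elif isinstance(item, FakeImageUrl):
            blocks.append({'type': 'image_url', 'image_url': {'url': item.url, 'detail': 'auto'}})
        elif isinstance(item, FakeBinaryContent):
            if not item.is_image:
                raise RuntimeError('Only images are supported for binary content in Cohere.')
            blocks.append({'type': 'image_url', 'image_url': {'url': item.data_uri, 'detail': 'auto'}})
        elif isinstance(item, FakeDocumentUrl):
            raise RuntimeError('DocumentUrl is not supported in Cohere.')
        else:
            raise RuntimeError(f'Unsupported content type for Cohere: {type(item).__name__}')
    return blocks


def error_message(items):
    """Return the RuntimeError message raised for items, or None if nothing is raised."""
    try:
        map_user_content(items)
    except RuntimeError as exc:
        return str(exc)
    return None


def test_image_url_is_sent_as_url():
    blocks = map_user_content(['Describe this', FakeImageUrl('https://example.com/cat.png')])
    assert blocks[0] == {'type': 'text', 'text': 'Describe this'}
    assert blocks[1]['image_url']['url'] == 'https://example.com/cat.png'


def test_binary_image_is_sent_as_data_uri():
    item = FakeBinaryContent(data_uri='data:image/png;base64,aGk=', is_image=True)
    assert map_user_content([item])[0]['image_url']['url'].startswith('data:image/png;base64,')


def test_unsupported_types_raise():
    assert 'DocumentUrl' in error_message([FakeDocumentUrl('https://example.com/a.pdf')])
    pdf = FakeBinaryContent(data_uri='data:application/pdf;base64,aGk=', is_image=False)
    assert 'Only images' in error_message([pdf])
    assert 'Unsupported content type' in error_message([42])
```

The test functions use bare assert statements, so pytest would collect them as written; they can also be called directly.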
Related
- Part of the multimodal provider review from "Review multi-modal input handling to send URLs when possible and don't error on types that are actually supported" #3569
- Related to "Add force_download support for Anthropic and OAIR models and clarify proper BinaryContent base64 handling" #3694 (multimodal fixes for OpenAI/Anthropic)