
Cohere: Implement multimodal support for Command A Vision #3703

@dsfaccini

Description


Problem

Our code currently blocks all multimodal inputs for Cohere with this error (cohere.py:311):

raise RuntimeError('Cohere does not yet support multi-modal inputs.')

However, on July 31, 2025 Cohere launched Command A Vision, which supports text + image inputs. The Cohere SDK (v5.18.0) already includes the necessary types (ImageUrlContent, ImageUrl), but we're not using them.

Evidence

Model availability:

SDK support verified:

# Current SDK v5.18.0 has these types:
from cohere import ImageUrlContent, ImageUrl, UserChatMessageV2

Current blocking code (cohere.py:308-311):

elif isinstance(part, UserPromptPart):
    if isinstance(part.content, str):
        yield UserChatMessageV2(role='user', content=part.content)
    else:
        raise RuntimeError('Cohere does not yet support multi-modal inputs.')

Proposed Implementation

Replace the error with proper multimodal handling:

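# Module-level imports in cohere.py. Cohere's ImageUrl is aliased because
# pydantic_ai.messages also defines an ImageUrl; using one name for both
# classes would make the isinstance check and the constructor collide.
from cohere import ImageUrl as CohereImageUrl, ImageUrlContent, UserChatMessageV2
from pydantic_ai.messages import BinaryContent, DocumentUrl, ImageUrl, UserPromptPart

# Inside the message-mapping loop:
elif isinstance(part, UserPromptPart):
    if isinstance(part.content, str):
        yield UserChatMessageV2(role='user', content=part.content)
    else:
        content_blocks = []
        for item in part.content:
            if isinstance(item, str):
                content_blocks.append({'type': 'text', 'text': item})
            elif isinstance(item, ImageUrl):
                # Public image URLs are passed through as-is; Cohere fetches them.
                content_blocks.append(ImageUrlContent(
                    type='image_url',
                    image_url=CohereImageUrl(url=item.url, detail='auto')
                ))
            elif isinstance(item, BinaryContent):
                if item.is_image:
                    # Inline image bytes are sent as a base64 data URI.
                    content_blocks.append(ImageUrlContent(
                        type='image_url',
                        image_url=CohereImageUrl(url=item.data_uri, detail='auto')
                    ))
                else:
                    raise RuntimeError('Only images are supported for binary content in Cohere.')
            elif isinstance(item, DocumentUrl):
                raise RuntimeError('DocumentUrl is not supported in Cohere.')
            else:
                # Any other content type (e.g. AudioUrl, VideoUrl) is rejected.
                raise RuntimeError(f'Unsupported content type for Cohere: {type(item).__name__}')
        yield UserChatMessageV2(role='user', content=content_blocks)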
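To make the expected request shape concrete without installing the SDK, here is a standard-library-only sketch of the same dispatch that emits plain dicts in the JSON form Cohere's v2 Chat API documents for mixed text and image content. ImageUrlPart, BinaryPart, DocumentUrlPart and to_cohere_content are hypothetical stand-ins, not pydantic-ai or Cohere SDK names, and the dict layout is an assumption based on Cohere's published request format.

```python
import base64
from dataclasses import dataclass
from typing import Any, Dict, List, Union


# Hypothetical stand-ins for pydantic_ai's ImageUrl / BinaryContent / DocumentUrl,
# reduced to the fields this sketch needs.
@dataclass
class ImageUrlPart:
    url: str


@dataclass
class BinaryPart:
    data: bytes
    media_type: str

    @property
    def is_image(self) -> bool:
        return self.media_type.startswith('image/')

    @property
    def data_uri(self) -> str:
        # RFC 2397 data URL with base64-encoded payload.
        return f'data:{self.media_type};base64,{base64.b64encode(self.data).decode()}'


@dataclass
class DocumentUrlPart:
    url: str


UserContent = Union[str, ImageUrlPart, BinaryPart, DocumentUrlPart]


def to_cohere_content(items: List[UserContent], detail: str = 'auto') -> List[Dict[str, Any]]:
    """Map user content to Cohere-style content blocks (assumed JSON shape)."""
    blocks: List[Dict[str, Any]] = []
    for item in items:
        if isinstance(item, str):
            blocks.append({'type': 'text', 'text': item})
        elif isinstance(item, ImageUrlPart):
            # URLs are forwarded unchanged; no client-side download.
            blocks.append({'type': 'image_url', 'image_url': {'url': item.url, 'detail': detail}})
        elif isinstance(item, BinaryPart):
            if not item.is_image:
                raise RuntimeError('Only images are supported for binary content in Cohere.')
            blocks.append({'type': 'image_url', 'image_url': {'url': item.data_uri, 'detail': detail}})
        else:
            # DocumentUrlPart and anything else end up here.
            raise RuntimeError(f'Unsupported content type for Cohere: {type(item).__name__}')
    return blocks
```

Taking detail as a parameter instead of hardcoding 'auto' leaves room to expose it later without touching the dispatch logic.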

Implementation Notes

  • Send ImageUrl directly: no download is needed, since the API accepts URLs
  • Convert BinaryContent images to base64 data URIs (BinaryContent.data_uri)
  • The detail parameter ('auto', 'low', 'high') controls token usage; the code above always sends 'auto'
  • Limits: at most 20 images per request and 20 MB of image data in total
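If the integration should fail fast rather than wait for an API error, the limits and detail values listed above could be checked client-side. This is a minimal sketch: check_image_limits, normalize_detail and the constants are hypothetical helpers, not part of pydantic-ai or the Cohere SDK, and the byte cap assumes "20MB" means 20 * 10**6 bytes.

```python
from typing import Optional, Sequence

# Limits quoted in the issue. Reading "20MB" as 20 * 10**6 bytes is an
# assumption; it is the more conservative of the decimal/binary readings.
MAX_IMAGES_PER_REQUEST = 20
MAX_TOTAL_IMAGE_BYTES = 20 * 10**6
ALLOWED_DETAIL = ('auto', 'low', 'high')


def check_image_limits(image_sizes: Sequence[int]) -> None:
    """Raise ValueError if the image count or total byte size exceeds the caps.

    `image_sizes` holds the byte size of each image in one request.
    """
    if len(image_sizes) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f'Cohere accepts at most {MAX_IMAGES_PER_REQUEST} images per request, '
            f'got {len(image_sizes)}.'
        )
    total = sum(image_sizes)
    if total > MAX_TOTAL_IMAGE_BYTES:
        raise ValueError(
            f'Total image data is {total} bytes, above the {MAX_TOTAL_IMAGE_BYTES}-byte limit.'
        )


def normalize_detail(detail: Optional[str]) -> str:
    """Default to 'auto' and reject anything outside 'auto' / 'low' / 'high'."""
    if detail is None:
        return 'auto'
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f'detail must be one of {ALLOWED_DETAIL}, got {detail!r}.')
    return detail
```

In practice only BinaryContent images have a known size before sending; an ImageUrl's size is unknown without downloading it, so the count check can cover both kinds while the size check would cover binary images only.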

Testing

Add tests for:

  • ImageUrl with public URLs
  • BinaryContent with base64-encoded images
  • Error handling for unsupported types (DocumentUrl, non-image BinaryContent)

Related

References
