31 changes: 22 additions & 9 deletions docs/input.md
@@ -104,21 +104,34 @@ print(result.output)

## User-side download vs. direct file URL

When you provide a URL using any of `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, Pydantic AI will typically send the URL directly to the model API so that the download happens on their side.
When you use `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, Pydantic AI sends the URL to the model API by default, so the file is downloaded on the provider's side.

Some model APIs do not support file URLs at all or for specific file types. In the following cases, Pydantic AI will download the file content and send it as part of the API request instead:
Support for file URLs varies depending on type and provider. Pydantic AI handles this as follows:

- [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel]: `AudioUrl` and `DocumentUrl`
- [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel]: All URLs
- [`AnthropicModel`][pydantic_ai.models.anthropic.AnthropicModel]: `DocumentUrl` with media type `text/plain`
- [`GoogleModel`][pydantic_ai.models.google.GoogleModel] using GLA (Gemini Developer API): All URLs except YouTube video URLs and files uploaded to the [Files API](https://ai.google.dev/gemini-api/docs/files).
- [`BedrockConverseModel`][pydantic_ai.models.bedrock.BedrockConverseModel]: All URLs
| Model | Supported URL types | Sends URL directly |
|-------|---------------------|-------------------|
| [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel] | `ImageUrl`, `AudioUrl`, `DocumentUrl` | `ImageUrl` only |
| [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel] | `ImageUrl`, `AudioUrl`, `DocumentUrl` | Yes |
| [`AnthropicModel`][pydantic_ai.models.anthropic.AnthropicModel] | `ImageUrl`, `DocumentUrl` | Yes, except `DocumentUrl` (`text/plain`) |
| [`GoogleModel`][pydantic_ai.models.google.GoogleModel] (Vertex) | All URL types | Yes |
| [`GoogleModel`][pydantic_ai.models.google.GoogleModel] (GLA) | All URL types | [YouTube](models/google.md#document-image-audio-and-video-input) and [Files API](https://ai.google.dev/gemini-api/docs/files) URLs only |
| [`MistralModel`][pydantic_ai.models.mistral.MistralModel] | `ImageUrl`, `DocumentUrl` (PDF) | Yes |
| [`BedrockConverseModel`][pydantic_ai.models.bedrock.BedrockConverseModel] | `ImageUrl`, `DocumentUrl`, `VideoUrl` | No, defaults to `force_download` |

If the model API supports file URLs but may not be able to download a file because of crawling or access restrictions, you can instruct Pydantic AI to download the file content and send that instead of the URL by enabling the `force_download` flag on the URL object. For example, [`GoogleModel`][pydantic_ai.models.google.GoogleModel] on Vertex AI limits YouTube video URLs to one URL per request.
A model API may be unable to download a file (e.g., because of crawling or access restrictions) even if it supports file URLs. For example, [`GoogleModel`][pydantic_ai.models.google.GoogleModel] on Vertex AI limits YouTube video URLs to one URL per request. In such cases, you can instruct Pydantic AI to download the file content locally and send that instead of the URL by setting `force_download` on the URL object:

```py {title="force_download.py" test="skip" lint="skip"}
from pydantic_ai import ImageUrl, AudioUrl, VideoUrl, DocumentUrl

ImageUrl(url='https://example.com/image.png', force_download=True)
AudioUrl(url='https://example.com/audio.mp3', force_download=True)
VideoUrl(url='https://example.com/video.mp4', force_download=True)
DocumentUrl(url='https://example.com/doc.pdf', force_download=True)
```

## Uploaded Files

Some model providers like Google's Gemini API support [uploading files](https://ai.google.dev/gemini-api/docs/files). You can upload a file to the model API using the client you can get from the provider and use the resulting URL as input:
Some model providers like Google's Gemini API support [uploading files](https://ai.google.dev/gemini-api/docs/files). You can upload a file using the provider's client and pass the resulting URL as input:

```py {title="file_upload.py" test="skip"}
from pydantic_ai import Agent, DocumentUrl
20 changes: 19 additions & 1 deletion docs/models/google.md
@@ -199,7 +199,25 @@ agent = Agent(model)

## Document, Image, Audio, and Video Input

`GoogleModel` supports multi-modal input, including documents, images, audio, and video. See the [input documentation](../input.md) for details and examples.
`GoogleModel` supports multi-modal input, including documents, images, audio, and video.

YouTube video URLs can be passed directly to Google models:

```py {title="youtube_input.py" test="skip" lint="skip"}
from pydantic_ai import Agent, VideoUrl
from pydantic_ai.models.google import GoogleModel

agent = Agent(GoogleModel('gemini-2.5-flash'))
result = agent.run_sync(
    [
        'What is this video about?',
        VideoUrl(url='https://www.youtube.com/watch?v=dQw4w9WgXcQ'),
    ]
)
print(result.output)
```

See the [input documentation](../input.md) for more details and examples.

## Model settings

2 changes: 1 addition & 1 deletion pydantic_ai_slim/pydantic_ai/_mcp.py
@@ -91,7 +91,7 @@ def add_msg(
'user',
mcp_types.ImageContent(
type='image',
data=base64.b64encode(chunk.data).decode(),
data=chunk.base64,
mimeType=chunk.media_type,
),
)
18 changes: 13 additions & 5 deletions pydantic_ai_slim/pydantic_ai/messages.py
@@ -474,7 +474,10 @@ class BinaryContent:
"""Binary content, e.g. an audio or image file."""

data: bytes
"""The binary data."""
"""The binary file data.

Use `.base64` to get the base64-encoded string.
"""

_: KW_ONLY

@@ -574,7 +577,12 @@ def identifier(self) -> str:
@property
def data_uri(self) -> str:
"""Convert the `BinaryContent` to a data URI."""
return f'data:{self.media_type};base64,{base64.b64encode(self.data).decode()}'
return f'data:{self.media_type};base64,{self.base64}'

@property
def base64(self) -> str:
"""Return the binary data as a base64-encoded string. Default encoding is UTF-8."""
return base64.b64encode(self.data).decode()

@property
def is_audio(self) -> bool:
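
The refactor in this hunk can be illustrated with a minimal, self-contained sketch. `BinaryBlob` is a hypothetical stand-in for `BinaryContent` (it is not the library class); it only shows how a single `base64` property lets `data_uri` and other callers share one encoding step instead of repeating `base64.b64encode(...).decode()`.

```python
import base64
from dataclasses import dataclass


@dataclass
class BinaryBlob:
    """Hypothetical stand-in for `BinaryContent`, for illustration only."""

    data: bytes
    media_type: str

    @property
    def base64(self) -> str:
        # b64encode returns ASCII bytes; decode() converts them to a str.
        # Inside a method body, `base64` resolves to the module, not this property.
        return base64.b64encode(self.data).decode()

    @property
    def data_uri(self) -> str:
        # Reuse the property instead of repeating the encoding call.
        return f'data:{self.media_type};base64,{self.base64}'


blob = BinaryBlob(data=b'hello', media_type='text/plain')
print(blob.base64)
print(blob.data_uri)
```

Call sites such as the OpenTelemetry message conversion can then read `part.base64` directly, which is what the remaining hunks in this file do.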
@@ -775,7 +783,7 @@ def otel_message_parts(self, settings: InstrumentationSettings) -> list[_otel_me
elif isinstance(part, BinaryContent):
converted_part = _otel_messages.BinaryDataPart(type='binary', media_type=part.media_type)
if settings.include_content and settings.include_binary_content:
converted_part['content'] = base64.b64encode(part.data).decode()
converted_part['content'] = part.base64
parts.append(converted_part)
elif isinstance(part, CachePoint):
# CachePoint is a marker, not actual content - skip it for otel
@@ -1378,7 +1386,7 @@ def new_event_body():
'kind': 'binary',
'media_type': part.content.media_type,
**(
{'binary_content': base64.b64encode(part.content.data).decode()}
{'binary_content': part.content.base64}
if settings.include_content and settings.include_binary_content
else {}
),
@@ -1412,7 +1420,7 @@ def otel_message_parts(self, settings: InstrumentationSettings) -> list[_otel_me
elif isinstance(part, FilePart):
converted_part = _otel_messages.BinaryDataPart(type='binary', media_type=part.content.media_type)
if settings.include_content and settings.include_binary_content:
converted_part['content'] = base64.b64encode(part.content.data).decode()
converted_part['content'] = part.content.base64
parts.append(converted_part)
elif isinstance(part, BaseToolCallPart):
call_part = _otel_messages.ToolCallPart(type='tool_call', id=part.tool_call_id, name=part.tool_name)
4 changes: 2 additions & 2 deletions pydantic_ai_slim/pydantic_ai/models/__init__.py
@@ -9,7 +9,7 @@
import base64
import warnings
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator, Callable, Iterator
from collections.abc import AsyncIterator, Callable, Iterator, Sequence
from contextlib import asynccontextmanager, contextmanager
from dataclasses import dataclass, field, replace
from datetime import datetime
@@ -721,7 +721,7 @@ def base_url(self) -> str | None:

@staticmethod
def _get_instructions(
messages: list[ModelMessage], model_request_parameters: ModelRequestParameters | None = None
messages: Sequence[ModelMessage], model_request_parameters: ModelRequestParameters | None = None
) -> str | None:
"""Get instructions from the first ModelRequest found when iterating messages in reverse.

60 changes: 40 additions & 20 deletions pydantic_ai_slim/pydantic_ai/models/anthropic.py
@@ -64,7 +64,6 @@
omit as OMIT,
)
from anthropic.types.beta import (
BetaBase64PDFBlockParam,
BetaBase64PDFSourceParam,
BetaCacheControlEphemeralParam,
BetaCitationsConfigParam,
@@ -98,6 +97,7 @@
BetaRawMessageStreamEvent,
BetaRedactedThinkingBlock,
BetaRedactedThinkingBlockParam,
BetaRequestDocumentBlockParam,
BetaRequestMCPServerToolConfigurationParam,
BetaRequestMCPServerURLDefinitionParam,
BetaServerToolUseBlock,
@@ -1034,6 +1034,31 @@ def _add_cache_control_to_last_param(
# Add cache_control to the last param
last_param['cache_control'] = self._build_cache_control(ttl)

@staticmethod
def _map_binary_data(data: bytes, media_type: str) -> BetaContentBlockParam:
# Anthropic SDK accepts file-like objects (IO[bytes]) and handles base64 encoding internally
if media_type.startswith('image/'):
return BetaImageBlockParam(
source={'data': io.BytesIO(data), 'media_type': media_type, 'type': 'base64'}, # type: ignore
type='image',
)
elif media_type == 'application/pdf':
return BetaRequestDocumentBlockParam(
source=BetaBase64PDFSourceParam(
Review comment (Collaborator):

I'm looking at what other sources this supports, and there's also BetaFileDocumentSourceParam, which takes a file_id for the file upload API.

We're adding support for uploaded files in #2611, but that PR has been stale for a bit so may be interesting for you to pick up.

data=io.BytesIO(data),
media_type='application/pdf',
type='base64',
),
type='document',
)
elif media_type == 'text/plain':
return BetaRequestDocumentBlockParam(
source=BetaPlainTextSourceParam(data=data.decode('utf-8'), media_type=media_type, type='text'),
type='document',
)
else:
raise RuntimeError(f'Unsupported binary content media type for Anthropic: {media_type}')

@staticmethod
async def _map_user_prompt(
part: UserPromptPart,
@@ -1049,30 +1074,25 @@ async def _map_user_prompt(
elif isinstance(item, CachePoint):
yield item
elif isinstance(item, BinaryContent):
if item.is_image:
yield BetaImageBlockParam(
source={'data': io.BytesIO(item.data), 'media_type': item.media_type, 'type': 'base64'}, # type: ignore
type='image',
)
elif item.media_type == 'application/pdf':
yield BetaBase64PDFBlockParam(
source=BetaBase64PDFSourceParam(
data=io.BytesIO(item.data),
media_type='application/pdf',
type='base64',
),
type='document',
)
else:
raise RuntimeError('Only images and PDFs are supported for binary content')
yield AnthropicModel._map_binary_data(item.data, item.media_type)
elif isinstance(item, ImageUrl):
yield BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image')
if item.force_download:
downloaded = await download_item(item, data_format='bytes')
Review comment (Collaborator):

We should also respect force_download for DocumentUrl + item.media_type == 'application/pdf' further down, right?

Review comment (Collaborator):

Would it make sense to use _map_binary_content there, if we make it more generic so it can take the result of download_item? (Or take a FileUrl | BinaryContent and if it gets a FileUrl, do the download first)

yield AnthropicModel._map_binary_data(downloaded['data'], item.media_type)
else:
yield BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image')
elif isinstance(item, DocumentUrl):
if item.media_type == 'application/pdf':
yield BetaBase64PDFBlockParam(source={'url': item.url, 'type': 'url'}, type='document')
if item.force_download:
downloaded = await download_item(item, data_format='bytes')
yield AnthropicModel._map_binary_data(downloaded['data'], item.media_type)
else:
yield BetaRequestDocumentBlockParam(
source={'url': item.url, 'type': 'url'}, type='document'
)
elif item.media_type == 'text/plain':
downloaded_item = await download_item(item, data_format='text')
yield BetaBase64PDFBlockParam(
yield BetaRequestDocumentBlockParam(
source=BetaPlainTextSourceParam(
data=downloaded_item['data'], media_type=item.media_type, type='text'
),
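
The mapping logic in this file can be sketched without the Anthropic SDK. The version below is a simplified illustration, not the PR's implementation: it uses plain dicts instead of the `Beta*Param` types, the names `map_binary_data`, `map_file_url`, and `fake_fetch` are hypothetical, and `fetch` is a synchronous stand-in for the async `download_item` helper. It shows the two ideas in the diff: dispatching on media type, and sending the URL as-is unless `force_download` is set (or the type is `text/plain`, which is always downloaded).

```python
from typing import Any, Callable


def map_binary_data(data: bytes, media_type: str) -> dict[str, Any]:
    """Hypothetical dict-based analogue of `_map_binary_data` (no SDK types)."""
    if media_type.startswith('image/'):
        return {'type': 'image', 'source': {'type': 'base64', 'media_type': media_type, 'data': data}}
    if media_type == 'application/pdf':
        return {'type': 'document', 'source': {'type': 'base64', 'media_type': media_type, 'data': data}}
    if media_type == 'text/plain':
        # Plain text is sent as decoded text rather than base64.
        text = data.decode('utf-8')
        return {'type': 'document', 'source': {'type': 'text', 'media_type': media_type, 'data': text}}
    raise RuntimeError(f'Unsupported binary content media type: {media_type}')


def map_file_url(
    url: str, media_type: str, force_download: bool, fetch: Callable[[str], bytes]
) -> dict[str, Any]:
    """Send the URL as-is unless a download is forced (or required for text/plain)."""
    if force_download or media_type == 'text/plain':
        return map_binary_data(fetch(url), media_type)
    if media_type.startswith('image/'):
        return {'type': 'image', 'source': {'type': 'url', 'url': url}}
    if media_type == 'application/pdf':
        return {'type': 'document', 'source': {'type': 'url', 'url': url}}
    raise RuntimeError(f'Unsupported URL media type: {media_type}')


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real download; returns fixed bytes.
    return b'%PDF-1.7'


print(map_file_url('https://example.com/doc.pdf', 'application/pdf', False, fake_fetch))
print(map_file_url('https://example.com/doc.pdf', 'application/pdf', True, fake_fetch))
```

Routing downloaded bytes through the same function as binary content is close to what the second review comment proposes: one mapping path for both `BinaryContent` and downloaded URLs.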
4 changes: 1 addition & 3 deletions pydantic_ai_slim/pydantic_ai/models/gemini.py
@@ -1,6 +1,5 @@
from __future__ import annotations as _annotations

import base64
from collections.abc import AsyncIterator, Sequence
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
@@ -375,9 +374,8 @@ async def _map_user_prompt(self, part: UserPromptPart) -> list[_GeminiPartUnion]
if isinstance(item, str):
content.append({'text': item})
elif isinstance(item, BinaryContent):
base64_encoded = base64.b64encode(item.data).decode('utf-8')
content.append(
_GeminiInlineDataPart(inline_data={'data': base64_encoded, 'mime_type': item.media_type})
_GeminiInlineDataPart(inline_data={'data': item.base64, 'mime_type': item.media_type})
)
elif isinstance(item, VideoUrl) and item.is_youtube:
file_data = _GeminiFileDataPart(file_data={'file_uri': item.url, 'mime_type': item.media_type})