Conversation

@mochow13 (Contributor) commented Dec 7, 2025

Closes #3621

@DouweM changed the title from "#3621 - Pass s3:// file URLs directly to API in BedrockConverseModel" to "Pass s3:// file URLs directly to API in BedrockConverseModel" on Dec 9, 2025

    if item.url.startswith('s3://'):
        source = {'s3Location': {'uri': item.url}}
    else:
        downloaded_item = await download_item(item, data_format='bytes', type_format='extension')

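
For readers following the thread, the branch in this hunk can be sketched as a standalone function. This is a minimal illustration, not the code in the PR: `build_source` and its `downloaded_bytes` parameter are hypothetical names, and the download step is replaced by bytes passed in by the caller.

```python
from typing import Any, Dict, Optional


def build_source(url: str, downloaded_bytes: Optional[bytes] = None) -> Dict[str, Any]:
    """Return a Bedrock-style source dict for a file reference (hypothetical helper)."""
    if url.startswith('s3://'):
        # S3 URLs are handed to the API as a location; nothing is downloaded.
        return {'s3Location': {'uri': url}}
    if downloaded_bytes is None:
        # In the PR this is where download_item would be awaited; here the
        # caller is assumed to have fetched the content already.
        raise ValueError('non-S3 URLs need their content downloaded first')
    return {'bytes': downloaded_bytes}
```

Calling `build_source('s3://bucket/key.png')` yields the `s3Location` form, while any other URL falls through to the inline `bytes` form.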
Collaborator

download_item currently has logic gating for gs:// URLs; let's check s3:// URLs there as well

Contributor Author

It seems the existing code in download_item checks for gs:// and YouTube URLs:

    if item.url.startswith('gs://'):
        raise UserError('Downloading from protocol "gs://" is not supported.')
    elif isinstance(item, VideoUrl) and item.is_youtube:
        raise UserError('Downloading YouTube videos is not supported.')

What check do you mean for s3:// here?

Collaborator

The same check: raise an error saying that download_item does not support s3:// URLs.
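
A rough sketch of what such a gate could look like, extending the gs:// check quoted above to s3://. The `UserError` class here is a local stand-in so the snippet runs on its own, and `check_downloadable` and `UNSUPPORTED_SCHEMES` are hypothetical names, not the library's.

```python
class UserError(RuntimeError):
    """Local stand-in for the library's UserError, so this sketch is self-contained."""


# Hypothetical list of schemes that must be passed to the API rather than downloaded.
UNSUPPORTED_SCHEMES = ('gs://', 's3://')


def check_downloadable(url: str) -> None:
    """Raise UserError if the URL uses a scheme that cannot be downloaded."""
    for scheme in UNSUPPORTED_SCHEMES:
        if url.startswith(scheme):
            raise UserError(f'Downloading from protocol "{scheme}" is not supported.')
```

With this in place, a model that cannot take s3:// URLs natively fails fast with a clear message instead of attempting an HTTP download.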

Contributor Author

Ah, OK: download_item needs to stop supporting s3:// downloads altogether. Updated.
I kept the check pretty simple. I'm not sure whether we should use proper URL parsing here, since the only thing expected is the bucketOwner param.

Collaborator

If it's not drastically more code, I'd prefer proper URL parsing

Contributor Author

Not drastically more code. Used Python's urllib.
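
As an illustration of the urllib approach, the following sketch splits an s3:// URL into a bare URI and an optional bucket owner. It assumes bucketOwner arrives as a query-string parameter and that the query is stripped from the URI sent to the API; the actual PR code may differ. `s3_location` is a hypothetical name.

```python
from typing import Dict
from urllib.parse import parse_qs, urlparse


def s3_location(url: str) -> Dict[str, str]:
    """Split an s3:// URL into a Bedrock-style s3Location dict (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != 's3':
        raise ValueError(f'expected an s3:// URL, got {url!r}')
    # Rebuild the URI without the query string (an assumption of this sketch).
    location = {'uri': f's3://{parsed.netloc}{parsed.path}'}
    # parse_qs returns a list of values per key; take the first bucketOwner if present.
    owners = parse_qs(parsed.query).get('bucketOwner')
    if owners:
        location['bucketOwner'] = owners[0]
    return location
```

Compared with a plain `startswith` check, this handles a missing or repeated query parameter without extra string slicing.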


      format = item.media_type.split('/')[1]
      assert format in ('jpeg', 'png', 'gif', 'webp'), f'Unsupported image format: {format}'
    - image: ImageBlockTypeDef = {'format': format, 'source': {'bytes': downloaded_item['data']}}
    + image: ImageBlockTypeDef = {'format': format, 'source': cast(DocumentSourceTypeDef, source)}

Collaborator

We shouldn't need to cast if we hint the type of source as source: DocumentSourceTypeDef

Contributor Author @mochow13, Dec 13, 2025

Good catch, I only replaced any with the type previously 🤦
Updated to use proper hints.
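
To illustrate the suggestion, here is a small sketch of annotating the variable once instead of casting at the point of use. The `Source` and `S3Location` TypedDicts are simplified stand-ins for DocumentSourceTypeDef, defined locally so the snippet runs without the AWS type stubs; `make_source` is a hypothetical name.

```python
from typing import TypedDict


class S3Location(TypedDict):
    uri: str


# Simplified stand-in for DocumentSourceTypeDef; all keys are optional here.
Source = TypedDict('Source', {'bytes': bytes, 's3Location': S3Location}, total=False)


def make_source(url: str, data: bytes) -> Source:
    # Declaring the type up front lets a type checker validate both branches,
    # so no cast() is needed when the value is used later.
    source: Source
    if url.startswith('s3://'):
        source = {'s3Location': {'uri': url}}
    else:
        source = {'bytes': data}
    return source
```

A cast silences the checker, whereas the annotation asks it to verify each assignment, which is why the hint is the safer choice.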

Development

Successfully merging this pull request may close these issues.

Pass s3:// file URLs directly to API in BedrockConverseModel

4 participants