Description
It seems that calling `toModelOutput` when using models from `@ai-sdk/google` does not map the content properly, and the models can't "see" the images.
If I ask Sonnet "What's in this image?" and it calls a tool that responds with an image, Sonnet is able to "see" and describe the image, but when using Gemini it doesn't work.
I suspect it has something to do with the `convert-to-google-generative-ai-messages.ts` file; I will probably take a look at it later.
Here's an example of my tool; it just responds with a base64 image:

```ts
  // ...tool implementation
  return [{
    base64Image: await urlToBase64("https://images.unsplash.com/photo-1751225750479-43ad27b94fa0?w=900&auto=format&fit=crop&q=60&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxmZWF0dXJlZC1waG90b3MtZmVlZHwyfHx8ZW58MHx8fHx8"),
  }];
  // return results
},
toModelOutput: (output: { base64Image: string }[]) => {
  return {
    type: "content",
    value: output.map((result) => ({
      type: "media",
      mediaType: "image/jpeg",
      data: result.base64Image,
    })),
  };
},
```
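For reference, `urlToBase64` is not an SDK function; it's my own helper, roughly along these lines (a sketch, the `bytesToBase64` split is just for illustration). It returns a plain base64 string with no `data:` prefix, which is what I pass as the `data` field of the `"media"` content part:

```typescript
// Illustrative helper, not part of the AI SDK: encode raw bytes as base64.
function bytesToBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Fetch an image URL and return its body as a base64 string.
async function urlToBase64(url: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Failed to fetch ${url}: ${res.status}`);
  }
  return bytesToBase64(new Uint8Array(await res.arrayBuffer()));
}
```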
Claude will tell me what's in the image; Gemini won't.
AI SDK Version
- ai: 5.0.19
- @ai-sdk/google: 2.0.7
- @ai-sdk/anthropic: 2.0.5
Code of Conduct
- I agree to follow this project's Code of Conduct
nicolas-chaulet