fix(provider/google): Handling of Image data from the tool call results. #8357
{
  type: 'media',
  mediaType: 'image/jpeg',
  data: output.base64Image!,
The `toModelOutput` function uses a non-null assertion on `output.base64Image!` without checking if the property exists, which will cause issues when the tool execution fails.
📝 Patch Details
diff --git a/examples/ai-core/src/generate-text/google-image-tool-results.ts b/examples/ai-core/src/generate-text/google-image-tool-results.ts
index 124b24421..0daf9c833 100644
--- a/examples/ai-core/src/generate-text/google-image-tool-results.ts
+++ b/examples/ai-core/src/generate-text/google-image-tool-results.ts
@@ -34,14 +34,26 @@ const imageAnalysisTool = tool({
}
},
- toModelOutput(output: { base64Image?: string }) {
+ toModelOutput(output: { base64Image?: string; success?: boolean; error?: string }) {
+ if (!output.base64Image) {
+ return {
+ type: 'content',
+ value: [
+ {
+ type: 'text',
+ text: output.error || 'Failed to fetch image',
+ },
+ ],
+ };
+ }
+
return {
type: 'content',
value: [
{
type: 'media',
mediaType: 'image/jpeg',
- data: output.base64Image!,
+ data: output.base64Image,
},
],
};
Analysis
The `toModelOutput` function assumes `base64Image` is always present in the output object and uses the non-null assertion operator (`!`) to access it. However, when the `execute` function catches an error (lines 29-34), it returns an object with `success: false` and `error` properties, but no `base64Image` property.
When the tool execution fails, `output.base64Image` will be `undefined`, and `output.base64Image!` will still be `undefined`, causing the media part to have `data: undefined`. This will likely cause runtime errors or incorrect behavior when the Google Generative AI API receives undefined data for an image.
The fix should check if `base64Image` exists before creating the media content, or handle the error case differently in `toModelOutput`. For example:
toModelOutput(output: { base64Image?: string; success?: boolean; error?: string }) {
if (!output.base64Image) {
return {
type: 'content',
value: [
{
type: 'text',
text: output.error || 'Failed to fetch image',
},
],
};
}
return {
type: 'content',
value: [
{
type: 'media',
mediaType: 'image/jpeg',
data: output.base64Image,
},
],
};
}
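To illustrate the point in the analysis that the non-null assertion has no runtime effect, here is a minimal, self-contained sketch. The `ToolOutput` type and the helper names `unsafeData` and `safePart` are hypothetical and exist only for this illustration; they are not part of the example file or the SDK.

```typescript
// Hypothetical output shape, mirroring the example tool's success and failure results.
type ToolOutput = { base64Image?: string; success?: boolean; error?: string };

// Unsafe variant: `!` only silences the type checker; it emits no runtime check.
function unsafeData(output: ToolOutput): string {
  return output.base64Image!;
}

// Guarded variant: narrows the type and falls back to the error text.
function safePart(output: ToolOutput) {
  if (!output.base64Image) {
    return { type: 'text' as const, text: output.error || 'Failed to fetch image' };
  }
  return { type: 'media' as const, mediaType: 'image/jpeg', data: output.base64Image };
}

const failed: ToolOutput = { success: false, error: 'network timeout' };
console.log(unsafeData(failed)); // undefined at runtime, despite the `string` return type
console.log(safePart(failed)); // a text part carrying the error message
```

The guarded variant is the same idea as the suggested patch: the failure case becomes a text part, so a media part is never built with missing data.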
@lgrammel I guess documentation for this is not needed, right?
I confirmed that the example fails without your changes, and works once I add them. The description I got was
The image contains an abstract piece of art. It features various shapes and vibrant colors, resembling a modern, non-representational painting. The colors are predominantly bright, with visible brushstrokes and layering, creating a sense of depth and movement within the composition.
Which is somewhat odd? I tried to ask it what animal it sees in the image but in response to that I just get
I'm sorry, I cannot fulfill this request. The `analyzeImage` tool provided an image, but I am unable to process images to identify specific content like animals.
Not sure what's going on?
const base64Image = await urlToBase64(
  'https://images.unsplash.com/photo-1751225750479-43ad27b94fa0?w=900&auto=format&fit=crop&q=60&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxmZWF0dXJlZC1waG90b3MtZmVlZHwyfHx8ZW58MHx8fHx8fHx8',
);
could you use a local test file instead? We have https://github.com/vercel/ai/blob/e83bfe37fbed408c7af127d5a256f82ea8e36fe9/examples/ai-core/data/comic-cat.png
You got this reply after removing my changes, right?
@gr2m That's because we can't pass the image directly to Gemini as a function call response; even if we do, Gemini is unable to detect it. The only workaround I found, which other developers on the forum use, is to pass the image to Gemini as normal message parts.
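A rough sketch of that workaround, under stated assumptions: the types below are simplified and hypothetical, and `toGeminiParts` is an illustrative helper, not the function added in this PR. The `functionResponse` and `inlineData` field names follow the Gemini API's part format, and the "Tool execution completed" placeholder is the text that appears in the model's logged reasoning later in this thread.

```typescript
// Simplified, hypothetical shapes; the real SDK and API types carry more fields.
type ToolContentPart =
  | { type: 'text'; text: string }
  | { type: 'media'; mediaType: string; data: string };

type GeminiPart =
  | { functionResponse: { name: string; response: { name: string; content: string } } }
  | { inlineData: { mimeType: string; data: string } };

// Convert one tool result into Gemini-style parts: text goes into the
// functionResponse, while media is emitted as sibling inlineData parts.
function toGeminiParts(toolName: string, content: ToolContentPart[]): GeminiPart[] {
  const texts: string[] = [];
  const media: GeminiPart[] = [];
  for (const part of content) {
    if (part.type === 'text') {
      texts.push(part.text);
    } else {
      media.push({ inlineData: { mimeType: part.mediaType, data: part.data } });
    }
  }
  // Placeholder text when the tool returned only media (assumption for this sketch).
  const summary = texts.length > 0 ? texts.join('\n') : 'Tool execution completed';
  return [
    { functionResponse: { name: toolName, response: { name: toolName, content: summary } } },
    ...media,
  ];
}

console.log(
  JSON.stringify(
    toGeminiParts('analyzeImage', [{ type: 'media', mediaType: 'image/png', data: 'AAAA' }]),
    null,
    2,
  ),
);
```

With this shape the image travels next to the function response as an ordinary part instead of being serialized into the response JSON.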
I got both of the replies with your code built-in. So yes with your changes the error goes away, but it still doesn't seem to be working, which might be even worse?
But have you seen that working somewhere? I'd love to see an example.
It is working. I tried it 10-20 times and it seems to work every time. The reply I got was: "the image is of cat". Which is correct, right?
Last time I checked, Gemini didn't support vision from tool results.
Did you test with different images and verify that the description matches each image correctly? Sometimes the model can hallucinate, especially when given a bunch of base64 text as input.
I ran the example a few times and it looks like the image is correctly sent and analyzed by Google. A few times I got an odd result like the one below. Looks like it mixes reasoning with result text?
Might be unrelated to this PR. Have you seen it happen before? I'll do some more digging.
I also got another response where it didn't work correctly:
{
"candidates": [
{
"content": {
"parts": [
{
"text": "thought\nThe user asked \"Whats in this image?\".\nI previously called the `analyzeImage()` tool, which returned `{\"analyzeImage_response\": {\"content\": \"Tool execution completed\", \"name\": \"analyzeImage\"}}`.\nThis response indicates that the tool *executed* but it did *not* provide a description of the image content. It simply stated \"Tool execution completed\".\nTherefore, I need to acknowledge that the tool ran but I don't have the information yet.\nSince I don't have the content of the image, I should ask the user to describe it or indicate that I cannot see it.\nGiven that the `analyzeImage` tool is supposed to analyze the image, and the output was \"Tool execution completed\" with no actual analysis, it suggests that the tool either failed to provide the analysis or simply indicated completion without providing the content.\n\nSince I don't have the actual content of the image, I cannot tell the user what is in it. I should inform the user that I cannot see the image."
},
{
"text": "I'm sorry, I cannot tell you what is in the image. My tool execution completed but it did not return any content about the image."
}
],
"role": "model"
},
"finishReason": "STOP",
"index": 0
}
],
"usageMetadata": {
"promptTokenCount": 320,
"candidatesTokenCount": 245,
"totalTokenCount": 565,
"promptTokensDetails": [
{
"modality": "TEXT",
"tokenCount": 62
},
{
"modality": "IMAGE",
"tokenCount": 258
}
]
},
"modelVersion": "gemini-2.5-flash",
"responseId": "ucy8aNSAHIGVmtkPh92u-A4"
}
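One way to check whether the image reached the model at all is to look at `usageMetadata.promptTokensDetails` in the response. The sketch below is hypothetical: `imagePromptTokens` is an illustrative helper and the type covers only the subset of fields visible in the log above. In that log the IMAGE modality reports 258 prompt tokens, which suggests the image was received even though the text reply says otherwise.

```typescript
// Subset of the response shape shown in the log above; other fields are omitted.
type UsageMetadata = {
  promptTokensDetails?: { modality: string; tokenCount: number }[];
};

// Returns the number of prompt tokens attributed to images (0 if none are reported).
function imagePromptTokens(usage: UsageMetadata): number {
  return (usage.promptTokensDetails ?? [])
    .filter(detail => detail.modality === 'IMAGE')
    .reduce((sum, detail) => sum + detail.tokenCount, 0);
}

// Values copied from the logged response.
const loggedUsage: UsageMetadata = {
  promptTokensDetails: [
    { modality: 'TEXT', tokenCount: 62 },
    { modality: 'IMAGE', tokenCount: 258 },
  ],
};

console.log(imagePromptTokens(loggedUsage)); // 258
```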
Here is yet another one but this time I also logged the request body
The implementation does look good to me. But let me talk to the team about the unpredictable problems we see, might be a problem with the model provider, not the SDK.
The other thing I want us to look into is the `thought\nThe user asked...` in some of the responses. I think we should somehow parse that into reasoning, but again, not sure what the best approach is.
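As one possible direction for that parsing, here is a heuristic sketch that treats any text part starting with the literal prefix `thought\n` as reasoning. This is an assumption based only on the logged response in this thread, not documented provider behavior, and the `classifyParts` name and types are hypothetical.

```typescript
// Minimal part shape, matching the text parts in the logged response.
type TextPart = { text: string };
type ClassifiedPart = { kind: 'reasoning' | 'text'; text: string };

// Prefix observed in the logged response; assumed, not a documented marker.
const THOUGHT_PREFIX = 'thought\n';

// Tag parts that begin with the prefix as reasoning and strip the prefix;
// all other parts pass through as regular text.
function classifyParts(parts: TextPart[]): ClassifiedPart[] {
  return parts.map((part): ClassifiedPart =>
    part.text.startsWith(THOUGHT_PREFIX)
      ? { kind: 'reasoning', text: part.text.slice(THOUGHT_PREFIX.length) }
      : { kind: 'text', text: part.text },
  );
}

console.log(
  classifyParts([
    { text: 'thought\nThe user asked "Whats in this image?".' },
    { text: "I'm sorry, I cannot tell you what is in the image." },
  ]),
);
```

A prefix match like this is fragile, since a legitimate answer could also begin with the same word, so it would need validation against real responses before being adopted.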
I have updated the approach to handle this more correctly. It now works without emitting thoughts, and it no longer gives that abrupt response unless it fails to call the tool. Please check now, @gr2m.
Background
Gemini was not able to identify images returned from the tool calls.
Summary
Added handling of image data type from the tool call results.
Manual Verification
Added an example in generate-text that lets the LLM see an image via a tool call.
Tasks
pnpm changeset (in the project root)
pnpm prettier-fix (in the project root)
Related Issues
Fixes #8180