
Conversation

AVtheking
Contributor

Background

Gemini was not able to identify images returned from the tool calls.

Summary

Added handling of the image data type in tool call results.

Manual Verification

Added an example in generate-text to let the LLM see an image via a tool call.

Tasks

  • [x] Tests have been added / updated (for bug fixes / features)
  • [ ] Documentation has been added / updated (for bug fixes / features)
  • [x] A patch changeset for relevant packages has been added (for bug fixes / features - run pnpm changeset in the project root)
  • [x] Formatting issues have been fixed (run pnpm prettier-fix in the project root)
  • [x] I have reviewed this pull request (self-review)

Related Issues

Fixes #8180

        {
          type: 'media',
          mediaType: 'image/jpeg',
          data: output.base64Image!,

The toModelOutput function uses a non-null assertion on output.base64Image! without checking if the property exists, which will cause issues when the tool execution fails.

📝 Patch Details
diff --git a/examples/ai-core/src/generate-text/google-image-tool-results.ts b/examples/ai-core/src/generate-text/google-image-tool-results.ts
index 124b24421..0daf9c833 100644
--- a/examples/ai-core/src/generate-text/google-image-tool-results.ts
+++ b/examples/ai-core/src/generate-text/google-image-tool-results.ts
@@ -34,14 +34,26 @@ const imageAnalysisTool = tool({
     }
   },
 
-  toModelOutput(output: { base64Image?: string }) {
+  toModelOutput(output: { base64Image?: string; success?: boolean; error?: string }) {
+    if (!output.base64Image) {
+      return {
+        type: 'content',
+        value: [
+          {
+            type: 'text',
+            text: output.error || 'Failed to fetch image',
+          },
+        ],
+      };
+    }
+
     return {
       type: 'content',
       value: [
         {
           type: 'media',
           mediaType: 'image/jpeg',
-          data: output.base64Image!,
+          data: output.base64Image,
         },
       ],
     };

Analysis

The toModelOutput function assumes base64Image is always present in the output object and uses the non-null assertion operator (!) to access it. However, when the execute function catches an error (lines 29-34), it returns an object with success: false and error properties, but no base64Image property.

When the tool execution fails, output.base64Image will be undefined, and output.base64Image! will still be undefined, causing the media part to have data: undefined. This will likely cause runtime errors or incorrect behavior when the Google Generative AI API receives undefined data for an image.

The fix should check if base64Image exists before creating the media content, or handle the error case differently in toModelOutput. For example:

toModelOutput(output: { base64Image?: string; success?: boolean; error?: string }) {
  if (!output.base64Image) {
    return {
      type: 'content',
      value: [
        {
          type: 'text',
          text: output.error || 'Failed to fetch image',
        },
      ],
    };
  }
  
  return {
    type: 'content',
    value: [
      {
        type: 'media',
        mediaType: 'image/jpeg',
        data: output.base64Image,
      },
    ],
  };
}
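
For reference, here is a minimal sketch (not the PR's exact example file) of how such a tool could be wired into generateText with the AI SDK 5 API. The model id, prompt, placeholder URL, and the fetchImageAsBase64 helper are illustrative assumptions:

import { google } from '@ai-sdk/google';
import { generateText, stepCountIs, tool } from 'ai';
import { z } from 'zod';

// Hypothetical helper (not from the PR): download an image and return it
// base64-encoded.
async function fetchImageAsBase64(url: string): Promise<string> {
  const response = await fetch(url);
  return Buffer.from(await response.arrayBuffer()).toString('base64');
}

const analyzeImage = tool({
  description: 'Fetch an image for the model to analyze',
  inputSchema: z.object({}),
  async execute() {
    try {
      // Placeholder URL; the PR's example uses an Unsplash image.
      const base64Image = await fetchImageAsBase64('https://example.com/cat.jpg');
      return { success: true, base64Image };
    } catch (error) {
      return { success: false, error: String(error) };
    }
  },
  toModelOutput(output: { base64Image?: string; success?: boolean; error?: string }) {
    // Same shape as the fix above: fall back to a text part when no image
    // is available because the tool execution failed.
    if (!output.base64Image) {
      return {
        type: 'content',
        value: [{ type: 'text', text: output.error ?? 'Failed to fetch image' }],
      };
    }
    return {
      type: 'content',
      value: [{ type: 'media', mediaType: 'image/jpeg', data: output.base64Image }],
    };
  },
});

const { text } = await generateText({
  model: google('gemini-2.5-flash'),
  tools: { analyzeImage },
  // Allow a second step so the model can describe the image after the
  // tool result comes back.
  stopWhen: stepCountIs(2),
  prompt: 'What is in this image?',
});

console.log(text);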

@AVtheking
Contributor Author

@lgrammel I guess documentation for this is not needed, right?

@AVtheking AVtheking changed the title fix(provider/google): image tool result fix(provider/google): Handling of Image data from the tool call results. Aug 28, 2025
Collaborator

@gr2m gr2m left a comment


I confirmed that the example fails without your changes and works once I add them. The description I got was:

> The image contains an abstract piece of art. It features various shapes and vibrant colors, resembling a modern, non-representational painting. The colors are predominantly bright, with visible brushstrokes and layering, creating a sense of depth and movement within the composition.

Which is somewhat odd. I tried to ask it what animal it sees in the image, but in response to that I just get:

> I'm sorry, I cannot fulfill this request. The analyzeImage tool provided an image, but I am unable to process images to identify specific content like animals.

Not sure what's going on?

Comment on lines 20 to 22
const base64Image = await urlToBase64(
'https://images.unsplash.com/photo-1751225750479-43ad27b94fa0?w=900&auto=format&fit=crop&q=60&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxmZWF0dXJlZC1waG90b3MtZmVlZHwyfHx8ZW58MHx8fHx8fHx8',
);

@AVtheking
Contributor Author

> I confirmed that the example fails without your changes and works once I add them. The description I got was:
>
> > The image contains an abstract piece of art. It features various shapes and vibrant colors, resembling a modern, non-representational painting. The colors are predominantly bright, with visible brushstrokes and layering, creating a sense of depth and movement within the composition.
>
> Which is somewhat odd. I tried to ask it what animal it sees in the image, but in response to that I just get:
>
> > I'm sorry, I cannot fulfill this request. The analyzeImage tool provided an image, but I am unable to process images to identify specific content like animals.
>
> Not sure what's going on?

You got this reply after removing my changes, right?

@AVtheking
Contributor Author

@gr2m That's because we can't pass the image directly as a function call response to Gemini; even if we pass it, Gemini is unable to detect it. The only trick I found in the forum, which other developers use, is to pass the image as normal message parts to Gemini.
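
For illustration, here is a rough sketch of that workaround, matching the request body logged later in this thread (the type names and converter shape are illustrative assumptions, not the provider's actual internals): the media part of a tool result is lifted out of the functionResponse and sent as an ordinary inlineData part, leaving only a placeholder string in the function response.

// Illustrative types; the real ones live in @ai-sdk/google.
type ToolResultPart =
  | { type: 'text'; text: string }
  | { type: 'media'; mediaType: string; data: string };

type GooglePart =
  | {
      functionResponse: {
        name: string;
        response: { name: string; content: string };
      };
    }
  | { inlineData: { mimeType: string; data: string } };

function convertToolResult(
  toolName: string,
  content: ToolResultPart[],
): { role: 'user'; parts: GooglePart[] } {
  const parts: GooglePart[] = [
    {
      // Gemini does not pick up images embedded in the function response
      // itself, so only a placeholder string goes here.
      functionResponse: {
        name: toolName,
        response: { name: toolName, content: 'Tool execution completed' },
      },
    },
  ];

  for (const part of content) {
    if (part.type === 'media') {
      // ...while the image travels as a regular inline-data message part.
      parts.push({
        inlineData: { mimeType: part.mediaType, data: part.data },
      });
    }
  }

  return { role: 'user', parts };
}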

@gr2m
Collaborator

gr2m commented Sep 3, 2025

> You got this reply after removing my changes, right?

I got both of the replies with your code built in.

So yes, with your changes the error goes away, but it still doesn't seem to be working, which might be even worse?

> The only trick I found in the forum, which other developers use, is to pass the image as normal message parts to Gemini.

But have you seen that working somewhere? I'd love to see an example.

@AVtheking
Contributor Author

> > You got this reply after removing my changes, right?
>
> I got both of the replies with your code built in.
>
> So yes, with your changes the error goes away, but it still doesn't seem to be working, which might be even worse?
>
> > The only trick I found in the forum, which other developers use, is to pass the image as normal message parts to Gemini.
>
> But have you seen that working somewhere? I'd love to see an example.

It is working. I tried it 10-20 times and it seems to work every time; the reply I got was "the image is of a cat", which is correct, right?

@AVtheking
Contributor Author

AVtheking commented Sep 3, 2025

[screenshot] I get this output after removing my changes, and [screenshot] this response after adding my changes. It is correct, right? Could you check once again?

@brunobasto

Last time I checked, Gemini didn't support vision from tool results.

> and this response after adding my changes. It is correct, right? Could you check once again?

Did you test with different images and verify that the description matches each image correctly? Sometimes the model can hallucinate, especially when given a bunch of base64 text as input.

@AVtheking
Contributor Author

> Last time I checked, Gemini didn't support vision from tool results.
>
> > and this response after adding my changes. It is correct, right? Could you check once again?
>
> Did you test with different images and verify that the description matches each image correctly? Sometimes the model can hallucinate, especially when given a bunch of base64 text as input.

Yup.

[screenshot]

@AVtheking AVtheking requested a review from gr2m September 5, 2025 14:20
@AVtheking
Contributor Author

@lgrammel @gr2m please have a look and review it; many of us are facing this error.

@gr2m gr2m self-assigned this Sep 6, 2025
@gr2m
Collaborator

gr2m commented Sep 7, 2025

I ran the example a few times, and it looks like the image is correctly sent and analyzed by Google.

A few times I got an odd result like the one below. It looks like it mixes reasoning with result text?

🔍 Testing Google model image analysis with tool-returned images...

📋 Analysis Result: 

============================================================
"thought\nThe user asked \"Whats in this image?\".\nI previously called the `analyzeImage` tool, and the output `{\"analyzeImage_response\": {\"content\": \"Tool execution completed\", \"name\": \"analyzeImage\"}}` indicates that the tool executed successfully. However, the `content` field just says \"Tool execution completed\" and doesn't provide a description of the image. This means I need to provide the description myself based on the visual information available.\n\nLooking at the image, it is a cartoon-style illustration of a cat.\nThe cat has large, expressive eyes that are green.\nIts fur is multi-colored, primarily orange, blue, and white, with black outlines.\nThere are also some dots and lines in the background, giving it a pop-art or comic book feel.\n\nSo, I should describe it as a colorful, cartoon-style cat with big green eyes, depicted in a pop-art or comic book aesthetic.The image contains a colorful, cartoon-style illustration of a cat. The cat has large green eyes and is depicted with a mix of orange, blue, and white fur with black outlines. The background features blue dots and lines, giving it a pop-art or comic book aesthetic."

📊 Usage: 

Input tokens: 320
Output tokens: 256
Total tokens: 576

Might be unrelated to this PR. Have you seen it happen before? I'll do some more digging.

@gr2m
Collaborator

gr2m commented Sep 7, 2025

I also got another response where it didn't work correctly:
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "thought\nThe user asked \"Whats in this image?\".\nI previously called the `analyzeImage()` tool, which returned `{\"analyzeImage_response\": {\"content\": \"Tool execution completed\", \"name\": \"analyzeImage\"}}`.\nThis response indicates that the tool *executed* but it did *not* provide a description of the image content. It simply stated \"Tool execution completed\".\nTherefore, I need to acknowledge that the tool ran but I don't have the information yet.\nSince I don't have the content of the image, I should ask the user to describe it or indicate that I cannot see it.\nGiven that the `analyzeImage` tool is supposed to analyze the image, and the output was \"Tool execution completed\" with no actual analysis, it suggests that the tool either failed to provide the analysis or simply indicated completion without providing the content.\n\nSince I don't have the actual content of the image, I cannot tell the user what is in it. I should inform the user that I cannot see the image."
          },
          {
            "text": "I'm sorry, I cannot tell you what is in the image. My tool execution completed but it did not return any content about the image."
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 320,
    "candidatesTokenCount": 245,
    "totalTokenCount": 565,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 62
      },
      {
        "modality": "IMAGE",
        "tokenCount": 258
      }
    ]
  },
  "modelVersion": "gemini-2.5-flash",
  "responseId": "ucy8aNSAHIGVmtkPh92u-A4"
}

@gr2m
Collaborator

gr2m commented Sep 7, 2025

Here is yet another one, but this time I also logged the request body:
🔍 Testing Google model image analysis with tool-returned images...

📋 DEBUG: 

============================================================
📋 request: 

{
  "generationConfig": {},
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Whats in this image?"
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "functionCall": {
            "name": "analyzeImage",
            "args": {}
          },
          "thoughtSignature": "Cp8CAdHtim/K/nOR1bJj+dQNTQZi/ovmum9s9Rnzjoep+HlDjvlsC0WHYDeUw5jpzvkH3nU43ts7mDSqLJwdAbohIOriVjtpx03iBLynoutFcujQMGQulsVou4npE+m/H2Zu96/xOOQOZGMNp1VRwdPHNobGMGQlG2OMwZOnf3SBSnm6/8SQg+gdki9IyWfYydRP9jmAVugkggtuDPtKZlzouHi/eK0dclAzaQr56CurJTPg5X58xaF2s5LOFg58nwEoW5FkWB4xzvyfqNzluNwFmHpLMSawrZviw3NaYCThos0Odh0k92tlu5b3vk6k9NQ1z6Ndu9UattcI3oaYfrTUTaueQVHa5dEe7vszgV6cGXAkHQ6e17wMK52UNfHR4Ms="
        }
      ]
    },
    {
      "role": "user",
      "parts": [
        {
          "functionResponse": {
            "name": "analyzeImage",
            "response": {
              "name": "analyzeImage",
              "content": "Tool execution completed"
            }
          }
        },
        {
          "inlineData": {
            "mimeType": "image/png",
            "data": "iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAYAAAD0eNT6AAAAwnpUWHRSYXcgcHJvZmlsZSB0eXBlIGV4aWYAAHjabVBBDgMhCLzzij4BgVV8jtu1SX/Q5xcXbNa2kzggYwYE+uv5gNsAJQHZiuaaMxqkSqVmiaKjnZxQTp6gqC51+AhkJbbILmj2mGZ9GkVMzbLtYqT3EPZVqBLt9cvI2yKPiUZ+hFENIyYXUhg0/xbmquX6hb3jCvUDg0TXsX/uxbZ3bNaHiTonRmPm7APwOALcLGFjskc2MJczF2PhOYkt5N+eJuAN2VBZD7/ZOOUAAAGDaUNDUElDQyBwcm9maWxlAAB4nH2RPUjDQBzFX1OlIhUFixRxyFCd7KIijlqFIlQItUKrDiaXfkGThiTFxVFwLTj4sVh1cHHW1cFVEAQ/QJwdnBRdpMT ===== TRUNCATED DUE TO CHARACTER LIMIT FOR GITHUB COMMENTS ===== zF4szNTPHwaNn+fDQx9h5SUVZGeHSCFIWNA+UbYGZQ+oOIpFS9uzdwpo1rXx86gLxePLx17CseUEJgZKKRMrmxMW73B2aoW11K6GQQlMKYZug8gRK3Gxdv4qSkiou3Ogmb+XJK8nFGwMMTi7QWl1KWdBAF5mC/oTUkVh4ndDW2srFW2PcGpzlRs8oeztbaKpwUVpVwaGP73Crd4LOp5rp6RsmnrbQcBVs2cWTiZHVVaX84R98l4rqcCEBQOf/B5YVeYWuwzngAAAAAElFTkSuQmCC"
          }
        }
      ]
    }
  ],
  "tools": {
    "functionDeclarations": [
      {
        "name": "analyzeImage",
        "description": "Give the image "
      }
    ]
  },
  "toolConfig": {
    "functionCallingConfig": {
      "mode": "AUTO"
    }
  }
}

============================================================
📋 response: 

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "thought\nThe user asked \"Whats in this image?\".\nI previously called `default_api.analyzeImage()` and received the response `{\"analyzeImage_response\": {\"content\": \"Tool execution completed\", \"name\": \"analyzeImage\"}}`.\nThis response indicates that the tool executed successfully, but it doesn't provide any descriptive content about the image. It only states \"Tool execution completed\".\nTherefore, I need to acknowledge that I cannot tell what's in the image based on the current tool's output."
          },
          {
            "text": "I'm sorry, I cannot tell what is in the image. The `analyzeImage` tool did not return a description."
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 320,
    "candidatesTokenCount": 133,
    "totalTokenCount": 453,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 62
      },
      {
        "modality": "IMAGE",
        "tokenCount": 258
      }
    ]
  },
  "modelVersion": "gemini-2.5-flash",
  "responseId": "ac68aM6rBvWKqtsPu6ar4Q0"
}


📋 Analysis Result: 

============================================================
"thought\nThe user asked \"Whats in this image?\".\nI previously called `default_api.analyzeImage()` and received the response `{\"analyzeImage_response\": {\"content\": \"Tool execution completed\", \"name\": \"analyzeImage\"}}`.\nThis response indicates that the tool executed successfully, but it doesn't provide any descriptive content about the image. It only states \"Tool execution completed\".\nTherefore, I need to acknowledge that I cannot tell what's in the image based on the current tool's output.I'm sorry, I cannot tell what is in the image. The `analyzeImage` tool did not return a description."

📊 Usage: 

Input tokens: 320
Output tokens: 133
Total tokens: 453

Collaborator

@gr2m gr2m left a comment


The implementation does look good to me. But let me talk to the team about the unpredictable problems we see; it might be a problem with the model provider, not the SDK.

The other thing I want us to look into is the `thought\nThe user asked...` prefix in some of the responses. I think we should somehow parse that into reasoning, but again, I'm not sure what the best approach is.
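
For illustration only, here is a naive post-processing sketch (an assumption, not SDK behavior or a settled design): treat a leading `thought\n` text part as reasoning and the remaining parts as the visible answer, matching the shape of the logged responses above.

// Naive sketch (assumption, not SDK behavior): split a leading
// "thought\n..." text part into reasoning, keep the rest as answer text.
function splitThought(parts: Array<{ text: string }>): {
  reasoning?: string;
  text: string;
} {
  const [first, ...rest] = parts;
  if (first?.text.startsWith('thought\n')) {
    return {
      reasoning: first.text.slice('thought\n'.length),
      text: rest.map(part => part.text).join(''),
    };
  }
  return { text: parts.map(part => part.text).join('') };
}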

@AVtheking
Contributor Author

> The implementation does look good to me. But let me talk to the team about the unpredictable problems we see; it might be a problem with the model provider, not the SDK.
>
> The other thing I want us to look into is the `thought\nThe user asked...` prefix in some of the responses. I think we should somehow parse that into reasoning, but again, I'm not sure what the best approach is.

I have updated the approach to handle this more correctly; it now works without emitting thoughts and without that abrupt response, unless it misses calling the tool. Check now, @gr2m.

Development

Successfully merging this pull request may close these issues.

Google models can't "see" images returned from tool calls