
Google models can't "see" images returned from tool calls #8180

@alonzuman

Description

It seems that calling `toModelOutput` when using models from `@ai-sdk/google` does not map the content properly, and the models can't "see" the images.

If I ask Sonnet "What's in this image?" and it calls a tool that responds with an image, Sonnet is able to "see" and describe the image, but with Gemini it doesn't work.

I suspect it has something to do with the `convert-to-google-generative-ai-messages.ts` file; I'll probably take a look at it later.
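For context, the Google Generative AI API represents image bytes in a content part as `inlineData` with a `mimeType` and base64 `data`. A minimal sketch of the kind of mapping the converter would need for a tool-result media part (type names here are illustrative, not the SDK's actual internals):

```ts
// Illustrative sketch only - not the SDK's real converter code.
// Shape of a "media" part as produced by toModelOutput (from the snippet below).
type MediaPart = { type: "media"; mediaType: string; data: string };

// Shape of an inline-data part as the Google Generative AI API expects it.
type GoogleInlineDataPart = { inlineData: { mimeType: string; data: string } };

// Map a tool-result media part to a Google inlineData part so the model
// can actually "see" the image bytes.
function mediaToInlineData(part: MediaPart): GoogleInlineDataPart {
  return { inlineData: { mimeType: part.mediaType, data: part.data } };
}
```

If the converter drops or mis-maps these media parts instead of emitting `inlineData`, Gemini would receive the tool result without the image, which would match the behavior described here.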

Here's an example of my tool; it just responds with a base64 image:

```ts
      // ...tool implementation
      return [
        {
          base64Image: await urlToBase64(
            "https://images.unsplash.com/photo-1751225750479-43ad27b94fa0?w=900&auto=format&fit=crop&q=60&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxmZWF0dXJlZC1waG90b3MtZmVlZHwyfHx8ZW58MHx8fHx8"
          ),
        },
      ];

      // return results
    },
    toModelOutput: (output: { base64Image: string }[]) => {
      return {
        type: "content",
        value: output.map((result) => ({
          type: "media",
          mediaType: "image/jpeg",
          data: result.base64Image,
        })),
      };
    },
```
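For completeness, `urlToBase64` is my own helper, not part of the AI SDK. A minimal sketch of it, assuming Node 18+ with the global `fetch`:

```ts
// Hypothetical helper used in the tool above - not an AI SDK function.
// Fetches an image URL and returns its bytes base64-encoded.
async function urlToBase64(url: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Failed to fetch ${url}: ${res.status}`);
  const bytes = await res.arrayBuffer();
  return Buffer.from(bytes).toString("base64");
}
```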

Claude will tell me what's in the image; Gemini won't.

AI SDK Version

  • ai: 5.0.19
  • @ai-sdk/google: 2.0.7
  • @ai-sdk/anthropic: 2.0.5

Code of Conduct

  • I agree to follow this project's Code of Conduct
