Image question answering
This example shows how to use the Gemini API to answer questions about images of cats, supplying the image either from a URL or from a local file encoded as Base64.
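Import the necessary libraries
import base64
import requests
from google import genai
from google.genai import types
Create a client, replacing YOUR_API_KEY with your Gemini API key
client = genai.Client(api_key="YOUR_API_KEY")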
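A key pasted into the source file is easy to commit by accident. As a minimal sketch of one alternative, the helper below reads the key from an environment variable. The function name `get_api_key` and the variable name `GEMINI_API_KEY` are assumptions made for illustration, not part of the original example.

```python
import os
from typing import Mapping


def get_api_key(env: Mapping[str, str] = os.environ,
                name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored under `name`, or raise if it is missing.

    `name` is a hypothetical variable name chosen for this sketch.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable to your Gemini API key"
        )
    return key
```

The result could then be passed to the client constructor, for example `genai.Client(api_key=get_api_key())`.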
We'll start by using an image URL. Load an image of a cat from a URL
image_url = "https://cataas.com/cat"
image_response = requests.get(image_url, timeout=30)
image_response.raise_for_status()  # stop early if the download failed
image_content = image_response.content
Ask Gemini about the cat in the image
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        "What breed of cat is this?",
        types.Part.from_bytes(data=image_content, mime_type="image/jpeg"),
    ],
)
print("Response from URL Image:\n", response.text)
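The request above hard-codes `mime_type="image/jpeg"`, but an image fetched from an arbitrary URL is not guaranteed to be a JPEG. A minimal sketch of one way to check is to inspect the leading signature bytes of the downloaded data. The helper `sniff_image_mime` is a hypothetical name introduced here, and it covers only four common formats.

```python
from typing import Optional

# Leading "magic" bytes for a few common image formats.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_image_mime(data: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes; return None if unrecognized."""
    for signature, mime in _SIGNATURES:
        if data.startswith(signature):
            return mime
    # WebP is a RIFF container: "RIFF", four size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

One possible use is `mime_type=sniff_image_mime(image_content) or "image/jpeg"`, which keeps the original value as a fallback when the format is not recognized.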
Now we'll use a local image file. Load a local image of a cat and encode it as Base64
Make sure the encoded value is a str rather than bytes
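The two steps above (read the file, then turn the Base64 bytes into text) can be sketched as follows. The helper name `encode_image_file` is an assumption introduced for illustration, and any file path passed to it is a placeholder for your own image.

```python
import base64
from pathlib import Path


def encode_image_file(path: str) -> str:
    """Read a file in binary mode and return its contents as Base64 text."""
    raw = Path(path).read_bytes()
    # b64encode returns bytes; decoding as ASCII yields a plain str.
    return base64.b64encode(raw).decode("ascii")
```

Assigning the result to `encoded_string` gives a value that `base64.b64decode` turns back into the original image bytes, which is what the request that follows expects.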
Ask Gemini a question about the cat, providing the image as a Base64 string
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        "Is this cat fluffy?",
        types.Part.from_bytes(data=base64.b64decode(encoded_string), mime_type="image/jpeg"),
    ],
)
print("\nResponse from Base64 Image:\n", response.text)
Running the Example
First, install the Google Gen AI SDK and requests
$ pip install google-genai requests
Download an example cat image (replace with your own if needed)
Then run the program with Python
$ python gemini-cat.py
Response from URL Image:
This looks like a British Shorthair cat.
Response from Base64 Image:
Yes, this cat appears to be fluffy.