Audio question answering¶

This example demonstrates how to ask a question about the content of an audio file using the Gemini API.

Import the necessary libraries

from google import genai
from google.genai import types
import requests
import os

Replace with your actual API key

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

Define a descriptive User-Agent following Wikimedia's policy

user_agent = "GeminiByExample/1.0 (https://github.com/strickvl/geminibyexample; contact@example.org) python-requests/2.0"

Download the audio file from the URL

url = "https://upload.wikimedia.org/wikipedia/commons/1/1f/%22DayBreak%22_with_Jay_Young_on_the_USA_Radio_Network.ogg"
headers = {"User-Agent": user_agent}
response = requests.get(url, headers=headers)
response.raise_for_status()  # Raise an exception for bad status codes

Save the content to a variable Note: If your audio file is larger than 20MB, you should use the File API to upload the file first. The File API allows you to upload larger files and then reference them in your requests.

audio_bytes = response.content

You can now pass that audio file along with the prompt to Gemini

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        "What is the main topic of this audio?",
        types.Part.from_bytes(
            data=audio_bytes,
            mime_type="audio/ogg",
        ),
    ],
)

print(response.text)

Running the Example¶

First, install the Google Generative AI library and requests

$ pip install google-genai requests

Then run the program with Python

$ python audio-question.py
This audio features a male host and a travel expert, Pete Trabucco.

Further Information¶

Gemini docs link 1