PDF and CSV data analysis and summarization¶
This example demonstrates how to use the Gemini API to analyze data from PDF and CSV files.
Import necessary libraries
Initialize the Gemini client with your API key
We start with the PDF analysis. Download the PDF file
pdf_url = "https://www.princexml.com/samples/invoice/invoicesample.pdf"
pdf_data = httpx.get(pdf_url).content
Prompt to extract main players from the PDF
pdf_prompt = (
"Identify the main companies or entities mentioned in this invoice. "
"Summarize the data."
)
Generate content with the PDF and the prompt
pdf_response = client.models.generate_content(
model="gemini-2.0-flash",
contents=[
types.Part.from_bytes(data=pdf_data, mime_type="application/pdf"),
pdf_prompt,
],
)
Print the PDF analysis result
Moving on to the CSV analysis now. You'll note that the process is very similar. You can also pass in code files, XML, RTF, Markdown, and more. We download the CSV file here.
csv_url = "https://gist.githubusercontent.com/suellenstringer-hye/f2231b3383538bcb1a5b051c7908f5b7/raw/0f4e0733a434733cda8e749bbbf33a93c2b5bbde/test.csv"
csv_data = httpx.get(csv_url).content
Prompt to analyze the CSV data
Generate content with the CSV data and the prompt
csv_response = client.models.generate_content(
model="gemini-2.0-flash",
contents=[
types.Part.from_bytes(
data=csv_data,
mime_type="text/csv",
),
csv_prompt,
],
)
Print the CSV analysis result
Running the Example¶
First, install the Google Generative AI library and httpx (for downloading files)
Then run the program with Python
$ python pdf_csv_analysis.py
PDF Analysis Result:
The main company mentioned in the invoice is Sunny Farm.
CSV Analysis Result:
Okay, I've analyzed the provided data. Here's a summary of its contents:
**Data Format:**
* The data appears to be in CSV (Comma Separated Values) format.
* The first line is a header row defining the fields.
* Each subsequent line represents a record containing information about a person.
**Fields Present:**
The data includes the following fields for each person:
1. **first\_name:** The person's first name.
2. **last\_name:** The person's last name.
3. **company\_name:** The name of the company they are associated with.
4. **address:** The street address.
5. **city:** The city.
6. **county:** The county.
7. **state:** The state.
8. **zip:** The zip code.
9. **phone1:** The primary phone number.
10. **phone2:** A secondary phone number.
11. **email:** The email address.
12. **web:** The website address (presumably for the associated company).