CRM Data Q&A Agent - Local Parquet-based Analytics


This is a 📊 Data Analytics Agent that analyzes Salesforce data from local Parquet files using DuckDB.

The agent demonstrates an advanced Retrieval-Augmented Generation workflow in a multi-agentic system with contextualized Natural-Language-to-SQL components powered by Long Context and In-Context Learning capabilities of Gemini 2.5 Pro.

🚀 Blog post: Forget vibe coding, vibe Business Intelligence is here!

The agent is built with Agent Development Kit and runs completely locally using:

DuckDB for high-performance SQL queries on Parquet files
Local file storage for session management
Gemini API for natural language processing (only cloud dependency)

The agent interprets questions about the state of the business as reflected in the CRM, rather than questions that refer directly to Salesforce data entities.
It generates a SQL query that gathers the data needed to answer the question and runs it with DuckDB.
It creates interactive Vega-Lite diagrams.
It analyzes the results and provides key insights and recommended actions.
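
To make the diagram step concrete, here is a minimal sketch of how rows returned by a SQL query could be wrapped into a Vega-Lite bar chart specification. The function name `rows_to_bar_chart`, the column names, and the sample rows are hypothetical and are not taken from the agent's code. Only the top-level keys (`$schema`, `data`, `mark`, `encoding`) follow the public Vega-Lite format.

```python
import json


def rows_to_bar_chart(rows, category_field, value_field, title):
    """Wrap tabular rows into a minimal Vega-Lite v5 bar chart spec (illustrative only)."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": rows},  # inline data: a list of row dictionaries
        "mark": "bar",
        "encoding": {
            "x": {"field": category_field, "type": "nominal"},
            "y": {"field": value_field, "type": "quantitative"},
        },
    }


# Hypothetical rows, shaped like the output of a SQL query.
sample_rows = [
    {"lead_source": "Web", "total_value": 120000},
    {"lead_source": "Partner", "total_value": 95000},
]

spec = rows_to_bar_chart(sample_rows, "lead_source", "total_value", "Lead sources by value")
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the online Vega-Lite editor to preview the chart.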

What are our best lead sources in every country?

Agent Development Kit

The agent is built using Agent Development Kit (ADK) - a flexible and modular framework for developing and deploying AI agents.

The sample also demonstrates:

How to build a Web UI for ADK-based data agents using Streamlit.
How to use Artifact Services with ADK.
How to stream and interpret session events.
How to create and use a custom Session Service.

🕵🏻‍♀️ Simple questions are complex

Examples of questions the agent can answer

"Top 5 customers in every country"
"What are our best lead sources?"
- or, more specifically, "What are our best lead sources by value?"
"Lead conversion trends in the US"

High-Level Design

🚀 Local Installation and Setup

This agent runs completely locally with minimal dependencies. No Google Cloud project or billing account required!

Prerequisites

Python 3.11 or higher
uv (recommended) or pip for package management
Gemini API key (free tier available)

Quick Start

Clone this repository:

git clone https://github.com/vladkol/crm-data-agent && cd crm-data-agent

Create a Python virtual environment:

# Using uv (recommended)
uv venv .venv --python 3.11 && source .venv/bin/activate

# Or using standard Python
python -m venv .venv && source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

# Using uv
uv pip install -r src/requirements.txt

# Or using pip
pip install -r src/requirements.txt

Set up configuration:

Create a .env file in the src directory using the template:

cp src/.env-template src/.env

Edit src/.env and add your Gemini API key (see Configuration section below).

Run the application:

./run_local.sh

Open your browser:

Navigate to http://localhost:8080 to start using the agent.

Configuration

The local setup requires minimal configuration. Create a src/.env file with the following variables:

Required Configuration

GEMINI_API_KEY - [REQUIRED] Your Gemini API key. Get one free at Google AI Studio.

Optional Configuration

LOCAL_DATA_DIR - [OPTIONAL] Directory for Parquet data files (default: sample-data)

SESSION_STORAGE_DIR - [OPTIONAL] Directory for session storage (default: .sessions)

SFDC_METADATA_FILE - [OPTIONAL] Salesforce metadata file (default: sfdc_metadata.json)

Example .env file

# Required: Get your free API key from https://aistudio.google.com/apikey
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Customize data and session directories
LOCAL_DATA_DIR=sample-data
SESSION_STORAGE_DIR=.sessions
SFDC_METADATA_FILE=sfdc_metadata.json
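
As an illustration of how these variables and their defaults fit together, the following sketch reads them from a mapping such as `os.environ`. The `LocalConfig` class and `load_config` function are hypothetical names introduced here, and the application's real configuration code may be organized differently. The variable names and default values are the ones listed above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalConfig:
    gemini_api_key: str
    local_data_dir: str
    session_storage_dir: str
    sfdc_metadata_file: str


def load_config(env=None):
    """Read the documented variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        # The only required setting: fail early with a clear message.
        raise ValueError("GEMINI_API_KEY is required but not set")
    return LocalConfig(
        gemini_api_key=api_key,
        local_data_dir=env.get("LOCAL_DATA_DIR", "sample-data"),
        session_storage_dir=env.get("SESSION_STORAGE_DIR", ".sessions"),
        sfdc_metadata_file=env.get("SFDC_METADATA_FILE", "sfdc_metadata.json"),
    )


# An explicit mapping keeps the example independent of the real environment.
config = load_config({"GEMINI_API_KEY": "your_gemini_api_key_here"})
print(config.local_data_dir, config.session_storage_dir, config.sfdc_metadata_file)
```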

Data Setup

The application automatically manages sample data for you:

Automatic Download: On first run, the application will automatically download sample Parquet files to the sample-data directory.
Custom Data: You can add your own Parquet files to the sample-data directory. The application will automatically detect and load them.
Data Structure: The sample data includes typical Salesforce objects:
- Account, Contact, Lead, Opportunity
- Case, Task, Event
- User, RecordType, CurrencyType
- Historical data (CaseHistory, OpportunityHistory)
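
One way to picture the automatic detection of Parquet files is sketched below. It assumes one file per Salesforce object, named after the object (for example `Account.parquet`), and only builds the SQL text for DuckDB views without connecting to a database. The function `build_view_statements` is hypothetical, and the application's actual loading logic may differ. `read_parquet` is DuckDB's table function for reading Parquet files.

```python
import tempfile
from pathlib import Path


def build_view_statements(data_dir):
    """Return one CREATE VIEW statement per .parquet file, keyed by file stem."""
    statements = {}
    for path in sorted(Path(data_dir).glob("*.parquet")):
        table_name = path.stem
        # Escape quotes for the SQL identifier and for the string literal.
        ident = table_name.replace('"', '""')
        literal = path.as_posix().replace("'", "''")
        statements[table_name] = (
            f'CREATE OR REPLACE VIEW "{ident}" AS '
            f"SELECT * FROM read_parquet('{literal}')"
        )
    return statements


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Account", "Lead"):
        (Path(tmp) / f"{name}.parquet").touch()
    (Path(tmp) / "notes.txt").touch()  # ignored: not a .parquet file
    views = build_view_statements(tmp)

for sql in views.values():
    print(sql)
```

Using views rather than copying the data into tables keeps the Parquet files as the single source of truth. That is a design choice of this sketch, not a statement about the application.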

🔧 Troubleshooting

Common Issues and Solutions

1. Application Won't Start

Problem: Error when running ./run_local.sh

Solutions:

Ensure you have Python 3.11+ installed: python --version
Verify virtual environment is activated: which python should point to .venv
Check if all dependencies are installed: uv pip list or pip list
Make sure the script is executable: chmod +x run_local.sh
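
The first two checks can also be scripted. The sketch below is a hypothetical helper, not part of the repository. It compares the running interpreter against the 3.11 minimum and uses the standard `sys.prefix` versus `sys.base_prefix` comparison to guess whether a virtual environment is active.

```python
import sys


def check_environment(min_version=(3, 11)):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ is required, found {found}")
    # Inside a venv, sys.prefix points at the environment and differs from sys.base_prefix.
    if sys.prefix == sys.base_prefix:
        problems.append("No virtual environment appears to be active")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```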

2. Missing Gemini API Key

Problem: "API key not found" or authentication errors

Solutions:

Verify your .env file exists in the src directory
Check that GEMINI_API_KEY is set in your .env file
Get a free API key from Google AI Studio
Ensure there are no extra spaces or quotes around the API key
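
The checklist above can be turned into a small script. The sketch below takes the text of a `.env` file and reports a missing, empty, quoted, or space-padded `GEMINI_API_KEY`. The function `find_env_problems` is a hypothetical helper, its parsing is deliberately simple, and it does not reproduce how the application itself reads the file.

```python
def find_env_problems(env_text, key="GEMINI_API_KEY"):
    """Return a list of problems with `key` in the text of a .env file."""
    value = None
    problems = []
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without an assignment
        name, _, candidate = line.partition("=")
        if name.strip() != key:
            continue
        if name != name.strip() or candidate != candidate.strip():
            problems.append(f"{key} has spaces around '='")
        value = candidate.strip()
    if value is None:
        problems.append(f"{key} is missing")
    elif not value:
        problems.append(f"{key} is empty")
    elif value[0] in "'\"" or value[-1] in "'\"":
        problems.append(f"{key} is wrapped in quotes")
    return problems


# Hypothetical file contents for illustration.
print(find_env_problems('GEMINI_API_KEY = "abc123"\n'))
print(find_env_problems("GEMINI_API_KEY=abc123\n"))
```

To check a real file, pass it the file's contents, for example `find_env_problems(open("src/.env").read())`.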

3. Data Loading Issues

Problem: "No data found" or "Table not found" errors

Solutions:

Check if sample-data directory exists and contains .parquet files
Run the data deployment script manually: python utils/deploy_demo_data.py
Verify file permissions: ensure the application can read the sample-data directory
Check disk space: ensure you have enough space for the sample data (~50MB)

4. Port Already in Use

Problem: "Port 8080 is already in use"

Solutions:

Kill existing processes: lsof -ti:8080 | xargs kill -9
Use a different port by modifying the startup script
Check for other applications using port 8080
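
Before killing anything, it can help to confirm whether the port is really taken and which nearby port is free. The sketch below does this with the standard `socket` module. It probes the loopback address `127.0.0.1` only, which is an assumption, since the application may bind to a different interface. Both function names are hypothetical.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def find_free_port(start=8080, attempts=20, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if is_port_free(port, host):
            return port
    return None


print("8080 free:", is_port_free(8080))
print("first free port from 8080:", find_free_port())
```

The result is only a snapshot, because another process can claim the port between the check and the application's startup.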

5. DuckDB Connection Issues

Problem: Database connection or SQL execution errors

Solutions:

Restart the application to reinitialize DuckDB
Check if Parquet files are corrupted by opening them with another tool
Verify Python has write permissions to create temporary DuckDB files
Clear any cached DuckDB files and restart
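
A quick first check for corrupted files needs no extra tooling: a Parquet file starts and ends with the 4-byte marker `PAR1`. The sketch below tests for that marker. It catches truncated or mislabeled files only, and a file that passes can still be damaged internally. The function name `looks_like_parquet` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Cheap sanity check: the file starts and ends with the 4-byte magic b'PAR1'."""
    path = Path(path)
    # Smallest plausible layout: magic + 4-byte footer length + magic.
    if path.stat().st_size < 12:
        return False
    with path.open("rb") as handle:
        head = handle.read(4)
        handle.seek(-4, 2)  # 2 means "relative to the end of the file"
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Demonstration with placeholder bytes (not real Parquet content).
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.parquet"
    bad = Path(tmp) / "truncated.parquet"
    good.write_bytes(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)
    bad.write_bytes(PARQUET_MAGIC + b"\x00" * 8)  # trailing marker is missing
    results = {p.name: looks_like_parquet(p) for p in (good, bad)}

print(results)
```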

6. Memory Issues

Problem: Application crashes or runs slowly with large datasets

Solutions:

Increase available memory for the Python process
Use smaller datasets for testing
Monitor memory usage with top or htop
Consider using DuckDB's memory limit settings

7. Web Interface Issues

Problem: Blank page or JavaScript errors in browser

Solutions:

Clear browser cache and cookies
Try a different browser or incognito mode
Check browser console for JavaScript errors
Ensure the FastAPI server is running properly

Getting Help

If you encounter issues not covered here:

Check the Detailed Troubleshooting Guide for comprehensive solutions
Check the application logs for detailed error messages
Verify your environment matches the prerequisites
Try running with a fresh virtual environment
Check the GitHub Issues for similar problems

Performance Tips

First Run: Initial startup may be slower as data is downloaded and indexed
Query Performance: Complex queries on large datasets may take a few seconds
Memory Usage: DuckDB is memory-efficient but performance improves with more RAM
Data Size: Keep individual Parquet files under 1GB for optimal performance

🏃‍♂️ Running the Application

Once everything is set up:

# Start the application
./run_local.sh

# Open in your browser (the open command is for macOS; on Linux use xdg-open, or paste the URL into any browser)
open http://localhost:8080

The application will:

Initialize DuckDB and load Parquet data
Start the web server on port 8080
Be ready to answer your business questions!

🎯 What's Different in This Local Version

This version has been specifically adapted for local development and removes all Google Cloud dependencies:

No Cloud Setup Required: Runs entirely on your local machine
DuckDB Instead of BigQuery: High-performance local SQL engine
Local File Storage: No Firestore dependency for sessions
Minimal Configuration: Only requires a Gemini API key
Automatic Data Management: Downloads and manages sample data automatically
Same Great Features: All the AI-powered analytics capabilities you expect

📚 Additional Resources

Troubleshooting Guide: Comprehensive solutions for common issues
Agent Development Kit Documentation: Learn more about the underlying framework
DuckDB Documentation: Understanding the local SQL engine
Gemini API Documentation: Working with the AI capabilities

📃 License

This repository is licensed under the Apache 2.0 License - see the LICENSE file for details.

🗒️ Disclaimers

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

Code and data from this repository are intended for demonstration purposes only. They are not intended for use in a production environment.
