This is a Data Analytics Agent that analyzes Salesforce data from local Parquet files using DuckDB.
The agent demonstrates an advanced Retrieval-Augmented Generation (RAG) workflow in a multi-agent system, with contextualized natural-language-to-SQL components powered by the long-context and in-context-learning capabilities of Gemini 2.5 Pro.
Blog post: Forget vibe coding, vibe Business Intelligence is here!
The agent is built with Agent Development Kit and runs completely locally using:
- DuckDB for high-performance SQL queries on Parquet files
- Local file storage for session management
- Gemini API for natural language processing (only cloud dependency)
- The agent interprets questions about the state of the business as reflected in the CRM, rather than referring directly to Salesforce data entities.
- It generates a SQL query, executed with DuckDB, to gather the data needed to answer the question.
- It creates interactive Vega-Lite diagrams.
- It analyzes the results and provides key insights and recommended actions.

The agent is built using Agent Development Kit (ADK) - a flexible and modular framework for developing and deploying AI agents.
The sample also demonstrates:
- How to build a Web UI for ADK-based data agents using Streamlit.
- How to use Artifact Services with ADK.
- How to stream and interpret session events.
- How to create and use a custom Session Service.
- "Top 5 customers in every country"
- "What are our best lead sources?"
- or more specific "What are our best lead sources by value?"
- Lead conversion trends in the US.
This agent runs completely locally with minimal dependencies. No Google Cloud project or billing account required!
- Python 3.11 or higher
- uv (recommended) or pip for package management
- Gemini API key (free tier available)
- Clone this repository:
git clone https://github.com/vladkol/crm-data-agent && cd crm-data-agent
- Create a Python virtual environment:
# Using uv (recommended)
uv venv .venv --python 3.11 && source .venv/bin/activate
# Or using standard Python
python -m venv .venv && source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
# Using uv
uv pip install -r src/requirements.txt
# Or using pip
pip install -r src/requirements.txt
- Set up configuration:
Create a `.env` file in the `src` directory using the template:
cp src/.env-template src/.env
Edit `src/.env` and add your Gemini API key (see the Configuration section below).
- Run the application:
./run_local.sh
- Open your browser:
Navigate to http://localhost:8080 to start using the agent.
The local setup requires minimal configuration. Create a `src/.env` file with the following variables:
- `GEMINI_API_KEY` - [REQUIRED] Your Gemini API key. Get one free at Google AI Studio.
- `LOCAL_DATA_DIR` - [OPTIONAL] Directory for Parquet data files (default: `sample-data`)
- `SESSION_STORAGE_DIR` - [OPTIONAL] Directory for session storage (default: `.sessions`)
- `SFDC_METADATA_FILE` - [OPTIONAL] Salesforce metadata file (default: `sfdc_metadata.json`)
# Required: Get your free API key from https://aistudio.google.com/apikey
GEMINI_API_KEY=your_gemini_api_key_here
# Optional: Customize data and session directories
LOCAL_DATA_DIR=sample-data
SESSION_STORAGE_DIR=.sessions
SFDC_METADATA_FILE=sfdc_metadata.json
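The file above is plain `KEY=value` text. As a sketch of how such a file is parsed, here is a minimal stdlib-only reader (the app itself likely uses a library such as python-dotenv; that is an assumption, not something this README states):

```python
import tempfile
from pathlib import Path

# Minimal KEY=value .env reader (illustrative only).
def load_env(path: Path) -> dict:
    env = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines and comments; split on the first '='.
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

env_file = Path(tempfile.mkdtemp()) / ".env"
env_file.write_text("# comment\nGEMINI_API_KEY=abc123\nLOCAL_DATA_DIR=sample-data\n")
config = load_env(env_file)
print(config["LOCAL_DATA_DIR"])  # sample-data
```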
The application automatically manages sample data for you:
- Automatic Download: On first run, the application automatically downloads sample Parquet files to the `sample-data` directory.
- Custom Data: You can add your own Parquet files to the `sample-data` directory. The application will automatically detect and load them.
- Data Structure: The sample data includes typical Salesforce objects:
- Account, Contact, Lead, Opportunity
- Case, Task, Event
- User, RecordType, CurrencyType
- Historical data (CaseHistory, OpportunityHistory)
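Detection of custom files can be pictured as a simple glob over the data directory. This is an illustrative sketch, not the app's actual loader: the assumption is that each `*.parquet` file becomes a queryable table named after its file stem.

```python
import tempfile
from pathlib import Path

# Build a throwaway data directory with empty stand-in files,
# then discover them the way a loader might: one table per file stem
# (e.g. Account.parquet -> table "Account").
data_dir = Path(tempfile.mkdtemp())
for name in ("Account", "Lead", "Opportunity"):
    (data_dir / f"{name}.parquet").touch()  # stand-ins for real files

tables = sorted(p.stem for p in data_dir.glob("*.parquet"))
print(tables)  # ['Account', 'Lead', 'Opportunity']
```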
Problem: Error when running ./run_local.sh
Solutions:
- Ensure you have Python 3.11+ installed:
python --version
- Verify the virtual environment is activated: `which python` should point to `.venv`
- Check if all dependencies are installed: `uv pip list` or `pip list`
- Make sure the script is executable:
chmod +x run_local.sh
Problem: "API key not found" or authentication errors
Solutions:
- Verify your `.env` file exists in the `src` directory
- Check that `GEMINI_API_KEY` is set in your `.env` file
- Get a free API key from Google AI Studio
- Ensure there are no extra spaces or quotes around the API key
Problem: "No data found" or "Table not found" errors
Solutions:
- Check if the `sample-data` directory exists and contains `.parquet` files
- Run the data deployment script manually: `python utils/deploy_demo_data.py`
- Verify file permissions: ensure the application can read the `sample-data` directory
- Check disk space: ensure you have enough space for the sample data (~50MB)
Problem: "Port 8080 is already in use"
Solutions:
- Kill existing processes:
lsof -ti:8080 | xargs kill -9
- Use a different port by modifying the startup script
- Check for other applications using port 8080
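To check programmatically whether something is already listening on a port, a stdlib-only sketch (the helper name and timeout are my own, not part of the project):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds,
        # i.e. when a server is accepting on that port.
        return s.connect_ex((host, port)) == 0

print(port_in_use(8080))
```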
Problem: Database connection or SQL execution errors
Solutions:
- Restart the application to reinitialize DuckDB
- Check if Parquet files are corrupted by opening them with another tool
- Verify Python has write permissions to create temporary DuckDB files
- Clear any cached DuckDB files and restart
Problem: Application crashes or runs slowly with large datasets
Solutions:
- Increase available memory for the Python process
- Use smaller datasets for testing
- Monitor memory usage with `top` or `htop`
- Consider using DuckDB's memory limit settings
Problem: Blank page or JavaScript errors in browser
Solutions:
- Clear browser cache and cookies
- Try a different browser or incognito mode
- Check browser console for JavaScript errors
- Ensure the FastAPI server is running properly
If you encounter issues not covered here:
- Check the Detailed Troubleshooting Guide for comprehensive solutions
- Check the application logs for detailed error messages
- Verify your environment matches the prerequisites
- Try running with a fresh virtual environment
- Check the GitHub Issues for similar problems
- First Run: Initial startup may be slower as data is downloaded and indexed
- Query Performance: Complex queries on large datasets may take a few seconds
- Memory Usage: DuckDB is memory-efficient but performance improves with more RAM
- Data Size: Keep individual Parquet files under 1GB for optimal performance
Once everything is set up:
# Start the application
./run_local.sh
# Open in your browser
open http://localhost:8080
The application will:
- Initialize DuckDB and load Parquet data
- Start the web server on port 8080
- Be ready to answer your business questions!
This version has been specifically adapted for local development and removes all Google Cloud dependencies:
- No Cloud Setup Required: Runs entirely on your local machine
- DuckDB Instead of BigQuery: High-performance local SQL engine
- Local File Storage: No Firestore dependency for sessions
- Minimal Configuration: Only requires a Gemini API key
- Automatic Data Management: Downloads and manages sample data automatically
- Same Great Features: All the AI-powered analytics capabilities you expect
- Troubleshooting Guide: Comprehensive solutions for common issues
- Agent Development Kit Documentation: Learn more about the underlying framework
- DuckDB Documentation: Understanding the local SQL engine
- Gemini API Documentation: Working with the AI capabilities
This repository is licensed under the Apache 2.0 License - see the LICENSE file for details.
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
Code and data from this repository are intended for demonstration purposes only and are not intended for use in a production environment.