izhangzhihao/agentic-data-team

CRM Data Q&A Agent - Local Parquet-based Analytics

This is a 📊 Data Analytics Agent that analyzes Salesforce data from local Parquet files using DuckDB.

The agent demonstrates an advanced Retrieval-Augmented Generation workflow in a multi-agent system with contextualized Natural-Language-to-SQL components, powered by the Long Context and In-Context Learning capabilities of Gemini 2.5 Pro.

🚀 Blog post: Forget vibe coding, vibe Business Intelligence is here!

The agent is built with Agent Development Kit and runs completely locally using:

  • DuckDB for high-performance SQL queries on Parquet files
  • Local file storage for session management
  • Gemini API for natural language processing (only cloud dependency)

For each question, the agent works as follows:

  • It interprets questions about the state of the business as reflected in the CRM, rather than requiring direct references to Salesforce data entities.
  • It generates a DuckDB SQL query to gather the data needed to answer the question.
  • It creates interactive Vega-Lite diagrams.
  • It analyzes the results and provides key insights and recommended actions.

Example question: "What are our best lead sources in every country?"
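
The chart step in the list above can be pictured with a small sketch. The agent's own chart-building code is not shown in this README, so the helper name `bar_chart_spec`, the field names, and the rows are hypothetical placeholders; only the overall Vega-Lite spec layout (`$schema`, `data`, `mark`, `encoding`) is standard.

```python
import json


def bar_chart_spec(rows, category_field, value_field, title):
    """Build a minimal Vega-Lite v5 bar chart spec from a list of row dicts."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": category_field, "type": "nominal"},
            "y": {"field": value_field, "type": "quantitative"},
            "tooltip": [
                {"field": category_field, "type": "nominal"},
                {"field": value_field, "type": "quantitative"},
            ],
        },
    }


# Placeholder rows standing in for a SQL result; these are not real CRM data.
rows = [
    {"lead_source": "Web", "converted_leads": 3},
    {"lead_source": "Referral", "converted_leads": 5},
]
spec = bar_chart_spec(rows, "lead_source", "converted_leads", "Converted leads by source")
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the online Vega editor to preview the chart.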

Agent Development Kit

The agent is built using Agent Development Kit (ADK) - a flexible and modular framework for developing and deploying AI agents.

The sample also demonstrates:

🕵🏻‍♀️ Simple questions are complex

Top 5 customers by impact in the US this year

Examples of questions the agent can answer

  • "Top 5 customers in every country"
  • "What are our best lead sources?"
    • or more specific "What are our best lead sources by value?"
  • "Lead conversion trends in the US"

High-Level Design

Top 5 customers in every country

🚀 Local Installation and Setup

This agent runs completely locally with minimal dependencies. No Google Cloud project or billing account required!

Prerequisites

  • Python 3.11 or higher
  • uv (recommended) or pip for package management
  • Gemini API key (free tier available)

Quick Start

  1. Clone this repository:
git clone https://github.com/vladkol/crm-data-agent && cd crm-data-agent
  2. Create a Python virtual environment:
# Using uv (recommended)
uv venv .venv --python 3.11 && source .venv/bin/activate

# Or using standard Python
python -m venv .venv && source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
# Using uv
uv pip install -r src/requirements.txt

# Or using pip
pip install -r src/requirements.txt
  4. Set up configuration:

Create a .env file in the src directory using the template:

cp src/.env-template src/.env

Edit src/.env and add your Gemini API key (see Configuration section below).

  5. Run the application:
./run_local.sh
  6. Open your browser:

Navigate to http://localhost:8080 to start using the agent.

Configuration

The local setup requires minimal configuration. Create a src/.env file with the following variables:

Required Configuration

GEMINI_API_KEY - [REQUIRED] Your Gemini API key. Get one free at Google AI Studio.

Optional Configuration

LOCAL_DATA_DIR - [OPTIONAL] Directory for Parquet data files (default: sample-data)

SESSION_STORAGE_DIR - [OPTIONAL] Directory for session storage (default: .sessions)

SFDC_METADATA_FILE - [OPTIONAL] Salesforce metadata file (default: sfdc_metadata.json)

Example .env file

# Required: Get your free API key from https://aistudio.google.com/apikey
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Customize data and session directories
LOCAL_DATA_DIR=sample-data
SESSION_STORAGE_DIR=.sessions
SFDC_METADATA_FILE=sfdc_metadata.json
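
To make the role of the defaults concrete, here is a minimal standard-library sketch of how such a file could be read. It is not the application's actual loader: the function names `parse_env_text` and `load_settings` are hypothetical, and letting real environment variables override the file is an assumption of this sketch.

```python
import os
from pathlib import Path

# Defaults taken from the Optional Configuration list above.
DEFAULTS = {
    "LOCAL_DATA_DIR": "sample-data",
    "SESSION_STORAGE_DIR": ".sessions",
    "SFDC_METADATA_FILE": "sfdc_metadata.json",
}


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env_path="src/.env"):
    """Merge defaults, the .env file (if present), and process environment variables."""
    settings = dict(DEFAULTS)
    path = Path(env_path)
    if path.is_file():
        settings.update(parse_env_text(path.read_text(encoding="utf-8")))
    # Assumption of this sketch: real environment variables win over the file.
    for key in ("GEMINI_API_KEY", *DEFAULTS):
        if key in os.environ:
            settings[key] = os.environ[key]
    if not settings.get("GEMINI_API_KEY"):
        raise RuntimeError("GEMINI_API_KEY is required; add it to src/.env")
    return settings
```

Calling `load_settings()` from the repository root returns a plain dictionary, so a missing optional variable simply falls back to the default listed above.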

Data Setup

The application automatically manages sample data for you:

  1. Automatic Download: On first run, the application will automatically download sample Parquet files to the sample-data directory.

  2. Custom Data: You can add your own Parquet files to the sample-data directory. The application will automatically detect and load them.

  3. Data Structure: The sample data includes typical Salesforce objects:

    • Account, Contact, Lead, Opportunity
    • Case, Task, Event
    • User, RecordType, CurrencyType
    • Historical data (CaseHistory, OpportunityHistory)
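
As a rough illustration of how Parquet files in a directory can be exposed to DuckDB as tables, the sketch below creates one view per file. It assumes each file is named after its Salesforce object (for example `Account.parquet`), which this README does not confirm; `view_statements` and `open_database` are hypothetical names, not functions from this repository.

```python
from pathlib import Path


def view_statements(data_dir="sample-data"):
    """Return one CREATE VIEW statement per .parquet file found in data_dir."""
    statements = []
    for path in sorted(Path(data_dir).glob("*.parquet")):
        table = path.stem.replace('"', '""')         # Account.parquet -> Account
        source = path.as_posix().replace("'", "''")  # escape quotes for the SQL literal
        statements.append(
            f'CREATE OR REPLACE VIEW "{table}" AS '
            f"SELECT * FROM read_parquet('{source}')"
        )
    return statements


def open_database(data_dir="sample-data"):
    """Open an in-memory DuckDB database with a view over each Parquet file."""
    import duckdb  # third-party package: pip install duckdb

    con = duckdb.connect()  # no argument means an in-memory database
    for statement in view_statements(data_dir):
        con.execute(statement)
    return con
```

With the views in place, a query such as `open_database().execute('SELECT COUNT(*) FROM "Lead"').fetchone()` would count leads, provided a `Lead.parquet` file exists in the directory.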

🔧 Troubleshooting

Common Issues and Solutions

1. Application Won't Start

Problem: Error when running ./run_local.sh

Solutions:

  • Ensure you have Python 3.11+ installed: python --version
  • Verify virtual environment is activated: which python should point to .venv
  • Check if all dependencies are installed: uv pip list or pip list
  • Make sure the script is executable: chmod +x run_local.sh
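
The first two checks above can also be run from Python itself. This is a small sketch, not part of the repository; `environment_report` is a hypothetical helper name.

```python
import sys


def environment_report(version_info=sys.version_info,
                       prefix=sys.prefix,
                       base_prefix=sys.base_prefix):
    """Return a list of problems with the current interpreter; empty means OK."""
    problems = []
    if tuple(version_info[:2]) < (3, 11):
        problems.append(
            f"Python 3.11+ required, found {version_info[0]}.{version_info[1]}"
        )
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    if prefix == base_prefix:
        problems.append("No virtual environment is active")
    return problems


if __name__ == "__main__":
    for problem in environment_report():
        print("WARNING:", problem)
```

Run it with the same `python` that `run_local.sh` would pick up; no output means both checks passed.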

2. Missing Gemini API Key

Problem: "API key not found" or authentication errors

Solutions:

  • Verify your .env file exists in the src directory
  • Check that GEMINI_API_KEY is set in your .env file
  • Get a free API key from Google AI Studio
  • Ensure there are no extra spaces or quotes around the API key
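
The formatting problems listed above are easy to detect mechanically. The following sketch only inspects the string read from `.env`; it does not contact the Gemini API, and `api_key_problems` is a hypothetical helper name.

```python
def api_key_problems(raw_value):
    """Return formatting problems for a GEMINI_API_KEY value; empty list means none found."""
    stripped = (raw_value or "").strip()
    if not stripped:
        return ["GEMINI_API_KEY is missing or empty"]
    problems = []
    if raw_value != stripped:
        problems.append("leading or trailing whitespace")
    if stripped[0] in "'\"" or stripped[-1] in "'\"":
        problems.append("surrounding quotes")
    # The placeholder text comes from the example .env file above.
    if stripped == "your_gemini_api_key_here":
        problems.append("template placeholder was not replaced")
    return problems
```

An empty list means the value is well formed, not that the key is valid; authentication errors can still come from a revoked or mistyped key.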

3. Data Loading Issues

Problem: "No data found" or "Table not found" errors

Solutions:

  • Check if sample-data directory exists and contains .parquet files
  • Run the data deployment script manually: python utils/deploy_demo_data.py
  • Verify file permissions: ensure the application can read the sample-data directory
  • Check disk space: ensure you have enough space for the sample data (~50MB)
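
Several of these checks can be combined into one script. The sketch below relies on the fact that an unencrypted Parquet file starts and ends with the four bytes `PAR1`, so a truncated download is usually caught without any Parquet library. The function names are hypothetical, and the 50 MB threshold mirrors the estimate above.

```python
import os
import shutil
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files start and end with these bytes


def looks_like_parquet(path):
    """Cheap sanity check: verify the PAR1 magic bytes at both ends of the file."""
    path = Path(path)
    if path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with path.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def check_data_dir(data_dir="sample-data", min_free_mb=50):
    """Return human-readable problems; an empty list means the checks passed."""
    directory = Path(data_dir)
    if not directory.is_dir():
        return [f"{data_dir} does not exist"]
    problems = []
    files = sorted(directory.glob("*.parquet"))
    if not files:
        problems.append(f"no .parquet files in {data_dir}")
    for path in files:
        if not looks_like_parquet(path):
            problems.append(f"{path.name} is truncated or not a Parquet file")
    free_mb = shutil.disk_usage(directory).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free on disk")
    return problems
```

The magic-byte test is deliberately shallow: a file can pass it and still have a damaged footer, in which case opening it with DuckDB or another Parquet reader remains the definitive check.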

4. Port Already in Use

Problem: "Port 8080 is already in use"

Solutions:

  • Kill existing processes: lsof -ti:8080 | xargs kill -9
  • Use a different port by modifying the startup script
  • Check for other applications using port 8080
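
Before killing anything, it can help to confirm that the port is really taken and to find a free alternative. This is a standard-library sketch with hypothetical helper names; how the startup script accepts a different port is not covered by this README.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=8080, attempts=20):
    """Return the first port at or after `start` with no listener, or None."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    print("8080 in use:", port_in_use(8080))
    print("first free port from 8080:", first_free_port())
```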

5. DuckDB Connection Issues

Problem: Database connection or SQL execution errors

Solutions:

  • Restart the application to reinitialize DuckDB
  • Check if Parquet files are corrupted by opening them with another tool
  • Verify Python has write permissions to create temporary DuckDB files
  • Clear any cached DuckDB files and restart

6. Memory Issues

Problem: Application crashes or runs slowly with large datasets

Solutions:

  • Increase available memory for the Python process
  • Use smaller datasets for testing
  • Monitor memory usage with top or htop
  • Consider using DuckDB's memory limit settings
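
DuckDB's memory cap is controlled by its `memory_limit` configuration option, which can be set with a plain `SET` statement on an open connection. The helper below is a hedged sketch: where the application creates its DuckDB connection is not shown in this README, so `apply_memory_limit` is a hypothetical function you would call on whatever connection object you have.

```python
import re


def apply_memory_limit(con, limit="2GB"):
    """Cap DuckDB's memory use on an open connection, e.g. limit='2GB' or '512MB'."""
    # Validate the value first, since it is interpolated into the SQL text.
    if not re.fullmatch(r"\d+(\.\d+)?(MB|GB)", limit):
        raise ValueError(f"unexpected memory limit: {limit!r}")
    con.execute(f"SET memory_limit = '{limit}'")
    return limit
```

For example, `con = duckdb.connect()` followed by `apply_memory_limit(con, "2GB")` limits that connection's database instance to roughly 2 GB, after which DuckDB spills to disk or raises an out-of-memory error instead of growing further.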

7. Web Interface Issues

Problem: Blank page or JavaScript errors in browser

Solutions:

  • Clear browser cache and cookies
  • Try a different browser or incognito mode
  • Check browser console for JavaScript errors
  • Ensure the FastAPI server is running properly
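
To tell a server problem from a browser problem, a quick request from Python is often enough. The sketch below uses only the standard library and bypasses any configured HTTP proxy, since the target is local; `server_status` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def server_status(url="http://localhost:8080", timeout=3.0):
    """Return the HTTP status code from `url`, or None if no server answered."""
    # An empty ProxyHandler disables proxies picked up from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None      # connection refused, timeout, DNS failure, ...


if __name__ == "__main__":
    print("status:", server_status())
```

A result of `None` points at the server (not started, crashed, or on another port), while a numeric status with a blank page in the browser points at the front end or the browser cache.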

Getting Help

If you encounter issues not covered here:

  1. Check the Detailed Troubleshooting Guide for comprehensive solutions
  2. Check the application logs for detailed error messages
  3. Verify your environment matches the prerequisites
  4. Try running with a fresh virtual environment
  5. Check the GitHub Issues for similar problems

Performance Tips

  • First Run: Initial startup may be slower as data is downloaded and indexed
  • Query Performance: Complex queries on large datasets may take a few seconds
  • Memory Usage: DuckDB is memory-efficient but performance improves with more RAM
  • Data Size: Keep individual Parquet files under 1GB for optimal performance

🏃‍♂️ Running the Application

Once everything is set up:

# Start the application
./run_local.sh

# Open in your browser (macOS; on Linux use xdg-open, or paste the URL into the address bar)
open http://localhost:8080

The application will:

  1. Initialize DuckDB and load Parquet data
  2. Start the web server on port 8080
  3. Be ready to answer your business questions!

🎯 What's Different in This Local Version

This version has been specifically adapted for local development and removes all Google Cloud dependencies:

  • No Cloud Setup Required: Runs entirely on your local machine
  • DuckDB Instead of BigQuery: High-performance local SQL engine
  • Local File Storage: No Firestore dependency for sessions
  • Minimal Configuration: Only requires a Gemini API key
  • Automatic Data Management: Downloads and manages sample data automatically
  • Same Great Features: All the AI-powered analytics capabilities you expect

📚 Additional Resources

📃 License

This repository is licensed under the Apache 2.0 License - see the LICENSE file for details.

🗒️ Disclaimers

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

Code and data from this repository are intended for demonstration purposes only. They are not intended for use in a production environment.
