Skip to content

vanna-ai/clickhouse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Vanna + ClickHouse with Row-Level Access Control

This example demonstrates how to connect Vanna to ClickHouse with row-level access control (RBAC). Users can ask natural language questions about their data, and Vanna automatically filters results based on their permissions.

Why Vanna over Native ClickHouse SQL Generation?

ClickHouse offers native text-to-SQL capabilities, but Vanna provides significant advantages for production use cases:

Capability ClickHouse Native Vanna
Access CLI only (technical users) Web UI, API, Chat interface
Query Strategy Single-shot LLM call Multi-turn reasoning with tool use
Consistency Variable results per run Learns from corrections, improves over time
Complex Questions One query at a time Runs multiple queries, synthesizes insights
RBAC Manual implementation Built-in user context and filtering

Example: "How can we improve taxi service?"

Native approach: Generates one SQL query, returns raw data. User must interpret.

Vanna approach:

  1. Runs multiple queries (revenue by hour, trip patterns, popular routes)
  2. Analyzes results across queries
  3. Synthesizes actionable recommendations
  4. Presents findings with visualizations

How RBAC Works

User Request                    Vanna Processing                     ClickHouse
─────────────────────────────────────────────────────────────────────────────────

[email protected]    →    UserResolver looks up           →    Query executes
asks "Show top pickups"       user's allowed neighborhoods         with filter:
                              (Midtown, Upper East Side, ...)      WHERE pickup_ntaname
                                                                   IN ('Midtown-...', ...)
                              ↓
                              SqlRunner injects
                              WHERE clause automatically

Users ask questions naturally - they never see the filter. They only receive data from their permitted areas.

Quick Start

1. Clone and Install

git clone https://github.com/vanna-ai/clickhouse.git
cd clickhouse
pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env with your credentials

3. Run Server

python server.py

Access at http://localhost:8000

4. Test with Different Users

# Admin - sees all data (3M+ trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
  -H "Content-Type: application/json" \
  -d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'

# Manhattan user - sees only Manhattan (~2.7M trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
  -H "Content-Type: application/json" \
  -d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'

# Brooklyn user - sees only Brooklyn (~47K trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
  -H "Content-Type: application/json" \
  -d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Client Request                          │
│  POST /api/vanna/v2/chat_sse                                    │
│  {"message": "...", "metadata": {"user_email": "user@..."}}     │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                        UserResolver                              │
│  - Extracts user email from request                             │
│  - Looks up allowed neighborhoods for user                      │
│  - Returns User object with metadata                            │
│                                                                 │
│  Production: Validate JWT/OAuth token from identity provider    │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Vanna Agent                               │
│  - Receives user question                                       │
│  - Generates SQL using LLM                                      │
│  - Can run multiple queries for complex questions               │
│  - Synthesizes insights from results                            │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ClickHouseSqlRunner                           │
│  - Receives SQL + ToolContext (contains User)                   │
│  - Checks user's allowed neighborhoods                          │
│  - Injects WHERE clause: pickup_ntaname IN (...)                │
│  - Executes modified SQL against ClickHouse                     │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                        ClickHouse                                │
│  - Executes filtered query                                      │
│  - Returns only rows matching user's permissions                │
└─────────────────────────────────────────────────────────────────┘

Customizing RBAC

Define Your Access Rules

Edit the neighborhood lists and user mapping in server.py:

# Define what data each role can access
SALES_TEAM_REGIONS = ["Northeast", "Southeast", ...]
MARKETING_REGIONS = ["West Coast", "Midwest", ...]

# Map users to their allowed data
USER_REGIONS = {
    "[email protected]": None,  # None = full access
    "[email protected]": SALES_TEAM_REGIONS,
    "[email protected]": MARKETING_REGIONS,
    "[email protected]": [],  # Empty = no access
}

Modify the Filter Column

Update _inject_neighborhood_filter() to filter on your column:

def _inject_neighborhood_filter(self, sql: str, regions: list) -> str:
    # Change 'pickup_ntaname' to your column
    filter_clause = f"(region IN ({regions_list}))"
    ...

Production Considerations

Authentication

This PoC reads user email from the request body for simplicity. In production, never trust user-provided identity. Instead:

class ProductionUserResolver(UserResolver):
    async def resolve_user(self, request_context: RequestContext) -> User:
        # Get token from Authorization header
        token = request_context.get_header('Authorization').replace('Bearer ', '')

        # Validate with your identity provider (Okta, Auth0, Azure AD, etc.)
        claims = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])

        # Email is now verified by your IdP
        user_email = claims['email']

        # Look up permissions (from database, IdP claims, etc.)
        allowed_regions = db.get_user_regions(user_email)

        return User(
            id=user_email,
            email=user_email,
            metadata={'regions': allowed_regions}
        )

Permission Storage

Move the user-to-region mapping from code to a database:

# Instead of hardcoded dict:
USER_REGIONS = {"[email protected]": ["Region1", ...]}

# Query from database:
allowed_regions = db.query(
    "SELECT region FROM user_permissions WHERE email = ?",
    user_email
)

Audit Logging

Add logging to track who accessed what:

async def run_sql(self, args: RunSqlToolArgs, context: ToolContext) -> pd.DataFrame:
    logger.info(f"User {context.user.email} executing: {args.sql}")
    logger.info(f"Applied filter: {context.user.metadata.get('regions')}")
    ...

Files

File Description
server.py Main Vanna server with ClickHouse connection and RBAC
.env.example Template for environment variables
requirements.txt Python dependencies
.gitignore Prevents committing secrets

License

MIT

About

Connect Vanna to Clickhouse

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages