Vanna + ClickHouse with Row-Level Access Control

This example demonstrates how to connect Vanna to ClickHouse with row-level access control (RBAC). Users can ask natural language questions about their data, and Vanna automatically filters results based on their permissions.

Why Vanna over Native ClickHouse SQL Generation?

ClickHouse offers native text-to-SQL capabilities, but Vanna provides significant advantages for production use cases:

Capability	ClickHouse Native	Vanna
Access	CLI only (technical users)	Web UI, API, Chat interface
Query Strategy	Single-shot LLM call	Multi-turn reasoning with tool use
Consistency	Variable results per run	Learns from corrections, improves over time
Complex Questions	One query at a time	Runs multiple queries, synthesizes insights
RBAC	Manual implementation	Built-in user context and filtering

Example: "How can we improve taxi service?"

Native approach: Generates one SQL query, returns raw data. User must interpret.

Vanna approach:

Runs multiple queries (revenue by hour, trip patterns, popular routes)
Analyzes results across queries
Synthesizes actionable recommendations
Presents findings with visualizations

How RBAC Works

User Request                    Vanna Processing                     ClickHouse
─────────────────────────────────────────────────────────────────────────────────

[email protected]    →    UserResolver looks up           →    Query executes
asks "Show top pickups"       user's allowed neighborhoods         with filter:
                              (Midtown, Upper East Side, ...)      WHERE pickup_ntaname
                                                                   IN ('Midtown-...', ...)
                              ↓
                              SqlRunner injects
                              WHERE clause automatically

Users ask questions naturally - they never see the filter. They only receive data from their permitted areas.

Quick Start

1. Clone and Install

git clone https://github.com/vanna-ai/clickhouse.git
cd clickhouse
pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env with your credentials

3. Run Server

python server.py

Access at http://localhost:8000

4. Test with Different Users

# Admin - sees all data (3M+ trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
  -H "Content-Type: application/json" \
  -d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'

# Manhattan user - sees only Manhattan (~2.7M trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
  -H "Content-Type: application/json" \
  -d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'

# Brooklyn user - sees only Brooklyn (~47K trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
  -H "Content-Type: application/json" \
  -d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Client Request                          │
│  POST /api/vanna/v2/chat_sse                                    │
│  {"message": "...", "metadata": {"user_email": "user@..."}}     │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                        UserResolver                              │
│  - Extracts user email from request                             │
│  - Looks up allowed neighborhoods for user                      │
│  - Returns User object with metadata                            │
│                                                                 │
│  Production: Validate JWT/OAuth token from identity provider    │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Vanna Agent                               │
│  - Receives user question                                       │
│  - Generates SQL using LLM                                      │
│  - Can run multiple queries for complex questions               │
│  - Synthesizes insights from results                            │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ClickHouseSqlRunner                           │
│  - Receives SQL + ToolContext (contains User)                   │
│  - Checks user's allowed neighborhoods                          │
│  - Injects WHERE clause: pickup_ntaname IN (...)                │
│  - Executes modified SQL against ClickHouse                     │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                        ClickHouse                                │
│  - Executes filtered query                                      │
│  - Returns only rows matching user's permissions                │
└─────────────────────────────────────────────────────────────────┘

Customizing RBAC

Define Your Access Rules

Edit the neighborhood lists and user mapping in server.py:

# Define what data each role can access
SALES_TEAM_REGIONS = ["Northeast", "Southeast", ...]
MARKETING_REGIONS = ["West Coast", "Midwest", ...]

# Map users to their allowed data
USER_REGIONS = {
    "[email protected]": None,  # None = full access
    "[email protected]": SALES_TEAM_REGIONS,
    "[email protected]": MARKETING_REGIONS,
    "[email protected]": [],  # Empty = no access
}

Modify the Filter Column

Update _inject_neighborhood_filter() to filter on your column:

def _inject_neighborhood_filter(self, sql: str, regions: list) -> str:
    # Change 'pickup_ntaname' to your column
    filter_clause = f"(region IN ({regions_list}))"
    ...

Production Considerations

Authentication

This PoC reads user email from the request body for simplicity. In production, never trust user-provided identity. Instead:

class ProductionUserResolver(UserResolver):
    async def resolve_user(self, request_context: RequestContext) -> User:
        # Get token from Authorization header
        token = request_context.get_header('Authorization').replace('Bearer ', '')

        # Validate with your identity provider (Okta, Auth0, Azure AD, etc.)
        claims = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])

        # Email is now verified by your IdP
        user_email = claims['email']

        # Look up permissions (from database, IdP claims, etc.)
        allowed_regions = db.get_user_regions(user_email)

        return User(
            id=user_email,
            email=user_email,
            metadata={'regions': allowed_regions}
        )

Permission Storage

Move the user-to-region mapping from code to a database:

# Instead of hardcoded dict:
USER_REGIONS = {"[email protected]": ["Region1", ...]}

# Query from database:
allowed_regions = db.query(
    "SELECT region FROM user_permissions WHERE email = ?",
    user_email
)

Audit Logging

Add logging to track who accessed what:

async def run_sql(self, args: RunSqlToolArgs, context: ToolContext) -> pd.DataFrame:
    logger.info(f"User {context.user.email} executing: {args.sql}")
    logger.info(f"Applied filter: {context.user.metadata.get('regions')}")
    ...

Files

File	Description
`server.py`	Main Vanna server with ClickHouse connection and RBAC
`.env.example`	Template for environment variables
`requirements.txt`	Python dependencies
`.gitignore`	Prevents committing secrets

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vanna + ClickHouse with Row-Level Access Control

Why Vanna over Native ClickHouse SQL Generation?

Example: "How can we improve taxi service?"

How RBAC Works

Quick Start

1. Clone and Install

2. Configure Environment

3. Run Server

4. Test with Different Users

Architecture

Customizing RBAC

Define Your Access Rules

Modify the Filter Column

Production Considerations

Authentication

Permission Storage

Audit Logging

Files

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
server.py		server.py

vanna-ai/clickhouse

Folders and files

Latest commit

History

Repository files navigation

Vanna + ClickHouse with Row-Level Access Control

Why Vanna over Native ClickHouse SQL Generation?

Example: "How can we improve taxi service?"

How RBAC Works

Quick Start

1. Clone and Install

2. Configure Environment

3. Run Server

4. Test with Different Users

Architecture

Customizing RBAC

Define Your Access Rules

Modify the Filter Column

Production Considerations

Authentication

Permission Storage

Audit Logging

Files

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages