This example demonstrates how to connect Vanna to ClickHouse with row-level access control (RBAC). Users can ask natural language questions about their data, and Vanna automatically filters results based on their permissions.
ClickHouse offers native text-to-SQL capabilities, but Vanna provides significant advantages for production use cases:
| Capability | ClickHouse Native | Vanna |
|---|---|---|
| Access | CLI only (technical users) | Web UI, API, Chat interface |
| Query Strategy | Single-shot LLM call | Multi-turn reasoning with tool use |
| Consistency | Variable results per run | Learns from corrections, improves over time |
| Complex Questions | One query at a time | Runs multiple queries, synthesizes insights |
| RBAC | Manual implementation | Built-in user context and filtering |
Native approach: Generates one SQL query, returns raw data. User must interpret.
Vanna approach:
- Runs multiple queries (revenue by hour, trip patterns, popular routes)
- Analyzes results across queries
- Synthesizes actionable recommendations
- Presents findings with visualizations
User Request Vanna Processing ClickHouse
─────────────────────────────────────────────────────────────────────────────────
[email protected] → UserResolver looks up → Query executes
asks "Show top pickups" user's allowed neighborhoods with filter:
(Midtown, Upper East Side, ...) WHERE pickup_ntaname
IN ('Midtown-...', ...)
↓
SqlRunner injects
WHERE clause automatically
Users ask questions naturally - they never see the filter. They only receive data from their permitted areas.
git clone https://github.com/vanna-ai/clickhouse.git
cd clickhouse
pip install -r requirements.txtcp .env.example .env
# Edit .env with your credentialspython server.pyAccess at http://localhost:8000
# Admin - sees all data (3M+ trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
-H "Content-Type: application/json" \
-d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'
# Manhattan user - sees only Manhattan (~2.7M trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
-H "Content-Type: application/json" \
-d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'
# Brooklyn user - sees only Brooklyn (~47K trips)
curl -X POST 'http://localhost:8000/api/vanna/v2/chat_sse' \
-H "Content-Type: application/json" \
-d '{"message": "How many trips total?", "metadata": {"user_email": "[email protected]"}}'┌─────────────────────────────────────────────────────────────────┐
│ Client Request │
│ POST /api/vanna/v2/chat_sse │
│ {"message": "...", "metadata": {"user_email": "user@..."}} │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ UserResolver │
│ - Extracts user email from request │
│ - Looks up allowed neighborhoods for user │
│ - Returns User object with metadata │
│ │
│ Production: Validate JWT/OAuth token from identity provider │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Vanna Agent │
│ - Receives user question │
│ - Generates SQL using LLM │
│ - Can run multiple queries for complex questions │
│ - Synthesizes insights from results │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ClickHouseSqlRunner │
│ - Receives SQL + ToolContext (contains User) │
│ - Checks user's allowed neighborhoods │
│ - Injects WHERE clause: pickup_ntaname IN (...) │
│ - Executes modified SQL against ClickHouse │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ClickHouse │
│ - Executes filtered query │
│ - Returns only rows matching user's permissions │
└─────────────────────────────────────────────────────────────────┘
Edit the neighborhood lists and user mapping in server.py:
# Define what data each role can access
SALES_TEAM_REGIONS = ["Northeast", "Southeast", ...]
MARKETING_REGIONS = ["West Coast", "Midwest", ...]
# Map users to their allowed data
USER_REGIONS = {
"[email protected]": None, # None = full access
"[email protected]": SALES_TEAM_REGIONS,
"[email protected]": MARKETING_REGIONS,
"[email protected]": [], # Empty = no access
}Update _inject_neighborhood_filter() to filter on your column:
def _inject_neighborhood_filter(self, sql: str, regions: list) -> str:
# Change 'pickup_ntaname' to your column
filter_clause = f"(region IN ({regions_list}))"
...This PoC reads user email from the request body for simplicity. In production, never trust user-provided identity. Instead:
class ProductionUserResolver(UserResolver):
async def resolve_user(self, request_context: RequestContext) -> User:
# Get token from Authorization header
token = request_context.get_header('Authorization').replace('Bearer ', '')
# Validate with your identity provider (Okta, Auth0, Azure AD, etc.)
claims = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])
# Email is now verified by your IdP
user_email = claims['email']
# Look up permissions (from database, IdP claims, etc.)
allowed_regions = db.get_user_regions(user_email)
return User(
id=user_email,
email=user_email,
metadata={'regions': allowed_regions}
)Move the user-to-region mapping from code to a database:
# Instead of hardcoded dict:
USER_REGIONS = {"[email protected]": ["Region1", ...]}
# Query from database:
allowed_regions = db.query(
"SELECT region FROM user_permissions WHERE email = ?",
user_email
)Add logging to track who accessed what:
async def run_sql(self, args: RunSqlToolArgs, context: ToolContext) -> pd.DataFrame:
logger.info(f"User {context.user.email} executing: {args.sql}")
logger.info(f"Applied filter: {context.user.metadata.get('regions')}")
...| File | Description |
|---|---|
server.py |
Main Vanna server with ClickHouse connection and RBAC |
.env.example |
Template for environment variables |
requirements.txt |
Python dependencies |
.gitignore |
Prevents committing secrets |
MIT