Meulex

heb. 'meule' (מעולה) — 'excellent'

Compliance‑aware, Slack‑native agentic RAG copilot boilerplate

Python 3.12+ FastAPI LangGraph Docker License


Executive Summary

Meulex is a pragmatic, production‑friendly boilerplate for building a compliance‑aware, Slack‑native agentic RAG copilot. It provides a minimal yet extensible stack that can move from PoC to production quickly. The template leans on proven building blocks and structure inspired by Sentio while remaining vendor‑neutral and plug‑in friendly.

Core goals:

  • Fast local bootstrap with Docker Compose (Qdrant, Redis; optional Prometheus, Langfuse).
  • Clean FastAPI service exposing /chat, /embed, /slack/events, /health*, /metrics.
  • LangGraph RAG pipeline: hybrid retrieval (dense + BM25 + optional RRF/rerank), LLM cascade with fallback and budgets, concise citation output.
  • Observability by default: OTel traces, Prometheus metrics, structured logs.
  • Security first: input validation, safe logging, rate limits, headers, PII masking hooks, RBAC‑ready.

Use this template to deliver a working Slack assistant in ≤5 days, then layer SSO, audit, and multi‑tenancy without architectural changes.


Features

  • Hybrid retrieval: dense (Qdrant) + sparse (BM25/Pyserini), RRF fusion.
  • Optional semantic reranking via cross‑encoder (Jina) behind a feature flag.
  • LLM cascade and budgets: primary provider (OpenAI‑compatible) with fallback (e.g., Ollama) + circuit breakers and retries.
  • Prefect ingestion flow: load → chunk → embed → upsert (idempotent, version‑ready).
  • Slack integration: request signing verification, idempotency, ack‑first.
  • Observability: OTel spans per pipeline node; Prom metrics (business + infra).
  • Security baseline: validation, security headers, rate limiting, PII masking, safe logging; /metrics guard option.
  • Pluggable providers: LLM, embeddings, vector DB, rerankers via config only.

Architecture Overview

High‑Level Data Flow

flowchart LR
  Client-->API
  API-->Cache
  Cache-->|hit|Response
  Cache-->|miss|Retriever
  subgraph Retriever
    Dense[Qdrant Dense]
    Sparse[BM25 Sparse]
    RRF[Reciprocal Rank Fusion]
  end
  Retriever-->LLM
  LLM-->Response

System Architecture (Mermaid)

graph TD
    subgraph Client_Layer
        SLACK[Slack users]
        API_CLIENTS[API clients]
    end

    subgraph API_Layer
        ROUTES[Routes: /chat, /embed, /slack/events, /health, /metrics]
        RL[Rate limiting]
        VAL[Request validation]
        SEC[Security headers]
    end

    subgraph Processing_Layer
        RETR[Hybrid retrieval]
        RRF[Fusion RRF]
        RER[Reranker optional]
        PROMPT[Prompt builder]
        LLM[LLM cascade primary and fallback]
    end

    subgraph Data_Layer
        QDRANT[Qdrant vector DB]
        REDIS[Redis cache]
        EMBED[Jina embeddings]
    end

    subgraph Observability
        METRICS[Prometheus metrics]
        TRACING[OpenTelemetry tracing]
        LOGS[Structured logs]
    end

    SLACK --> ROUTES
    API_CLIENTS --> ROUTES
    ROUTES --> RL
    RL --> VAL
    VAL --> SEC
    SEC --> RETR
    RETR --> RRF
    RRF --> RER
    RER --> PROMPT
    PROMPT --> LLM
    RETR --> QDRANT
    RETR --> REDIS
    RETR --> EMBED
    ROUTES --> METRICS
    ROUTES --> TRACING
    ROUTES --> LOGS

RAG Pipeline (Detailed)

graph TD
    START[Query input] --> VALIDATE[Validate and sanitize]
    VALIDATE --> CACHE{Semantic cache hit?}
    CACHE -->|Yes| RETURN[Return cached answer]
    CACHE -->|No| EMBEDQ[Embed query]
    EMBEDQ --> HYBRID[Hybrid retrieval]
    subgraph Retrieval
        DENSE[Dense search Qdrant]
        SPARSE[Sparse search BM25]
        FUSE[RRF fusion]
    end
    HYBRID --> DENSE
    HYBRID --> SPARSE
    DENSE --> FUSE
    SPARSE --> FUSE
    FUSE --> RERANK{Reranker enabled?}
    RERANK -->|Yes| RER[Rerank Top K]
    RERANK -->|No| SELECT[Select Top K]
    RER --> SELECT
    SELECT --> PROMPT[Assemble context and prompt]
    PROMPT --> GEN[LLM cascade generate]
    GEN --> STORE[Store in cache]
    STORE --> METRICS[Update metrics]
    METRICS --> END[Return answer and citations]
    RETURN --> END

Core Components

  • API (FastAPI): routing, validation, rate limiting, security headers, metrics.
  • Orchestration (LangGraph): nodes {retrieve → (rerank?) → select → generate}.
  • Ingestion (Prefect): directory/file → chunk → embed (batch) → upsert.
  • Retrieval: hybrid retriever with domain scorers (keyword, recency, MMR).
  • LLM: provider registry + cascade + prompt builder + optional streaming (a fallback sketch follows this list).
  • Vector store: Qdrant adapter (sync for simplicity, async for scale later).
  • Embeddings: Jina provider with batching, retries and caching hooks.
  • Caching: multi‑tier L1/L2 (memory + Redis), semantic cache keys.
  • Observability: OTel tracing, Prometheus metrics, structured JSON logs.
  • Security: validation, headers, rate limits, PII masking, RBAC‑ready guards.
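
To illustrate the cascade idea from the LLM component above, here is a minimal sketch; the provider objects and their chat method are hypothetical stand‑ins, not the repository's actual classes:

import asyncio

async def cascade_generate(primary, fallback, messages, timeout_s: float = 10.0) -> str:
    """Try the primary provider; fall back on timeout or any provider error."""
    try:
        return await asyncio.wait_for(primary.chat(messages), timeout=timeout_s)
    except Exception:
        # A fuller version would also consult circuit-breaker state and token budgets here.
        return await fallback.chat(messages)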

For deeper reference and code patterns, see langgraph/src/** in this repo. Borrow minimal code where helpful; keep Meulex modules lean and focused.


Quick Start

Prerequisites

  • Python 3.12+
  • Docker & Docker Compose
  • API keys (optional at start): OpenAI‑compatible, Jina; Slack signing secret

Configuration

cp .env.example .env
# Edit LLM/embeddings/vector settings as needed

Key environment variables (excerpt):

| Name | Description |
| --- | --- |
| LLM_PROVIDER | e.g., openai (primary) |
| LLM_MODEL | e.g., gpt-4o-mini |
| LLM_BASE_URL | OpenAI‑compatible endpoint |
| LLM_API_KEY | API key for LLM provider |
| EMBEDDER_NAME | e.g., jina |
| JINA_API_KEY | API key for embeddings |
| VECTOR_STORE | qdrant |
| QDRANT_URL | e.g., http://localhost:6333 |
| REDIS_URL | e.g., redis://localhost:6379/0 |
| FEATURE_FLAGS | path to YAML with toggles |
| SLACK_SIGNING_SECRET | Slack verification secret |

Run (Docker Compose)

docker compose up -d
uvicorn meulex.api.app:app --host 0.0.0.0 --port 8000 --reload

Verify:

curl -s http://localhost:8000/health | jq .

Alternative Quick Start (CLI)

# Install (editable) and start infra
pip install -e .
docker compose up -d

# Start API via CLI
meulex api --host 0.0.0.0 --port 8000

# Ingest content
meulex ingest_file ./test_data/doc1.md
meulex ingest_directory ./test_data/

Slack setup: see detailed step-by-step guide in docs/setup_slack.md.

Ingesting Documents

You can ingest files in three ways. Pick what fits your setup.

  1. Via API (no container access required)
DATA_DIR=${DATA_DIR:-./test_data}
MEULEX_BASE_URL=${MEULEX_BASE_URL:-http://localhost:8000}
for f in "$DATA_DIR"/*; do \
  jq -Rs --arg id "$(basename "$f")" '{id:$id, content:., metadata:{source:$id}}' "$f" \
  | curl -sS -X POST "$MEULEX_BASE_URL/embed" \
    -H 'Content-Type: application/json' -d @-; echo; \
done
  2. CLI from host (when Qdrant port is reachable)
export QDRANT_URL=${QDRANT_URL:-http://localhost:6333}
DATA_DIR=${DATA_DIR:-./test_data}
meulex ingest_directory "$DATA_DIR" --collection meulex_docs
  3. Inside container (service meulex-api)
CID=$(docker compose ps -q meulex-api)
DATA_DIR=${DATA_DIR:-./test_data}
docker cp "$DATA_DIR" "$CID":/app/test_data
docker compose exec meulex-api sh -lc \
  "python -m meulex.cli.main ingest_directory /app/test_data --collection meulex_docs"

Verify ingestion and retrieval:

curl -s http://localhost:8000/health | jq .
curl -s -X POST http://localhost:8000/chat -H 'Content-Type: application/json' \
  -d '{"question":"What is Meulex?"}' | jq .

Screenshots

Slack help message

Slack reply example

CLI (Typer)

python -m meulex.cli.main ingest_directory ./docs --collection meulex_docs
python -m meulex.cli.main ingest_file ./docs/intro.md
python -m meulex.cli.main api --host 0.0.0.0 --port 8000

API Reference (excerpt)

Endpoints and default rate limits

| Endpoint | Method | Description | Default limit* |
| --- | --- | --- | --- |
| /health | GET | Health check | none |
| /info | GET | Service information | 60/min |
| /metrics | GET | Prometheus metrics | token‑guarded |
| /embed | POST | Document ingestion | 20/min |
| /chat | POST | RAG chat completion | 10/min |
| /slack/events | POST | Slack Events API | 100/min |

*Actual limits are configurable via settings.

POST /chat

Request:

{
  "question": "What is Meulex?",
  "history": [{"role":"user","content":"..."}],
  "top_k": 3,
  "temperature": 0.7
}

Response:

{
  "answer": "...",
  "sources": [
    {"text": "...", "source": "doc1.md", "score": 0.87}
  ],
  "metadata": {"processing_time": 0.93, "model_used": "gpt-4o-mini"}
}

POST /embed

Request:

{ "id": "doc-001", "content": "# Intro...", "metadata": {"source": "md"} }

Response: { "status": "ok", "count": 12, "ids": ["..."] }

GET /health, /health/ready, /health/live

Basic and dependency health summaries.

GET /metrics

Prometheus exposition; optionally token‑guarded.

POST /slack/events

Slack Events API endpoint; verifies signature/timestamp; idempotent handling.

Slack Events Flow (Sequence)

sequenceDiagram
    participant Slack
    participant API as FastAPI /slack/events
    participant AUTH as SignatureVerifier
    participant PROC as SlackEventProcessor
    participant CHAT as Chat Handler
    participant RET as Hybrid Retriever
    participant LLM as LLM Cascade
    participant WEB as Slack Web API

    Slack->>API: POST /slack/events (event)
    API->>AUTH: Verify X-Slack-Signature + Timestamp
    AUTH-->>API: OK
    API-->>Slack: 200 OK (ack within 3s)
    API->>PROC: Parse & route event
    PROC->>CHAT: question
    CHAT->>RET: retrieve(query, top_k)
    RET-->>CHAT: documents
    CHAT->>LLM: generate(messages)
    LLM-->>CHAT: answer + citations
    PROC->>WEB: chat.postMessage(response)
    WEB-->>PROC: ok

Ingestion Pipeline (Prefect)

  • Load directory/file → Chunk (recursive splitter; default 512/64 size/overlap) → Embed (batched) → Upsert to Qdrant (idempotent); see the flow sketch after this list.
  • Batch sizes and concurrency tuned via config; retries with jitter.
  • Emits OTel spans and Prometheus metrics per stage.
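
A minimal sketch of such a flow with Prefect is shown below; the character‑based splitter and the injected embed/upsert callables are simplifications standing in for the real splitter, embeddings provider, and Qdrant adapter:

from pathlib import Path
from typing import Callable, Sequence

from prefect import flow, task


@task(retries=3, retry_delay_seconds=2)
def load_documents(path: str) -> list[str]:
    """Read every Markdown file under the given directory."""
    return [p.read_text(encoding="utf-8") for p in Path(path).rglob("*.md")]


@task
def chunk(texts: list[str], size: int = 512, overlap: int = 64) -> list[str]:
    """Naive fixed-size splitter standing in for the recursive splitter."""
    step = size - overlap
    return [t[i:i + size] for t in texts for i in range(0, len(t), step)]


@flow(name="ingest-directory", validate_parameters=False)
def ingest_directory_flow(
    path: str,
    embed: Callable[[Sequence[str]], list[list[float]]],
    upsert: Callable[[Sequence[str], list[list[float]]], None],
) -> int:
    """Load -> chunk -> embed (batched) -> upsert; adapters are injected as callables."""
    chunks = chunk(load_documents(path))
    upsert(chunks, embed(chunks))
    return len(chunks)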

Retrieval & Generation

  • Dense search with Qdrant; sparse BM25 (or Pyserini) optional.
  • Reciprocal Rank Fusion (RRF) merges scores; weights configurable (see the sketch after this list).
  • Optional reranker (Jina) gated by feature flag with circuit breaker + retry.
  • Prompt builder assembles concise context with token budget enforcement.
  • Cascade: primary LLM (OpenAI‑compatible) → fallback (Ollama/mock) on timeout or breaker open; budgets tracked with metrics.
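
The fusion step itself is small; below is a generic sketch of the standard RRF formula (score = sum of weight / (k + rank), with k conventionally around 60), not the repository's exact implementation:

def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60,
             dense_weight: float = 1.0, sparse_weight: float = 1.0) -> list[str]:
    """Merge two ranked id lists with weighted reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for weight, ranking in ((dense_weight, dense_ids), (sparse_weight, sparse_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: rrf_fuse(["a", "b", "c"], ["b", "d"]) ranks "b" first,
# because "b" appears near the top of both lists.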

Observability

Metrics (examples; a declaration sketch follows the list):

  • rag_requests_total{endpoint,method,status,provider}
  • rag_request_duration_seconds_bucket
  • rag_cache_hits_total{type,result}
  • rag_embeddings_generated_total{model,provider}
  • rag_documents_retrieved_total{strategy}
  • llm_tokens_total{provider} / llm_cost_cents_total{provider}
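
As an illustration, two of the metrics above could be declared with prometheus_client roughly as follows (label sets abbreviated; the actual registry lives in the observability module):

from prometheus_client import Counter, Histogram

RAG_REQUESTS_TOTAL = Counter(
    "rag_requests_total",
    "RAG requests by endpoint, method, status and provider",
    ["endpoint", "method", "status", "provider"],
)

RAG_REQUEST_DURATION = Histogram(
    "rag_request_duration_seconds",
    "End-to-end request latency in seconds",
    ["endpoint"],
)

# Usage inside a request handler (illustrative):
# RAG_REQUESTS_TOTAL.labels("/chat", "POST", "200", "openai").inc()
# with RAG_REQUEST_DURATION.labels("/chat").time(): ...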

Tracing:

  • Spans for API middleware, retriever/reranker/selector/generator nodes, and ingestion stages. Correlate with request IDs in logs.

Logging:

  • Structured JSON; secrets masked; include request_id, endpoint, duration.

Security Baseline

  • Input validation with Pydantic; payload size limits; strict types.
  • Security headers (CSP, X‑Content‑Type‑Options, Referrer‑Policy, etc.).
  • Rate limits per route; stricter on /chat and /slack/events.
  • Slack signing verification (timestamp drift window; constant‑time compare); see the sketch after this list.
  • PII masking hooks for outputs and logs.
  • /metrics and destructive ops can be guarded by token or local‑only.
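
The verification itself follows Slack's documented scheme (HMAC‑SHA256 over "v0:{timestamp}:{body}" with the signing secret); a minimal sketch:

import hashlib
import hmac
import time


def verify_slack_signature(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, max_drift_s: int = 300) -> bool:
    """Reject stale timestamps, then compare signatures in constant time."""
    if abs(time.time() - int(timestamp)) > max_drift_s:
        return False
    base = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)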

Next steps (production): OIDC/JWT, SCIM, audit sink, DLP enhancements, tenant isolation at storage and config layers.


Configuration Guide

Feature flags (YAML example):

retrieval:
  enable_hybrid: true
  enable_reranker: false
llm:
  enable_streaming: false
observability:
  tracing: true
  metrics: true

Provider selection via env (no code changes):

LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
EMBEDDER_NAME=jina
VECTOR_STORE=qdrant
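
One common way to wire this kind of env‑driven selection is a pydantic-settings class plus a provider registry; the sketch below assumes that pattern and is not the actual Meulex config module:

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    llm_provider: str = "openai"
    llm_model: str = "gpt-4o-mini"
    embedder_name: str = "jina"
    vector_store: str = "qdrant"


settings = Settings()  # LLM_PROVIDER, LLM_MODEL, ... env vars override the defaults
# llm = LLM_REGISTRY[settings.llm_provider]()  # illustrative factory lookup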

Environment variables (extended)

# Core
LOG_LEVEL=INFO
ENVIRONMENT=development

# LLM
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
FALLBACK_LLM_PROVIDER=ollama
TEMPERATURE=0.7

# Embeddings
EMBEDDER_NAME=jina
JINA_API_KEY=jina_...

# Vector DB
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=

# Caching
ENABLE_CACHE=true
REDIS_URL=redis://localhost:6379

# Retrieval
ENABLE_SPARSE_RETRIEVAL=true
ENABLE_RERANKER=false
DEFAULT_TOP_K=3

# Security
ENABLE_RATE_LIMITING=true
ENABLE_SECURITY_HEADERS=true
ENABLE_LOG_SANITIZATION=true
METRICS_TOKEN=

# Slack
SLACK_BOT_TOKEN=xoxb-...
SLACK_SIGNING_SECRET=...
SLACK_BOT_USER_ID=U123...

Testing

Run tests:

pytest -q

What to expect:

  • Unit: cascade fallbacks, hybrid RRF math, embeddings batching, PII masking, error contracts (see the example after this list).
  • Integration: ingest → chat round‑trip (≥1 citation), Slack signature, metrics presence, traces created (mock tracer).
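
As an example of the RRF unit coverage, a test against the fusion sketch from the Retrieval & Generation section could look like this (names are illustrative):

def test_rrf_prefers_documents_ranked_high_in_both_lists():
    fused = rrf_fuse(["a", "b", "c"], ["b", "d"])
    assert fused[0] == "b"
    assert set(fused) == {"a", "b", "c", "d"}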

Quality:

  • 88‑char line length; ≥85% type hints on public APIs; Google‑style docstrings.

Evaluation (CLI)

# Human‑readable output
meulex eval --format pretty

# Save JSON results
meulex eval --output results.json

# Use real providers (requires API keys)
meulex eval --mock false

Project Structure (suggested)

meulex/
├── api/                     # FastAPI app and routes
├── orchestrator/
│   ├── graphs/              # LangGraph DAGs
│   └── flows/               # Prefect flows (ingestion)
├── core/
│   ├── embeddings/          # Base + provider factories
│   ├── retrievers/          # Dense/Sparse/Hybrid + scorers
│   ├── rerankers/           # Optional rerankers
│   ├── vector/              # Qdrant adapters
│   └── caching/             # Cache manager, memory/Redis
├── llm/                     # Providers and cascade
├── observability/           # OTel + Prometheus
├── security/                # PII, policies, guards
├── config/                  # Pydantic settings + flags
├── cli/                     # Typer CLI entry points
└── tests/                   # Unit/Integration tests

Deployment

Docker

# Build image
docker build -t meulex:latest .

# Run with compose
docker compose up -d

# Scale API
docker compose up -d --scale meulex-api=3

Kubernetes (example)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: meulex-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: meulex-api
  template:
    metadata:
      labels:
        app: meulex-api
    spec:
      containers:
      - name: meulex
        image: meulex:latest
        ports:
        - containerPort: 8000
        env:
        - name: REDIS_URL
          value: "redis://redis-service:6379"
        - name: QDRANT_URL
          value: "http://qdrant-service:6333"

Production checklist

  • Configure real API keys (OpenAI/Jina)
  • Redis for distributed cache
  • Prometheus scraping and dashboards
  • Log aggregation (ELK/Datadog)
  • Reverse proxy + TLS
  • Vector data backup strategy
  • Alerts for critical metrics

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Update documentation
  6. Open a pull request

Additional Resources

Roadmap (v2+)

  • SSO (OIDC), SCIM; org/tenant isolation and quotas.
  • Action tools via MCP (Jira/Zendesk/Salesforce) with policy dry‑run and audit.
  • Vector store adapters for Weaviate/OpenSearch; ranking plugins.
  • Hallucination/citation verifiers; regression sets and dashboards (Langfuse).
  • K8s manifests and GitOps overlays; autoscaling with custom metrics.

Acknowledgements

  • Structure and patterns inspired by Sentio.
  • Thanks to OpenAI/Ollama/Jina/Qdrant communities for high‑quality tooling.

License

Creative Commons Attribution‑NonCommercial 4.0 International (CC BY‑NC 4.0). See LICENSE.