Read the Full Paper on arxiv.org: Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation
Snappy implements region-level document retrieval by unifying vision-language models with OCR through spatial coordinate mapping. Unlike traditional systems that return entire pages (VLMs) or lack semantic grounding (OCR-only), Snappy uses ColPali's patch-level similarity scores as spatial relevance filters over OCR-extracted regions, operating entirely at inference time without additional training.
Vision-language models like ColPali achieve state-of-the-art document retrieval by embedding pages as images with fine-grained patch representations. However, they return entire pages as retrieval units, introducing irrelevant content into RAG context windows. Conversely, OCR systems extract structured text with bounding boxes but cannot assess which regions are relevant to a query.
Snappy bridges these paradigms through patch-to-region relevance propagation. The approach formalizes coordinate mapping between vision transformer patch grids (32×32) and OCR bounding boxes, repurposing ColPali's late interaction mechanism to generate interpretability maps. Patch similarity scores propagate to OCR regions via IoU-weighted intersection, enabling two-stage retrieval: efficient candidate retrieval using mean-pooled embeddings, followed by full-resolution region reranking.
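To make the mapping concrete, below is a minimal sketch of how a 32×32 patch similarity map could be propagated to one OCR bounding box. It assumes the patch grid evenly tiles the rasterized page and uses an IoU-weighted average, per the description above; the function names, NumPy shapes, and averaging choice are illustrative assumptions, not Snappy's exact implementation.

```python
import numpy as np

PATCH_GRID = 32  # ColPali-style 32x32 patch grid over the rasterized page


def patch_rects(img_w: float, img_h: float) -> np.ndarray:
    """Pixel rectangles (x0, y0, x1, y1) for every patch, row-major order."""
    xs = np.linspace(0, img_w, PATCH_GRID + 1)
    ys = np.linspace(0, img_h, PATCH_GRID + 1)
    return np.array([(xs[c], ys[r], xs[c + 1], ys[r + 1])
                     for r in range(PATCH_GRID)
                     for c in range(PATCH_GRID)])


def patch_region_iou(rects: np.ndarray, bbox: tuple) -> np.ndarray:
    """IoU between each patch rectangle and one OCR bounding box."""
    ix0 = np.maximum(rects[:, 0], bbox[0])
    iy0 = np.maximum(rects[:, 1], bbox[1])
    ix1 = np.minimum(rects[:, 2], bbox[2])
    iy1 = np.minimum(rects[:, 3], bbox[3])
    inter = np.clip(ix1 - ix0, 0, None) * np.clip(iy1 - iy0, 0, None)
    patch_area = (rects[:, 2] - rects[:, 0]) * (rects[:, 3] - rects[:, 1])
    bbox_area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    return inter / (patch_area + bbox_area - inter)


def region_relevance(patch_scores: np.ndarray, bbox: tuple,
                     img_w: float, img_h: float) -> float:
    """Propagate a (32, 32) patch similarity map to one OCR region
    as an IoU-weighted average of the overlapping patch scores."""
    weights = patch_region_iou(patch_rects(img_w, img_h), bbox)
    if weights.sum() == 0.0:
        return 0.0
    return float((weights * patch_scores.reshape(-1)).sum() / weights.sum())
```

A region whose propagated score falls below `REGION_RELEVANCE_THRESHOLD` (default 0.3; see the feature table below) would then be dropped from the results.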
This yields region-level granularity (returning specific paragraphs, tables, or figures instead of entire pages), operates purely at inference time (no additional training), provides spatial interpretability (visual heatmaps showing which document regions match query tokens), and combines VLM semantic understanding with OCR structural precision in a production-ready system.
Demo video: Snappy.Demo.mp4
Prerequisites: Docker with Compose, Make (or use the equivalent docker compose commands below).
1. Copy envs and set the essentials
Run `cp .env.example .env`, then set `OPENAI_API_KEY` (required for chat). Other defaults are ready for local use.
2. Choose a profile
- Minimal (ColPali only; works on CPU or GPU): `make up-minimal`
- ML (adds DeepSeek OCR; needs NVIDIA GPU): `make up-ml`
- Full (adds DuckDB analytics and deduplication): `make up-full`
If you prefer Compose directly: `docker compose --profile minimal|ml|full up -d`.
3. Open the UI
- Frontend: http://localhost:3000
- Backend: http://localhost:8000/docs (OpenAPI)
- DuckDB UI (full profile): http://localhost:42130
| Feature | When to enable | How |
|---|---|---|
| DeepSeek OCR | Need extracted text, markdown, or bounding boxes alongside visual retrieval; have an NVIDIA GPU. | Set DEEPSEEK_OCR_ENABLED=true and run make up-ml or profile ml. |
| DuckDB analytics | Want deduplication, inline OCR results from the backend, or SQL over OCR regions. | Set DUCKDB_ENABLED=true and run make up-full or profile full. |
| Mean pooling re-ranking | Improve search accuracy with two-stage retrieval (prefetch + re-rank, sketched below this table). More accurate but requires more compute. | Set QDRANT_MEAN_POOLING_ENABLED=true in .env. Requires a ColPali model with /patches support (enabled in colmodernvbert). |
| Interpretability maps | Visualize which document regions contribute to query matches. Useful for understanding and debugging retrieval behavior. | Available in the lightbox after search. Upload a document image and query to see token-level similarity heatmaps at /api/interpretability. |
| Region-level retrieval | Filter OCR regions by query relevance, reducing noise and improving precision. Uses interpretability maps to return only relevant regions. | Set ENABLE_REGION_LEVEL_RETRIEVAL=true in Configuration UI or .env. Adjust REGION_RELEVANCE_THRESHOLD (default 0.3) to control filtering sensitivity. |
| Binary quantization | Large collections and tight RAM/GPU budget (32x memory reduction). | Enabled by default. Toggle in .env if needed. |
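For the mean pooling re-ranking row, the sketch below illustrates the general two-stage pattern: a cheap prefetch over mean-pooled page embeddings followed by an exact late-interaction (MaxSim) rerank of the candidates. This is a toy in-memory NumPy version under assumed array shapes; Snappy's actual Qdrant-backed index layout and scoring details may differ.

```python
import numpy as np


def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction score: for each query token, take the best-matching
    document patch embedding, then sum those maxima (ColPali-style MaxSim)."""
    sims = query_vecs @ doc_vecs.T          # (n_query_tokens, n_patches)
    return float(sims.max(axis=1).sum())


def two_stage_search(query_vecs: np.ndarray,
                     pooled_index: np.ndarray,      # (n_pages, dim) mean-pooled embeddings
                     full_index: list,              # per-page (n_patches, dim) multi-vectors
                     prefetch_k: int = 50,
                     top_k: int = 5) -> list:
    """Stage 1: coarse prefetch on mean-pooled page embeddings.
    Stage 2: exact MaxSim rerank of the prefetched candidates."""
    pooled_query = query_vecs.mean(axis=0)           # (dim,)
    coarse_scores = pooled_index @ pooled_query      # (n_pages,)
    candidates = np.argsort(-coarse_scores)[:prefetch_k]
    reranked = sorted(candidates,
                      key=lambda i: maxsim(query_vecs, full_index[i]),
                      reverse=True)
    return [int(i) for i in reranked[:top_k]]
```

The `prefetch_k` / `top_k` split is the accuracy-versus-compute trade-off noted in the table: a larger prefetch gives the reranker better recall at the cost of more MaxSim evaluations.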
- Progress stuck on upload/indexing: ensure Poppler is installed for PDF rasterization and check backend logs.
- Missing images: confirm MinIO credentials/URLs and allowed domains in `frontend/next.config.ts`.
- OCR not running: check that `DEEPSEEK_OCR_ENABLED=true`, the GPU profile is running, and `/ocr/health` is reachable (see the snippet after this list).
- Config not sticking: `/config/update` is runtime-only; edit `.env` for persistence.
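For the OCR item above, a quick reachability check (assuming the health endpoint is served by the backend at `localhost:8000`; adjust the URL to your deployment):

```python
import requests

# Assumed URL: the backend from step 3, with the /ocr/health path mentioned above.
resp = requests.get("http://localhost:8000/ocr/health", timeout=5)
print(resp.status_code, resp.text)
```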
Core concepts: Late Interaction explains multi-vector retrieval, MaxSim scoring, and two-stage search. Spatial Grounding covers how spatial information flows from pixels to regions. Analysis discusses when to use vision vs. text RAG.
System internals: Streaming Pipeline details how the indexer overlaps stages. Architecture provides deeper component and flow descriptions. Configuration is the full config reference.
Development: Service-specific guides are in frontend/README.md and backend/README.md. See CONTRIBUTING.md for contribution guidelines.
