Commit 7dbc1cd

feat: enhance architecture documentation with detailed indexing and query processing pipelines
1 parent e2cdc06 commit 7dbc1cd

1 file changed: +27 -11 lines changed

backend/docs/architecture.md

Lines changed: 27 additions & 11 deletions
@@ -10,17 +10,33 @@ config:
 theme: base
 ---
 flowchart TB
-U["User uploads document"] --> R["Rasterize to page images"]
-R --> E["ColPali embeds pages"] & O["OCR + region detection"]
-E --> Qdrant[("Qdrant: image/page vectors")]
-O --> Duck[("DuckDB: regions with text/tables/images")]
-Q["User question"] --> QE["ColPali query embedding"]
-QE --> Qdrant
-Qdrant --> K["Top-K image/page IDs"]
-K --> JR["Lookup regions by IDs in DuckDB"]
-JR --> LLM["LLM over text + table + image region"] & Duck
-LLM --> A["Answer to user"]
-Duck --> LLM
+subgraph Indexing["📄 Document Indexing Pipeline"]
+U["User uploads document"] --> R["Rasterize to page images"]
+R --> E["ColPali embeds pages"] & O["OCR + region detection"]
+E --> Qdrant[("Qdrant<br/>image/page vectors")]
+O --> Duck[("DuckDB<br/>regions with text/tables/images")]
+end
+
+subgraph Query["🔍 Query Processing Pipeline"]
+Q["User question"] --> QE["ColPali query embedding"]
+QE --> Qdrant
+Qdrant --> K["Top-K image/page IDs"]
+K --> Duck
+Duck --> IM["Generate Interpretability Maps<br/>(patch-level attention)"]
+IM --> RF["Region-Level Filtering<br/>(score OCR regions by relevance)"]
+RF --> LLM["LLM over relevant regions<br/>(text + tables + images)"]
+LLM --> A["Answer with spatial context"]
+end
+
+classDef userNode fill:#e1f5ff,stroke:#0288d1,stroke-width:2px
+classDef processNode fill:#fff9e6,stroke:#f9a825,stroke-width:2px
+classDef storageNode fill:#e8f5e9,stroke:#43a047,stroke-width:2px
+classDef resultNode fill:#f3e5f5,stroke:#8e24aa,stroke-width:2px
+
+class U,Q userNode
+class R,E,O,QE,IM,RF processNode
+class Qdrant,Duck storageNode
+class K,LLM,A resultNode
 ```

 ## Core Retrieval Paradigm
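
The two nodes this commit adds to the Query pipeline, "Generate Interpretability Maps" and "Region-Level Filtering", can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not code from this repository: the names `Region`, `patch_attention_map`, and `score_regions`, the normalized bounding-box format, and the scoring rule (maximum dot-product similarity over query tokens per patch, then the mean over the patches a region overlaps).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Region:
    """Hypothetical OCR region; bbox is (x0, y0, x1, y1), normalized to [0, 1]."""
    region_id: str
    bbox: Tuple[float, float, float, float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def patch_attention_map(
    query_tokens: Sequence[Vector],
    patch_grid: Sequence[Sequence[Vector]],
) -> List[List[float]]:
    """Per-patch relevance: the best dot-product match over all query tokens.

    patch_grid[row][col] holds the embedding of one page patch.
    """
    return [[max(_dot(q, p) for q in query_tokens) for p in row] for row in patch_grid]


def _span(lo: float, hi: float, n: int) -> range:
    """Patch indices along one axis that the normalized interval [lo, hi] overlaps."""
    start = min(max(int(lo * n), 0), n - 1)
    stop = min(max(math.ceil(hi * n), start + 1), n)  # always at least one patch
    return range(start, stop)


def score_regions(
    regions: Sequence[Region],
    attn: Sequence[Sequence[float]],
    min_score: float = 0.0,
) -> List[Tuple[Region, float]]:
    """Score each region by mean patch relevance, drop low scores, sort descending."""
    rows, cols = len(attn), len(attn[0])
    scored = []
    for region in regions:
        x0, y0, x1, y1 = region.bbox
        values = [attn[r][c] for r in _span(y0, y1, rows) for c in _span(x0, x1, cols)]
        score = sum(values) / len(values)
        if score >= min_score:
            scored.append((region, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2x2 patch grid with 2-dimensional embeddings and a single query token.
    grid = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.5, 0.0]]]
    attn = patch_attention_map([[1.0, 0.0]], grid)
    page_regions = [
        Region("title", (0.0, 0.0, 0.5, 0.5)),
        Region("table", (0.5, 0.0, 1.0, 0.5)),
        Region("footer", (0.0, 0.5, 1.0, 1.0)),
    ]
    for region, score in score_regions(page_regions, attn, min_score=0.2):
        print(region.region_id, f"{score:.2f}")
```

In this sketch only the regions that survive the `min_score` cut would be passed on to the LLM step. Mean pooling is one possible choice; max pooling or area-weighted pooling would favor small, sharply matching regions differently.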
