While logs show a successful document ingestion with ColPali, the admin UI keeps reporting status 'processing' #252

@usrflo

I uploaded a PDF document and, according to the logs, the ingestion was processed with ColPali.
However, the document status in the admin UI stays at "processing"; without ColPali it switched to "processed" shortly after upload.
Reconnecting and reloading the admin UI does not change this.

morphik-app: /app/logs/morphik.log
2025-08-16 06:59:10 - core.embedding.colpali_embedding_model - INFO - Colpali running in mode: self_hosted with batch size: 1
2025-08-16 06:59:10 - core.embedding.colpali_embedding_model - INFO - Colpali initialization time: 43.52 seconds
2025-08-16 06:59:10 - core.vector_store.multi_vector_store - INFO - Initializing local storage for multi-vector chunks
2025-08-16 06:59:10 - core.services_init - INFO - Document service initialised
2025-08-16 06:59:10 - core.services_init - INFO - Workflow service initialised
2025-08-16 06:59:11 - core.api - WARNING - SENTRY_DSN is not set, skipping Sentry initialization
2025-08-16 06:59:11 - core.api - INFO - Document service initialized and stored on app.state
2025-08-16 06:59:11 - ee.routers - INFO - EE.ROUTERS.INIT_APP: Initializing enterprise routers...
2025-08-16 06:59:11 - ee.routers - INFO - EE.ROUTERS.INIT_APP: Finished initializing enterprise routers.
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Initializing Database…
2025-08-16 06:59:11 - core.database.postgres_database - INFO - Initializing PostgreSQL database tables and indexes...
2025-08-16 06:59:11 - core.database.postgres_database - INFO - Created database tables successfully
2025-08-16 06:59:11 - core.database.postgres_database - INFO - Creating optimised indexes for flattened columns …
2025-08-16 06:59:11 - core.database.postgres_database - INFO - Flattened auth columns and indexes created successfully
2025-08-16 06:59:11 - core.database.postgres_database - INFO - PostgreSQL tables and indexes created successfully
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Database initialization successful
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Initializing Vector Store…
2025-08-16 06:59:11 - core.vector_store.pgvector_store - INFO - Initializing PGVector store with 1536 dimensions
2025-08-16 06:59:11 - core.vector_store.pgvector_store - INFO - Enabled pgvector extension
2025-08-16 06:59:11 - core.vector_store.pgvector_store - INFO - Vector dimensions unchanged (1536), using existing table
2025-08-16 06:59:11 - core.vector_store.pgvector_store - INFO - PGVector store initialized successfully
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Vector Store initialization successful (or not applicable).
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Initializing ColPali Vector Store…
2025-08-16 06:59:11 - psycopg.pool - WARNING - rolling back returned connection: <psycopg.Connection [INTRANS] (host=postgres database=morphik) at 0x7f475d4e6050>
2025-08-16 06:59:11 - psycopg.pool - WARNING - rolling back returned connection: <psycopg.Connection [INTRANS] (host=postgres database=morphik) at 0x7f475d4e6050>
2025-08-16 06:59:11 - psycopg.pool - WARNING - rolling back returned connection: <psycopg.Connection [INTRANS] (host=postgres database=morphik) at 0x7f475d4e6050>
2025-08-16 06:59:11 - core.vector_store.multi_vector_store - INFO - MultiVectorStore initialized successfully
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: ColPali Vector Store initialization successful (or not applicable).
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Attempting to initialize Redis connection pool…
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Redis settings for pool: host=redis, port=6379
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Successfully initialized Redis connection pool and stored on app.state.
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Core startup logic executed.
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Initializing PostgreSQL connection pool with size=10, max_overflow=15, pool_recycle=3600s
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Initializing PostgreSQL database tables and indexes...
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Created database tables successfully
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Creating optimised indexes for flattened columns …
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Flattened auth columns and indexes created successfully
2025-08-16 06:59:18 - core.database.postgres_database - INFO - PostgreSQL tables and indexes created successfully
2025-08-16 07:00:04 - core.database.postgres_database - INFO - Document update: updating fields ['storage_info', 'storage_files', 'system_metadata']
2025-08-16 07:00:04 - core.database.postgres_database - INFO - Document 4a1b9434-4502-434c-979a-d21942c7950a updated successfully
2025-08-16 07:00:04 - core.routes.ingest - INFO - File ingestion job queued (job_id=d0607e6d0e6c4d9d8fb8577c356f73b6, doc=4a1b9434-4502-434c-979a-d21942c7950a)

morphik-app: /app/logs/worker_ingestion.log
2025-08-16 06:59:12,963 - core.workers.ingestion_worker - INFO - Starting ingestion job for file: Dimplex_LA1422C_E-Doku_de-en-fr.pdf
2025-08-16 06:59:12,963 - core.workers.ingestion_worker - INFO - ColPali parameter received: use_colpali=True (type: <class 'bool'>)
2025-08-16 06:59:13,010 - core.workers.ingestion_worker - INFO - Downloading file from storage/ingest_uploads/c665afca-a31d-42e0-b8c2-338ae8a4b3c1/Dimplex_LA1422C_E-Doku_de-en-fr.pdf
2025-08-16 06:59:13,011 - core.workers.ingestion_worker - INFO - File download took 0.00s for 0.18MB
2025-08-16 06:59:13,011 - core.workers.ingestion_worker - INFO - ColPali decision: use_colpali=True, has_model=True, has_store=True, using_colpali=<core.vector_store.multi_vector_store.MultiVectorStore object at 0x7efb262f9790>
2025-08-16 06:59:13,011 - core.workers.ingestion_worker - INFO - Processing decision for application/pdf file: skip_text_parsing=True (ColPali=<core.vector_store.multi_vector_store.MultiVectorStore object at 0x7efb262f9790>, text_rules=False, native_format=True, image_rules=False)
2025-08-16 06:59:13,024 - core.workers.ingestion_worker - INFO - Skipping text extraction - ColPali will handle this file directly
2025-08-16 06:59:14,028 - core.workers.ingestion_worker - INFO - Document retrieval took 1.00s
2025-08-16 06:59:14,036 - core.workers.ingestion_worker - INFO - Initial document update took 0.01s
2025-08-16 06:59:14,044 - core.workers.ingestion_worker - INFO - No text chunking needed - ColPali will create image-based chunks
2025-08-16 06:59:14,044 - core.workers.ingestion_worker - INFO - Text chunking took 0.00s to create 0 chunks
2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - Colpali chunk creation took 2.27s
2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - Determined final page count for usage recording: 13 pages (ColPali used: <core.vector_store.multi_vector_store.MultiVectorStore object at 0x7efb262f9790>)
2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - Skipping regular embeddings - will store only in ColPali vector store
2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - ColPali streaming mode: processing 13 chunks with store batch size 16
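
To see where the worker log stops for a job, the following is a minimal sketch that parses lines in the format shown above and returns the last event. The names `parse_line`, `last_event`, and `SAMPLE` are hypothetical helpers written for this report, not part of Morphik, and the regular expression assumes the timestamp layout seen in worker_ingestion.log.

```python
import re
from datetime import datetime

# Matches lines like:
# 2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<ms>\d{3}) - "
    r"(?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(microsecond=int(m["ms"]) * 1000)
    return ts, m["level"], m["msg"]


def last_event(lines):
    """Return the last parseable log event, or None if there is none."""
    last = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            last = parsed
    return last


# Two lines copied from the worker log above, used as sample input.
SAMPLE = [
    "2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - "
    "Skipping regular embeddings - will store only in ColPali vector store",
    "2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - "
    "ColPali streaming mode: processing 13 chunks with store batch size 16",
]

if __name__ == "__main__":
    ts, level, msg = last_event(SAMPLE)
    print(ts.isoformat(), level, msg)
```

Pointing `last_event` at the lines of the real log file (for example via `open(path)`) shows the most recent message the worker wrote, which helps when comparing it against the status displayed in the admin UI.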

morphik@morphik:~$ nvidia-smi 
Sat Aug 16 09:26:34 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01             Driver Version: 535.247.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 ...    Off | 00000000:01:00.0 Off |                  N/A |
| 29%   43C    P8              11W / 125W |      3MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
