Open
Description
I uploaded a PDF document and, according to the logs, the ingestion was processed with ColPali.
However, the document status in the admin UI keeps showing "processing"; without ColPali it switched to "processed" shortly after upload.
Reconnecting and reloading the admin UI does not change this.
morphik-app: /app/logs/morphik.log
2025-08-16 06:59:10 - core.embedding.colpali_embedding_model - INFO - Colpali running in mode: self_hosted with batch size: 1
2025-08-16 06:59:10 - core.embedding.colpali_embedding_model - INFO - Colpali initialization time: 43.52 seconds
2025-08-16 06:59:10 - core.vector_store.multi_vector_store - INFO - Initializing local storage for multi-vector chunks
2025-08-16 06:59:10 - core.services_init - INFO - Document service initialised
2025-08-16 06:59:10 - core.services_init - INFO - Workflow service initialised
2025-08-16 06:59:11 - core.api - WARNING - SENTRY_DSN is not set, skipping Sentry initialization
2025-08-16 06:59:11 - core.api - INFO - Document service initialized and stored on app.state
2025-08-16 06:59:11 - ee.routers - INFO - EE.ROUTERS.INIT_APP: Initializing enterprise routers...
2025-08-16 06:59:11 - ee.routers - INFO - EE.ROUTERS.INIT_APP: Finished initializing enterprise routers.
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Initializing Database…
2025-08-16 06:59:11 - core.database.postgres_database - INFO - Initializing PostgreSQL database tables and indexes...
2025-08-16 06:59:11 - core.database.postgres_database - INFO - Created database tables successfully
2025-08-16 06:59:11 - core.database.postgres_database - INFO - Creating optimised indexes for flattened columns …
2025-08-16 06:59:11 - core.database.postgres_database - INFO - Flattened auth columns and indexes created successfully
2025-08-16 06:59:11 - core.database.postgres_database - INFO - PostgreSQL tables and indexes created successfully
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Database initialization successful
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Initializing Vector Store…
2025-08-16 06:59:11 - core.vector_store.pgvector_store - INFO - Initializing PGVector store with 1536 dimensions
2025-08-16 06:59:11 - core.vector_store.pgvector_store - INFO - Enabled pgvector extension
2025-08-16 06:59:11 - core.vector_store.pgvector_store - INFO - Vector dimensions unchanged (1536), using existing table
2025-08-16 06:59:11 - core.vector_store.pgvector_store - INFO - PGVector store initialized successfully
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Vector Store initialization successful (or not applicable).
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Initializing ColPali Vector Store…
2025-08-16 06:59:11 - psycopg.pool - WARNING - rolling back returned connection: <psycopg.Connection [INTRANS] (host=postgres database=morphik) at 0x7f475d4e6050>
2025-08-16 06:59:11 - psycopg.pool - WARNING - rolling back returned connection: <psycopg.Connection [INTRANS] (host=postgres database=morphik) at 0x7f475d4e6050>
2025-08-16 06:59:11 - psycopg.pool - WARNING - rolling back returned connection: <psycopg.Connection [INTRANS] (host=postgres database=morphik) at 0x7f475d4e6050>
2025-08-16 06:59:11 - core.vector_store.multi_vector_store - INFO - MultiVectorStore initialized successfully
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: ColPali Vector Store initialization successful (or not applicable).
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Attempting to initialize Redis connection pool…
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Redis settings for pool: host=redis, port=6379
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Successfully initialized Redis connection pool and stored on app.state.
2025-08-16 06:59:11 - core.app_factory - INFO - Lifespan: Core startup logic executed.
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Initializing PostgreSQL connection pool with size=10, max_overflow=15, pool_recycle=3600s
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Initializing PostgreSQL database tables and indexes...
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Created database tables successfully
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Creating optimised indexes for flattened columns …
2025-08-16 06:59:18 - core.database.postgres_database - INFO - Flattened auth columns and indexes created successfully
2025-08-16 06:59:18 - core.database.postgres_database - INFO - PostgreSQL tables and indexes created successfully
2025-08-16 07:00:04 - core.database.postgres_database - INFO - Document update: updating fields ['storage_info', 'storage_files', 'system_metadata']
2025-08-16 07:00:04 - core.database.postgres_database - INFO - Document 4a1b9434-4502-434c-979a-d21942c7950a updated successfully
2025-08-16 07:00:04 - core.routes.ingest - INFO - File ingestion job queued (job_id=d0607e6d0e6c4d9d8fb8577c356f73b6, doc=4a1b9434-4502-434c-979a-d21942c7950a)
morphik-app: /app/logs/worker_ingestion.log
2025-08-16 06:59:12,963 - core.workers.ingestion_worker - INFO - Starting ingestion job for file: Dimplex_LA1422C_E-Doku_de-en-fr.pdf
2025-08-16 06:59:12,963 - core.workers.ingestion_worker - INFO - ColPali parameter received: use_colpali=True (type: <class 'bool'>)
2025-08-16 06:59:13,010 - core.workers.ingestion_worker - INFO - Downloading file from storage/ingest_uploads/c665afca-a31d-42e0-b8c2-338ae8a4b3c1/Dimplex_LA1422C_E-Doku_de-en-fr.pdf
2025-08-16 06:59:13,011 - core.workers.ingestion_worker - INFO - File download took 0.00s for 0.18MB
2025-08-16 06:59:13,011 - core.workers.ingestion_worker - INFO - ColPali decision: use_colpali=True, has_model=True, has_store=True, using_colpali=<core.vector_store.multi_vector_store.MultiVectorStore object at 0x7efb262f9790>
2025-08-16 06:59:13,011 - core.workers.ingestion_worker - INFO - Processing decision for application/pdf file: skip_text_parsing=True (ColPali=<core.vector_store.multi_vector_store.MultiVectorStore object at 0x7efb262f9790>, text_rules=False, native_format=True, image_rules=False)
2025-08-16 06:59:13,024 - core.workers.ingestion_worker - INFO - Skipping text extraction - ColPali will handle this file directly
2025-08-16 06:59:14,028 - core.workers.ingestion_worker - INFO - Document retrieval took 1.00s
2025-08-16 06:59:14,036 - core.workers.ingestion_worker - INFO - Initial document update took 0.01s
2025-08-16 06:59:14,044 - core.workers.ingestion_worker - INFO - No text chunking needed - ColPali will create image-based chunks
2025-08-16 06:59:14,044 - core.workers.ingestion_worker - INFO - Text chunking took 0.00s to create 0 chunks
2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - Colpali chunk creation took 2.27s
2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - Determined final page count for usage recording: 13 pages (ColPali used: <core.vector_store.multi_vector_store.MultiVectorStore object at 0x7efb262f9790>)
2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - Skipping regular embeddings - will store only in ColPali vector store
2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - ColPali streaming mode: processing 13 chunks with store batch size 16
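The worker log ends at the "ColPali streaming mode" line with no later entry. To check how long the worker has been silent, a small script along these lines could be run against the log. This is only a sketch: it assumes the `timestamp - logger - LEVEL - message` layout seen in the excerpts above, the helper names (`parse_line`, `last_entry`, `stalled_for`) are made up for illustration and are not part of Morphik, and the `now` value in the usage part is arbitrary.

```python
import re
from datetime import datetime

# Matches lines such as
# "2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - message"
# The ",317" millisecond part is optional (morphik.log above omits it).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:,(?P<ms>\d{3}))?"
    r" - (?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, logger, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(microsecond=int(m["ms"] or 0) * 1000)
    return ts, m["logger"], m["level"], m["msg"]


def last_entry(lines):
    """Return the last parseable log entry, or None if there is none."""
    last = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            last = parsed
    return last


def stalled_for(lines, now):
    """Seconds between the last log entry and `now`, or None for an empty log."""
    entry = last_entry(lines)
    if entry is None:
        return None
    return (now - entry[0]).total_seconds()


# Usage with the last two worker log lines from above.
sample = [
    "2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - "
    "Skipping regular embeddings - will store only in ColPali vector store",
    "2025-08-16 06:59:16,317 - core.workers.ingestion_worker - INFO - "
    "ColPali streaming mode: processing 13 chunks with store batch size 16",
]
now = datetime(2025, 8, 16, 7, 30, 0)  # arbitrary reference time for the example
print(last_entry(sample)[3])
print(stalled_for(sample, now))
```

For the real file, the lines could be read with `open("/app/logs/worker_ingestion.log")` inside the container and `now` replaced by `datetime.now()`, provided the clock and timezone match those used by the logger.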
morphik@morphik:~$ nvidia-smi
Sat Aug 16 09:26:34 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01             Driver Version: 535.247.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 ...    Off | 00000000:01:00.0 Off |                  N/A |
| 29%   43C    P8              11W / 125W |      3MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+