Change defaults for indexing chunk size to avoid OOM
DavidMStraub committed Oct 20, 2024
1 parent 37a461c commit 4f3d038
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion gramps_webapi/api/search/indexer.py
@@ -134,7 +134,15 @@ def reindex_full(
         self.index.delete_all()
         self.index_public.delete_all()
         obj_dicts = []
-        chunk_size = max(100, total // 10)
+        if self.use_semantic_text:
+            # semantic search indexing is slow and uses lots of memory, so we use
+            # a small chunk size: at most 100. If we have fewer than 1000 objects,
+            # use roughly 1/10th of the total as chunk size.
+            chunk_size = min(100, total // 10 + 1)
+        else:
+            # full-text search indexing is fast, so we use a large chunk size:
+            # at least 100, or 10% of the total if that is larger.
+            chunk_size = max(100, total // 10)
         prev: int | None = None
         for i, obj_dict in enumerate(
             iter_obj_strings(db_handle, semantic=self.use_semantic_text)
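
To make the effect of the two branches concrete, here is a minimal sketch that lifts the chunk-size rule out of `reindex_full` into a standalone function. The name `pick_chunk_size` and the sample totals are hypothetical and used only for illustration; they are not part of the commit.

```python
def pick_chunk_size(total: int, use_semantic_text: bool) -> int:
    """Standalone copy of the chunk-size rule from the diff (hypothetical helper)."""
    if use_semantic_text:
        # Semantic indexing: capped at 100 objects per chunk.
        return min(100, total // 10 + 1)
    # Full-text indexing: at least 100 objects per chunk.
    return max(100, total // 10)


if __name__ == "__main__":
    # Illustrative object counts, not taken from any real database.
    for total in (5, 500, 5000, 50000):
        semantic = pick_chunk_size(total, use_semantic_text=True)
        full_text = pick_chunk_size(total, use_semantic_text=False)
        print(f"total={total}: semantic={semantic}, full-text={full_text}")
```

With the previous default, `max(100, total // 10)`, a database of 50000 objects would have been indexed in chunks of 5000 even when semantic search was enabled. Under the new rule the semantic chunk size stays at 100 regardless of the total.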