[c++] Provide different modes of handling memory when converting to Arrow objects #4334

XanthosXanthopoulos · 2025-12-06T17:04:24Z

Issue and/or context:
#4299 and #4311 provide memory optimizations that reduce overall memory copies and overallocation, but they introduced a mandatory memory copy in the read path.

Changes:
This PR adds an option to control memory management for read operations, with two modes: PERFORMANCE and EFFICIENCY.

PERFORMANCE: The buffers allocated for reading from TileDB are handed over to the Arrow arrays, and a new set of buffers is allocated for subsequent reads. This mode preserves the current behavior but can waste memory, because the Arrow arrays keep the full read-buffer allocations even when a read fills only part of them.
EFFICIENCY: A new set of buffers, sized to exactly match the data read from TileDB, is allocated for the Arrow arrays, and each read buffer is copied into its counterpart. If a read buffer is completely filled, the copy is skipped: that buffer is passed to Arrow directly and a new buffer is allocated for subsequent TileDB reads.
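As a rough illustration of the two modes, the sketch below models a single column buffer. `MemoryMode`, `ReadBuffer`, `ExportedBuffer`, `export_buffer`, and `parse_memory_mode` are hypothetical names, not the libtiledbsoma API, and a plain `std::unique_ptr<std::byte[]>` stands in for the real TileDB/Arrow buffer plumbing. The "Read memory mode from config" commit suggests the mode is selected through configuration; the `"performance"`/`"efficiency"` strings used here are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical sketch: none of these names are the libtiledbsoma API.
enum class MemoryMode { PERFORMANCE, EFFICIENCY };

// Fixed-capacity buffer that a TileDB read fills; `size` is the number of
// bytes the last read actually wrote.
struct ReadBuffer {
    std::unique_ptr<std::byte[]> data;
    std::size_t capacity;
    std::size_t size = 0;

    explicit ReadBuffer(std::size_t cap)
        : data(new std::byte[cap]), capacity(cap) {}
};

// Buffer whose ownership passes to the Arrow array.
struct ExportedBuffer {
    std::unique_ptr<std::byte[]> data;
    std::size_t size = 0;  // valid bytes; the allocation may be larger
};

ExportedBuffer export_buffer(ReadBuffer& rb, MemoryMode mode) {
    assert(rb.size <= rb.capacity);
    ExportedBuffer out;
    out.size = rb.size;
    if (mode == MemoryMode::PERFORMANCE || rb.size == rb.capacity) {
        // Zero-copy handoff: Arrow takes the read buffer as-is and a fresh
        // buffer of the same capacity is allocated for the next read.
        out.data = std::move(rb.data);
        rb.data.reset(new std::byte[rb.capacity]);
    } else {
        // EFFICIENCY with a partially filled buffer: copy exactly `size`
        // bytes into a right-sized allocation and keep reusing the read buffer.
        out.data.reset(new std::byte[rb.size]);
        if (rb.size > 0) {
            std::memcpy(out.data.get(), rb.data.get(), rb.size);
        }
    }
    rb.size = 0;  // the read buffer is logically empty again either way
    return out;
}

// Hypothetical config parsing; the real option name and values may differ.
std::optional<MemoryMode> parse_memory_mode(std::string_view value) {
    if (value == "performance") return MemoryMode::PERFORMANCE;
    if (value == "efficiency") return MemoryMode::EFFICIENCY;
    return std::nullopt;
}
```

In PERFORMANCE mode the exported allocation keeps its full `capacity` even when `size` is much smaller, which is the memory cost described above; EFFICIENCY trades that for one `memcpy` per partially filled buffer. Because both handoff paths replace `rb.data`, the new pointer would have to be bound to the query again before the next read, which is presumably what the "Rebind buffers after each read operation" commit addresses.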

Notes for Reviewer:
There are currently two nearly equivalent ColumnBuffer implementations used during reads. The std::vector-based one uses a custom allocator to avoid initializing memory on resize. In Debug builds the compiler does not optimize the custom allocator away, and the resulting code is orders of magnitude slower (compiling with -Og instead of -O0 mitigates this somewhat). In Release builds the compiler generates nearly identical code for the vector-based and array-based implementations.
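A custom allocator that skips initialization on resize is commonly written as an allocator adaptor whose single-argument `construct` default-initializes elements instead of value-initializing (zeroing) them. The sketch below shows that general pattern; `default_init_allocator` and `ByteBuffer` are hypothetical names and not necessarily what this PR uses.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor: resize() default-initializes new elements, which for
// trivial types such as std::uint8_t means leaving them uninitialized.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit A's constructors (including the rebinding one)

    // Called by resize(n): placement-new without "()" skips zeroing.
    template <typename U>
    void construct(U* ptr) noexcept(std::is_nothrow_default_constructible_v<U>) {
        ::new (static_cast<void*>(ptr)) U;
    }

    // Any construction with arguments (push_back, resize(n, v), ...) is
    // forwarded to the underlying allocator unchanged.
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args) {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

// Hypothetical byte buffer type for column data.
using ByteBuffer =
    std::vector<std::uint8_t, default_init_allocator<std::uint8_t>>;
```

With optimizations enabled, the per-element `construct` call on a trivial type like `std::uint8_t` compiles to nothing, so `resize` costs little more than the allocation. At `-O0` each element still goes through an out-of-line `construct` call, which is consistent with the Debug-mode slowdown described above.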

codecov · 2025-12-07T17:11:37Z

Codecov Report

❌ Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.83%. Comparing base (1a00005) to head (5a81aba).
⚠️ Report is 1 commits behind head on xan/SOMA-688.

Additional details and impacted files

@@               Coverage Diff                @@
##           xan/SOMA-688    #4334      +/-   ##
================================================
- Coverage         86.83%   86.83%   -0.01%     
================================================
  Files               138      138              
  Lines             20743    20751       +8     
  Branches             16       16              
================================================
+ Hits              18013    18019       +6     
- Misses             2730     2732       +2

| Flag | Coverage Δ | |
|---|---|---|
| python | `89.15% <80.00%> (-0.02%)` | ⬇️ |
| r | `85.61% <100.00%> (-0.01%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ | |
|---|---|---|
| python_api | `89.15% <80.00%> (-0.02%)` | ⬇️ |
| libtiledbsoma | `76.77% <ø> (ø)` | |


XanthosXanthopoulos force-pushed the xan/mem_modes branch 2 times, most recently from 22ca7b6 to e72efeb (December 6, 2025 23:48)

XanthosXanthopoulos force-pushed the xan/SOMA-688 branch from 1ac1e25 to 1a00005 (December 11, 2025 15:55)

XanthosXanthopoulos added 6 commits December 11, 2025 17:55

Add custom allocator for vector backed buffers, implement memory modes for TileDB to Arrow conversions

7be48b1

Lint fix

7a7e58c

Migrate R to use multithreaded arrow conversion

3e88145

Fix compiler warnings

1be9ccd

Rebind buffers after each read operation

f5f18d9

Read memory mode from config

4eac8a5

XanthosXanthopoulos force-pushed the xan/mem_modes branch from 17ee63e to 4eac8a5 (December 11, 2025 15:56)

XanthosXanthopoulos changed the title from ~~[WIP] Implement memory mode for read operations~~ to [c++] Provide different modes of handling memory when converting to Arrow objects (Dec 11, 2025)

XanthosXanthopoulos requested a review from jp-dark December 12, 2025 00:09

Include Rcpp header before other headers

b8a91dd

XanthosXanthopoulos marked this pull request as ready for review December 12, 2025 01:00

Store object references when setting column data until write is submitted (#4350)

5a81aba

XanthosXanthopoulos merged commit 44cfb0c into xan/SOMA-688 Dec 19, 2025
35 checks passed

XanthosXanthopoulos deleted the xan/mem_modes branch December 19, 2025 19:22
