@XanthosXanthopoulos XanthosXanthopoulos commented Dec 6, 2025

Issue and/or context:
#4299 and #4311 provide memory optimizations that reduce the overall number of memory copies and the over-allocation of memory, but they introduced a necessary memory copy in the read path.

Changes:
This PR introduces an option to tune memory management for read operations, with two modes, PERFORMANCE and EFFICIENCY:

  • PERFORMANCE: The buffers allocated for reading from TileDB are passed directly to the Arrow arrays, and a new set of buffers is allocated for subsequent reads. This mode preserves the current behavior but can waste memory.
  • EFFICIENCY: A new set of buffers is allocated for the Arrow array, sized to exactly match the data read from TileDB, and the data is then copied into each of them. If the read buffers are completely filled, the copy is skipped: the read buffers are handed to Arrow directly and a new set of buffers is allocated for TileDB reads.
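
The two modes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MemoryMode`, `ReadBuffer`, `make_buffer` and `release_for_arrow` are hypothetical names, and a plain byte array stands in for the real `ColumnBuffer` and Arrow structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical names throughout; this is not the libtiledbsoma API.
enum class MemoryMode { PERFORMANCE, EFFICIENCY };

struct ReadBuffer {
    std::unique_ptr<std::uint8_t[]> data;  // storage TileDB reads into
    std::size_t capacity = 0;              // bytes allocated
    std::size_t used = 0;                  // bytes filled by the last read
};

// Allocates `capacity` bytes without zero-filling them.
ReadBuffer make_buffer(std::size_t capacity) {
    ReadBuffer b;
    b.data.reset(new std::uint8_t[capacity]);
    b.capacity = capacity;
    return b;
}

// Returns the buffer that Arrow would own and leaves `read` ready for the
// next TileDB read.
ReadBuffer release_for_arrow(ReadBuffer& read, MemoryMode mode) {
    assert(read.used <= read.capacity);
    const bool full = (read.used == read.capacity);

    if (mode == MemoryMode::PERFORMANCE || full) {
        // Zero-copy: Arrow takes the read buffer itself, and a fresh buffer
        // of the same capacity is allocated for subsequent reads.
        ReadBuffer out = std::move(read);
        read = make_buffer(out.capacity);
        return out;
    }

    // EFFICIENCY with a partially filled buffer: allocate exactly `used`
    // bytes, copy the data, and keep the read buffer for the next read.
    ReadBuffer out = make_buffer(read.used);
    std::memcpy(out.data.get(), read.data.get(), read.used);
    out.used = read.used;
    read.used = 0;
    return out;
}
```

In this sketch PERFORMANCE pays one allocation of full capacity per read, while EFFICIENCY pays a smaller allocation plus a copy unless the read filled the buffer completely.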

Notes for Reviewer:
There are currently two almost equivalent implementations of ColumnBuffer for use during reads. The std::vector-based one uses a custom allocator to avoid initializing memory during resize. In Debug builds the compiler does not optimize the custom allocator, and the resulting code is orders of magnitude slower (this can be somewhat mitigated by using -Og instead of -O0). In Release builds the compiler generates code almost identical to that of the array-based implementation.
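
For readers unfamiliar with the technique, one common way to make `std::vector::resize` skip initialization is an allocator adaptor that turns value-initialization into default-initialization. The sketch below shows that general pattern under the assumption that the PR does something similar; `DefaultInitAllocator`, `UninitVector` and `grow_uninitialized` are hypothetical names, not identifiers from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator adaptor: construct() with no arguments performs
// default-initialization, which leaves trivial types such as int untouched.
template <typename T, typename A = std::allocator<T>>
class DefaultInitAllocator : public A {
    using traits = std::allocator_traits<A>;

public:
    using A::A;  // inherit the base allocator's constructors

    template <typename U>
    struct rebind {
        using other =
            DefaultInitAllocator<U, typename traits::template rebind_alloc<U>>;
    };

    // Selected by resize(n): `new (p) U` without parentheses does not zero.
    template <typename U>
    void construct(U* p) noexcept(
        std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Every other construction is forwarded to the base allocator unchanged.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p,
                          std::forward<Args>(args)...);
    }
};

template <typename T>
using UninitVector = std::vector<T, DefaultInitAllocator<T>>;

// Grows the vector to `n` elements without zero-filling the new tail.
// The new elements hold indeterminate values until they are written.
template <typename T>
void grow_uninitialized(UninitVector<T>& v, std::size_t n) {
    assert(n >= v.size());
    v.resize(n);
}
```

Each element still goes through a `construct` call, which only disappears once the compiler inlines it. That is consistent with the Debug slowdown described above.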

@XanthosXanthopoulos XanthosXanthopoulos force-pushed the xan/mem_modes branch 2 times, most recently from 22ca7b6 to e72efeb Compare December 6, 2025 23:48
@codecov
codecov bot commented Dec 7, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.83%. Comparing base (1a00005) to head (5a81aba).
⚠️ Report is 1 commits behind head on xan/SOMA-688.

Additional details and impacted files
@@               Coverage Diff                @@
##           xan/SOMA-688    #4334      +/-   ##
================================================
- Coverage         86.83%   86.83%   -0.01%     
================================================
  Files               138      138              
  Lines             20743    20751       +8     
  Branches             16       16              
================================================
+ Hits              18013    18019       +6     
- Misses             2730     2732       +2     
Flag Coverage Δ
python 89.15% <80.00%> (-0.02%) ⬇️
r 85.61% <100.00%> (-0.01%) ⬇️

Components Coverage Δ
python_api 89.15% <80.00%> (-0.02%) ⬇️
libtiledbsoma 76.77% <ø> (ø)

@XanthosXanthopoulos XanthosXanthopoulos changed the title from "[WIP] Implement memory mode for read operations" to "[c++] Provide different modes of handling memory when converting to Arrow objects" Dec 11, 2025
@XanthosXanthopoulos XanthosXanthopoulos marked this pull request as ready for review December 12, 2025 01:00
@XanthosXanthopoulos XanthosXanthopoulos merged commit 44cfb0c into xan/SOMA-688 Dec 19, 2025
35 checks passed
@XanthosXanthopoulos XanthosXanthopoulos deleted the xan/mem_modes branch December 19, 2025 19:22