[Feature] Optimize ObjectRefresh for lower memory usage and better performance #4971
Closed
Labels: enhancement
Search before asking
Motivation
The current implementation of `ObjectRefresh` first collects a list of all files under `object-location`, then writes them out to the table. This requires the driver node to have enough memory to hold the entire object listing.
Another problem is that the current implementation generates a new commit for each file in the listing. This can result in an enormous number of snapshots and poor refresh performance.
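To make the two costs concrete, here is a rough, self-contained sketch rather than the project's actual code. `CountingTable`, `refreshEager`, and `refreshBatched` are hypothetical names, a plain `Iterator<String>` stands in for a lazily produced file listing, and committing once per batch is an assumption about how the commit count could be reduced.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BatchedRefreshSketch {

    /** Hypothetical stand-in for the target table; it only counts commits and rows. */
    static final class CountingTable {
        int commits = 0;
        int rows = 0;

        void commit(List<String> files) {
            commits++;
            rows += files.size();
        }
    }

    /** The behavior described above: materialize the full listing, then commit once per file. */
    static void refreshEager(Iterator<String> listing, CountingTable table) {
        List<String> all = new ArrayList<>();
        listing.forEachRemaining(all::add); // the whole listing is held in memory here
        for (String file : all) {
            table.commit(List.of(file)); // one commit (snapshot) per file
        }
    }

    /** Assumed alternative: hold at most batchSize entries and commit once per batch. */
    static void refreshBatched(Iterator<String> listing, int batchSize, CountingTable table) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        while (listing.hasNext()) {
            batch.add(listing.next());
            if (batch.size() == batchSize) {
                table.commit(batch);
                batch = new ArrayList<>(batchSize); // start a fresh buffer for the next batch
            }
        }
        if (!batch.isEmpty()) {
            table.commit(batch); // flush the final partial batch
        }
    }
}
```

With a listing of N files, the eager variant buffers N entries and produces N commits, while the batched variant buffers at most `batchSize` entries and produces about N / `batchSize` commits, provided the underlying iterator is itself lazy.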
Solution
Use `FileIO#listFilesIterative` to load the file listing into memory in batches.
The actual memory savings will depend on the specific `FileIO` implementation, but in the worst case it falls back to what we already have now.
Anything else?
No response
Are you willing to submit a PR?