-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: reading cdf from a checkpointed table #3110
Conversation
Signed-off-by: Stephen Carman <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3110 +/- ##
==========================================
+ Coverage 72.37% 72.40% +0.03%
==========================================
Files 128 128
Lines 41301 41366 +65
Branches 41301 41366 +65
==========================================
+ Hits 29892 29953 +61
- Misses 9494 9500 +6
+ Partials 1915 1913 -2 ☔ View full report in Codecov by Sentry. |
@@ -107,6 +107,25 @@ impl CdfLoadBuilder { | |||
self | |||
} | |||
|
|||
async fn calculate_earliest_version(&self) -> DeltaResult<i64> { | |||
let ts = self.starting_timestamp.unwrap_or(DateTime::UNIX_EPOCH); | |||
for v in 0..self.snapshot.version() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be potentially expensive, can't we do a binary search between 0 and the last checkpoint version?
The log store has the get_earliest_version function, which is also not efficient. I thought redirection would be configurable there but it's not unfortunately. But we could replace that with a binary search there and it would help for other use cases as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am comfortable with an inefficient bug fix now and optimizations later. I'm going to pull this in ahead of the LakeFS work @ion-elgreco so I can release this as a patch before we start the 0.24 work
Alrighty, I can do a follow up on this then after lakefs work is in :) |
Yeah, I can touch up the perf on this a bit after all the other stuff is settled down. But I'd say generally users should provide versions if they know them, it will make everything work better. |
Description
Fixes a bug where you cannot read CDF from a table that has been checkpointed or vacuumed
Related Issue(s)