-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support skip/limit options for pandas scan #4662
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4662 +/- ##
==========================================
- Coverage 86.13% 86.13% -0.01%
==========================================
Files 1373 1373
Lines 58295 58295
Branches 7210 7211 +1
==========================================
- Hits 50212 50210 -2
- Misses 7919 7921 +2
Partials 164 164 ☔ View full report in Codecov by Sentry. |
Benchmark ResultMaster commit hash:
|
7bf8f04
to
63d7fd1
Compare
63d7fd1
to
b709ea0
Compare
Benchmark ResultMaster commit hash:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Please check my comment below.
std::move(columnBindData), std::move(scanConfig)); | ||
auto scanConfig = PyScanConfig{ | ||
input->extraInput->constPtrCast<ExtraScanTableFuncBindInput>()->fileScanInfo.options}; | ||
KU_ASSERT(numRows >= scanConfig.skipNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm Do we have a check somewhere to ensure scanConfig.skipNum
is always smaller than numRows
? I'm concerned if this will lead to overflow under the case numRows < scanConfig.skipNum
.
I would also add a test case on that if we don't have one already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added check in scan config constructor that bounds skipNum
to the number of rows + added tests
Benchmark ResultMaster commit hash:
|
Description
Add support for skipping rows + limiting number of rows to scan when scanning from pandas dataframes.
Also reuse
PyarrowScanConfig
for pandas scan config and rename to more appropriate name (since it isn't pyarrow only)Contributor agreement