Build: Bump Apache Parquet 1.14.4 #11502
base: main
Conversation
…)" (apache#11462)" This reverts commit 7cc16fa.
Row booleanCol = Row.of(36L, 4L, 0L, null, false, true);
Row decimalCol = Row.of(91L, 4L, 1L, null, new BigDecimal("1.00"), new BigDecimal("2.00"));
Row doubleCol = Row.of(91L, 4L, 0L, 1L, 1.0D, 2.0D);
[optional] Should we refactor this to pick file_size from the DataFiles themselves, like we did in the JDK 17 upgrade PR #7391 (comment)?
Nevertheless, it looks like the size in bytes is increasing in this version. Is that because the sizes are more accurate now?
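The refactor suggested above can be sketched in plain Java: read the expected size from the artifact the test just produced instead of hardcoding a constant that breaks on every library bump. This is a minimal, self-contained illustration using `java.nio`; in Iceberg itself the analogous lookup would be something like `DataFile#fileSizeInBytes()` (an assumption here, not shown in this thread).

```java
// Sketch (hypothetical helper, not Iceberg code): derive the expected file
// size from the written file itself, so the assertion tracks reality rather
// than a magic number that changes when the Parquet writer changes.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileSizeFromArtifact {
    // Write the payload and return the on-disk size of the resulting file.
    static long writtenSize(Path file, byte[] contents) throws IOException {
        Files.write(file, contents);
        return Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("parquet-sim", ".bin");
        long size = writtenSize(tmp, new byte[] {1, 2, 3, 4});
        // The test now compares against the actual artifact.
        if (size != 4L) throw new AssertionError("unexpected size: " + size);
        System.out.println("size=" + size);
        Files.deleteIfExists(tmp);
    }
}
```

With this pattern, a dependency bump that legitimately changes file sizes does not require touching hardcoded expected values in the test.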
Hey @singhpk234, that's an excellent suggestion. I've copied your approach here as well. Parquet now also tracks how large the data is in memory after decompression (handy for strings, where you don't know that up front), so you can allocate buffers at exactly the right size.
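The benefit described above can be illustrated with the JDK's own compression classes (this is not the Parquet API, just an analogy): when the writer records the uncompressed size, the reader can inflate into a single exactly-sized buffer instead of growing one repeatedly.

```java
// Illustration only: why a recorded uncompressed size helps readers.
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ExactBufferDemo {
    // Compress a payload, as a stand-in for a Parquet writer.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[data.length + 64]; // generous scratch buffer
        int len = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    // Decompress into a single allocation of exactly the recorded size.
    static byte[] inflateExact(byte[] compressed, int uncompressedSize) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[uncompressedSize]; // one right-sized allocation
        int n = inflater.inflate(out);
        inflater.end();
        if (n != uncompressedSize) throw new IllegalStateException("size mismatch");
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "variable-length strings benefit most".getBytes("UTF-8");
        // Pretend data.length was recorded by the writer as a size statistic.
        byte[] restored = inflateExact(deflate(data), data.length);
        System.out.println(new String(restored, "UTF-8"));
        // prints: variable-length strings benefit most
    }
}
```

Without the recorded size, the reader would have to guess a buffer size and resize (with copies) whenever the guess is too small, which is exactly the cost the new statistics avoid for variable-length types.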
how large the data is in memory after decompression (handy for strings, where you don't know that up front), so you can allocate buffers at exactly the right size.
This is precisely what we needed in Redshift as well; our CBO was falling behind with variable-length data types. Will give them a heads-up! Thank you @Fokko
The Parquet update looks good; I'm just wondering about the row size increase in the test. I would at least add a comment in the test to explain the reason.
build.gradle
Outdated
@@ -119,6 +119,9 @@ allprojects {
   repositories {
     mavenCentral()
     mavenLocal()
+    maven {
+      url = uri("https://repository.apache.org/content/repositories/orgapacheparquet-1065")
+    }
Just a note: this is temporary during the Parquet release vote (just so we don't forget to remove this once the Parquet release is out 😄).
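One way to make a temporary staging repository hard to forget is to gate it behind an opt-in Gradle property, so it is only resolved when explicitly requested. This is a sketch only; the `parquetStaging` property name is a hypothetical choice, not something used in this PR.

```groovy
// build.gradle sketch: staging repo only active with -PparquetStaging
repositories {
  mavenCentral()
  mavenLocal()
  // TEMPORARY: Apache Parquet release-vote staging repository.
  // Remove once the 1.14.4 artifacts reach Maven Central.
  if (providers.gradleProperty("parquetStaging").isPresent()) {
    maven {
      url = uri("https://repository.apache.org/content/repositories/orgapacheparquet-1065")
    }
  }
}
```

Builds run without the property never touch the staging URL, which also keeps CI reproducible once the vote closes.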
@@ -217,27 +217,27 @@ public void testPrimitiveColumns() throws Exception {
 Row binaryCol =
     Row.of(
-        52L,
+        55L,
Why is the size growing here (I mean in the test)?
Should we have two tests?
See #11502 (comment) for the reason. What's the suggestion for the second test?
 Row fixedCol =
     Row.of(
-        44L,
+        47L,
Same question here about the size.
No description provided.