Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Bump Apache Parquet 1.14.4 #11502

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Build: Bump Apache Parquet 1.14.4 #11502

wants to merge 4 commits into from

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Nov 8, 2024

No description provided.

@Fokko Fokko changed the title Test out Apache Parquet 1.14.4 Test out Apache Parquet 1.14.4 RC2 Nov 8, 2024
Comment on lines 226 to 228
Row booleanCol = Row.of(36L, 4L, 0L, null, false, true);
Row decimalCol = Row.of(91L, 4L, 1L, null, new BigDecimal("1.00"), new BigDecimal("2.00"));
Row doubleCol = Row.of(91L, 4L, 0L, 1L, 1.0D, 2.0D);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] should we refactor this to pick file_size from the Datafiles themselve like we did we did in JDK 17 upgrade PR #7391 (comment)

Never the less looks like size in bytes is increasing in this version is it because they are more accurate now ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @singhpk234, that's an excellent suggestion. I've copied your approach here as well. Parquet now also tracks how large the data is in memory after compression (this is handy for strings where you don't know that upfront) so you can allocate buffers directly to the right size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how large the data is in memory after compression (this is handy for strings where you don't know that upfront) so you can allocate buffers directly to the right size.

This is precisely what we needed in Redshift as well, our CBO was falling behind with variable length data types, will give them HeadsUp ! Thankyou @Fokko

@jbonofre jbonofre self-requested a review November 9, 2024 09:38
Copy link
Member

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Parquet update looks good, I'm just wondering about the row size increase in the test. I would add at least in the comment in the test to explain the reason.

build.gradle Outdated
@@ -119,6 +119,9 @@ allprojects {
repositories {
mavenCentral()
mavenLocal()
maven {
url = uri("https://repository.apache.org/content/repositories/orgapacheparquet-1065")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: this is temporary during the Parquet release vote (just to not forget to remove this once Parquet release is out 😄 )

@@ -217,27 +217,27 @@ public void testPrimitiveColumns() throws Exception {

Row binaryCol =
Row.of(
52L,
55L,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the size is growing here (I mean in the test) ?
Should we have two tests ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #11502 (comment) for the reason. What's the suggestion for the second test?

Row fixedCol =
Row.of(
44L,
47L,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here about the size.

@Fokko Fokko changed the title Test out Apache Parquet 1.14.4 RC2 Build: Bump Apache Parquet 1.14.4 Nov 12, 2024
@Fokko Fokko marked this pull request as ready for review November 12, 2024 12:45
build.gradle Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants