Iceberg fails to read the parquet file while rewriting data files. #11553

Open
himani1126 opened this issue Nov 14, 2024 · 0 comments
Labels
bug Something isn't working

Apache Iceberg version

1.2.1

Query engine

Spark

Please describe the bug 🐞

While compacting data files with Spark 3.3, I get the error below because Spark is unable to read a Parquet file. The same failure occurs with Iceberg 1.6.1 (the stack trace below is from that version). However, compaction of the same table succeeds with the Trino engine.

This is the command I am executing:

SparkActions.get(sparkSession).rewriteDataFiles(table).execute()

And this is the error I am seeing:

Error reading file(s): s3://bucket_name/98c95396e6a3f96506862b4a87d3fc71.gz.parquet
java.lang.IllegalArgumentException: Should not read definition, past page end
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:145) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.PageIterator.currentDefinitionLevel(PageIterator.java:115) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ColumnIterator.currentDefinitionLevel(ColumnIterator.java:101) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:418) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:738) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:738) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:135) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:119) 
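
For readers unfamiliar with the message: the trace shows a precondition check in PageIterator.currentDefinitionLevel rejecting the call, and the wording suggests the reader asked for a definition level after the current page's values had run out. The sketch below is only a toy model of that kind of failure, not Iceberg's actual code; the names PageLevelsSketch, ToyPage, and readLevels are hypothetical and exist purely for illustration.

```java
public class PageLevelsSketch {

    /** Toy stand-in for one page of definition levels (NOT Iceberg's PageIterator). */
    static final class ToyPage {
        private final int[] definitionLevels;
        private int position = 0;

        ToyPage(int[] definitionLevels) {
            this.definitionLevels = definitionLevels.clone();
        }

        boolean hasNext() {
            return position < definitionLevels.length;
        }

        /** Returns the current level, or fails if the page is already exhausted. */
        int currentDefinitionLevel() {
            if (!hasNext()) {
                // Same message as in the report; the condition itself is an assumption.
                throw new IllegalArgumentException("Should not read definition, past page end");
            }
            return definitionLevels[position];
        }

        void advance() {
            position++;
        }
    }

    /** Reads `count` levels; throws if `count` exceeds what the page actually holds. */
    static int[] readLevels(ToyPage page, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = page.currentDefinitionLevel();
            page.advance();
        }
        return out;
    }

    public static void main(String[] args) {
        ToyPage page = new ToyPage(new int[] {1, 0, 1});
        try {
            readLevels(page, 4); // the caller expects one more value than the page holds
        } catch (IllegalArgumentException e) {
            System.out.println("Read failed: " + e.getMessage());
        }
    }
}
```

In this toy model the failure comes from a mismatch between how many values the caller expects and how many the page holds. Whether the file in this report has that kind of mismatch, or whether the reader miscounts, is not something the trace alone establishes.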

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

himani1126 added the bug label on Nov 14, 2024