Iceberg fails to read the parquet file while rewriting data files. #11553

himani1126 · 2024-11-14T23:57:59Z

Apache Iceberg version

1.2.1

Query engine

Spark

Please describe the bug 🐞

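While compacting data files with Spark 3.3, the rewrite fails with the error below because Spark is unable to read one of the Parquet files. I originally saw this on Iceberg 1.2.1, and it still happens after upgrading to Iceberg 1.6.1 (the stack trace below is from the 1.6.1 runtime jar). Compaction of the same table works when I use the Trino engine.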

This is the command I am executing:

SparkActions.get(sparkSession).rewriteDataFiles(table).execute()

And this is the error I am seeing:

Error reading file(s): s3://bucket_name/98c95396e6a3f96506862b4a87d3fc71.gz.parquet
java.lang.IllegalArgumentException: Should not read definition, past page end
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:145) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.PageIterator.currentDefinitionLevel(PageIterator.java:115) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ColumnIterator.currentDefinitionLevel(ColumnIterator.java:101) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:418) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:738) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:738) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:135) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:119)
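To illustrate what the message in the trace appears to guard against, here is a toy Java sketch. It is not Iceberg code: `ToyPageIterator`, `ToyPageDemo`, and `countDefined` are hypothetical names made up for this example. It assumes the check fires when a reader asks for a definition level after the values of the current page have been used up, which is what the message text suggests. It does not show why this particular file triggers the check in Spark but not in Trino.

```java
/** Toy model (not Iceberg code) of a page that stores one definition level per value. */
final class ToyPageIterator {
    private final int[] definitionLevels;
    private int position = 0;

    ToyPageIterator(int[] definitionLevels) {
        this.definitionLevels = definitionLevels.clone();
    }

    boolean hasNext() {
        return position < definitionLevels.length;
    }

    /** Returns the definition level of the current value, or fails if the page is exhausted. */
    int currentDefinitionLevel() {
        if (!hasNext()) {
            // Same message as in the stack trace; the condition itself is an assumption.
            throw new IllegalArgumentException("Should not read definition, past page end");
        }
        return definitionLevels[position];
    }

    void advance() {
        position++;
    }
}

final class ToyPageDemo {
    /** Reads rowCount values and counts those whose definition level equals maxLevel (non-null). */
    static int countDefined(ToyPageIterator page, int rowCount, int maxLevel) {
        int defined = 0;
        for (int i = 0; i < rowCount; i++) {
            if (page.currentDefinitionLevel() == maxLevel) {
                defined++;
            }
            page.advance();
        }
        return defined;
    }

    public static void main(String[] args) {
        int[] levels = {1, 0, 1}; // three values: non-null, null, non-null

        // The reader and the page agree on the value count: works.
        System.out.println(countDefined(new ToyPageIterator(levels), 3, 1));

        // The reader expects one more value than the page holds: the check fires.
        try {
            countDefined(new ToyPageIterator(levels), 4, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy model the failure comes from a mismatch between the number of values the reader expects and the number the page actually holds. Whether the file in this report has such a mismatch is something I have not verified.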

Willingness to contribute

I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time

himani1126 added the "bug" (Something isn't working) label on Nov 14, 2024
