While compacting data files with Spark 3.3, I get the following error because Spark is unable to read a Parquet file. This also happens when I use Iceberg version 1.6.1. However, the compaction works with the Trino engine.
Error reading file(s): s3://bucket_name/98c95396e6a3f96506862b4a87d3fc71.gz.parquet
java.lang.IllegalArgumentException: Should not read definition, past page end
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:145) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.parquet.PageIterator.currentDefinitionLevel(PageIterator.java:115) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.parquet.ColumnIterator.currentDefinitionLevel(ColumnIterator.java:101) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:418) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:738) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:738) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:135) ~[iceberg-spark-runtime-3.3_2.12-1.6.1.jar:?]
at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:119)
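For anyone triaging a log like the one above, it can help to collect the file paths named in the `Error reading file(s):` lines so each file can be inspected on its own. The following is a minimal stdlib-only Java sketch, not part of Iceberg: the class name `FailingFileExtractor` and method `extractPaths` are hypothetical, and treating a comma as the separator between multiple paths on one line is an assumption, since the log above only shows a single path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for log triage; not an Iceberg or Spark API.
class FailingFileExtractor {
    // Matches the "Error reading file(s): <path>" line format seen in the log above.
    private static final Pattern ERROR_LINE =
        Pattern.compile("Error reading file\\(s\\):\\s*(.+)");

    // Returns the distinct file paths from all matching lines, in first-seen order.
    static List<String> extractPaths(String log) {
        List<String> paths = new ArrayList<>();
        for (String line : log.split("\\R")) {
            Matcher m = ERROR_LINE.matcher(line);
            if (m.find()) {
                // Assumption: multiple paths on one line are comma-separated.
                for (String p : m.group(1).split(",")) {
                    String trimmed = p.trim();
                    if (!trimmed.isEmpty() && !paths.contains(trimmed)) {
                        paths.add(trimmed);
                    }
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "Error reading file(s): s3://bucket_name/98c95396e6a3f96506862b4a87d3fc71.gz.parquet",
            "java.lang.IllegalArgumentException: Should not read definition, past page end");
        // Print one failing path per line.
        extractPaths(log).forEach(System.out::println);
    }
}
```

The resulting list could then be used to check each file separately with another Parquet reader, which may help narrow down whether the problem is in the file or in the reader.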
Willingness to contribute
I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time
Apache Iceberg version
1.2.1
Query engine
Spark
Please describe the bug 🐞
This is the command I am executing:
SparkActions.get(sparkSession).rewriteDataFiles(table).execute()
And this is the error I am seeing (the full stack trace is shown above).