This repository has been archived by the owner on Sep 18, 2023. It is now read-only.
Feature list for reading or writing Parquet with ArrowFileFormat #1171
Labels
enhancement
New feature or request
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In #1161, we are trying to use ArrowFileFormat to read and write Parquet files. Some test suites fail, and some of those failures are due to missing ArrowFileFormat functionality. The feature list is below:
Highest priority
read from parquet files with changing schema, Enabling/disabling merging partfiles when merging parquet schema, SPARK-10005 Schema merging for nested struct, alter datasource table add columns - parquet, alter datasource table add columns - partitioned - parquet, SPARK-10301 requested schema clipping - requested schema contains physical schema, schema mismatch failure error message for parquet vectorized reader
compression codec
store and retrieve column stats in different time zones, analyze column command, writing with aggregation, Migration from INT96 to TIMESTAMP_MICROS timestamp type
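For reference, here is a rough sketch of what the schema-merging tests above ("read from parquet files with changing schema", "SPARK-10005 Schema merging for nested struct") expect from a reader. It is a simplified model in plain Python, not Spark or ArrowFileFormat code: the dict-based schema representation and the `merge_schemas` helper are assumptions made for illustration only.

```python
def merge_schemas(left, right):
    """Merge two schemas given as {field_name: type}; a nested dict models a struct."""
    merged = dict(left)  # keep the field order of the first schema
    for name, right_type in right.items():
        if name not in merged:
            merged[name] = right_type  # field only present in the second file
            continue
        left_type = merged[name]
        if isinstance(left_type, dict) and isinstance(right_type, dict):
            # both sides are structs: merge their fields recursively
            merged[name] = merge_schemas(left_type, right_type)
        elif left_type != right_type:
            raise ValueError(
                f"incompatible types for {name!r}: {left_type} vs {right_type}"
            )
    return merged


# Two hypothetical part files whose schemas differ, including inside a struct.
file_a = {"id": "int", "point": {"x": "double"}}
file_b = {"point": {"y": "double"}, "name": "string"}
merged = merge_schemas(file_a, file_b)
```

With these inputs, `merged` contains `id`, a `point` struct with both `x` and `y`, and `name`. A reader that supports schema merging would then return nulls for fields a given file does not contain.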
Low priority
SPARK-26709: OptimizeMetadataOnlyQuery does not handle empty records correctly
SPARK-33593: Vector reader got incorrect data with binary partition value, SPARK-21167: encode and decode path correctly, special characters in output path
SPARK-15804: write out the metadata to parquet file, SPARK-15895 summary files in non-leaf partition directories, SPARK-11044 Parquet writer version fixed as version1
SPARK-10005 Schema merging for nested struct, SPARK-10301 requested schema clipping - requested schema contains physical schema, SPARK-10301 requested schema clipping - physical schema contains requested schema, SPARK-10301 requested schema clipping - schemas overlap but don't contain each other, SPARK-10301 requested schema clipping - deeply nested struct, SPARK-10301 requested schema clipping - out of order, SPARK-10301 requested schema clipping - schema merging, Standard mode - SPARK-10301 requested schema clipping - UDT, Legacy mode - SPARK-10301 requested schema clipping - UDT
cases when literal is max
filter pushdown - timestamp, filter pushdown - decimal, filter pushdown - date
Enabling/disabling ignoreMissingFiles using parquet
SPARK-24204 error handling for unsupported Null data types - csv, parquet, orc
writing data out metrics: parquet
Read row group containing both dictionary and plain encoded pages
SPARK-8121: spark.sql.parquet.output.committer.class shouldn't be overridden, SPARK-7837 Do not close output writer twice when commitTask() fails
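To make the "SPARK-10301 requested schema clipping" cases above more concrete, here is a rough sketch of the idea in plain Python. It is a simplified model under stated assumptions, not Spark's or ArrowFileFormat's implementation: schemas are modelled as dicts, and `clip_schema` is a hypothetical helper. The result follows the requested field order, recurses into nested structs, and keeps requested fields that are missing from the file.

```python
def clip_schema(physical, requested):
    """Clip a file's physical schema to the fields named in the requested schema.

    Schemas are {field_name: type}; a nested dict models a struct.
    """
    clipped = {}
    for name, req_type in requested.items():  # the requested order wins
        if name not in physical:
            # missing in the file: keep the requested field in the result
            clipped[name] = req_type
        elif isinstance(req_type, dict) and isinstance(physical[name], dict):
            # both sides are structs: clip the nested fields recursively
            clipped[name] = clip_schema(physical[name], req_type)
        else:
            clipped[name] = physical[name]
    return clipped


# Hypothetical schemas that overlap but do not contain each other.
physical = {"f0": {"f00": "int", "f01": "long"}, "f1": "int"}
requested = {"f1": "int", "f0": {"f01": "long", "f02": "double"}}
clipped = clip_schema(physical, requested)
```

Here `clipped` lists `f1` first, then `f0` with only `f01` and `f02`. The unrequested `f00` is dropped, while `f02` is kept even though the file does not contain it.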