[Bug] Hive DDL and paimon schema mismatched #4556

GangYang-HX · 2024-11-20T09:38:40Z

Search before asking

I searched in the issues and found nothing similar.

Paimon version

Paimon-0.8.1

Compute Engine

Flink-1.18.1

Minimal reproduce step

Start a Spark offline (batch) job with a large number of tasks that reads the Paimon table data.
While the job is running, add a new field to the table.
The failure does not reproduce on every run, but it occurs with high probability.

What doesn't meet your expectations?

The alterTable operation is not atomic. When the Paimon table is read, the fields in the Hive DDL are checked against the latest Paimon schema. If a read happens while alterTable is still in progress, the two field lists can disagree, and the query fails with the following exception:

Hive DDL and paimon schema mismatched! It is recommended not to write any column definition as Paimon external table can read schema from the specified location.
There are 1665 fields in Hive DDL: id, sticky_album_id ......
There are 1666 fields in Paimon schema: id, sticky_album_id ......
    at org.apache.paimon.hive.HiveSchema.checkFieldsMatched(HiveSchema.java:249)
    at org.apache.paimon.hive.HiveSchema.extract(HiveSchema.java:165)
    at org.apache.paimon.hive.PaimonStorageHandler.getDataFieldsJsonStr(PaimonStorageHandler.java:89)
    at org.apache.paimon.hive.PaimonStorageHandler.configureInputJobProperties(PaimonStorageHandler.java:84)
    at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:438)
    at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:468)
    at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1(TableReader.scala:354)
    at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1$adapted(TableReader.scala:354)
    at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8(HadoopRDD.scala:184)
    at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8$adapted(HadoopRDD.scala:184)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$6(HadoopRDD.scala:184)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:181)
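
To make the race window concrete, here is a minimal, single-threaded sketch of the situation described above. It is not Paimon code: the class `SchemaRaceSketch`, its two field lists, and the methods `commitPaimonSchema`, `updateHiveDdl`, and `fieldsMatch` are hypothetical stand-ins. It assumes the Paimon schema is updated before the Hive DDL, which is consistent with the error message (1666 fields in the Paimon schema, 1665 in the Hive DDL), and it reduces the consistency check to a plain list comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a two-step (non-atomic) alterTable. Not Paimon's implementation.
final class SchemaRaceSketch {
    // Stand-ins for the two places where the column list is stored.
    final List<String> paimonSchemaFields = new ArrayList<>(Arrays.asList("id", "sticky_album_id"));
    final List<String> hiveDdlFields = new ArrayList<>(Arrays.asList("id", "sticky_album_id"));

    // Step 1 (assumed order): the new field lands in the Paimon schema.
    void commitPaimonSchema(String newField) {
        paimonSchemaFields.add(newField);
    }

    // Step 2: the Hive DDL is updated afterwards.
    void updateHiveDdl(String newField) {
        hiveDdlFields.add(newField);
    }

    // Simplified reader-side check: both field lists must be identical.
    boolean fieldsMatch() {
        return hiveDdlFields.equals(paimonSchemaFields);
    }

    public static void main(String[] args) {
        SchemaRaceSketch table = new SchemaRaceSketch();
        System.out.println(table.fieldsMatch()); // true: before alterTable starts

        table.commitPaimonSchema("new_col");
        // A reader that checks here sees one more field in the Paimon schema than in the Hive DDL.
        System.out.println(table.fieldsMatch()); // false: inside the window between the two steps

        table.updateHiveDdl("new_col");
        System.out.println(table.fieldsMatch()); // true: after both steps complete
    }
}
```

A job with many tasks performs this check many times over a long period, which may explain why the mismatch shows up with high probability when a field is added mid-run.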

Anything else?

org.apache.paimon.hive.HiveCatalog#alterTableImpl

org.apache.paimon.hive.HiveSchema#checkFieldsMatched

Are you willing to submit a PR?

I'm willing to submit a PR!

GangYang-HX added the bug label Nov 20, 2024

GangYang-HX changed the title ~~[Bug]~~ [Bug] Hive DDL and paimon schema mismatched Nov 20, 2024

GangYang-HX mentioned this issue Nov 21, 2024

[Hive] Fix Hive DDL and paimon schema mismatched bug #4561

Merged
