[Bug] Hive DDL and paimon schema mismatched #4556

GangYang-HX opened this issue Nov 20, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@GangYang-HX
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

Paimon-0.8.1

Compute Engine

Flink-1.18.1

Minimal reproduce step

  1. Start a Spark offline (batch) job with a large number of tasks that reads data from the Paimon table.
  2. While the job is running, add a new field to the table.
    The failure does not occur on every run, but the probability is high.

What doesn't meet your expectations?

The alterTable operation is not atomic. When Paimon table data is read, the fields in the Hive DDL are checked against the latest Paimon schema. If this check runs while an alterTable is still in progress, the two can disagree, and the query fails with the following exception:

Hive DDL and paimon schema mismatched! It is recommended not to write any column definition as Paimon external table can read schema from the specified location.
There are 1665 fields in Hive DDL: id, sticky_album_id ......
There are 1666 fields in Paimon schema: id, sticky_album_id ......
    at org.apache.paimon.hive.HiveSchema.checkFieldsMatched(HiveSchema.java:249)
    at org.apache.paimon.hive.HiveSchema.extract(HiveSchema.java:165)
    at org.apache.paimon.hive.PaimonStorageHandler.getDataFieldsJsonStr(PaimonStorageHandler.java:89)
    at org.apache.paimon.hive.PaimonStorageHandler.configureInputJobProperties(PaimonStorageHandler.java:84)
    at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:438)
    at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:468)
    at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1(TableReader.scala:354)
    at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1$adapted(TableReader.scala:354)
    at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8(HadoopRDD.scala:184)
    at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8$adapted(HadoopRDD.scala:184)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$6(HadoopRDD.scala:184)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:181)
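
To make the suspected window concrete, here is a minimal, deterministic sketch of the race. It does not use any Paimon or Hive API: the class `SchemaRaceSketch`, its two field lists, and the methods `commitPaimonSchema`, `updateHiveDdl`, and `fieldsMatch` are all hypothetical stand-ins. It also assumes that the Paimon schema is committed before the Hive DDL is updated, which is what the field counts in the error above (1665 in Hive, 1666 in Paimon) suggest.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a two-step alterTable. Not Paimon code: the names
 * and the step order are assumptions made for illustration only.
 */
public class SchemaRaceSketch {

    // Stand-in for the latest schema stored with the Paimon table.
    private final List<String> paimonFields =
            new ArrayList<>(List.of("id", "sticky_album_id"));

    // Stand-in for the column list stored in the Hive metastore.
    private final List<String> hiveFields =
            new ArrayList<>(List.of("id", "sticky_album_id"));

    /** Assumed step 1 of the alter: the new Paimon schema becomes visible. */
    public void commitPaimonSchema(String newField) {
        paimonFields.add(newField);
    }

    /** Assumed step 2 of the alter: the Hive DDL catches up. */
    public void updateHiveDdl(String newField) {
        hiveFields.add(newField);
    }

    /** Reader-side check: both field lists must be identical. */
    public boolean fieldsMatch() {
        return hiveFields.equals(paimonFields);
    }

    public static void main(String[] args) {
        SchemaRaceSketch table = new SchemaRaceSketch();
        System.out.println("before alter:  match=" + table.fieldsMatch());

        table.commitPaimonSchema("new_col");
        // A reader that checks at this point sees one extra Paimon field.
        System.out.println("between steps: match=" + table.fieldsMatch());

        table.updateHiveDdl("new_col");
        System.out.println("after alter:   match=" + table.fieldsMatch());
    }
}
```

Running `main` reports a match before and after the alter and a mismatch between the two steps. With many Spark tasks each performing the check independently, it becomes more likely that at least one of them lands in that window.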

Anything else?

Screenshot: org.apache.paimon.hive.HiveCatalog#alterTableImpl
Screenshot: org.apache.paimon.hive.HiveSchema#checkFieldsMatched

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@GangYang-HX GangYang-HX added the bug Something isn't working label Nov 20, 2024
@GangYang-HX GangYang-HX changed the title [Bug] [Bug] Hive DDL and paimon schema mismatched Nov 20, 2024