
[Bug] CALL sys.expire_partitions failed when using hive metastore and setting 'metastore.partitioned-table' = 'false'. #4873

JingFengWang opened this issue Jan 9, 2025 · 9 comments

@JingFengWang

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

release-1.0

Compute Engine

spark-3.2.0

Minimal reproduce step

step 1: Create a date-partitioned table with 'metastore.partitioned-table' = 'false'.
step 2: Write test data covering the last N days.
step 3: CALL sys.expire_partitions(table => 'db.tb', expiration_time => '1 d', timestamp_formatter => 'yyyy-MM-dd');
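
For concreteness, a minimal Spark SQL sketch of the three steps (the table and columns here are illustrative, not the actual schema):

CREATE TABLE db.tb (id INT, day STRING) USING paimon
PARTITIONED BY (day)
TBLPROPERTIES ('metastore.partitioned-table' = 'false');

-- write a row into a partition that is already older than the expiration time
INSERT INTO db.tb VALUES (1, date_format(date_sub(current_date(), 2), 'yyyy-MM-dd'));

-- expire partitions older than one day
CALL sys.expire_partitions(table => 'db.tb', expiration_time => '1 d', timestamp_formatter => 'yyyy-MM-dd');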

What doesn't meet your expectations?

When sys.expire_partitions is executed, an exception is thrown and no snapshot is generated, so the partitions that should be expired are never physically deleted once snapshots expire.

Anything else?

Exception information:
spark-sql> CALL sys.expire_partitions(table => 'db.tb', expiration_time => '1 d', timestamp_formatter => 'yyyy-MM-dd');
25/01/08 17:50:18 ERROR SparkSQLDriver: Failed in [CALL sys.expire_partitions(table => 'db.tb', expiration_time => '1 d', timestamp_formatter => 'yyyy-MM-dd')]
java.lang.RuntimeException: MetaException(message:Invalid partition key & values; keys [], values [2025-01-01, 15, ])
at org.apache.paimon.operation.PartitionExpire.deleteMetastorePartitions(PartitionExpire.java:175)
at org.apache.paimon.operation.PartitionExpire.doExpire(PartitionExpire.java:162)
at org.apache.paimon.operation.PartitionExpire.expire(PartitionExpire.java:139)
at org.apache.paimon.operation.PartitionExpire.expire(PartitionExpire.java:109)
at org.apache.paimon.spark.procedure.ExpirePartitionsProcedure.lambda$call$2(ExpirePartitionsProcedure.java:115)
at org.apache.paimon.spark.procedure.BaseProcedure.execute(BaseProcedure.java:88)
at org.apache.paimon.spark.procedure.BaseProcedure.modifyPaimonTable(BaseProcedure.java:78)
at org.apache.paimon.spark.procedure.ExpirePartitionsProcedure.call(ExpirePartitionsProcedure.java:87)
at org.apache.paimon.spark.execution.PaimonCallExec.run(PaimonCallExec.scala:32)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@yangjf2019
Contributor

yangjf2019 commented Jan 10, 2025

Hi @JingFengWang, can you provide the table creation statement? And is your table internal or external?

@JingFengWang
Author

JingFengWang commented Jan 14, 2025

Hi @yangjf2019
-- create internal table
CREATE TABLE db.tb (
    dt BIGINT COMMENT 'timestamp in milliseconds',
    randomnum INT COMMENT 'random number',
    version STRING COMMENT 'xx',
    ....
    day STRING COMMENT 'day',
    hour STRING COMMENT 'hour'
) USING paimon
PARTITIONED BY (day, hour)
TBLPROPERTIES (
    'write-only' = 'true',
    'write-buffer-spillable' = 'true',
    'write-buffer-for-append' = 'true',
    'file.format' = 'orc',
    'file.compression' = 'zstd',
    'target-file-size' = '536870912',
    'bucket' = '80',
    'bucket-key' = 'randomnum'
);

@JingFengWang
Author

JingFengWang commented Jan 14, 2025

Hi @yangjf2019

Proposed fix:

--- a/paimon-hive/paimon-hive-catalog/src/main/java/org/apache/paimon/hive/HiveMetastoreClient.java
+++ b/paimon-hive/paimon-hive-catalog/src/main/java/org/apache/paimon/hive/HiveMetastoreClient.java
@@ -29,13 +29,16 @@ import org.apache.paimon.utils.PartitionPathUtils;
 
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
+import org.apache.hadoop.hive.metastore.api.MetaException;
 import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
 import org.apache.hadoop.hive.metastore.api.Partition;
 import org.apache.hadoop.hive.metastore.api.PartitionEventType;
 import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
 import org.apache.hadoop.hive.metastore.api.Table;
 import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import java.util.ArrayList;
 import java.util.HashMap;
@@ -54,6 +57,8 @@ public class HiveMetastoreClient implements MetastoreClient {
 
     private static final String HIVE_LAST_UPDATE_TIME_PROP = "transient_lastDdlTime";
 
+    private static final Logger LOG = LoggerFactory.getLogger(HiveMetastoreClient.class);
+
     private final Identifier identifier;
 
     private final ClientPool<IMetaStoreClient, TException> clients;
@@ -154,6 +159,12 @@ public class HiveMetastoreClient implements MetastoreClient {
                                     false));
         } catch (NoSuchObjectException e) {
             // do nothing if the partition not exists
+        } catch (MetaException e) {
+            // When using the hive metastore with 'metastore.partitioned-table' = 'false',
+            // there is no partition information stored in the hive metastore,
+            // so this exception is expected and can be ignored here.
+        } catch (TException e) {
+            LOG.warn("Failed to drop partition from hive metastore", e);
         }
     }
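
With this version of the patch, only the MetaException that is expected when the Hive table carries no partition metadata is ignored; the generic TException is logged rather than silently swallowed, so real metastore failures stay visible.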

@yangjf2019
Contributor

yangjf2019 commented Jan 15, 2025

Hi @JingFengWang, I tried to reproduce your problem (environment: Spark 3.2, Paimon 0.9.0, Hive 3.1.3), though not with the spark-sql client, and the CALL command works!
Can you try it with Scala code? Also, could you check the Paimon configuration used by spark-sql? Here is my code:

import org.apache.spark.sql.SparkSession

object PaimonHiveCatalogExpireApp {

  def main(args: Array[String]): Unit = {
    val catalogName = "your_paimon_catalog_name"
    val paimonDatabase = "db"
    val table = "tb"
    val thriftServer = "thrift://localhost:9083"
    val warehouse = "hdfs://hadoop.single.node:9000/user/hive/warehouse"

    // Register a Paimon catalog backed by the Hive metastore.
    val spark =
      SparkSession
        .builder()
        .appName(PaimonHiveCatalogExpireApp.getClass.getSimpleName)
        .config(s"spark.sql.catalog.$catalogName", "org.apache.paimon.spark.SparkCatalog")
        .config(s"spark.sql.catalog.$catalogName.metastore", "hive")
        .config(s"spark.sql.catalog.$catalogName.uri", thriftServer)
        .config(s"spark.sql.catalog.$catalogName.warehouse", warehouse)
        .config("spark.sql.extensions", "org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions")
        .master("local")
        .getOrCreate()

    spark.sql(s"USE $catalogName")

    spark.sql(s"CREATE DATABASE IF NOT EXISTS $paimonDatabase")

    spark.sql(s"USE $paimonDatabase")

    spark.sql(
      s"""
         |CREATE TABLE $paimonDatabase.$table (
         |   dt BIGINT COMMENT 'timestamp in milliseconds',
         |   randomnum INT COMMENT 'random number',
         |   version STRING COMMENT 'xx',
         |   day STRING COMMENT 'day',
         |   hour STRING COMMENT 'hour'
         |) USING paimon
         |PARTITIONED BY (day, hour)
         |TBLPROPERTIES (
         |    'write-only' = 'true',
         |    'write-buffer-spillable' = 'true',
         |    'write-buffer-for-append' = 'true',
         |    'file.format' = 'orc',
         |    'file.compression' = 'zstd',
         |    'target-file-size' = '536870912',
         |    'bucket' = '80',
         |    'bucket-key' = 'randomnum'
         |)
         |""".stripMargin)

    // Expire partitions older than one day.
    spark.sql(
      s"""
         |CALL sys.expire_partitions(
         |    table => '$paimonDatabase.$table',
         |    expiration_time => '1 d',
         |    timestamp_formatter => 'yyyy-MM-dd'
         |)
         |""".stripMargin)
  }
}

@yangjf2019
Contributor

[screenshot]

@JingFengWang
Author

JingFengWang commented Jan 16, 2025

Hi @yangjf2019 The problem occurs in Paimon 1.0, but 0.9 also has this problem; I tested 0.9.
Also, did you write any test data? There needs to be a partition that is already expired; your test code does not write data into an out-of-date partition.
Expired partitions with data are essential for this test: if no partition is expired, the exception is never triggered. (See the example below.)
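
For example (illustrative values, assuming the simplified five-column schema from the DDL above), write a row into a partition that is already several days old before calling the procedure:

INSERT INTO db.tb VALUES
    (1736900000000, 1, 'v1', date_format(date_sub(current_date(), 3), 'yyyy-MM-dd'), '00');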

@yangjf2019
Contributor

Well, I'll try to write some data into an expired partition and take another look tomorrow.

@yangjf2019
Contributor

Late reply! I ran into the following problem:

25/01/20 14:50:31 WARN PartitionValuesTimeExpireStrategy: Can't extract datetime from partition day:20250116,hour:17. If you want to configure partition expiration, please:
  1. Check the expiration configuration.
  2. Manually delete the partition using the drop-partition command if the partition value is non-date formatted.
  3. Use 'update-time' expiration strategy by set 'partition.expiration-strategy', which supports non-date formatted partition.
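
That warning points to a formatter mismatch in this test rather than the reported bug: the partition value 20250116 does not match 'yyyy-MM-dd'. Presumably a pattern matching the actual partition values would let the strategy parse them, something like:

CALL sys.expire_partitions(table => 'db.tb', expiration_time => '1 d', timestamp_formatter => 'yyyyMMdd');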

@yangjf2019
Contributor

I could not reproduce your problem.
