[Bug] [Paimon] The data has repeated after the dynamic hash table write. #8565

Pandas886 · 2025-01-21T02:41:14Z

Search before asking

I searched the existing issues and found no similar ones.

What happened

source

FakeSource {
  result_table_name = "fake"
  row.num = 300
  int.min = 1
  int.max = 300
  schema = {
    fields {
      name = "string"
      id = "int"
      age = "int"
    }
  }
}

paimon schema

{
  "version" : 3,
  "id" : 0,
  "fields" : [
    {
      "id" : 0,
      "name" : "name",
      "type" : "STRING"
    },
    {
      "id" : 1,
      "name" : "id",
      "type" : "INT NOT NULL"
    },
    {
      "id" : 2,
      "name" : "age",
      "type" : "INT NOT NULL"
    }
  ],
  "highestFieldId" : 2,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "id" ],
  "options" : {
    "bucket" : "-1",
    "file.format" : "orc",
    "manifest.format" : "orc",
    "dynamic-bucket.target-row-num" : "30"
  },
  "timeMillis" : 1731551425602
}
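For readers less familiar with Paimon options, the short Python check below shows why this table counts as a dynamic bucket table: it has a primary key and `"bucket" : "-1"`. The bucket estimate is only a rough sketch; it assumes `dynamic-bucket.target-row-num` acts as a per-bucket row target and that the source can emit every id from `int.min` to `int.max`. The variable names are illustrative, and the schema is trimmed to the fields the check needs.

```python
import json
import math

# Trimmed copy of the schema above (only the parts this check reads).
schema_text = """
{
  "primaryKeys" : [ "id" ],
  "options" : {
    "bucket" : "-1",
    "dynamic-bucket.target-row-num" : "30"
  }
}
"""

schema = json.loads(schema_text)
options = schema["options"]

# A primary-key table with bucket = -1 uses Paimon's dynamic bucket mode.
is_dynamic_bucket = bool(schema["primaryKeys"]) and options.get("bucket") == "-1"

# FakeSource settings from the report: ids are drawn from int.min..int.max.
int_min, int_max = 1, 300
max_distinct_keys = int_max - int_min + 1

# Assumption: the target row number is treated as a per-bucket cap.
target_rows = int(options["dynamic-bucket.target-row-num"])
approx_buckets = math.ceil(max_distinct_keys / target_rows)

print(is_dynamic_bucket, max_distinct_keys, approx_buckets)
```

With these settings the data can spread over roughly ten buckets, so a key that is routed inconsistently has many wrong buckets to land in.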

After the source above is written to the Paimon dynamic bucket table several times, rows with the same primary key are not merged, so the table ends up containing duplicate data.

The issue appears to be that rows with the same primary key are not always assigned to the same bucket.
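The suspected mechanism can be illustrated with a toy model in Python. This is not SeaTunnel or Paimon code: `ToyBucketAssigner` and the helper functions are hypothetical, and the model assumes that rows are merged only when they share both a bucket and a primary key, and that the writer relies on a key-to-bucket index to route repeated keys. It compares a second write that reuses the index with one that starts from an empty index.

```python
from collections import defaultdict

TARGET_ROWS_PER_BUCKET = 30  # mirrors dynamic-bucket.target-row-num in the report


class ToyBucketAssigner:
    """Hypothetical stand-in for a dynamic bucket assigner (not Paimon's class)."""

    def __init__(self, key_to_bucket=None):
        # The index remembers which bucket each primary key was first sent to.
        self.key_to_bucket = dict(key_to_bucket or {})
        self.rows_per_bucket = defaultdict(int)
        for bucket in self.key_to_bucket.values():
            self.rows_per_bucket[bucket] += 1

    def assign(self, key):
        # Known keys always go back to their original bucket.
        if key in self.key_to_bucket:
            return self.key_to_bucket[key]
        # New keys go to the first bucket that is still below the target size.
        bucket = 0
        while self.rows_per_bucket[bucket] >= TARGET_ROWS_PER_BUCKET:
            bucket += 1
        self.key_to_bucket[key] = bucket
        self.rows_per_bucket[bucket] += 1
        return bucket


def write(table, assigner, keys):
    """Write one batch; a row replaces an older one only within the same bucket."""
    for key in keys:
        table[assigner.assign(key)][key] = "row-%d" % key


def count_rows(table):
    return sum(len(rows) for rows in table.values())


first_batch = list(range(1, 301))
second_batch = list(range(300, 0, -1))  # same keys, different arrival order

# Case 1: the second write reuses the key-to-bucket index of the first.
good_table = defaultdict(dict)
first_job = ToyBucketAssigner()
write(good_table, first_job, first_batch)
write(good_table, ToyBucketAssigner(first_job.key_to_bucket), second_batch)

# Case 2: the second write starts with an empty index (the suspected bug).
bad_table = defaultdict(dict)
write(bad_table, ToyBucketAssigner(), first_batch)
write(bad_table, ToyBucketAssigner(), second_batch)

print(count_rows(good_table))  # 300: every key merged into one row
print(count_rows(bad_table))   # 600: every key exists in two buckets
```

In this model, losing the index between writes is enough to reproduce the symptom. Whether that is the actual cause in the connector still needs to be confirmed.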

SeaTunnel Version

2.3.8

SeaTunnel Config

NAN

Running Command

NAN

Error Exception

NAN

Zeta or Flink or Spark Version

Flink 1.18

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

Yes, I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

Pandas886 added the bug label Jan 21, 2025

Pandas886 changed the title ~~[Bug] [Paimon] 动态分桶表写入后数据重复了~~ [Bug] [Paimon] The data has repeated after the dynamic hash table write. Jan 21, 2025
