[flink] Supports debezium-bson format of kafka which collected from mongodb via debezium #4870

Merged
merged 1 commit into from
Jan 10, 2025

@lizc9 (Contributor) commented Jan 8, 2025

Purpose

Linked issue: open #4615

Tests

API and Format

Documentation

@lizc9 lizc9 changed the title [flink] Supports debezium-bson formats of kafka data which collected … [flink] Supports debezium-bson format of kafka data which collected … Jan 8, 2025
@lizc9 lizc9 changed the title [flink] Supports debezium-bson format of kafka data which collected … [flink] Supports debezium-bson format of kafka data which collected from mongodb via debezium Jan 8, 2025
@lizc9 lizc9 changed the title [flink] Supports debezium-bson format of kafka data which collected from mongodb via debezium [flink] Supports debezium-bson format of kafka which collected from mongodb via debezium Jan 8, 2025
@lizc9 lizc9 force-pushed the master branch 3 times, most recently from c37818d to e96aa44 Compare January 8, 2025 16:40
@yuzelin (Contributor) left a comment

Thanks for contributing! I left some comments, please take a look.

for (Map.Entry<String, BsonValue> entry : document.entrySet()) {
    String fieldName = entry.getKey();
    resultMap.put(fieldName, toJsonString(BsonValueConvertor.convert(entry.getValue())));
    rowTypeBuilder.field(fieldName, DataTypes.STRING());

I think we can map each BSON type to the corresponding Paimon type.

@lizc9 (Contributor, Author) commented Jan 9, 2025

Because MongoDB has no schema, the same field may have different data types in different documents. For safety, the string type is used. This is also the same behavior as mongodb-cdc.
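To illustrate the point about schema-less data, here is a minimal sketch in plain Java. It does not use the BSON or Paimon classes from this PR; `StringFallbackSketch` and `toStringRow` are hypothetical names, and the field `price` is made up for the example. It only shows why a string column can hold values whose types disagree across documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StringFallbackSketch {

    // Hypothetical helper (not part of Paimon): renders every field value as a
    // string, so documents with conflicting types for a field share one schema.
    static Map<String, String> toStringRow(Map<String, Object> document) {
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            row.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("price", 10);      // an integer in one document
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("price", "ten");  // a string in another document

        // Both rows fit a single string-typed "price" column.
        System.out.println(toStringRow(first));
        System.out.println(toStringRow(second));
    }
}
```

The trade-off is that downstream queries have to cast the string back to a concrete type when they need one.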

In addition, BsonValueConvertor is used here to convert the data, whereas mongodb-cdc does not convert it. This introduces an inconsistency that may need discussion.
For example, the data from mongodb-cdc is:

{
  "_id": "{\"$oid\":\"64001c996f4de7ff3189d374\"}",
  "updated_at": "{\"$numberLong\":\"1732232838425\"}"
}

The data after BsonValueConvertor conversion is:

{
  "_id": "64001c996f4de7ff3189d374",
  "updated_at": "1732232838425"
}

I think we could also make it configurable, through TypeMapping, whether BsonValueConvertor is used for the conversion.
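As a rough sketch of what such a switch could look like, the plain-Java example below toggles between the two outputs shown above. Everything in it is an assumption for illustration: `ExtendedJsonSketch`, `render`, and the boolean `unwrap` flag are hypothetical, the regex only handles single-key wrappers such as `$oid` and `$numberLong`, and it reflects neither the actual logic of BsonValueConvertor nor the real TypeMapping options.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtendedJsonSketch {

    // Matches a single-key Extended JSON wrapper such as {"$oid":"..."}.
    private static final Pattern WRAPPER =
            Pattern.compile("^\\{\"\\$(\\w+)\":\"([^\"]*)\"\\}$");

    // Hypothetical switch: when unwrap is true, return only the inner value;
    // otherwise keep the Extended JSON string untouched.
    static String render(String extendedJson, boolean unwrap) {
        if (!unwrap) {
            return extendedJson;
        }
        Matcher m = WRAPPER.matcher(extendedJson);
        return m.matches() ? m.group(2) : extendedJson;
    }

    public static void main(String[] args) {
        String id = "{\"$oid\":\"64001c996f4de7ff3189d374\"}";
        System.out.println(render(id, false)); // kept as Extended JSON
        System.out.println(render(id, true));  // inner value only
    }
}
```

Keeping the unconverted form as one of the two modes would let existing mongodb-cdc users get identical output from both ingestion paths.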

@yuzelin (Contributor) left a comment

+1

@yuzelin yuzelin merged commit 37e26f3 into apache:master Jan 10, 2025
12 checks passed