
MySQL CDC Plugin #3014

Open · wants to merge 40 commits into main
Conversation

@le-vlad (Contributor) commented Nov 18, 2024

Adds support for MySQL CDC using the go-mysql canal library.

Features supported:

  • Binlog streaming
  • Initial snapshot streaming
  • Composite primary key support for snapshots
  • Storing the binlog position in a cache

Versions tested:

"8.0", "9.0", "9.1"

@le-vlad le-vlad marked this pull request as ready for review November 21, 2024 12:52
@rockwotj rockwotj self-requested a review November 21, 2024 13:43
@rockwotj (Collaborator) left a comment:

Initial pass - thanks!

Description("The key to store the last processed binlog position."),
service.NewStringField(fieldFlavor).
Description("The flavor of MySQL to connect to.").
Example("mysql"),
Collaborator:

What is the alternative? Should this be the default?

Contributor Author:

The alternative can be mariadb, since it's compatible with MySQL.
But if we release this connector for MySQL deployments only, I can remove this field.

Description("The flavor of MySQL to connect to.").
Example("mysql"),
service.NewBoolField(fieldMaxSnapshotParallelTables).
Description("Int specifies a number of tables to be streamed in parallel when taking a snapshot. If set to true, the connector will stream all tables in parallel. Otherwise, it will stream tables one by one.").
Collaborator:

This is declared as a bool field.

Comment on lines 53 to 54
service.NewStringField(fieldCheckpointKey).
	Description("The key to store the last processed binlog position."),
Collaborator:

Don't we also need a cache field? Then this would be the key within that cache.

Contributor Author:

I was following the CockroachDB code example. We had only one field for the checkpointer/cache.

Description("If set to true, the connector will query all the existing data as a part of snapshot procerss. Otherwise, it will start from the current binlog position."),
service.NewAutoRetryNacksToggleField(),
service.NewIntField(fieldCheckpointLimit).
Description("The maximum number of messages that can be processed at a given time. Increasing this limit enables parallel processing and batching at the output level. Any given LSN will not be acknowledged unless all messages under that offset are delivered in order to preserve at least once delivery guarantees.").
Collaborator:

LSN is a Postgres concept, not MySQL.

return nil, err
}

if streamInput.binLogCache, err = conf.FieldString(fieldCheckpointKey); err != nil {
Collaborator:

Based on the docs this isn't quite right.

Contributor Author:

What exactly? Again, I'm following the CockroachDB example.

Collaborator:

CheckpointKey should be the key used in the cache, not the cache itself. We should name this field checkpoint_cache or bin_log_position_cache. We can't hardcode the key to mysql_binlog_position like we do right now because it prevents someone from using the same cache for multiple streams, we need a config and override for the key and a better name for the actual cache.

}

// 2. Acquire global read lock (minimizing lock time)
if _, err := s.lockConn.ExecContext(ctx, "FLUSH TABLES WITH READ LOCK"); err != nil {
Collaborator:

Is this WITH READ LOCK held only for the duration of this statement, or for the whole connection?

Contributor Author:

Only for this statement. We execute UNLOCK TABLES a few lines down.

return nil, fmt.Errorf("failed to start consistent snapshot: %v", err)
}

// 2. Acquire global read lock (minimizing lock time)
Collaborator:

Can we add a comment explaining why we need to FLUSH TABLES?

return nil, fmt.Errorf("failed to start transaction: %v", err)
}

// Execute START TRANSACTION WITH CONSISTENT SNAPSHOT
Collaborator:

This is not a helpful comment. It would be better to explain why we need this transaction option.


func (s *Snapshot) getRowsCount(table string) (int, error) {
	var count int
	if err := s.tx.QueryRowContext(s.ctx, "SELECT COUNT(*) FROM "+table).Scan(&count); err != nil {
Collaborator:

Is this actually fast? Should we be querying the table stats instead? https://stackoverflow.com/a/61548683

Contributor Author:

From what I found, the table stats query is not accurate, meaning we may get a lower number of records and therefore miss some of the snapshot data.
Even the link you sent says: "Estimate but very performant."
So I'd stick with COUNT(*), but let me know your thoughts.

Collaborator:

Why do we need a count anyway? In Postgres we just query until we get no rows back, meaning we've reached the end of the table (because the PK is sorted). Can we do the same here? I think this will be slow and expensive for large tables.

Comment on lines 118 to 120
SELECT COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_NAME = '%s' AND CONSTRAINT_NAME = 'PRIMARY';
Collaborator:

Will this yield them in the right order?

Contributor Author:

In my tests - yes.
But I'll add ORDER BY ORDINAL_POSITION just to be sure.

@le-vlad le-vlad requested a review from rockwotj November 28, 2024 16:12
@rockwotj (Collaborator) left a comment:

Round two! Looking better, thanks for your work here 😄

Collaborator:

Can you create an integration test using all data types, to prevent the issue we had with Postgres where some types were not handled correctly? I asked ChatGPT to generate a table with all types and got this:

CREATE TABLE all_data_types (
    -- Numeric Data Types
    tinyint_col TINYINT,
    smallint_col SMALLINT,
    mediumint_col MEDIUMINT,
    int_col INT,
    bigint_col BIGINT,
    decimal_col DECIMAL(10, 2),
    numeric_col NUMERIC(10, 2),
    float_col FLOAT,
    double_col DOUBLE,

    -- Date and Time Data Types
    date_col DATE,
    datetime_col DATETIME,
    timestamp_col TIMESTAMP,
    time_col TIME,
    year_col YEAR,

    -- String Data Types
    char_col CHAR(10),
    varchar_col VARCHAR(255),
    binary_col BINARY(10),
    varbinary_col VARBINARY(255),
    tinyblob_col TINYBLOB,
    blob_col BLOB,
    mediumblob_col MEDIUMBLOB,
    longblob_col LONGBLOB,
    tinytext_col TINYTEXT,
    text_col TEXT,
    mediumtext_col MEDIUMTEXT,
    longtext_col LONGTEXT,
    enum_col ENUM('option1', 'option2', 'option3'),
    set_col SET('a', 'b', 'c', 'd'),

    -- Spatial Data Types
    geometry_col GEOMETRY,
    point_col POINT,
    linestring_col LINESTRING,
    polygon_col POLYGON,
    multipoint_col MULTIPOINT,
    multilinestring_col MULTILINESTRING,
    multipolygon_col MULTIPOLYGON,
    geometrycollection_col GEOMETRYCOLLECTION
);

}
}

func (i *mysqlStreamInput) OnPosSynced(eh *replication.EventHeader, pos mysqlReplications.Position, gtid mysqlReplications.GTIDSet, synced bool) error {
Collaborator:

The synced bool here is force bool in the interface - can we rename it to match? I guess it doesn't matter for us because we always sync...

Collaborator:

Can you explain when this is called? Because right now we seem to blindly overwrite currentLogPosition in multiple places and I don't know if that's always OK to do.

Contributor Author:

So, I was just following the instructions from the comment:
// OnPosSynced Use your own way to sync position. When force is true, sync position immediately.

Basically, it should be executed when the canal lib is up to some position in the binlog and requires the consumer to sync that position.

I added a force check so the position is synced when requested.

Contributor Author:

Actually, I think we don't need to use this method. It is called every time we receive a message, to indicate the current position.

The message itself doesn't carry the binlog file name, only the position. But since we handle binlog rotation events, we can keep the file name updated correctly.

I tested it, and it looks like we are just fine without this method.

go.mod Outdated
@@ -63,6 +63,7 @@ require (
github.com/getsentry/sentry-go v0.28.1
github.com/go-faker/faker/v4 v4.4.2
github.com/go-jose/go-jose/v3 v3.0.3
github.com/go-mysql-org/go-mysql v1.9.1
Collaborator:

The latest is 1.10 - can we upgrade?

// canal stands for mysql binlog listener connection
canal *canal.Canal
mysqlConfig *mysql.Config
canal.DummyEventHandler
Collaborator:

Nit: embedded structs are generally listed first - can we do that here?

logger *service.Logger
res *service.Resources

streamCtx context.Context
Collaborator:

You should be using the shutSig contexts instead of this.


// ---- Redpanda Connect specific methods end----

// --- MySQL Canal handler methods ----
Collaborator:

The other handler methods - we don't need those? Can you explain why?

Contributor Author:

Yes. I think we don't need to implement some of these methods, since we don't use GTIDs to store the position; we rely on the binlog position instead (see the sketch after this list).

  • OnGTID is used to track transaction IDs.
  • OnRowsQueryEvent is used to track executed queries. We don't need this for CDC.
  • OnTableChanged - we don't propagate table changes to CDC consumers.
  • OnDDL - we don't propagate DDL changes.

return nil
}

if msgType, ok := lastMsg.MetaGet("type"); ok && msgType == "snapshot" {
Collaborator:

This is still incorrect.



}
}

func (s *Snapshot) prepareSnapshot(ctx context.Context) (*mysql.Position, error) {
Collaborator:

@toddfarmer do you have any concerns about how we take a snapshot of the DB?


func init() {
	err := service.RegisterBatchInput(
		"mysql_stream", mysqlStreamConfigSpec,
Collaborator:

Based on some internal conversations, we should rename this to make it more explicit that it's a CDC connector.

Suggested change:
-		"mysql_stream", mysqlStreamConfigSpec,
+		"mysql_cdc", mysqlStreamConfigSpec,

@rockwotj rockwotj self-requested a review December 20, 2024 21:04