[kv] Support version merge engine #277

Open
wants to merge 1 commit into base: main

Conversation

@sunxiaojian sunxiaojian commented Dec 27, 2024

Purpose

Support version merge engine

Linked issue: close #213

Tests

com.alibaba.fluss.connector.flink.sink.FlinkTableSinkITCase#testVersionMergeEngineWithTypeBigint
com.alibaba.fluss.connector.flink.sink.FlinkTableSinkITCase#testVersionMergeEngineWithTypeTimestamp
com.alibaba.fluss.client.table.FlussTableITCase#testMergeEngineWithVersion

API and Format

No

Documentation

No

@CLAassistant

CLAassistant commented Dec 27, 2024

CLA assistant check
All committers have signed the CLA.

@sunxiaojian sunxiaojian force-pushed the support-merge-engine-version branch 2 times, most recently from 944bdb3 to 7f115bf Compare December 27, 2024 11:12
@sunxiaojian
Author

@wuchong CLA has been sent, PTAL

@wuchong
Member

wuchong commented Dec 29, 2024

Thanks for the contribution @sunxiaojian ! Will take a look. cc @luoyuxia as well.

@wuchong
Member

wuchong commented Dec 29, 2024

@sunxiaojian sunxiaojian force-pushed the support-merge-engine-version branch 2 times, most recently from b1363d6 to 5b67c20 Compare December 30, 2024 02:12
Collaborator

@luoyuxia luoyuxia left a comment

@sunxiaojian Thanks for the pr... I left some comments... PTAL

throw new IllegalArgumentException(
"When the merge engine is set to version, the 'table.merge-engine.version.column' cannot be empty.");
}
return new MergeEngine(Type.VERSION, column);
Collaborator

We'll need two checks for 'table.merge-engine.version.column':
1: the column exists in the table
2: the data type of the column is supported as a version column
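
Both checks could run once when the table is created. Below is a minimal sketch, assuming a simplified schema modelled as a map from column name to type name. `VersionColumnValidator` and the set of accepted type names are hypothetical and are not part of the Fluss API.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; the class name and the supported-type set are assumptions. */
final class VersionColumnValidator {
    // Assumed set of type names accepted as a version column.
    static final Set<String> SUPPORTED_TYPES = Set.of("BIGINT", "TIMESTAMP", "TIMESTAMP_LTZ");

    /**
     * @param schema column name to type name, standing in for the table's row type
     * @param versionColumn value of 'table.merge-engine.version.column'
     */
    static void validate(Map<String, String> schema, String versionColumn) {
        if (versionColumn == null || versionColumn.isEmpty()) {
            throw new IllegalArgumentException(
                    "When the merge engine is set to version, the "
                            + "'table.merge-engine.version.column' cannot be empty.");
        }
        // Check 1: the column must exist in the table.
        String type = schema.get(versionColumn);
        if (type == null) {
            throw new IllegalArgumentException(
                    "The version column '" + versionColumn + "' does not exist in the table.");
        }
        // Check 2: the column's data type must be usable as a version.
        if (!SUPPORTED_TYPES.contains(type)) {
            throw new IllegalArgumentException(
                    "Unsupported data type for version column: " + type);
        }
    }
}
```

Failing at table creation means a misconfigured table is rejected before any row is written, instead of surfacing as a runtime exception on the write path.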

Author

fixed

> oldRow.getTimestampLtz(fieldIndex, ((TimestampType) dataType).getPrecision())
.toEpochMicros();
} else {
throw new FlussRuntimeException("Unsupported data type: " + dataType.asSummaryString());
Collaborator

Validate the data type of the version column when the table is created.

I also think TINYINT/SMALLINT should be supported.
Maybe the DATE type can be supported as well?

Author

These types are unlikely to be used often as a version column, but support can be added later if necessary.

import com.alibaba.fluss.record.KvRecord;

/** Merge engine wrapper for table. */
public interface MergeEngineWrapper {
Collaborator

@luoyuxia luoyuxia Dec 30, 2024

The methods writeRecord, getAppendedRecordCount, and getLogOffset are interface methods, but only AbstractMergeEngineWrapper... And the update method overridden by VersionMergeEngineWrapper is not an interface method. That looks a little strange to me.
There is also some duplicated code in the update method between AbstractMergeEngineWrapper and VersionMergeEngineWrapper.

I'm wondering whether it would be better to organize these classes as follows:
1: Introduce an interface RowMergeEngine:

public interface RowMergeEngine {

    /**
     * Upsert the old row with the new row. Return the row after upsert, null if the new row is ignored.
     *
     * @param oldRow the old row
     * @param newRow the new row
     */
    @Nullable
    BinaryRow upsertRow(BinaryRow oldRow, BinaryRow newRow);
}

2: And then implement it with a VersionRowMergeEngine:

public class VersionRowMergeEngine implements RowMergeEngine {
    @Nullable
    @Override
    public BinaryRow upsertRow(BinaryRow oldRow, BinaryRow newRow) {
        RowType rowType = schema.toRowType();

        if (checkNewRowVersion(mergeEngine, rowType, oldRow, newRow)) {
            return newRow;
        }
        return null;
    }
}

3: Then we won't need to extract the put-kv logic into AbstractMergeEngineWrapper. Instead, KvTablet holds a RowMergeEngine. If the RowMergeEngine is not null, we use it to merge the old row and the incoming row into a new row. Maybe something like:

byte[] oldValue = getFromBufferOrKv(key);
// it's an update
if (oldValue != null) {
    if (rowMergeEngine == null) {
        newRow = updateRow(oldRow, kvRecord.getRow(), partialUpdater);
    } else {
        newRow = rowMergeEngine.upsertRow(oldRow, kvRecord.getRow());
    }

    // a null newRow means the merge engine ignored the incoming row
    if (newRow != null) {
        walBuilder.append(RowKind.UPDATE_BEFORE, oldRow);
        walBuilder.append(RowKind.UPDATE_AFTER, newRow);
        appendedRecordCount += 2;
        ....
    }
}
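
To make the proposed shape concrete, here is a self-contained toy version of the three steps. It is only a sketch under loud assumptions: rows are plain `long[]` arrays instead of `BinaryRow`, the store is an in-memory map, and every `Sketch*` name is hypothetical. Counting one record for an insert is also an assumption; the comment above only shows the update path adding two.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for the proposed RowMergeEngine; rows are long[] here. */
interface SketchRowMergeEngine {
    /** Returns the row to store, or null if the new row is ignored. */
    long[] upsertRow(long[] oldRow, long[] newRow);
}

/** Accepts the new row only when its version field is strictly greater. */
final class SketchVersionRowMergeEngine implements SketchRowMergeEngine {
    private final int versionIndex;

    SketchVersionRowMergeEngine(int versionIndex) {
        this.versionIndex = versionIndex;
    }

    @Override
    public long[] upsertRow(long[] oldRow, long[] newRow) {
        return newRow[versionIndex] > oldRow[versionIndex] ? newRow : null;
    }
}

/** Toy tablet that holds an optional merge engine, mirroring step 3. */
final class SketchKvTablet {
    private final Map<Long, long[]> store = new HashMap<>();
    private final SketchRowMergeEngine rowMergeEngine; // null means plain overwrite
    int appendedRecordCount = 0;

    SketchKvTablet(SketchRowMergeEngine rowMergeEngine) {
        this.rowMergeEngine = rowMergeEngine;
    }

    void put(long key, long[] row) {
        long[] oldRow = store.get(key);
        if (oldRow == null) {
            // First write for this key (assumed to append a single record).
            store.put(key, row);
            appendedRecordCount += 1;
            return;
        }
        long[] newRow =
                rowMergeEngine == null ? row : rowMergeEngine.upsertRow(oldRow, row);
        if (newRow != null) {
            // UPDATE_BEFORE plus UPDATE_AFTER in the changelog.
            store.put(key, newRow);
            appendedRecordCount += 2;
        }
    }

    long[] get(long key) {
        return store.get(key);
    }
}
```

With this layout the write path stays in one place, and a new merge behaviour only needs another implementation of the single-method interface.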

Author

Thanks, let me first think about how best to make these modifications.

}
DataType dataType = rowType.getTypeAt(fieldIndex);
if (dataType instanceof BigIntType) {
return newRow.getLong(fieldIndex) > oldRow.getLong(fieldIndex);
Collaborator

What if the field value is null? We should use InternalRow.FieldGetter and handle null values.
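
A null-aware comparison might look like the sketch below. The getter is modelled as a plain `java.util.function.Function` rather than the real `InternalRow.FieldGetter`, and the null policy (a null new version is ignored, a null old version is always replaced) is an assumption chosen for illustration, not something decided in this thread.

```java
import java.util.function.Function;

/** Hypothetical helper showing one possible null policy for version comparison. */
final class NullSafeVersionCheck {

    /** Returns true when the new row should replace the old one. */
    static <R> boolean shouldReplace(Function<R, Long> versionGetter, R oldRow, R newRow) {
        Long oldVersion = versionGetter.apply(oldRow);
        Long newVersion = versionGetter.apply(newRow);
        if (newVersion == null) {
            // Assumed policy: a row without a version never overwrites.
            return false;
        }
        if (oldVersion == null) {
            // Assumed policy: any concrete version beats a missing one.
            return true;
        }
        return newVersion > oldVersion;
    }
}
```

Reading the field through a getter that can return null avoids calling a primitive accessor such as `getLong` on a field that holds no value.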

Author

fixed

@@ -124,7 +124,8 @@ void beforeEach() throws Exception {
conf,
Collaborator

Add a unit test for the version merge engine in KvTabletTest.

Author

added

@@ -67,17 +68,21 @@ public class FlinkTableSink
private boolean appliedUpdates = false;
@Nullable private GenericRow deleteRow;

private final MergeEngine mergeEngine;

public FlinkTableSink(
Collaborator

Add validation in FlinkTableSink that throws an exception stating that update/delete/partial update is not supported with the version merge engine, and add a test verifying that the exception is thrown.

Author

@sunxiaojian sunxiaojian Jan 4, 2025

I think updates and partial updates should be supported as long as the update includes the version field.
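
The condition described here could be checked roughly as follows. This is only a sketch of that one rule: `VersionSinkChecks` is a hypothetical name, columns are identified by index, and a null target-column array is assumed to mean a full-row write.

```java
import java.util.Arrays;

/** Hypothetical sink-side check; not part of FlinkTableSink. */
final class VersionSinkChecks {

    /**
     * Rejects a partial update that leaves out the version column.
     *
     * @param versionColumnIndex index of the version column in the table schema
     * @param targetColumnIndexes columns written by the update, or null for all columns
     */
    static void checkPartialUpdate(int versionColumnIndex, int[] targetColumnIndexes) {
        if (targetColumnIndexes == null) {
            return; // full-row write always carries the version column
        }
        boolean included =
                Arrays.stream(targetColumnIndexes).anyMatch(i -> i == versionColumnIndex);
        if (!included) {
            throw new UnsupportedOperationException(
                    "Partial update must include the version column when the "
                            + "version merge engine is used.");
        }
    }
}
```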

@sunxiaojian
Author

> @sunxiaojian Thanks for the pr... I left some comments... PTAL

@luoyuxia Thanks for the review. I will handle it as soon as possible

@sunxiaojian sunxiaojian force-pushed the support-merge-engine-version branch 3 times, most recently from f53900a to c289ee1 Compare January 4, 2025 17:08
@sunxiaojian sunxiaojian force-pushed the support-merge-engine-version branch from c289ee1 to d3134fe Compare January 5, 2025 11:30

Successfully merging this pull request may close these issues.

[Feature] Introduce version merge engine for primary key table
4 participants