[kv] Support index lookup for primary key table #222

Merged
merged 2 commits into from
Jan 3, 2025

Conversation

swuferhong
Collaborator

Purpose

Linked issue: #65

Index lookup exposes lookup capabilities built on top of secondary indexes. A secondary index locates the required rows quickly, and this can be combined with Flink to implement delta joins.
This PR provides index lookup for kv tables. The approach is to define the key of the kv storage as "secondary keys + primary key" and to set the bucket key to the secondary keys. A lookup by the secondary keys can then identify the corresponding bucket and server directly, which gives efficient point queries.
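To illustrate the idea, here is a minimal sketch in Python rather than the actual Fluss code. The names `ToyKvTable`, `bucket_for`, and `NUM_BUCKETS` are hypothetical, and a sorted in-memory list stands in for the real kv storage. It only shows how storing rows under "secondary keys + primary key" and bucketing by the secondary keys lets a lookup touch a single bucket and scan a single key prefix.

```python
import bisect
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for this sketch


def bucket_for(secondary_key: tuple) -> int:
    """Route a row by its secondary key columns only (the bucket key)."""
    encoded = "\x00".join(map(str, secondary_key)).encode()
    return zlib.crc32(encoded) % NUM_BUCKETS


class ToyKvTable:
    """Toy model: each bucket is a list of (stored_key, row) sorted by key."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def put(self, secondary_key, primary_key, row):
        # The stored key is "secondary keys + primary key".
        stored_key = tuple(secondary_key) + tuple(primary_key)
        bucket = self.buckets[bucket_for(tuple(secondary_key))]
        keys = [k for k, _ in bucket]
        i = bisect.bisect_left(keys, stored_key)
        if i < len(bucket) and bucket[i][0] == stored_key:
            bucket[i] = (stored_key, row)  # upsert an existing key
        else:
            bucket.insert(i, (stored_key, row))

    def lookup_by_secondary(self, secondary_key):
        # Only one bucket is consulted, then a prefix scan inside it.
        prefix = tuple(secondary_key)
        bucket = self.buckets[bucket_for(prefix)]
        keys = [k for k, _ in bucket]
        i = bisect.bisect_left(keys, prefix)
        rows = []
        while i < len(bucket) and bucket[i][0][:len(prefix)] == prefix:
            rows.append(bucket[i][1])
            i += 1
        return rows


table = ToyKvTable()
table.put(("u1",), (102,), {"user_id": "u1", "order_id": 102})
table.put(("u1",), (101,), {"user_id": "u1", "order_id": 101})
table.put(("u2",), (103,), {"user_id": "u2", "order_id": 103})
matches = table.lookup_by_secondary(("u1",))
```

In this sketch, all rows that share a secondary key land in the same bucket and sit next to each other in key order, so the lookup is one routing step plus one contiguous scan.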

Tests

API and Format

Documentation

@wuchong wuchong linked an issue Dec 18, 2024 that may be closed by this pull request
2 tasks
@swuferhong swuferhong force-pushed the index-lookup-1216 branch 3 times, most recently from b95540f to 90f6295 Compare December 20, 2024 09:55
Member

@wuchong wuchong left a comment

I think our current index is not a general index; it is just an index on a prefix of the primary key. So it is really a prefix scan/lookup on a prefix of the primary key (and the prefix should include the bucket key). I don't want to call this indexLookup, because that name occupies the API for a possible future index (an index on arbitrary columns).

How about changing the API to prefixLookup? The key parameter should be a prefix of the primary key and must include the bucket key. For DDL, we don't need to introduce a new option table.index.keys; we can just continue to use bucket.key.

Because we don't enforce that the bucket key is a prefix of the primary key, we have to add some best practices for Delta Join cases to the future documentation. For tables used in DeltaJoin queries, the best practice is to put the bucket key columns before the other columns in the primary key definition. Otherwise, prefixLookup doesn't work when the key parameter contains only the bucket key. For example, take a primary key table orders with schema (user_id, item_id, order_id, col1, col2, col3), where order_id is unique and could serve as the primary key on its own. If the join key is (user_id, item_id), the primary key of the table must be set to (user_id, item_id, order_id) and the bucket key to (user_id, item_id). prefixLookup will not work if the primary key is set to (order_id, user_id, item_id), because the join key is not a prefix of the primary key.
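The rule in this comment can be sketched as a small check. This is an illustration only, not Fluss code: the function name `can_prefix_lookup` is hypothetical, and it encodes just the two conditions stated above (the lookup key is a prefix of the primary key, and it includes every bucket key column).

```python
def can_prefix_lookup(primary_key, bucket_key, lookup_key):
    """Return True if lookup_key satisfies the two conditions from the review.

    1. lookup_key is a (non-empty) prefix of primary_key, in column order.
    2. lookup_key contains every column of bucket_key.
    """
    pk, bk, lk = list(primary_key), list(bucket_key), list(lookup_key)
    is_prefix = len(lk) <= len(pk) and pk[:len(lk)] == lk
    includes_bucket_key = set(bk) <= set(lk)
    return bool(lk) and is_prefix and includes_bucket_key


join_key = ["user_id", "item_id"]

# Bucket key columns come first in the primary key: the join key is a prefix.
recommended = can_prefix_lookup(
    primary_key=["user_id", "item_id", "order_id"],
    bucket_key=["user_id", "item_id"],
    lookup_key=join_key,
)

# order_id comes first: the join key is no longer a prefix of the primary key.
discouraged = can_prefix_lookup(
    primary_key=["order_id", "user_id", "item_id"],
    bucket_key=["user_id", "item_id"],
    lookup_key=join_key,
)
```

With the orders example, `recommended` evaluates to `True` and `discouraged` to `False`, which matches the two primary key orderings described in the comment.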

@swuferhong swuferhong force-pushed the index-lookup-1216 branch 2 times, most recently from eeff7c0 to 23bc3fd Compare December 26, 2024 08:17
@swuferhong
Collaborator Author

@wuchong comments addressed. PR ready

@swuferhong swuferhong force-pushed the index-lookup-1216 branch 2 times, most recently from c26f475 to 593cb02 Compare December 26, 2024 10:12
Member

@wuchong wuchong left a comment

I will push a commit to address the comments.

wuchong added a commit to swuferhong/fluss that referenced this pull request Jan 2, 2025
In the previous commit, the bucket key was only allowed to be a prefix of the primary key, but we should allow it to be a subset of the primary key. This commit also fixes various bugs around PrefixLookup.
@wuchong wuchong force-pushed the index-lookup-1216 branch from 593cb02 to 64993c7 Compare January 2, 2025 16:10
@wuchong wuchong force-pushed the index-lookup-1216 branch from 64993c7 to 0f63302 Compare January 2, 2025 16:15
@swuferhong
Collaborator Author

@wuchong LGTM +1, and CI passed.

@wuchong wuchong merged commit 2c3fff4 into alibaba:main Jan 3, 2025
2 checks passed
wuchong pushed a commit that referenced this pull request Jan 3, 2025
Development

Successfully merging this pull request may close these issues.

[Feature] Fluss support index lookup for primary key table
2 participants