-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support comparison semantics for batch serialize/deserialize of Column #9756
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
a311faa
to
eeb1ac3
Compare
Signed-off-by: guo-shaoge <[email protected]>
eeb1ac3
to
c07d13a
Compare
e19c851
to
7c725cb
Compare
Signed-off-by: guo-shaoge <[email protected]>
7c725cb
to
84ee65b
Compare
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
/retest |
Signed-off-by: guo-shaoge <[email protected]>
3878b4b
to
47cdf91
Compare
@@ -165,10 +165,27 @@ class ColumnAggregateFunction final : public COWPtrHelper<IColumn, ColumnAggrega | |||
|
|||
const char * deserializeAndInsertFromArena(const char * src_arena, const TiDB::TiDBCollatorPtr &) override; | |||
|
|||
void countSerializeByteSizeUnique( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to rename countSerializeByteSizeUnique
to countSerializeUniqueByteSize
for better readability?
Similarly, the name for other methods could follow the same pattern. E.g. serializeToPosUnique
to serializeUniqueToPos
and deserializeAndInsertFromPosUnique
to deserializeUniqueAndInsertFromPos
.
} | ||
else | ||
{ | ||
inline_memcpy(pos[i], &data[array_offsets[start + i - 1]], len * sizeof(T)); | ||
if (len <= 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some comments here to explain why is 4 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not my code. @gengliqi can you help to explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a simple optimization. If the length is very small, copying them one by one is faster than std::memcpy.
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: yibin87 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
38e03ed
to
8070321
Compare
8070321
to
06ff1d4
Compare
Signed-off-by: guo-shaoge <[email protected]>
06ff1d4
to
7572680
Compare
} | ||
else | ||
{ | ||
inline_memcpy(pos[i], &data[array_offsets[start + i - 1]], len * sizeof(T)); | ||
if (len <= 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a simple optimization. If the length is very small, copying them one by one is faster than std::memcpy.
{ | ||
assert(sizeAt(i) >= 1); | ||
// Minus 1 because of terminating zero. | ||
byte_size[i] += sizeof(UInt32) + (sizeAt(i) - 1) * max_bytes_one_char; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size * max_bytes_one_char
may waste lots of memory. For example, a utf8 character is 3 bytes, max_bytes_one_char
is 4 bytes. 4 bytes is enough but here need 12 bytes.
const void * src = &chars[offsetAt(start + i)]; | ||
if constexpr (has_collator) | ||
{ | ||
auto sort_key = collator->sortKey(reinterpret_cast<const char *>(src), str_size - 1, *sort_key_container); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sortKey
is a virtual function. How about adding a batch version to reduce the overhead of virtual functions?
What problem does this PR solve?
Issue Number: close #9761
Problem Summary: #9553 add serialize/deserialize interface by column-wise, but it didn't handle collator for ColumnString and real copy format for ColumnDecimal([sign, limb_count, limb_data])
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note