Aggregator support batch serialize #9777

Merged
ti-chi-bot merged 54 commits into pingcap:master from hashagg_batch_serialize on Mar 10, 2025

@guo-shaoge guo-shaoge commented Jan 9, 2025

What problem does this PR solve?

Issue Number: close #9692

Problem Summary: Reduce virtual function calls for the key_serialized and key_string group-by methods.

  1. For key_serialized, batch-wise serialization/deserialization reduces virtual function calls. It also enables prefetch for key_serialized, because hashes can be computed in batches after this PR.
  2. For key_string, batch-wise sortKey reduces virtual function calls.
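
To illustrate the idea in the summary above, here is a minimal sketch of row-wise versus batch-wise key serialization, followed by batch hashing with prefetch. It is not the code from this PR: `Column`, `Int64Column`, `serializeRow`, `serializeBatch`, `hashBatch`, and `countWithPrefetch` are hypothetical names, the key layout is simplified to raw 8-byte values, and `__builtin_prefetch` is assumed to be available (GCC/Clang).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column interface for illustration; not TiFlash's IColumn.
struct Column
{
    virtual ~Column() = default;
    // Row-wise: costs one virtual call per row for every key column.
    virtual void serializeRow(size_t row, std::string & out) const = 0;
    // Batch-wise: costs one virtual call per batch for every key column.
    virtual void serializeBatch(size_t rows, std::vector<std::string> & out) const = 0;
};

struct Int64Column final : Column
{
    std::vector<int64_t> data;
    explicit Int64Column(std::vector<int64_t> d) : data(std::move(d)) {}

    void serializeRow(size_t row, std::string & out) const override
    {
        out.append(reinterpret_cast<const char *>(&data[row]), sizeof(int64_t));
    }

    void serializeBatch(size_t rows, std::vector<std::string> & out) const override
    {
        assert(rows <= data.size() && out.size() >= rows);
        // Tight loop with no virtual dispatch inside.
        for (size_t i = 0; i < rows; ++i)
            out[i].append(reinterpret_cast<const char *>(&data[i]), sizeof(int64_t));
    }
};

// rows * columns virtual calls.
std::vector<std::string> serializeKeysRowWise(const std::vector<const Column *> & cols, size_t rows)
{
    std::vector<std::string> keys(rows);
    for (size_t r = 0; r < rows; ++r)
        for (const Column * c : cols)
            c->serializeRow(r, keys[r]);
    return keys;
}

// Only one virtual call per column for the whole batch.
std::vector<std::string> serializeKeysBatch(const std::vector<const Column *> & cols, size_t rows)
{
    std::vector<std::string> keys(rows);
    for (const Column * c : cols)
        c->serializeBatch(rows, keys);
    return keys;
}

// Once every key of the batch exists, all hashes can be computed up front.
std::vector<size_t> hashBatch(const std::vector<std::string> & keys)
{
    std::vector<size_t> hashes(keys.size());
    std::hash<std::string> hasher;
    for (size_t i = 0; i < keys.size(); ++i)
        hashes[i] = hasher(keys[i]);
    return hashes;
}

// With hashes known in advance, the bucket needed a few rows later can be
// prefetched while the current row is processed. `buckets` stands in for a
// hash table; its size must be a power of two.
void countWithPrefetch(const std::vector<size_t> & hashes, std::vector<uint32_t> & buckets)
{
    assert(!buckets.empty() && (buckets.size() & (buckets.size() - 1)) == 0);
    constexpr size_t distance = 8; // illustrative prefetch distance
    const size_t mask = buckets.size() - 1;
    for (size_t i = 0; i < hashes.size(); ++i)
    {
#if defined(__GNUC__) || defined(__clang__)
        if (i + distance < hashes.size())
            __builtin_prefetch(&buckets[hashes[i + distance] & mask]);
#endif
        ++buckets[hashes[i] & mask];
    }
}
```

Both serialization paths produce identical keys; the batch path only changes how often the virtual dispatch happens. The prefetch step depends on the batch path because a row-at-a-time loop does not know the hash of a later row yet.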

Workload: tpch-50g
Queries: same with #9679

  # --q5 Q3-1: key_serialized as group by method;  33/300M; HashMap with StringRef key
  "select /*+ mpp_1phase_agg() */ sum(l_discount), l_returnflag from lineitem group by l_returnflag, l_discount;"
  # --q6 Q3-2: key_serialized as group by method; 77M/300M; HashMap with StringRef key
  "select /*+ mpp_1phase_agg() */ sum(l_discount) as csum, l_returnflag from lineitem group by l_returnflag, l_discount, l_extendedprice having csum > 100;"


  # --q7 Q4-1: two_keys_num64_strbinpadding: 21/300M; HashMap with StringRef key
  "select /*+ mpp_1phase_agg() */ sum(l_discount) from lineitem group by l_returnflag, L_LINENUMBER;"
  # --q8 Q4-2: two_keys_num64_strbinpadding; 29.9M/300M; HashMap with StringRef key
  "select /*+ mpp_1phase_agg() */ sum(l_discount) as csum, l_partkey from lineitem group by l_returnflag, l_partkey having csum > 100;"
 
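| Query | nightly-20250205 | opt-only_batch | opt-batch+prefetch | rate-opt_only_batch | rate-opt_batch+prefetch |
|-------|------------------|----------------|--------------------|---------------------|-------------------------|
| Q3-1  | 1.79             | 1.54           | 1.58               | 13.97%              | 11.73%                  |
| Q3-2  | 7.45             | 6.13           | 4.9                | 17.72%              | 34.23%                  |
| Q4-1  | 2                | 2.07           | 2.09               | -3.50%              | -4.50%                  |
| Q4-2  | 5.34             | 4.48           | 3.57               | 16.10%              | 33.15%                  |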
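
The rate columns are not defined in the PR text. The numbers are consistent with a relative improvement over the nightly build, (nightly - opt) / nightly, so a positive rate means the optimized build was faster. A small sketch of that calculation, where `improvementPercent` is a hypothetical helper name and the formula is inferred from the reported figures:

```cpp
#include <cassert>
#include <cmath>

// Relative improvement in percent. Positive: the optimized build is faster
// than the baseline. Negative: it is slower (a regression).
// The formula is inferred from the reported numbers, not stated in the PR.
double improvementPercent(double baseline, double optimized)
{
    assert(baseline > 0.0);
    return (baseline - optimized) / baseline * 100.0;
}
```

For example, `improvementPercent(1.79, 1.54)` is about 13.97, which matches the Q3-1 entry for opt-only_batch, and `improvementPercent(2.0, 2.07)` is about -3.5, which matches the Q4-1 regression.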

Workload: clickbench
Queries: https://github.com/ClickHouse/ClickBench/blob/fdfdb5d94f2a668dce1f63d55498aa34510e4c9c/clickhouse/queries.sql#L11

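| Query | nightly-20250205 | opt-only_batch | opt-batch+prefetch | rate-opt_only_batch | rate-opt_batch+prefetch |
|-------|------------------|----------------|--------------------|---------------------|-------------------------|
| q10   | 242.6            | 234.7          | 233                | 3.26%               | 3.96%                   |
| q11   | 269.3            | 264.6          | 251.2              | 1.75%               | 6.72%                   |
| q13   | 879.1            | 851.2          | 830.1              | 3.17%               | 5.57%                   |
| q14   | 662.4            | 620.9          | 634                | 6.27%               | 4.29%                   |
| q16   | 1.53             | 1.42           | 1.4                | 7.19%               | 8.50%                   |
| q17   | 1.4              | 1.33           | 1.26               | 5.00%               | 10.00%                  |
| q18   | 4.55             | 3.81           | 3.59               | 16.26%              | 21.10%                  |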

NOTE:

  1. nightly-20250205 commit: fe563a1; opt-batch commit: f20224a
  2. For Q4-1/Q4-2, the nightly build uses two_keys_num64_strbinpadding as the group-by method, while the optimized build uses key_serialized.

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed do-not-merge/needs-linked-issue labels Jan 9, 2025
@guo-shaoge guo-shaoge force-pushed the hashagg_batch_serialize branch 4 times, most recently from ff696d3 to 8bbb062 Compare January 14, 2025 16:06
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 22, 2025
Signed-off-by: guo-shaoge <[email protected]>
@guo-shaoge guo-shaoge force-pushed the hashagg_batch_serialize branch from 30a5b1c to 241537d Compare January 29, 2025 03:57
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 29, 2025
@guo-shaoge guo-shaoge force-pushed the hashagg_batch_serialize branch 3 times, most recently from fc95526 to 3354b7f Compare February 4, 2025 04:59
@guo-shaoge guo-shaoge requested a review from yibin87 February 27, 2025 06:05

@yibin87 yibin87 left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Feb 28, 2025
@guo-shaoge guo-shaoge requested a review from windtalker March 7, 2025 03:25

@windtalker windtalker left a comment

lgtm

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 10, 2025

ti-chi-bot bot commented Mar 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: windtalker, yibin87

ti-chi-bot bot commented Mar 10, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-02-28 01:46:22.415291328 +0000 UTC m=+579530.368449593: ☑️ agreed by yibin87.
  • 2025-03-10 02:49:58.353242397 +0000 UTC m=+151353.958771322: ☑️ agreed by windtalker.

@ti-chi-bot ti-chi-bot bot merged commit 67438e8 into pingcap:master Mar 10, 2025
5 checks passed
@guo-shaoge guo-shaoge deleted the hashagg_batch_serialize branch March 10, 2025 07:53
guo-shaoge added a commit to guo-shaoge/tiflash that referenced this pull request Mar 11, 2025
guo-shaoge added a commit to guo-shaoge/tiflash that referenced this pull request Mar 25, 2025
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Aggregator support batch allocate memory for key_serialized
3 participants