[CH] support approx_count_distinct #8528
Labels
enhancement
New feature or request
Comments
Velox implementation: https://github.com/apache/incubator-gluten/pull/1676/files
Gluten shows no significant performance advantage over vanilla Spark here (for example, 2.203 s vs. 2.383 s on the grouped query). This needs to be improved.

Gluten:
0: jdbc:hive2://localhost:10000/> select l_orderkey % 10, approx_count_distinct(l_partkey) from lineitem group by l_orderkey % 10 order by l_orderkey % 10 ;
+--------------------+-----------------------------------+
| (l_orderkey % 10) | approx_count_distinct(l_partkey) |
+--------------------+-----------------------------------+
| 0 | 18813 |
| 1 | 20083 |
| 2 | 18534 |
| 3 | 18015 |
| 4 | 19054 |
| 5 | 19177 |
| 6 | 19685 |
| 7 | 18463 |
| 8 | 19816 |
| 9 | 18993 |
+--------------------+-----------------------------------+
10 rows selected (2.203 seconds)
0: jdbc:hive2://localhost:10000/> select approx_count_distinct(l_partkey) from lineitem;
+-----------------------------------+
| approx_count_distinct(l_partkey) |
+-----------------------------------+
| 20083 |
+-----------------------------------+
1 row selected (0.131 seconds)

Vanilla:
0: jdbc:hive2://localhost:10000/> select l_orderkey % 10, approx_count_distinct(l_partkey) from lineitem group by l_orderkey % 10 order by l_orderkey % 10 ;
+--------------------+-----------------------------------+
| (l_orderkey % 10) | approx_count_distinct(l_partkey) |
+--------------------+-----------------------------------+
| 0 | 18531 |
| 1 | 18741 |
| 2 | 18387 |
| 3 | 18535 |
| 4 | 18674 |
| 5 | 18444 |
| 6 | 18286 |
| 7 | 18364 |
| 8 | 18415 |
| 9 | 19079 |
+--------------------+-----------------------------------+
10 rows selected (2.383 seconds)
0: jdbc:hive2://localhost:10000/> select approx_count_distinct(l_partkey) from lineitem;
+-----------------------------------+
| approx_count_distinct(l_partkey) |
+-----------------------------------+
| 19522 |
+-----------------------------------+
1 row selected (0.262 seconds)
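
To put the timings above side by side, here is a small sketch that computes the vanilla-to-Gluten time ratio for each query. The numbers are copied from the beeline output in this comment; the dictionary layout and the `speedup` helper are only illustrative names, not part of Gluten. These are single-run wall-clock times, so the ratios should be read as rough indications rather than a benchmark.

```python
# Timings in seconds, copied from the beeline output in this comment.
timings = {
    "grouped": {"gluten": 2.203, "vanilla": 2.383},
    "global": {"gluten": 0.131, "vanilla": 0.262},
}


def speedup(vanilla_s, gluten_s):
    """Return the vanilla/Gluten time ratio (a value above 1 means Gluten was faster)."""
    return vanilla_s / gluten_s


# Ratio per query (hypothetical helper names, for illustration only).
speedups = {name: speedup(t["vanilla"], t["gluten"]) for name, t in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
# prints:
# grouped: 1.08x
# global: 2.00x
```

On these runs the grouped query is roughly on par with vanilla, while the global aggregate takes about half the time.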
Description
support approx_count_distinct