[CH] support approx_count_distinct #8528

Open
taiyang-li opened this issue Jan 14, 2025 · 2 comments · May be fixed by #8550

Description

Support approx_count_distinct in the CH backend.

@taiyang-li taiyang-li added the enhancement New feature or request label Jan 14, 2025
@taiyang-li taiyang-li self-assigned this Jan 14, 2025

taiyang-li commented Jan 16, 2025

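Gluten shows no significant performance advantage over vanilla Spark for approx_count_distinct: the grouped query below takes 2.203 s with Gluten versus 2.383 s with vanilla. This needs to be improved.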

Gluten:

0: jdbc:hive2://localhost:10000/> select l_orderkey % 10, approx_count_distinct(l_partkey) from lineitem group by l_orderkey % 10 order by l_orderkey % 10 ;     
+--------------------+-----------------------------------+
| (l_orderkey % 10)  | approx_count_distinct(l_partkey)  |
+--------------------+-----------------------------------+
| 0                  | 18813                             |
| 1                  | 20083                             |
| 2                  | 18534                             |
| 3                  | 18015                             |
| 4                  | 19054                             |
| 5                  | 19177                             |
| 6                  | 19685                             |
| 7                  | 18463                             |
| 8                  | 19816                             |
| 9                  | 18993                             |
+--------------------+-----------------------------------+
10 rows selected (2.203 seconds)


0: jdbc:hive2://localhost:10000/> select approx_count_distinct(l_partkey) from lineitem;  
+-----------------------------------+
| approx_count_distinct(l_partkey)  |
+-----------------------------------+
| 20083                             |
+-----------------------------------+
1 row selected (0.131 seconds)

Vanilla:

0: jdbc:hive2://localhost:10000/> select l_orderkey % 10, approx_count_distinct(l_partkey) from lineitem group by l_orderkey % 10 order by l_orderkey % 10 ; 
+--------------------+-----------------------------------+
| (l_orderkey % 10)  | approx_count_distinct(l_partkey)  |
+--------------------+-----------------------------------+
| 0                  | 18531                             |
| 1                  | 18741                             |
| 2                  | 18387                             |
| 3                  | 18535                             |
| 4                  | 18674                             |
| 5                  | 18444                             |
| 6                  | 18286                             |
| 7                  | 18364                             |
| 8                  | 18415                             |
| 9                  | 19079                             |
+--------------------+-----------------------------------+
10 rows selected (2.383 seconds)


0: jdbc:hive2://localhost:10000/> select approx_count_distinct(l_partkey) from lineitem; 
+-----------------------------------+
| approx_count_distinct(l_partkey)  |
+-----------------------------------+
| 19522                             |
+-----------------------------------+
1 row selected (0.262 seconds)
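
For reference, the ratios implied by the timings above can be worked out with a quick script. This is only a sketch: the numbers are single runs copied from the output above (no warm-up or repetition is reported), and the helper names `speedup` and `relative_gap` are made up for illustration.

```python
# Wall-clock times in seconds, copied from the beeline output above.
timings = {
    "grouped": {"gluten": 2.203, "vanilla": 2.383},
    "ungrouped": {"gluten": 0.131, "vanilla": 0.262},
}

# Ungrouped approx_count_distinct(l_partkey) estimates from the same runs.
ungrouped_estimate = {"gluten": 20083, "vanilla": 19522}


def speedup(query: str) -> float:
    """Vanilla time divided by Gluten time; values above 1 mean Gluten is faster."""
    t = timings[query]
    return t["vanilla"] / t["gluten"]


def relative_gap(a: float, b: float) -> float:
    """Absolute difference between a and b, relative to b."""
    return abs(a - b) / b


if __name__ == "__main__":
    for query in timings:
        print(f"{query}: Gluten speedup {speedup(query):.2f}x")
    gap = relative_gap(ungrouped_estimate["gluten"], ungrouped_estimate["vanilla"])
    print(f"ungrouped estimate gap between engines: {gap:.1%}")
```

The grouped query is the one where the two engines are nearly level. The two engines also return different estimates, which is expected for an approximate aggregate.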

@taiyang-li taiyang-li linked a pull request Jan 16, 2025 that will close this issue