Vector query with "where" clause returns incorrect count #5468

Closed
philrz opened this issue Nov 12, 2024 · 1 comment · Fixed by #5503
Labels
bug Something isn't working

Comments

philrz commented Nov 12, 2024

Repro is with super commit 411514f.

This is a simplification of the bench2/q4 query.

Test data is the contents of repro.jsup.gz:

{log_time:2012-01-01T00:00:44Z,client_ip:249.92.17.134,request:"/courses/cs106/2004/Assignments/rudimentary-interp.html",status_code:304(uint16),object_size:0(uint64)}(=bench2)
{log_time:2012-10-01T00:24:30Z,client_ip:249.92.17.134,request:"/people/sr/",status_code:200(uint16),object_size:2242(uint64)}(=bench2)
{log_time:2012-05-12T10:23:22Z,client_ip:251.58.48.137,request:"/robots.txt",status_code:404(uint16),object_size:506(uint64)}(=bench2)

This query against the original Super JSON returns the expected result.

$ super -version
Version: v1.18.0-142-g411514fd

$ super -c '
summarize
    num_requests := count()
    where log_time >= 2012-10-01T00:00:00Z
    by client_ip
' repro.jsup.gz 

{client_ip:249.92.17.134,num_requests:1(uint64)}
{client_ip:251.58.48.137,num_requests:0(uint64)}

However, if I convert the Super JSON to Super Columnar and repeat the same query, the counts are too high.

$ super -f csup -o repro.csup repro.jsup.gz 

$ super dev vector query '
summarize
    num_requests := count()
    where log_time >= 2012-10-01T00:00:00Z
    by client_ip
' repro.csup

{client_ip:251.58.48.137,num_requests:1(uint64)}
{client_ip:249.92.17.134,num_requests:2(uint64)}

If I drop the where clause, the results match.
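
To make the expected numbers easy to check independently of `super`, here is a minimal Python sketch that recomputes the per-`client_ip` counts from the three test records, both with and without the `where` predicate. The helper name `count_by_ip` is hypothetical and exists only for this illustration; it is not part of `super`. It shows that the counts returned for `repro.csup` (2 and 1) are the same numbers one gets when the predicate is ignored.

```python
from datetime import datetime, timezone

# The three records from repro.jsup.gz, reduced to the two fields the query uses.
records = [
    ("249.92.17.134", datetime(2012, 1, 1, 0, 0, 44, tzinfo=timezone.utc)),
    ("249.92.17.134", datetime(2012, 10, 1, 0, 24, 30, tzinfo=timezone.utc)),
    ("251.58.48.137", datetime(2012, 5, 12, 10, 23, 22, tzinfo=timezone.utc)),
]
cutoff = datetime(2012, 10, 1, tzinfo=timezone.utc)


def count_by_ip(rows, predicate=None):
    """Count rows per client_ip; every key appears even if no row matches."""
    counts = {ip: 0 for ip, _ in rows}
    for ip, log_time in rows:
        if predicate is None or predicate(log_time):
            counts[ip] += 1
    return counts


# Counts with the where predicate applied (the expected result).
filtered = count_by_ip(records, lambda t: t >= cutoff)
# Counts with the predicate ignored (matches the incorrect csup output).
unfiltered = count_by_ip(records)

print(filtered)    # {'249.92.17.134': 1, '251.58.48.137': 0}
print(unfiltered)  # {'249.92.17.134': 2, '251.58.48.137': 1}
```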

philrz added the bug label Nov 12, 2024
mattnibs added a commit that referenced this issue Nov 25, 2024
Fix issue with incorrect counts when using where clauses on an
aggregation function in the vector runtime. If a value fails the where
clause, set the value to null so that it is skipped by the aggregation
function.

Closes #5468
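
The approach described in the commit message can be sketched in a few lines of Python. This is only an illustration of the null-masking idea on the three test records, not the actual `super` vector runtime code: the function names `masked` and `count_by_key` are hypothetical, and the `log_time` column stands in for whatever input the aggregation function receives.

```python
def masked(values, keep):
    """Replace each value whose row fails the where predicate with None (null)."""
    return [v if k else None for v, k in zip(values, keep)]


def count_by_key(keys, values):
    """Grouped count that skips nulls, so a fully masked group still reports 0."""
    counts = {}
    for key, value in zip(keys, values):
        counts.setdefault(key, 0)
        if value is not None:
            counts[key] += 1
    return counts


# Columnar view of the test data (one list per field).
client_ip = ["249.92.17.134", "249.92.17.134", "251.58.48.137"]
log_time = ["2012-01-01T00:00:44Z", "2012-10-01T00:24:30Z", "2012-05-12T10:23:22Z"]

# Evaluate the where clause once per row. Timestamps in this fixed UTC
# format compare correctly as plain strings.
keep = [t >= "2012-10-01T00:00:00Z" for t in log_time]

print(count_by_key(client_ip, masked(log_time, keep)))
# {'249.92.17.134': 1, '251.58.48.137': 0}
```

Because the group keys are left untouched and only the aggregated values are nulled, a group whose rows all fail the predicate still appears in the output with a count of 0, which matches the expected result for `251.58.48.137`.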
mattnibs added a commit that referenced this issue Nov 26, 2024
Fix issue with incorrect counts when using where clauses on an
aggregation function in the vector runtime. If a value fails the where
clause, set the value to null so that it is skipped by the aggregation
function.

Closes #5468
philrz commented Nov 26, 2024

Verified in super commit f030936.

The query now returns the correct count.

$ super -version
Version: v1.18.0-171-gf0309369

$ super dev vector query '
summarize
    num_requests := count()
    where log_time >= 2012-10-01T00:00:00Z
    by client_ip
' repro.csup

{client_ip:249.92.17.134,num_requests:1(uint64)}
{client_ip:251.58.48.137,num_requests:0(uint64)}

Thanks @mattnibs!
