Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unusual behavior of 'Count' with 'SORTBY' #502

Closed
SMAH1 opened this issue Nov 19, 2024 · 4 comments · Fixed by #503
Closed

Unusual behavior of 'Count' with 'SORTBY' #502

SMAH1 opened this issue Nov 19, 2024 · 4 comments · Fixed by #503

Comments

@SMAH1
Copy link

SMAH1 commented Nov 19, 2024

Hi
when I try use Count after Orderby cause return wrong result.
Sample code is:

/*
 * Redis-OM: 0.7.6
 * dotnet core 8.0
 * OS Program Run: Windows 10 22H2
 * Redis Info:
 *     redis_version:7.4.1
 *     redis_git_sha1:00000000
 *     redis_git_dirty:0
 *     redis_build_id:5918102922f65a0d
 *     redis_mode:standalone
 *     os:Linux 5.15.0-124-generic x86_64
 *     arch_bits:64
 *     monotonic_clock:POSIX clock_gettime
 *     multiplexing_api:epoll
 *     atomicvar_api:c11-builtin
 *     gcc_version:11.4.0
 */

using BugRedisOM;
using Redis.OM;
using StackExchange.Redis;

/*
[Document(StorageType = StorageType.Json, Prefixes = new[] { "BugTest:DesktopStatusEventLog" }, IndexName = "bug-desktopstatuseventlog-idx")]
public class DesktopStatusEventLog
{
    [RedisIdField] public string Id { get; set; } = string.Empty;
    [Indexed] public long DateTime { get; set; }
    [Indexed] public string Desktop { get; set; }
}
*/

var redis = ConnectionMultiplexer.Connect("localhost:6379");
var provider = new RedisConnectionProvider(redis);

provider.Connection.CreateIndex(typeof(DesktopStatusEventLog));

var EventLogsSet = provider.RedisCollection<DesktopStatusEventLog>();

string[] desktops = ["D1", "D2", "D3", "D4", "D5", "D6"];

if (!EventLogsSet.Any())
{
    long now = DateTimeOffset.Now.ToUnixTimeSeconds();
    long dtStart = now - 1440 * 60 - 360 * 60;
    long dtEnd = now;

    var rnd = new Random(570);
    var eventLogs = new List<DesktopStatusEventLog>();
    foreach (var item in desktops)
    {
        for (var dt = dtStart; dt < dtEnd; dt += rnd.Next(2, 100 * 60))
        {
            eventLogs.Add(
                new DesktopStatusEventLog()
                {
                    Id = Guid.NewGuid().ToString(),
                    Desktop = desktops[rnd.Next(desktops.Length)],
                    DateTime = dt
                });
        }
    }

    foreach (var item in eventLogs) { EventLogsSet.Insert(item); }
}

var cnt1 = EventLogsSet.AsQueryable().Where(x => x.Desktop == "D1").Count();
var cnt2 = EventLogsSet.AsQueryable().Where(x => x.Desktop == "D1").OrderBy(x => x.DateTime).Count();

// "FT.AGGREGATE" "bug-desktopstatuseventlog-idx" "@Desktop:{D1}" "GROUPBY" "0" "REDUCE" "COUNT" "0" "AS" "COUNT"
// "FT.AGGREGATE" "bug-desktopstatuseventlog-idx" "@Desktop:{D1}" "SORTBY" "2" "@DateTime" "ASC" "GROUPBY" "0" "REDUCE" "COUNT" "0" "AS" "COUNT"

Console.WriteLine("Count All 1 (without Order) is " + cnt1);
Console.WriteLine("Count All 2 (with Order)    is " + cnt2);

In sample code also mentions the generated Redis commends. The result is:

Count All 1 (without Order) is 38
Count All 2 (with Order)    is 10

The result may be compatible with Redis logic, but not with SQL logic (which users probably expect).

@slorello89
Copy link
Member

Interesting!

I think I know what the issue is. Redis OM is correctly creating the command and passing it to Redis, and interpreting the result correctly. But there's a small quirk in FT.AGGREGATE which I've pinged our engineering team to get an answer on.

SORTBY in FT.AGGREGATE has an optional MAX parameter, which is the maximum number of records it will sort before it stops, in general when you see things like this in RediSearch, it's to prevent any crazy long running function from blocking the Redis Server while it executes (e.g. if you had a million records you may not necessarily want to block Redis while it sorts all of them)

So you'll see when you chop off the reducer from the one that's coming back with 10 results:

127.0.0.1:6379> "FT.AGGREGATE" "bug-desktopstatuseventlog-idx" "@Desktop:{D1}" "SORTBY" "2" "@DateTime" "ASC"
 1) (integer) 38
 2) 1) "DateTime"
    2) "1731916575"
 3) 1) "DateTime"
    2) "1731916575"
 4) 1) "DateTime"
    2) "1731916575"
 5) 1) "DateTime"
    2) "1731920080"
 6) 1) "DateTime"
    2) "1731929273"
 7) 1) "DateTime"
    2) "1731929487"
 8) 1) "DateTime"
    2) "1731932922"
 9) 1) "DateTime"
    2) "1731937210"
10) 1) "DateTime"
    2) "1731938997"
11) 1) "DateTime"
    2) "1731940358"

That's what's getting passed down the pipeline to the COUNT reducer (so yep only 10).

Now if you ran the same query with the MAX parameter set to some value, let's try 100:

127.0.0.1:6379> "FT.AGGREGATE" "bug-desktopstatuseventlog-idx" "@Desktop:{D1}" "SORTBY" "2" "@DateTime" "ASC" MAX 100
 1) (integer) 38
 2) 1) "DateTime"
    2) "1731916575"
 3) 1) "DateTime"
    2) "1731916575"
 4) 1) "DateTime"
    2) "1731916575"
 5) 1) "DateTime"
    2) "1731920080"
 6) 1) "DateTime"
    2) "1731929273"
 7) 1) "DateTime"
    2) "1731929487"
 8) 1) "DateTime"
    2) "1731932922"
 9) 1) "DateTime"
    2) "1731937210"
10) 1) "DateTime"
    2) "1731938997"
11) 1) "DateTime"
    2) "1731940358"
12) 1) "DateTime"
    2) "1731942575"
13) 1) "DateTime"
    2) "1731944731"
14) 1) "DateTime"
    2) "1731947221"
15) 1) "DateTime"
    2) "1731948295"
16) 1) "DateTime"
    2) "1731950635"
17) 1) "DateTime"
    2) "1731951415"
18) 1) "DateTime"
    2) "1731951968"
19) 1) "DateTime"
    2) "1731955934"
20) 1) "DateTime"
    2) "1731957986"
21) 1) "DateTime"
    2) "1731961684"
22) 1) "DateTime"
    2) "1731961960"
23) 1) "DateTime"
    2) "1731963696"
24) 1) "DateTime"
    2) "1731963730"
25) 1) "DateTime"
    2) "1731975411"
26) 1) "DateTime"
    2) "1731976673"
27) 1) "DateTime"
    2) "1731978955"
28) 1) "DateTime"
    2) "1731980089"
29) 1) "DateTime"
    2) "1731985585"
30) 1) "DateTime"
    2) "1731989376"
31) 1) "DateTime"
    2) "1731991464"
32) 1) "DateTime"
    2) "1731992808"
33) 1) "DateTime"
    2) "1731998504"
34) 1) "DateTime"
    2) "1731998981"
35) 1) "DateTime"
    2) "1732001322"
36) 1) "DateTime"
    2) "1732009023"
37) 1) "DateTime"
    2) "1732009517"
38) 1) "DateTime"
    2) "1732018501"
39) 1) "DateTime"
    2) "1732021650"

You'll see it comes back with all 38 DateTime's sorted, so if you now pass that into the pipeline you'll get your answer:

127.0.0.1:6379> "FT.AGGREGATE" "bug-desktopstatuseventlog-idx" "@Desktop:{D1}" "SORTBY" "2" "@DateTime" "ASC" MAX 100 GROUPBY 0 REDUCE COUNT 0 as COUNT
1) (integer) 1
2) 1) "COUNT"
   2) "38"

Now, Redis OM .NET does not at the moment support the ability to add a MAX parameter to a sortby, we could add it, it shouldn't be that much of a lift, but I guess my question for you would be - if you are just counting these, why do you need to sort them? the order should be irrelevant for the count.

@slorello89
Copy link
Member

FYI @SMAH1 I spoke with engineering, it is indeed true that the default is 10.

@SMAH1
Copy link
Author

SMAH1 commented Nov 20, 2024

Thanks for your follow-up. But the solution is not to use MAX. Because when we are using Count, we do not know the number to set MAX from!
The correct solution in my opinion is that when using functions like Count, Sort should be completely ignored and not created in the final command to send to Redis.

This problem occurred to me when I was creating a Query function with its Orderby and then sending it to another function where I control the chunk by adding Take and Skip but at the same time I need to know the Count of data.

@slorello89
Copy link
Member

@SMAH1 - I've opened a PR to allow you to set the MAX value, IMO that's the correct path, if a user adds a sortby to the pipeline who are we to say they did not intend to do so. You can just set the MAX to a very high value if you want to give yourself headroom.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants