`ignore nulls` in over window function #17601

lmatz · 2024-07-08T07:04:04Z

Is your feature request related to a problem? Please describe.

For example, LAST_VALUE(order_id IGNORE NULLS) is supported in popular databases such as MySQL, DuckDB, Redshift, Snowflake, and StarRocks.

Four users have brought up use cases in which IGNORE NULLS is a must for expressing their workload. Without it, they not only have to impose awkward limitations on the application semantics they want to convey, but also get much worse performance.

The use case is similar to the one described in the question: https://stackoverflow.com/questions/37470931/how-to-ignore-nulls-in-postgresql-window-functions-or-return-the-next-non-null:

I need another column indicating the next non-null COL1 value for each row

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

lmatz · 2024-07-11T10:05:44Z

Just documenting an example of finding the closest index events for each trade event.

Each row in the table merge is either an index event or a trade event: exactly one of the index and trades columns is NULL, and the other is non-NULL.

CREATE TABLE merge (
    index varchar,
    trades varchar,
    event_time INT
);

insert into merge values ('a', NULL, 1);
insert into merge values ('b', NULL, 5);
insert into merge values ('c', NULL, 10);
insert into merge values ('d', NULL, 20);

insert into merge values (NULL, 'X', 2);
insert into merge values (NULL, 'Y', 8);
insert into merge values (NULL, 'W', 10);
insert into merge values (NULL, 'Z', 20);


select * from merge;
           Result
--------------------------
index,trades,event_time
null,X,2
null,Y,8
null,W,10
null,Z,20
a,null,1
b,null,5
c,null,10
d,null,20

Then we construct the following query:

select t2.index, t2.trades, t2.event_time, t2.after, t2.before, t2.after_time, merge.event_time as before_time
from (
    select t1.index, t1.trades, t1.event_time, t1.after, t1.before, merge.event_time as after_time
    from (
        select
            index,
            trades,
            event_time,
            -- nearest non-null index at or after the current row
            first_value(index ignore nulls) over (order by event_time rows between current row and unbounded following) as after,
            -- nearest non-null index at or before the current row
            last_value(index ignore nulls) over (order by event_time rows between unbounded preceding and current row) as before
        from merge
    ) t1
    -- look up the event_time of the "after" index event
    inner join merge on t1.after = merge.index
) t2
-- look up the event_time of the "before" index event
inner join merge on t2.before = merge.index
-- keep only the trade rows
where t2.index is null;

            Result
-------------------------
index,trades,event_time,after,before,after_time,before_time
null,X,2,b,a,5,1
null,Y,8,c,b,10,5
null,W,10,c,b,10,5
null,Z,20,d,c,20,10
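
To make the semantics of the query above concrete, here is a minimal Python sketch that models the two IGNORE NULLS window calls and the two joins on the same data. It is only an illustration, not how any database evaluates the query. The helper name carry is made up for this sketch, and the tie-breaking rule (trade rows sort before index rows at the same event_time) is an assumption chosen to match the result shown above, since ORDER BY event_time alone leaves the order of ties unspecified.

```python
# (index, trades, event_time) rows copied from the `merge` table above.
rows = [
    ("a", None, 1), ("b", None, 5), ("c", None, 10), ("d", None, 20),
    (None, "X", 2), (None, "Y", 8), (None, "W", 10), (None, "Z", 20),
]

# Assumption: on equal event_time, trade rows come before index rows.
ordered = sorted(rows, key=lambda r: (r[2], r[0] is not None))


def carry(values):
    """Replace each null with the most recent non-null value seen so far."""
    out, seen = [], None
    for v in values:
        if v is not None:
            seen = v
        out.append(seen)
    return out


indexes = [r[0] for r in ordered]
# last_value(index ignore nulls) over rows unbounded preceding .. current row
before = carry(indexes)
# first_value(index ignore nulls) over rows current row .. unbounded following
after = carry(indexes[::-1])[::-1]

# Stand-in for the two inner joins: look up the event_time of an index event.
time_of = {r[0]: r[2] for r in rows if r[0] is not None}

result = [
    (trades, t, after[i], before[i], time_of[after[i]], time_of[before[i]])
    for i, (index, trades, t) in enumerate(ordered)
    # Keep trade rows only; like the inner joins, drop rows with no match.
    if index is None and after[i] is not None and before[i] is not None
]

for row in result:
    print(row)
```

Each printed tuple is (trades, event_time, after, before, after_time, before_time).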

github-actions · 2024-10-18T02:03:27Z

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄

lmatz · 2024-10-18T03:05:53Z

Now that ASOF JOIN is supported, work on IGNORE NULLS can be paused unless a real use case comes up.
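
For readers unfamiliar with as-of matching, here is a rough Python sketch of the general idea applied to the index/trade example from the earlier comment. It is not RisingWave's ASOF JOIN syntax or implementation. The function name asof_before and the inclusive comparison (event_time <= trade time) are assumptions made for this sketch.

```python
from bisect import bisect_right

# (event_time, index) pairs, sorted by event_time.
index_events = [(1, "a"), (5, "b"), (10, "c"), (20, "d")]
# (event_time, trades) pairs.
trades = [(2, "X"), (8, "Y"), (10, "W"), (20, "Z")]

times = [t for t, _ in index_events]


def asof_before(trade_time):
    """Return the latest index event with event_time <= trade_time, or None."""
    pos = bisect_right(times, trade_time)
    return index_events[pos - 1][1] if pos else None


matched = [(name, t, asof_before(t)) for t, name in trades]
print(matched)
```

With the inclusive comparison, a trade at time 10 matches the index event at time 10. A strict comparison (bisect_left instead of bisect_right) would match the previous index event instead.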

stdrc · 2024-11-29T06:36:29Z

Recently a user asked about the following use case:

create table raw_data (
    ts timestamp,
    foo int,
    bar int
);
insert into raw_data values (now(), null, 1);
insert into raw_data values (now(), null, 10);
insert into raw_data values (now(), 7, null);
insert into raw_data values (now(), null, 8);

create materialized view mv1 as
select
    ts,
    last_value(foo ignore nulls) over (order by ts) as foo,
    last_value(bar ignore nulls) over (order by ts) as bar
from raw_data;

create materialized view mv2 as
select
    ts,
    last_value(foo) filter (where foo is not null) over (order by ts) as foo,
    last_value(bar) filter (where bar is not null) over (order by ts) as bar
from raw_data;

-- Desired output:
ts  |  foo  |  bar
--------------------
... |  null |  1
... |  null |  10
... |  7    |  10
... |  7    |  8

Each new row inserted into raw_data updates only one data column and leaves the other columns null. The two MVs, which are equivalent, are what the user wants. However, we don't support either of the two syntaxes yet.
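
As a sanity check on the desired output, the following Python sketch models what last_value(col ignore nulls) over (order by ts) would return for each column. It assumes the four rows are ordered by ts in insertion order. The function name last_value_ignore_nulls is made up for this illustration.

```python
from itertools import accumulate

# (foo, bar) values of raw_data, assumed to be in ts order.
rows = [(None, 1), (None, 10), (7, None), (None, 8)]


def last_value_ignore_nulls(column):
    """Running 'most recent non-null value'; stays None until one appears."""
    return list(accumulate(column, lambda prev, cur: cur if cur is not None else prev))


foo = last_value_ignore_nulls([r[0] for r in rows])
bar = last_value_ignore_nulls([r[1] for r in rows])

filled = list(zip(foo, bar))
print(filled)
```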

But I found an interesting workaround using SINK INTO TABLE and CHANGELOG, for this specific case:

create table raw_data (
    ts timestamp,
    foo int,
    bar int
) append only;

create table latest (
    id int,
    ts timestamp,
    foo int,
    bar int,
    primary key (id)
) on conflict do update if not null with version column (ts);

create sink s into latest as
select 1 as id, ts, foo, bar from raw_data;

create materialized view changes as
with cl as changelog from latest
select ts, foo, bar from cl where id = 1 and (changelog_op = 1 or changelog_op = 3);

The changes MV will then contain what the user wants.
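
The idea behind the workaround can be sketched in Python as follows. This is only a model under stated assumptions, not RisingWave's actual behavior: it assumes that the table keeps a single row (id = 1), that a non-null incoming column overwrites the stored one, that a row with an older ts than the stored one is skipped, and that the filtered changelog exposes the row state after each insert or update. The function name apply_updates and the integer timestamps are made up for illustration.

```python
def apply_updates(rows):
    """Fold (ts, foo, bar) rows into one stored row, recording each new state."""
    latest = None
    states = []
    for ts, foo, bar in rows:
        if latest is None:
            # First row: plain insert.
            latest = {"ts": ts, "foo": foo, "bar": bar}
        elif ts >= latest["ts"]:
            # Assumed version-column rule: only apply rows that are not stale.
            latest = {
                "ts": ts,
                "foo": foo if foo is not None else latest["foo"],
                "bar": bar if bar is not None else latest["bar"],
            }
        else:
            continue  # stale row, no change recorded
        states.append((latest["ts"], latest["foo"], latest["bar"]))
    return states


# Integer timestamps stand in for the now() values of the original example.
changes = apply_updates([(1, None, 1), (2, None, 10), (3, 7, None), (4, None, 8)])
print(changes)
```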

tabVersion · 2024-11-29T07:01:14Z

worth a blog 🤣 #HackOnRisingWave

lmatz added the type/feature label Jul 8, 2024

github-actions bot added this to the release-1.10 milestone Jul 8, 2024

lmatz removed this from the release-1.10 milestone Jul 10, 2024

stdrc self-assigned this Aug 13, 2024

stdrc mentioned this issue Aug 13, 2024

feat(parser): parse IGNORE NULLS in (window) function calls #18028

Merged

9 tasks

github-actions bot added the no-issue-activity label Oct 18, 2024

stdrc mentioned this issue Nov 29, 2024

Aggregate function with filter over window #11506

Open

stdrc linked a pull request Dec 18, 2024 that will close this issue

feat(expr): support IGNORE NULLS for first_value/last_value #19847

Open

8 tasks

stdrc removed the no-issue-activity label Dec 19, 2024
