Processing values from adjacent pipeline inputs #5308
Comments
For comparison, here's the equivalent in SQL using
There was another recent community Slack thread with a use case that seems to fit under this same umbrella. Their example started with these two separate input records:
with the user's goal being to replace the embedded record under
To achieve this with what's in the language currently, it seems inevitable that the program has to kick off with an aggregation: assembling the desired output requires combining fields from separate record values in the input data stream, so the multiple values first need to be gathered into a single complex value. Then the complex value can be manipulated to move the embedded record. On their own, the user came up with this approach that uses array indexing.
This has the advantage of not having to know anything about the second input value, i.e., it works just as well even if it's not a record.
As a possible alternative that may fit the user's original stated goal of moving a record, I proposed using the idiom below that turns the whole of the input into a single record, which then allows for the move via a single call to
In conclusion, however, we recognize the language would require some enhancement to achieve this more directly, similar to what was shown above with SQL's |
tl;dr
A community user recently inquired about how to compute the delta between the values in back-to-back input records. The only solution we could come up with using existing building blocks relies on `collect()` to turn the whole of the input into an array and process it with `over`, but this has perf/memory limitations. An approach that works in streaming mode in a pipeline context would be preferred.
Details
Repro is with Zed commit d103420.
The user question as posed in a community Slack thread:
The best we could come up with is a program like this that reads the whole of the input to an array first.
The user is quite happy with this solution since it works fine for the amount of data they've got. However, knowing Zed's per-value size limitations, this approach can only go so far. The fact that the entire array needs to be assembled first also incurs a performance hit, plus the disadvantage that it can't stream outputs incrementally.
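To make the trade-off concrete outside of Zed, here is a minimal Python sketch (not Zed syntax, and not a proposed design). The function names `deltas_collected` and `deltas_streaming` are hypothetical, and the inputs are assumed to be plain numbers. The first function mirrors the collect-then-process approach and holds the whole input in memory; the second keeps only the previous value and can emit each delta as soon as the next value arrives.

```python
from typing import Iterable, Iterator, List


def deltas_collected(values: Iterable[float]) -> List[float]:
    """Buffer the entire input first, then compute adjacent differences.

    Memory grows with the input size, and nothing is emitted until
    the whole input has been read.
    """
    buffered = list(values)
    return [b - a for a, b in zip(buffered, buffered[1:])]


def deltas_streaming(values: Iterable[float]) -> Iterator[float]:
    """Compute adjacent differences while holding only one prior value."""
    it = iter(values)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: no deltas to emit
    for cur in it:
        yield cur - prev
        prev = cur
```

Both produce the same deltas for the same input; the difference is only in how much state is held and when the first output becomes available.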
While the design of a precise solution is TBD, in preliminary chats, the topic of "window functions" and `LAG`/`LEAD` as they appear in SQL comes to mind.
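As a rough illustration of what `LAG`/`LEAD`-style semantics imply for a streaming implementation, here is a Python sketch under stated assumptions: `with_lag` and `with_lead` are hypothetical helpers, `None` stands in for SQL's `NULL` default, and no partitioning or ordering clause is modeled. The point is that a lag of `offset` needs a bounded buffer of `offset` past values, while a lead has to delay its output by `offset` values.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def with_lag(values: Iterable[T], offset: int = 1,
             default: Optional[T] = None) -> Iterator[Tuple[T, Optional[T]]]:
    """Yield (current, value seen `offset` positions earlier)."""
    if offset < 1:
        raise ValueError("offset must be >= 1")
    window: Deque[T] = deque(maxlen=offset)  # bounded: at most `offset` values
    for cur in values:
        lagged = window[0] if len(window) == offset else default
        yield cur, lagged
        window.append(cur)


def with_lead(values: Iterable[T], offset: int = 1,
              default: Optional[T] = None) -> Iterator[Tuple[T, Optional[T]]]:
    """Yield (current, value seen `offset` positions later)."""
    if offset < 1:
        raise ValueError("offset must be >= 1")
    pending: Deque[T] = deque()  # values still waiting for their lead value
    for nxt in values:
        pending.append(nxt)
        if len(pending) > offset:
            yield pending.popleft(), nxt
    while pending:  # the tail of the input has no lead value
        yield pending.popleft(), default
```

With this shape, the delta between back-to-back values is `cur - prev` over the pairs from `with_lag(values)`, skipping the first pair, whose lagged value is the default.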