Extra output when yielding literals as input data #5324

Open
philrz opened this issue Oct 8, 2024 · 1 comment

philrz commented Oct 8, 2024

tl;dr

I can't explain why the first line of output appears twice here.

$ zq -z '
yield [{id:1},{id:2}]
| over this
| left join (
  yield {id:1,name:"a"}
) on id=id name'

{id:1,name:"a"}
{id:1,name:"a"}
{id:2}

Details

Repro is with Zed commit b05e70b. This issue was discovered via a community Slack thread.

These variations both work as expected.

$ zq -version
Version: v1.18.0-18-gb05e70bd

$ cat names.zson
{id:1,name:"a"}

$ zq -z '
yield [{id:1},{id:2}]
| over this
| left join (
  file names.zson
) on id=id name'

{id:1,name:"a"}
{id:2}

$ cat input.zson
{left: {id:1}}
{left: {id:2}}
{right: {id:1,name:"a"}}

$ cat input.zson | zq -z '
switch (
  case has(left) => yield left
  case has(right) => yield right
) | left join on id=id name' -

{id:1,name:"a"}
{id:2}

However, in the user's original program, the record that formed the right-hand input to the join was specified as a yielded record literal. Once we do that, the line {id:1,name:"a"} is repeated in the output.

$ zq -z '
yield [{id:1},{id:2}]
| over this
| left join (
  yield {id:1,name:"a"}
) on id=id name'

{id:1,name:"a"}
{id:1,name:"a"}
{id:2}

@philrz added the bug and community labels and removed the bug label on Oct 8, 2024
philrz commented Oct 8, 2024

We reviewed this one as a group and I now have an explanation for why it's happening. @mccanne pointed out that because the yield is inside the join ( ), a yield of the constant value {id:1,name:"a"} is triggered by each upstream value, i.e., once for {id:1} and once for {id:2}. The right-hand side of the join therefore sees that record twice, which would explain why the matching left-hand {id:1} is output twice. A simple non-join example of the same effect:

$ echo '1 2 3' | zq -z 'yield "hi"' -
"hi"
"hi"
"hi"

By contrast, from and file are currently implemented to only provide input data from the referenced data source one time.
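To make the difference concrete, here is a toy Python model of the behavior described above. It is only a sketch and not how zq is implemented: the helper names `yield_literal`, `read_once`, and `left_join` are hypothetical, and the join is assumed to emit one output per matching right-hand record.

```python
def yield_literal(upstream, literal):
    """Toy model of `yield <literal>`: emit the literal once per upstream value."""
    for _ in upstream:
        yield dict(literal)


def read_once(records):
    """Toy model of `file`/`from`: emit the source's records a single time."""
    for record in records:
        yield dict(record)


def left_join(left, right, key, take):
    """Toy left join (assumed semantics): one output per matching right record,
    or the left record unchanged when nothing matches."""
    right = list(right)
    for l in left:
        matches = [r for r in right if r.get(key) == l.get(key)]
        if not matches:
            yield dict(l)
        for r in matches:
            yield {**l, take: r[take]}


left = [{"id": 1}, {"id": 2}]
literal = {"id": 1, "name": "a"}

# Right side built by a per-input yield: the literal arrives twice.
via_yield = list(left_join(left, yield_literal(left, literal), "id", "name"))

# Right side read once, as with `file names.zson`.
via_file = list(left_join(left, read_once([literal]), "id", "name"))

print(via_yield)
print(via_file)
```

In this model `via_yield` contains the joined `id: 1` record twice followed by the unmatched `id: 2` record, mirroring the surprising output in the report, while `via_file` contains it only once.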

Since it's effectively working as designed, this might just be a motivation to discourage sourcing input data this way, since this side effect would probably elude many users. However, we plan to design some other join improvements in the near future, so for now I've added this one to Epic #4081 so we can review it again when we sit down to look at the others. @mccanne also pointed out that at some point we'll likely enhance from to have a way for it to fire with each upstream input when desired (#4752), so this may also be worth considering in relation to that.

@philrz changed the title from "Extra output when joining against literal record" to "Extra output when yielding literals as input data" on Oct 10, 2024