Allow "from" operator to accept upstream input #4752

Closed
philrz opened this issue Aug 14, 2023 · 4 comments · Fixed by #5437 or #5476

Comments

@philrz
Contributor

philrz commented Aug 14, 2023

At the time of the filing of this issue, Zed is at commit 2689e24.

This topic was recently highlighted thanks to a user who asked the following question in a community Slack thread:

is there a way to make get requests with an authentication header?
what I'm trying to do is take a CSV, take a specific column, input into an API request, and pick a field from the API response, and then merge that result into a new CSV

We ultimately helped the user by proposing some shell glue with multiple invocations of zq. However, their experience exposed a limitation of the many input operators (the from operator et al.): they can currently sit only at the "head" of a Zed pipeline. In this case, the user wanted to take information from upstream in the Zed pipeline and make it part of an HTTP request, e.g., their pseudocode:

What I'm trying to do is run something like this:
http https://company.clearbit.com/v1/domains/find name=="$COMPANY_NAME_FROM_ZED_ITERATION" --auth=key | jq .key
For each entry in an array from a zed expression

Thinking about this more generally, @nwt recognized that the input operators could be changed so that, when data comes from upstream, the operator runs itself and everything downstream once for each value that arrives from upstream. This would enable things such as:

  1. Populating headers/body/URLs/etc. of an HTTP request
    1. That should hopefully cover this user's specific inquiry
    2. FWIW, when I opened #4267 (Scripted loading of data found below a URL), I envisioned something like this being a necessary building block
  2. Sourcing input data from pools/filenames generated upstream
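To make the per-value idea concrete, here is a minimal Python sketch of the semantics described above. It is not how Zed implements anything; it assumes in-memory stand-ins for sources (the hypothetical `SOURCES` dict and `read_source()`), and models "everything downstream" as a function that is invoked afresh for each upstream value. The names `from_per_value`, `as_names`, `passthrough`, and `count` are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical in-memory stand-ins for external sources (files, pools, URLs).
SOURCES: Dict[str, List[dict]] = {
    "a.json": [{"hello": "world"}],
    "b.json": [{"goodbye": "everyone"}],
}


def read_source(name: str) -> Iterator[dict]:
    """Stand-in for opening a file/pool/URL and decoding its records."""
    yield from SOURCES[name]


def as_names(value: Any) -> List[str]:
    """Treat an upstream value as one source name or an array of names."""
    return list(value) if isinstance(value, list) else [value]


def from_per_value(
    upstream: Iterable[Any],
    source_names: Callable[[Any], List[str]],
    downstream: Callable[[Iterator[dict]], Iterator[dict]],
) -> Iterator[dict]:
    """Run the source and everything downstream once per upstream value."""
    for value in upstream:
        records = (rec for name in source_names(value) for rec in read_source(name))
        # A fresh downstream run per value, so aggregates reset between values.
        yield from downstream(records)


def passthrough(records: Iterator[dict]) -> Iterator[dict]:
    yield from records


def count(records: Iterator[dict]) -> Iterator[dict]:
    yield {"count": sum(1 for _ in records)}


if __name__ == "__main__":
    # One upstream value (an array of names): both sources feed a single run.
    print(list(from_per_value([["a.json", "b.json"]], as_names, passthrough)))
    # Two upstream values: the downstream count runs twice.
    print(list(from_per_value(["a.json", "b.json"], as_names, count)))
```

The second call shows one consequence of re-running downstream per value: an aggregation such as `count` yields one result per upstream value rather than one overall, which is part of why the optimizer needs special care.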

In a discussion of this topic, @nwt and @mccanne recognized that special care would need to be taken for the optimizer to remain effective despite this dynamic behavior.

In another, more recent discussion about the API-enabling variant of this, @mccanne wondered if we might have some kind of async/pipeline flag that would allow the HTTP requests to be made in parallel. Results would then arrive in no guaranteed order, so another flag could be used to guarantee order when that's needed.
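The trade-off between parallel requests and result ordering can be sketched with Python's standard `concurrent.futures` module. This only illustrates the idea and is not a proposed Zed flag: `fake_fetch()` and its `LATENCY` table are hypothetical stand-ins for real HTTP requests, and the `ordered` parameter plays the role of the second flag.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Iterable, List

# Hypothetical per-request latencies standing in for real HTTP round trips.
LATENCY: Dict[str, float] = {"alice": 0.3, "bob": 0.1, "carol": 0.2}


def fake_fetch(user: str) -> Dict[str, str]:
    """Pretend to call an API for one upstream value."""
    time.sleep(LATENCY[user])
    return {"user": user}


def fetch_all(users: Iterable[str], ordered: bool, workers: int = 4) -> List[Dict[str, str]]:
    """Issue all requests concurrently; optionally restore input order."""
    batch = list(users)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        if ordered:
            # map() returns results in input order, even if requests finish out of order.
            return list(pool.map(fake_fetch, batch))
        futures = [pool.submit(fake_fetch, u) for u in batch]
        # as_completed() yields each future as soon as its request finishes.
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    print(fetch_all(["alice", "bob", "carol"], ordered=True))
    print(fetch_all(["alice", "bob", "carol"], ordered=False))
```

With `ordered=False` the results come back in completion order, which depends on timing. With `ordered=True` a later result is held until every earlier one has completed, trading some latency for determinism.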

@philrz philrz changed the title Allow "from" opeator to accept upstream input Allow "from" operator to accept upstream input Aug 15, 2023
@philrz
Contributor Author

philrz commented Oct 12, 2023

Note to self: The API-enabling variant of this definitely seems worthy of a blog post once it's done.

@philrz
Contributor Author

philrz commented Dec 21, 2023

We've been discussing this one as a team but have had uncertainty about the correct design approach, so we're putting the topic back on ice for a bit while we focus on other priorities.

@philrz
Contributor Author

philrz commented Oct 8, 2024

See #5324 (comment) for a semi-related topic.

@philrz
Contributor Author

philrz commented Nov 15, 2024

Verified in super commit f1213fa.

For a very simple example of from accepting upstream input, we'll start with these two data files.

$ cat a.json 
{"hello": "world"}

$ cat b.json 
{"goodbye": "everyone"}

Now we'll send an array of the filenames as input, and use the newly-added from eval(...) to iterate over the array and have input pulled from each referenced filename.

$ super -version
Version: v1.18.0-153-gf1213fa5

$ echo '["a.json", "b.json"]' | super -c 'from eval(this) | yield this' -
{hello:"world"}
{goodbye:"everyone"}

For a more sophisticated example that shows the API use case, the current README for the super project includes the following. I'm quoting it here to capture a point-in-time description of what it's doing, in case it continues to evolve:

Here's a SuperSQL query that fetches some data from GitHub Archive, computes the set of repos touched by each user, ranks them by number of repos, picks the top five, and joins each user with their original created_at time from the current GitHub API.

$ super -c "FROM 'https://data.gharchive.org/2015-01-01-15.json.gz'
| SELECT union(repo.name) AS repos, actor.login AS user
  GROUP BY user
  ORDER BY len(repos) DESC
  LIMIT 5
| FORK (
  => FROM eval(f'https://api.github.com/users/{user}')
   | SELECT VALUE {user:login,created_at:time(created_at)}
  => PASS
  )
| JOIN USING (user) repos"

{user:"ARoiD",created_at:2010-07-31T02:51:58Z,repos:|["Tox/toxic","cgeo/cgeo","sddm/sddm","Astonex/Docs","Tox/Tox-Docs","Tox/toxme.se","rsudev/Antox","schwabe/cgeo","Astonex/Antox","lodash/lodash","Astonex/ToxBox","Lineflyer/cgeo","bestiejs/json3","Astonex/Tox-STS","JFreegman/toxic","Tox/Tox-Website","jdalton/docdown","strycore/scripts","SteamedFish/vimrc","isohuntto/openbay","lifetyper/scripts","lodash/lodash-cli","polarssl/polarssl","SteamedFish/config","culmor30/cgeo-wear","samueltardieu/cgeo","iBeliever/cross-pkg","schwabe/ics-openvpn","bestiejs/platform.js","Ramblurr/Anki-Android","SteamedFish/gfwiplist","bestiejs/benchmark.js","quantum-os/qml-extras","quantum-os/quantum-os","quantum-os/sddm-theme","lifetyper/FreeRouter_V2","quantum-os/qml-material","cernekee/ics-openconnect","quantum-os/quantum-shell","rankjie/anyconnect-gfw-list"]|}
{user:"altmer",created_at:2023-08-04T07:10:49Z,repos:|["sass/sass","apache/cxf","hapijs/hapi","apache/camel","apache/log4j","apache/storm","apache/wss4j","google/guava","qos-ch/slf4j","apache/hadoop","apache/mahout","django/django","moment/moment","tj/git-extras","apache/xalan-j","gwtproject/gwt","less/less-docs","python/cpython","apache/activemq","tastejs/todomvc","apache/axis2-java","easymock/easymock","mwclient/mwclient","postgres/postgres","zzzeek/sqlalchemy","apache/lucene-solr","mariofusco/lambdaj","SeleniumHQ/selenium","apache/commons-lang","thymeleaf/thymeleaf","alexz-enwp/wikitools","sebastianbenz/Jnario","freemarker/freemarker","ehcache/ehcache-jcache","wikimedia/pywikibot-core","spring-projects/spring-ws","bouil/angular-google-chart","spring-projects/spring-amqp","spring-projects/spring-data","spring-projects/spring-batch","spring-projects/spring-social","spring-projects/spring-webflow","spring-projects/spring-integration"]|}
{user:"automatic-frog",created_at:2014-11-11T00:17:49Z,repos:|["osp/osp.work.annak","osp/osp.work.medor","osp/osp.foundry.dlf","osp/osp.work.bessst","osp/osp.foundry.mill","osp/osp.foundry.vj12","osp/osp.tools.fonzie","osp/osp.work.acsr-WP","osp/osp.foundry.reglo","osp/osp.foundry.crickx","osp/osp.foundry.polsku","osp/osp.tools.PDFutils","osp/osp.tools.strokify","osp/osp.work.medor.www","osp/osp.foundry.metadin","osp/osp.foundry.osp-din","osp/osp.foundry.stories","osp/osp.foundry.w-droge","osp/osp.tools.ethertoff","osp/osp.work.tuned-city","osp/osp.foundry.cimatics","osp/osp.foundry.logisoso","osp/osp.relearn.off-grid","osp/osp.tools.html2print","osp/osp.tools.screenshot","osp/osp.work.osp-website","osp/osp.foundry.alfphabet","osp/osp.foundry.limousine","osp/osp.foundry.philibert","osp/osp.work.cosic.rescue","osp/osp.work.oralsite.www","osp/osp.foundry.sans-guilt","osp/osp.foundry.libertinage","osp/osp.live.europe-refresh","osp/osp.work.maisons-phenix","osp/osp.work.splinterfields","osp/osp.foundry.ax-28-script","osp/osp.foundry.univers-else","osp/osp.tools.wordpress-theme","osp/osp.foundry.notcouriersans","osp/osp.work.multiple-art-days","osp/osp.live.hachures-tourneurs","osp/osp.relearn.gesturing-paths","osp/osp.work.maisons-phenix.www","osp/osp.foundry.le-patin-helvete","osp/osp.foundry.sans-guilt-wafer","osp/osp.work.balsamine.2014-2015","osp/osp.work.travelling-feministe","osp/osp.work.variable-publication","osp/osp.workshop.typojanchi-seoul","osp/osp.live.ethertoff-presentation","osp/osp.tools.collaboration-agreement","osp/osp.tools.mediawiki.skin.ustensile","osp/osp.workshop.self-conscious-design"]|}
{user:"guybedford",created_at:2011-02-03T14:55:09Z,repos:|["jspm/npm","jspm/nodelibs-fs","jspm/nodelibs-os","jspm/nodelibs-vm","jspm/nodelibs-dns","jspm/nodelibs-net","jspm/nodelibs-tls","jspm/nodelibs-tty","jspm/nodelibs-url","systemjs/systemjs","jspm/nodelibs-http","jspm/nodelibs-path","jspm/nodelibs-repl","jspm/nodelibs-util","jspm/nodelibs-zlib","jspm/nodelibs-dgram","jspm/nodelibs-https","jspm/nodelibs-assert","jspm/nodelibs-buffer","jspm/nodelibs-crypto","jspm/nodelibs-domain","jspm/nodelibs-events","jspm/nodelibs-stream","jspm/nodelibs-timers","jspm/nodelibs-cluster","jspm/nodelibs-console","jspm/nodelibs-process","jspm/nodelibs-punycode","jspm/nodelibs-readline","jspm/nodelibs-constants","jspm/nodelibs-querystring","jspm/nodelibs-child_process","jspm/nodelibs-string_decoder"]|}
{user:"opencm",created_at:2012-06-06T07:24:54Z,repos:|["GigaSpaces-QA/DevOps","cloudify-cosmo/repex","cloudify-cosmo/yo-ci","cloudify-cosmo/grafana","cloudify-cosmo/packman","cloudify-cosmo/cloudify-cli","cloudify-cosmo/version-tool","cloudify-cosmo/cloudify-manager","cloudify-cosmo/cloudify-packager","cloudify-cosmo/cloudify-dsl-parser","cloudify-cosmo/cloudify-bash-plugin","cloudify-cosmo/cloudify-chef-plugin","cloudify-cosmo/cloudify-rest-client","cloudify-cosmo/cloudify-cli-packager","cloudify-cosmo/cloudify-system-tests","cloudify-cosmo/cloudify-amqp-influxdb","cloudify-cosmo/cloudify-fabric-plugin","cloudify-cosmo/cloudify-puppet-plugin","cloudify-cosmo/cloudify-script-plugin","cloudify-cosmo/cloudify-agent-packager","cloudify-cosmo/cloudify-diamond-plugin","cloudify-cosmo/cloudify-plugins-common","cloudify-cosmo/cloudify-libcloud-plugin","cloudify-cosmo/cloudify-plugin-template","cloudify-cosmo/cloudify-openstack-plugin","cloudify-cosmo/cloudify-softlayer-plugin","cloudify-cosmo/cloudify-cloudstack-plugin","cloudify-cosmo/cloudify-libcloud-provider","cloudify-cosmo/cloudify-manager-blueprints","cloudify-cosmo/cloudify-nodecellar-example","cloudify-cosmo/cloudify-openstack-provider","cloudify-cosmo/cloudify-hello-world-example"]|}
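The FORK/JOIN structure of that query can be approximated in plain Python to show how the two branches recombine. This is a rough sketch under stated assumptions, not super's implementation: it assumes `JOIN USING (user) repos` behaves like an inner equi-join on `user` that copies the `repos` field from the PASS branch into each record from the lookup branch, it ignores the GROUP BY/ORDER BY/LIMIT stage, and it replaces the GitHub API call with a hypothetical `FAKE_USERS_API` lookup holding made-up users and timestamps.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for https://api.github.com/users/{user} (made-up data).
FAKE_USERS_API: Dict[str, Dict[str, str]] = {
    "user1": {"login": "user1", "created_at": "2010-01-01T00:00:00Z"},
    "user2": {"login": "user2", "created_at": "2015-06-30T12:00:00Z"},
}


def lookup_branch(rows: List[dict]) -> List[dict]:
    """First fork branch: one API lookup per row, keeping user and created_at."""
    out = []
    for row in rows:
        resp = FAKE_USERS_API[row["user"]]
        out.append({"user": resp["login"], "created_at": resp["created_at"]})
    return out


def pass_branch(rows: List[dict]) -> List[dict]:
    """Second fork branch: PASS forwards rows unchanged."""
    return list(rows)


def join_using(left: List[dict], right: List[dict], key: str, fields: Sequence[str]) -> List[dict]:
    """Inner equi-join on `key`, copying `fields` from matching right rows."""
    index: Dict[str, List[dict]] = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **{f: rrow[f] for f in fields}})
    return joined


if __name__ == "__main__":
    rows = [
        {"repos": ["org/a", "org/b"], "user": "user1"},
        {"repos": ["org/c"], "user": "user2"},
    ]
    # Both branches receive the same input, as with FORK.
    for rec in join_using(lookup_branch(rows), pass_branch(rows), "user", ["repos"]):
        print(rec)
```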

Thanks @mccanne!

@philrz philrz closed this as completed Nov 15, 2024
3 participants