Scripted loading of data found below a URL #4267
Comments
Recent changes in the linked PRs are getting this close to working. This is not yet in a GA release but rather is available at the tip of the development branch. For instance, the following starts from the list of filenames shown in the HTML if you browse to https://github.com/brimdata/zed-sample-data/tree/main/zeek-default. The HTML is parsed to isolate just the relative path of each file, then each one is retrieved and all the downloaded records are counted.
However, the original goal specifically of loading data isn't yet possible with the lake, as even the query above is currently prevented from running with
Will continue to hold this issue open until that starts working.
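For comparison with the Python approach that motivated this issue, here is a minimal sketch of the same steps described in this comment: fetch the directory listing, isolate the relative paths, retrieve each file, and count the records. It is not the Zed query and makes several assumptions that are not from the thread: that the relative paths appear literally in the page text as `zeek-default/<filename>`, that the files are line-delimited logs (optionally gzipped) in which lines starting with `#` are headers, and that raw file contents are served under a `raw.githubusercontent.com` prefix. The function names are hypothetical.

```python
import gzip
import re
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_bytes(url):
    """Download a URL and return its body as bytes."""
    with urlopen(url) as response:
        return response.read()


def extract_relative_paths(page_text, directory):
    """Return unique '<directory>/<filename>' paths in first-seen order.

    Assumption: the listing page mentions each file as a literal
    '<directory>/<filename>' substring somewhere in its markup.
    """
    pattern = re.compile(re.escape(directory) + r"/[A-Za-z0-9._-]+")
    seen = {}
    for match in pattern.finditer(page_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def count_records(data):
    """Count non-blank, non-'#' lines, decompressing gzip input first.

    Assumption: one record per line, with '#' marking header lines.
    """
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    return sum(
        1
        for line in data.splitlines()
        if line.strip() and not line.startswith(b"#")
    )


def count_all(listing_url, raw_base_url, directory, fetch=fetch_bytes):
    """Fetch the listing, then fetch every file it names and total the records."""
    page_text = fetch(listing_url).decode("utf-8", errors="replace")
    paths = extract_relative_paths(page_text, directory)
    return sum(count_records(fetch(urljoin(raw_base_url, p))) for p in paths)
```

Against the repository above, this would be called with the listing URL, a raw-content base such as `https://raw.githubusercontent.com/brimdata/zed-sample-data/main/` (an assumed location), and `"zeek-default"` as the directory. The `fetch` parameter is injectable so the logic can be exercised without network access.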
At the time this issue is being filed Zed is at commit 313c4d4.
We recently noticed a tweet where a user pointed to some Python (archived as azure.ipynb.txt.gz in case the Gist should disappear) which is downloading and prepping-for-query a list of CSV files all accessible under a single URL prefix. Given Zed's flexibility, we recognized that the language should be capable of doing the same with much less code, e.g.:
Ultimately this is likely to require some enhancements along the lines of things we've already discussed, e.g., a `load` operator within the language. Having spent a few quick minutes looking at the data from this specific tweet, a couple other things I spotted: