Rasters API #5

Open
asinghvi17 opened this issue Jan 18, 2023 · 12 comments
Comments

@asinghvi17
Member

It doesn't look like there is an automated way to get rasters the way we've obtained vectors, so we'd have to create a manual database of all links. The rasters are also distributed as zip files, meaning that we would have to use something like Scratch.jl and ZipFile.jl to unpack them manually and store them in a package-specific scratch space. Assuming no name duplicates, this should also allow us to present a list of already-downloaded rasters.

@haakon-e
Collaborator

There's this folder in their github repo, but I don't quite understand what the files are here -- they all seem to be empty to me: https://github.com/nvkelso/natural-earth-vector/tree/v5.1.2/zips

@asinghvi17
Member Author

asinghvi17 commented Jan 18, 2023

Yeah, I noticed that as well. I think they might have tried to include rasters there at some point, but because the file sizes were so large they gave up. They have some entries about rasters in the Makefile but they are pretty vague.

I was planning on manually grabbing links from e.g. https://www.naturalearthdata.com/downloads/10m-raster-data/ and going from there, since they don't look like they're going to change. We should also post a note saying that this API is fragile and could break...

@haakon-e
Collaborator

Ah okay. Not as "automatic" as I thought... e.g. 1:50m Natural Earth I only has a subset of the high-res datasets: https://www.naturalearthdata.com/downloads/50m-raster-data/50m-natural-earth-1/

@haakon-e
Collaborator

haakon-e commented Jan 18, 2023

Here's a potential lead: the R-version of this package provides a function that discovers the relevant zip files from https://naturalearth.s3.amazonaws.com/[...], and should supposedly work with rasters too...
https://github.com/ropensci/rnaturalearth/blob/e7161ac7d3efe827e1951cf1ed798e506985b620/R/ne_file_name.R

I don't have R installed on this computer, so I haven't had the chance to test their function yet.

@haakon-e
Collaborator

haakon-e commented Jan 18, 2023

Okay. A little bit more info:
The site: https://naturalearth.s3.amazonaws.com
provides an XML tree of keys/pages. The latest "tag" available seems to be 5.0.1 which seems to be from 2022-03-18.
Manually searching the page, I find e.g. 5.0.1/10m_raster/NE1_LR_LC.zip, which should correspond to the medium-size version of this raster.
The corresponding download link is: https://naturalearth.s3.amazonaws.com/5.0.1/10m_raster/NE1_LR_LC.zip. So I think this is semi-automatable.

  1. For the (presumed) latest version: 5.0.1, search all tags matching the 10m_raster sub-key
  2. Fetch all files for that key.
  3. Get the SHA string and add zip file to Artifacts.
  • For users, we will need to do unzipping, as you discussed above.
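The steps above can be sketched in Julia roughly as follows. This is only a minimal sketch under stated assumptions, not the package's implementation: the names `raster_keys`, `download_url` and `file_sha256` are hypothetical, the XML snippet is a made-up stand-in for the bucket listing (assuming keys appear in `<Key>` elements, as in S3 listings), and a regex is used in place of a real XML parser.

```julia
using SHA  # standard library; provides sha256

const BUCKET = "https://naturalearth.s3.amazonaws.com"

# Steps 1-2: collect every <Key>...</Key> entry that starts with `prefix`
# and ends in ".zip". A regex is enough for this flat listing sketch.
function raster_keys(listing::AbstractString, prefix::AbstractString)
    found = [String(m.captures[1]) for m in eachmatch(r"<Key>([^<]+)</Key>", listing)]
    return filter(k -> startswith(k, prefix) && endswith(k, ".zip"), found)
end

# Turn a key into its download link.
download_url(key::AbstractString) = string(BUCKET, "/", key)

# Step 3: hex-encoded SHA-256 of a file that is already on disk.
file_sha256(path::AbstractString) = bytes2hex(open(sha256, path))

# Made-up sample listing; only the NE1_LR_LC key is taken from the bucket above.
listing = """
<ListBucketResult>
  <Contents><Key>5.0.1/10m_raster/NE1_LR_LC.zip</Key></Contents>
  <Contents><Key>5.0.1/10m_cultural/example.zip</Key></Contents>
  <Contents><Key>4.1.0/10m_raster/NE1_LR_LC.zip</Key></Contents>
</ListBucketResult>
"""

wanted = raster_keys(listing, "5.0.1/10m_raster/")
urls = download_url.(wanted)
```

The actual download is left out on purpose, so the sketch runs without network access.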

Note that we would need to add some utility functions to "translate" e.g. NE1_LR_LC to a slightly more human-readable string.

For GitHub CI or similar, we can periodically check that page for any tags higher than the current tag (i.e. 5.0.1).
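A periodic check for newer tags could look something like the sketch below. It assumes every key in the listing starts with an `x.y.z/` prefix, as in the example key above; `listed_versions` and `has_newer_tag` are hypothetical helper names, and the sample keys other than the NE1_LR_LC one are made up.

```julia
# Extract the leading "x.y.z" component of each key as a VersionNumber,
# skipping keys that do not start with a version-like prefix.
function listed_versions(bucket_keys)
    versions = VersionNumber[]
    for key in bucket_keys
        m = match(r"^(\d+\.\d+\.\d+)/", key)
        m === nothing || push!(versions, VersionNumber(m.captures[1]))
    end
    return unique(versions)
end

# True when the listing contains a tag strictly newer than `current`.
has_newer_tag(bucket_keys, current::VersionNumber) =
    any(>(current), listed_versions(bucket_keys))

# Made-up sample keys for illustration.
sample_keys = [
    "5.0.1/10m_raster/NE1_LR_LC.zip",
    "4.1.0/10m_raster/NE1_LR_LC.zip",
    "index.html",
]

newest = maximum(listed_versions(sample_keys))
needs_update = has_newer_tag(sample_keys, v"5.0.1")
```

Comparing `VersionNumber`s rather than strings avoids ordering mistakes such as "5.10.0" sorting before "5.9.0".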

@asinghvi17
Member Author

Do we need to store the zipfiles as Artifacts? Since they're lazy anyway (and most are on the order of hundreds of MB), we could just download them using Base.download, then unzip and store them in a scratch space provided by https://github.com/JuliaPackaging/Scratch.jl. Then, our exported function can point the user at whatever directory they request, and we can keep track of the "version" of the dataset using a local TOML file or similar.
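To make the version-tracking idea a bit more concrete, here is a minimal sketch using only the TOML standard library. The file name `datasets.toml` and the functions `record_version` and `recorded_version` are assumptions for illustration, and a temporary directory stands in for the scratch space that Scratch.jl would provide.

```julia
using TOML  # standard library

# Record which dataset version has been unpacked into `dir`.
function record_version(dir::AbstractString, name::AbstractString, version::VersionNumber)
    manifest = joinpath(dir, "datasets.toml")  # hypothetical file name
    entries = isfile(manifest) ? TOML.parsefile(manifest) : Dict{String,Any}()
    entries[name] = Dict("version" => string(version))
    open(io -> TOML.print(io, entries), manifest, "w")
    return manifest
end

# Look up the recorded version, or `nothing` if the dataset was never stored.
function recorded_version(dir::AbstractString, name::AbstractString)
    manifest = joinpath(dir, "datasets.toml")
    isfile(manifest) || return nothing
    entry = get(TOML.parsefile(manifest), name, nothing)
    return entry === nothing ? nothing : VersionNumber(entry["version"])
end

dir = mktempdir()  # stand-in for the package's scratch directory
record_version(dir, "NE1_LR_LC", v"5.0.1")
stored = recorded_version(dir, "NE1_LR_LC")
missing_entry = recorded_version(dir, "not_downloaded")
```

The same file could double as the list of already-downloaded rasters, since every stored dataset gets an entry.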

@haakon-e
Collaborator

Hmm. Though we're most definitely in the "write-once, read-many times" paradigm outlined by Scratch.jl, for which Artifacts make more sense "philosophically", as a matter of making practical progress I agree Scratch.jl seems to fulfill our needs.

It's unfortunate that we can't customize the download/unpack instructions with Artifacts; it seems like we would "only" need to inject our custom instructions here.

@asinghvi17
Member Author

asinghvi17 commented Jan 19, 2023

If we muck around with the code, it might actually be possible - Artifacts uses 7z to unpack stuff, which does support zip files. But for now it seems to make sense to use scratch spaces, and if possible transition to Artifacts after the fact.
See https://github.com/JuliaLang/Pkg.jl/blob/9a2e065230cfad8abfc1b82a048f57d2fae5a331/src/PlatformEngines.jl#L522

@haakon-e
Collaborator

Good catch! I suspect we may need some support / buy-in from Pkg developers to support this -- currently Tar.extract fails since we try to pass it a .zip file rather than a .tar-type archive.

Should we proceed with Scratch then? It would be nice if we can hide the internals so that the user "API" is still only
https://github.com/JuliaGeo/NaturalEarth.jl/blob/master/src/NaturalEarth.jl#L21

Do you have any design ideas for versioning? Feel free to take a stab at it if you have some time.

@asinghvi17
Member Author

I tried XML parsing for a bit but didn't make too much progress. Over the course of this I did find github.com/EcoJulia/RasterDataSources.jl, which sounds like something we could hook into. I would rather not have Rasters.jl as a dependency of this package :D so a user could, theoretically, just type Raster(NaturalEarth{v"..."}, :rastername) to get the raster they want. We'd just have to figure out the parsing then.
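As a rough illustration of the version-parameterized source idea, a sketch might look like this. It is not RasterDataSources.jl's actual interface: `NaturalEarth`, `dataset_version` and `raster_url` are hypothetical names, a tuple of integers stands in for the version because plain integer tuples are valid type parameters, and the URL layout is assumed from the example link earlier in the thread.

```julia
# Hypothetical source type; V is a tuple of integers such as (5, 0, 1).
struct NaturalEarth{V} end

# Recover the version from the type parameter.
dataset_version(::Type{NaturalEarth{V}}) where {V} = VersionNumber(V...)

# Build the download link for a raster, assuming the
# "<version>/<scale>_raster/<name>.zip" layout seen in the bucket.
function raster_url(::Type{NaturalEarth{V}}, scale::AbstractString, name::Symbol) where {V}
    v = dataset_version(NaturalEarth{V})
    return string("https://naturalearth.s3.amazonaws.com/", v, "/", scale, "_raster/", name, ".zip")
end

url = raster_url(NaturalEarth{(5, 0, 1)}, "10m", :NE1_LR_LC)
```

Dispatching on the type keeps the version out of the function arguments, which is the kind of call shape described above.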

@haakon-e
Collaborator

Thanks for looking into this.
That does indeed look like a feasible solution.

Alternatively, we could wait until Julia 1.9 is released with the extensions feature. In that case the additional dependency might not be too bad.
Though if RasterDataSources.jl is a well-tested package, I don't see why we should reinvent the wheel if it is as easy, theoretically, as it seems.
