Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repo structure? #1

Open
haakon-e opened this issue Jan 16, 2023 · 7 comments
Open

repo structure? #1

haakon-e opened this issue Jan 16, 2023 · 7 comments

Comments

@haakon-e
Copy link
Collaborator

@asinghvi17 Do you have any thoughts about how to structure this repo?

My initial commit is a rough sketch of how I imagine we can maintain this repo:

  • geojson_files.jl looks up all available datasets from this github directory: link.
  • create_artifacts.jl does the job of creating Artifacts.toml. I imagine we can hook that up to GitHub CI to periodically check for new datasets or updates to existing datasets.
    • note: currently, I just download from [...]/master/[...], but we should probably do this by tags or something to ensure reproducibility. Although I think Artifacts will fail if a file is updated upstream since the SHA will change... Anyways -- just some food for thought.
  • The current dependencies in Project.toml are not really necessary, and, along with the aforementioned files, should be moved to some subdirectory that is tied to Github CI or tests.
  • Down the road we can discuss integration with GeoMakie.
@asinghvi17
Copy link
Member

Hey, thanks for giving this a start! It turned out that I had made this repo a while ago, probably looking to do a similar thing, but then didn't end up doing it.

I just tried the package, but couldn't seem to get it to instantiate - Julia was unable to download the initial artifact, and when I tried that link on Github it hung indefinitely. Strangely, even when I tried to get the raw file from Github I had the same issue.

We could download the Shapefile zips directly from the source, which is hosted by AWS and has a considerably faster download. That means that we'd need to somewhat "roll our own" artifacts, and we couldn't use artifact-strings in that case, using artifacts only as a storage solution.

I wouldn't mind writing that up, and it shouldn't require too much maintenance either. Discovery is actually possible through their Github repo, so there's that advantage as well. Unfortunately, in this case, we would not be able to track by version as we might on Github,


All in all I think the design looks good!

We should definitely download from tags - we will have to manually update them as time goes by but it shouldn't be too hard; it could conceivably be automated using e.g. Github Actions as well.

The current dependencies in Project.toml are not really necessary,

Agreed. We probably only need GeoJSON as a dependency (or GeoJSON2 if it looks like we need speed).

Down the road we can discuss integration with GeoMakie.

This should slot nicely into GeoMakie as a dependency so we can supply coastlines in a more flexible way. No changes will be needed on this end AFAIK.

@haakon-e
Copy link
Collaborator Author

I think I've fixed instantiation errors now, and also moved to installing by tags. Let me know if you still have any issues.

Running build/create_artifacts.jl takes about a minute or two (which includes temporarily downloading all artifacts), so I don't think download speeds should be an issue. Though I'd be happy to move to shapefile zips if that's significantly faster (we could also provide both sources as prioritized alternatives in Artifacts.toml, with geojson as backup).
In case we use Shapefiles, I think we'd have to implement my functions from MakieOrg/GeoMakie.jl#139 here.

@asinghvi17
Copy link
Member

I found githack.com which can act as a CDN for github, so we can use that as a secondary download link if the first one fails. With that change, it works on my machine. Will push this up shortly.

GeoJSON is getting some perf improvements now, so I'm not too worried about that.

The issue with zips is that Artifacts doesn't natively support those, so we'd have to unpack them ourselves, possibly using scratchspaces. This sounds like a hassle, so let's save that until and unless there's no other option?

@haakon-e
Copy link
Collaborator Author

githack looks really neat!

Looks like tar.gz is supported natively by Artifacts, but I don't know how much of an advantage they are compared to uncompressed files, especially since we (currently) don't bundle different data files in the same Artifact.

  • Though it isn't inconceivable that bundling the same data (e.g. coastlines) at different resolutions, or "similar" data types at the same resolution is a terrible idea... Though it sounds like quite a bit of manual (and subjective) work to make those decisions, which I don't think is advisable now, if ever.

@asinghvi17
Copy link
Member

Yeah, I like the framework we have going on now. One thing I do want to do is to add a basic dataset search, i.e., I should be able to do some kind of search like datasets(scale = 10, keywords = ["admin0"]) within the package. Otherwise in terms of vector functionality we're pretty much set.

Would you be interested in moving this package to the JuliaGeo org? If so I can reach out to them and ask. It would probably make this package a little more discoverable, as well as having more people around to help with maintenance etc.

Looks like tar.gz is supported natively by Artifacts

Tarballs and gzips are, but .zip is not, unfortunately :(. That's why we'd have to do this whole scratchspaces/manual unzipping thing, otherwise they'd just be manual additions to the artifact list.

@haakon-e
Copy link
Collaborator Author

haakon-e commented Jan 18, 2023

Yeah, I totally agree with search. It may also possibly be convenient with some helper functions that guide users through the "hierarchy" of vectors: i.e. category = [admin, bathymetry, coastline, ...], resolution=[10, 50, 110], etc.

Also, very happy about moving the package to JuliaGeo. I was just invited as collaborator there earlier today, so that wouldn't make any difference to me.

(ps: apologies, I just force pushed to master to fix a stupid bug I introduced in a small change I just made, I hope you didn't notice)

@asinghvi17
Copy link
Member

Sounds good - looks like that and rasters are our only TODOs before this is ready to go for all available data!

About JuliaGeo: I just realized that I'm a member there already, so I'll transfer the repo now.

No worries - also I'm not reading a force push on my end, so I think it's fine. But yeah, now that it's semi-working we should probably not push directly to master anymore :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants