Add sample data collection with Cloud Optimized GeoTIFF #107

Closed
1 task done
j08lue opened this issue Oct 15, 2024 · 16 comments

@j08lue
Collaborator

j08lue commented Oct 15, 2024

We added a nice Sentinel-2 L2A data collection (over Poland), copying items from the Copernicus Data Space Ecosystem STAC that link to JPEG2000 Sentinel-2 band files:

However, the JPEG2000 driver that GDAL uses (OpenJPEG) fails under load and generally performs poorly.

So we need a collection that uses Cloud Optimized GeoTIFFs (COGs) instead.

To allow us to test and demo typical data access patterns, the collection should

  1. Feature adjacent tiles / images that can be stitched together in dynamic tiling (mosaic) or coverage requests
  2. Feature repeated coverage of the same area, so that we can test time series functionality
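Both criteria can be checked mechanically once a candidate set of items exists. A minimal sketch, assuming items are plain STAC item dicts with a `bbox` and an MGRS tile identifier in a `grid:code` property (Earth Search items appear to carry one; other catalogs may use a different property, so `tile_key` is a parameter):

```python
from collections import defaultdict


def bboxes_touch(a, b):
    """True if two [west, south, east, north] boxes overlap or share an edge.

    Simple 2D check; boxes crossing the antimeridian are not handled.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def check_sample_criteria(items, tile_key="grid:code"):
    """Return (has_adjacent_tiles, has_time_series) for a list of STAC item dicts."""
    dates_by_tile = defaultdict(set)
    bbox_by_tile = {}
    for item in items:
        tile = item["properties"][tile_key]
        # Keep only the date part of the ISO 8601 datetime, e.g. "2023-07-01".
        dates_by_tile[tile].add(item["properties"]["datetime"][:10])
        bbox_by_tile.setdefault(tile, item["bbox"])

    tiles = list(bbox_by_tile)
    # Criterion 1: at least two different tiles whose footprints touch.
    has_adjacent = any(
        bboxes_touch(bbox_by_tile[t1], bbox_by_tile[t2])
        for i, t1 in enumerate(tiles)
        for t2 in tiles[i + 1:]
    )
    # Criterion 2: at least one tile observed on two or more dates.
    has_time_series = any(len(dates) >= 2 for dates in dates_by_tile.values())
    return has_adjacent, has_time_series
```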

Acceptance criteria

  • eoAPI STAC in EOEPCA has a collection of COGs that meet the above criteria
@j08lue
Collaborator Author

j08lue commented Oct 15, 2024

One possibility could be to copy selected collections from Maxar Open Data to a CloudFerro S3 bucket and copy STAC records into our service.

Its CC-BY-NC-4.0 license seems to permit creating a copy for non-commercial use, if I read it correctly.

And we have all the tooling to create the STAC metadata https://github.com/vincentsarago/MAXAR_opendata_to_pgstac/tree/main - thanks @vincentsarago

@j08lue
Collaborator Author

j08lue commented Oct 15, 2024

Maybe @MathewNWSH knows offhand whether any nice COG collections exist in eodata.cloudferro.com that perhaps already have STAC records somewhere?

As I previously noted - looking at the CloudFerro STAC https://radiantearth.github.io/stac-browser/#/external/https://pgstac.demo.cloudferro.com - the only collection with COGs seems to be Sentinel-1 Ground Range Detected (GRD). We could use this, but SAR data cannot be validated visually as easily as optical imagery.

Maybe there are other collections of COGs in CloudFerro that are just not in the STAC yet?

@j08lue
Collaborator Author

j08lue commented Oct 15, 2024

Sentinel-1 GRD could be an option, though. RGB composites of the different polarizations tend to look quite nice https://gis.stackexchange.com/a/400780
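For illustration, here is one common way to build such a composite: red = VV, green = VH, blue = VV/VH ratio, all in decibels. This band assignment and the stretch ranges below are assumptions for a sketch, not necessarily what the linked answer uses; it works on flat lists of linear backscatter values to stay dependency-free.

```python
import math


def to_db(values):
    """Convert linear backscatter values to decibels (10 * log10)."""
    return [10.0 * math.log10(v) for v in values]


def stretch(values, lo, hi):
    """Linearly map [lo, hi] to [0, 255], clipping values outside the range."""
    out = []
    for v in values:
        scaled = (v - lo) / (hi - lo) * 255.0
        out.append(int(round(min(max(scaled, 0.0), 255.0))))
    return out


def s1_rgb(vv, vh):
    """RGB = (VV dB, VH dB, VV/VH ratio in dB); one common choice, an assumption."""
    vv_db = to_db(vv)
    vh_db = to_db(vh)
    # A difference in dB is the log of the ratio VV/VH.
    ratio_db = [a - b for a, b in zip(vv_db, vh_db)]
    # Illustrative stretch ranges; tune them per scene.
    r = stretch(vv_db, -25.0, 0.0)
    g = stretch(vh_db, -30.0, -5.0)
    b = stretch(ratio_db, 0.0, 15.0)
    return list(zip(r, g, b))
```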


@pantierra
Collaborator

pantierra commented Oct 21, 2024

What about using the Sentinel-2 COGs from the Registry of Open Data on AWS? AFAIK there are no egress costs. And we could pick a similar area as before (e.g. Poland) to keep the data volume limited.

@MathewNWSH

Hi @j08lue,

sorry I missed your message :(

I believe COG Sentinel-2 data will be available at some point for selected parts of Europe (sorry, I don't know the precise term).

The other COG data I'm aware of are S1GRD and the Copernicus DEM (at least I think so).

@pantierra
Collaborator

Thanks, @MathewNWSH! @j08lue, I would prefer S2L2A over S1GRD; it is probably nicer to show. It would also be good to have this as a use case when running on premises other than CloudFerro.

What do you think about downloading a year or so of S2L2A COGs over Poland (or an even smaller area) and then creating a STAC catalog from them (rio-stac should help with that) that we can ingest?

@j08lue
Collaborator Author

j08lue commented Oct 21, 2024

Ok, let us copy some S2L2A COGs over to CloudFerro for now.

It would be great to have rich Sentinel-2 STAC metadata, more than what rio-stac produces.

I see three ways to get there:

  1. Amend the existing sentinel-2-l2a collection by adding another asset per band, like B04_10m_cog, at the original resolution only (not the up/downsampled ones)
  2. Copy STAC metadata from somewhere and change it to point to our bucket
  3. Generate rich STAC data with https://github.com/stactools-packages/sentinel2

Could (1) be easiest, perhaps? We probably need to iterate over the items in our collection to copy the relevant files.
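For option (2), the mechanical part is rewriting asset hrefs so they point at our bucket instead of the source. A minimal sketch; both prefixes in the usage example are hypothetical placeholders, not real bucket paths:

```python
import copy


def repoint_assets(item, old_prefix, new_prefix):
    """Return a copy of a STAC item dict whose asset hrefs starting with
    old_prefix are rewritten to start with new_prefix instead.
    Assets under other prefixes are left untouched."""
    new_item = copy.deepcopy(item)
    for asset in new_item.get("assets", {}).values():
        href = asset.get("href", "")
        if href.startswith(old_prefix):
            asset["href"] = new_prefix + href[len(old_prefix):]
    return new_item


# Usage with hypothetical prefixes:
# repoint_assets(item, "https://source.example/", "https://our-bucket.example/")
```

Returning a copy keeps the source item intact, so the same input can be compared against the rewritten output before ingesting.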

@j08lue
Collaborator Author

j08lue commented Oct 21, 2024

We need to find out how to create a bucket in CloudFerro in the same region as our EOEPCA infrastructure and get write access to it. I would ask about that on EOEPCA Slack.

@pantierra
Collaborator

pantierra commented Oct 25, 2024

We have received access to the project's CloudFerro infrastructure. Next steps:

  • Create a list of STAC items for Sentinel-2 COG data over Iceland for 2023 (using STAC search results from Earth Search)
  • Create a STAC collection for this data
  • Spin up a bucket for data storage on CloudFerro through infrastructure as code
  • Ingest the STAC collection and items into the eoAPI deployed for EOEPCA+
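The first step boils down to a STAC API item search. A minimal sketch that builds (but does not send) the POST request; the Earth Search v1 endpoint, the `sentinel-2-l2a` collection id, and server support for the Query extension are assumptions, and the Iceland bounding box is an approximate illustrative value:

```python
import json
import urllib.request

# Assumed public Earth Search v1 STAC API endpoint.
SEARCH_URL = "https://earth-search.aws.element84.com/v1/search"

# Approximate bounding box around Iceland [west, south, east, north];
# illustrative only, not the box used for the actual sample.
ICELAND_BBOX = [-24.6, 63.2, -13.4, 66.6]


def build_search_request(max_cloud_cover=5, limit=100):
    """Build a POST /search request for 2023 Sentinel-2 L2A items over Iceland."""
    body = {
        "collections": ["sentinel-2-l2a"],
        "bbox": ICELAND_BBOX,
        "datetime": "2023-01-01T00:00:00Z/2023-12-31T23:59:59Z",
        # STAC API Query extension; assumes the server supports it.
        "query": {"eo:cloud_cover": {"lt": max_cloud_cover}},
        "limit": limit,
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` returns one page of results; further pages are reached by following the response's `next` link, which this sketch does not do.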

@pantierra
Collaborator

  • There is a script we use to obtain search results from Earth Search, strip out the jp2 assets, leaving only the COGs, and dump the result into a JSON file.
  • The result is a static catalog here, and can be inspected with this stac-browser.
  • It looks like we can use the files directly from the AWS Open Data Registry, as they are publicly available. I suggest we get it working first and then consider whether we want a mirror on CloudFerro.
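The filtering step described in the first bullet could look roughly like this. A minimal sketch, assuming JPEG2000 assets are identified by the media type `image/jp2`; it is not the script referenced above:

```python
import json


def keep_cog_assets(item):
    """Return a shallow copy of a STAC item dict without its JPEG2000 assets."""
    item = dict(item)
    item["assets"] = {
        key: asset
        for key, asset in item.get("assets", {}).items()
        if asset.get("type", "") != "image/jp2"
    }
    return item


def dump_items(items, path):
    """Write the filtered items to a file as a GeoJSON FeatureCollection."""
    collection = {
        "type": "FeatureCollection",
        "features": [keep_cog_assets(i) for i in items],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(collection, f, indent=2)
```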

Next steps:

  • Look into ingesting this into pgstac on the deployed eoAPI.
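pypgstac's loader is commonly fed newline-delimited JSON, one item per line. Assuming that input format, a sketch for turning the dumped FeatureCollection into NDJSON:

```python
import json


def feature_collection_to_ndjson(feature_collection):
    """Serialize each item of a FeatureCollection dict on its own line (NDJSON)."""
    return "".join(
        # Compact separators keep each item on a single short line.
        json.dumps(item, separators=(",", ":")) + "\n"
        for item in feature_collection["features"]
    )
```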

@j08lue
Collaborator Author

j08lue commented Oct 29, 2024

The sample looks great!

Agreed, if performance is ok with loading directly from AWS us-west-2, we could skip the copying. How many GB are we talking about, roughly?

@pantierra
Collaborator

For now, I took all imagery over Iceland from 2023 with cloud cover below 5%. That gives 226 items, each with around 180 MB of assets, for a total of ~40 GB. Raising or lowering the cloud-cover threshold would increase or decrease this.
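As a quick check of that estimate, using the rounded per-item size and decimal units (1 GB = 1000 MB):

```math
226 \times 180\ \text{MB} = 40\,680\ \text{MB} \approx 40.7\ \text{GB}
```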

@pantierra
Collaborator

Data should now be available - please check it @j08lue.

Related PRs:

@j08lue
Collaborator Author

j08lue commented Nov 6, 2024

The collection looks great. As icing on the cake, we could add a collection thumbnail, so it looks nice in STAC Browser, similar to what the Poland Sentinel-2 collection has:

https://radiantearth.github.io/stac-browser/#/external/eoapi.develop.eoepca.org/stac/collections/sentinel-2-l2a?.asset=asset-thumbnail

```json
{
  "assets": {
    "thumbnail": {
      "href": "https://s3.waw3-2.cloudferro.com/swift/v1/stac-png/S2_L2A.jpg",
      "type": "image/jpeg",
      "roles": ["thumbnail"],
      "title": "Sentinel-2 Level-2A",
      "proj:code": null,
      "proj:shape": [360, 640]
    }
  }
}
```

Ideally, a mosaic of the actual data or just one of the nicest scenes, or simply a screen grab from a cloudless Sentinel-2 product, like https://s2maps.eu/.
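Adding such an asset to a collection document is a small dict update. A minimal sketch that mirrors the fields of the Poland example (minus the `proj:*` fields); the href and title passed in are placeholders to be replaced with the real image:

```python
import copy


def with_thumbnail(collection, href, title, media_type="image/jpeg"):
    """Return a copy of a STAC collection dict with a "thumbnail" asset added."""
    new_collection = copy.deepcopy(collection)
    # Create the "assets" object if the collection does not have one yet.
    new_collection.setdefault("assets", {})["thumbnail"] = {
        "href": href,
        "type": media_type,
        "roles": ["thumbnail"],
        "title": title,
    }
    return new_collection
```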


@j08lue j08lue closed this as completed Nov 12, 2024
@j08lue
Collaborator Author

j08lue commented Nov 12, 2024

@j08lue j08lue added this to the Q3 milestone Nov 12, 2024
@pantierra
Collaborator

Yeah, once the Transaction endpoints work, I want to add the thumbnail through them as their first use.
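With the STAC API Transaction extension, updating an existing collection is a `PUT /collections/{collectionId}` with the full collection document. A minimal sketch that builds (but does not send) that request; the collection id in the usage line is hypothetical, and authentication is not covered:

```python
import json
import urllib.request


def build_update_request(stac_root, collection):
    """Build a PUT /collections/{id} request that replaces a collection document."""
    url = f"{stac_root.rstrip('/')}/collections/{collection['id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(collection).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage with a hypothetical collection id:
# req = build_update_request("https://eoapi.develop.eoepca.org/stac",
#                            {"id": "sentinel-2-iceland", "type": "Collection"})
```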
