High-Level: HLS data to the dashboard #70
@jvntf Just FYI, I noticed we can also access the HLS files using URS authentication (https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T55GEM.2022035T235241.v2.0/HLS.S30.T55GEM.2022035T235241.v2.0.B03.tif), but I don't think we'll want to use that if we can configure direct S3 access through AWS IAM policies.
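For reference, a minimal sketch of what a direct S3 read of that same granule could look like, assuming we sort out in-region (us-west-2) credentials that LP DAAC accepts (temporary credentials, an IAM policy, or similar); the requester-pays setting is a question we still need to answer, not a confirmed fact:

```python
import boto3
import rasterio
from rasterio.session import AWSSession
from rasterio.windows import Window

# Same granule as the HTTPS link above, addressed as an S3 key.
S3_URL = (
    "s3://lp-prod-protected/HLSS30.020/"
    "HLS.S30.T55GEM.2022035T235241.v2.0/"
    "HLS.S30.T55GEM.2022035T235241.v2.0.B03.tif"
)

# Assumption: this boto3 session carries whatever credentials we end up
# using (role, temporary LP DAAC creds, etc.); requester_pays may need to
# be True depending on how the bucket is configured.
session = AWSSession(boto3.Session(), requester_pays=False)

with rasterio.Env(session=session):
    with rasterio.open(S3_URL) as src:
        print(src.profile)  # COG metadata
        block = src.read(1, window=Window(0, 0, 512, 512))  # small window read
```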
Thinking about this a little bit more, I'm pretty sure we'll want to reuse the mosaic generation that Sean has implemented for Planetary Computer. For Planetary Computer I think Sean mentioned a daily global mosaic, which may still be higher temporal resolution than what we need for the dashboard, but if it's easy to copy the mosaic records that already exist for Planetary Computer into our database, then maybe that is the best thing to do. I'm also going to check in to see if we have any preliminary land cover stories to inform the decision about what HLS temporal and spatial extents and resolutions are right for the first release of the dashboard.
Notes from today's meeting with @sharkinsspatial:
Aimee: I think we want the daily mosaics that are being used for the MS Planetary Computer.
Sean: There is no HLS data in the Planetary Computer; we store portions of the old v1.5 data, and that's not global, just sample regions. We're working on a BLM / Forest Service request for SWIR for fire season and have started to experiment with HLS global daily mosaics for this purpose, but we have been running into performance problems.
Sean: In general, tile construction should not take 30 seconds, so we have to determine what is causing the titiler lambda function to take over 30 seconds.
About mosaics:
Sean
Next steps for HLS data for EO dashboard:
There are nearly 3 million HLS Sentinel-2 granules and nearly 4.5 million Landsat granules. The story that is currently planned is to highlight flooding in Manville, NJ in September 2021. TL;DR: David Bitner says to just stick it in the database. Brian F showed an S30 tile (HLS.S30.T18TWK.2021245T154911.v2.0), so assuming that's the dataset we want to use for this use case:
I was proposing that we generate STAC records for the subset we need for our use case. Brian F showed an S30 tile for the flooding in Manville, NJ on Sept 2, 2021, so I was thinking of generating a few daily global mosaics of S30 for a few days before and after the flood, which would require loading about 113,700 granules into our STAC database (query: [https://search.earthdata.nasa.gov/search/granules?p=C2021957295-LPCLOUD&pg[0][v]=f&pg[0][…]0!3!!&m=40.76707484150656!-75.22251319885254!7!1!0!0%2C2).
I think we just need Sentinel (S30).
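To sanity-check the granule count before committing to the load, something like the following could size the subset against CMR; the collection concept id comes from the Earthdata Search link above, and the date window around Sept 2, 2021 is an assumption to be tuned:

```python
import requests

CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.json"

# HLSS30 v2.0 collection (from the Earthdata Search query above).
params = {
    "collection_concept_id": "C2021957295-LPCLOUD",
    # Assumed window: a few days either side of the Manville, NJ flood.
    "temporal": "2021-08-30T00:00:00Z,2021-09-05T23:59:59Z",
    "page_size": 0,  # we only want the hit count here, not the records
}

resp = requests.get(CMR_GRANULES, params=params, timeout=30)
resp.raise_for_status()
print("granules matching window:", resp.headers["CMR-Hits"])
```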
Note from David: https://github.com/stac-utils/pgstac#bulk-data-loading
STAC records are inline with the CMR metadata and the data itself; see the metadata "rel" of "http://esipfed.org/ns/fedsearch/1.1/metadata" or any of the links in the metadata which end in "_stac.json".
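To make that concrete, here is a rough sketch (not a vetted pipeline) that pages CMR, pulls the public `_stac.json` link for each granule, writes ndjson, and bulk loads it. It assumes the inline `_stac.json` documents are complete STAC Items that pgstac will accept as-is, uses pypgstac's Python `Loader` (the README linked above documents the CLI equivalent), and expects a `PGSTAC_DSN` connection string in the environment:

```python
import json
import os

import requests
from pypgstac.db import PgstacDB
from pypgstac.load import Loader, Methods

CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.json"
params = {
    "collection_concept_id": "C2021957295-LPCLOUD",              # HLSS30 v2.0
    "temporal": "2021-08-30T00:00:00Z,2021-09-05T23:59:59Z",     # assumed window
    "page_size": 2000,
}

def stac_links():
    """Yield the public *_stac.json href for each granule in the window."""
    search_after = None
    while True:
        headers = {"CMR-Search-After": search_after} if search_after else {}
        r = requests.get(CMR_GRANULES, params=params, headers=headers, timeout=60)
        r.raise_for_status()
        entries = r.json()["feed"]["entry"]
        if not entries:
            return
        for granule in entries:
            for link in granule.get("links", []):
                if link.get("href", "").endswith("_stac.json"):
                    yield link["href"]
        search_after = r.headers.get("CMR-Search-After")

# Fetch each inline STAC item (no auth needed) and write ndjson for pypgstac.
with open("hlss30_items.ndjson", "w") as f:
    for href in stac_links():
        item = requests.get(href, timeout=60).json()
        f.write(json.dumps(item) + "\n")

# Bulk load into pgstac (upsert so reruns are harmless).
loader = Loader(db=PgstacDB(dsn=os.environ["PGSTAC_DSN"]))
loader.load_items("hlss30_items.ndjson", insert_mode=Methods.upsert)
```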
Other next steps:
Notes from Sean: The most difficult area here is maintaining consistent synchronization with CMR. Given the async nature of the HLS processing pipelines, granules might be created at variable times after collection. For example, our nominal latency is 3 days, so you might query CMR for a date 4 days after it has passed, but we might process a new granule for that date 5 days later (which you'd then be missing in pgstac). I'd like to work with Lauren so that Cumulus publication can support a configurable SNS topic that we could use to continuously ingest forward-processing data and avoid any syncing issues (this is how we currently handle the Landsat data in the HLS pipeline). @sharkinsspatial please correct anything I misstated.
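This is not Sean's implementation, just a hypothetical sketch of the forward-processing pattern described above: a Lambda subscribed to the (to-be-configured) Cumulus publication SNS topic that turns each new-granule notification into a pgstac upsert, so late-arriving granules land in the database without re-querying CMR. The message field name and the assumption that each notification links to a public `_stac.json` record both need confirming against the real Cumulus message schema:

```python
import json
import os

import requests
from pypgstac.db import PgstacDB
from pypgstac.load import Loader, Methods

def handler(event, context):
    """Hypothetical entry point for a Lambda subscribed to the Cumulus SNS topic."""
    ndjson_path = "/tmp/new_items.ndjson"
    count = 0
    with open(ndjson_path, "w") as f:
        for record in event["Records"]:
            message = json.loads(record["Sns"]["Message"])
            stac_href = message["stac_href"]  # hypothetical field name
            item = requests.get(stac_href, timeout=30).json()
            f.write(json.dumps(item) + "\n")
            count += 1

    # Upsert so re-delivered notifications are harmless.
    loader = Loader(db=PgstacDB(dsn=os.environ["PGSTAC_DSN"]))
    loader.load_items(ndjson_path, insert_mode=Methods.upsert)
    return {"ingested": count}
```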
@freitagb mentioned that many of these STAC records already exist, and we could probably obtain them so that we don't have to go through URS authentication for the STAC metadata generation. We should check in with him to see if it makes sense for us to use the STAC records he is maintaining in the staging environment versus the STAC records maintained at LP DAAC.
@abarciauskas-bgse There are inline STAC records available as public links for all the HLS granules, without authentication, via the LP DAAC.
Cool, thanks @sharkinsspatial, that's good to know; that wasn't clear in the conversation today with @freitagb.
@sharkinsspatial mentioned that he has a lambda which refreshes the stored AWS credentials every 30 minutes. Sean also has a lambda that will queue a list of CMR records from a query to SQS to generate STAC records, using bulk loading. The plan is to tag up with @jvntf and @anayeaye about this infrastructure at some point soon.
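I haven't seen Sean's code, but the credential-refresh piece might look roughly like this: a scheduled (e.g. every 30 minutes) Lambda that pulls fresh temporary S3 credentials with an Earthdata Login token and stashes them where the ingest functions can read them. The credentials endpoint, the `EDL_TOKEN` environment variable, and the SSM parameter name are all assumptions to verify:

```python
import json
import os

import boto3
import requests

# LP DAAC temporary S3 credentials endpoint (assumed; verify with the DAAC).
CREDS_URL = "https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials"

ssm = boto3.client("ssm")

def handler(event, context):
    """Hypothetical scheduled Lambda that refreshes the stored credentials."""
    resp = requests.get(
        CREDS_URL,
        headers={"Authorization": f"Bearer {os.environ['EDL_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()

    # Store where the STAC-generation lambdas can pick them up.
    ssm.put_parameter(
        Name="/hls/lpdaac-s3-credentials",  # hypothetical parameter name
        Value=json.dumps(resp.json()),
        Type="SecureString",
        Overwrite=True,
    )
```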
Here are some notes from the meeting, mostly focused on the HLS stack, that I took earlier today: https://docs.google.com/document/d/15XB0lP3bm8MbtgLZb_bdALu0JJX0w9OlmivvBwNht7Y/edit
I think we agreed to use the same infrastructure configuration as Sean, but starting with just one day of Sentinel data to benchmark timing and test the configuration between all the components in our AWS environment. @jvntf does that make sense to you? It sounded like you may be able to start on this ticket soon.
My current understanding of Sean's workflow for HLS data:
Our HLS collection was inserted using id
Ingested 4 hours of HLS data ending with
Ingested the records under HLSS30.002 as well. I'll remove the HLSS30 collection / records from the db.
Awesome!
Since HLS is a large-volume dataset, we are working on it in parts, and I'm leaving this ticket open as a catch-all for HLS work for now.
@aboydnw to make some follow-up tickets for continuous ingest and multi-DAAC work and link them to this ticket.
Closing in favor of NASA-IMPACT/veda-data-airflow#99 and NASA-IMPACT/veda-data-airflow#100
@sharkinsspatial will be our point of contact for this dataset since he has produced and published the COGs to LP DAAC.
Identify dataset and where it will be accessed from.
I just know from Sean that the datasets are available in Earthdata Search: https://search.earthdata.nasa.gov/search?gdf=Cloud%20Optimized%20GeoTIFF%20(COG). Looks like these datasets are in the lp-prod-protected bucket, but are they using requester-pays? Can we configure access to these files from our dashboard AWS account?
We need to ask Sean if we should use the inline STAC records to ingest into our pgSTAC database or generate new metadata using rio-stac (a rough rio-stac sketch follows this list).
Ask about specific variables and required spatial and temporal extent.
If the dataset is ongoing (i.e. new files are continuously added and should be included in the dashboard), design and construct the forward-processing workflow.
Sean says the pipeline is creating 150k COGs every day 😱 What do we need for the dashboard?
Verify the COG output with the science team by sharing in a visual interface.
Verify the metadata output with STAC API developers and any systems which may be depending on this STAC metadata (e.g. the front-end dev team).
If the dataset should be backfilled, create and monitor the backward-processing workflow.
Engage the science team to add any required background information on the methodology used to derive the dataset.
Add the dataset to an example dashboard.
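Referenced from the pgSTAC / rio-stac question above: if we do end up generating metadata ourselves rather than reusing the inline records, a minimal rio-stac sketch for a single HLS asset might look like the following. The file path, item/collection ids, and datetime are placeholders, and a real item would carry all spectral bands as assets, not just B03:

```python
from datetime import datetime, timezone

from rio_stac import create_stac_item

# Placeholder path to one downloaded/staged HLS COG band.
source = "HLS.S30.T18TWK.2021245T154911.v2.0.B03.tif"

item = create_stac_item(
    source,
    input_datetime=datetime(2021, 9, 2, tzinfo=timezone.utc),  # placeholder
    id="HLS.S30.T18TWK.2021245T154911.v2.0",                    # placeholder
    collection="HLSS30.v2.0",                                    # hypothetical collection id
    asset_name="B03",
    with_proj=True,    # add projection extension fields
    with_raster=True,  # add raster band statistics
)

# pystac Item ready to be written to ndjson and loaded into pgstac.
print(item.to_dict())
```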