Save_asset_per_band: assets in different s3 folders + band-specific filename #877

Open
JorisCod opened this issue Sep 23, 2024 · 9 comments

JorisCod commented Sep 23, 2024

The goal of this issue is to save assets to different folders, each with a specific name according to the band name.

Specific use case in mind: saving of monthly composites in different s3 folders (one folder/month)

Prefix example:
vito/products/2020/<MONTH>/

Filename example:
LCFM_LSF-MONTHLY_<DATE_START>_<DATE_END>29TNE_V001-SATIO_B<BAND_NAME>M.tif

So the parametric parts are month, band_name, resolution, date_start and date_end. Resolution can be considered fixed (*). Month and band_name can be derived from the openEO band name as follows:

band_name = openeo_band_name.split('_')[-1]
month = openeo_band_name.split('_')[0]

(alternatively with fixed indices).

date_start and date_end are harder, as they have to be derived from month:

import calendar
from datetime import datetime

def get_start_and_end_dates(year: int, month: int):
    # Get the first day of the month
    start_date = datetime(year, month, 1)

    # Get the last day of the month
    last_day = calendar.monthrange(year, month)[1]
    end_date = datetime(year, month, last_day)

    return start_date, end_date

# Example usage
year = 2020
month = 2
start_date, end_date = get_start_and_end_dates(year, month)

print("Start date:", start_date.strftime('%Y%m%d'))
print("End date:", end_date.strftime('%Y%m%d'))

The solution might be to provide a list of prefixes and filenames to use, both lists having the same length as the number of bands. This is quite general as well as flexible.
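
As a rough illustration of what those two lists could look like, here is a minimal sketch that builds one prefix and one filename per band. It assumes band names shaped like "<MONTH>_<BAND_NAME>" (for example "02_B04"), and it uses simplified, hypothetical patterns rather than the exact LCFM naming scheme; the helper names are made up for this example.

```python
import calendar
from datetime import date

# Simplified, hypothetical patterns; not the exact LCFM naming scheme.
PREFIX_TEMPLATE = "vito/products/{year}/{month:02d}/"
FILENAME_TEMPLATE = "LSF-MONTHLY_{date_start}_{date_end}_{band_name}.tif"


def month_bounds(year: int, month: int) -> tuple[str, str]:
    """Return the first and last day of a month as YYYYMMDD strings."""
    last_day = calendar.monthrange(year, month)[1]
    return (
        date(year, month, 1).strftime("%Y%m%d"),
        date(year, month, last_day).strftime("%Y%m%d"),
    )


def build_prefixes_and_filenames(
    band_names: list[str], year: int
) -> tuple[list[str], list[str]]:
    """Build one prefix and one filename per band, in band order."""
    prefixes, filenames = [], []
    for openeo_band_name in band_names:
        # Assumption: band names look like "<MONTH>_<BAND_NAME>", e.g. "02_B04".
        month_str, band_name = openeo_band_name.split("_", 1)
        month = int(month_str)
        date_start, date_end = month_bounds(year, month)
        prefixes.append(PREFIX_TEMPLATE.format(year=year, month=month))
        filenames.append(
            FILENAME_TEMPLATE.format(
                date_start=date_start, date_end=date_end, band_name=band_name
            )
        )
    return prefixes, filenames


prefixes, filenames = build_prefixes_and_filenames(["02_B04", "03_B08"], year=2020)
print(prefixes)
print(filenames)
```

Both lists have the same length as the band list, so entry i of each list belongs to band i.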

(*) An ideal workflow is to have one job for all resolutions. This is actually the goal of the main LCFM workflow: a single workflow for both 10m and 20m bands.

JorisCod changed the title from "Save_asset_per_band: save assets in different s3 folders" to "Save_asset_per_band: assets in different s3 folders + band-specific filename" on Sep 23, 2024
jdries (Contributor) commented Sep 24, 2024

@JeroenVerstraelen to plan in next sprint.
A list or dictionary mapping band names to preferred asset names is indeed a way forward.
Allowing a subdirectory in the prefix is (hopefully) also possible.

bossie (Collaborator) commented Sep 30, 2024

Related to Open-EO/openeo-geotrellis-extensions#317 in that this also accepts a dictionary mapping band names to scale, offset etc.

EmileSonneveld (Contributor) commented

@JorisCod, do you have an example process graph that will need this treatment?

JorisCod (Author) commented Oct 4, 2024

Job id: 241004c6b52a4745b1280a94d7b9e257

Notebook: https://github.com/VITO-RS-Vegetation/lcfm-production/blob/s1_difference/notebooks/sentinel1.ipynb

With this code put in:

save_result_options = {
    "filename_prefix": "lcfm-s1",
    "separate_asset_per_band": True,
}

result_datacube = stats.save_result(
    format="GTiff",
    options=save_result_options,
)

result_datacube = result_datacube.process(
    "export_workspace",
    arguments={
        "data": result_datacube,
        "workspace": "lcfm-workspace",
        "merge": f"Sentinel1/test/LCFM_S1-RAW_{year}_{tile}_V001",
    },
)

JorisCod (Author) commented Oct 4, 2024

Actually, the job above failed, though it is unclear to me why (the online web editor is crashing for me, maybe because of too many jobs).

EmileSonneveld (Contributor) commented

Would it be ok to use band names in the format:
'vito/products/2020/<MONTH>/LCFM_LSF-MONTHLY_<DATE_START>_<DATE_END>29TNE_V001-SATIO_B<BAND_NAME>M.tif'
separate_asset_per_band would then interpret the slashes to create the folders.
We can't use a slash in a filename anyway.

Then we can support "filename_prefix": "" to get rid of the "openEO_" prefix.
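
To make the slash idea concrete, here is a minimal sketch of how a slash-separated band name could be split into a folder and a filename. The helper `split_band_path` is hypothetical and not part of any openEO backend; the band name in the usage line is shortened for readability.

```python
from pathlib import PurePosixPath


def split_band_path(band_name: str) -> tuple[str, str]:
    """Split a slash-separated band name into (folder, filename)."""
    path = PurePosixPath(band_name)
    # Reject names that would escape the output directory.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(
            f"band name must be a relative path without '..': {band_name!r}"
        )
    folder = str(path.parent)
    # PurePosixPath reports "." as the parent of a bare filename.
    return ("" if folder == "." else folder, path.name)


folder, filename = split_band_path("vito/products/2020/02/B04.tif")
print(folder, filename)
```

A band name without slashes yields an empty folder, so existing band names would keep landing in the top-level output directory.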

VictorVerhaert commented

One problem with this solution is that the resulting tiff will always contain a band with the full path as band name. I think that name will need to be included in the resulting STAC collection to load these bands, which in turn produces a lengthy and confusing load_stac call if someone wants to use the results.
A cleaner solution imo (but harder to implement) would be to provide a dictionary with band names as keys and file paths as values.
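
A minimal sketch of such a dictionary, together with the checks a backend would probably want before writing anything. The function name and the validation rules are assumptions for illustration only, not an existing openEO option.

```python
def filepaths_in_band_order(
    bands: list[str], filepath_by_band: dict[str, str]
) -> list[str]:
    """Return one file path per band, in band order, or raise on a bad mapping."""
    missing = [b for b in bands if b not in filepath_by_band]
    unknown = [b for b in filepath_by_band if b not in bands]
    if missing or unknown:
        raise ValueError(f"missing bands: {missing}, unknown bands: {unknown}")
    paths = [filepath_by_band[b] for b in bands]
    # Two bands writing to the same file would overwrite each other.
    if len(set(paths)) != len(paths):
        raise ValueError("each band needs its own file path")
    return paths


paths = filepaths_in_band_order(
    ["B04", "SCL"],
    {"SCL": "test2/classification.tiff", "B04": "test/2023_B04.tiff"},
)
print(paths)
```

With this shape the band names in the data cube stay short, and the paths only affect where the assets are written.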

EmileSonneveld (Contributor) commented

Would it be fine if this only works for cubes without a temporal dimension?

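import openeo

# `connection` and `spatial_extent_tap` are assumed to be defined earlier.
dc = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=spatial_extent_tap,
    temporal_extent=["2023-06-01", "2023-06-05"],
    bands=["B04", "SCL"],
)

# Drop the temporal dimension by keeping the first observation.
dc = dc.reduce_dimension("t", openeo.processes.first)

# One output path per band, in the same order as `bands`.
dc = dc.save_result(format="GTiff", options={
    "separate_asset_per_band": True,
    "filepath_per_band": ["test/2023_B04.tiff", "test2/classification.tiff"],
})

results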
├── collection.json
├── metadata.json
├── test
│   ├── 2023_B04.tiff
│   └── 2023_B04.tiff.json
└── test2
    ├── classification.tiff
    └── classification.tiff.json

When there is a temporal dimension, the date should be included in the filename, and then we need some kind of template system.
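
One possible shape for such a template system, sketched with Python's built-in `str.format`. The placeholder names `{band}` and `{date}` and the helper `expand_template` are assumptions for illustration; nothing like this exists in the backend yet.

```python
from datetime import date


def expand_template(
    template: str, bands: list[str], dates: list[date]
) -> dict[tuple[str, date], str]:
    """Expand a path template into one path per (band, date) pair."""
    paths = {}
    for d in dates:
        for band in bands:
            # `date` accepts strftime codes as its format spec, e.g. {date:%Y-%m-%d}.
            paths[(band, d)] = template.format(band=band, date=d)
    # A template that omits {band} or {date} would map several assets to one file.
    if len(set(paths.values())) != len(paths):
        raise ValueError("template does not give every (band, date) a unique path")
    return paths


paths = expand_template(
    "monthly/{date:%Y}/{date:%m}/{date:%Y-%m-%d}_{band}.tiff",
    bands=["B04", "SCL"],
    dates=[date(2023, 6, 1), date(2023, 6, 5)],
)
for key, path in paths.items():
    print(key, path)
```

The uniqueness check is what enforces that the date ends up in the filename whenever the cube has more than one time step.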

bossie (Collaborator) commented Oct 28, 2024

Feel free to adapt/rename/improve the "bands_metadata" format option (a dictionary of band names to tiff tags).
