geetiles

Download Google Earth Engine datasets as tiles, stored as GeoTIFF arrays.

Uses the Google Earth Engine High Volume Endpoint which, according to the documentation:

This service is designed to support a much larger number of simultaneous requests per user, but provides less caching, so it's best for small queries that don't involve any sort of aggregation (like fetching tiles from pre-built images).

install with

pip install geetiles

authenticate to Google Earth Engine CLI with

earthengine authenticate --force

1. create grid on a given area of interest as wkt

geet grid --aoi_wkt_file luxembourg.wkt  --chip_size_meters 1000 --aoi_name lux --dest_dir .

You can find the file luxembourg.wkt under data. Usually you will provide your own file describing your area of interest, with coordinates expressed in WGS84 degrees (lon/lat).
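If you only need a rectangular area of interest, a WKT polygon can be written by hand or generated. The following is a minimal sketch, assuming the file passed to --aoi_wkt_file holds a single WKT geometry in lon/lat order; the function name bbox_to_wkt and the coordinates are hypothetical and not part of geetiles.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a WKT POLYGON for a lon/lat bounding box (WGS84 degrees)."""
    # WKT writes each vertex as "x y", i.e. "lon lat", and the ring must
    # close by repeating the first vertex at the end.
    ring = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),
    ]
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON (({coords}))"


# Hypothetical bounding box, for illustration only.
print(bbox_to_wkt(5.7, 49.4, 6.6, 50.2))
```

Saving the printed string to a text file gives something that can be used in place of luxembourg.wkt.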

this generates file ./lux_partitions_aschips_14c55eb7d417f.geojson. Use a tool such as QGIS to view it.

2. download tiles

For this step, you will need a configured Google Cloud Earth Engine project (as of 13/11/24); documentation for this can be found here. Make a note of your project name, which is passed to --project in the command below.

geet download --tiles_file lux_partitions_aschips_14c55eb7d417f.geojson  --dataset_def sentinel2-rgb-median-2020 --pixels_lonlat [100,100] --skip_if_exists --project {EE_PROJECT_NAME}

this fills the folder lux_partitions_aschips_14c55eb7d417f/sentinel2-rgb-median-2020 with RGB geotiff images of size 100x100 pixels.

The dataset_def sentinel2-rgb-median-2020 is an alias for the Sentinel-2 MSI Level 2-A GEE dataset, taking the median of the cloudless chips over the year 2020.

The dataset_def esaworldcover-2020 is an alias for the ESA WorldCover 10m v100 GEE dataset.

Other usages

Other ways to create the set of tiles (shapes)

  • As random partitions with a bounded side length, set with --max_rectangle_size_meters (figure below left).

    geet random --aoi_wkt_file luxembourg.wkt  --max_rectangle_size_meters 20000 --aoi_name lux --dest_dir .
    
  • Using the reference administrative divisions at EU Eurostat (figure below right)

    geet select --orig_shapefile COMM_RG_01M_2016_4326.zip --aoi_wkt_file luxembourg.wkt --tiles_name communes --aoi_name lux --dest_dir .
    

Using your own code to define the GEE source image object.

geet download --tiles_file lux_partitions_aschips_14c55eb7d417f.geojson  --dataset_def crops.py --pixels_lonlat [100,100] --skip_if_exists --skip_confirm --n_processes 20 --project {EE_PROJECT_NAME}

where crops.py contains a Python class DatasetDefinition following the structure of the predefined ones under defs. The file crops.py will be saved under the destination folder for reference. The destination folder is created alongside the tiles_file.

Split geometries in train, test, val using geographic bands

With a certain angle

geet split --tiles_file lux_partitions_aschips_14c55eb7d417f.geojson --nbands 8 --train_pct .5 --test_pct 0.3 --val_pct 0.2  --angle 0.78
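To give an intuition for what a band-based split does, here is a minimal sketch. It is an illustrative scheme only and not necessarily what geet split implements: it assumes the angle is given in radians, projects each chip centroid onto that direction, cuts the projected range into nbands equal bands, and subdivides every band into train/test/val stripes sized by the requested proportions. The function name band_split is hypothetical.

```python
import math


def band_split(centroids, nbands, train_pct, test_pct, val_pct, angle):
    """Assign each (x, y) centroid to 'train', 'test' or 'val'.

    Assumed scheme (not taken from geetiles): project on the direction
    given by `angle` (radians), form `nbands` equal bands, and split each
    band into three stripes according to the proportions.
    """
    if not math.isclose(train_pct + test_pct + val_pct, 1.0):
        raise ValueError("proportions must sum to 1")
    if not centroids:
        return []
    proj = [x * math.cos(angle) + y * math.sin(angle) for x, y in centroids]
    lo, hi = min(proj), max(proj)
    width = (hi - lo) / nbands or 1.0  # fall back to 1.0 if all points coincide
    splits = []
    for p in proj:
        pos = (p - lo) / width              # position measured in bands
        band = min(int(pos), nbands - 1)    # clamp the farthest point into the last band
        t = pos - band                      # relative position inside the band
        if t < train_pct:
            splits.append("train")
        elif t < train_pct + test_pct:
            splits.append("test")
        else:
            splits.append("val")
    return splits
```

Because whole stripes go to a single split, neighbouring chips tend to land in the same split, which limits spatial leakage between train and test. The projection here works directly on lon/lat degrees, which is only an approximation of distances on the ground.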

Keeping chips within the same coarser geometry in the same split. In this case, the train/test/val proportions may vary from the ones specified as chips will be distributed across the coarser geometries. First we must intersect the geometries

geet intersect --tiles_file lux_partitions_aschips_14c55eb7d417f.geojson --foreign_tiles_file lux_partitions_communes_1a471c686e053.geojson

and then, do the split

geet split --tiles_file lux_partitions_aschips_14c55eb7d417f.geojson --nbands 8 --train_pct .5 --test_pct 0.3 --val_pct 0.2  --angle 0.785 --foreign_tiles_name communes

here is how it would result.

Computing label proportions

With respect to a dataset downloaded with segmentation labels.

geet lp.compute --tiles_file lux_partitions_aschips_14c55eb7d417f.geojson --dataset_name esa-world-cover
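Conceptually, the label proportions of a chip are the fraction of its pixels that belong to each class. The sketch below illustrates that computation on a plain nested list; it is not the geetiles implementation, and the function name and the toy class ids are assumptions for illustration.

```python
from collections import Counter


def label_proportions(label_rows):
    """Return {class_id: fraction of pixels} for a 2D grid of class ids."""
    counts = Counter(value for row in label_rows for value in row)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in sorted(counts.items())}


# Toy 2x2 label chip with three classes.
print(label_proportions([[10, 10], [40, 80]]))
```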

We can also add the label proportions of the coarser tile in which each chip is embedded. First, we need to download the labels for each coarser tile from GEE.

geet download --tiles_file lux_partitions_communes_1a471c686e053.geojson  --dataset_def esa-world-cover  --meters_per_pixel 20  --skip_if_exists --project {EE_PROJECT_NAME}

then, compute the label proportions for these coarser tiles:

geet lp.compute --tiles_file lux_partitions_communes_1a471c686e053.geojson --dataset_name esa-world-cover

and then compute the label proportions from the coarser tiles.

geet lp.from_foreign --tiles_file lux_partitions_aschips_14c55eb7d417f.geojson --foreign_tiles_file lux_partitions_communes_1a471c686e053.geojson --dataset_name esa-world-cover

The resulting proportions are added to the corresponding tiles_file.

Creating a dataset to share it

This will create a zip file, with a pickle per chip containing a dictionary with the chip image, label and proportions.

geet zip.dataset --tiles_file lux_partitions_aschips_14c55eb7d417f.geojson --foreign_tiles_file lux_partitions_communes_1a471c686e053.geojson --images_dataset_def sentinel2-rgb-median-2020 --labels_dataset_def esa-world-cover --readme_file README.txt
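A consumer of the shared dataset could read it back without unpacking it to disk. The following is a minimal sketch using only the standard library; the ".pkl" suffix and the function name iter_chips are assumptions, and since the exact dictionary keys are not documented here, the sketch simply yields each dictionary for inspection.

```python
import pickle
import zipfile


def iter_chips(zip_path):
    """Yield (member_name, payload) for every pickle stored in the zip."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # The ".pkl" suffix is an assumption; adjust it to the real names.
            if name.endswith(".pkl"):
                with zf.open(name) as f:
                    yield name, pickle.load(f)
```

For example, `for name, chip in iter_chips("dataset.zip"): print(name, list(chip))` would print the keys stored for each chip (the file name dataset.zip is hypothetical). Unpickling executes code embedded in the file, so only load archives from sources you trust.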

Some notes

  • the hash codes in the file names are computed from the participating geometries. This ensures that different sets of geometries do not overwrite each other (such as for random partitions, or a wkt with slightly different coordinates).
  • the splits are saved both as a column in the corresponding tiles_file (which is a geojson) and in a separate csv file. This is to enable fast loading from csv (as loading from geojson might take a while, especially for large datasets).
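The idea of deriving a file name suffix from the geometries can be sketched as follows. This is an illustration only: the hash function, the sorting step, the digest length, and the name geometries_hash are assumptions, not the scheme geetiles actually uses.

```python
import hashlib


def geometries_hash(wkt_geometries, length=13):
    """Return a short hex digest that depends only on the set of geometries."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on the order of the geometries.
    for wkt in sorted(wkt_geometries):
        digest.update(wkt.encode("utf-8"))
        digest.update(b"\n")  # separator between consecutive geometries
    return digest.hexdigest()[:length]


# Any change in a coordinate yields a different suffix, hence a different file name.
print(geometries_hash(["POINT (0 0)", "POINT (1 1)"]))
print(geometries_hash(["POINT (0 0)", "POINT (1 1.0001)"]))
```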