PseudoCode ‐ Patching
Recall patching, one step of the general pipeline for EO data. Patching takes fixed-size subsets from a larger scene or area of interest (AOI). There are two general ways to do it: 1) pre-patch the data and save the patches to files, or 2) patch on the fly using a custom dataset.
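To make the role of the patch size and stride concrete, here is a minimal sketch of how the patch locations in a scene can be enumerated. The helper name patch_origins and the 512 x 512 scene size are assumptions made for illustration only. Patches that would run past the scene edge are simply dropped, which is one possible convention among several.

```python
from itertools import product
from typing import Iterator, Tuple


def patch_origins(
    scene_shape: Tuple[int, int],
    patch_size: Tuple[int, int],
    stride: Tuple[int, int],
) -> Iterator[Tuple[int, int]]:
    """Return an iterator over the top-left (row, col) of every full patch."""
    # The last valid origin is scene - patch, so the range stops one past it.
    rows = range(0, scene_shape[0] - patch_size[0] + 1, stride[0])
    cols = range(0, scene_shape[1] - patch_size[1] + 1, stride[1])
    return product(rows, cols)


# Hypothetical 512 x 512 scene with 256 x 256 patches and a stride of 64.
origins = list(patch_origins((512, 512), (256, 256), (64, 64)))
num_patches = len(origins)
```

Along each axis the number of full patches is floor((scene - patch) / stride) + 1, so a stride smaller than the patch size produces overlapping patches.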
Sources
- SatClip Example for Random Clean Patches - SatClip
In this case, we will pre-chip the images so that the chipped datasets are consistent. One advantage of this method is that we are free to choose the data structure used to save the patches. This gives people flexibility when they create their custom datasets, provided the patches are stored in simple data structures like .tif, .png, or numpy arrays. In addition, the user will not have to worry about making patches.
Operations
- Load Analysis-Ready Data
- Initialize Normalizer
- Pre-Patching
- Save ML-Ready Data
- Save Normalizer
PseudoCode
# select analysis-ready files and load data
analysis_ready_files: List[str] = …
ds: GeoDataset = load_dataset(analysis_ready_files)
# calculate transformation parameters
transform_params: Dataclass = calculate_transform_params(ds, **params)
save_normalizer(…, transform_params)
# define patcher and patch parameters
patch_size: Dataclass = Dataclass(lon=256, lat=256)
stride: Dataclass = Dataclass(lon=64, lat=64)
patcher: Patcher = Patcher(patch_size, stride)
# save patches to ML Ready Bucket
file_path: Path = Path(…)
save_name_id: str = …
num_workers: int = …
save_patches(patcher, num_workers, file_path, save_name_id)
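The pre-patching steps above can be sketched with plain numpy. This is an illustration under stated assumptions, not the pipeline's actual implementation: the synthetic scene, the per-channel mean and standard deviation used as normalizer parameters, the file names, and the temporary output directory are all hypothetical. The sizes are scaled down (64-pixel patches, stride 32) to keep the example light.

```python
import tempfile
from pathlib import Path

import numpy as np

# Synthetic stand-in for an analysis-ready scene, shape (channels, lat, lon).
rng = np.random.default_rng(0)
scene = rng.normal(loc=5.0, scale=2.0, size=(3, 128, 128)).astype(np.float32)

# Per-channel statistics play the role of the normalizer parameters here.
mean = scene.mean(axis=(1, 2))
std = scene.std(axis=(1, 2))

# Hypothetical output location standing in for the ML-ready bucket.
out_dir = Path(tempfile.mkdtemp())
np.savez(out_dir / "normalizer.npz", mean=mean, std=std)

patch, stride = 64, 32
saved = []
# Slide a fixed-size window over the scene and save each patch to its own file.
for i in range(0, scene.shape[1] - patch + 1, stride):
    for j in range(0, scene.shape[2] - patch + 1, stride):
        path = out_dir / f"patch_{i:04d}_{j:04d}.npy"
        np.save(path, scene[:, i : i + patch, j : j + patch])
        saved.append(path)
```

The patches are saved unnormalized and the normalizer parameters are stored next to them, so the transform can be applied (or changed) at load time.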
Operations
- Load ML-Ready Data
- Load Normalizer
- Apply Normalizer
- Create Dataset
PseudoCode
# get ml ready data files
ml_ready_data_files: List[str] = […]
# load transform params, init transform
transform_params: PyTree = load_transform_params(…)
transformer = init_transformer(transform_params)
# create ML dataset
ds: MLDataset = MLDataset(ml_ready_data_files, transformer)
# demo item
num_samples: int = …
sample: Tensor["B C H W"] = ds.sample(num_samples)
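As a rough illustration of the loading stage, the sketch below builds a small map-style dataset over pre-saved patches. The class name PatchFileDataset, the .npy and .npz file layout, and the standardization transform are assumptions for this example. It returns numpy arrays rather than tensors, and the synthetic files are written first only so that the example runs on its own.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

import numpy as np


class PatchFileDataset:
    """Map-style dataset over pre-saved .npy patches (hypothetical helper)."""

    def __init__(self, files: List[Path], transform: Callable[[np.ndarray], np.ndarray]):
        self.files = sorted(files)
        self.transform = transform

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> np.ndarray:
        # Each patch is read from disk only when it is requested.
        return self.transform(np.load(self.files[idx]))

    def sample(self, num_samples: int, seed: int = 0) -> np.ndarray:
        """Stack randomly chosen patches into a (B, C, H, W) batch."""
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(self), size=num_samples, replace=False)
        return np.stack([self[int(i)] for i in idx])


# Write a few synthetic patches and normalizer parameters so the example runs.
tmp = Path(tempfile.mkdtemp())
rng = np.random.default_rng(0)
for k in range(4):
    np.save(tmp / f"patch_{k}.npy", rng.normal(5.0, 2.0, (3, 64, 64)).astype(np.float32))
np.savez(tmp / "normalizer.npz", mean=np.full(3, 5.0), std=np.full(3, 2.0))

# Load the saved parameters and build a per-channel standardization transform.
params = np.load(tmp / "normalizer.npz")
mean, std = params["mean"][:, None, None], params["std"][:, None, None]
dataset = PatchFileDataset(list(tmp.glob("patch_*.npy")), lambda x: (x - mean) / std)
batch = dataset.sample(2)
```

Because the class only needs __len__ and __getitem__, the same shape of code should carry over to a PyTorch Dataset with little change.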
In this case, we will create a dataset that does some preprocessing on the fly. We only need to save the scenes to a chosen data structure and write a custom dataset that lets us subset the AOI and take patches. The advantages are that we don't need to save the data twice, we can retain some of the metadata, and we have more flexibility to experiment with different patching strategies. The disadvantages are that we need a more advanced dataset, which requires more code, and that it can be very expensive if memory is not managed properly.
Operations
- Load Analysis-Ready Data
- Apply Normalizer
- Patch On The Fly
PseudoCode
# get analysis ready data files
analysis_ready_files: List[str] = […]
# load transform params, init transform
transform_params: PyTree = …
transformer: Callable = init_transformer(transform_params)
# initialize patch parameters
patch_size: Dataclass = Dataclass(lon=256, lat=256)
stride: Dataclass = Dataclass(lon=64, lat=64)
# initialize dataset
ds: Dataset = Dataset(
    analysis_ready_files,
    transformer,
    patch_size,
    stride,
    **kwargs
)
# demo item
sample: Tensor["1 C 256 256"] = ds.sample(1)
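A minimal sketch of the on-the-fly idea, assuming a single scene already loaded as a numpy array of shape (channels, lat, lon). The class name OnTheFlyPatchDataset and the synthetic scene are hypothetical. Only the patch origins are stored up front, and each patch is sliced and normalized when it is requested.

```python
from typing import Tuple

import numpy as np


class OnTheFlyPatchDataset:
    """Slice patches lazily from one scene (hypothetical helper)."""

    def __init__(
        self,
        scene: np.ndarray,
        mean: np.ndarray,
        std: np.ndarray,
        patch_size: Tuple[int, int],
        stride: Tuple[int, int],
    ):
        self.scene = scene
        self.mean = mean[:, None, None]
        self.std = std[:, None, None]
        self.patch_size = patch_size
        rows = range(0, scene.shape[1] - patch_size[0] + 1, stride[0])
        cols = range(0, scene.shape[2] - patch_size[1] + 1, stride[1])
        # Store only the top-left corners; patches are cut on demand.
        self.origins = [(i, j) for i in rows for j in cols]

    def __len__(self) -> int:
        return len(self.origins)

    def __getitem__(self, idx: int) -> np.ndarray:
        i, j = self.origins[idx]
        h, w = self.patch_size
        window = self.scene[:, i : i + h, j : j + w]  # a view, no copy yet
        return (window - self.mean) / self.std


# Synthetic scene standing in for analysis-ready data.
rng = np.random.default_rng(0)
scene = rng.normal(5.0, 2.0, (3, 512, 512)).astype(np.float32)
mean, std = scene.mean(axis=(1, 2)), scene.std(axis=(1, 2))

ds = OnTheFlyPatchDataset(scene, mean, std, patch_size=(256, 256), stride=(64, 64))
sample = ds[0][None]  # add a batch axis: (1, C, 256, 256)
```

Changing the patch size or stride only requires rebuilding the dataset object, not rewriting files. For scenes that do not fit in memory, opening the array with np.load(..., mmap_mode="r") is one way to keep it on disk and read only the requested windows.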
Libraries
There are a number of libraries that offer this patching strategy.
- xrpatcher is a lightweight patcher for xarray.Dataset structures which can easily be composed with PyTorch Datasets.
- torchgeo provides some lightweight datasets for rasters and vectors that include geo information.
- Raster-Vision
This research is funded through a NASA 22-MDRAIT22-0018 award (No 80NSSC23K1045) and managed by Trillium Technologies Inc (trillium.tech).