Sampling #313

m-mohr · 2021-12-13T13:42:01Z

For several use cases such as ML training, one or more sampling processes could be useful.

This came up in the discussions around #295

Related work:

Alternative:

polygon_to_point?
raster <-> vector processes #155

clausmichele · 2021-12-15T15:42:30Z

The sampling process could also allow extracting all valid points by setting a parameter. This would cover the case where we select some polygons (filter_spatial) and want to keep all the pixels inside them for training.

Some questions that come to mind:

Is the output a vector-cube?
Does it keep x,y information about the points or do we discard them?

clausmichele · 2021-12-16T15:42:06Z

The sampling will be performed using a new process called polygon_to_points:

polygon_to_points takes GeoJSON (or another supported format, TBD) as input and returns the same structure with point coordinates.
Possible parameters are:

Sampling strategy: random, Fibonacci, ...
Maximum number of points to extract
Other?
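To make the proposal more concrete, here is a minimal sketch of the "random" strategy with a maximum number of points. It is not the openEO process itself: the function names, the `seed` and `max_tries` parameters, and the use of rejection sampling over the bounding box are assumptions made for illustration, and it only handles a single polygon ring without holes.

```python
import random

def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices (closed or open)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_points_in_polygon(ring, max_points, seed=None, max_tries=100_000):
    """Rejection sampling: draw in the bounding box, keep points inside the ring.

    `seed` and `max_tries` are hypothetical parameters for this sketch.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    tries = 0
    while len(points) < max_points and tries < max_tries:
        tries += 1
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points

# Usage: up to 50 random points inside a triangle.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (0.0, 0.0)]
samples = sample_points_in_polygon(triangle, max_points=50, seed=1)
```

Rejection sampling is simple but becomes slow for thin or very concave polygons, which is why the sketch caps the number of attempts.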

m-mohr · 2021-12-16T15:47:28Z

We still need to decide which name to choose; the names should be somewhat aligned across the vector processes.
Input should probably be GeoJSON or vector cube, as usual.
Would it be useful to also pass lines and get points for them or should this be purely polygon focused?
Sampling strategy could be a callback (so random and Fibonacci would be separate processes) or a given set of methods.
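The two options for the sampling strategy (a callback versus a fixed set of named methods) can be compared with a small sketch. Everything here is hypothetical: the `sample` function, the `BUILTIN_STRATEGIES` table, and the bounding-box-only strategies are illustrative stand-ins, not proposed process definitions.

```python
import math
import random
from typing import Callable, List, Tuple, Union

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Strategy = Callable[[BBox, int], List[Point]]

def random_strategy(bbox: BBox, n: int) -> List[Point]:
    """Uniformly random points in the bounding box."""
    min_x, min_y, max_x, max_y = bbox
    return [(random.uniform(min_x, max_x), random.uniform(min_y, max_y))
            for _ in range(n)]

def regular_strategy(bbox: BBox, n: int) -> List[Point]:
    """Cell centers of a square grid, truncated to n points."""
    if n <= 0:
        return []
    min_x, min_y, max_x, max_y = bbox
    side = math.isqrt(n - 1) + 1  # ceil(sqrt(n))
    dx = (max_x - min_x) / side
    dy = (max_y - min_y) / side
    grid = [(min_x + (c + 0.5) * dx, min_y + (r + 0.5) * dy)
            for r in range(side) for c in range(side)]
    return grid[:n]

# Option A: a given set of methods, selected by name.
BUILTIN_STRATEGIES = {"random": random_strategy, "regular": regular_strategy}

def sample(bbox: BBox, n: int, strategy: Union[str, Strategy] = "random") -> List[Point]:
    """Accept either a method name (option A) or a callback (option B)."""
    fn = BUILTIN_STRATEGIES[strategy] if isinstance(strategy, str) else strategy
    return fn(bbox, n)

# Usage: a named method and a user-supplied callback.
by_name = sample((0.0, 0.0, 10.0, 10.0), 5, "regular")
by_callback = sample((0.0, 0.0, 10.0, 10.0), 3, lambda bbox, n: [(5.0, 5.0)] * n)
```

A callback keeps the core process small and lets each strategy be its own process, while a named set is easier to validate and document.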

m-mohr · 2021-12-20T14:52:06Z

Do we need more than random sampling? I couldn't really find (popular) implementations of Fibonacci.
Number of samples: Allow percentage and an absolute number of pixels?
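For reference, one common formulation of a Fibonacci (golden-ratio) lattice on a rectangle is short enough to sketch directly. This is an assumption about what "Fibonacci" sampling would mean here, not something the thread specifies; the function name and the bounding-box argument are hypothetical, and restricting the points to a polygon would still need a separate point-in-polygon filter.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2

def fibonacci_lattice(bbox, n):
    """Return n quasi-evenly spread points in bbox = (min_x, min_y, max_x, max_y).

    Point i sits at u = frac(i / golden_ratio), v = (i + 0.5) / n in the
    unit square, which is then scaled to the bounding box.
    """
    min_x, min_y, max_x, max_y = bbox
    points = []
    for i in range(n):
        u = (i / GOLDEN_RATIO) % 1.0
        v = (i + 0.5) / n
        points.append((min_x + u * (max_x - min_x),
                       min_y + v * (max_y - min_y)))
    return points

# Usage: 100 lattice points over a 10 x 5 extent.
lattice = fibonacci_lattice((0.0, 0.0, 10.0, 5.0), 100)
```

Unlike random sampling, the result is deterministic and avoids clusters, which may or may not be desirable for ML training data.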

And lastly, @clausmichele, I'm a bit confused about: "takes as input [...] and returns the same structure with point coordinates". This sounds like you'd extract points along the polygon borders, but I assume you meant that it samples from the whole polygon interior, right?

clausmichele · 2021-12-21T08:07:12Z

Sorry for not being clear: by "same structure" I was referring to the GeoJSON or vector-cube structure/data format, not the content itself.

"Allow percentage and an absolute number of pixels?"

I'm not sure, actually. The percentage could be tricky, since I would not know how to define the maximum number of points (100%) that could be extracted from a vector layer. In theory, there could be infinitely many!

m-mohr · 2021-12-21T09:41:37Z

Thanks for the clarification, @clausmichele.

I thought about basing percentages around the pixel centers. You should have a known number of pixels and could create a list of points for the pixel centers, right?
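The idea of basing percentages on pixel centers can be sketched as follows. The grid description (top-left origin, square pixels) and both function names are assumptions for illustration; a real raster cube would supply its own geotransform.

```python
import random

def pixel_centers(origin_x, origin_y, pixel_size, width, height):
    """Pixel-center coordinates of a north-up grid.

    (origin_x, origin_y) is assumed to be the top-left corner, so y
    decreases with the row index.
    """
    return [(origin_x + (col + 0.5) * pixel_size,
             origin_y - (row + 0.5) * pixel_size)
            for row in range(height) for col in range(width)]

def sample_fraction(centers, fraction, seed=None):
    """Randomly pick round(fraction * len(centers)) centers without replacement."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = round(len(centers) * fraction)
    return random.Random(seed).sample(centers, k)

# Usage: a 4 x 5 pixel grid has 20 centers; 25% of them is 5 points.
centers = pixel_centers(0.0, 10.0, 1.0, width=4, height=5)
subset = sample_fraction(centers, 0.25, seed=0)
```

Because the number of pixel centers is finite and known, 100% is well defined, which addresses the concern about an unbounded number of points in a vector layer.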

clausmichele · 2021-12-21T10:53:11Z

From the last meeting I understood that the process should take geometries (GeoJSON) as input and also output geometries to be used in aggregate_spatial; that's why I initially called it "polygon_to_points". The output would be a list of points that can be used in aggregate_spatial to create a vector-cube for training the ML model.
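As an illustration of the "list of points" output, the sketch below wraps sampled coordinates in a GeoJSON FeatureCollection of Point features. The function name and the `id` property are hypothetical, and how a given back-end's aggregate_spatial consumes such a collection is not specified in this thread.

```python
import json

def points_to_feature_collection(points):
    """Wrap (x, y) tuples in a GeoJSON FeatureCollection of Point features.

    The "id" property is a hypothetical addition for this sketch.
    """
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [x, y]},
                "properties": {"id": i},
            }
            for i, (x, y) in enumerate(points)
        ],
    }

# Usage: serialize two sampled points as GeoJSON.
collection = points_to_feature_collection([(11.35, 46.49), (11.40, 46.50)])
geojson_text = json.dumps(collection)
```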

@mattia6690 @jdries can you comment on this? Do I remember correctly?

m-mohr · 2021-12-21T11:01:25Z

Ah, okay! Then I misunderstood or remembered incorrectly. I can change that. It makes the process simpler: it only creates points from a shape, without directly combining them with the values from the raster cube. That's also fine for me; I just didn't fully understand the idea then. Stay tuned for an updated version... :-)

ValentinaHutter · 2022-02-23T08:03:47Z

Quoting @clausmichele:

"From the last meeting I understood that the process should take as input geometries (geoJSON) and output also geometries to be used in aggregate_spatial, that's why I firstly called it 'polygon_to_points'. The output would be a list of points which can be used in aggregate_spatial to create a vector-cube for training the ML model.

@mattia6690 @jdries can you comment on this? Do I remember correctly?"

I am working on an implementation of aggregate_spatial at EODC. Is there an example of aggregate_spatial being used with vector_to_random_points or vector_to_regular_points somewhere? Like an input of aggregate_spatial and the corresponding output? :)

clausmichele · 2022-02-23T08:11:23Z

This is my implementation of it: https://github.com/SARScripts/openeo_odc_driver/blob/d1383aa872bdef5a8bde37f5d48b1f7a56cdd57e/openeo_odc_driver.py#L566

There's a lot of room for improvement; for example, the properties are not kept. I will also provide you with an example using vector_to_*_points soon.

ValentinaHutter · 2022-02-23T08:37:58Z

Thanks a lot, that's great! :)


m-mohr added the ML label Dec 13, 2021

m-mohr mentioned this issue Dec 13, 2021: Random Forest: Training/Regression, Classifier/Predicting... #295 (Closed)

m-mohr self-assigned this Dec 16, 2021

m-mohr added a commit that referenced this issue Dec 20, 2021: Add raster_to_points #313 (fb7bedf)

m-mohr added a commit that referenced this issue Dec 20, 2021: Add raster_to_points #313 (00dbaff)

m-mohr added a commit that referenced this issue Dec 20, 2021: Add raster_to_points #313 (6ca3d70)

m-mohr mentioned this issue Dec 20, 2021: Add vector_to_points #313 #315 (Merged)

m-mohr linked a pull request Dec 20, 2021 that will close this issue: Add vector_to_points #313 #315 (Merged)

m-mohr added a commit that referenced this issue Dec 21, 2021: Add vector_to_points #313 (1acc305)

m-mohr added a commit that referenced this issue Mar 15, 2022: Add vector_to_points #313 (#315) (536508c), Co-authored-by: clausmichele

m-mohr closed this as completed Mar 15, 2022
