Sampling #313

Closed
m-mohr opened this issue Dec 13, 2021 · 11 comments · Fixed by #315

@m-mohr
Member

m-mohr commented Dec 13, 2021

For several use cases such as ML training, one or more sampling processes could be useful.

This came up in the discussions around #295

Related work:

Alternative:

@clausmichele
Member

The sampling process could also allow extracting all the valid points by setting a parameter. This would cover the case where we select some polygons (filter_spatial) and want to keep all the pixels inside them for training.

Some questions that come up in my mind:

  1. Is the output a vector-cube?
  2. Does it keep x,y information about the points or do we discard them?

@clausmichele
Member

The sampling will be performed using a new process called polygon_to_points:

  • polygon_to_points takes as input a GeoJSON (or other supported formats, TBD) and returns the same structure with point coordinates
  • possible parameters are:
  1. Sampling strategy: random, Fibonacci, ...
  2. Max number of points to extract
  3. Other?
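To make the "random" strategy and the "max number of points" parameter concrete, here is a minimal Python sketch of uniform random sampling inside a single polygon ring by rejection sampling. The function names, the `seed` and `max_tries` parameters, and the handling of holes (ignored) are assumptions for illustration, not part of any openEO process definition.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test of a point against one polygon ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Toggle whenever a horizontal ray from pt towards +x crosses this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(
    ring: Sequence[Point],
    max_points: int,
    seed: Optional[int] = None,
    max_tries: int = 100_000,
) -> List[Point]:
    """Rejection sampling: draw uniformly in the bounding box, keep the hits.

    Holes are ignored. At most `max_points` points are returned, fewer if
    `max_tries` is exhausted (e.g. for very thin polygons).
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    points: List[Point] = []
    tries = 0
    while len(points) < max_points and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if point_in_polygon(candidate, ring):
            points.append(candidate)
    return points


# Usage: five reproducible random points inside a triangle (GeoJSON-style closed ring).
triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (0.0, 0.0)]
pts = random_points_in_polygon(triangle, max_points=5, seed=42)
```

Rejection sampling keeps the distribution uniform over the polygon area, at the cost of wasted draws when the polygon fills only a small part of its bounding box.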

@m-mohr
Member Author

m-mohr commented Dec 16, 2021

  • We still need to decide which name to choose; names should be somewhat aligned across the vector processes.
  • Input should probably be GeoJSON or vector cube, as usual.
  • Would it be useful to also pass lines and get points for them or should this be purely polygon focused?
  • Sampling strategy could be a callback (so random and Fibonacci would be separate processes) or a given set of methods.
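The "callback vs. fixed set of methods" question can be sketched as follows: each strategy is a plain function with the same signature, so the sampling routine only calls whatever it is given. The Fibonacci variant shown is a 2D golden-ratio lattice, which is one common reading of "Fibonacci sampling"; the thread does not say which variant was meant. All names here are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
# A strategy maps (bbox, n, seed) to n candidate points inside the bbox.
Strategy = Callable[[BBox, int, Optional[int]], List[Point]]

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2


def random_strategy(bbox: BBox, n: int, seed: Optional[int] = None) -> List[Point]:
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


def fibonacci_strategy(bbox: BBox, n: int, seed: Optional[int] = None) -> List[Point]:
    """Deterministic 2D Fibonacci lattice; `seed` is accepted but unused."""
    xmin, ymin, xmax, ymax = bbox
    pts = []
    for i in range(n):
        u = (i / GOLDEN_RATIO) % 1.0  # fractional multiples of 1/phi spread x evenly
        v = (i + 0.5) / n             # evenly spaced y
        pts.append((xmin + u * (xmax - xmin), ymin + v * (ymax - ymin)))
    return pts


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test of a point against one polygon ring."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def sample_polygon(ring: Sequence[Point], n: int, strategy: Strategy,
                   seed: Optional[int] = None) -> List[Point]:
    """Generate candidates in the bounding box with `strategy`, keep those inside."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    return [p for p in strategy(bbox, n, seed) if point_in_polygon(p, ring)]


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.0)]
fib = sample_polygon(square, 8, fibonacci_strategy)
rnd = sample_polygon(square, 8, random_strategy, seed=1)
```

Because candidates are generated in the bounding box and then filtered, a non-rectangular polygon yields fewer than `n` points; a real implementation would need to oversample or iterate to honour an exact count.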

@m-mohr m-mohr self-assigned this Dec 16, 2021
@m-mohr
Member Author

m-mohr commented Dec 20, 2021

  • Do we need more than random sampling? I couldn't really find (popular) implementations of Fibonacci.
  • Number of samples: Allow percentage and an absolute number of pixels?

And lastly, @clausmichele, I'm a bit confused about: "takes as input [...] and returns the same structure with point coordinates". This sounds like you'd extract points along the polygon borders, but I assume you meant that it samples from the whole polygon interior, right?

m-mohr added a commit that referenced this issue Dec 20, 2021
m-mohr added a commit that referenced this issue Dec 20, 2021
m-mohr added a commit that referenced this issue Dec 20, 2021
@m-mohr m-mohr linked a pull request Dec 20, 2021 that will close this issue
@clausmichele
Member

clausmichele commented Dec 21, 2021

Sorry for not being clear: by "same structure" I was referring to the GeoJSON or vector-cube structure/data format, not the content itself.
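Keeping "the same structure" could look roughly like this: a GeoJSON FeatureCollection of Polygon features goes in, a FeatureCollection of MultiPoint features comes out, and each feature keeps its `properties` (useful later for attaching training labels). `polygons_to_points` and the placeholder `bbox_centre` sampler are hypothetical; any real sampler (random, regular, ...) could be passed in instead.

```python
import copy
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# A sampler turns one exterior ring into a list of points.
Sampler = Callable[[List[Point]], List[Point]]


def polygons_to_points(feature_collection: dict, sampler: Sampler) -> dict:
    """Return a new FeatureCollection where every Polygon feature becomes a
    MultiPoint feature; properties are copied unchanged."""
    out = {"type": "FeatureCollection", "features": []}
    for feature in feature_collection["features"]:
        geom = feature["geometry"]
        if geom["type"] != "Polygon":
            raise ValueError(f"unsupported geometry type: {geom['type']}")
        exterior = [tuple(c) for c in geom["coordinates"][0]]  # holes ignored here
        points = sampler(exterior)
        out["features"].append({
            "type": "Feature",
            "properties": copy.deepcopy(feature.get("properties", {})),
            "geometry": {"type": "MultiPoint",
                         "coordinates": [list(p) for p in points]},
        })
    return out


def bbox_centre(ring: List[Point]) -> List[Point]:
    """Placeholder sampler: the bounding-box centre (not guaranteed to be
    inside a concave polygon)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return [((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)]


fc = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"label": "forest"},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [4, 0], [4, 2], [0, 2], [0, 0]]]},
    }],
}
result = polygons_to_points(fc, bbox_centre)
```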

> Allow percentage and an absolute number of pixels?

I'm not sure, actually. A percentage could be tricky, since I would not know how to define the maximum number of points (100%) that could be extracted from a vector layer; in theory it could be infinite!

@m-mohr
Member Author

m-mohr commented Dec 21, 2021

Thanks for the clarification, @clausmichele.

I thought about basing percentages around the pixel centers. You should have a known number of pixels and could create a list of points for the pixel centers, right?
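Basing percentages on pixel centres gives a finite population, so "100%" becomes well defined. A minimal sketch, assuming a north-up square grid described by a lower-left `origin` and a `resolution`, and ignoring holes and CRS handling; the function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test of a point against one polygon ring."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def pixel_centres_in_polygon(ring: Sequence[Point], origin: Point,
                             resolution: float) -> List[Point]:
    """Centres of the grid cells whose centre falls inside the polygon."""
    x0, y0 = origin
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    # Column/row index range of the cells that overlap the bounding box.
    i_min = math.floor((min(xs) - x0) / resolution)
    i_max = math.ceil((max(xs) - x0) / resolution)
    j_min = math.floor((min(ys) - y0) / resolution)
    j_max = math.ceil((max(ys) - y0) / resolution)
    centres = []
    for j in range(j_min, j_max):
        for i in range(i_min, i_max):
            c = (x0 + (i + 0.5) * resolution, y0 + (j + 0.5) * resolution)
            if point_in_polygon(c, ring):  # keep cells whose centre is inside
                centres.append(c)
    return centres


def resolve_count(total: int, count: Optional[int] = None,
                  percentage: Optional[float] = None) -> int:
    """Turn either an absolute count or a percentage of `total` into a count."""
    if (count is None) == (percentage is None):
        raise ValueError("pass exactly one of count or percentage")
    if percentage is not None:
        if not 0 <= percentage <= 100:
            raise ValueError("percentage must be within [0, 100]")
        return round(total * percentage / 100)  # Python rounds half to even
    return min(count, total)  # cannot draw more pixels than exist


def sample_pixel_centres(ring, origin, resolution, count=None,
                         percentage=None, seed=None) -> List[Point]:
    centres = pixel_centres_in_polygon(ring, origin, resolution)
    k = resolve_count(len(centres), count, percentage)
    return random.Random(seed).sample(centres, k)  # without replacement


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.0)]
quarter = sample_pixel_centres(square, (0.0, 0.0), 1.0, percentage=25, seed=7)
capped = sample_pixel_centres(square, (0.0, 0.0), 1.0, count=500, seed=7)
```

With a 1-unit grid over a 10 x 10 square there are 100 candidate centres, so `percentage=25` yields 25 points and an absolute `count` larger than the population is capped at 100.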

@clausmichele
Member

From the last meeting I understood that the process should take geometries (GeoJSON) as input and also output geometries to be used in aggregate_spatial; that's why I initially called it "polygon_to_points". The output would be a list of points which can be used in aggregate_spatial to create a vector-cube for training the ML model.

@mattia6690 @jdries can you comment on this? Do I remember correctly?
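Under this design the points are only geometries, and the raster values are attached afterwards by `aggregate_spatial`. For a point geometry, "aggregating" reduces to reading the pixel that contains it. The sketch below imitates that step with plain Python lists and keeps the x/y coordinates alongside the band values. It is a hypothetical simplification (nearest-pixel lookup, top-left `origin`, no CRS or time dimension), not how `aggregate_spatial` is specified or implemented by any back-end.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def values_at_points(bands: Dict[str, List[List[float]]], origin: Point,
                     resolution: float,
                     points: Sequence[Point]) -> List[Dict[str, Optional[float]]]:
    """Read each band at the pixel containing each point.

    bands: band name -> 2D grid indexed [row][col], row 0 at the top.
    origin: (x_min, y_max), the top-left corner of the grid.
    """
    x_min, y_max = origin
    rows = []
    for x, y in points:
        col = int((x - x_min) // resolution)
        row = int((y_max - y) // resolution)  # rows count downwards from the top
        record: Dict[str, Optional[float]] = {"x": x, "y": y}
        for name, grid in bands.items():
            if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
                record[name] = grid[row][col]
            else:
                record[name] = None  # point falls outside the raster
        rows.append(record)
    return rows


bands = {"B04": [[1, 2], [3, 4]], "B08": [[5, 6], [7, 8]]}
table = values_at_points(bands, (0.0, 2.0), 1.0, [(0.5, 1.5), (1.5, 0.5), (5.0, 5.0)])
```

Each row of `table` is one training sample: coordinates plus one value per band, with `None` for points outside the raster extent.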

@m-mohr
Member Author

m-mohr commented Dec 21, 2021

Ah, okay! Then I misunderstood or remembered incorrectly. I can change that. It makes the process simpler: it only creates points from a shape, without directly combining them with the values from the raster cube. That's also fine for me; I just didn't fully understand the idea then. Stay tuned for an updated version... :-)

m-mohr added a commit that referenced this issue Dec 21, 2021
@ValentinaHutter

ValentinaHutter commented Feb 23, 2022

> From the last meeting I understood that the process should take as input geometries (geoJSON) and output also geometries to be used in aggregate_spatial, that's why I firstly called it "polygon_to_points". The output would be a list of points which can be used in aggregate_spatial to create a vector-cube for training the ML model.

> @mattia6690 @jdries can you comment on this? Do I remember correctly?

I am working on an implementation of aggregate_spatial at EODC. Is there an example of aggregate_spatial being used with vector_to_random_points or vector_to_regular_points somewhere? Like an input of aggregate_spatial and the corresponding output? :)

@clausmichele
Member

This is my implementation of it https://github.com/SARScripts/openeo_odc_driver/blob/d1383aa872bdef5a8bde37f5d48b1f7a56cdd57e/openeo_odc_driver.py#L566

There's a lot of room for improvement; for example, the properties are not kept. I will also provide you with an example using vector_to_*_points soon.

@ValentinaHutter

Thanks a lot, that's great! :)

m-mohr added a commit that referenced this issue Mar 15, 2022
Co-authored-by: clausmichele
@m-mohr m-mohr closed this as completed Mar 15, 2022