Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: masking outliers + conflicting distributions + incidence angles + high uncertainty classification + majority filter #6

Merged
merged 5 commits into from
Jan 9, 2024

Conversation

MartinSchobben
Copy link
Collaborator

@MartinSchobben MartinSchobben commented Dec 18, 2023

There is no mask method in the local instance, so for now I just use multiplication to do the same thing.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@MartinSchobben MartinSchobben changed the title WIP: preprocessing outliers + conflicting distributions + incidence angles WIP: preprocessing outliers + conflicting distributions + incidence angles + high uncertainty classification Dec 18, 2023
@MartinSchobben MartinSchobben changed the title WIP: preprocessing outliers + conflicting distributions + incidence angles + high uncertainty classification WIP: masking outliers + conflicting distributions + incidence angles + high uncertainty classification Dec 18, 2023
Copy link

review-notebook-app bot commented Dec 20, 2023

View / edit / reply to this conversation on ReviewNB

SwamyDev commented on 2023-12-20T06:48:52Z
----------------------------------------------------------------

I was shortly confused by the "multi-band datacube" term thinking "where is VH or L-Band here?", but then I realised multi-band in the openEO sense. However, I'd still better write "The merged preprocessed datacube..." since these aren't really bands, even if openEO pretends.


Copy link

review-notebook-app bot commented Dec 20, 2023

View / edit / reply to this conversation on ReviewNB

SwamyDev commented on 2023-12-20T06:48:53Z
----------------------------------------------------------------

You could consider putting this nice function into a python module.


@MartinSchobben
Copy link
Collaborator Author

MartinSchobben commented Dec 20, 2023

I can't see these comments now, as I blocked this app. Somehow it keeps showing up in this repo. Can you comment on the qmd in GitHub?

@SwamyDev
Copy link
Collaborator

Lol, works great then this app ^^, yeah I'll do a normal review then

Copy link
Collaborator

@SwamyDev SwamyDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, works really nicely also the results are very well presented!

notebooks/1_yeoda_dc.qmd Outdated Show resolved Hide resolved
Comment on lines 189 to 214
def view_flood_map(df):
# selecting a subsection of the data and reprojecting
flood_map = df[0, 13070:14000, 12900:14200]
flood_map = flood_map.rio.reproject(f"EPSG:4326", nodata=np.nan)
# add open streetmap
request = cimgt.OSM()
# initialize figure
fig = plt.figure(figsize=(13,9))
axis = plt.axes(projection=ccrs.PlateCarree(), frameon=True)
axis.add_image(request, 15)
# add the data
flood_map = flood_map.plot(
ax=axis,
transform=ccrs.PlateCarree(),
levels=[0, 1, 2],
colors=["#00000000", "#ff0000"],
add_colorbar=False
)
# legend and title
cbar = fig.colorbar(flood_map, ax=axis, location="bottom", shrink=0.6)
cbar.ax.get_xaxis().set_ticks([])
for j, lab in enumerate(['non-flood','flood']):
cbar.ax.text((2 * j + 1) / 2.0, 0.5, lab, ha='center', va='center')
cbar.ax.get_xaxis().labelpad = 10
tk = fig.gca()
tk = tk.set_title("Flood map")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could consider putting this nice function into a python module.

Copy link
Collaborator Author

@MartinSchobben MartinSchobben Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the lazy approach for now whereby I do not show this code chunk in the final webpage: https://martinschobben.github.io/openeo-flood-mapper-local/. But I'll consider this.

@MartinSchobben
Copy link
Collaborator Author

There is no mask method in the local instance, so for now I just use multiplication to do the same thing.

I will now then also consider implementing mask in the client side processing module of openEO. Although it feels a bit redundant as the ultimate goal would be to use the workflow with the EODC backend. I guess a more complete openeo.local would help development of remote openeo workflows.

@MartinSchobben
Copy link
Collaborator Author

Adding mask as a process for this workflow was actually not that hard. You can see the results here: https://martinschobben.github.io/openeo-flood-mapper-local/. Although the output of execute() is now quite verbose.

@clausmichele
Copy link
Member

clausmichele commented Dec 20, 2023

@MartinSchobben I did implement mask already, there's an open PR here Open-EO/openeo-processes-dask#165

It would be nice if you could review it! We need anyway someone at EODC to review and approve it, since the repo is still officially maintained by them.

@clausmichele
Copy link
Member

@MartinSchobben the verboseness of the logs when you call .execute() depends on your logging level. If you set it to INFO, you won't see any logging message at all. (I just run the notebook and didn't get any log message).
Please also make sure to have the latest version of openeo-processes-dask and openeo-pg-parser-networkx.

@MartinSchobben
Copy link
Collaborator Author

@MartinSchobben I did implement mask already, there's an open PR here Open-EO/openeo-processes-dask#165

It would be nice if you could review it! We need anyway someone at EODC to review and approve it, since the repo is still officially maintained by them.

@clausmichele Ah great! Even better. I'll have a look then.

@MartinSchobben
Copy link
Collaborator Author

@clausmichele, but to clarify we are not EODC. We are TUWien.

@clausmichele
Copy link
Member

@clausmichele, but to clarify we are not EODC. We are TUWien.

I know!

@MartinSchobben
Copy link
Collaborator Author

I am now getting closer to what the openeo workflow should look like with mask from the PR. I do note, however, that processing time has substantially increased with mask included.

@clausmichele
Copy link
Member

A bit of overhead is understandable, since the mask process needs to take care about multiple possibilities, depending on the input datacube. However, it would be interesting to understand what's the step taking the most time! I will also try and check.
By the way, nothing prevents to use a normal multiplication when in full control of the input datacubes!

@MartinSchobben
Copy link
Collaborator Author

Hmm, interesting. I was using multiplication at first. But, as the idea of this repo was to showcase transforming an existing pipeline (Copernicus GFM) into openEO syntax, I found it useful to stick to the given processes.

I'll also have a look what causes this increased processing time.

@clausmichele
Copy link
Member

But a simple multiplication is also represented as a basic openEO process, so there's nothing wrong with it. But as I was saying, it could fail if the input datacubes are not matching. Anyway, I'm doing some tests and the time consuming part seems to be calling the .where method. I'm investigating if it makes sense to keep it or replace it with a multiplication, since the whole code before is checking if the inputs are aligned.

@clausmichele
Copy link
Member

clausmichele commented Dec 21, 2023

The main issue is handling no data values and the replacement value, that's why we rely on .where.
If you can find a faster approach to reproduce the result we get with where it would be cool 🥇
Here an example of what the openEO mask process is doing:

>>> import xarray as xr
>>> import numpy as np

>>> data = np.arange(25).reshape(5, 5).astype(np.float32)
>>> data[0,0] = np.nan
>>> data[0,1] = 0
>>> input_data = xr.DataArray(data, dims=("x", "y"))
>>> print(input_data)
<xarray.DataArray (x: 5, y: 5)>
array([[nan,  0.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., 24.]], dtype=float32)
Dimensions without coordinates: x, y
>>> mask = input_data > 4
>>> print(mask)
<xarray.DataArray (x: 5, y: 5)>
array([[False, False, False, False, False],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])
Dimensions without coordinates: x, y
>>> replacement = np.nan
>>> masked_data = input_data.where(~mask,replacement)
>>> print(masked_data)
<xarray.DataArray (x: 5, y: 5)>
array([[nan,  0.,  2.,  3.,  4.],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan]], dtype=float32)
Dimensions without coordinates: x, y

@MartinSchobben
Copy link
Collaborator Author

MartinSchobben commented Dec 21, 2023

A further question @clausmichele, regarding the above comment. I would actually expect the cube for openeo-processes-dask to be:

import xarray as xr
import numpy as np
import dask.array as da
data = da.arange(size).reshape(x, y).astype(np.float32)
data[0,0] = np.nan
data[0,1] = 0
input_data = xr.DataArray(data, dims=("x", "y"))

But it appears to be a normal xarray. Is this correct? And could parallel computing of .where not make a difference as well?

I think I answered it myself, openeo local processing loads data chunked as far as I can tell.

@clausmichele
Copy link
Member

clausmichele commented Dec 21, 2023

A further question @clausmichele, regarding the above comment. I would actually expect the cube for openeo-processes-dask to be:

import xarray as xr
import numpy as np
import dask.array as da
data = da.arange(size).reshape(x, y).astype(np.float32)
data[0,0] = np.nan
data[0,1] = 0
input_data = xr.DataArray(data, dims=("x", "y"))

But it appears to be a normal xarray. Is this correct? And could parallel computing of .where not make a difference as well?

Well, the .where call would work in both cases, so with an xarray object based on numpy or dask arrays. With Dask it can/will be parallelized, but it can't be faster than a single multiplication.
(The above code was just an example to understand how the mask process should work with some sample numers)

@MartinSchobben MartinSchobben changed the title WIP: masking outliers + conflicting distributions + incidence angles + high uncertainty classification WIP: masking outliers + conflicting distributions + incidence angles + high uncertainty classification + majority filter Jan 4, 2024
@MartinSchobben
Copy link
Collaborator Author

MartinSchobben commented Jan 4, 2024

I reverted to masking by multiplication instead of using the process mask to speed up processing. I also added apply_neighborhood to mimic the majority filter for speckle removal. See here, https://github.com/MartinSchobben/openeo-processes-dask/tree/add-apply-neighborhood, the initial implementation of apply_neighborhood. At the moment I do not understand the complete definition of this process by openEO, especially the overlap parameter is unclear to me.

@MartinSchobben
Copy link
Collaborator Author

@clausmichele FYI, I put this here: Open-EO/openeo-processes-dask#215. We have trouble understanding the definition of apply_neighborhood. Perhaps you find this interesting.

@MartinSchobben MartinSchobben merged commit 165949e into interTwin-eu:main Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants