Handle ops between FDGrid and Python functions #458

matthieubulte · 2022-06-22T08:09:11Z

Hi there 👋

The goal of this PR is to be able to make it easier to do operations between FDGrids and normal python functions, like this

fd + (lambda x: x**2)

The feature seems to work and tests are still green but the PR probably needs a little bit more work to be mergeable (I don't think I manage well all possible grids since it's not clear to me what is possible to have as a grid), so I would be happy to get guidance on this.

Cheers!

codecov · 2022-06-22T11:42:58Z

Codecov Report

Merging #458 (84c644d) into develop (80f4f23) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop     #458   +/-   ##
========================================
  Coverage    80.48%   80.49%           
========================================
  Files           95       95           
  Lines         7442     7445    +3     
========================================
+ Hits          5990     5993    +3     
  Misses        1452     1452

Impacted Files	Coverage Δ
skfda/representation/grid.py	`84.94% <100.00%> (+0.12%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80f4f23...84c644d. Read the comment docs.

skfda/representation/grid.py

vnmabus · 2022-06-22T11:47:55Z

skfda/representation/grid.py

@@ -741,11 +741,14 @@ def _get_op_matrix(
 self._check_same_dimensions(other)
 return other.data_matrix

+ elif isinstance(other, Callable):
+ return np.array([[other(x) for x in _gp] for _gp in self.grid_points]).T


A couple of things:

I think we should limit support only to vectorized functions. Otherwise we have to call repeatedly the function for every point, which is too slow.

I think this code won't work for functions of several variables (e.g. surfaces). For supporting that case we should probably create a matrix with every coordinate were functions are going to be evaluated, call the function with that matrix and then reshape the result.

Another option would be to check first if the function supports vectorized input, and fall back to the slow point by point evaluation otherwise. This would be (much) less performant, but probably useful for newcomers and small datasets.

FData are callables too. What should happen if we do ops between FData with different domains? And between two FDataGrid with the same domain and different grid_points? (we could lose commutativity in this case!) And between a FDataGrid and a FDataBasis with the same domain?

My understanding is that it doesn't make a real difference in terms of performance except for very small arrays, see [1].
2/3. This is true, it's now fixed.

This is also true, and a good point, which is why the testing for Callable is after testing for FDataGrid. That way all your concerns are tackled by the FDataGrid handler.

[1] https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array

Also, I don't think this will ever be a hot path, so I wouldn't necessarily spend too much time thinking about micro-optimizing this.

Sorry for the delay.

My understanding is that it doesn't make a real difference in terms of performance except for very small arrays, see [1].

I didn't mean to use np.vectorize, but to assume that the other function accepts an array as input and return the array as output. This way, it would be very efficient for ufuncs like np.sqrt or np.sin, or functions created composing these, as they perform the iteration in C code, probably using SIMD instructions for further performance.

2/3. This is true, it's now fixed.

I would first try passing the array of coordinates, and if that fails pass the coordinates as different parameters for the efficiency reasons explained above.

2. This is also true, and a good point, which is why the testing for `Callable` is after testing for `FDataGrid`. That way all your concerns are tackled by the `FDataGrid` handler.

Ok, so for now FDataGrid is special, and two FDataGrids with different domains cannot be operated together. If a user wants that they can always use lambda x: my_fdatagrid(x) instead. We can also change this behaviour in the future if we found a reasonable way to support it.

tests/test_grid.py

matthieubulte · 2022-06-22T14:47:31Z

I have added a commit to address most of your comments, let me know what you think!

Handle ops between FDGrid and Python functions

b1d5e2f

vnmabus requested changes Jun 22, 2022

View reviewed changes

Handle ops between vector-valued FDGrid and Python functions

31b357e

Merge branch 'develop' into develop

84c644d

pcuestas mentioned this pull request Mar 25, 2024

Operations between FData objects and other callables #607

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle ops between FDGrid and Python functions #458

Handle ops between FDGrid and Python functions #458

matthieubulte commented Jun 22, 2022

codecov bot commented Jun 22, 2022 •

edited

Loading

vnmabus Jun 22, 2022 •

edited

Loading

matthieubulte Jun 22, 2022

matthieubulte Jun 22, 2022

vnmabus Jun 29, 2022

matthieubulte commented Jun 22, 2022

Handle ops between FDGrid and Python functions #458

Are you sure you want to change the base?

Handle ops between FDGrid and Python functions #458

Conversation

matthieubulte commented Jun 22, 2022

codecov bot commented Jun 22, 2022 • edited Loading

Codecov Report

vnmabus Jun 22, 2022 • edited Loading

Choose a reason for hiding this comment

matthieubulte Jun 22, 2022

Choose a reason for hiding this comment

matthieubulte Jun 22, 2022

Choose a reason for hiding this comment

vnmabus Jun 29, 2022

Choose a reason for hiding this comment

matthieubulte commented Jun 22, 2022

codecov bot commented Jun 22, 2022 •

edited

Loading

vnmabus Jun 22, 2022 •

edited

Loading