Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle ops between FDGrid and Python functions #458

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

matthieubulte
Copy link

Hi there 👋

The goal of this PR is to be able to make it easier to do operations between FDGrids and normal python functions, like this

fd + (lambda x: x**2)

The feature seems to work and tests are still green but the PR probably needs a little bit more work to be mergeable (I don't think I manage well all possible grids since it's not clear to me what is possible to have as a grid), so I would be happy to get guidance on this.

Cheers!

@codecov
Copy link

codecov bot commented Jun 22, 2022

Codecov Report

Merging #458 (84c644d) into develop (80f4f23) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop     #458   +/-   ##
========================================
  Coverage    80.48%   80.49%           
========================================
  Files           95       95           
  Lines         7442     7445    +3     
========================================
+ Hits          5990     5993    +3     
  Misses        1452     1452           
Impacted Files Coverage Δ
skfda/representation/grid.py 84.94% <100.00%> (+0.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80f4f23...84c644d. Read the comment docs.

skfda/representation/grid.py Outdated Show resolved Hide resolved
@@ -741,11 +741,14 @@ def _get_op_matrix(
self._check_same_dimensions(other)
return other.data_matrix

elif isinstance(other, Callable):
return np.array([[other(x) for x in _gp] for _gp in self.grid_points]).T
Copy link
Member

@vnmabus vnmabus Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things:

  • I think we should limit support only to vectorized functions. Otherwise we have to call repeatedly the function for every point, which is too slow.
  • I think this code won't work for functions of several variables (e.g. surfaces). For supporting that case we should probably create a matrix with every coordinate were functions are going to be evaluated, call the function with that matrix and then reshape the result.
  • Another option would be to check first if the function supports vectorized input, and fall back to the slow point by point evaluation otherwise. This would be (much) less performant, but probably useful for newcomers and small datasets.
  • FData are callables too. What should happen if we do ops between FData with different domains? And between two FDataGrid with the same domain and different grid_points? (we could lose commutativity in this case!) And between a FDataGrid and a FDataBasis with the same domain?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. My understanding is that it doesn't make a real difference in terms of performance except for very small arrays, see [1].
    2/3. This is true, it's now fixed.
  2. This is also true, and a good point, which is why the testing for Callable is after testing for FDataGrid. That way all your concerns are tackled by the FDataGrid handler.

[1] https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think this will ever be a hot path, so I wouldn't necessarily spend too much time thinking about micro-optimizing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay.

  1. My understanding is that it doesn't make a real difference in terms of performance except for very small arrays, see [1].

I didn't mean to use np.vectorize, but to assume that the other function accepts an array as input and return the array as output. This way, it would be very efficient for ufuncs like np.sqrt or np.sin, or functions created composing these, as they perform the iteration in C code, probably using SIMD instructions for further performance.

   2/3. This is true, it's now fixed.

I would first try passing the array of coordinates, and if that fails pass the coordinates as different parameters for the efficiency reasons explained above.

2. This is also true, and a good point, which is why the testing for `Callable` is after testing for `FDataGrid`. That way all your concerns are tackled by the `FDataGrid` handler.

Ok, so for now FDataGrid is special, and two FDataGrids with different domains cannot be operated together. If a user wants that they can always use lambda x: my_fdatagrid(x) instead. We can also change this behaviour in the future if we found a reasonable way to support it.

tests/test_grid.py Show resolved Hide resolved
@matthieubulte
Copy link
Author

I have added a commit to address most of your comments, let me know what you think!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants