Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle ops between FDGrid and Python functions #458

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 13 additions & 10 deletions skfda/representation/grid.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
from typing import (
TYPE_CHECKING,
Any,
Callable,
Iterable,
Optional,
Sequence,
Expand Down Expand Up @@ -710,12 +711,11 @@ def __eq__(self, other: object) -> NDArrayBool: # type: ignore[override]

def _get_op_matrix(
self,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
matthieubulte marked this conversation as resolved.
Show resolved Hide resolved
) -> Union[None, float, NDArrayFloat, NDArrayInt]:
if isinstance(other, numbers.Real):
return float(other)
elif isinstance(other, np.ndarray):

if other.shape in {(), (1,)}:
return other
elif other.shape == (self.n_samples,):
Expand All @@ -741,11 +741,14 @@ def _get_op_matrix(
self._check_same_dimensions(other)
return other.data_matrix

elif isinstance(other, Callable):
return np.array([[other(x) for x in _gp] for _gp in self.grid_points]).T
Copy link
Member

@vnmabus vnmabus Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things:

  • I think we should limit support only to vectorized functions. Otherwise we have to call repeatedly the function for every point, which is too slow.
  • I think this code won't work for functions of several variables (e.g. surfaces). For supporting that case we should probably create a matrix with every coordinate were functions are going to be evaluated, call the function with that matrix and then reshape the result.
  • Another option would be to check first if the function supports vectorized input, and fall back to the slow point by point evaluation otherwise. This would be (much) less performant, but probably useful for newcomers and small datasets.
  • FData are callables too. What should happen if we do ops between FData with different domains? And between two FDataGrid with the same domain and different grid_points? (we could lose commutativity in this case!) And between a FDataGrid and a FDataBasis with the same domain?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. My understanding is that it doesn't make a real difference in terms of performance except for very small arrays, see [1].
    2/3. This is true, it's now fixed.
  2. This is also true, and a good point, which is why the testing for Callable is after testing for FDataGrid. That way all your concerns are tackled by the FDataGrid handler.

[1] https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think this will ever be a hot path, so I wouldn't necessarily spend too much time thinking about micro-optimizing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay.

  1. My understanding is that it doesn't make a real difference in terms of performance except for very small arrays, see [1].

I didn't mean to use np.vectorize, but to assume that the other function accepts an array as input and return the array as output. This way, it would be very efficient for ufuncs like np.sqrt or np.sin, or functions created composing these, as they perform the iteration in C code, probably using SIMD instructions for further performance.

   2/3. This is true, it's now fixed.

I would first try passing the array of coordinates, and if that fails pass the coordinates as different parameters for the efficiency reasons explained above.

2. This is also true, and a good point, which is why the testing for `Callable` is after testing for `FDataGrid`. That way all your concerns are tackled by the `FDataGrid` handler.

Ok, so for now FDataGrid is special, and two FDataGrids with different domains cannot be operated together. If a user wants that they can always use lambda x: my_fdatagrid(x) instead. We can also change this behaviour in the future if we found a reasonable way to support it.


return None

def __add__(
self: T,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
) -> T:

data_matrix = self._get_op_matrix(other)
Expand All @@ -756,14 +759,14 @@ def __add__(

def __radd__(
self: T,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
) -> T:

return self.__add__(other)

def __sub__(
self: T,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
) -> T:

data_matrix = self._get_op_matrix(other)
Expand All @@ -774,7 +777,7 @@ def __sub__(

def __rsub__(
self: T,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
) -> T:

data_matrix = self._get_op_matrix(other)
Expand All @@ -785,7 +788,7 @@ def __rsub__(

def __mul__(
self: T,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
) -> T:

data_matrix = self._get_op_matrix(other)
Expand All @@ -796,14 +799,14 @@ def __mul__(

def __rmul__(
self: T,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
) -> T:

return self.__mul__(other)

def __truediv__(
self: T,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
) -> T:

data_matrix = self._get_op_matrix(other)
Expand All @@ -814,7 +817,7 @@ def __truediv__(

def __rtruediv__(
self: T,
other: Union[T, NDArrayFloat, NDArrayInt, float],
other: Union[T, NDArrayFloat, NDArrayInt, float, Callable],
) -> T:

data_matrix = self._get_op_matrix(other)
Expand Down
7 changes: 5 additions & 2 deletions tests/test_grid.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@

import numpy as np


class TestFDataGrid(unittest.TestCase):

# def setUp(self): could be defined for set up before any test
Expand Down Expand Up @@ -151,7 +150,7 @@ def test_coordinates(self):
fd.data_matrix)

def test_add(self):
fd1 = FDataGrid([[1, 2, 3, 4], [2, 3, 4, 5]])
fd1 = FDataGrid([[1, 2, 3, 4], [2, 3, 4, 5]], [0, 1, 2, 3])

fd2 = fd1 + fd1
np.testing.assert_array_equal(fd2.data_matrix[..., 0],
Expand All @@ -161,6 +160,10 @@ def test_add(self):
np.testing.assert_array_equal(fd2.data_matrix[..., 0],
[[3, 4, 5, 6], [4, 5, 6, 7]])

fd2 = fd1 + (lambda x: x)
matthieubulte marked this conversation as resolved.
Show resolved Hide resolved
np.testing.assert_array_equal(fd2.data_matrix[..., 0],
[[1, 3, 5, 7], [2, 4, 6, 8]])

fd2 = fd1 + np.array(2)
np.testing.assert_array_equal(fd2.data_matrix[..., 0],
[[3, 4, 5, 6], [4, 5, 6, 7]])
Expand Down