Metrics: Soft dynamic-time-warping divergence for FDataGrid #412
base: develop
Conversation
Codecov Report — Base: 85.53% // Head: 85.06% // Decreases project coverage by 0.47%.
@@ Coverage Diff @@
## develop #412 +/- ##
===========================================
- Coverage 85.53% 85.06% -0.47%
===========================================
Files 141 143 +2
Lines 11280 11370 +90
===========================================
+ Hits 9648 9672 +24
- Misses 1632 1698 +66
☔ View full report at Codecov.
I wrote the code in Python and used @jit decorators with fastmath=True and cache=True. Computation times are comparable to my Cython version.
What is missing is the test part: any suggestions? Would checking expected properties of the divergence, like symmetry, positivity, and indiscernibility, be enough? @vnmabus
I still have to check the algorithm.
For the tests, IMHO the best test is to compare the results with an existing implementation if there is any, for a particular dataset.
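If no reference implementation of the exact same divergence is at hand, a property-based test could be a starting point. A minimal sketch, assuming the class name sdtwDivergence and an import path from this PR (both hypothetical until merged):

```python
import numpy as np
import pytest

from skfda import FDataGrid
from skfda.misc.metrics import sdtwDivergence  # hypothetical import path


def test_sdtw_divergence_properties() -> None:
    grid = np.linspace(0, 1, 30)
    x = FDataGrid([np.sin(2 * np.pi * grid)], grid)
    y = FDataGrid([np.cos(2 * np.pi * grid)], grid)

    sdtw = sdtwDivergence(gamma=1.0)

    # symmetry and positivity
    assert sdtw(x, y) == pytest.approx(sdtw(y, x))
    assert sdtw(x, y) > 0

    # indiscernibility: a sample compared with itself gives zero
    assert sdtw(x, x) == pytest.approx(0, abs=1e-10)
```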
from skfda.misc._math import half_sq_euclidean

def _check_shape_postive_cost_mat(cost, expected_shape, shape_only=False):
Try to use type hints even in private functions, to catch possible errors.
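For instance, a sketch of how this private helper could be annotated (the concrete parameter types are an assumption about what the function receives):

```python
from typing import Tuple

import numpy as np


def _check_shape_postive_cost_mat(
    cost: np.ndarray,
    expected_shape: Tuple[int, int],
    shape_only: bool = False,
) -> None:
    ...
```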
" ({}, {})".format(n1, n2) | ||
) | ||
|
||
_check_shape_postive_cost_mat( |
I don't think we should always check these things. Maybe we can check in the tests for particular data.
I know it looks out of place here, but otherwise, when the cost matrix does not satisfy these properties, the output value of the algorithm does not make any sense. That's why I added a boolean check_cost (default: False) as input.
Should I leave it here, or is the user supposed to check it beforehand?
I guess that if it is only done when check_cost is True, and it is False by default, it is not a problem.
By the way, the code can be made more compact (and better tested) by generating the costs if cost is a callable, and then reusing the same code path as if cost were a sequence of ndarrays.
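A minimal sketch of that idea (names are illustrative, not the PR's final code): evaluate the callable once up front, so the rest of the function always works with precomputed matrices.

```python
from typing import Callable, Sequence, Tuple, Union

import numpy as np

CostInput = Union[
    Callable[[np.ndarray, np.ndarray], np.ndarray],
    Sequence[np.ndarray],
]


def _as_cost_matrices(
    cost: CostInput,
    x: np.ndarray,
    y: np.ndarray,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Evaluate a callable cost once; both input kinds then share one code path."""
    if callable(cost):
        return cost(x, y), cost(x, x), cost(y, y)
    cost_xy, cost_xx, cost_yy = cost
    return cost_xy, cost_xx, cost_yy
```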
Ok, so only a callable is allowed as the input cost.
Actually, I could raise a warning instead of a ValueError; that gives more flexibility to the user (but increases the risk of drawing wrong conclusions), while at least making the user aware of that risk. WDYT?
I didn't mean to allow only a callable. I wanted to say that, after the callable is evaluated, the code path for a callable input is the same as for an ndarray, so both can be merged together.
I am not sure about the warning/exception thing. It is easier to handle an exception programmatically, and you can always call the function again without checking if that is what you want. Also, if the algorithm requires these preconditions, maybe it shouldn't continue. Is there any case in which you would want to call the algorithm breaking the preconditions?
Sorry, I've been busy too.
So I followed your suggestion; the main code path is the following:
(1) pre-compute the cost matrices,
(2) check the properties of each cost matrix if check_cost is True,
(3) return the sdtw value.
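As an illustration of step (2), a small self-contained sketch of the optional validation (the function name and checks shown are assumptions, not the PR's exact code):

```python
import numpy as np


def _check_cost_matrix(cost: np.ndarray, symmetric: bool = False) -> None:
    """Validate one precomputed alignment cost matrix when check_cost is True."""
    if cost.ndim != 2:
        raise ValueError("cost matrix must be two-dimensional")
    if np.any(cost < 0):
        raise ValueError("cost matrix must be non-negative")
    if symmetric and not np.allclose(cost, cost.T):
        raise ValueError("self-alignment cost matrix must be symmetric")
```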
Sorry for taking too long to review this more thoroughly, I have been busy.
# and symmetry
if cost is not None:
    if isinstance(cost, tuple):
        if not len(cost) == 3:
I think it is better (easier to understand) to have the positive condition in the if instead of the else.
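That is, something along these lines (illustrative only, not the PR's exact code):

```python
def _unpack_cost(cost):
    # the expected ("positive") case goes in the if branch
    if len(cost) == 3:
        cost_xy, cost_xx, cost_yy = cost
    else:
        raise ValueError("cost must be a tuple of three matrices")
    return cost_xy, cost_xx, cost_yy
```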
ok
I think this PR is taking a great shape!
Remember to also add some tests for this feature, preferably comparing with an existing implementation and/or known results.
# np.sum(X ** 2, axis=1)=inner_product(X, X)
cost_11 = 0.5 * inner_product(
    arg1.data_matrix[0],
Is there any reason for not passing just the FDataGrid (thus using the information of the grid points to compute the functional inner product, instead of the multivariate one)?
Taking into account the grid points could be an extension, yes, but the original implementation (not functional-based) does not consider the grid points (this is noted in the function header), so the function inputs only need the data_matrix attributes.
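For context, the half squared Euclidean cost on the raw data_matrix values can be computed from inner products alone, along these lines (a sketch, not the PR's exact code):

```python
import numpy as np


def half_sq_euclidean_cost(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Half squared Euclidean cost between the rows of x (m, d) and y (n, d)."""
    # 0.5 * ||x_i - y_j||^2 = 0.5 * <x_i, x_i> + 0.5 * <y_j, y_j> - <x_i, y_j>
    sq_x = 0.5 * np.sum(x ** 2, axis=1)
    sq_y = 0.5 * np.sum(y ** 2, axis=1)
    return sq_x[:, None] + sq_y[None, :] - x @ y.T
```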
Ok, checking again the original paper I now see what this function is supposed to do.
So, it receives as input two functions, each represented by an FDataGrid containing a single observation.
If that is the case, I would put that explicitly in the documentation, and maybe check that there is only one observation per FDataGrid, just in case. Of course, then it makes no sense to consider the actual values of the grid points, unless you want to consider the output as a function of two variables.
I think it's okay now, except for the tests (another PR?).
# @pairwise_metric_optimization.register
# def _pairwise_metric_optimization_sdtw(
#     metric: sdtwDivergence,
#     elem1: FData,
#     elem2: FData,
# ) -> NDArrayFloat:

#     return metric(elem1, elem2)
For this part, I left the code commented because I don't know how the decorator @pairwise_metric_optimization.register must be used. Is its goal to compute a similarity or Gram matrix in a distributed way?
Its goal is to be able to provide a more efficient implementation of the pairwise distance matrix between two data sets: given the datasets, it should return the matrix of distances between every observation of the first one and every observation of the second one.
By default, what is done (if you do not implement this optimization) is to build the datagrids for every pair of observations and evaluate the metric on each pair.
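For illustration, this is roughly what the pairwise use looks like from the user side; sdtwDivergence and its import path are hypothetical names from this PR, while PairwiseMetric is the existing skfda helper:

```python
import numpy as np

from skfda import FDataGrid
from skfda.misc.metrics import PairwiseMetric
from skfda.misc.metrics import sdtwDivergence  # hypothetical: added in this PR

grid = np.linspace(0, 1, 20)
X = FDataGrid(np.sin(2 * np.pi * np.outer(np.arange(1, 4), grid)), grid)  # 3 samples
Y = FDataGrid(np.cos(2 * np.pi * np.outer(np.arange(1, 3), grid)), grid)  # 2 samples

pairwise = PairwiseMetric(sdtwDivergence(gamma=1.0))
distance_matrix = pairwise(X, Y)  # shape (3, 2): one entry per pair of samples
```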
I would rather have some tests (comparing with known results) as part of this PR, to guarantee that it works.
If you also want to do an example for the documentation, that could be in a separate PR.
) -> None:
    """check compatible, one sample and one-dimensional"""

    if (fdata1.dim_domain != fdata2.dim_domain):
Maybe you can reuse here the new functions check_fdata_same_dimensions and check_fdata_dimensions in skfda.misc.validation.
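For instance, something along these lines (the exact signatures of those validation helpers are an assumption):

```python
from skfda.misc import validation
from skfda.representation import FData


def _check_input(fdata1: FData, fdata2: FData) -> None:
    # assumed usage of the helpers named above: same domain/codomain
    # dimensions for both inputs, and a one-dimensional domain
    validation.check_fdata_same_dimensions(fdata1, fdata2)
    validation.check_fdata_dimensions(fdata1, dim_domain=1)
```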
)


class sdtwDivergence():
Class names start with uppercase. Thus it would be SDTWDivergence.
@njit(cache=True)
def _sdtw(
Do you know if this is the same (or a similar) dynamic programming algorithm used for Fisher-Rao registration? If it is, we could also use it in that case and remove the dependency on fdasrsf.
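For reference, a minimal sketch of the standard soft-DTW dynamic-programming recursion on a precomputed cost matrix (from the soft-DTW literature; not necessarily identical to the PR's _sdtw):

```python
import numpy as np
from numba import njit


@njit(cache=True)
def _sdtw_sketch(cost: np.ndarray, gamma: float) -> float:
    """Soft-DTW value for an (m, n) alignment cost matrix."""
    m, n = cost.shape
    r = np.full((m + 1, n + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # soft-min of the three predecessors via a stable log-sum-exp
            a = -r[i - 1, j - 1] / gamma
            b = -r[i - 1, j] / gamma
            c = -r[i, j - 1] / gamma
            max_val = max(a, b, c)
            softmin = -gamma * (
                np.log(
                    np.exp(a - max_val)
                    + np.exp(b - max_val)
                    + np.exp(c - max_val)
                )
                + max_val
            )
            r[i, j] = cost[i - 1, j - 1] + softmin
    return r[m, n]
```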
Content
I'm proposing the soft dynamic-time-warping (SDTW) divergence as a metric between FDataGrid objects. The domain dimension (time) must be one. The co-domain dimension can be greater than one. I use it in my work, and noted that issue #363 asked for a classical DTW implementation.
This metric is a smooth version of DTW, which is defined by a min operator and depends on a pre-defined alignment cost (i.e. a discrepancy measure between all pairs (x(t_i), y(t_j))), e.g. the squared Euclidean cost, or roughly speaking a Gram matrix. In SDTW, the min operator is replaced by the soft-min (through a negative log-sum-exp-like function). The smoothness is controlled by gamma. The SDTW divergence is positive, symmetric, and indiscernible (SDTW(X, Y) = 0 <=> X = Y), but does not satisfy the triangle inequality (this depends on the alignment cost). See "Differentiable Divergences Between Time Series", Blondel et al., AISTATS 2021. Thus it can be used like a distance for clustering, k-nn classification, etc. The main advantages of the SDTW divergence are:
Implementation
Since SDTW is defined by the soft-min operator, computing the SDTW divergence between two FDataGrid objects, X and Y, requires an iterative procedure to solve the minimization problem, which we solve with a dynamic-programming recursion in O(mn), where m and n are the respective sizes of the grid points of X and Y. I coded the recursion with Cython in sdtw_fast.pyx, and so have added a setup.py in ~/misc/metrics to generate the .c extension.
Improvements
- sdtwDivergence, with different alignment costs
- _pairwise_metric_optimization_sdtw
Usage:
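A hypothetical usage example (the class name, constructor argument, and import path follow this PR's discussion and may change):

```python
import numpy as np

from skfda import FDataGrid
from skfda.misc.metrics import sdtwDivergence  # hypothetical import path

grid = np.linspace(0, 1, 100)
x = FDataGrid([np.sin(2 * np.pi * grid)], grid)
y = FDataGrid([np.sin(2 * np.pi * (grid - 0.1))], grid)  # shifted copy of x

sdtw = sdtwDivergence(gamma=1.0)
print(sdtw(x, y))  # small positive value: the curves differ by a shift
print(sdtw(x, x))  # 0, by indiscernibility
```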
Any other suggestion is welcome.