Scores for `FDataIrregular` objects #609

pcuestas · 2024-04-01T16:33:32Z

Motivation

Computing scores between FDataIrregular objects is a missing functionality of the package, and it can be useful when measuring the quality of conversions from irregular objects to basis representation.

Desired functionality

Compute scores when both y_true and y_pred are FDataIrregular objects.

How to implement each score?

There is a big problem when implementing scores for FDataIrregular: the mean of an FDataIrregular object is not well defined. Most of the scores (for FData objects) involve computing the mean of an FData object.

We can circumvent this issue in some cases, namely when we want the "uniform_average" of the score rather than the "raw_values".
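An example where we can avoid computing the mean is mean_absolute_error. The mean absolute error is defined this way:

$$MAE(t) = \frac{1}{\sum w_i}\sum_{i=1}^N w_i |X_i(t) - \hat X_i(t)|.$$

To avoid having to calculate the mean of the FDataIrregular when multioutput="uniform_average", we can change the order of the mean and the integral. That is, instead of:

$$MAE = \frac{1}{V}\int_D \frac{1}{\sum w_i}\sum_{i=1}^N w_i |X_i(t) - \hat X_i(t)|\ dt,$$

we can use:

$$MAE = \frac{1}{\sum w_i}\sum_{i=1}^N w_i \frac{1}{V_i}\int_{D_i} |X_i(t) - \hat X_i(t)|\ dt.$$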

Here $D_i$ and $V_i$ denote the domain of the $i$-th irregular curve and its Lebesgue measure, respectively. I am not sure that using $D_i$ and $V_i$ instead of the whole domain $D$ and its volume $V$ is the best choice. It might be less confusing not to compute the $V_i$'s at all, but I believe the result would be less accurate, as it would implicitly give more weight to curves whose points are more spread out.

This idea can be applied to mean_absolute_error, mean_absolute_percentage_error, mean_squared_error and mean_squared_log_error. I am going to implement these in feature/scoring-fdatairregular.
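The per-curve normalization described above can be sketched with plain NumPy. This is a minimal illustration, not the package's implementation: curves are represented as hypothetical `(points, values)` array pairs rather than `FDataIrregular` objects, the domain is assumed to be one-dimensional, both curves in a pair are assumed to share the same measurement points, and the integral is approximated with the trapezoidal rule. The names `irregular_score`, `absolute_error` and `squared_error` are made up for this example.

```python
import numpy as np


def trapezoid(values, points):
    """Trapezoidal-rule integral of sampled values over sorted points."""
    return float(np.sum(np.diff(points) * (values[1:] + values[:-1]) / 2.0))


def irregular_score(loss, y_true, y_pred, sample_weight=None):
    """Weighted mean over curves of (1 / V_i) * integral over D_i of loss.

    y_true, y_pred: lists of (points, values) pairs, one pair per curve.
    Hypothetical representation; assumes 1-D domains, sorted points, at
    least two distinct points per curve, and matching points in both lists.
    """
    per_curve = []
    for (t_true, x_true), (t_pred, x_pred) in zip(y_true, y_pred):
        t_true = np.asarray(t_true, dtype=float)
        if not np.array_equal(t_true, np.asarray(t_pred, dtype=float)):
            raise ValueError("curves must share measurement points")
        pointwise = loss(
            np.asarray(x_true, dtype=float),
            np.asarray(x_pred, dtype=float),
        )
        v_i = t_true[-1] - t_true[0]  # measure V_i of the interval D_i
        per_curve.append(trapezoid(pointwise, t_true) / v_i)
    # The mean is taken over curves only after each integral is computed.
    return float(np.average(per_curve, weights=sample_weight))


def absolute_error(a, b):
    return np.abs(a - b)


def squared_error(a, b):
    return (a - b) ** 2


# Two curves observed on different grids, with constant errors 1 and 3.
y_true = [([0.0, 0.5, 2.0], [1.0, 2.0, 3.0]), ([1.0, 1.5], [0.0, 0.0])]
y_pred = [([0.0, 0.5, 2.0], [2.0, 3.0, 4.0]), ([1.0, 1.5], [3.0, 3.0])]

mae = irregular_score(absolute_error, y_true, y_pred)
mse = irregular_score(squared_error, y_true, y_pred)
```

In this toy data the first curve spans an interval of length 2 and the second one of length 0.5. Dividing by $V_i$ makes each curve contribute its average error over its own interval, so the longer curve does not dominate the score.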

`r2_score`

I believe that the r2_score cannot be implemented for the FDataIrregular case: by definition, it compares how well y_pred predicts the values of y_true with how well the mean does, and the mean is not defined.

A possible implementation of r2_score for FDataIrregular objects would be to just compute the r2_score of (y_true.values, y_pred.values). However, I do not think this is a good option. It disregards the functional structure of the curves by ignoring the points where they are measured, and the mean of the values does not have the same meaning as in the other cases (FDataGrid and FDataBasis). Moreover, a user can always call r2_score(y_true.values, y_pred.values) explicitly, so I do not think we should implement this score for irregular data, as it is not properly defined.

The case of explained_variance_score is very similar to that of r2_score.
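To illustrate why the flattened computation mentioned above disregards the functional structure, the sketch below computes the usual R² from value arrays alone and compares it with an integrated squared error. Both helpers (`flat_r2`, `integrated_squared_error`) are hypothetical and written in plain NumPy so that the example is self-contained. The data are made up, and a single curve on a one-dimensional domain is assumed.

```python
import numpy as np


def flat_r2(true_values, pred_values):
    """Ordinary R^2 over pooled values; measurement points are not used."""
    true_values = np.asarray(true_values, dtype=float)
    pred_values = np.asarray(pred_values, dtype=float)
    ss_res = np.sum((true_values - pred_values) ** 2)
    ss_tot = np.sum((true_values - true_values.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def integrated_squared_error(points, true_values, pred_values):
    """Trapezoidal integral of the squared error over the sorted points."""
    points = np.asarray(points, dtype=float)
    err = (np.asarray(true_values, float) - np.asarray(pred_values, float)) ** 2
    return float(np.sum(np.diff(points) * (err[1:] + err[:-1]) / 2.0))


# Values of one curve and its prediction (illustrative data).
true_values = [1.0, 2.0, 3.0, 4.0]
pred_values = [1.5, 2.0, 3.0, 4.0]

# Two very different sampling layouts for the same values.
points_clustered = [0.0, 0.01, 0.02, 10.0]
points_spread = [0.0, 3.0, 6.0, 10.0]

# The flattened score never sees the points, so it is the same for both.
score = flat_r2(true_values, pred_values)

# An integral-based error changes with where the values were measured.
err_clustered = integrated_squared_error(points_clustered, true_values, pred_values)
err_spread = integrated_squared_error(points_spread, true_values, pred_values)
```

The flattened R² is identical for both layouts, while the integrated error differs by orders of magnitude, which is the information the flattened version discards.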

ooodragon94 · 2024-04-14T06:54:16Z

Hi, thank you for opening this issue.
I think this is another method that is not well defined for FDataIrregular.

I'm trying to apply FPCA using this code:
https://fda.readthedocs.io/en/stable/auto_examples/plot_fpca_inverse_transform_outl_detection.html#sphx-glr-auto-examples-plot-fpca-inverse-transform-outl-detection-py

My functions map R^3 -> R.

Can FPCA be implemented for FDataIrregular too?

(Or should I open another issue?)

pcuestas · 2024-04-14T07:38:45Z

Hello, @ooodragon94.

As I understand it, your case is very different from the one I outlined in this issue. There are ways to implement FPCA for irregular data, but we haven't implemented them yet, as FDataIrregular is a very recent addition to the package. You should definitely open another issue explaining in detail the type of data you have and what you want to do.

The development efforts tend to be steered towards what users request, so it will be very useful to know what you would like to have in the package.

pcuestas · 2024-06-30T15:22:44Z

After discussing this issue with @vnmabus and Alberto Suárez, we concluded that the integral of a functional data object should always be the integral over its domain $D$, and not over the interval bounded by the endpoints of the discretization grid (called $D_i$ in the original issue description). This is discussed in depth in #619.

In #610, I have implemented the changes explained above; that is, dividing each integral by the measure $V_i$ of the smallest interval $D_i$ that contains the $i$-th curve's discretization points:

$$MAE = \frac{1}{\sum w_i}\sum_{i=1}^N w_i \frac{1}{V_i}\int_{D_i} |X_i(t) - \hat X_i(t)|\ dt.$$

However, once the integral of discretized datasets is properly defined over the domain of the functional data object (#619), these scores must be redefined so that the integrals are divided by the domain's measure $V$ instead of $V_i$. For example, the MAE formula will be:

$$MAE = \frac{1}{\sum w_i}\sum_{i=1}^N w_i \frac{1}{V}\int_D |X_i(t) - \hat X_i(t)|\ dt.$$

pcuestas added the enhancement label Apr 1, 2024

pcuestas self-assigned this Apr 1, 2024

pcuestas added a commit that referenced this issue Apr 1, 2024

Implement scores for FDatairregular objects as described in #609

951dea3

(testing included to assert equality with the `FDataGrid` case)

pcuestas mentioned this issue Apr 1, 2024

Implement scores for FDatairregular objects as described in #609 #610

Merged

pcuestas mentioned this issue Jun 30, 2024

Integral of discretized functional data #619

Open

vnmabus added a commit that referenced this issue Jul 5, 2024

Merge pull request #610 from GAA-UAM/feature/scoring-fdatairregular

d19e1bd

Implement scores for `FDatairregular` objects as described in #609