Replies: 4 comments 6 replies
-
As this is not an issue with the package but rather a usage question, I am moving it to Discussions. Your data is very atypical for Functional Data Analysis. Usually data in FDA is continuous and may take a continuum of values, while your data seems to take only two values (as far as I can see, the jump from one to the other is instantaneous, between measurements). My first approach would be to use multivariate tools, encoding each signal as a sequence of ones and zeros. Alternatively, if you know that the data always has the same number of peaks, you can "compress" your data as a vector of positions and durations of the peaks. If you really want to use functional tools, for example for alignment, I would use the discretized representation or maybe a wavelet basis, such as the Haar basis. This basis is not currently implemented, but you can add custom bases easily by just subclassing the Basis class and implementing the …
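A minimal NumPy sketch of both multivariate ideas mentioned above (threshold-based binary encoding, and compressing a trace to peak positions and durations). The threshold value here is a hypothetical placeholder, not something derived from the actual data:

```python
import numpy as np

def binarize(traces, threshold=0.5):
    """Encode traces as 0/1 by thresholding.

    `threshold` is a hypothetical cut between the two levels the
    signals seem to take; adjust it to your data.
    """
    return (np.asarray(traces) > threshold).astype(int)

def peak_runs(binary_trace):
    """Return (start_index, duration) for each run of ones."""
    padded = np.concatenate([[0], binary_trace, [0]])
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return list(zip(starts.tolist(), (ends - starts).tolist()))

trace = np.array([0.1, 0.9, 1.0, 0.2, 0.1, 0.8, 0.9, 0.9, 0.0])
b = binarize(trace)   # [0, 1, 1, 0, 0, 1, 1, 1, 0]
print(peak_runs(b))   # [(1, 2), (5, 3)]
```

If the number of peaks is the same across all traces, the flattened `(start, duration)` pairs form a fixed-length feature vector usable with any multivariate classifier.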
-
Your data has almost all of its variation in phase instead of amplitude. It is thus not very surprising that your classification, and PCA, become worse after alignment. What you can do here is register the data and then use the obtained warpings in phase space (the …
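To illustrate why phase variation hurts PCA, here is a small self-contained toy on synthetic Gaussian bumps, in plain NumPy. The cross-correlation lag at the end is only a crude stand-in for the elastic warping functions (which carry richer phase information), but it shows that a phase feature can separate classes that raw PCA spreads across many components:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
template = np.exp(-((t - 0.5) / 0.05) ** 2)  # a single Gaussian bump

# Two classes that differ only in phase: class 2 peaks slightly later.
shifts = np.concatenate([rng.normal(0.0, 0.02, 25),
                         rng.normal(0.08, 0.02, 25)])
X = np.stack([np.interp(t - s, t, template) for s in shifts])

# PCA on the raw curves: phase variation spreads energy over many PCs.
Xc = X - X.mean(axis=0)
var = np.linalg.svd(Xc, compute_uv=False) ** 2
print("top-3 explained variance ratio:", np.round(var[:3] / var.sum(), 3))

# A crude phase feature: lag of maximal cross-correlation with the template.
lags = np.array([int(np.argmax(np.correlate(x, template, mode="full")))
                 for x in X]) - (len(t) - 1)
print("mean phase feature per class:", lags[:25].mean(), lags[25:].mean())
```

The two class means of the lag feature are clearly separated, even though no single raw-curve PC captures most of the variance.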
-
I am glad you made it work!
Yeah, that was the idea. But now that you have told me more about your problem, it seems that the information you want is in amplitude space, so aligning the data might be better after all. I now also understand why you did not want to use a binary representation; I think the original images misled me about the nature of your problem. The question I have now is whether you should not align the data and pick the point in the middle between the two maxima, as it seems that every point in that interval has all the information you care about.
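A small sketch of that midpoint idea, assuming an aligned trace with at least two interior local maxima (toy data, not the actual traces):

```python
import numpy as np

def midpoint_feature(trace):
    """Value of the trace halfway between its two largest local maxima.

    Sketch of the "point in the middle between the two maxima" idea;
    assumes the trace has at least two interior local maxima.
    """
    trace = np.asarray(trace)
    # interior local maxima: strictly larger than both neighbours
    idx = np.flatnonzero((trace[1:-1] > trace[:-2])
                         & (trace[1:-1] > trace[2:])) + 1
    top2 = idx[np.argsort(trace[idx])[-2:]]  # two highest maxima
    mid = int(top2.mean())                   # index halfway between them
    return trace[mid]

trace = np.array([0.0, 1.0, 0.5, 0.2, 0.5, 0.9, 0.0])
print(midpoint_feature(trace))  # value at index (1 + 5) // 2 == 3, i.e. 0.2
```

After alignment, evaluating every trace at that single midpoint (or averaging over the whole inter-maxima interval) yields one scalar feature per trace.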
-
Just a note: from your data, it seems that you lose the abnormality when aligning, but this might not be the case if you try aligning the derivatives instead. Indeed, in the decreasing phase you would have two peaks for anomalies and only one for nominal samples. It might be another lead.
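A quick sketch of that derivative-based check, counting local maxima in the rate of decrease, on toy step-like traces (not the actual data):

```python
import numpy as np

def count_decrease_peaks(trace):
    """Count local maxima in the rate of decrease (-derivative) of a trace."""
    d = -np.diff(np.asarray(trace, dtype=float))
    interior = (d[1:-1] > d[:-2]) & (d[1:-1] > d[2:])
    return int(np.count_nonzero(interior))

print(count_decrease_peaks([5, 5, 4, 2, 1, 1]))  # 1 (nominal: one drop)
print(count_decrease_peaks([5, 4, 2, 2, 0, 0]))  # 2 (anomaly: two drops)
```

The peak count of the derivative is invariant to (monotone) alignment of the original curves, which is what makes it a useful anomaly indicator here.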
-
Hi scikit-fda team,
I have a time-series dataset with 50 traces and approximately 330 data points per trace.
The mean trace looks like this:
I would like to apply FPCA to them and later use the PC scores for a classification task. Here are the two approaches I tried:
Method 1: apply FPCA directly to the traces in discrete form.
The top 5 components' explained variance ratios are:
[0.41167929 0.14467096 0.08427026 0.04618083 0.03430349]
Method 2: use Elastic Registration to remove phase variation prior to applying FPCA.
The top 5 components' explained variance ratios are:
[0.11482679 0.10225417 0.09011544 0.08024505 0.06961226]
Given the above observations, I have the following questions:
1. Is Elastic Registration not a good way to preprocess the data before feeding it into FPCA, judging by the explained variance ratios? Do you know why that is?
2. In one of your examples, FPCA can be done on either the discrete form or the functional form. I therefore tried the same experiment with the basis found in issue #367 (BSpline, n_basis=200) and obtained similar explained variance ratios, with and without registration, so the data form seems to be irrelevant. Do you have any suggestions on a proper way to preprocess my data before feeding it into FPCA?
Thank you.
Sincerely,
Vinh
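For reference, explained-variance ratios like the ones quoted in the question can be computed for any discretized data matrix with plain NumPy (shown here on random stand-in data of the same shape, not the actual traces):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 330))  # 50 traces, ~330 points, as in the question

# Explained variance ratio = squared singular values of the centered
# data matrix, normalized to sum to 1 (what PCA/FPCA reports).
Xc = X - X.mean(axis=0)
sv = np.linalg.svd(Xc, compute_uv=False)
explained = sv ** 2 / np.sum(sv ** 2)
print(explained[:5])  # top-5 explained variance ratios
```

A flat profile like the one observed after registration means no single direction dominates; a steep profile (as in Method 1) means a few directions carry most of the variance.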