-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] add TimeSeriesScaleMeanMaxVariance() #333
base: main
Are you sure you want to change the base?
[MRG] add TimeSeriesScaleMeanMaxVariance() #333
Conversation
the method won't pass |
# retain the scaling relation cross the signals, | ||
# the max std_t is set to self.std | ||
max_std = max(std_t) | ||
if max_std ==0.: max_std = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split up over multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And add space to right and left of ==
X_ = check_dims(X_, X_fit_dims=self._X_fit_dims, extend=False) | ||
mean_t = numpy.nanmean(X_, axis=1, keepdims=True) | ||
std_t = numpy.nanstd(X_, axis=1, keepdims=True) | ||
# retain the scaling relation cross the signals, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these comments still relevant? Does it need to be stateful? I.e. should there be a fit() and transform() where in the fit() the self.std is stored and used in the transform() method?
Hi, Thank you for this PR! I think this could indeed be useful for some use cases, but I will let @rtavenar confirm this. I left 2 very minor remarks. I checked out the |
One other thing that we could consider is, instead of making this part of the main codebase, to create an example/tutorial "defining a custom preprocessor" and then using this specific preprocessing technique. I can imagine that there are more alternatives to normalise out there, and they only have minimal differences. Including all of those into tslearn could quickly result in a bloated code-base. |
Hi @tonylee2016 The scalers we have implemented so far only rely on per-time series information. And we seem to have more and more requests like this one. I would be OK to have such scalers in tslearn but I strongly believe that we should put effort in a clear documentation of the differences between scalers. Also, maybe it would be worth it to have names that explicitly state that these scalers rely on global information, or have a subpackage dedicated to "global" scalers. I am open to suggestions on these points and would like to clarify that before considering the add of these preprocessors in tslearn. |
Hi, @GillesVandewiele and @rtavenar thanks for the comments and extra info. After reading #280 and #285, here are my thoughts on a more generic scaler framework
I can spend some time on 3 in this PR. But 2 and 1 may be handled by #285? @GillesVandewiele |
use inheritance ans skip check_methods_subset_invariance
28ea436
to
e7eaea5
Compare
Codecov Report
@@ Coverage Diff @@
## main #333 +/- ##
==========================================
- Coverage 94.70% 94.54% -0.17%
==========================================
Files 59 59
Lines 4511 4525 +14
==========================================
+ Hits 4272 4278 +6
- Misses 239 247 +8
Continue to review full report at Codecov.
|
When I was using the package for K-shape, one issue found is that the original
TimeSeriesScaleMeanVariance
may amplify small changes (noises) in the signal. Because that it normalizes each signal per its own variance.Here I am adding a new scaling class that preserves the amplitude relation across signals, this scaling method may be useful for some applications.
TimeSeriesScaleMeanMaxVariance
is similar to the existingTimeSeriesScaleMeanVariance
with one exception, the max std of all signals is used for normalization. As a result, the amplitude relation across signals is preserved.