[MRG] add TimeSeriesScaleMeanMaxVariance() #333

tonylee2016 · 2021-01-07T23:14:38Z

When I was using the package for K-shape, one issue found is that the original TimeSeriesScaleMeanVariance may amplify small changes (noises) in the signal. Because that it normalizes each signal per its own variance.

Here I am adding a new scaling class that preserves the amplitude relation across signals, this scaling method may be useful for some applications.

TimeSeriesScaleMeanMaxVariance is similar to the existing TimeSeriesScaleMeanVariance with one exception, the max std of all signals is used for normalization. As a result, the amplitude relation across signals is preserved.

tonylee2016 · 2021-01-08T02:06:56Z

the method won't pass check_methods_subset_invariance, It is expected, I've excluded it from the test.

GillesVandewiele · 2021-01-08T07:56:44Z

tslearn/preprocessing/preprocessing.py

+        # retain the scaling relation cross the signals,
+        # the max std_t is set to self.std
+        max_std = max(std_t)
+        if max_std ==0.: max_std = 1


Split up over multiple lines.

And add space to right and left of ==

GillesVandewiele · 2021-01-08T07:57:16Z

tslearn/preprocessing/preprocessing.py

+        X_ = check_dims(X_, X_fit_dims=self._X_fit_dims, extend=False)
+        mean_t = numpy.nanmean(X_, axis=1, keepdims=True)
+        std_t = numpy.nanstd(X_, axis=1, keepdims=True)
+        # retain the scaling relation cross the signals,


Are these comments still relevant? Does it need to be stateful? I.e. should there be a fit() and transform() where in the fit() the self.std is stored and used in the transform() method?

GillesVandewiele · 2021-01-08T08:01:59Z

Hi,

Thank you for this PR! I think this could indeed be useful for some use cases, but I will let @rtavenar confirm this.

I left 2 very minor remarks. I checked out the check_methods_subset_invariance from sklearn and it indeed makes sense that the results change depending on the sub-batch you provide to the method (as it uses the max std of that batch to normalise). However, we must ensure that only the relevant tests are skipped, and that as many as possible tests are performed. Are we sure that adding _skip_test as a tag does not skip more tags than needed?

GillesVandewiele · 2021-01-08T08:04:15Z

One other thing that we could consider is, instead of making this part of the main codebase, to create an example/tutorial "defining a custom preprocessor" and then using this specific preprocessing technique. I can imagine that there are more alternatives to normalise out there, and they only have minimal differences. Including all of those into tslearn could quickly result in a bloated code-base.

rtavenar · 2021-01-08T10:09:47Z

Hi @tonylee2016
I think this is closely related to #280 and #285

The scalers we have implemented so far only rely on per-time series information. And we seem to have more and more requests like this one.

I would be OK to have such scalers in tslearn but I strongly believe that we should put effort in a clear documentation of the differences between scalers.

Also, maybe it would be worth it to have names that explicitly state that these scalers rely on global information, or have a subpackage dedicated to "global" scalers.

I am open to suggestions on these points and would like to clarify that before considering the add of these preprocessors in tslearn.

tonylee2016 · 2021-01-08T21:22:30Z

Hi, @GillesVandewiele and @rtavenar

thanks for the comments and extra info. After reading #280 and #285, here are my thoughts on a more generic scaler framework

move the std and mu calculation to fit(). As [WIP] Fit method at preprocessing file #280 #285 is already doing it.
add an optional scaler Infos gathered from fit(): so that user know what data the scaler is based on
better document and an example for TimeSeriesScaleMeanMaxVariance() to show the different effects

I can spend some time on 3 in this PR. But 2 and 1 may be handled by #285?

@GillesVandewiele _skip_test will skip all estimator checks. However, I did not find a way to only skip check_methods_subset_invariance.

use inheritance ans skip check_methods_subset_invariance

Add compile.py

codecov-commenter · 2022-05-18T00:51:26Z

Codecov Report

Merging #333 (28ea436) into main (87495c1) will decrease coverage by 0.16%.
The diff coverage is 28.57%.

❗ Current head 28ea436 differs from pull request most recent head 2819398. Consider uploading reports for the commit 2819398 to get more accurate results

@@            Coverage Diff             @@
##             main     #333      +/-   ##
==========================================
- Coverage   94.70%   94.54%   -0.17%     
==========================================
  Files          59       59              
  Lines        4511     4525      +14     
==========================================
+ Hits         4272     4278       +6     
- Misses        239      247       +8

Impacted Files	Coverage Δ
tslearn/preprocessing/__init__.py	`100.00% <ø> (ø)`
tslearn/preprocessing/preprocessing.py	`89.10% <28.57%> (-9.75%)`	⬇️
tslearn/tests/sklearn_patches.py	`92.59% <0.00%> (+0.52%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 87495c1...2819398. Read the comment docs.

tonylee2016 changed the title ~~add TimeSeriesScaleMeanMaxVariance()~~ [WIP] add TimeSeriesScaleMeanMaxVariance() Jan 7, 2021

tonylee2016 changed the title ~~[WIP] add TimeSeriesScaleMeanMaxVariance()~~ [MRG] add TimeSeriesScaleMeanMaxVariance() Jan 8, 2021

GillesVandewiele reviewed Jan 8, 2021

View reviewed changes

Base automatically changed from master to main January 26, 2021 12:41

tonylee2016 added 2 commits March 16, 2021 15:41

add TimeSeriesScaleMeanMaxVariance

72d67e3

some update and fix

e7eaea5

use inheritance ans skip check_methods_subset_invariance

tonylee2016 force-pushed the TimeSeriesScaleMeanMaxVariance branch from 28ea436 to e7eaea5 Compare March 16, 2021 22:42

galaxie500 and others added 3 commits July 29, 2021 17:05

Add compile.py

58ac80a

add .whl for docker-airflow build

2b98526

Merge pull request #1 from galaxie500/TimeSeriesScaleMeanMaxVariance

2819398

Add compile.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MRG] add TimeSeriesScaleMeanMaxVariance() #333

[MRG] add TimeSeriesScaleMeanMaxVariance() #333

tonylee2016 commented Jan 7, 2021 •

edited

Loading

tonylee2016 commented Jan 8, 2021

GillesVandewiele Jan 8, 2021

GillesVandewiele Jan 8, 2021

GillesVandewiele Jan 8, 2021

GillesVandewiele commented Jan 8, 2021

GillesVandewiele commented Jan 8, 2021

rtavenar commented Jan 8, 2021

tonylee2016 commented Jan 8, 2021

codecov-commenter commented May 18, 2022

[MRG] add TimeSeriesScaleMeanMaxVariance() #333

Are you sure you want to change the base?

[MRG] add TimeSeriesScaleMeanMaxVariance() #333

Conversation

tonylee2016 commented Jan 7, 2021 • edited Loading

tonylee2016 commented Jan 8, 2021

GillesVandewiele Jan 8, 2021

Choose a reason for hiding this comment

GillesVandewiele Jan 8, 2021

Choose a reason for hiding this comment

GillesVandewiele Jan 8, 2021

Choose a reason for hiding this comment

GillesVandewiele commented Jan 8, 2021

GillesVandewiele commented Jan 8, 2021

rtavenar commented Jan 8, 2021

tonylee2016 commented Jan 8, 2021

codecov-commenter commented May 18, 2022

Codecov Report

tonylee2016 commented Jan 7, 2021 •

edited

Loading