Jaunty Estimation of Hierarchical Time Series Clustering

HPI-Information-Systems/jet

JET

Installation

python setup.py install

Usage

JET is a scikit-learn estimator: it inherits from BaseEstimator and ClusterMixin. Its fit_predict function expects a list of time series (List[np.ndarray]) and returns a cluster label for each of them. The available time series distance measures are Shape Based Distance, Move Split Merge, and Dynamic Time Warping. These measures can handle only univariate time series. Therefore JET, when used with one of them, can handle only univariate time series, too.
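
For intuition about how such a measure compares two univariate series of different lengths, here is a minimal textbook sketch of Dynamic Time Warping. It is not JET's implementation: the function name `dtw_distance`, the absolute-difference local cost, and the absence of a warping window are all assumptions made for illustration.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Textbook DTW with absolute difference as local cost (illustrative only)."""
    n, m = len(x), len(y)
    # cost[i, j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0])  # same shape as a, stretched in time
print(dtw_distance(a, b))
```

Because the warping path may repeat samples, the two series do not need the same length, which is why a plain Python list of differently sized arrays is a natural input format.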

import numpy as np
from jet import JET, JETMetric

from scipy.cluster.hierarchy import dendrogram
import matplotlib.pyplot as plt

# generate 100 random example time series with lengths between 30 and 49 (the upper bound of randint is exclusive)
list_of_time_series = [np.random.rand(np.random.randint(30, 50)) for _ in range(100)]

jet = JET(
    n_clusters=10,                              # number of clusters to find: $c$ in paper
    n_pre_clusters=None,                        # number of pre-clusters to find: $c_{pre}$ in paper; default is $3\sqrt{n}$ (3*np.sqrt(len(X))) if None is set
    n_jobs=1,                                   # number of parallel jobs
    verbose=False,                              # output status messages
    metric=JETMetric.SHAPE_BASED_DISTANCE,      # distance metric for time series distances; Options: SHAPE_BASED_DISTANCE, MSM, DTW, or custom
    c=700                                       # cost parameter for the MSM distance metric
)

# returns cluster label for each time series
labels = jet.fit_predict(list_of_time_series)

# plot the dendrogram
dendrogram(jet._ward_clustering._linkage_matrix)
plt.show()
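
Since the matrix passed to `dendrogram` above has to be in SciPy's linkage format, the same hierarchy can presumably be cut at other cluster counts without refitting. The sketch below shows the idea with SciPy's `fcluster`. It builds a stand-in linkage matrix from random 2-D points so that it runs without JET installed; using `jet._ward_clustering._linkage_matrix` in its place is an assumption, and that attribute is private and may change.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# stand-in data: 20 random 2-D points instead of time series
points = rng.random((20, 2))

# stand-in for jet._ward_clustering._linkage_matrix (assumption, see text)
linkage_matrix = linkage(points, method="ward")

# cut the same hierarchy into at most k flat clusters for several values of k
labels_by_k = {
    k: fcluster(linkage_matrix, t=k, criterion="maxclust") for k in (2, 5, 10)
}

for k, labels in labels_by_k.items():
    print(k, len(np.unique(labels)))
```

With `criterion="maxclust"`, `fcluster` returns no more than `t` flat clusters, so one fitted hierarchy can serve several granularities.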

Bring Your Own Distance Measure

You can define your own distance measure function as shown below. (This also enables you to cluster multivariate time series, provided you have a suitable measure!)

import numpy as np
from jet import JET, JETMetric

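def custom_distance_measure(x: np.ndarray, y: np.ndarray) -> float:
    # compare only the overlapping prefix so that series of different lengths work
    min_len = min(len(x), len(y))
    # sum the squared differences so that a single float is returned, not an array
    distance = np.sum(np.power(x[:min_len] - y[:min_len], 2))
    return float(distance)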

jet = JET(
    n_clusters=10,
    metric=JETMetric(custom_distance_measure)
)
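
As a sketch of the multivariate case mentioned above, the following measure compares two series stored as arrays of shape (length, channels). The function name `multivariate_distance`, the array layout, and the choice of a truncated Euclidean distance are assumptions for illustration. Whether JET hands 2-D arrays to a custom measure unchanged is also an assumption and should be checked against the source.

```python
import numpy as np


def multivariate_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Euclidean distance over the common prefix of two (length, channels) arrays."""
    if x.shape[1] != y.shape[1]:
        raise ValueError("both series need the same number of channels")
    # compare only the time steps that both series have
    min_len = min(len(x), len(y))
    diff = x[:min_len] - y[:min_len]
    # aggregate over time steps and channels into one scalar
    return float(np.sqrt(np.sum(diff ** 2)))


rng = np.random.default_rng(42)
series_a = rng.random((40, 3))  # 40 time steps, 3 channels
series_b = rng.random((35, 3))  # 35 time steps, 3 channels
print(multivariate_distance(series_a, series_b))
```

Following the pattern above, such a function would be wrapped as `JETMetric(multivariate_distance)` and passed as the `metric` argument.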

Experiments

Code for the experiments was created with Tidewater and is described in the README.

Reference

@article{wenig2024jet,
  title={JET: Fast Estimation of Hierarchical Time Series Clustering},
  author={Wenig, Phillip and H{\"o}fgen, Mathias and Papenbrock, Thorsten},
  journal={Engineering Proceedings},
  volume={68},
  number={1},
  pages={37},
  year={2024},
  publisher={MDPI}
}
