Inertia in k-means clustering #594
-
Dear all, I'm using the k-means algorithm to cluster a set of curves. Does anybody know how exactly the inertia is calculated in this case? Suppose I have the grid_points [0, 1, 2, 3] and the data_matrix [[1, 2, 3, 4], [6, 5, 4, 3], [5, 3, 1, -1]]. With 2 clusters I get, for example, cluster_centers = [[5.5, 1],[4, 2],[2.5, 3],[1, 4]] and an inertia of 10.5, which is the sum of the squared distances. The individual distances are [0., 2.29128785, 2.29128785]. I would really appreciate it if anybody could tell me exactly how these distances are calculated, ideally for this specific example. Thanks a lot in advance!
Replies: 1 comment 1 reply
-
In your case you have the discretization grid $\mathbf{t} = (0, 1, 2, 3)$ and the functions $x_0, x_1, x_2$ with $x_0(\mathbf{t}) = (1, 2, 3, 4)$, $x_1(\mathbf{t}) = (6, 5, 4, 3)$ and $x_2(\mathbf{t}) = (5, 3, 1, -1)$.

The cluster centers are $c_0, c_1$ with $c_0(\mathbf{t}) = (5.5, 4, 2.5, 1)$ and $c_1(\mathbf{t}) = (1, 2, 3, 4)$.

The distance between two functions $f$ and $g$ is configurable using the `metric` parameter. By default it is the $L^2$ distance, that is $d(f, g) = \sqrt{\int_{\mathcal{T}} |f(t)-g(t)|^2dt}$ (thus, analogous to the Euclidean distance).

The distances appearing in the inertia are those between each observation and the center of the cluster it belongs to (the closest one). Thus:

$d_0 …
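To check the numbers from the question by hand, here is a minimal NumPy sketch. It is not the library's internal code: the helper name `l2_distance_piecewise_linear` is made up for this illustration, and it assumes each curve is linearly interpolated between grid points, so the squared difference can be integrated exactly on every interval. For this example the differences are linear in $t$, so that assumption gives the exact $L^2$ integral.

```python
import numpy as np

# Data from the question.
grid_points = np.array([0.0, 1.0, 2.0, 3.0])
data_matrix = np.array(
    [[1, 2, 3, 4], [6, 5, 4, 3], [5, 3, 1, -1]], dtype=float
)
# Cluster centers written as one curve per row: c_0 and c_1.
centers = np.array([[5.5, 4.0, 2.5, 1.0], [1.0, 2.0, 3.0, 4.0]])


def l2_distance_piecewise_linear(f, g, t):
    """L2 distance between two curves sampled on the grid t.

    Hypothetical helper: assumes linear interpolation between grid
    points. On an interval of width h with endpoint differences a and b,
    the integral of the squared linear difference is h*(a^2 + a*b + b^2)/3.
    """
    diff = f - g
    a, b = diff[:-1], diff[1:]
    h = np.diff(t)
    return np.sqrt(np.sum(h * (a**2 + a * b + b**2) / 3.0))


# Distance from every observation to every center, shape (3, 2).
distances = np.array(
    [
        [l2_distance_piecewise_linear(x, c, grid_points) for c in centers]
        for x in data_matrix
    ]
)

# Each observation contributes the distance to its closest center.
closest = distances.min(axis=1)
inertia = np.sum(closest**2)

print(closest)  # approximately [0, 2.29128785, 2.29128785]
print(inertia)  # approximately 10.5
```

Here $x_0$ coincides with $c_1$, so its distance is 0, while $x_1$ and $x_2$ are both closest to $c_0$ with $d^2 = \int_0^3 (0.5 + 0.5t)^2 dt = 5.25$, which gives the inertia $0 + 5.25 + 5.25 = 10.5$.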