subhadeep-123/KMeans-and-Agglomerative-Clustering-on-Mall-Customers-dataset: K-means and hierarchical agglomerative clustering implementations on the Mall Customers dataset

Hierarchical clustering

In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:

Agglomerative: This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
Divisive: This is a "top-down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
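
The bottom-up strategy can be illustrated with a minimal sketch in plain Python. This is not the code from the repository's notebook: the function name `agglomerative`, the choice of single linkage (distance between the closest members of two clusters), and the toy points are all assumptions made for illustration.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair."""
    # Each cluster is a list of indices into `points`.
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history, i.e. the hierarchy read from the bottom up

    def linkage(a, b):
        # Single linkage (an assumption): distance between the closest members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
        clusters[a] = merged
    return clusters, merges


# Toy 2-D points, made up for illustration (not taken from Mall_Customers.csv).
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
clusters, merges = agglomerative(toy_points, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Stopping at `n_clusters` cuts the hierarchy at one level; running the loop down to a single cluster and keeping `merges` would give the full merge history. A divisive method would instead start from one cluster and split it recursively.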

K-means Clustering

K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. These heuristics resemble the expectation-maximization algorithm for mixtures of Gaussian distributions: both k-means and Gaussian mixture modeling refine their estimates iteratively, and both use cluster centers to model the data. However, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
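
The iterative refinement can be sketched in plain Python as alternating assignment and update steps (commonly called Lloyd's algorithm). This is an illustrative sketch rather than the repository's implementation: the function name `kmeans`, the random-sample initialization, the `n_iter` and `seed` parameters, and the toy points are assumptions.

```python
import random
from math import dist


def kmeans(points, k, n_iter=100, seed=0):
    """Alternate assignment and update steps until assignments stop changing."""
    rng = random.Random(seed)
    # Initialize centers with k distinct data points (an assumption;
    # other initialization schemes exist).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # no change: the heuristic has reached a local optimum
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


# Toy 2-D points, made up for illustration (not taken from Mall_Customers.csv).
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(toy_points, k=2)
print(centers, labels)
```

Because the result is only a local optimum, different seeds can give different partitions on less well-separated data; a common remedy is to run several initializations and keep the one with the lowest within-cluster sum of squares.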

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the similar name. Applying the 1-nearest neighbor classifier to the cluster centers obtained by k-means classifies new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
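
As a small illustration of that last point, assigning a new observation to an existing cluster amounts to a 1-nearest-neighbor lookup over the centers. The function name `nearest_centroid` and the center coordinates below are hypothetical, chosen only for the example.

```python
from math import dist


def nearest_centroid(point, centers):
    """Return the index of the center closest to `point` (1-NN over the centers)."""
    return min(range(len(centers)), key=lambda c: dist(point, centers[c]))


# Hypothetical centers, standing in for the output of a k-means run.
centers = [(0.5, 0.5), (10.0, 10.0)]
print(nearest_centroid((1.0, 2.0), centers))  # 0
print(nearest_centroid((9.0, 8.0), centers))  # 1
```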

Repository files (13 commits in the history):
.gitignore
Hierarchical Agglomerative Clustering.ipynb
Kmeans clustering.ipynb
LICENSE
Mall_Customers.csv
README.md
