K-Means Clustering Animation Reflections
This post reflects on two animations of the k-means clustering algorithm, which illustrate how the algorithm iteratively assigns data points to clusters and updates cluster centroids.
The animations are sourced from:
Reflection
Both animations show k-means as an iterative process with the following steps:
- choose the number of clusters
- place initial centroids
- assign each point to its nearest centroid
- update each centroid to the mean of its assigned points
- repeat until the clusters stabilise.
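The steps above can be sketched in a few lines of plain Python. This is a minimal sketch, not the code behind either animation: the function name `kmeans`, the choice of random data points as initial centroids, the `max_iters` cap, and the toy two-blob data are all assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal k-means sketch for points given as equal-length tuples of floats."""
    rng = random.Random(seed)
    # Place initial centroids (assumption: k distinct data points picked at random).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster: the clusters have stabilised.
        if new_labels == labels:
            break
        labels = new_labels
        # Update each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Toy data (hypothetical): two compact, well-separated blobs in 2-D.
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)]
data = blob_a + blob_b

centroids, labels = kmeans(data, k=2)
print(centroids)
```

Stopping when no label changes is one way to read "until the clusters stabilise"; checking that the centroids have stopped moving would serve the same purpose.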
The first animation shows how the starting positions of the centroids can strongly affect the early stages of clustering. Even when the initial placement is poor, the algorithm improves the grouping step by step as it reassigns points and moves centroids, although the final clusters can still depend on where the centroids started.
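The effect of the starting positions can be traced by hand on a tiny one-dimensional dataset. The sketch below is only an illustration under assumed values: the data, the two starting positions, and the helper `run_kmeans_1d` are hypothetical and not taken from the animation. It records the within-cluster sum of squared distances (WCSS) at each iteration, which never increases from one step to the next, yet the poor start settles on a noticeably worse grouping than the good one.

```python
def run_kmeans_1d(points, centroids, max_iters=50):
    """Run 1-D k-means from a given start; return (centroids, WCSS per iteration)."""
    centroids = list(centroids)
    history = []
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
                  for x in points]
        # Within-cluster sum of squares for this assignment.
        history.append(sum((x - centroids[lab]) ** 2
                           for x, lab in zip(points, labels)))
        # Update step: move each centroid to the mean of its points.
        updated = list(centroids)
        for j in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                updated[j] = sum(members) / len(members)
        if updated == centroids:  # nothing moved, so the clusters are stable
            break
        centroids = updated
    return centroids, history

points = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # three obvious groups (toy data)
poor_start = [1, 2, 3]                       # every centroid inside the first group
good_start = [2, 11, 21]                     # one centroid per group

poor_centroids, poor_history = run_kmeans_1d(points, poor_start)
good_centroids, good_history = run_kmeans_1d(points, good_start)
print(poor_centroids, poor_history[-1])  # [1.0, 2.5, 16.0] 154.5
print(good_centroids, good_history[-1])  # [2, 11, 21] 6
```

From the poor start the WCSS falls at every iteration, but the run stops with two centroids sharing the first group and one centroid covering the other two groups. This is why k-means is often run several times from different starting positions.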
In the second animation, the “I’ll choose” and “Uniform Points” options highlight that k-means will still create clusters even when the data may not contain meaningful natural groups. This is because the algorithm only optimises distances and takes no account of real-world meaning or context. It works best when clusters are relatively compact and well separated, and it can be misleading on uniform, irregular, overlapping, or high-dimensional datasets.
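A small experiment makes the same point without any graphics. The sketch below is a simplified one-dimensional stand-in for the uniform case, not a reproduction of the animation: the evenly spaced points, the starting centroids, and the helper `kmeans_1d` are assumptions chosen so the result can be checked by hand.

```python
def kmeans_1d(points, centroids, max_iters=100):
    """Tiny 1-D k-means: returns the final centroids and each point's cluster label."""
    centroids = list(centroids)
    for _ in range(max_iters):
        # Nearest-centroid assignment (ties go to the lower cluster index).
        labels = [min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
                  for x in points]
        # Move each centroid to the mean of its members; keep it if it has none.
        updated = []
        for j, old in enumerate(centroids):
            members = [x for x, lab in zip(points, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

# Evenly spaced points: there are no gaps, so there are no natural groups to find.
points = list(range(12))
centroids, labels = kmeans_1d(points, centroids=[0, 1, 2])
sizes = [labels.count(j) for j in range(3)]
print(centroids, sizes)  # [1.0, 4.5, 9.0] [3, 4, 5]
```

The algorithm reports three clusters because it was asked for three, even though neighbouring points on either side of a cluster boundary are exactly as close to each other as points within a cluster.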
Ethically, if clustering is used to segment people, such as customers, students, patients, or job applicants, the chosen value of k, the features included in the dataset, and the assumptions behind the model need to be justified. Otherwise, the algorithm may create artificial categories, which can lead to biased, unfair, or poorly explained decisions.
