• K-means is a very popular clustering algorithm: it is simple and capable of processing massive datasets.
  • Its goal is to partition the dataset into K clusters while minimizing an error function, defined as the sum of squared distances between each centroid and the elements assigned to it.
  • The number of clusters K, together with a maximum number of iterations or a convergence criterion, must be specified before running the algorithm.
  • After the centroids are initialized, each expression vector is assigned to the cluster whose centroid is closest according to the chosen distance (similarity) metric.
  • In each iteration, the mean vector of each cluster becomes its new centroid, and the points are then reassigned. Iterations repeat until either the clusters converge (no centroid changes) or the maximum number of iterations is reached.
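  • Centroid initialization plays an important role in K-means: the algorithm converges only to a local minimum of the error function, so different initial centroids can produce different final clusterings.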
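Written out, the error function minimized by standard K-means is the within-cluster sum of squared Euclidean distances. The notation below ($C_k$ for the $k$-th cluster, $\boldsymbol{\mu}_k$ for its centroid, $\mathbf{x}_i$ for a data vector) is introduced here for illustration; the centroid update in each iteration is the cluster mean on the right.

```latex
J = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2,
\qquad
\boldsymbol{\mu}_k = \frac{1}{|C_k|} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i
```

For a fixed assignment, the mean is exactly the point that minimizes the squared-distance sum within a cluster. This is why alternating "assign to nearest centroid" and "recompute the mean" never increases $J$.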
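The assign/update loop described above can be sketched in plain Python as follows. This is a minimal illustration, not the source's implementation. It assumes squared Euclidean distance as the metric. It also assumes one particular convention for empty clusters, keeping the previous centroid. The function names (`kmeans`, `assign`, `mean_vector`) are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Return, for every point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate mean updates and reassignment (Lloyd-style K-means).

    Stops when no centroid changes or after max_iter iterations.
    Returns (centroids, labels, iterations_run).
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for it in range(1, max_iter + 1):
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Hypothetical convention: an empty cluster keeps its old centroid.
            new_centroids.append(mean_vector(members) if members else old)
        labels = assign(points, new_centroids)
        if new_centroids == centroids:  # converged: no centroid moved
            return centroids, labels, it
        centroids = new_centroids
    return centroids, labels, max_iter


# Usage on a tiny, made-up 2-D dataset with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels, n_iter = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels, n_iter)  # [0, 0, 0, 1, 1, 1] 2
```

Convergence is tested by exact equality of centroids. This is safe here because, once the assignments stop changing, the means are recomputed from identical members and come out bit-for-bit the same. A tolerance-based test is a common alternative for large datasets.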
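The source does not say which initialization method it has in mind. As one illustration, a widely used scheme is k-means++ seeding, sketched below. The first centroid is a uniformly random data point. Each later centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen, which tends to spread the starting centroids apart. The function name `kmeans_plus_plus_init` and its `rng` parameter are hypothetical. The returned list could serve as the starting centroids for the iterative procedure described above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_init(points, k, rng=None):
    """Choose k initial centroids using k-means++ seeding.

    The first centroid is a uniformly random point. Each later one is drawn
    with probability proportional to its squared distance to the nearest
    centroid chosen so far.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from every point to its nearest chosen centroid.
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid: fall back to uniform choice.
            centroids.append(rng.choice(points))
            continue
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


# Usage with a seeded generator so the result is reproducible.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
init = kmeans_plus_plus_init(data, 2, rng=random.Random(0))
print(init)
```

Points already chosen as centroids have zero weight, so with distinct data points the seeding never picks the same point twice. Running K-means from several different seedings and keeping the result with the lowest error is another common way to reduce sensitivity to initialization.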