Clustering: A conceptual approach


What is clustering? 

An easy definition of the problem is:
Clustering is the process of grouping items into clusters, so that items in the same cluster are similar to each other.
Each underlined word in the above definition is subject to interpretation and design choice. The choices the modeler makes determine what clustering problem she ends up with, what kind of patterns she will be looking for, and what kinds of algorithms she will use.
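
To make the point about design choices concrete, here is a minimal sketch (not a method from any of the essays) in which the same items are grouped around the same two representatives, and only the notion of "similar" changes. The data points, the two representatives, and the function names (`euclidean`, `manhattan`, `assign`) are all hypothetical and chosen purely for illustration.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Sum of coordinate-wise absolute differences between two 2D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign(items, representatives, dist):
    """Group each item with its nearest representative under `dist`."""
    clusters = {r: [] for r in representatives}
    for item in items:
        nearest = min(representatives, key=lambda r: dist(item, r))
        clusters[nearest].append(item)
    return clusters


# Hypothetical toy data: five items and two fixed representatives.
items = [(0, 1), (1, 0), (3, 0), (5, 3), (6, 2)]
representatives = [(0, 0), (5, 2)]

by_euclid = assign(items, representatives, euclidean)
by_manhattan = assign(items, representatives, manhattan)

# The item (3, 0) changes cluster when only the distance function changes.
print("Euclidean:", by_euclid)
print("Manhattan:", by_manhattan)
```

Under the Euclidean distance the item (3, 0) is closer to (5, 2), while under the Manhattan distance it is closer to (0, 0). Nothing about the data changed; only the modeler's interpretation of "similar" did.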

In this book, we will focus on a conceptual understanding of clustering. We will explain how one might make design choices in clustering, and what those choices mean for the patterns one is looking for.

A rough list of topics: 
  • Basics: partition-based clustering (k-means, k-median, k-center), hierarchical clustering
  • Density estimation
  • Correlation clustering
  • Spectral clustering
  • Graph clustering
  • Choosing k: elbow methods, ROC curves, phase transitions
  • Clustering as compression
  • Metaclustering: Validating clusterings, finding alternate clusterings
  • Axiomatic treatment
  • Soft and nonparametric clustering
  • Clustering with outliers
  • Large-data clustering (coresets, streams)

(This book started as an occasional series of essays on clustering. The posts in the series are listed below.)

  1. Clustering: an occasional series
  2. The "I don't like you" view.
  3. $k$-means
  4. Hierarchical methods
  5. Correlation clustering: "I don't like you, but I like them"
  6. Spectral Clustering
  7. An interlude: time-series clustering by Sorelle Friedler.
  8. Mixture models: classification versus clustering
  9. Choosing the number of clusters I: The elbow method
  10. Choosing the number of clusters II: Diminishing returns and the ROC method.
  11. Choosing the number of clusters III: Phase transitions
  12. An interlude: New results on learning mixtures of Gaussians
  13. Clustering as compression
  14. Clustering with outliers (by Sergei Vassilvitskii)
  15. Axioms of clustering (by Sergei Vassilvitskii) 
  16. Large-data clustering Part I: Clusters of clusters
