## Monday, July 18, 2011

### Axioms of Clustering

(this is part of an occasional series of essays on clustering: for all posts in this topic, click here

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it..."
-- Justice Potter Stewart in Jacobellis v. Ohio.
What makes one clustering better than another? To answer that question we have previously assumed that a well motivated objective function was given to us. Then the situation is easy, we compute the value of the objective and decide accordingly. In practice, however, clustering is often an intermediate step in a longer chain of computations, and there is not specific motivation for one objective function over another. Debating the pros and cons of the multitude of possible objective functions can easily take hours. Instead, we can further the I know it when I see it'' intuition by writing down the properties that any good clustering objective should satisfy and see if this axiomatic approach guides us towards the right solution.

This approach was undertaken by Kleinberg in his work on clustering axioms. A clustering function is one that takes a set of points (and their pairwise distances) and returns a partition of the data. The function should abide by the simple axioms:

• Scale-Invariance: If all distances are scaled by a constant factor, the clustering should not change. Put another way, measuring the distance in miles or kilometers (or nanometers) should not change the final clustering partition.
• Richness: Depending on the pairwise distances, any partition should be possible. For example, the clustering should not always put two specific points $x$ and $y$ together, regardless of the distance metric.
• Consistency: If the pairwise distance between points in a cluster decreases, while the distances between a pair of points in different clusters increases, the clustering should not change. Indeed, such a transformation makes clusters tighter' and better separated from each other, so why should the clustering change?
Each of the axioms seems reasonable, and it is surprising that there is no clustering function that is consistent with all three axioms ! The trouble lies with the last axiom: by changing the distances we require that a clustering stay the same, but an even better clustering may emerge. For example, consider points lying in several `obvious'' clusters, and proceed to move one of these clusters out to infinity. As we move the points further and further out, the resulting dataset appears to be two-clusterable (see the image below).

This leads to a slightly different tack at this problem. Instead of thinking about a specific clustering, or a specific partition of the data, instead we try to define an objective function, so that the optimum clustering under that objective behaves in a reasonable manner. This is the approach adapted by Ackerman and Ben-David. Let $m$ be a function measuring the quality of the clustering. As before, Scale-Invariance requires the measure $m$ to be independent of the units on the distances and Richness requires that every clustering can be the optimum (under $m$) clustering.

The Consistency axiom is the one undergoing a subtle change. Instead of requiring that the clustering stay the same if such a perturbation to the distances occurs, we only require that the score, or measure of the clustering {\em improve} under such a transformation. This breaks the counterexample above -- while the score of a 1-cluster decreases as we move the distances, the score of a 2-clustering decreases even more and surpasses the score of the 1-clustering. Indeed, the authors demonstrate a set of different clustering metrics that are consistent under this set of axioms.

-- Sergei Vassilvitskii

## Friday, July 01, 2011

### SODA 2012 submission question

This is what the SODA 2012 submission guidelines say (emphasis mine):
The submission, excluding title page and bibliography, must not exceed 10 pages (authors should feel free to send submissions that are significantly shorter than 10 pages.) If 10 pages are insufficient to include a full proof of the results, then a complete full version of the paper (reiterating the material in the 10-page abstract) must be appended to the submission after the bibliography. The length of the appended paper is not limited, and it will be read at the discretion of the committee. A detailed proof in the appended complete version is not a substitute for establishing the main ideas of the validity of the result within the 10-page abstract.
If I'm reading this right, you can't just shunt certain proofs and exposition to an appendix, as was previously done. You have to make an entire full version of the paper and append it to the shortened version. Is that correct ?

Update (7/5): I have an update from Yuval Rabani, the PC chair, which resolves the question (in the affirmative):
This is an experiment I introduced in response to conflicting attitudes in the PC. The idea is that some PC members would like to read the entire paper, and it's hard to follow the paper when you have to jump back and forth between the 10 page abstract and the appendix. So the idea is that most committee members will just print the 10 page abstract, and those
interested in the entire paper will print the attached full version. Since this is an experiment, I don't intend to enforce the guidelines very strictly, but if it's not too much of an effort on your part, we do prefer a submission according to the rules.