Friday, February 25, 2005

SODA Outtakes: Lloyd's algorithm and analysing heuristics

Uri Feige's invited talk at SODA 05 was on 'Rigorous Analysis of Heuristics for NP-hard Problems'. His main argument (at least in the initial portion of his talk) was that it's a good idea to analyze known heuristics for problems, because they work well in practice, and a rigorous analysis will shed light on why.

[Aside: in the theoryCS community, a "heuristic" usually refers to a procedure that has no provable performance guarantees; this could mean either no clear running time analysis, or no clear quality of optimization guarantee. The term "algorithm" is usually reserved for a procedure that has well-defined steps and ends in a well-defined state after a definite (finite) number of steps. One will occasionally find the term 'algorithm' used to refer to a heuristic in non-theory areas; beware !]

Such analysis, though perhaps not as "sexy" as coming up with a new algorithm, is also important for "sociological" reasons. Although it might technically be correct to argue that a particular heuristic is unreliable and thus should not be used, the resolute theoretician is often faced with a mountain of empirical evidence in support of said heuristic, compared with the possibly complex and hard-to-understand algorithm that she proposes.

Some of the best known heuristics show up in clustering, especially since this is a problem that transcends many different application domains, including areas that are not as familiar with theoretical algorithmic methods. One of the best known methods for clustering with k centers is Lloyd's algorithm (or the k-means algorithm), which works essentially as a discrete variant of expectation-maximization. The algorithm itself is very simple:
1. Choose k "centers"
2. Assign each point to its nearest center.
3. Compute the k centroids of points assigned to a given center. These are the new k "centers".
4. Go back to step 2.
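The loop above can be rendered as a minimal Python sketch. The initialization and the stopping rule are not pinned down by the description, so the choices below (random starting centers, a fixed iteration cap, stopping when the centers stabilize) are assumptions, not the paper's:

```python
import random

def lloyd(points, k, max_iters=100):
    """Minimal sketch of Lloyd's algorithm on points in R^d.

    points: list of equal-length tuples. Initialization here is a
    random sample of the input, one of several common choices.
    """
    centers = random.sample(points, k)
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist2(p, centers[j]))
            clusters[i].append(p)
        # Step 3: the centroid of each cluster becomes the new center
        # (an empty cluster keeps its old center).
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # fixed point: a local minimum
            break
        centers = new_centers
    return centers

def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(cluster):
    """Coordinate-wise mean of a nonempty list of points."""
    n, d = len(cluster), len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / n for i in range(d))
```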
It is known that Lloyd's algorithm converges to a local minimum of the clustering cost (though not necessarily a global one: k-means is NP-hard). However, there was no rigorous analysis of how quickly this happens. In a SODA 2005 paper, Har-Peled and Sadri examine the complexity of k-means. They show that:
1. In 1D, and in d-D (on the grid), the number of iterations of the k-means algorithm is polynomial in the number of points, the dimension, and the "spread" (the ratio of max distance to min-distance).
2. Even in 1D, there are example point sets on which the k-means algorithm takes a linear number of iterations to converge.
What they also show is that slight modifications to the k-means heuristic yield algorithms that converge in time independent of the dimension (and with much better upper bounds). The simpler of their two schemes can be described as follows: in Step 2 of the original procedure, move just one "misclassified" point (one that is closer to some other center than to its own) to its correct cluster, rather than reassigning all points at once.
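A hypothetical rendering of that single-point variant (called SinglePnt in the paper) might look like the following. It assumes a nearest-center test for "misclassified" and a centroid update for the two affected clusters; these details are my reading, not code from the paper:

```python
def single_pnt(points, k, assign):
    """Sketch of the SinglePnt idea: instead of reassigning every point
    each round, move a single misclassified point to its nearest
    center's cluster, updating only the two affected centroids.

    assign: any initial partition of the points into k nonempty
    clusters, given as a list of cluster indices (mutated in place).
    """
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def centroid(cluster):
        n, d = len(cluster), len(cluster[0])
        return tuple(sum(p[i] for p in cluster) / n for i in range(d))

    centers = [centroid([p for p, a in zip(points, assign) if a == c])
               for c in range(k)]
    while True:
        for idx, p in enumerate(points):
            best = min(range(k), key=lambda j: dist2(p, centers[j]))
            if best != assign[idx]:
                old, assign[idx] = assign[idx], best
                # Only two clusters changed membership; update their centroids.
                for c in (old, best):
                    members = [q for q, a in zip(points, assign) if a == c]
                    if members:
                        centers[c] = centroid(members)
                break  # move exactly one point, then rescan
        else:
            return centers, assign  # no misclassified point: local minimum
```

On the toy input below, starting from a deliberately bad partition, a single move suffices to reach the natural two-cluster solution.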

They go on to present empirical evidence in support of their claim that these algorithms produce answers of similar quality (in similar time) to k-means. This part is still a work in progress: there are very fast implementations of k-means that they don't compare against (though the underlying data structures might be used to speed up their approach as well).

It is not clear to me that their paper answers, to any degree, the question of why Lloyd's algorithm works as well as it appears to. However, the fact that they can prove bounds for their alternate algorithms suggests that this may be a line of attack for analyzing Lloyd's method. These alternates are also quite simple, and so could conceivably be used as k-means replacements if one wanted guaranteed running-time bounds.

One final point: the authors have their code available online. Would that more would do the same !


2. Suresh says "It is not clear to me that their paper answers to any degree the question of why Lloyd's algorithm works as well as it appears to. "

It seems to me that your evaluation above is much more negative than warranted. Why isn't the analysis a large step towards answering things about Lloyd's algorithm? They are presenting a worst-case lower bound. The modified algorithm retains the intuition of the original algorithm while giving performance bounds. A more refined analysis would need to understand, in some sense, the average case, and here there can be many interpretations. At the end of the day one might still not "understand" why Lloyd's algorithm works.

Posted by Chandra Chekuri

3. The gap IMO rests in the experimental setup. I'm not saying the paper is deficient in this regard, but any modification of k-means that claims to provide intuition about how it works should behave in the same way. They have some experimental evidence supporting this, but more is needed.

Maybe I should have relaxed my statement. There is some progress that has been made (which is why I mentioned this paper in the first place), but it is still unclear how to connect SinglePnt (which seems the more natural of the two alternates) and k-means.

At first cut, it appears that k-means is much more efficient because of the multiple misclassifications it fixes in each iteration. The results on SinglePnt suggest that this might be an illusion. Can some kind of dominance be proved ? That moving many points is not much better than moving one, etc. ?

Posted by Suresh

4. BTW, this work is hopefully a first step on this problem. Although Bardia and I took it as far as we could, there is still a huge gap between the lower bound and the upper bound. It would be really nice to close it, or just improve it, but I think that would require a new idea. A super-linear (resp. quadratic) lower bound would be (resp. very) exciting, even in high dimensions.

My intuition is that SinglePnt dominates the k-means method. So, my intuition is that our analysis would in fact also hold for this case. However, proving this connection is, I am afraid, going to be very painful, since the behavior of k-means seems to be intricate.

In fact, in the talk, I said that I "conjecture" that there is such a relationship. The double quotation marks are there because we want credit for this conjecture only if it is correct.

In short, this paper is wide open for further improvement...

Posted by Sariel Har-Peled