However, perhaps one of the most important heuristics that is used in practice is not discussed, and I have yet to find a source that examines its power fully. This heuristic is "Alternating minimization", and appears "in the wild" in two wildly popular heuristics, the k-means algorithm in clustering and the Iterated Closest Pair algorithm in model registration.

Suppose you have a cost function D defined over the product of domains P and Q. For concreteness, let's say that P is the space of all assignments of points to clusters, and Q is the space of all assignments of centers to clusters. The standard k-means cost function asks for an assignment of points to clusters, and an assignment of centers to clusters, so that the sum of squared distances of points to their cluster centers is minimized.

In general, such an optimization is hard to perform. However, it is often the case that for a fixed p in P, we can minimize D(p, Q), and vice versa. Continuing the example above, using Euclidean distance as the underlying distance, it's easy to see that for a fixed set of points in a cluster, the minimizing center is their centroid. Moreover, for a fixed set of centroids, the assignment is a straight nearest neighbour assignment.

This is the idea behind alternating minimization. Starting with arbitrary p0 and q0, we repeat the two-step procedure:

- p_i = arg min D(p, q_{i-1})
- q_i = arg min D(p_i, q)

Now there are three questions that one usually has to answer for an alternating minimization. The first one is relatively straightforward:

Does it converge ? In general, it's often fairly easy to write down a potential function that decreases strictly with each iteration and is always positive. Thus, convergence can often be established for this procedure.

Does it converge to the optimal solution ? In general, no. This procedure can often get stuck in local minima. However, (and this is one of the more attractive aspects of this heuristic), Csiszar and Tusnady showed that as long as a certain "5-point" property holds (a simple inequalities involving iterates of the procedure), the process will converge to the optimal solution. In their paper, this result is used to show convergence when P and Q are convex sets of distributions, and D is the Kullback-Leibler distance.

Another useful example where the procedure converges is when P and Q are convex sets, and D computes the closest distance between the two sets. In that case, each iterative step determines the closest point in a convex set closest to a given point.

A variant of the above question is: does it converge to a close-to-optimal solution ? Again, in general, one cannot say anything useful about this. However, for k-means, recent work by Arthur and Vassilvitskii, and Ostrovsky, Rabani, Schulman and Swamy, has shown that choosing start points carefully can yield provable bounds on the quality.

Does it converge in polynomial time ? Perhaps the most important question about this heuristic is whether it converges in polynomial time. For k-means, it can be shown that there are examples that will force the heuristic to take superpolynomial to converge. However, recent results in smoothed complexity have shown that if the input is perturbed ever so slightly, then the iterative procedure converges in polynomial time (with high probability). In a sense, one can view this result as explaining the success of the method in practice.

Alternating minimization appears in many domains, and is a common trick for "solving" complex optimizations. It should definitely be taught wherever heuristics are discussed.