Wednesday, May 09, 2012

Multiplicative Weight Updates as Zombie Binary Search

The multiplicative weight update method (MWU hereafter) is a neat algorithm design technique with applications in machine learning, geometry and optimization, among others. However, it's usually viewed (and discussed) as an advanced technique, with very technical examples that require a lot of background (see the first example in the Arora-Hazan-Kale survey).

After pondering the use of MWU for a while now, it seems to me that this should be taught as a standard algorithms tool that naturally follows from divide and conquer and prune-and-search. In what follows, I'll sketch out how this might work.

A quick caveat: the MWU, like many powerful tools, has multiple interpretations, and the survey hints at many of them (for example, MWU = derandomization of the Chernoff bound). I don't intend to imply that there's only one way to interpret the technique, but that the approach I'll describe is the most accessible one when learning the method for the first time.



The divide and conquer unit in an algorithms class might cover sorting and FFTs, driven by the standard D&C recurrence relation. Prune and search is introduced as a variation, with median finding and binary search being the prototypical examples.

Prune-and-search works like magic, if you're seeing it for the first time. Do some work, throw away the rest, and your $n \log n$ running time goes down to linear. (As an Indian of a certain age, this always reminds me of "thoda khao, thoda phenko" (eat a bit, throw a bit) from Jaane Bhi Do Yaaro).

But what makes it tick is determining the constant factor that needs to be thrown away. The most direct way to do this is deterministically: on a linear time budget, find a point that splits the input into two roughly balanced parts.

But that's expensive in practice. Indeed, algorithms that use median finding as a subroutine try to avoid using a deterministic procedure because it can be slow. When you get to more advanced methods like parametric search, it gets even worse.

The neat trick to apply here is to randomize! We know that we don't have to find an exact split point, merely one that will approximately balance the two sides of the recursion. For median finding, we can pick three points at random, and take their median. With constant probability, this median lands far enough from both ends to give a split no worse than 1/3-2/3, and away we go!
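To make the trick concrete, here's a minimal sketch of randomized selection in Python, assuming a median-of-three pivot as described above. The name `select` and the `rng` parameter are mine, purely for illustration; treat this as one way to do it, not a tuned implementation.

```python
import random


def select(items, k, rng=random):
    """Return the k-th smallest element (0-indexed) of a non-empty sequence."""
    items = list(items)
    while True:
        if len(items) <= 3:
            # Small enough to finish by brute force.
            return sorted(items)[k]
        # Randomized split point: the median of three random samples.
        pivot = sorted(rng.sample(items, 3))[1]
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(smaller):
            # The answer is on the left; the rest is thrown away for good.
            items = smaller
        elif k < len(smaller) + len(equal):
            return pivot
        else:
            # The answer is on the right; shift k past what we discard.
            k -= len(smaller) + len(equal)
            items = [x for x in items if x > pivot]
```

Calling `select(data, len(data) // 2)` returns a median. The answer is always correct; the random pivot only affects how much gets pruned per round.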

This idea surfaces again and again, especially in the many randomized algorithms in computational geometry. As a design strategy, it's quite effective - design an algorithm that works if you can do balanced splits, and then find the split by choosing randomly. Invoke the Chernoff God and some satellite deities, and you're done.

Which brings us to the MWU.

Randomized splitting says that we're willing to lie a little about the split process, and things still work out. But whether you do randomized splitting or deterministic, the end result is still that you throw away some reasonable fraction of the input, NEVER TO SEE IT AGAIN.

Suppose you can't even do that ?

Think of noisy binary search (or 20 questions with a liar). Now, even your decision on what to prune is error-prone. You might be eliminating things that you need to consider later. So it's not clear that you can make any kind of search progress. But let's be reasonable and limit the adversary (or the noise) in some manner. Let's say that you'll only misclassify a small fraction of the input in each iteration (where small is "strictly less than half"). Let's also assume that I can tell you how important certain points are, so that you have to take that into consideration when defining your "small fraction".

So how does MWU work now? I tell you a distribution over the input, and you give me back a rule that's reasonably good at (say) classifying points. I increase the importance of things you made mistakes on (so you try not to do it again), and repeat.

Eventually, what happens is that the weight of things that you make mistakes on increases rapidly. But the overall weight of the input can't increase too much, because I only increase the weight of things you make mistakes on, which is not a lot. Intuitively, what happens is that the first weight catches up with the second, at which point the algorithm is complete. You can show that if the updating schedule is chosen correctly, the process terminates in a logarithmic number of steps.
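Here is one standard way to make the catching-up argument precise. The constants are my assumptions, not something fixed above: there are $n$ points starting at weight 1, each rule errs on at most an $\epsilon$ fraction of the current weight, a mistake multiplies a point's weight by $(1+\eta)$, and the final answer is the majority vote of the $T$ rules. The total weight $W_t$ grows slowly:

$$W_{t+1} \le (1 + \eta\epsilon)\, W_t \quad\Longrightarrow\quad W_T \le n\,(1+\eta\epsilon)^T.$$

A point $x$ that the majority vote gets wrong was misclassified in at least $T/2$ rounds, so its weight has grown quickly:

$$(1+\eta)^{T/2} \le w_T(x) \le W_T \le n\,(1+\eta\epsilon)^T.$$

Taking logarithms,

$$T \le \frac{\ln n}{\tfrac{1}{2}\ln(1+\eta) - \ln(1+\eta\epsilon)},$$

whenever the denominator is positive. So after more than this many rounds, no misclassified point can remain. With doubling ($\eta = 1$) this needs $\epsilon < \sqrt{2} - 1$; for any $\epsilon < 1/2$ a small enough $\eta$ works, which is where choosing the updating schedule comes in.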

Essentially, MWU functions as a zombie binary search. You want to kill elements, but they keep coming back. Thankfully, each time they come back they're slightly weaker, so you have a better chance of hitting them next time. Eventually, head is severed from neck, and your zombies are dead (and your points are found).
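One way to see the zombie behaviour in code is the deterministic weighted majority algorithm from the expert-advice setting. This is a sketch under my own assumptions: binary predictions, a halving penalty, and illustrative names (`weighted_majority`, `penalty`). Note that it weakens whoever errs rather than boosting the points you err on, which is the mirror image of the description above.

```python
def weighted_majority(predictions, outcomes, penalty=0.5):
    """Deterministic weighted majority over binary expert predictions.

    predictions[t][i] is expert i's 0/1 guess in round t.
    outcomes[t] is the true 0/1 answer, revealed after we commit to a vote.
    Returns (number of mistakes the algorithm made, final expert weights).
    """
    n = len(predictions[0])
    weights = [1.0] * n
    mistakes = 0
    for guesses, truth in zip(predictions, outcomes):
        # Vote with the side holding at least half of the current weight.
        weight_for_one = sum(w for w, g in zip(weights, guesses) if g == 1)
        vote = 1 if weight_for_one >= sum(weights) / 2 else 0
        if vote != truth:
            mistakes += 1
        # Zombie step: experts that erred are weakened, not removed.
        weights = [w * penalty if g != truth else w
                   for w, g in zip(weights, guesses)]
    return mistakes, weights
```

Setting `penalty=0` kills erring experts outright, which is plain binary search over the experts and only works if some expert is never wrong. With `penalty=0.5`, every mistake by the algorithm shrinks the total weight by at least a quarter, while an expert with $m$ errors keeps weight $2^{-m}$, so the algorithm makes at most about $2.41(m + \log_2 n)$ mistakes.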

My only wish at this point is that whenever you think of MWU you think of zombies. Now that's impact :)

5 comments:

  1. In my mind this invokes a graphic comic strip :)
    (Randall Munroe might be up for making one)

  2. Thanks for the great zombie intuition! I found another (extended) version of the survey at http://ie.technion.ac.il/~ehazan/papers/MWsurvey.pdf

  3. I am not convinced that MWU is one unified technique. My take on the usage of MWU in CG is that it is just a technique for solving LPs. You can get low-crossing spanning trees from LP without the reweighting technique (I have a note on this), and the same goes for geometric set cover.
    --S

  4. Sariel, that's why I had the disclaimer. There are a number of ways of thinking about MWU, but the LP-based view is not something that's accessible to students doing D&C - the view I proposed is a little more gentle.

    I actually wanted to do a more general post on the different views of MWU, but that was getting too long and complicated. In particular the derandomized Chernoff view is quite interesting.

  5. Avrim Blum has some nice slides that present multiplicative weights through this interpretation in the expert advice setting. He starts with the setting in which there is one perfect expert who is always right -- you just don't know who he is ahead of time. Here, you can use the halving algorithm, which is essentially binary search. When you no longer have a perfect expert, this is too brittle, and so you move to the weighted majority algorithm, and finally randomized weighted majority (which is the same algorithm as multiplicative weights). I like this way of presenting these algorithms -- I taught the randomized weighted majority algorithm to my undergrad algorithmic game theory students this semester using this progression.

