Given n items, determine whether a majority element (one occurring strictly more than n/2 times) exists. You are allowed one pass over the data (which means you can read the elements in sequence, and that's it), and exactly TWO units of working storage.This is a special case of the general Misra-Gries estimator for finding frequent items in a stream with small space. The MG estimator allows you to find (approximately) all items that occur more than $\epsilon n$ times using $O(1/\epsilon)$ space. Setting $\epsilon = 1/2$ yields the desired result.

This result has been proved and reproved a number of times, and there's a fairly general proof based on the fact that what you're really doing is approximating the true frequency to an additive error of $\epsilon n$ (and so any item that occurs more often than that will be retained).

What's more interesting is how you might go about

*proving*the basic statement from first principles if you don't know the proof. This came up at dinner, when I was talking to people who hadn't heard of the problem before and were able to reconstruct a proof on the fly.

So let's return to the basic statement, and let's be a little more precise. The precise guarantee required is as follows:

- If a strict majority element exists, you MUST return it
- If not, you can return any element you want (even the most
*infrequent*one)

Because of the second statement above, we only have to worry about streams in which a strict majority element

**does**exist.**Observation 1:**if you haven't seen a majority element in any prefix of the stream, you can throw that prefix away.

**Why:**we know there's a majority element somewhere. If it didn't show up so far, it has to be at least much of a majority in what's remaining.

**Observation 2:**This prefix could be as short as two elements long.

In other words, we see an element. Call it x. If we now see $y \ne x$, x is no longer a majority element

*so far*, and we can chuck it, starting a new stream at $y$.
But if we do see $x$ again, then it could be a majority element. How do we keep track of whether it ever stops being a majority element ?

We could keep track of the number of items seen so far. But that's an extra counter. Why not just pair instances of $x$ with instances of other elements seen so far by subtraction. If any time we hit zero, we can invoke observation 1 and start again.

If not, then we will end with a nonzero count for $x$, which must be the correct element.

And this gives us the exact algorithm we need.