Comments on The Geomblog: Finding the right reference for finding the majority element

@Anonymous: finding the most frequent elements is ...

2011-09-25T16:55:22.055-06:00

@Anonymous: finding the most frequent elements is certainly not trivial in the streaming model, i.e., when the algorithm is allowed to make a single pass and use limited storage. The Misra-Gries algorithm, although simple, is an elegant solution to this problem.

It is true that the algorithm is not particularly ...

2011-09-25T16:11:27.400-06:00

It is true that the algorithm is not particularly time-efficient:

...and that is exactly the problem. Finding the most frequent element is trivial if time & space are not extremely limited.

@Anonymous: I am not sure why you believe that the...

2011-09-25T14:08:53.371-06:00

@Anonymous: I am not sure why you believe that the Misra-Gries algorithm, as proposed in the paper, does not work in the streaming model. The algorithm (4) on page 5 of the technical report considers all elements in the stream, in arbitrary order, and for each element performs the appropriate counter updates.

It is true that the algorithm is not particularly time-efficient: processing each element takes time O(k) or so. Still, this leads to roughly O(1/eps) update time when searching for elements that appear at least eps n times. This is not too bad if eps is not too small. Of course, we do know now how to do this much faster, as per the references you mentioned.
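To make the per-element cost concrete, here is a minimal sketch of a Misra-Gries style counter update in Python. It is an illustration written for this thread, not the code from the technical report; the function name `misra_gries` and the parameter `num_counters` are assumptions. With `num_counters` counters, any item occurring more than n / (num_counters + 1) times in a stream of length n is still present at the end, and the "decrement everything" branch is the step that costs time proportional to the number of counters.

```python
def misra_gries(stream, num_counters):
    """One pass over `stream`, keeping at most `num_counters` counters.

    Hypothetical helper for illustration. Each stored count is a lower
    bound on the item's true frequency.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: bump its counter.
            counters[item] += 1
        elif len(counters) < num_counters:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            # This loop is the O(k) part of the update.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # 'a' occurs 7 times out of 12, so it must survive with 2 counters.
    print(misra_gries("abacabadabaa", 2))
```

The surviving items are only candidates; a second pass (or the error bound) is needed to confirm their actual frequencies.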

It should be observed that the generalization of t...

2011-09-24T10:26:18.494-06:00

It should be observed that the generalization of the majority algorithm for k counters as proposed by Misra-Gries doesn't work in the data stream model.

As Graham points out, Karp et al. discussed this issue, and Demaine et al. actually propose data structures that make the updates happen in O(1) time.

For tight bounds on the performance of the generalized algorithm, we have those of Demaine et al. in the stochastic model and those of Bose et al. in the adversarial model.

This is clearly a case where authorship (except for the vanilla majority algorithm, which clearly belongs to Boyer and Moore) is hard to assign to a single group.

The k-counter algorithm as we use it today would be best described as the Misra-Gries-Karp-Papadimitriou-Shenker-Demaine-Lopez-Ortiz-Munro-Bose-Kranakis-Morin-Tang frequent items algorithm [or MGKPSDLMBKMT for "short" :) ]

Cool. What's interesting also is the connectio...

2011-09-24T03:18:39.986-06:00

Cool. What's interesting also is the connection between Boyer/Moore and Misra/Gries, as outlined in the link.
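One way to see the connection (not necessarily how the linked material presents it) is that the Boyer-Moore majority vote behaves like the k-counter scheme restricted to a single counter. The sketch below is an illustration under that reading; the function names `majority_candidate` and `majority` are hypothetical.

```python
def majority_candidate(stream):
    """Single-counter vote: returns the only possible majority element."""
    candidate, count = None, 0
    for item in stream:
        if count == 0:
            # Counter is free: adopt the current item as the candidate.
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            # Mismatch: cancel one vote for the candidate.
            count -= 1
    return candidate


def majority(items):
    """Second pass: confirm the candidate really is a strict majority."""
    items = list(items)
    candidate = majority_candidate(items)
    if items and items.count(candidate) * 2 > len(items):
        return candidate
    return None


if __name__ == "__main__":
    print(majority("abacabadabaa"))
    print(majority([1, 2, 3]))
```

The first pass alone only yields a candidate, which is why the verification pass is kept separate here.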

I did my best to reconstruct the history surroundi...

2011-09-24T03:16:21.451-06:00

I did my best to reconstruct the history surrounding this area when I surveyed the topic. See Sections 3.1 and 3.2 of
http://www.dimacs.rutgers.edu/~graham/pubs/papers/freqvldbj.pdf

Graham