Friday, July 02, 2004

Apple, Wired and FOCS 2004.

Jeff Erickson snarks about Apple's discovery of the wonders of searching as-opposed-to sorting. Just to demonstrate that our community is on the cutting edge of technology trends, I present to you, hot off the presses, from the FOCS 2004 accepted papers list:

No Sorting? Better Searching!
Gianni Franceschini and Roberto Grossi

No paper link yet, alas...

1 comment:

  1. Sorting is commonly meant as the task of arranging keys in increasing or decreasing order (or small variations of this order). Given $n$ keys underlying a total order, the best organization in an array is maintaining them in sorted order. Searching requires $\Theta(\log n)$ comparisons in the worst case, which is optimal.

    We demonstrate that this basic fact in data structures does not hold for the general case of multi-dimensional keys, whose comparison cost is proportional to their length. In previous work [STOC94,STOC95,SICOMP01], Andersson, Hagerup, H{\aa}stad and Petersson study the complexity of searching a sorted array of $n$ keys, each of length $k$, arranged in lexicographic (or alphabetic) order for an arbitrary, possibly unbounded, ordered alphabet. They give sophisticated arguments for proving a tight bound in the worst case for this basic data organization, up to a constant factor, obtaining
    \Theta\left(\frac{k \log \log n}{\log \log (4 + \frac{k \log \log n}{\log n})}
    + k + \log n \right)
    character comparisons (or probes). Note that the bound is $\Theta(\log n)$ when $k=1$, which is the case that is well known in algorithmics.

    We describe a novel \emph{permutation} of the $n$ keys that is different from the sorted order, and sorting is just the starting point for describing our preprocessing. When keys are stored according to this ``unsorted'' order in the array, the complexity of searching drops to
    \Theta\left( k + \log n \right)
    character comparisons (or probes) in the worst case, which is \emph{optimal} among all possible permutations of the $n$ keys in the array, up to a constant factor. Again, the bound is $\Theta(\log n)$ when $k=1$.

    Jointly with the aforementioned result of Andersson et al., our finding provably shows that keeping $k$-dimensional keys sorted in an array is \emph{not} the best data organization for searching. This fact was not observable before by just considering $k=O(1)$ as sorting is an optimal organization in this case.



Disqus for The Geomblog