Sorting is commonly meant as the task of arranging...

2004-07-31T09:28:00.000-06:00

Sorting is commonly meant as the task of arranging keys in increasing or decreasing order (or small variations of this order). Given $n$ keys underlying a total order, the best organization in an array is maintaining them in sorted order. Searching requires $\Theta(\log n)$ comparisons in the worst case, which is optimal.

We demonstrate that this basic fact in data structures does not hold for the general case of multi-dimensional keys, whose comparison cost is proportional to their length. In previous work [STOC94,STOC95,SICOMP01], Andersson, Hagerup, H{\aa}stad and Petersson study the complexity of searching a sorted array of $n$ keys, each of length $k$, arranged in lexicographic (or alphabetic) order for an arbitrary, possibly unbounded, ordered alphabet. They give sophisticated arguments for proving a tight bound in the worst case for this basic data organization, up to a constant factor, obtaining
\[
\Theta\left(\frac{k \log \log n}{\log \log (4 + \frac{k \log \log n}{\log n})}
+ k + \log n \right)
\]
character comparisons (or probes). Note that the bound is $\Theta(\log n)$ when $k=1$, which is the case that is well known in algorithmics.

We describe a novel \emph{permutation} of the $n$ keys that is different from the sorted order, and sorting is just the starting point for describing our preprocessing. When keys are stored according to this ``unsorted'' order in the array, the complexity of searching drops to
\[
\Theta\left( k + \log n \right)
\]
character comparisons (or probes) in the worst case, which is \emph{optimal} among all possible permutations of the $n$ keys in the array, up to a constant factor. Again, the bound is $\Theta(\log n)$ when $k=1$.

Jointly with the aforementioned result of Andersson et al., our finding provably shows that keeping $k$-dimensional keys sorted in an array is \emph{not} the best data organization for searching. This fact was not observable before by just considering $k=O(1)$ as sorting is an optimal organization in this case.

Ciao
-Roberto

Comments on The Geomblog: Apple, Wired and FOCS 2004.

Sorting is commonly meant as the task of arranging...