tag:blogger.com,1999:blog-6555947.post108879338392440133..comments2014-01-12T10:46:48.153-07:00Comments on The Geomblog: Apple, Wired and FOCS 2004.Suresh Venkatasubramaniannoreply@blogger.comBlogger1125tag:blogger.com,1999:blog-6555947.post-1091287697385770752004-07-31T09:28:00.000-06:002004-07-31T09:28:00.000-06:00Sorting is commonly meant as the task of arranging...Sorting is commonly meant as the task of arranging keys in increasing or decreasing order (or small variations of this order). Given $n$ keys underlying a total order, the best organization in an array is maintaining them in sorted order. Searching requires $\Theta(\log n)$ comparisons in the worst case, which is optimal.<br /> <br />We demonstrate that this basic fact in data structures does not hold for the general case of multi-dimensional keys, whose comparison cost is proportional to their length. In previous work [STOC94,STOC95,SICOMP01], Andersson, Hagerup, H{\aa}stad and Petersson study the complexity of searching a sorted array of $n$ keys, each of length $k$, arranged in lexicographic (or alphabetic) order for an arbitrary, possibly unbounded, ordered alphabet. They give sophisticated arguments for proving a tight bound in the worst case for this basic data organization, up to a constant factor, obtaining<br />\[ <br />\Theta\left(\frac{k \log \log n}{\log \log (4 + \frac{k \log \log n}{\log n})} <br /> + k + \log n \right) <br />\] <br />character comparisons (or probes). Note that the bound is $\Theta(\log n)$ when $k=1$, which is the case that is well known in algorithmics.<br /><br />We describe a novel \emph{permutation} of the $n$ keys that is different from the sorted order, and sorting is just the starting point for describing our preprocessing. When keys are stored according to this ``unsorted'' order in the array, the complexity of searching drops to<br />\[<br />\Theta\left( k + \log n \right)<br />\]<br />character comparisons (or probes) in the worst case, which is \emph{optimal} among all possible permutations of the $n$ keys in the array, up to a constant factor. Again, the bound is $\Theta(\log n)$ when $k=1$.<br /><br />Jointly with the aforementioned result of Andersson et al., our finding provably shows that keeping $k$-dimensional keys sorted in an array is \emph{not} the best data organization for searching. This fact was not observable before by just considering $k=O(1)$ as sorting is an optimal organization in this case.<br /><br />Ciao<br />-RobertoAnonymousnoreply@blogger.com