Tuesday, August 31, 2004

Maybe it's just a train wreck.

Success catastrophe: a term I first heard at AT&T, used to describe a project that goes so well that the higher-ups keep demanding more and more (when you work at a company, being on the radar screen of the powers-that-be is a mixed blessing).

Catastrophe of success: Frank Capra's biography.

Catastrophic success: Apparently, the war on Iraq.

Any more suggestions ?

Talk styles

So I just finished my talk, and it appears to have been well received. Not being a DB regular, I took a somewhat more general tone with my slides, and listening to the other talks in my session, I realized that there were some notions I could have assumed the audience knew.

It brings up a duality in talk styles: does one:

* go into lots of detail, explaining how specific things were done, or
* present a general overview, glossing over details ?

As with all dualities, the answer is "it depends". On the one hand, if you give a talk where you say at some point, "And now we shall solve subproblem B with this trick that I won't get into in this paper", the people who came to your talk precisely because they had been banging their heads over subproblem B to no avail will be rather disappointed.

On the other hand, a lively and spirited discussion among 3 people about the solution to subproblem B might be lost on the 100 or so other people who have no clue as to why it was such a big deal.

My personal preference is what I'll call 'the big tent'. Keep things at a fairly high level, assuming that someone interested in the problem can read the paper with the roadmap one provides.

But (and here is where good judgement and the art of presentation come in), make good choices about what details to present and what to leave out. Some details are illuminating; some are not. Knowing which is which differentiates a good speaker from a not-so-good one.

Monday, August 30, 2004

What do DB people do ?

For people who know how I carp about databases in general, this will be deliciously ironic:

I was attempting to get my laptop to talk to the wireless system in the business center of the VLDB hotel when the person managing the center came up to chat. It turns out that he is taking computer classes in night school, and he asked me (of all people) what it is that database researchers do. Interestingly, his first thought was that they do something with Oracle. I think I successfully defended DB folk against this scurrilous, baseless rumor...

As an aside, a friend of mine who used to teach databases to business school folks frequently complained that they all came into the class expecting to learn how to use Micro$oft Access.

Nosy gmail

Blogging will be light: am at VLDB in Toronto.

I was exchanging mail with a colleague about a problem in dynamic graph algos. I was doing this on my gmail acct for various reasons. I just noticed the ads that google placed next to the conversation:

1. Need an algorithm ?
2. Java graph layout library
3. Find shortest paths

Moreover, in the "related links" section, it pointed me to
(a) a paper on maintaining MSTs in graphs by Henzinger and King, and
(b) Information on Dijkstra's algorithm from NIST.

Spooky...

Thursday, August 26, 2004

Graphics

In response to a previous post, Chandra asks:
Turing awards are typically given for work
that induce a paradigm shift. Not being very
knowledgeable about graphics, I would like to
know whether there have been any recent paradigm
shifts in graphics.
Hmm. I am not an expert, but here are some things:
  • The switch from vector to raster graphics
  • The use of specialized hardware (the graphics card) for rendering operations
  • (and this is far too soon to tell) the advent of programmability in graphics cards.
All of these are fairly recent. It would be fair to say that much of the modern state of graphics reflects innovation that happened from the late 80s onwards, and considering that the most recent Turing Award was given for something invented in the late 70s....



As an aside, wouldn't it be nice to have RSS feeds for comments ? I don't have any illusions that people are falling over themselves to hear what I have to say, but the only easy ways for me to respond to comments so that the commenter will actually see the reply are to respond directly (not always possible) or to post a new entry.

Haloscan used to do this, but blogger doesn't... yet....

Tuesday, August 24, 2004

Referencing

I was chatting with people at SIGGRAPH and remarking on the fact that there is only one Turing Award winner for graphics (so far). The people there made the interesting argument that one of the reasons theoretical contributions appear to get Turing Awards more frequently (more on this below) is that theoretical papers tend to frame a problem in the context of related work far more meticulously.

Now this is something I have heard in other contexts as well (when seeing reviews of papers, reading papers, etc.). It is not too hard to see why this might be the case: theoretical problems are usually crisply defined, and it is almost always clear (even if not completely so) which results and techniques influence a given work.

As an argument in favor of sound referencing, I like this one. We are taught (implicitly or explicitly) that referencing is important to place work in context, to acknowledge others' work in the "society of research", and ultimately to recognize that one's own work is never done in isolation but builds upon that of others. The argument that proper referencing creates a trail back to a source of inspiration is more than 'citation counts count !': it emphasizes what's important about careful referencing, namely that it builds a structured edifice of knowledge whose value can be perceived objectively.

Back to Turing Awards:

According to my highly unscientific classification method, theory-like disciplines (Algorithms/Complexity/Logic/Programming Languages) took a large, but not overwhelmingly large share of Turing Awards:

(Contains theory-like substance): 18 (11 Algorithms/Complexity + 6 PL + 1 Logic)
(We build stuff): 10 (3 DB + 7 Compilers/Architecture/OS)
(We are stuff): 5 (4 AI + 1 Robotics)
(We count stuff): 2 (Numerical Analysis)

What category does Dijkstra fall into ? I left him out of this...

Saturday, August 21, 2004

War and Graphics

A few days ago, on the SIGGRAPH blog, I remarked on the number of simulations directed towards military applications. Now this kind of thing has been going on for a while within the graphics community, and I am sure none of them are surprised by this.

However, in what can only be viewed as a sign of the times (no pun intended), both the New York Times Magazine and the September issue of Wired carry articles describing the level of sophistication military simulations have reached in training soldiers for combat situations. Both articles focus on a particular software package called Full Spectrum Warrior as one of the primary candidate tools for training soldiers (FSW is available as a commercial tool; the military version has more features and more accurate military plans than the commercial version).

The focus (contrary to the popular impression conveyed by games like Doom and Quake) is not on first-person orgies of murder and mayhem; many of these simulations, and indeed many games nowadays, focus more on storytelling and scenarios/strategies. Some military training games don't even allow the player to fire a weapon, requiring instead that they handle situations without the use of force.

The NYT article gets bonus points for managing not to mention Ender's Game. The Wired article stays strong for a while, and then flubs the ending, not only mentioning the book, but mangling the title.


p.s. The game industry is paying its PR staff well... here's how games are taking over Hollywood

p.p.s. More coverage of this on /. (follow the links in the blurb)

Friday, August 20, 2004

Open Problems (redux)

One of my first posts was about finding interesting problems to work on. Adam Klivans, guest-posting over at the complexity blog, mentions that COLT has an extremely civilized approach for dealing with this:
For the last few years, COLT has glorified the open problems section and allocates about an hour of time for a presentation of open problems. The open problems themselves must be submitted months beforehand and are refereed (how rigorously is anyone's guess); accepted problems appear in the proceedings. A list of this year's open problems can be found on the COLT 2004 program schedule -- the session was held on Friday evening.
Thus I am happy to note that this year's fall workshop on computational geometry will have a similar focus on open problems. From the call for abstracts:
To promote a free exchange of questions and research challenges, there will be a special focus on Open Problems, with a presentation on The Open Problems Project, as well as an Open Problem Session to present new open problems. Submissions are strongly encouraged to include stand-alone open problems, which will be collected into a separate webpage and considered for inclusion in The Open Problems Project.
This workshop is the 14th in a series of CG workshops that are always a pleasure to attend: the focus is on interactions and discussions, rather than presentations, and the atmosphere is always very relaxed. So mark your calendars for Nov 19-20 in Boston.

Thursday, August 19, 2004

Timing a talk

A nifty link I found via the Dynamist Blog:

It's an online stopwatch. Handy if you're like me and never carry a watch, and need to time yourself when preparing a talk.

A survey on inapproximability

Not to infringe on the Complexity blog :), but I found this survey by Luca Trevisan at the ECCC to be a very well-written bird's-eye view of the developments in inapproximability, from slightly before the main PCP results till now (in fact, some of the results mentioned are to appear at FOCS 2004).

It covers the early attempts at showing inapproximability via gap-introducing reductions, spends a little time (although not a lot) on the growing body of work relating proof systems to inapproximability, and then spends a lot of time on developments post-PCP.

What I like about it, as a non-expert in complexity, is that it explains some of the basic PCP-based methods for proving hardness very lucidly, using examples like clique and set cover. It also helps one understand the feverish developments in the field leading to the slew of optimal inapproximability results in the late 90s.
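To make the clique example concrete, here is a sketch (written from memory, so treat the details as approximate rather than as the survey's exact presentation) of the FGLSS reduction that drives it:

```latex
% FGLSS (sketch): start from a PCP(O(log n), O(1)) verifier V for SAT
% with completeness 1 and soundness 1/2, and build a graph G_x:
\[
\begin{aligned}
\text{vertices: } & (r,a) \text{ such that } V \text{ accepts } x \text{ on randomness } r \text{ with answers } a,\\
\text{edges: }    & (r,a) \sim (r',a') \iff r \neq r' \text{ and } a, a' \text{ are consistent},\\
\text{so that: }  & x \in \mathrm{SAT} \implies \omega(G_x) = R, \qquad
                    x \notin \mathrm{SAT} \implies \omega(G_x) \le R/2,
\end{aligned}
\]
% where R is the number of random strings. Approximating the clique
% number within any factor better than 2 is therefore NP-hard, and
% amplification improves the gap from there.
```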

There has been a recent flurry of breakthrough complexity results, some of which are to appear at FOCS. The survey addresses some of these new results as well, and what they could imply for the knottier problems like vertex cover etc.

Probably one of the most interesting developments over the past few years is a set of hardness results that destroy the five-class inapproximability hierarchy that people had thought mapped out the space of approximations. New results showing tight log* n hardness, log log n hardness, and even log^2 n hardness indicate that the approximation classes might be a lot finer than people had imagined.

Tuesday, August 17, 2004

What is it ?

I was wandering the corridors of AT&T one day and picked up a gizmo that looked like this:



Rather puzzled by it, I took it to lunch, where I was peremptorily ridiculed for basically being too young to know what it was. As it turns out, this is the "type ball" of the IBM Selectric, an ingenious gadget that allowed one to change typefaces on the fly (the ball had all the characters of a particular typeface on it, and would rotate as keys were pressed on the typewriter).

This story would be highly unremarkable if not for the fact that William Gibson, author of Neuromancer and cyberpunk pioneer, wrote a little rhapsody to the Selectric type ball.

[Dead Tech backgrounder, for extra points: In the decade or so prior to the advent of personal computers, IBM produced an electric typewriter called the Selectric; this had a “type-ball”, a metal sphere about the size of a golf ball, which held an entire font; you could switch fonts, which at the time was little short of miraculous; you could also, even more amazingly, power-correct mistakes with a built-in paper-colored ribbon. The IBM Selectric, when I started writing for publication, was the most shit-hot professional writing machine on the planet; by the time I could have afforded one, they were propping up broken barbecue grills in Value Village. The Finn’s shop probably has at least one box of Selectric type-balls, somewhere; they are beautiful sculptural objects, these balls, and won’t be easily thrown away.]

Monday, August 16, 2004

PCPs for geometric problems

Many geometric problems (especially those involving rectangles in the plane) are MAX SNP-hard; some are even log n hard via set cover reductions, and there is at least one problem (matching of point sets) that is polynomially hard.

However, as far as I am aware, most of the reductions are via L-reductions from other hard-to-approximate problems. In fact, the only "from first principles" hardness results for geometric problems (i.e. directly from a PCP) that I am aware of are

1. The result by Brieden showing that width is hard to approximate to any constant
2. The extension by Varadarajan, Venkatesh and Zhang that shows it is hard to approximate to within log^d n, for some d > 0.

In both cases, the "source" problem is Quadratic Programming. One might quibble that even in this case, the reduction is really an L-reduction from QP, but in both results some tinkering with the PCP is necessary to get the desired bound.
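For reference, here is the standard definition of an L-reduction (due to Papadimitriou and Yannakakis), written out from memory:

```latex
% An L-reduction from optimization problem A to problem B consists of
% polynomial-time maps f (on instances) and g (on solutions) and
% constants alpha, beta > 0 such that, for every instance x of A and
% every feasible solution y of the instance f(x):
\[
\begin{aligned}
&\mathrm{OPT}_B(f(x)) \le \alpha \cdot \mathrm{OPT}_A(x),\\
&\bigl|\, c_A(g(y)) - \mathrm{OPT}_A(x) \,\bigr|
  \le \beta \cdot \bigl|\, c_B(y) - \mathrm{OPT}_B(f(x)) \,\bigr|.
\end{aligned}
\]
% The point: a (1 + eps)-approximation for B then yields a
% (1 + alpha * beta * eps)-approximation for A, so constant-factor
% hardness propagates along the reduction.
```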

Are there other problems that have "from first principles" hardness results ?

Sunday, August 15, 2004

A trail of thought...

The knowledge contained in a research paper represents only a small fraction of the trips the author(s) took through the landscape of the subject. There are countless counterexamples, lemmas that didn't work, statements that were trivial, and ideas that fell by the wayside, none of which make it into a written document, and only occasionally see the light of day (in a casual talk, or a one-on-one conversation).

But these lost trails are useful as well: they are landmarks of places not to go, of things unexplored, of directions taken and backed away from. It is a pity that one cannot record these as part of the research log, in some form.

Vannevar Bush talked about these things nearly 60 years ago, in a prescient essay titled 'As We May Think'.
This is the essential feature of the memex. The process of tying two items together is the important thing.

When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces, and a pointer is set to indicate one of these on each item. The user taps a single key, and the items are permanently joined. In each code space appears the code word. Out of view, but also in the code space, is inserted a set of dots for photocell viewing; and on each item these dots by their positions designate the index number of the other item.

Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space. Moreover, when numerous items have been thus joined together to form a trail, they can be reviewed in turn, rapidly or slowly, by deflecting a lever like that used for turning the pages of a book. It is exactly as though the physical items had been gathered together to form a new book. It is more than this, for any item can be joined into numerous trails.

The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. Specifically he is studying why the short Turkish bow was apparently superior to the English long bow in the skirmishes of the Crusades. He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item. When it becomes evident that the elastic properties of available materials had a great deal to do with the bow, he branches off on a side trail which takes him through textbooks on elasticity and tables of physical constants. He inserts a page of longhand analysis of his own. Thus he builds a trail of his interest through the maze of materials available to him.

And his trails do not fade. Several years later, his talk with a friend turns to the queer ways in which a people resist innovations, even of vital interest. He has an example, in the fact that the outranged Europeans still failed to adopt the Turkish bow. In fact he has a trail on it. A touch brings up the code book. Tapping a few keys projects the head of the trail. A lever runs through it at will, stopping at interesting items, going off on side excursions. It is an interesting trail, pertinent to the discussion. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.

There are many more gems in the essay: worth a read.



My link trail: Boing Boing -> Battelle On Search.

Thursday, August 12, 2004

Perl Swearing

In this article, Paul Graham comes up with a really funny description of Perl:

"When I talk about ugly source code, people will of course think of Perl. But the superficial ugliness of Perl is not the sort I mean. Real ugliness is not harsh-looking syntax, but having to build programs out of the wrong concepts. Perl may look like a cartoon character swearing..."


The article is about Python programmers: it makes for interesting reading.



p.s. SIGGRAPH is finally over. It was overwhelming, fascinating, boring, and LONG, all at the same time. My conference-ending timer kicked in around Monday, and it has been a long haul since.

Tuesday, August 10, 2004

Real vs "Real"

From the SIGGRAPH blog:
An undercurrent that occasionally bubbles up in talks/comments is the feeling that graphics is "not real", that it is only "smoke and mirrors". Bruce Sterling (when he wasn't busy being pleased with himself) hammered this point home as well during the beginning of his keynote address. Given that from where I stand in theory-land, graphics is about as "real" as it gets, this was interesting and not a little surprising.
Read more...

I've been spotted !

It appears that my SIGGRAPH entries over the past few days have not gone unnoticed. I was invited to post entries to the official SIGGRAPH blog. I will be crossposting my most recent entries there, and will be posting new notes there.

I'll also provide a brief summary (and links) here in case people don't want to switch over.

Monday, August 09, 2004

Demo or Die...

Another interesting concept at SIGGRAPH is called 'Demo or Die'. The premise is as follows:

* You have to conduct a live demo of whatever gizmo you've cooked up.
* You get three minutes to present it.
* After your demo, the audience gets to vote.

Now the voting itself is a neat idea. The organizers distributed laser pointers (yes, over 1,000 of them) to everyone in the audience. There was a large screen up near the speaker podium, and after each demo, two large boxes labeled Demo and Die were projected on the screen. You pointed your laser pointer at the screen (these are very powerful pointers), and with some nifty software/hardware, the number of dots on each side was tallied to decide whether the presentation survived or not.
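I have no idea how the actual system worked, but purely as a guess, the tally could be as simple as thresholding a camera frame and counting bright blobs on each half of the screen. A minimal sketch (everything here, from the threshold value to the left/right split, is my assumption):

```python
import numpy as np
from scipy.ndimage import label

def tally_votes(frame, threshold=240):
    """Count laser dots on the 'Demo' (left) and 'Die' (right) halves of a
    grayscale camera frame. Pure guesswork about how the GP2 tally might
    have worked, not a description of the real system."""
    bright = frame > threshold        # saturated pixels = candidate laser dots
    mid = frame.shape[1] // 2
    _, demo = label(bright[:, :mid])  # number of connected blobs on the left
    _, die = label(bright[:, mid:])   # number of connected blobs on the right
    return demo, die

# hypothetical usage:
# demo, die = tally_votes(np.asarray(img, dtype=np.uint8))
# verdict = "Demo" if demo >= die else "Die"
```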

Surprisingly enough, the Law of Demos only kicked in once; most demos ran seamlessly. Which is not to say that they were all good. Some were quite nifty, involving haptic devices.

An amusing moment: A demo participant from Microsoft showed a haptic device for playing Jenga online. I had dinner with him the night before, and he was describing the demo to me. I couldn't help but remember Uri Zwick's talk on Jenga from SODA 2002. It seemed fitting that we prove theorems about Jenga, while graphics people build demos about Jenga.

Sunday, August 08, 2004

Stranger in a strange land...

So after the somewhat more civilized GP2 workshop, I am now at SIGGRAPH itself. It's being held in the LA convention center, and if any of you have ever been to a home improvement or other such convention, you will get an idea of the scale of this beast.

The main order of business is the Fast Forward Paper Review: the idea is that the authors of each of the 83 accepted SIGGRAPH papers get 50 seconds to present a capsule preview of their work. It's an interesting concept, and with over 2,000 people (out of the 20,000 at the conference) attending the review, it is probably essential.

The review itself is held in a dark and cavernous curtained-off section of the convention center. Joe Marks, the PC chair, just announced the start of the review, and from the sound of the applause, you'd think you were at a rock concert.

It's quite the setting: there are three huge screens for the presentation and another huge screen displaying the presenter. Not all the presenters are good at this format, though: some people run through reduced versions of their actual presentations. PowerPoint never looked so out of place...

Some highlights:
1. Rendered gems that look real !!
2. Graphical origami: how to make a paper model from a mesh. You CAN touch this !
3. Triangles are toast; our very own Nina Amenta renders with point clouds.
4. A Mission-Impossible-themed movie... on tensors.


Random thoughts:
* You can tell that this is Hollywood-driven: many presentations go "suppose you have this picture, and your director wants it modified to that picture"
* It does put paper-writing in perspective: how many paper ideas would survive if you knew you also needed a funky 50-second advertisement for them ? It reminds me of the maxim (Feynman's ?): if you can't explain what you are doing in one sentence to a child, you are probably not doing anything interesting.

Saturday, August 07, 2004

Heard in passing...

Things heard at the GP2 Workshop:

Fred Brooks:
In the days of IBM 7950 (1961), you would program one 250 byte instruction, go for lunch, program one more, and then leave for the day.

Keith Cooper:
In 1980, a compiler could generate code that ran at 85% of peak CPU power (on a VAX). The figure now is 5-15%.


Friday, August 06, 2004

Hi ho, hi ho,...

it's off to SIGGRAPH I go...

A rite of passage that many fellow geometers have gone through: I will be hanging out with the only group of people who can hope to get both Turing awards and Academy awards (take that, C. P. Snow).

<shameless-shill-for-my-research>
I go to do God's (er, Turing's) work: presenting some results on theoretical analyses of GPU algorithms at the GP2 workshop. If I can make it through the workshop without getting into any arguments about the value of theory, I will consider it a success.
</shameless-shill-for-my-research>

SIGGRAPH gets over 2,000 attendees; considering how we do cartwheels at SODA when we get 300 folks, I expect significant culture shock.

Update: Did I say 2,000 ? I mean 20,000. Sigh....

Wednesday, August 04, 2004

Scott Aaronson is a very patient man...

The latest P=NP saga on comp.theory involved soap bubbles: the first post in the thread:
The paper is the best argument I have heard for P=NP, even though I believe the opposite. It can be found here: http://arxiv.org/abs/cs.CC/0406056. It brings out a great question.

Basically, the argument is that since soap bubbles can be made to solve NP-complete problems, particularly the Steiner tree graph problem, in what appears to be polynomial time and physics on a macroscopic level can be modeled as a Turing machine, it must be true that P=NP.

What I would like to know from any physicists out there is why do soap bubbles work in such a way that they are able to solve the Steiner tree graph problem? How is nature able to quickly solve problems that we cannot solve quickly?
Scott Aaronson, in a post downstream:
Motivated by this newsgroup discussion, this week I did the experiment. At a hardware store I bought two 8"x9" glass plates; paint to mark grid points on the plates; thin copper rods which I cut into 1" pieces; suction cups to attach the rods to the plates; Murphy liquid oil soap; and a plastic tub to hold the soapy water. I also bought work gloves, since at one point I cut my hand handling the glass.
The post continues: read it all. He even has a picture.
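For context (my own gloss, not something from the thread): the problem the bubbles are supposed to solve is the Euclidean Steiner tree problem, and the standard objection is that a soap film only settles into a local minimum of the length functional:

```latex
% Euclidean Steiner tree (NP-hard): given terminals p_1, ..., p_n in the
% plane, find a connected network T spanning them (extra "Steiner points"
% are allowed) of minimum total edge length:
\[
\min \Bigl\{ \, \sum_{e \in T} |e| \;:\; T \text{ connected},\ \{p_1, \dots, p_n\} \subseteq V(T) \Bigr\}.
\]
% A soap film spanning pegs between two glass plates relaxes to a *local*
% minimum of this quantity, which is exactly the gap between "the bubbles
% settle" and "the bubbles find the optimum".
```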

Monday, August 02, 2004

Proofs and Reputations

Karl Sabbagh, the author of a recent book on the Riemann Hypothesis, has written an article in the London Review of Books about Louis de Branges and his recent announcement of a proof of the RH.

There is no evidence that, so far, any mathematician has read [de Branges' proof]: de Branges and his proof appear to have been ostracised by the profession. I have talked to a number of mathematicians about him and his work over the last few years and it seems that the profession has come to the view that nothing he does in this area will ever bear fruit and therefore his work can be safely ignored. It may be that a possible solution of one of the most important problems in mathematics is never investigated because no one likes the solution's author.

My post has nothing to say about the proof itself; I would not dare to presume even a passing familiarity with it. What caught my attention was the sense of surprise in Sabbagh's article; the unstated 'what on earth does a man's reputation have to do with his proof' ?

What indeed ?

Mathematics (and by extension theoretical CS) inhabits a Platonic world of true and false statements (with a dark brooding Gödel lurking in the background, but that's a different story). As such, statements are either true, in which case they become theorems/lemmas/what-have-you, or they are not, and fall by the wayside. The pursuit of (mathematical) truth is thus the search for these true statements. The identity (or very existence) of the searcher has no effect on the truth of the statements; there is no observational uncertainty principle.

However, mathematicians live in the real world. In this world, true and false get a bit murkier. A theorem is no longer true or false, but almost certainly true, or definitely false. Theorems here are far closer to the falsifiable theories of natural science, although there is at least a "there" there: a scientific theory can only have overwhelming evidence in support of it, but a mathematical statement (if not too complex) can be categorically true.

The natural sciences have reproducible experiments; if I cannot reproduce the results you claim, all else being equal, the burden of proof is on you to demonstrate that your results are indeed correct. Similarly in mathematics, if a claimed theorem has a complex proof, the burden of proof does reside on the author to demonstrate that it is indeed correct. They can do this by simplifying steps, supplementing with more intuition, or whatever...

In this respect, theorem proving in the real world has a somewhat social flavor to it. And thus, there is also (it seems to me) a social compact: you demonstrate competence and capability above a certain basic threshold, and I deem your proofs worthy of study. The threshold has to be low, to prevent arbitrary exclusion of reasonable provers, but it cannot be zero, because in the real world it is hard to check a proof with absolute certainty.

This is why the many proofs that P=NP (or P != NP) that float on comp.theory don't get a fair shake: it is not because the "experts" are "elitists" who don't appreciate "intruders" poaching their beloved problems; it is because the social compact has not been met; the writers don't cross the threshold for basic reasonableness, either by choosing to disregard the many approaches to P vs NP that have been ruled out, or by refusing to accept comments from peer review as plausible criticism, and demanding that the burden of proof be shifted from them to the critical reviewer.

Such a compact could be abused mightily to create a clique; this is why the threshold must be set low, and in mathematics it is. The notorious cliché that a mathematician's best work is done when young at least reinforces the idea that this club can be entered by anyone. More mundanely, there are awards for best student papers at STOC/FOCS that often go to first-year grad students (like this year's award).

Going back to de Branges' proof: I have no idea what the technical issues with it are, or whether there are known reasons why it doesn't work, but going solely on the basis of Karl Sabbagh's article (and I acknowledge that he could be biased), it seems wrong to ignore his manuscripts. He, for one, has clearly crossed the threshold of the social compact. This reminds me of an attempt I made to read a popular exposition of Mulmuley and Sohoni's recent work on P vs NP; if this work does lead to a claimed proof, I imagine that there would be few people who could comprehend it, but it would deserve to be read.

Sunday, August 01, 2004

On empirical research and the Energizer Bunny

From Lee Smolin's article on the anthropic principle:

...to be part of science, X-theorists have to do more than convince other X-theorists that X-theory is true. They have to convince all the other well trained scientists who up till now have been skeptical. If they don’t aspire to do this, by rational arguments from the evidence, then by Popper’s definition, they are not doing science.


From Numerical Recipes in C (Ch. 14, pg 609):

At best, you can substantiate a hypothesis by ruling out, statistically, a whole long list of competing hypotheses, every one that has ever been proposed. After a while your adversaries and competitors will give up trying to think of alternative hypotheses, or else they will grow old and die, and then your hypothesis will become accepted. Sounds crazy, we know, but that's how science works.


So that's where the creationists get their methods from !

From an interview with Lawrence Krauss in Scientific American:

But then you realize that this is exactly what Phillip Johnson, this lawyer who first proposed the intelligent-design strategy, proposed when he said something like, "We'll just keep going and going and going till we outlast the evolutionists."

Sorting vs Searching:

In a previous post, I had mentioned an upcoming paper in FOCS 2004 by Franceschini and Grossi titled 'No Sorting? Better Searching!'.

The paper is not yet online, but Roberto Grossi posted a long comment in that post detailing the results in the paper. I reproduce the comment below in full; a short summary is:

They show that the natural way to search a set of ordered elements (i.e. sorting, followed by binary search) makes sense when the cost of a comparison is O(1), but not when elements are larger (formally, when each element of the list consists of k characters, where k is super-constant). They do this by demonstrating a new ordering technique that beats the known lower bounds for searching a sorted list; what's nice is that their result is tight as well.

His abstract follows (bold-face is mine):

Sorting is commonly meant as the task of arranging keys in increasing or decreasing order (or small variations of this order). Given n keys underlying a total order, the best organization in an array is maintaining them in sorted order. Searching requires Θ(log n) comparisons in the worst case, which is optimal. We demonstrate that this basic fact in data structures does not hold for the general case of multi-dimensional keys, whose comparison cost is proportional to their length. In previous work [STOC94, STOC95, SICOMP00], Andersson, Hagerup, Håstad and Petersson study the complexity of searching a sorted array of n keys, each of length k, arranged in lexicographic (or alphabetic) order for an arbitrary, possibly unbounded, ordered alphabet. They give sophisticated arguments for proving a tight bound in the worst case for this basic data organization, up to a constant factor, obtaining

Θ( (k log log n) / (log log(4 + (k log log n)/(log n))) + k + log n )

character comparisons (or probes). Note that the bound is Θ(log n) when k=1, which is the case that is well known in algorithmics.

We describe a novel permutation of the n keys that is different from the sorted order, and sorting is just the starting point for describing our preprocessing. When keys are stored according to this ``unsorted'' order in the array, the complexity of searching drops to Θ( k + log n) character comparisons (or probes) in the worst case, which is optimal among all possible permutations of the n keys in the array, up to a constant factor. Again, the bound is Θ(log n) when k=1. Jointly with the aforementioned result of Andersson et al., our finding provably shows that keeping k-dimensional keys sorted in an array is not the best data organization for searching. This fact was not observable before by just considering k=O(1) as sorting is an optimal organization in this case.
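As a quick sanity check on the size of the gap (my own back-of-the-envelope arithmetic, not a claim from the paper), set k = log n in the Andersson et al. bound:

```latex
\[
\left.
\Theta\!\left(
  \frac{k \log\log n}{\log\log\bigl(4 + \tfrac{k \log\log n}{\log n}\bigr)}
  + k + \log n
\right)
\right|_{\,k = \log n}
= \Theta\!\left( \frac{\log n \, \log\log n}{\log\log\log n} \right),
\]
```

which is asymptotically worse than the Θ(k + log n) = Θ(log n) that the new "unsorted" layout achieves.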


When the paper is available I will link to it here; one interesting question is: how hard is this "other" order to compute ?
