Sunday, January 07, 2007

SODA 2007: Day 1

I didn't always know I wanted to do algorithms; in fact, I came to Stanford thinking I was more interested in AI. One of my most embarrassing moments in graduate school was when I went to talk to my advisor-to-be. He told me he worked in the design and analysis of algorithms, and I asked him if he did design or analysis.

(Note to graduate students everywhere; when someone tells you that no question is a stupid question, don't believe it)

But after attending Philippe Flajolet's talk today on "Analytic Combinatorics", and after hearing Luc Devroye's talk yesterday, I'm not so sure that my question was off the mark.

A bit of background. What we refer to as "the analysis of algorithms" is usually associated with Don Knuth and the Art of Computer Programming. It referred to the initial methods being developed to analyze structures like hash tables, search trees and the like. Most computer scientists have taken some kind of discrete math class, and have seen the Knuth-Graham-Patashnik "Concrete Mathematics" textbook, and much of the content (basic combinatorics, recurrence relations, generating functions, etc) was used for early algorithm analysis.

These methods were quite precise. It's not uncommon to look back at papers from the 70s and see detailed constants in front of running times for algorithms; Bob Sedgewick mentioned one such calculation in his introduction to Flajolet's talk. People didn't use O() notation like a sledgehammer, the way we do today.

Over time, we became more comfortable with O() notation; algorithms became more sophisticated, and it became harder to figure out actual constants. It didn't seem to matter as much. After all, when you were coming up with elaborately complicated algorithms that ran in exotic times like O(n^(11/7)), it hardly seemed to matter what the constant was. This was, and has continued to be, the Age of Design.

But all along, with people like Flajolet, Knuth, Sedgewick, and many many others, the work of really analyzing algorithms continued on. ANALCO is an offshoot of this long effort; a way to attract people working on some of the considerably hard problems in this area, while creating some cross-fertilization with the design crowd at SODA. Of course, by no means do I claim that algorithm designers don't analyze; of course we do. But it's fair to say that the sophisticated analysis methods and sophisticated design methods do appear to have diverged.

Which brings us to the talk today. The rough theme for his talk was an overview of how the combinatorial problem of counting structures (trees, steps in a linear probe, etc) can be transformed into a generating function, to which then the methods of real, and more recently, complex analysis can be applied. There's some pretty heavy mathematical machinery being thrown out here: we saw large deviation theory in yesterday's talk, for example, and there are things Flajolet talked about that I have only the barest inkling of.

Doing such analysis is hard; it's not as if we're suddenly going to abandon O() notation. But, as Piotr Indyk pointed out when we discussed this later, computers aren't getting any faster, and data is getting larger and larger, and it's more and more true that the actual constants in front of a running time matter, sometimes even more than the asymptotic bound. If more sophisticated analysis methods allow us to reveal algorithm running times more transparently, this also helps repair some of the "bad press" theoreticians can get with more applied folk.

So the analysis of algorithms takes on its original meaning again; there is a conference as well, now in its 13th year, and there's an upcoming book by Flajolet and Sedgewick that covers much of the mathematics Flajolet refers to in his talk. I looked at it briefly (it's 753 pages and counting!), and I hope that when it does come out, we learn more about how to use methods from analytic combinatorics to improve analysis techniques for even our run-of-the-mill algorithms.


  • I've quite enjoyed the talks I've attended thus far. I haven't written much about them, but that's mostly due to laziness on my part. I've been quite torn having to navigate the multiple parallel sessions; human cloning, where art thou ?
  • [From a random research paper session] It's funny to see a speaker struggling with their desire to display EVERY SINGLE SLIDE that they made, when faced with a remorseless clock ticking down to zero. Word of advice: no one really cares if you go through all your slides, or even flip thru them frantically while muttering very fast. They do care if you go over time and drag things ON and ON and ON.
  • Contrary to the general confusion being spread around, the wireless DOES work and it IS free.
  • I don't like hotels with two towers; especially when I'm in one and the conference is in the other, and ESPECIALLY when the only connection between the two towers is at the lobby.

Disqus for The Geomblog