Monday, January 24, 2011

ALENEX/ANALCO II

Today, someone asked me to post something sensational just to stir up some controversy. It turns out that without realizing it, I already did it yesterday ! I was talking about the use of CPLEX to solve (very effectively) instances of 1-median over strings, and said this:
It's not the "sexiest" thing in the world to solve algorithms problems in practice by using large industrial strength packages. However, both CPLEX and  SAT solvers are examples of tools that can be used in practice to solve fairly intractable problems. It still takes a lot of engineering and skill to make the heuristics work well, but it's something that we should be using as a matter of course when designing heuristics before trying to invent an algorithm from scratch.
I should have known better than to bring down the fury of the entire field of OR on my head. Michael Trick, OR blogger extraordinaire, decided to round my variables for me: read what he had to say here.  As penance, I promise to download CPLEX and encode at least one problem on it in the next year :).

I've seen Bob Sedgewick give talks a few times now, and I'm always inspired by them. This latest one was titled 'Algorithms for the masses' and was hard to summarize: it was part exhortation to do more analytic combinatorics, part discussion of a new intro CS course he and Kevin Wayne have designed, and part emphasis on using the scientific method properly to design good models for algorithm behavior and data characeteristics.

The principle at the heart of this was a fitting one for this joint talk: we should do more scientific analysis of our algorithms to figure out exactly how our algorithms behave in practice, rather than relying on O() notation as a predictive and comparative tool (both of which it isn't). This goes back to Dick Lipton's coinage of 'galactic' algorithms: Bob made the assertion (not wrong in my view) that most algorithms at STOC and FOCS are 'galactic' and much of the work at SODA is too.

While I agree that it's high time we stopped using O() notation as a cudgel, I think it's harder than one might think. Engineers can model the real world in various ways, and when they want to test their models, they can - well - run it on the real world. Even if come up with a plausible model of how my algorithm works, and what the various cost functions are, I still need to hope that the data doesn't have weird characteristics that make all the results go wonky. Probably the way to see this is that even in "the real world", if we dont know how a particular genetic mechanism works, it's as good (or bad) as not having an accurate model of data that we're testing.

The second invited talk, by James Demmel, was a little harder for me to follow, because it was a much more technical talk about the challenges of designing linear algebra routines for future architectures. He described a machine the DoE is proposing to build, and it's likely to have 1 billion cores ! But even with that many cores, the main bottleneck is going to be communication, and the goal going forward is to design algorithms that parallelize well with minimal communication.

Or as he ended his talk:
Don't communic...