Tuesday, April 29, 2008

Should we care about geometric complexity theory?

There's been an interesting debate over on the complexity blog with Lance's first diss of Ketan Mulmuley's program, and Ketan's response. I think Lance essentially trolled himself with the line,
Also while knowing the resolution of P v. NP is very important, knowing the details of the proof, especially if it requires deep and complex mathematics, is not nearly as important. I was excited to see Fermat's Last Theorem resolved in my lifetime but I have no desire to actually understand the proof.
Although I have a hard time imagining how someone could say this and mean it (after all, Fermat's theorem doesn't come close to P vs NP in impact, breadth, or importance), and even more so how someone steeped in complexity theory could say it, I think the larger point he makes is essentially reasonable. Namely (and I paraphrase):
if you want to lead people to the top of the mountain, you have to show them the way, or give them some hope that they can get there.
What would make the whole GCT program more plausible (and I think people would like to believe in it) is some set of intermediate results: maybe some weaker lower bounds, proved using more primitive techniques, that suggest more powerful statements are lurking underneath. I don't think more surveys will help, frankly. I've read the Regan survey, and even the initial parts of GCTflip, and it's really heavy going. In fact, I was talking with a professor who spent an entire semester, together with students trained in complexity theory AND representation theory, trying to work through some of the early GCT papers. It was definitely more than a month's worth of effort.

Of course, there is one example of "a weaker lower bound using more primitive techniques": KM's result separating P and (NC without bit-wise operations). This paper appeared in STOC 1994, and was subsequently published in SICOMP in 1999. I've always wanted to read this paper, for one thing because of its intriguing way of mapping computations to algebraic sets (if that isn't cool geometry, I don't know what is).

So since summer is here, and semester is out, I've decided to forcibly inject some content into this blog. I'm going to read the P vs NC paper and blog about it as I go along. I don't know how many installments it will take, and there might be digressions along the way, but at the end I hope to understand this paper, and with it maybe even get a glimmer of the general direction KM is pushing in the GCT work.

For anyone interested in following along, I'll be mainly using the SICOMP paper as a reference, and will tag all posts with the tag 'p-vs-nc'. As far as possible I'll try to do one post a week (more if time permits). Let's see how this goes.

Saturday, April 26, 2008

LaTeX: hyperref/algorithm interaction

According to the hyperref documentation, the algorithm package should be loaded after hyperref:
\usepackage[chapter]{algorithm}% eg.
According to the hyperref documentation, all packages should be loaded before hyperref:
In most cases, therefore, you should load your package before you load hyperref, and hyperref will patch things up so that they work, so you can utilise your (patched) package after loading both:
If you do the first, you get this annoying set of warnings:
! pdfTeX warning (ext4): destination with the same identifier (name{page.1}) has been already used, duplicate ignored
If you do the second, you get an error:
undefined control sequence \theHalgorithm
Clearly, the first is preferable to the second, but even the first is terribly annoying. Does anyone have any ideas on how this can be fixed?

Update: It works! A comment from Iain Murray points out that in the hyperref README, one is told to include the float package BEFORE hyperref, and only then include algorithm (after hyperref).
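In other words, the working preamble order looks something like this (a minimal sketch; the document class and the algorithm body are just placeholders for illustration):

```latex
\documentclass{article}

\usepackage{float}      % 1. float must come BEFORE hyperref
\usepackage{hyperref}   % 2. hyperref patches float as it loads
\usepackage{algorithm}  % 3. algorithm (which builds on float) comes AFTER

\begin{document}
\begin{algorithm}
  \caption{A placeholder algorithm}
\end{algorithm}
\end{document}
```

With this order, \theHalgorithm is defined and the duplicate-destination warnings go away.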

Red-black trees: An Update

In a previous post, I mentioned a talk by Bob Sedgewick on left-leaning red-black trees, a variant that's easier to analyze, but with the same asymptotic properties. He gave another talk on LLRB trees at the Analysis of Algorithms workshop, and the new slides have (I quote from his email):
somewhat simpler presentation and code than the Dagstuhl version, the discovery that 2-3 trees are also covered, and some fun facts about the analysis that should answer many of the questions that people were asking.
There were many comments on my original post, and possibly these new slides answer some of the questions.
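For those who'd rather read code than slides, here's a minimal sketch of LLRB insertion (my own illustrative Python, not Sedgewick's code): the entire rebalancing logic is the three if-statements applied on the way back up from the recursion.

```python
# Minimal left-leaning red-black tree insertion, sketched after the standard
# LLRB presentation. Illustrative code only.

RED, BLACK = True, False

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.color = RED  # new nodes are always red

def is_red(node):
    return node is not None and node.color == RED

def rotate_left(h):
    # Turn a right-leaning red link into a left-leaning one.
    x = h.right
    h.right = x.left
    x.left = h
    x.color, h.color = h.color, RED
    return x

def rotate_right(h):
    # Mirror image: used to fix two reds in a row on the left.
    x = h.left
    h.left = x.right
    x.right = h
    x.color, h.color = h.color, RED
    return x

def flip_colors(h):
    # Split a temporary 4-node, passing the red link up the tree.
    h.color = RED
    h.left.color = h.right.color = BLACK

def insert(h, key):
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = insert(h.left, key)
    elif key > h.key:
        h.right = insert(h.right, key)
    # Restore the left-leaning invariants on the way up.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h

def put(root, key):
    root = insert(root, key)
    root.color = BLACK  # the root is always black
    return root
```

The payoff of the left-leaning restriction is visible here: insertion needs no case analysis beyond these three local transformations.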

Wednesday, April 23, 2008

Public Intellectuals

Foreign Policy has brought out a "list of top 100 public intellectuals", a list that contains one computer scientist (Neil Gershenfeld of MIT). Before I continue, let me insert two caveats:
  • Yes, I do think lists like this are beauty contests
  • No, I don't think computer scientists should aspire to being on such lists
Having said that, here's what I'm wondering. We are in the middle of possibly the greatest era of technological disruption of all time, a disruption brought about by a mess of tubes called the Internet. We are seeing the fruits of computer science permeate daily life to a degree that relativity hasn't even come close to, whether it's RSA, recommendation systems, peer-to-peer file sharing, or what have you. The disruptions created by the Web have changed our society in radical ways: consider Facebook, Myspace and the whole array of social networking tools we use today.

And yet, we lack the voices that speak to this time and place. We lack cogent articulation of the tools that brought us here, of the wonders of Turing, Von Neumann, and others, of the fundamentally radical idea of the algorithm as an idiom. Or we lack recognition of those who do articulate such a vision of today's computationally-driven world.

We don't need to be on lists of public intellectuals, but we need to frame the role of computer science and computation in society today, before we get relegated to the role of glorified telephone repairmen.

Sunday, April 20, 2008

Abstracts booklets

The ICDE proceedings are all digital. You get one CD with all the papers from the conference, and another with all the papers from all the workshops. These come with a very nifty overview PDF that has clickable indices by name and session, as well as a search feature, and links to the PDF for each paper. Along with this comes an abstracts booklet organized in order of schedule, with one section for titles, and another one for abstracts (the first is handy for planning your schedule).

I've been wishing that we could do something like this for a while now. The good news is that next year, SODA will do the same thing. Personally, I found the abstracts booklet more useful than one might think, and only rarely felt that I needed to look at the paper in detail.

A new (to CS) model for publishing

One of the things I like about the database community is their willingness to play with new ideas in the space of conference publishing. SIGMOD and VLDB have been experimenting with the idea of semi-persistent reviews, where reviews from SIGMOD get passed on to VLDB for papers deemed on the border; SIGMOD went to double-blind mode, over some opposition, and there's been some interesting back-and-forth since on the effectiveness of this (read this, and then this). There's also a weak rebuttal mechanism (where authors can, in a limited way, respond to reviewer comments during the submission process).

An even more radical idea, which from the sound of it is nearing approval, is described in detail by Panos Ipeirotis. The main points are these:
  • A new journal called the Journal for Database Management Research will be created.
  • It will have a fast review process and turnaround time (akin to the biological journals: 3 months or so)
  • Submission deadlines will be rolling: i.e. you submit a paper when it's ready.
  • SIGMOD, VLDB and other database conferences will convert to a by-invitation model, where the conferences choose a sample of the published works in the journal (over the last 12 months I imagine) to be "presented" at the conference.
  • To discourage frivolous submissions, papers rejected from the journal will have to undergo a year-long cooling off before they can be resubmitted.
It's a radical approach, and approximates to a degree the prevailing practice in journal-based publication environments. It does raise some questions (some raised by Panos in the original post):
  • The year-long cooling off seems excessive punishment for what will still, by necessity, be a less-than-perfect review process
  • How will this new journal interact with other database journals?
  • Can one journal hope to keep up with the volume of papers being produced? SIGMOD, VLDB and ICDE alone take in over 2000 submissions among the three of them. That's close to 6 submissions EACH DAY.
  • What happens when you get into areas that overlap with DB? For example, what about KDD, and other data mining conferences?
  • What about all the years and years spent arguing with tenure committees about the primacy of conferences in computer science? "Oops! We changed our mind"?
  • What kind of prestige will now be attached to giving presentations at the "new" conferences? More importantly, since it was JUST ESTABLISHED that double-blind submission helps remove bias at conferences like SIGMOD, isn't this a step backward in terms of which papers are chosen for presentation at conferences? I can't imagine the process of getting invited for a talk at such a conference getting easier with this process. Are we heading towards (again) the bio model of big plenary talks by bigwigs and lots of posters, or the math model where anyone who wants to give a talk can?
Separating the idea of publication and dissemination is dear to my heart (I have always felt that conferences in CS fail by needing to serve both these masters at the same time), and so I'm bullish on proposals like this. But I do see problems in the details, and am curious to see how things pan out over time.

Tuesday, April 15, 2008

Deadlines (tax, conference, ...)

For the first time, I had to rush on the last day to get taxes out in time. It was a rather intense few hours, and I couldn't help but think that most of this would have been impossible in the pre-electronic era: I was able to download missing documents, get forms online, and even download tax software, all in a matter of a few clicks.

I'm also somewhat ashamed to admit that I rather enjoyed the adrenaline rush of getting to the post office with 10 minutes to spare and sending everything off. It reminded me of conference submission deadlines (and I imagine that in the days before electronic submission, people literally did have to rush to the post office to send off copies of a submission).

But then I got to wondering. In this hyper-competitive era, with submissions increasing and acceptance rates dropping, is it slowly becoming less likely that papers submitted at the last minute can compete with more polished submissions?

Now I like data as much as anyone else, and was wondering what kinds of statistics we could glean from information already available. For example, every year PC chairs trot out statistics about acceptance rate as a function of when papers were submitted. Because of the almost hyper-exponential rate of submissions as the deadline approaches, these numbers are necessarily unreliable, but to the extent that one believes in such statistics, there's always a kind of maximum point somewhat away from the actual deadline.

I don't know if this tells me what I need to know, though: does a drift of the maximum point away from the deadline prove my point? Lacking any actual information about which papers are more "polished", i.e. perhaps re-submissions, or papers prepared earlier, one has to work off the assumption that papers that are ready earlier are submitted earlier, and I'm not willing to buy that argument without more data.
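For what it's worth, the statistic itself is trivial to compute; the hard part is getting honest data. Here's a sketch of the computation PC chairs report, run on made-up records purely for illustration:

```python
# Acceptance rate as a function of submission time, the statistic PC chairs
# trot out. The records below are invented illustrative data, not drawn from
# any real conference.

from collections import defaultdict

# (days before the deadline, accepted?)
records = [(0, False), (0, False), (0, True), (1, True),
           (1, False), (2, True), (3, True), (3, False), (7, True)]

def acceptance_by_day(records):
    counts = defaultdict(lambda: [0, 0])  # day -> [accepted, total]
    for day, accepted in records:
        counts[day][1] += 1
        if accepted:
            counts[day][0] += 1
    return {day: acc / tot for day, (acc, tot) in sorted(counts.items())}
```

Note the caveat from above in miniature: the per-day totals shrink rapidly away from the deadline, so the "maximum point" rests on ever-smaller samples.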

So maybe I should rely on anecdotal evidence. How about it, readers? In your personal experience (and I emphasize this point: I want what you've encountered in your own work, not general opinions), do you think that it's become less likely that a last-minute paper gets into a conference, especially the top ones? Feel free to generalize this question to other deadline-based submission fora (even proposals, if you wish).

p.s. A blog post without links! Scandalous...

Saturday, April 12, 2008

A few links

I've been at ICDE in Cancun for the last few days because of this, and it's hard to blog too much from "the best beach in Mexico". There are some interesting trends emerging from this conference, many of which might be interesting to algorithms folks, and I'll try to say more about this later on.

Two links that came across Google Reader and might be of interest:
  • "Shut up and calculate", by Max Tegmark (via Mathematics under the Microscope). Tegmark argues for an extreme Platonic position: not only is the universe governed by mathematics, it is mathematics! There's more to this argument in the (short) paper: buried in there is an interesting robust variant of falsificationism:
    For a theory to be falsifiable, we need not be able to observe and test all its predictions, merely at least one of them
  • Via BB, New Scientist's list of the ten weirdest computing models (for actual computing, that is). They start off tamely with optical and quantum computing, and then move on into slightly more esoteric models like DNA computing, culminating with ooze computing and water wave computing (hey, I'm on a beach here, computing my legs off!). One of the more interesting proposals was something called 'reversible computing':
    Every computing operation involves feeding inputs into logic gates, which produce output signals. Instead of discarding the energy of those signals, Frank's gates run in reverse after every operation. That returns the energy of the output signal to the start of the circuit where it is used to carry a new input signal
    Of course, what's not clear to me is why this isn't a poor cousin of quantum computing, where, as far as I understand it, all computations prior to a measurement are reversible.
But the idea of reversible computing gets back to very deep ideas about entropy and the second law of thermodynamics. Think of a box partitioned into two parts, each containing air at the same (global) temperature. In the partition is a small passage, guarded by a tiny demon, who only lets molecules go from left to right if they are hotter than average, and from right to left if they are colder than average. Over time, the right side of the container gets hotter and hotter, and the left side gets colder and colder. This demon is Maxwell's demon, posited to try and find a contradiction in the Second Law of Thermodynamics.

One of the more famous refutations of this apparent paradox came from Leo Szilard, who argued that in order for the demon to do its job, it must somehow gather information about the speeds of individual molecules, and this gathering of information costs energy. Rolf Landauer famously showed in 1961 that reversible computations need not increase thermodynamic entropy (or information entropy), and Charles Bennett expanded on these facts to argue that eventually a demon doing the desired job must run out of space and start deleting information, an irreversible (and therefore entropy-increasing) operation. It's a streaming algorithm!
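The reversibility claim is easy to make concrete with a toy example (mine, not from the article): the Toffoli gate, a universal reversible logic gate. It is a bijection on 3-bit states and its own inverse, and with its target bit fixed to 0 it computes AND without erasing its inputs, exactly the kind of operation that escapes Landauer's entropy charge.

```python
# Toy illustration of reversible computing: the Toffoli gate.
from itertools import product

def toffoli(a, b, c):
    # Controlled-controlled-NOT: flip c only when both controls a, b are 1.
    return (a, b, c ^ (a & b))

def reversible_and(a, b):
    # With the target bit initialized to 0, the third output bit is a AND b,
    # and the inputs are carried along unchanged; nothing is erased.
    return toffoli(a, b, 0)

# The gate permutes the 3-bit states and undoes itself: running it twice
# recovers the original input exactly.
states = list(product([0, 1], repeat=3))
print(all(toffoli(*toffoli(*s)) == s for s in states))  # prints True
```

Ordinary AND, by contrast, maps two bits to one and so must discard information, which is where the irreversibility (and the entropy) sneaks in.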

There's a book that collects all this work together: Maxwell's Demon 2: Entropy, Classical and Quantum Information, Computing. It's been on my list of 'books to read' for a long time now.
