Sunday, May 11, 2008

P vs NC II: The main theorem, and a (very high level) proof skeleton

After some initial throat-clearing, we're now in a position to state the main result of this paper. To save myself some typing, I'll refer to the model of computation (PRAM without bit operations) as PRAM-wb from now on.

Theorem.
  • Min-cost flow on k nodes cannot be solved in the PRAM-wb model deterministically (or randomized) in (expected) time using parallel processors, even if we assume that the costs and capacities are integers with bit length at most for some large enough positive constants a, b.
  • A similar result holds for max-flow, with the limits on capacities being replaced by
  • All lower bounds hold even if we merely desire an additive approximation.
Corollary.
A min-cost-flow or max-flow problem of total input bitlength N cannot be solved in the PRAM-wb model deterministically (or with randomization) in time (expected) using processors, for some constant c.
Discussion.
Before we dive in, it's useful (or at least it was for me) to take apart the statement of the theorem itself. Firstly, we note that the bounds hold even with randomization, which of course means that the kinds of obstructions that Mulmuley constructs are "frequent". The actual proof goes via the standard Yao minimax principle, and we'll get to it once we've completed the deterministic lower bound.

Another interesting point is that the lower bound holds for additive approximations as well. I haven't read far enough ahead for this to be obvious, but I suspect that it has something to do with the fact that we'll always be dealing with integers, and so intuitively an approximation that can get us "between" integers will collapse to the right answer.

Finally, a note on the bitlengths. One might argue that if the bitlengths were "constant", the problem would be solvable. This is in fact the case, as Mulmuley discusses: it is actually important that the bitlengths be "long enough". If the bitlengths are short (say O(log n)), then we could read off all the bits efficiently using the procedure described in a previous post, at which point we have access to the full power of a general PRAM and can solve max-flow via an RNC algorithm for bipartite matching. So to get the strong bound on the number of processors, we need the bitlengths to be long enough.

But we don't want them to be too long. This constraint on the bitlengths feeds directly into the corollary, since we can relate N and k using the upper bound on the bitlength. However, the larger the bitlengths get, the weaker the bound on the running time expressed in terms of N. So it's actually useful that the problem is hard even for inputs with "smaller" bitlengths.

An overview of the proof strategy.
As I mentioned earlier, the technique used to prove a lower bound for bit extraction is a useful template to follow. Let's consider the basic argument.
  1. Examining the operations permitted by the model, come up with a bound on the number of distinct paths any bounded computation can take.
  2. Come up with a geometric description of the way the space of inputs is carved out by these paths.
  3. Show that if we mark inputs as being either in or out of the language (for the decision problem, say), then the "intrinsic" complexity of this space prevents the geometric description constructed in (2) from carving out the IN points from the OUT points: loosely, that the model does not have enough geometric expressivity to separate good from bad.
[Aside: stated this way, there's a very VC-dimension feel to the argument. For example, let's say your "model" consists of "things you can do with one hyperplane", and your language consists of "two diagonally opposed points on the unit square", then the famous perceptron result is basically that the "model" can't capture the "language"].
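To make the aside concrete, here's a throwaway check (my own toy example, nothing to do with the paper) that no single hyperplane w·x + b can separate the two diagonal pairs of corners of the unit square; strict separability of finite point sets reduces to a linear feasibility problem, which scipy can refute:

# Hypothetical illustration of the XOR/perceptron aside: is there a halfplane
# putting {(0,0),(1,1)} strictly on one side and {(0,1),(1,0)} on the other?
# Strict separation of finite point sets is equivalent (after rescaling) to
# separation with margin 1, which is a linear feasibility problem.
from scipy.optimize import linprog

ins = [(0, 0), (1, 1)]    # points we want on the positive side
outs = [(0, 1), (1, 0)]   # points we want on the negative side

A_ub, b_ub = [], []
for (x, y) in ins:        # want w1*x + w2*y + b >= 1
    A_ub.append([-x, -y, -1.0])
    b_ub.append(-1.0)
for (x, y) in outs:       # want w1*x + w2*y + b <= -1
    A_ub.append([x, y, 1.0])
    b_ub.append(-1.0)

res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3)   # variables are (w1, w2, b)
print("separable by a single hyperplane?", res.success)   # prints False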

Stated this way, it might not seem too surprising that algebraic geometry starts playing a role in the argument. In order to perform Step (2), we need to talk about invariants under computations that can be expressed as a series of algebraic operations (this is one reason why the omission of bit operations makes things more tractable), and algebraic geometry, which deals (at the most basic level) with the geometry of solutions to algebraic equations, is the right toolkit to use.

It would be remiss of me if I didn't point out that at least some of these ideas are not new (once you're looking at them from high enough). Dobkin and Lipton considered linear decision trees, Steele and Yao generalized these lower bounds to algebraic decision trees, and Ben-Or generalized further to algebraic computation trees (see Jeff Erickson's notes on algebraic lower bounds, and also the postscript on Joel Friedman's more recent work). In all these cases, the rough lower bound argument worked by showing that
  1. The computations expressible in the model could be captured geometrically.
  2. A bound on the computation could be expressed in terms of an intrinsic complexity of the target function. In all the above, the intrinsic complexity was topological: number of connected components, or even a sum of Betti numbers.
  3. The target function had a "high" intrinsic complexity (large number of components etc).
So what's the notion of "intrinsic complexity" in our setting ? This goes to step (3) in the high level sketch, and leads to the notion of parametric complexity. The idea is to consider a specific class of candidate inputs that can be described by parameters: for example, specifying that each edge in a graph has a capacity that's a linear function of some parameter. This is a "magic step", in the sense of Gowers: it seems pulled out of a hat because I don't understand why the parametric lens reveals the lower bound to us (of course, it might be clearer to others: if so, do speak up :)).

One can define a notion of parametric complexity (basically the number of breakpoints in the optimal value as the parameters change), and show that it is high for the problems under consideration. This takes care of one part of step (3): showing that the intrinsic complexity measure is high. The next step is to show that if this is true, there exists a way of parametrizing the inputs so that the IN and OUT inputs are distributed badly (intuitively, we'd want a parametrization that doesn't allow large chunks of INs and OUTs to congregate in input space). Finally, the proof completes by showing that the "low degree" regions that PRAM-wb can carve out cannot separate these badly distributed points.
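To see what a "breakpoint" is, here's a toy sketch (the graph, the capacities and the use of networkx are all my own illustrative choices; the parametric families in the paper are far more intricate, with exponentially many breakpoints): sweep a parameter that scales two capacities and watch the max-flow value change slope.

# Toy illustration of "parametric complexity": sweep a parameter lambda that
# scales some edge capacities, compute the max-flow value at each step, and
# count breakpoints (changes of slope) in the resulting piecewise-linear curve.
import networkx as nx

def max_flow_value(lam):
    G = nx.DiGraph()
    # two parallel routes from s to t; one grows with lambda, one shrinks
    G.add_edge("s", "a", capacity=lam)
    G.add_edge("a", "t", capacity=4.0)
    G.add_edge("s", "b", capacity=3.0)
    G.add_edge("b", "t", capacity=10.0 - lam)
    value, _ = nx.maximum_flow(G, "s", "t")
    return value

lams = [i * 0.25 for i in range(41)]            # lambda in [0, 10]
vals = [max_flow_value(l) for l in lams]
slopes = [v2 - v1 for v1, v2 in zip(vals, vals[1:])]
breakpoints = sum(1 for s1, s2 in zip(slopes, slopes[1:]) if abs(s1 - s2) > 1e-9)
print("breakpoints observed:", breakpoints)     # 2 for this little network

The parametric-complexity results show that for flow problems one can design parametrizations where this count is enormous, and that is what eventually gets played off against the limited carving power of the model.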

This is all very vague, and indeed the "details", if I dare call them so, are tremendously complex. It's a good idea to keep this very high level line of attack in one's head as we go forward: in subsequent posts I'll dive down into the proofs and start getting very detailed, and it will be easy to lose the forest for the trees (pun unintended) without a roadmap.

p.s Recent work by Joel Friedman on algebraic geometry-based methods for proving circuit lower bounds is more directly in the program of research initiated by the work on decision trees. Understanding his work is well beyond my ability, and requires a deep understanding of modern algebraic geometry. I'll leave it to others to try and explain his work for the masses, merely noting that this at least shows ONE other approach based on algebraic geometry for attacking lower bound questions.

Wednesday, May 07, 2008

A note on bit operations.

In the last post, we saw a lower bound for bit probing (specifically, extracting the lowest-order bit of a number) in the PRAM-without-bitops model. Now clearly this can be computed efficiently in P, so why isn't this enough of a proof for the separation of the two classes ?

The answer is subtle. In a PRAM model, excluding bit operations is a nontrivial restriction, because, as we saw, there's no parallel-efficient way of extracting bits from a word without bit operations. However, in P, this is not true. In time linear in the bitlength, we can extract every single bit, and then carry on. Thus, P-with-no-bitops is as powerful as P itself !

Mulmuley argues that the "correct" notion of excluding bit operations in P is that of an algorithm that might run in time that depends on the bit length, but without bitops, in the way that algorithms like the ellipsoid method work. But an easier approach is to exclude algorithms whose run time depends on the bit length, yielding the class of strongly polynomial algorithms. Note that this class (which he calls SP) still contains the P-complete problems max-flow and min-cost flow.

Once we do that, extracting bits is no longer an "easy" operation in SP. The "input size" (the cardinality n) is 1, and so any strongly polynomial bit-extracting algorithm is permitted only a constant number of steps. Therefore, the apparent separation doesn't really hold.

Formally, what is proved in this paper is that SP strictly contains C (the class of PRAM-without-bitops algorithms), via the separation result for max flow.

More math-blogging

Gil Kalai has a blog (at least for the next year) (HT 0xDE)

Monday, May 05, 2008

P vs NC Part I: Preliminaries

We start our exploration of Mulmuley's result on P vs NC with some definitions, and an "easy" example of a lower bound.

(For the definition of P and NC, visit the Complexity Zoo)

1. P-completeness

NC is our proxy for "can be parallelized efficiently" just like P is our proxy for "efficient algorithm". P contains NC, and as with P vs NP, the interesting question is whether the containment is strict: that is, are there problems that are inherently sequential ?

A theorem that separates P and NC will show that there's some problem in P that cannot be parallelized. Obviously, we want to pick a hard problem in P. P-completeness captures the idea of 'hard-core' problems for P, and the equivalent of SAT for P-completeness is the circuit value problem: given a circuit, evaluate it. We'll actually be looking at a different P-complete problem here, but one that's equally familiar: min-cost flow on a network with n nodes.

Remember, this result DOES NOT show that P strictly contains NC. What it shows is that P strictly contains a fairly general (but still restricted) subclass of NC.

2. The PRAM model

A convenient computational model to work with when talking about NC algorithms is the P(arallel) RAM model. PRAMs were all the rage in the 80s when supercomputers were being built, fell out of favor in the 90s when parallel computing lost its sheen, and are possibly making a comeback today in the world of multicore systems, GPU clusters, and MapReduce.

As the name suggests, a PRAM consists of many processors, each able to run a general purpose program with branchings, indirections, and the arithmetic operations +, -, and ×. A problem is in NC if it can be implemented to run in polylogarithmic parallel steps on polynomially many machines (where in one parallel step, each machine might perform one step of a computation).

A bit operation on a PRAM is any operation that treats data as bit-strings, such as AND, OR, and other boolean operations on bitstrings, or even 'extract-bit': get a particular bit of a string. All such operations will be excluded in the models considered here.

It will be important to pay attention to input size. As always, the input size can either be described by n, the cardinality of the input (the quantity counting the number of "things"), or by N, the bitsize (the quantity measuring the number of bits needed to write everything down). When we talk about polylogarithmic time and polynomial number of processors, unless specified otherwise, we will be referring to polynomials of n AND N.

In the non-uniform setting, we'll assume that we're presented with a PRAM that, for fixed n, N, runs in time t(n, N) with p(n, N) processors. Each input consists either of an integer, or a rational with integer numerator and denominator presented separately (so that we can do division more easily). Division of two rationals is encoded via multiplication, so that p/q is computed as p * (1/q), and 1/q is computed by swapping the numerator and denominator of q.
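As a toy illustration of this representation (my own sketch; the paper doesn't prescribe any concrete data layout), here is how arithmetic proceeds on (numerator, denominator) pairs with nothing but +, - and *:

# A rational is stored as a pair (p, q) of integers with q != 0; no gcd, no
# floor and no bit-level access is ever needed, so every step is a single
# addition, subtraction or multiplication.
class Rat:
    def __init__(self, p, q=1):
        self.p, self.q = p, q

    def __add__(self, other):
        return Rat(self.p * other.q + other.p * self.q, self.q * other.q)

    def __sub__(self, other):
        return Rat(self.p * other.q - other.p * self.q, self.q * other.q)

    def __mul__(self, other):
        return Rat(self.p * other.p, self.q * other.q)

    def reciprocal(self):
        # 1/(p/q) is just the pair with numerator and denominator swapped
        return Rat(self.q, self.p)

    def __truediv__(self, other):
        # division is encoded as multiplication by the reciprocal
        return self * other.reciprocal()

    def __repr__(self):
        return f"{self.p}/{self.q}"

print(Rat(5) / Rat(2))   # the model returns the pair 5/2, not the integer 2

Note that without a floor there is no way to collapse 5/2 back to the integer 2, which is exactly why even extracting the lowest-order bit becomes a nontrivial question (see the bit-extraction discussion below).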

The cost model is unit-cost, which means that any operation takes constant time, regardless of the relative sizes of the inputs. It is very important to exclude the floor operation when we do this: if not, judicious use of the floor function along with unit cost operations would allow us to solve PSPACE-Complete problems in polynomial time ! Since we are only concerned about lower bounds, assuming unit-cost for operations can only make the lower bound weaker than it actually is, so it's a reasonable thing to do.

It'll actually be useful to consider four variants of the basic PRAM-without-bit operations model.
  1. Arithmetic PRAM: In this model, the running time depends on the cardinality of the input alone, rather than the total bitlength.
  2. Linear PRAM: Here, at least one of the operands in any multiplication operation must be a constant.
  3. Arithmetic Linear PRAM: Combine the two above
  4. PRAM: None of the above restrictions
We will also elide the difference between the various memory access protocols (EREW, CRCW etc) since they only affect the overall running time by a polylogarithmic factor through well-known simulation results.

3. Are these good models ?

The dilemma of lower bounds is the problem of finding models that are rich enough to describe actual algorithms, while being weak enough that we have any hope of proving lower bounds in the first place. Is the PRAM-without-bit-operations a reasonable model of parallel computation ?
Mulmuley argues that it is, by pointing out that this model captures "virtually all" known parallel algorithms for optimization and algebraic problems. Even the weaker models proposed have exemplars: for example, one problem that admits an arithmetic PRAM algorithm is solving linear systems over the rationals. I am not aware of any algorithm that appears to need bit operations in any nontrivial way, and would welcome pointers to possible candidates.

4. The problem of bit extraction.

Since we've excluded bit operations from consideration, it's natural to wonder how expensive it really is to extract a bit (or in other words, to simulate a step of a general PRAM). It turns out that a very simple version of the general proof argument can be used to show a lower bound here, and although it's not terribly relevant, it's useful as a warmup exercise.

Proposition: The lowest order bit of an n-bit operand cannot be extracted in time using processors in the PRAM-without-bitops model, for some large enough constant a.

Why on earth should this be true ? After all, to test for odd/even, all we have to do is divide the number by 2 and take the remainder ! But how do we do that ? The only operations permitted are on integers or rationals, and the operation "5/2" will merely return the rational (5,2). If we had a floor function, we could compute ⌊x/2⌋ (and from it the parity), but we don't have it.

On the other hand, is this tight ? Clearly, if the bitlength is log-bounded, you can guess it by doubling, and then extract bits starting with the most significant one. This can be done in O(l) time sequentially, if the bitlength is l. Generalizing, you can split an l-bit string into pieces, and work on each piece, starting with the most significant one, for a constant number of parallel time units. On each piece, you have enough processors for each one to guess a possible value of the piece, and then you check which guess is correct. Subtracting off the guessed piece, you can then proceed to the next one. This is one way of understanding where the bounds in the proposition come from.
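Here is a rough sequential sketch of that guess-subtract-repeat idea (my own reconstruction, with the piece size left as a parameter b; on a PRAM the loop over guesses is what gets farmed out to one processor per candidate):

def extract_bits(x, l, b):
    """Recover the bits of the l-bit integer x, b bits at a time, using only
    comparisons, additions, subtractions and multiplications on x itself."""
    bits = []
    remaining = l
    while remaining > 0:
        piece = min(b, remaining)
        shift = remaining - piece
        weight = 2 ** shift          # a power of two: obtainable in the model
                                     # by repeated squaring (multiplications only)
        # the "guessing" step: on a PRAM, one processor per candidate g
        for g in range(2 ** piece):
            if g * weight <= x < (g + 1) * weight:
                break
        x = x - g * weight           # subtract the guessed piece and move on
        # decoding g into bits is pure bookkeeping: each processor knows which
        # guess it represents, so this costs nothing in the model
        for i in reversed(range(piece)):
            bits.append((g // (2 ** i)) % 2)
        remaining = shift
    return bits

print(extract_bits(0b1011010, 7, 3))   # [1, 0, 1, 1, 0, 1, 0]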

So if we're convinced that the 'form' of the proposition makes sense, let's see how we might go about proving it.

Proof outline:
Assume that there actually is a machine that can return the lowermost bit within the desired time and processor bounds. Suppose we simulate the action of this machine on two inputs x, x' up to time t. If the machine, up to time t, behaves the same way on both inputs, then we say that they're in the same t-equivalence class.

Can we bound the total number of such classes ? Note that if we can do that, then we might have a hope of using the pigeonhole principle to argue that at least some class might have too many elements in it: since all elements in an equivalence class will yield the same outcome, we might be able to use this to create a contradiction.

Consider the branch operations executed by the machines at the (t+1)-th step for two t-equivalent inputs x, x'. Since the computation is identical for the two inputs up to the previous step, any branching operation consists of evaluating the sign of some polynomial in x (because all the stored memory locations are polynomials in x). Each of these polynomials has degree at most 2^t (each multiplication can at worst double the degree), and therefore has at most 2^t real roots. Collecting all these roots together for all machines, we get a subdivision of the real line into intervals, and (this is the crucial part) in each of these intervals, the answers to sign queries for any of the polynomials do not change.

[Aside: this trick, of constructing an arrangement over a geometric space such that each cell contains a fixed answer to a problem, is fairly common in computational geometry].

What this means is that each t-equivalence class can blow up into only a certain number of (t+1)-equivalence classes (roughly one for each of the intervals above, i.e. about p(n)·2^t of them), and by induction, this means that over a total of t(n) time steps, the number of equivalence classes is at most exponential in t(n)^2 + t(n)·log p(n).

Now these equivalence classes partition the space of possible inputs, but they don't really tell us how: any one class might contain inputs from all over the real line, and knowing the number of equivalence classes in and of itself doesn't tell us why estimating the lowest bit might be difficult. In order to make that leap, we have to further fracture these equivalence classes into actual segments of the real line, such that within each segment, all inputs are t(n)-equivalent.

This is not too hard to do, since we know that all polynomials evaluated in the process have degree at most 2^{t(n)}. Since there are at most t(n)·p(n) polynomials to consider in any computation for any particular equivalence class, the total number of intervals induced by the roots of these polynomials is roughly t(n)·p(n)·2^{t(n)} per class, which keeps the overall count at the same exponential order as before. Two inputs from any one of these intervals are treated in exactly the same way by the computation.

Now comes a neat application of the pigeonhole principle. Choose a large enough so that the total number of intervals constructed above falls below 2^{n-1}. If we label each of the 2^n possible integers with their correct output (the lowermost bit), we get an alternating sequence of zeros and ones, and clearly a correct algorithm must create at least 2^{n-1} intervals to ensure that no interval contains numbers labelled both one and zero. But if we choose a large enough, we can ensure by the pigeonhole principle that there will be at least one interval that is "too long", i.e must contain inputs labelled both one and zero. Since by construction, the algorithm must return the same output on ALL inputs in this interval, it will be wrong on at least one input, which yields a contradiction.
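Here's my back-of-the-envelope version of the count (the precise expressions are in the paper; I'm assuming, as above, that after t steps every stored value is a polynomial of degree at most 2^t). Writing C_t for the number of t-equivalence classes:

\[ C_{t+1} \;\le\; C_t \cdot O\!\big(p(n)\,2^{t}\big) \quad\Longrightarrow\quad C_{t(n)} \;\le\; 2^{O\left(t(n)^2 + t(n)\log p(n)\right)}, \]
\[ \#\{\text{intervals}\} \;\le\; C_{t(n)} \cdot \big(t(n)\,p(n)\,2^{t(n)} + 1\big) \;=\; 2^{O\left(t(n)\,(t(n)+\log p(n))\right)}. \]

If, say, both t(n) and log p(n) are at most \sqrt{n}/a, the total is 2^{O(n/a^2)}, which drops below the 2^{n-1} intervals that correctness requires once a is large enough.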

We will use many elements of this basic proof structure as we go along.

Tuesday, April 29, 2008

Should we care about geometric complexity theory ?

There's been an interesting debate over on the complexity blog with Lance's first diss of Ketan Mulmuley's program, and Ketan's response. I think Lance essentially trolled himself with the line,
Also while knowing the resolution of P v. NP is very important, knowing the details of the proof, especially if it requires deep and complex mathematics, is not nearly as important. I was excited to see Fermat's Last Theorem resolved in my lifetime but I have no desire to actually understand the proof.
Although I have a hard time imagining how someone could say this and mean it (after all, P vs NP is not Fermat's theorem in either impact, breadth or importance), and even more so how someone steeped in complexity theory could say it, I think the larger point he makes is essentially reasonable. Namely, (and I paraphrase),
if you want to lead people to the top of the mountain, you have to show them the way, or give them some hope that they can get there.
What would make the whole GCT program more plausible (and I think that people would like to believe in it) are some intermediate results. Maybe some weaker lower bounds using more primitive techniques that suggest that more powerful statements are lurking underneath. I don't think more surveys will help, frankly. I've read the Regan survey, and even the initial parts of GCTflip, and it's really heavy going. In fact, I was talking with a professor who spent an entire semester, together with students trained in complexity theory AND representation theory, trying to work through some of the early GCT papers. It was definitely more than a month's worth of effort.

Of course, there is one example of "a weaker lower bound using more primitive techniques": KM's result separating P and (NC without bit-wise operations). This paper appeared in STOC 1994, and was subsequently published in SICOMP in 1999. I've always wanted to read this paper, for one thing because of its intriguing way of mapping computations to algebraic sets (if that isn't cool geometry, I don't know what is).

So since summer is here, and semester is out, I've decided to forcibly inject some content into this blog. I'm going to read the P vs NC paper and blog about it as I go along. I don't know how many installments it will take, and there might be digressions along the way, but at the end I hope to understand this paper, and with it maybe even get a glimmer of the general direction KM is pushing in the GCT work.

For anyone interested in following along, I'll be mainly using the SICOMP paper as a reference, and will tag all posts with the tag 'p-vs-nc'. As far as possible I'll try to do one post a week (more if time permits). Let's see how this goes.

Saturday, April 26, 2008

LaTeX: hyperref/algorithm interaction

According to the hyperref documentation, the algorithm package should be loaded after hyperref:
algorithm:
\usepackage{hyperref}
\usepackage[chapter]{algorithm}% eg.
According to the hyperref documentation, all packages should be loaded before hyperref:
In most cases, therefore, you should load your package before you load hyperref, and hyperref will patch things up so that they work, so you can utilise your (patched) package after loading both:
If you do the first, you get this annoying set of warnings:
! pdfTeX warning (ext4): destination with the same identifier (name{page.1}) has been already used, duplicate ignored
If you do the second, you get an error:
undefined control sequence \theHalgorithm
Clearly, the first is preferable to the second, but even the first is terribly annoying. Does anyone have any ideas on how this can be fixed ?

Update: It works ! A comment from Iain Murray points out that in the hyperref README, one is told to include the float package BEFORE hyperref, and only then include algorithm (after hyperref).

Red-black trees: An Update

In a previous post, I mentioned a talk by Bob Sedgewick on left-leaning red-black trees, a variant that's easier to analyze, but with the same asymptotic properties. He gave another talk on LLRB trees at the Analysis of Algorithms workshop, and the new slides have (I quote from his email):
somewhat simpler presentation and code than the Dagstuhl version, the discovery that 2-3 trees are also covered, and some fun facts about the analysis that should answer many of the questions that people were asking.
There were many comments on my original post, and possibly these new slides answer some of the questions.

Wednesday, April 23, 2008

Public Intellectuals

Foreign Policy has brought out a "list of top 100 public intellectuals", a list that contains one computer scientist (Neil Gershenfeld of MIT). Before I continue, let me insert two caveats:
  • Yes, I do think lists like this are beauty contests
  • No, I don't think computer scientists should aspire to being on such lists
Having said that, here's what I'm wondering. We are in the middle of possibly the greatest era of technological disruption of all time, a disruption brought about by a mess of tubes called the Internet. We are seeing the fruits of computer science permeate daily life to a degree that relativity hasn't come even close to, whether it's RSA, recommendation systems, peer to peer file sharing, or what have you. The disruptions created by the Web have changed our society in radical ways: consider Facebook, Myspace and the whole array of social networking tools we use today.

And yet, we lack the voices that speak to this time and place. We lack cogent articulation of the tools that brought us here, of the wonders of Turing, Von Neumann, and others, of the fundamentally radical idea of the algorithm as an idiom. Or we lack recognition of those who do articulate such a vision of today's computationally-driven world.

We don't need to be on lists of public intellectuals, but we need to frame the role of computer science and computation in society today, before we get relegated to the role of glorified telephone repairmen.

Sunday, April 20, 2008

Abstracts booklets

The ICDE proceedings is all digital. You get one CD with all the papers from the conference, and another with all the papers from all the workshops. This comes with a very nifty overview PDF that has clickable indices by name and session, as well as a search feature, and links to the PDFs for each paper. Along with this comes an abstracts booklet organized in order of schedule, with one section for titles, and another one for abstracts (the first is handy for planning your schedule).

I've been wishing that we could do something like this for a while now. The good news is that next year, SODA will do the same thing. Personally, I found the abstracts booklet more useful than one might think, and only rarely felt that I needed to look at the paper in detail.

A new (to CS) model for publishing

One of the things I like about the database community is their willingness to play with new ideas in the space of conference publishing. SIGMOD and VLDB have been experimenting with the idea of semi-persistent reviews, where reviews from SIGMOD get passed on to VLDB for papers deemed on the border; SIGMOD went to double-blind mode, over some opposition, and there's been some interesting back-and-forth since on the effectiveness of this (read this, and then this). There's also a weak rebuttal mechanism (where authors can in a limited way respond to reviewer comments in the submission process).

An even more radical idea, which from the sound of it is nearing approval, is described in detail by Panos Ipeirotis. The main points are these:
  • A new journal called the Journal for Database Management Research will be created.
  • It will have a fast review process and turnaround time (akin to the biological journals - 3 months or so)
  • Submission deadlines will be rolling: i.e you submit a paper when it's ready.
  • SIGMOD, VLDB and other database conferences will convert to a by-invitation model, where the conferences choose a sample of the published works in the journal (over the last 12 months I imagine) to be "presented" at the conference.
  • To discourage frivolous submissions, papers rejected from the journal will have to undergo a year-long cooling off before they can be resubmitted.
It's a radical approach, and approximates to a degree the prevailing practice in journal-based publication environments. It does raise some questions (some raised by Panos in the original post):
  • The year-long cooling-off period seems like excessive punishment for what will still, by necessity, be a less-than-perfect review process
  • How will this new journal interact with other database journals ?
  • Can one journal hope to keep up with the volume of papers being produced ? Just SIGMOD, VLDB and ICDE take in over 2000 submissions between the three of them. That's close to 6 submissions EACH DAY.
  • What happens when you get into areas that overlap with DB ? For example, what about KDD, and other data mining conferences ?
  • What about all the years and years spent arguing with tenure committees about the primacy of conferences in computer science ? "Oops ! we changed our mind" ?
  • What kind of prestige will now be attached to giving presentations at the "new" conferences ? More importantly, since it was JUST ESTABLISHED that double blind submission helps remove bias at conferences like SIGMOD, isn't this a step backward in terms of which papers are chosen for presentations at conferences ? I can't imagine the process of getting invited for a talk at such a conference getting easier with this process. Are we heading towards (again) the bio model of big plenary talks by bigwigs, and lots of posters, or the math model where anyone who wants to give a talk can ?
Separating the idea of publication and dissemination is dear to my heart (I have always felt that conferences in CS fail by needing to serve both these masters at the same time), and so I'm bullish on proposals like this. But I do see problems in the details, and am curious to see how things pan out over time.

Tuesday, April 15, 2008

Deadlines (tax, conference, ...)

For the first time, I had to rush on the last day to get taxes out in time. It was a rather intense few hours, and I couldn't help but think that most of this would have been impossible in the pre-electronic era: I was able to download missing documents, get forms online, and even download tax software, all in a matter of a few clicks.

I'm also somewhat ashamed to admit that I rather enjoyed the adrenaline rush of getting to the post office with 10 minutes to spare and sending everything off. It reminded me of conference submission deadlines (and I imagine that in the days before electronic submission, people literally had to rush to the post office to send copies of a submission in).

But then I got to wondering. In this hyper-competitive era, with submissions increasing and acceptance rates dropping, is it slowly becoming less likely that papers submitted at the last minute can compete with more polished submissions ?

Now I like data as much as anyone else, and was wondering what kinds of statistics we could glean from information already available. For example, every year PC chairs trot out statistics about acceptance rate as a function of when papers were submitted. Because of the almost hyper-exponential rate of submissions as the deadline approaches, these numbers are necessarily unreliable, but to the extent that one believes in such statistics, there's always a kind of maximum point somewhat away from the actual deadline.

I don't know if this tells me what I need to know though: does a drift of the maximum point away from the deadline prove my point ? Lacking any actual information about which papers are more "polished" i.e maybe re-submissions, or papers prepared earlier, one has to work off the assumption that papers that are ready earlier are submitted earlier, and I'm not willing to buy that argument without more data.

So maybe I should rely on anecdotal evidence. How about it, readers ? In your personal experience (I emphasize this point: I want to distinguish your opinions from what you've encountered in your own work), do you think that it's become less likely that a last-minute paper gets into a conference, especially the top ones ? Feel free to generalize this question to other deadline-based submission fora (even proposals, if you wish).

p.s A blog post without links ! Scandalous...

Saturday, April 12, 2008

A few links

I've been at ICDE in Cancun for the last few days because of this, and it's hard to blog too much from "the best beach in Mexico". There are some interesting trends emerging from this conference, many of which might be interesting to algorithms folks, and I'll try to say more about this later on.

Two links that came across Google Reader and might be of interest:
  • "Shut up and calculate", by Max Tegmark (via Mathematics under the Microscope). Tegmark argues for an extreme Platonic position: not only is the universe governed by mathematics, but that it is mathematics ! There's more to this argument in the (short) paper: buried in there is an interesting robust variant of falsificationism:
    For a theory to be falsifiable, we need not be able to observe and test all its predictions, merely at least one of them
  • Via BB, New Scientist's list of the ten weirdest computing models (for actual computing, that is). They start off tamely with optical and quantum computing, and then move on into slightly more esoteric models like DNA computing, culminating with ooze computing and water wave computing (hey, I'm on a beach here, computing my legs off!). One of the more interesting proposals was something called 'reversible computing':
    Every computing operation involves feeding inputs into logic gates, which produce output signals. Instead of discarding the energy of those signals, Frank's gates run in reverse after every operation. That returns the energy of the output signal to the start of the circuit where it is used to carry a new input signal
    Of course, what's not clear to me is why this isn't a poor cousin of quantum computing, where as far as I understand all computations prior to a measurement are reversible.
But the idea of reversible computing gets back to very deep ideas about entropy and the second law of thermodynamics. Think of a box partitioned into two parts, each containing air at the same (global) temperature. In the partition is a small passage, guarded by a tiny demon, who only lets molecules go from left to right if they are hotter than average, and from right to left if they are colder than average. Over time, the right side of the container gets hotter and hotter, and the left side gets colder and colder. This demon is Maxwell's demon, posited to try and find a contradiction in the Second Law of Thermodynamics.

One of the more famous refutations of this apparent paradox was by Leo Szilard, who argued that in order for the demon to do its job, it must somehow gather information about the average temperature, and this gathering of information costs energy. Rolf Landauer famously showed in 1961 that reversible computations need not increase thermodynamic entropy (or information entropy), and these facts were expanded by Charles Bennett to argue that eventually, a demon that does the desired job must run out of space and start deleting information, an irreversible (and therefore entropy-increasing) operation. It's a streaming algorithm !!

There's a book that collects all this work together: Maxwell's Demon 2: Entropy, Classical and Quantum Information, Computing. It's been on my list of 'books to read' for a long time now.

Monday, March 31, 2008

CiteseerX

Rather quietly, Citeseer appears to have undergone an upgrade (at least an alpha-level one). The new CiteseerX allows you to create collections, associate tags with papers, and has a generally snazzier interface. Whether it solves the problems of the old citeseer (constant crashing, almost laughably bad bibtex entries) remains to be seen.

Saturday, March 22, 2008

On interview puzzles

Michael Mitzenmacher got hammered over at the complexity blog for trying to argue against the 'interview puzzle' style of interviewing so in vogue at places like Google, Microsoft and others.

Here's PhDComics' take on it:



I should add that I've been collecting questions that job-seekers are asked on such interviews.

Thursday, March 20, 2008

The Joys of NAE-SAT

(ed: the point of this post will be completely lost on you if you don't often prove NP-Completeness results)

When all else fails, the "standard" way of proving a problem NP-hard is to go back to the basics, i.e 3SAT. One particularly quirky variant that I quite enjoy is NAE-SAT:
As usual, you're given an instance of satisfiability with clauses and variables, and you want to check if there's a satisfying assignment. In NAE-SAT, the extra catch is that in no clause are you allowed to have all literals set to TRUE. Since satisfiability already ensures that no clause has all literals set to FALSE, you get the name 'Not-All-Equal-SAT', or NAE-SAT.
Like SAT, NAE-SAT is NP-Complete, and remains so if you restrict clauses to containing 3 literals (i.e NAE-3SAT). Other interesting properties:
  • NAE-3SAT is still NP-complete in its monotone variant (i.e if all variables appear unnegated). This is in contrast to SAT, which is trivial in that case. This property is particularly handy if you don't want to deal with how to ensure consistency between a variable and its negation when making a gadget.
  • If X is a satisfying assignment for NAE-SAT, then so is X' (the complement of X). This is again because of the NAE-property. A clause that's satisfied with one literal becomes one that's satisfied with two literals, and so on. Since no clause has all three literals set to TRUE, no clause becomes unsatisfied. This is useful because when you assume that the problem has a satisfying assignment, you have two assignments you can use (we used this trick in a paper a while back); the little checker after this list illustrates the property.
  • Unusually, and unlike other SAT variants, Planar-NAE-SAT is in P. This is a surprising result due to Bernard Moret, and prevents us from using NAE-SAT for many geometric problems where planarity is a useful tool (but if you can handle crossings, you're ok).
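To pin the definition down, here's a toy checker (entirely my own) for the not-all-equal condition; it also confirms the complementation property from the second bullet:

# A formula is a list of clauses; a clause is a tuple of nonzero ints, where
# +i means variable i and -i means its negation. An assignment maps each
# variable to True/False.
def literal_value(lit, assignment):
    v = assignment[abs(lit)]
    return v if lit > 0 else not v

def nae_satisfied(formula, assignment):
    # every clause must contain at least one TRUE and at least one FALSE literal
    for clause in formula:
        values = [literal_value(l, assignment) for l in clause]
        if all(values) or not any(values):
            return False
    return True

formula = [(1, 2, 3), (-1, 2, -3)]
assignment = {1: True, 2: True, 3: False}
complement = {v: not b for v, b in assignment.items()}

print(nae_satisfied(formula, assignment))    # True
print(nae_satisfied(formula, complement))    # True: the complement also works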
Anyone for a paean to ONE-in-THREE-SAT ?

Tuesday, March 18, 2008

David Gale, 1921-2008

Via Michael Trick comes the news that David Gale has died. To us CS folks, he's probably best known for the Gale-Shapley stable marriage algorithm (that has controlled the lives of thousands of medical interns since - talk about impact !), and the Gale transform in projective geometry.

Apart from all of that, he was also a tremendously influential economist and expert in optimization. His obituary has much more.

Friday, February 22, 2008

Notes from Dagstuhl I: Red-black trees

I've just returned (well almost: I'm in JFK waiting out the storm) from a Dagstuhl workshop on data structures. For those not in the know, this is one of the common retreat spots for computer scientists to get together and chat about problems in a very relaxed atmosphere (close to the true spirit of a workshop).

In that spirit, people present things that are more "uncooked", half-baked, or incomplete, and so I won't talk about too many of the presentations, except for the ones which are available online.

In this post, I'll discuss a new take on an old warhorse: the red-black tree.

A bit of background (for those unfamiliar with the story thus far):
One of the most fundamental data structures in computer science is the binary search tree. It's built over a set of items with an ordering defined on them, and has the property that all nodes in the left subtree of the root are smaller than the root, which in turn is smaller than all nodes in the right subtree (and so on recursively). BSTs are handy for building structures that allow for quick membership queries, or "less-than"-type queries.

One can build a BST with depth log n for a set of n ordered items. This means that operations on this structure will take log n time in general. However, if the items can change on the fly, then it's more difficult to maintain such a structure while making sure that updates themselves are cheap (O(log n)).

Among many solutions proposed to handle dynamic maintainence of BSTs is the red-black tree, proposed in its current form by Leo Guibas and Bob Sedgewick back in 1978. The red-black tree is the definitive dynamic BST data structure, at least in practice: it has worst-case log n bounds for all operations, and shows up in the implementations of basic structures in the STL and in many language libraries. By virtue of its being in CLRS, it also has stupefied the minds of undergraduates for many years now.

The new story:
Conceptually, the re-balancing operations that are used to maintain a red-black tree are not terribly hard. However, there are numerous cases to consider, especially when doing deletions, and this tends to make the actual code used to implement these trees fairly complex, and thus potentially error-prone. In the first talk at Dagstuhl, Bob Sedgewick talked about a simpler variant of red-black trees, called left-leaning red-black trees, whose main virtue is that they are simpler to implement (many fewer lines of code) while maintaining the nice properties of red-black trees.

Bob's slides have more details: although this new structure hasn't been published anywhere, it will appear in the next edition of his book. The key idea is to first understand how a red-black tree simulates a 2-3-4 tree (a more complex search tree in which nodes can have two, three or four children, and therefore one, two or three keys). It's possible to construct a direct mapping between a node in a 2-3-4 tree and a subtree in a red-black tree.

Once this is understood, the LLRB tree comes about by restricting the subtrees thus formed to be "left-leaning". Specifically, right-leaning red edges are not allowed, and to prevent the tree from being too skewed, more than three consecutive left-leaning red edges are disallowed as well. Doing this allows us to simplify various cases in the insertion and deletion steps, while not increasing the tree depth by too much (this last statement was argued by analogy with 2-3-4 trees).
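To give a sense of how compact the code becomes, here's a sketch of LLRB insertion (my own Python transcription of the idea; Sedgewick's code is in Java, and this is the 2-3 flavor where the color flip comes after the rotations; the 2-3-4 flavor moves the flip to the way down the tree):

RED, BLACK = True, False

class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.color = RED                 # new nodes are always red

def is_red(h):
    return h is not None and h.color == RED

def rotate_left(h):                      # fix a right-leaning red link
    x = h.right
    h.right, x.left = x.left, h
    x.color, h.color = h.color, RED
    return x

def rotate_right(h):                     # fix two reds in a row on the left
    x = h.left
    h.left, x.right = x.right, h
    x.color, h.color = h.color, RED
    return x

def flip_colors(h):                      # split a (temporary) 4-node
    h.color = RED
    h.left.color = h.right.color = BLACK

def insert(h, key, value):
    if h is None:
        return Node(key, value)
    if key < h.key:
        h.left = insert(h.left, key, value)
    elif key > h.key:
        h.right = insert(h.right, key, value)
    else:
        h.value = value
    # restore the left-leaning invariants on the way back up
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h
# (the caller recolors the root black after each insertion)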

It's a simple idea, and simplifies code by a substantial amount without sacrificing asymptotics.

Sunday, February 17, 2008

The most apologetic conference rejection I've received:

From the PC Chair:
Please know that I understand how much work it is to do research and write a paper. Rejection is never fun, happens to all of us, and there is likely always an element of misunderstanding to it.

Because CONFERENCE accepts such a small percentage of papers, however, rejection from CONFERENCE may not at all mean your paper will be rejected at other high quality conferences. To that end, I encourage you to take into account the reviewers' recommendations. They have spent many hours with your paper and their remarks (and even their misunderstandings) may help you to clarify your paper or perhaps to do some more work.
I can just imagine the PC sitting around a table, tears streaming down their eyes, as they pen this dejected missive to me.

Wednesday, February 13, 2008

Metric spaces, VC-dimension, and metric entropy.

For a problem that I've been working on, it turns out that if a related range space has bounded VC-dimension, the problem can be solved exactly (but with running time exponential in the dimension). The range space is constructed from two parameters: a metric space (X, d), and a radius e, and consists of the domain X, and all balls of radius at most e in X.

So a natural question that I've been unable to answer is:
What properties of a metric space ensure that the induced range space has bounded VC dimension ?
Most of what we do know comes from the PAC-learning community. For instance, the doubling dimension of a metric space is the smallest number d such that any ball of radius e can be covered by at most 2^d balls of radius e/2. In recent years, it has been popular to explore the extent to which
"metric space of bounded doubling dimension == bounded dimensional Euclidean space"
is true. Unfortunately, there are spaces of bounded VC-dimension that do not have bounded doubling dimension.

Another related proxy for the "dimension" of a metric space is its metric entropy: determine the minimum number of balls of radius e needed to cover all points in a metric space. The log of this number is the metric entropy, and among other things it is useful as a measure of the number of points needed to "cover" a space approximately (in a dominating set sense). It's known that the metric entropy of concept classes is closely related to their VC-dimension (where the underlying metric is the symmetric difference between the classes). I am not aware of any general result that relates the two, though.
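As an aside, the covering number of a finite point set is easy to upper-bound greedily; this throwaway sketch (mine, not from any of the references) just makes the definition concrete:

# Greedily estimate the covering number of a finite metric space: repeatedly
# pick an uncovered point as a center and mark everything within radius e as
# covered. The result is a valid e-cover, so log2 of its size upper-bounds
# the metric entropy of the point set at scale e.
import math

def covering_number(points, dist, e):
    uncovered = set(range(len(points)))
    centers = []
    while uncovered:
        c = next(iter(uncovered))          # any uncovered point will do
        centers.append(c)
        uncovered = {i for i in uncovered if dist(points[c], points[i]) > e}
    return len(centers)

# toy example: points on a line with the absolute-difference metric
pts = [0.0, 0.3, 0.9, 1.0, 2.5, 2.6]
n_balls = covering_number(pts, lambda a, b: abs(a - b), e=0.5)
print(n_balls, "balls; metric entropy ~", math.log2(n_balls))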

For more on the dizzying array of numbers used to describe metric space dimension, do read Ken Clarkson's magnificent survey.

[On a side note, I don't quite understand why the term "entropy" is used. It seems to me that if one wanted to use entropy, one would compute the entropy of the resulting set of balls, rather than merely the log of their number. ]

Tuesday, February 12, 2008

SoCG results out

Via compgeom-announce: (papers in order of submission time stamp)
  1. Manor Mendel and Assaf Naor. Markov convexity and local rigidity of distorted metrics
  2. Noga Alon, Robert Berke, Maike Buchin, Kevin Buchin, Peter Csorba, Saswata Shannigrahi, Bettina Speckmann and Philipp Zumstein. Polychromatic Colorings of Plane Graphs
  3. Jinhee Chun, Matias Korman, Martin Nöllenburg and Takeshi Tokuyama. Consistent Digital Rays
  4. Eitan Yaffe and Dan Halperin. Approximating the Pathway Axis and the Persistence Diagram of a Collection of Balls in 3-Space
  5. Naoki Katoh and Shin-ichi Tanigawa. Fast Enumeration Algorithms for Non-crossing Geometric Graphs
  6. Ken Been, Martin Nöllenburg, Sheung-Hung Poon and Alexander Wolff. Optimizing Active Ranges for Consistent Dynamic Map Labeling
  7. Hans Raj Tiwary and Khaled Elbassioni. On the Complexity of Checking Self-duality of Polytopes and its Relations to Vertex Enumeration and Graph Isomorphism
  8. Victor Chepoi, Feodor Dragan, Bertrand Estellon, Michel Habib and Yann Vaxes. Diameters, centers, and approximating trees of delta-hyperbolic geodesic spaces and graphs
  9. Esther Arkin, Joseph Mitchell and Valentin Polishchuk. Maximum Thick Paths in Static and Dynamic Environments
  10. Julien Demouth, Olivier Devillers, Marc Glisse and Xavier Goaoc. Helly-type theorems for approximate covering
  11. Sarit Buzaglo, Ron Holzman and Rom Pinchasi. On $k$-intersecting curves and related problems
  12. Mohammad Ali Abam, Mark de Berg and Joachim Gudmundsson. A Simple and Efficient Kinetic Spanner
  13. Frederic Chazal and Steve Oudot. Towards Persistence-Based Reconstruction in Euclidean Spaces
  14. Ken Clarkson. Tighter Bounds for Random Projections of Manifolds
  15. Krzysztof Onak and Anastasios Sidiropoulos. Circular Partitions with Applications to Visualization and Embeddings
  16. Bernard Chazelle and Wolfgang Mulzer. Markov Incremental Constructions
  17. Kenneth L. Clarkson and C. Seshadhri. Self-Improving Algorithms for Delaunay Triangulations
  18. Frederic Cazals, Aditya Parameswaran and Sylvain Pion. Robust construction of the three-dimensional flow complex
  19. Herbert Edelsbrunner, John Harer and Amit Patel. Reeb Spaces of Piecewise Linear Mappings
  20. Evangelia Pyrga and Saurabh Ray. New Existence Proofs for $\epsilon$-Nets
  21. Lars Arge, Gerth Stølting Brodal and S. Srinivasa Rao. External memory planar point location with logarithmic updates
  22. Eric Berberich, Michael Kerber and Michael Sagraloff. Exact Geometric-Topological Analysis of Algebraic Surfaces
  23. Misha Belkin, Jian Sun and Yusu Wang. Discrete Laplace Operator on Meshed Surfaces
  24. Olivier Devillers, Marc Glisse and Sylvain Lazard. Predicates for 3D visibility
  25. Luca Castelli Aleardi, Eric Fusy and Thomas Lewiner. Schnyder woods for higher genus triangulated surfaces
  26. Erin Chambers, Jeff Erickson and Pratik Worah. Testing Contractibility in Planar Rips Complexes
  27. Rado Fulek, Andreas Holmsen and János Pach. Intersecting convex sets by rays
  28. Noga Alon, Dan Halperin, Oren Nechushtan and Micha Sharir. The Complexity of the Outer Face in Arrangements of Random Segments
  29. Maarten Löffler and Jack Snoeyink. Delaunay triangulations of imprecise points in linear time after preprocessing
  30. Erin Chambers, Éric Colin de Verdière, Jeff Erickson, Sylvain Lazard, Francis Lazarus and Shripad Thite. Walking Your Dog in the Woods in Polynomial Time
  31. Jean-Daniel Boissonnat, Camille Wormser and Mariette Yvinec. Locally Uniform Anisotropic Meshing
  32. Adrian Dumitrescu, Micha Sharir and Csaba Toth. Extremal problems on triangle areas in two and three dimensions
  33. Timothy M. Chan. A (Slightly) Faster Algorithm for Klee's Measure Problem
  34. Timothy M. Chan. Dynamic Coresets
  35. Timothy M. Chan. On Levels in Arrangements of Curves, III: Further Improvements
  36. Minkyoung Cho and David Mount. Embedding and Similarity Search for Point Sets under Translation
  37. Jacob Fox and Janos Pach. Coloring K_k-free intersection graphs of geometric objects in the plane
  38. Timothy G. Abbott, Zachary Abel, David Charlton, Erik D. Demaine, Martin L. Demaine and Scott D. Kominers. Hinged Dissections Exist
  39. Vida Dujmovic, Ken-ichi Kawarabayashi, Bojan Mohar and David R. Wood. Improved upper bounds on the crossing number
  40. Pankaj Agarwal, Lars Arge, Thomas Mølhave and Bardia Sadri. I/O Efficient Algorithms for Computing Contour Lines on a Terrain
  41. Pankaj Agarwal, Bardia Sadri and Hai Yu. Untangling triangulations through local explorations
  42. Ileana Streinu and Louis Theran. Combinatorial Genericity and Minimal Rigidity

Monday, February 04, 2008

2007 ACM Turing Award, and a perspective on model checking.

The 2007 Turing Award was awarded to Edmund Clarke, E. Allen Emerson and Joseph Sifakis for their work on model checking. In our "theory A" cocoon, we tend not to pay attention to much "theory B" work, and model checking is one of the shining stars in theory B, with a rich theory and immense practical value.

I am not competent to discuss model checking at any level, and so I asked someone who is ! Ganesh Gopalakrishnan is a colleague of mine at the U. and heads our Formal Verification group. I asked him if he could write a perspective on model checking, and he kindly agreed. What follows is his piece (with URLs added by me). As usual, all credit to him, and any mistakes in transcribing are my fault alone.

A Perspective On Model Checking
Ganesh Gopalakrishnan

One of the earliest attempts at proving the correctness of a program was by Alan Turing in 1949. The activity was raised to a much more prominent level of interest and importance through the works of Dijkstra, Hoare, and others. Yet, the enterprise of "Program Verification" was met with more noticeable failures than successes in the 1970's, due to its inability to make a dent on practice. The introduction of Temporal Logic for modeling concurrent systems (for which Amir Pnueli won the 1996 Turing Award) was the first attempt at shifting the attention away from the impossibly difficult task of 'whole program verification' to a much more tractable focus on verifying the reactive aspects of concurrency.

Looking back, this enterprise did not meet with the expected degree of success, as Temporal Logic reasoning was still based on `deduction' (proof from first principles). From an engineer's perspective, what was needed was a simple way to specify the classes of "legal and illegal waveforms" a system can produce on its signals / variables, and an efficient way of checking that all those waveforms are contained in the set of "legal waveforms" as can be specified through temporal properties. In other words, it was sufficient to show that finite state machines (or finite state control skeletons of programs) served as *models* for a catalog of desired temporal properties.

This was essentially the approach of model checking. It immediately took off as a practically viable method for debugging concurrent systems; the approach guaranteed exhaustive verification for small instances of a problem. In practice it meant that all relevant corner cases were considered within the confines of affordable resource budgets. This was the gist of Allen Emerson's contribution in his PhD dissertation, which he wrote under Ed Clarke's guidance. A parallel effort underway in France in Joseph Sifakis's group also arrived at the same pragmatic approach to verification. Model checking was given crucial impetus by Ken McMillan who, using the technology of Binary Decision Diagrams formulated and popularized by Randy Bryant, developed the highly efficient Symbolic Model Checking approach. The technology has since then skyrocketed in its adoption rate.

Today, model checking and its derivatives are part of every verification tool used to debug digital hardware. While "full verification" is not the goal (and is largely a meaningless term), the ability of model checkers to find *high quality bugs* is uncanny. It is those bugs that escape simulation, and may not even manifest in hardware that operates at the rate of GHz. Some of the error traces produced by model checkers are only about two dozen steps long; yet, if one imagines the number of reachable nodes in a tree with arity (branching factor) 2000 in two dozen steps, one understands why conventional approaches to testing miss bugs: they have to make the same lucky "1 in 2000" choices two dozen times in a row. A model checker can often get away by not really modeling these choices explicitly. Reduction techniques allow model checkers to detect and avoid exploring many "equivalent" behaviors. In that sense, model checking is one of the most powerful "test acceleration technologies" that mankind has invented.

Today, microprocessor companies employ several kinds of model checkers. In software, Bell Labs (mainly before its collapse), JPL, NASA, and Microsoft (to name a few) have had tremendous success using model checking for verifying device drivers, switch software, and flight control software. Speaking more generally, anything with a finite state structure (e.g., decision processes, reliability models, planning in AI) can be approached through model checking.

Aftermatter:
  • Daniel Jackson (MIT), "With its dramatic successes in automatically detecting design errors (mainly in hardware and protocols), model checking has recently rescued the reputation of formal methods." (source)
  • "Proofs of programs are too boring for the social process of mathematics to work." DeMillo et al in CACM 1979).

Thursday, January 31, 2008

Quantum Algorithms for Computational Geometry

You'd think the title of this post is a joke paper title, but it actually isn't. There are now at least two papers (the second one even uses the word 'revisited' in the title!) on quantum algorithms for geometric problems.

Neither paper (and they basically admit this themselves) really pushes the envelope in the QC dimension. The first one, by Sadakane, Sugawara and Tokuyama, uses Grover's quantum search algorithm as a subroutine to solve a number of search problems in geometry, many of them reduced to the problem of finding the lowest point in the upper envelope of an arrangement of halfplanes. The second, by Bahadur, Durr, Lafaye and Kulkarni, uses a quantum algorithm for element distinctness, based on a random walk method of Ambainis, to solve some of the above problems. They also show lower bounds for some of these problems. One interesting result is an n^{2/3} lower bound, and an n log n upper bound (!), for 3SUM.

I'm not sure what to make of these results: apart from the fact that I have a hard time seeing a need for quantum algorithms for geometric problems, I don't know whether these are anything more than (insert model of computation) algorithm for (insert name of problem) type results (not that such results are bad, but they are not always interesting).

Monday, January 28, 2008

Plotting implicit functions: a bleg..

Does anyone know of good tools for plotting implicit functions ? Ideally, I'd like something that could plot functions of the form f(x,y) = 0, and even contours of the form f(x,y) = c.

(for those who aren't hip with web jargon, a bleg is a beg on a blog ;)

Important heuristics: Alternating optimization

The Kleinberg/Tardos algorithms textbook has a nice chapter on local search heuristics. As has been discussed earlier, learning heuristics as part of an algorithms class is important so that we know what to do when a problem is NP-hard and approximations are either provably hard or elusive.

However, perhaps one of the most important heuristics used in practice is not discussed, and I have yet to find a source that examines its power fully. This heuristic is "alternating minimization", and it appears "in the wild" in two wildly popular methods: the k-means algorithm in clustering and the Iterated Closest Point (ICP) algorithm in model registration.

Suppose you have a cost function D defined over the product of domains P and Q. For concreteness, let's say that P is the space of all assignments of points to clusters, and Q is the space of all assignments of centers to clusters. The standard k-means cost function asks for an assignment of points to clusters, and an assignment of centers to clusters, so that the sum of squared distances of points to their cluster centers is minimized.

In general, such an optimization is hard to perform. However, it is often the case that for a fixed p in P, we can minimize D(p, q) over all q in Q, and vice versa. Continuing the example above, using Euclidean distance as the underlying distance, it's easy to see that for a fixed set of points in a cluster, the minimizing center is their centroid. Moreover, for a fixed set of centroids, the optimal assignment is a straight nearest neighbour assignment.

This is the idea behind alternating minimization. Starting with arbitrary p0 and q0, we repeat the two-step procedure:
  1. p_i = arg min_{p in P} D(p, q_{i-1})
  2. q_i = arg min_{q in Q} D(p_i, q)
in the hope that it will converge to the optimal solution.
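
For concreteness, here's a minimal sketch of that loop instantiated for the k-means cost function described above. All the names and parameters here are illustrative, not taken from any particular library.

    import numpy as np

    def alternating_kmeans(points, k, iters=20, seed=0):
        # Minimal sketch of alternating minimization for the k-means cost.
        # 'points' is an (n, d) array of coordinates.
        points = np.asarray(points, dtype=float)
        rng = np.random.default_rng(seed)
        # q_0: arbitrary initial centers (here, k random input points)
        centers = points[rng.choice(len(points), k, replace=False)].copy()
        for _ in range(iters):
            # Step 1: fix the centers, pick the best assignment (nearest center).
            dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Step 2: fix the assignment, pick the best centers (the centroids).
            for j in range(k):
                members = points[labels == j]
                if len(members) > 0:
                    centers[j] = members.mean(axis=0)
        return labels, centers

Note that each iteration can only decrease the cost, which is exactly the potential-function argument for convergence discussed next.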

Now there are three questions that one usually has to answer for an alternating minimization. The first one is relatively straightforward:


Does it converge ? In general, it's often fairly easy to write down a potential function that decreases strictly with each iteration and is always positive. Thus, convergence can often be established for this procedure.

Does it converge to the optimal solution ? In general, no. This procedure can often get stuck in local minima. However (and this is one of the more attractive aspects of this heuristic), Csiszar and Tusnady showed that as long as a certain "5-point" property holds (a simple set of inequalities involving iterates of the procedure), the process will converge to the optimal solution. In their paper, this result is used to show convergence when P and Q are convex sets of distributions, and D is the Kullback-Leibler divergence.

Another useful example where the procedure converges is when P and Q are convex sets and D(p, q) is the distance between p and q, so that the optimum is the distance between the two sets. In that case, each iterative step finds the point in one convex set closest to a given point in the other.
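
In code, that special case is just alternating nearest-point projections. Here's a hedged sketch, where project_P and project_Q are assumed (hypothetical) nearest-point maps onto the two convex sets:

    import numpy as np

    def alternating_projections(project_P, project_Q, q0, iters=100):
        # Sketch only: project_P(x) and project_Q(x) are assumed to return the
        # point of P (resp. Q) closest to x. For convex P and Q the iterates
        # approach a pair of points realizing the distance between the sets.
        q = np.asarray(q0, dtype=float)
        p = project_P(q)
        for _ in range(iters):
            p = project_P(q)   # closest point of P to the current q
            q = project_Q(p)   # closest point of Q to the current p
        return p, q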

A variant of the above question is: does it converge to a close-to-optimal solution ? Again, in general, one cannot say anything useful about this. However, for k-means, recent work by Arthur and Vassilvitskii, and Ostrovsky, Rabani, Schulman and Swamy, has shown that choosing start points carefully can yield provable bounds on the quality.

Does it converge in polynomial time ? Perhaps the most important question about this heuristic is whether it converges in polynomial time. For k-means, it can be shown that there are examples that will force the heuristic to take superpolynomial time to converge. However, recent results in smoothed complexity have shown that if the input is perturbed ever so slightly, then the iterative procedure converges in polynomial time (with high probability). In a sense, one can view this result as explaining the success of the method in practice.

Alternating minimization appears in many domains, and is a common trick for "solving" complex optimizations. It should definitely be taught wherever heuristics are discussed.

Tuesday, January 22, 2008

SODA 2008: Summaries

I've written before about the perils of blogging at the conference: another problem is that if you don't blog, people come by and start asking you when the posts are going to appear.

Before you read further, do check out posts by DavidE (II), Michael Mitzenmacher (I, II), and a "stranger in a strange land" take by Hal Daume, my colleague at Utah and an ML/NLP person whose blog you should definitely follow if you're interested in such topics.

There's been a lot of work on clustering in high dimensional spaces, and I want to say a lot more about this, especially the "abstract framework" results of Kumar, Sabharwal and Sen, but that will have to wait for another post. For now, I want to highlight the paper, "Clustering for Metric and Non-Metric distance measures" by Ackermann, Blomer and Sohler. Essentially what they do is take the framework for approximate linear time clustering in high dimensions, and remove the need for either symmetry or triangle inequality in the distance measure, replacing it by a set of sampling conditions that have a more statistical feel to them. Doing this allows them to cluster with more general distance functions, including the relative entropy or Kullback-Leibler divergence. I'm a sucker for results that illustrate "what's going on", especially in the realm of approximate high-D geometry, and this is a great example of attempting to get to the core structural properties needed to make an algorithm work.

Outtakes:
  • After many years, I actually missed the business meeting this time, so no drinking games or updates. From what I hear, Austin is the first choice for the venue in 2010 (we're in NYC for 2009). Claire Mathieu is the PC chair for SODA 2009.
  • Overall, I really enjoyed SODA this year: somehow I felt that there were far too many interesting papers (too many for me to attend the talks, that is), and the papers were of really high quality.

Monday, January 21, 2008

SODA 2008: Day 1

I've been terribly remiss in blogging, because I've actually been trying to do some work (!). At any rate, do read David's take on Day 1.

The first session I attended was on embeddings (which was placed parallel to the geometry session: NOT A GOOD IDEA). The first talk was by Edo Liberty, on work with Nir Ailon on "Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes".

The Johnson-Lindenstrauss lemma, which tells us that any set of n points in a d-dimensional l_2 space "approximately" lives in a space of dimension k = O(log n), has had a huge effect on data analysis, as a tool for reducing the dimensionality of a set of points. Although the construction is very simple (project each point using a matrix populated with random entries), it doesn't scale, because the matrix required is dense, and so is hard to maintain for large high-dimensional data sets. Also, the transformation itself takes time O(dk) per point, and is therefore expensive if d and k are large.
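
For contrast with the fast transform in the talk, here's what the naive dense construction looks like. This is only a sketch with names of my own choosing, using the usual Gaussian scaling convention; it is emphatically not the Ailon-Liberty transform.

    import numpy as np

    def jl_project(X, k, seed=0):
        # Naive dense JL projection: multiply by a random Gaussian matrix scaled
        # by 1/sqrt(k), so that squared norms are preserved in expectation.
        # Applying it costs O(dk) per point, which is exactly the bottleneck
        # mentioned above.
        X = np.asarray(X, dtype=float)
        d = X.shape[1]
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((d, k)) / np.sqrt(k)
        return X @ R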

Earlier work by Ailon and Chazelle showed that the JL-transform can be implemented in time O(min(d log d, k^3)), which improves the "trivial" bound for certain values of k (as a function of d). The paper presented above improves the embedding speed for much larger ranges of k, and draws on a number of tools from Banach spaces and functional analysis, as well as using some coding theory to "derandomize" the construction. It's a good example of practice (how do we implement the JL-transform efficiently ?) driving new and nontrivial theory results.

Saturday, January 19, 2008

ALENEX/ANALCO 2008: Invited talks

I'm in San Francisco for the SIAM-ACM Symposium on Discrete Algorithms (SODA), and as is tradition, Saturday is the day for ALENEX (the workshop on experimental algorithms) and ANALCO (the workshop on analytic combinatorics).

ALENEX:

The ALENEX invited talk today was on routing problems. Rolf Mohring from TU Berlin presented a pair of case studies of routing problems in the "wild".

The following video explains the first case study:


Actually, the problem he was looking at was a tad simpler: you have containers coming in at Hamburg port, and you have automated robot vehicles that ferry the containers from the loading dock to their temporary storage locations. The problem is to create routes for the vehicles (on the fly) so that they reach their destination at a reasonable time without colliding with each other.

The actual solution involved successive shortest paths on a "time-expanded" flow graph, which in their case was a relatively simple grid graph. The time expansion refers to the fact that an "edge" is occupied between two time instants, and of course travel is blocked on an edge when it's occupied.
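
To make the "time-expanded" idea concrete, here's a toy sketch (the names, the unit travel times, and the data layout are my own assumptions, not details from the talk): a node is a (vertex, time) pair, and an edge can be traversed at time t only if it isn't occupied then.

    import heapq

    def route(grid, occupied, source, target, horizon):
        # Toy shortest path on a time-expanded graph. 'grid' maps a vertex to its
        # neighbours, 'occupied' is a set of (u, v, t) triples meaning edge (u, v)
        # is blocked during time step t, and every move (or wait) takes one step.
        dist = {(source, 0): 0}
        heap = [(0, source, 0)]
        while heap:
            d, u, t = heapq.heappop(heap)
            if u == target:
                return d
            if d > dist.get((u, t), float('inf')) or t >= horizon:
                continue
            moves = [(u, t + 1)]  # waiting in place is always allowed in this sketch
            moves += [(v, t + 1) for v in grid.get(u, []) if (u, v, t) not in occupied]
            for node in moves:
                if d + 1 < dist.get(node, float('inf')):
                    dist[node] = d + 1
                    heapq.heappush(heap, (d + 1, node[0], node[1]))
        return None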

The second application, which in my mind was more impressive, was an instance of large-scale optimization to redesign the Berlin subway system. Perhaps the most impressive aspect of this work was that it's actually being used ! The latest incarnation of the Berlin subway schedule uses the schedule designed by their scheme, and the speaker showed a neat 4 minute Numb3rs-style video at the end targeting the layman that explained their work.

This is a tremendous achievement, both technically and politically. And I can't but wonder whether it's something about Germany itself, the home of classic engineering, that allows optimizations designed by researchers in a university to actually be implemented on a subway system. I'm probably wrong here, in that there must be numerous examples of success stories of OR in the US as well, but it does make me wonder.

Analysis: As examples of "applying algorithms in the real-world", the talk was great. As an example of the intricacies of doing "applied algorithms", it was less impressive: most of the "compromises with the real-world" were the same kinds of modifications to theoretical problems that one might expect to see in papers at ALENEX, and there was little takeaway to be had or lessons to be learned: I'd have loved to hear about how exactly they were able to convince the Berlin subway authority to implement their schemes.

ANALCO:
This was either planned, or a happy coincidence, but the ANALCO invited speaker was Don Knuth, just a few days after his 70th birthday. The title of his talk was 'Puzzling problems', and as befits a Knuth talk, it was a random collection of cute problems in combinatorics. He had promised 5 such, with an exponentially increasing amount of time spent on each successive problem, but he only got through three of them. David's already described them in more detail here; all I can say is that it was a typical Knuthian talk :)

And yes, there were talks ! One talk that I'll draw attention to is on the work by Coleman and Wirth on the feedback arc set problem on tournaments. Given a directed graph in which for each pair of vertices there's a directed edge one way or the other (i.e., a tournament graph), can we delete the fewest edges so that the resulting graph is a DAG ? One important application of this problem is aggregating rankings. Think of all the non-bribed figure skating judges at a competition, each creating a ranking of the performers. How do we aggregate these rankings into a "consensus" ranking ?

What is interesting about this problem is that it relates closely to sorting algorithms. In a sense, you can think of the rank aggregation problem as attempting to "sort" the participants into a total order, much like we have with a sorting problem. An earlier paper by Ailon, Charikar and Newman showed that a quick-sort like heuristic actually achieves a 3-approximation to the optimal solution to the feedback arc set problem, and other local heuristics for feedback arc set can also be related to sorting algorithms (although the analog of insertion sort performs quite badly).
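
Here's a quick sketch of the quick-sort-like idea, in the spirit of the Ailon-Charikar-Newman heuristic (the interface beats(u, v), returning True when the tournament has the edge u -> v, is an assumption of mine, and this sketch makes no approximation claim on its own):

    import random

    def pivot_order(vertices, beats, rng=random):
        # Order a tournament by recursive pivoting: everything that beats the
        # pivot goes to its left, everything the pivot beats goes to its right.
        if len(vertices) <= 1:
            return list(vertices)
        pivot = rng.choice(vertices)
        rest = [u for u in vertices if u != pivot]
        left = [u for u in rest if beats(u, pivot)]
        right = [u for u in rest if not beats(u, pivot)]
        return pivot_order(left, beats, rng) + [pivot] + pivot_order(right, beats, rng)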

Mergesort has not been translated to this framework: it's not obvious (though not hopeless) how to reformulate mergesort as an algorithm for feedback arc set, and indeed one open question from the earlier Ailon et al paper is an analysis of the behaviour of a mergesort-like method.

That's all for today, folks ! Tomorrow the action starts in earnest...

After notes:
  • The hotel wireless works in the lecture rooms (!). It's also free. This must be a first...
  • The location is great. The hotel is on Van Ness and there are tons of restaurants (and Peets) within a block, as well as a drugstore and a Whole Foods.
  • The ALENEX business meeting was the shortest business meeting in recorded history: it took all of 5 minutes. I commend the organizers for their good taste, and can only hope that SODA emulates them.

Thursday, January 17, 2008

A Post-Doc opening

I'll post this on the usual forums shortly, but thought I'd place it here since SODA is coming up.
I'm looking to hire a post-doctoral researcher to work with me at the School of Computing. The position is initially for one year, with the possibility of extension for another year based on mutual consent. Applicants must have expertise in algorithms and computational geometry, and preferably should have experience with high-dimensional geometry, approximations, and large data algorithms (external memory/streaming/etc). Some familiarity with differential geometry is a definite plus.

For more on my research, visit my webpage. To apply, please send me a CV, a statement, and three letters of reference. Feel free to contact me if you have questions about the position.
To add to the above: if you think you might be interested, and will be at SODA, drop me a note and we'll try to meet up.

Wednesday, January 16, 2008

Readable math

Bill Gasarch asks about math books you can actually read. This reminded me of my post from over 3 years ago, "Books that read like a thriller". I'd update that list with
  • Information theory, by Cover and Thomas
  • Randomized Algorithms (Motwani-Raghavan): it's an old book, but even when I read it today, I realize just how well it flows.
  • Convex Optimization (Boyd/Vandenberghe): there's just something about the typesetting :)
  • Computational Complexity (Arora/Barak)
  • Convex Analysis (Rockafellar): A model of precision and clarity
  • (almost, but not quite in the pantheon): Approximation Algorithms (Vazirani)

Two surveys (and a site) on string matching and text processing.

1. A survey on string algorithms and data structures by Paolo Ferragina.
2. A paper surveying the theory and practice of compressed text algorithms, by Ferragina, Gonzalez, Navarro, and Venturini.
3. The "Pizza and Chili" site for test data and sources for compressed text indexing (associated with the paper above).

Tuesday, January 15, 2008

The CACM is dead ! Long live the CACM

Readers of this blog will know that I love to HATE the CACM, the embodiment of all that is wrong in our IT-obsessed, SCIENCE-ignoring culture. Well, the main source of my angst is now gone ! vanished ! replaced by something that <shudder> resembles an actual scientific magazine. Here's the story:

Way back in 2005 (before Web 2.0 ! Prehistoric!), the ACM realized that something needed to be done about the CACM. As Peter Denning tells it,
When David Patterson was president of ACM, many researchers told him they thought CACM had never regained its vaunted glory of the 1970s. Patterson set up a committee to review the current model and propose ways to recharge its content and scope
This committee was chaired by Moshe Vardi, who is now the editor-in-chief (EIC) of the new and improved CACM. He recounts a story that is familiar to all of us:
While the format envisioned in the early 1980s was that of a content-rich magazine, the reduced role of the editorial advisory board combined with a relatively small professional staff meant that most of the content came from submitted articles. Over the years those articles have evolved to be strongly slanted toward Management Information Systems. Over time, a significant segment of the ACM membership lost interest in the publication.
The committee was working off the model of Science, as a fast-turnaround, high-prestige magazine that showcases work of broad interest without necessarily being too technical in focus. The main conclusions of the committee were to leave the more "practice" sides of computing to Queue magazine, and to focus on a structure built around research and news. The new CACM has the following basic outline:
  • News: The news section will have a very distinct “voice,” covering research and practice in computing on a global scale.
  • Practice: CACM will offer coverage of cutting-edge, emerging technology issues.
  • Breakthrough Research: The goal is to bring research articles, covering the broad spectrum of computing research, back to CACM. This will be a combined effort of conference and program chairs (ACM as well as non-ACM) and the CACM editorial board to select the best research material coming out of conferences and share that information with the wide readership of CACM. Each selected research paper will be adapted to the broad CACM readership and will be preceded by a short “Technical Perspective,” giving readers an overview of the important points and significance of the research.
  • Refereed Articles: CACM will continue to publish, as it has since its early days, peer reviewed articles.
  • Opinions and Views: The final component in the content model of the new CACM is a section dedicated to opinions and views, typically of a nontechnical nature.
Of course, the middle two bullets are the new ones, and hopefully will drive CACM back to its roots as a technical magazine. What's interesting to me is that as Peter Denning tells it, almost this exact same restructuring was first proposed in 1983, and floundered after a visit to the offices of Science, where the ACM folks realized that a true research-driven magazine required far more technical manpower than the ACM of the time was willing to commit. Moshe Vardi indicates plans to expand the staff of CACM in order to make sure that the new incarnation can be effective in this mode.

Another important aspect of the new incarnation is a complete graphic design overhaul:
There is also a need for a complete redesign of CACM’s Web site, with the goal of creating a dynamic, rich Web site that is much more than an online store of the magazine’s content. The aim is to think of CACM as consisting of a print publication, a rich Web site, and email channel to readers.
And so we now have the 50th edition of CACM, in its new form. It's extremely unfair to judge a major revamp like this after one issue, but let's see how it looks anyway.
  • The design: It's pretty slick, with a web-based contents and navigation mechanism inside what is called a "Published Web format" designed by a company called Texterity. The feel is PDF-like, but on the web. There are all kinds of nice features for extracting interesting web links from pages, or from sections. Overall, quite snazzy (but then I'm not a professional graphic designer, so what do I know). I will add that given the general 30 year lag between technology in the "real world" and on the ACM sites, this is a ginormously humongous improvement. The interface does seem a little slow to respond to things like page turns and menu options; I'm using the actual PDF for quick reading and text cut-and-pasting.
  • General content: Given the 50 year anniversary, there are many retrospectives, as well as articles by each of the past CACM EICs. These paint a very interesting picture of the rise and fall of CACM, especially in the period when Peter Denning was in charge. He tells of how budget cuts and restructuring led to the decision to remove research content from it. There are perspectives from well known luminaries in the field, including Jon Bentley, Rodney Brooks, and Jeanette Wing, among others. There's also a longer article by Gordon Bell on "a theory of the evolution of a computer".
  • Research content: There are two main "breakthrough research" articles. The first is on MapReduce, the Jeffrey Dean/Sanjay Ghemawat algorithmic design paradigm that runs much of Google's internals, with an introduction by David Patterson. The second is on an old friend, locality sensitive hashing (in a new incarnation by Alex Andoni and Piotr Indyk), and the introduction is by Bernard Chazelle, which basically means you all should IMMEDIATELY stop reading this and go read what he wrote, gems like, "Sharp measure concentrations and easy spectral predictions are the foodstuffs on which science feasts."

    Both of these are excellent choices for "broad interest" articles. MapReduce has revolutionized the way we think (or should think) about large-scale computing, and there's even been some attempt at an algorithmic model around it. Nearest neighbour search is one of the most basic operations in any kind of data analysis, and I know for a fact that many people actually use LSH for their data mining problems. More of this, please !
  • The opinion section: Not much to remark upon in this issue: definitely nothing that would satisfy a focus group attendee's wish for "blood on the pages". But again, time will tell.
For years, we've all been complaining about the CACM, and the new revamp definitely deserves our attention. It'll take some time before we can tell if the new version of CACM is truly better, but the signs are looking good.

Saturday, January 12, 2008

Core Sets As Grids

This is another post in my ongoing series of missives from the algorithms seminar I ran last semester (yes I know, I'm late: I have a good excuse though, and it currently weighs 6 lb 11oz).

One of the most exciting developments in the area of high dimensional geometry was the introduction of the idea of core sets. The basic premise is this: suppose we have a geometric optimization problem on a set of points P of size n (Amen!) that can be solved exactly in time p(n) (where p might be some relatively unpleasant polynomial in n). Suppose further that we can select a set of points S (not necessarily contained in P) of size g(1/epsilon) (the "core-set") such that the value of the optimal solution on S is close (i.e., within a 1+epsilon factor) to the value on P. Then we can run our expensive algorithm on S, rather than on P. The new running time is n (typically) to compute S, plus p(g(1/epsilon)) to obtain the approximate solution. What we've done here is construct a linear-time approximation scheme for our optimization.

There's a bit of nice magic here: we are essentially claiming that a set of points with size independent of the input can suffice to give a good approximation to the desired answer.

There are many ways of understanding the real power of core sets, and when they exist: perhaps the most surprising result is that for certain high dimensional problems, it is possible to find a core-set that has size independent of both the input size and the input dimension, thus circumventing the famed curse of dimensionality. But that's not what I want to discuss here. Rather, I want to present the view of core sets as a kind of generalized grid: I find this viewpoint less "magical" and often useful when trying to figure out when a core-set might actually exist.

Sparsifying via a grid:
Suppose we have a large collection of points in the plane, and we wish to "shrink" the input to some manageable size. The first thing we can try is to lay down a grid, with side length R, and snap each point to the center of the grid cell it's in. If n is large compared to R, we'd expect many points to lie in each cell. If the diameter of the point set is D, then we need (D/R)^2 grid cells.

There are two problems with this: firstly, D could be much larger than n. In fact, since D is a function of the coordinates of the input, it could be exponentially larger than n ! Secondly, the perturbation of each point to the center of its grid cell incurs an error of up to R/\sqrt{2}, and so R needs to be very small.

Consider the problem of computing the minimum enclosing ball of a collection of points. The radius of this ball is within a constant factor of D by the triangle inequality. This means that if we're willing to tolerate a multiplicative error in our bounds, we can perturb each point by epsilon*D, while maintaining an epsilon-approximation. Here, setting R = epsilon' D suffices, and we only need O(1/epsilon^2) cells. Voila ! a core-set.
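
Here's a sketch of that grid-snapping step in the plane. The helper name is mine, and the brute-force diameter computation is just for simplicity; this is an illustration of the argument, not a tuned implementation.

    import numpy as np

    def grid_coreset(points, eps):
        # Snap each point to the center of a grid cell of side R = eps * D and
        # keep one representative per non-empty cell: O(1/eps^2) points survive,
        # and each original point moved by at most R/sqrt(2).
        P = np.asarray(points, dtype=float)
        D = max(np.linalg.norm(p - q) for p in P for q in P)  # diameter, brute force
        R = eps * D
        cells = {}
        for p in P:
            key = tuple(np.floor(p / R).astype(int))
            cells[key] = (np.floor(p / R) + 0.5) * R  # the center of p's cell
        return np.array(list(cells.values()))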

Of course, we're not going to be this lucky all the time: for example, if we want to compute the width (the smallest distance between parallel lines that bound the point set), then the diameter has no relevance: the width of a point set can be zero even if its diameter is large (if all points are on a line). In that case, how would we choose a grid size that would work for us ?

Approximating all directions:
The answer, as it turns out, is to try and approximate the "diameter" in all directions. More precisely, we can define the extent of a point set along a certain direction as the diameter of the projection along that direction. Having done so, the property we desire of our "grid" is that it approximates this extent function along each direction. For many interesting optimization problems, the extents suffice to compute the optimal solution, and so approximating the extents suffices to approximate the solution.

The easiest way of approximating extents would be to compute the convex hull of the point set. This might not be a sparse representation in general. A better answer is to place a grid, and pick out all non-empty grid cells along the boundary (actually, even picking the extreme cells in each vertical column suffices).

Notice that by concerning ourselves only with extents, we have essentially eliminated the problem of large diameters (because we can always scale the point set). What remains is the problem of variation in the extents. If a point set has a very small extent along a direction (say the direction that defines its width), then in order to capture that extent correctly, we'd need a very fine grid resolution.

Exploiting fatness:
The problem here appears to arise from the fact that a point set can be skinny. The notion of fatness in computational geometry is a very powerful one: there are many ways of defining it, but in all cases the measure captures the degree to which a convex shape is "rounded" versus "skinny" (one definition is the ratio between the radii of the minimum enclosing ball and the largest inscribed ball). If a point set is alpha-fat, then up to a (constant) factor alpha, it's round, and therefore has comparable extents in all directions.

A suitable affine transform can make a point set alpha-fat (for some constant alpha). Although an affine transform will play havoc with distances, it preserves ratios: in particular, the ratio between exact and approximate extents.

And with this, we are done. We transform the point set so that it's alpha-fat, and choose a grid size that's a function of alpha and epsilon (after scaling the point set so that the diameter is 1). Then we snap all points to grid cells as before, and pick up the highest and lowest grid cells in each vertical column. What we get is a point set that captures the important characteristics of the input, but whose size depends solely on the error parameters: in other words, a core-set.
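
In the plane, the last two steps look something like the sketch below. The fattening transform and the scaling are assumed to have been applied already, and I keep the extreme input points of each column rather than the extreme cells, which is a standard variant of the same idea; all names are illustrative.

    import numpy as np

    def extent_coreset_2d(points, eps):
        # Assumes the point set has already been made fat and scaled so that its
        # diameter is roughly 1. Bucket points into vertical columns of width eps
        # and keep the lowest and highest point in each column.
        P = np.asarray(points, dtype=float)
        columns = {}
        for p in P:
            col = int(np.floor(p[0] / eps))
            lo, hi = columns.get(col, (p, p))
            columns[col] = (p if p[1] < lo[1] else lo, p if p[1] > hi[1] else hi)
        return np.array([q for pair in columns.values() for q in pair])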

Extensions:
What makes this idea particularly powerful is that it can be extended. For example, we can construct core sets for the "extents" of a set of linear functions by going to the dual space. Similarly, we can construct core sets for collections of polynomials by using the linearization trick and then going to the dual space (in higher dimensions). But ultimately, it boils down to constructing a clever grid.

Note: most of this exposition comes from a reading of Geometric Approximation via Coresets, by Pankaj Agarwal, Sariel Har-Peled and Kasturi Varadarajan.

Thursday, January 10, 2008

Happy Birthday, Don Knuth !

Today is Don Knuth's birthday, and by way of tribute, there are posts from David, Luca and Scott that commemorate him. Among the many obsessions of Knuth is one that went into the new editions of TACP: typesetting the names of cited authors in their native language. This meant a laborious process of converting names into their original scripts and then doing a Unicode translation. I was at Stanford for some of this, and did my (tiny) share of translations in Hindi (and even once in Tamil!).

But what I wanted to highlight is a very innocuous result from Vol 2 of TACP that's well known, elementary to prove, and provides a key tool for much of the work on streaming that came much later.
Can we pick a random element from a set of items whose cardinality we do not know ?
In this setting, we can examine an item and decide whether to retain it or discard it, but at the end of the process, we must hold in our hands an element chosen uniformly at random.

This is the kind of problem you might encounter on a Google/Microsoft interview: it's simple to state, and so is the answer, but it requires at least some "lateral" thinking. Without further ado, here's the answer (so stop right here if you want to puzzle it out yourself)

Pick an item and store it. Pick the next one, and replace the first one with it with probability 1/2. Pick the third one, and do a replace with probability 1/3, and so on. At the end, the item you've stored has a probability of 1/n of being any particular element.
This is of course a streaming algorithm. You can imagine the items arriving one by one, and using a local counter and a single word of storage, you can execute the algorithm. Moreover, this can be generalized to maintaining a random sample of size k: in general, the technique is called reservoir sampling, and has spawned many generalizations.
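
The whole algorithm fits in a few lines (a sketch; the size-k generalization simply replaces the single slot with an array of k slots):

    import random

    def reservoir_sample(stream, seed=None):
        # Keep the i-th item with probability 1/i. At the end, every item in the
        # stream was retained with probability exactly 1/n, without n being known
        # in advance.
        rng = random.Random(seed)
        sample = None
        for i, item in enumerate(stream, start=1):
            if rng.random() < 1.0 / i:
                sample = item
        return sample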

There are at least three different ways of arguing the correctness of this procedure - another reason why I find it so elegant.

1. Suppose the sampling procedure is correct for the first k items. Now when item k+1 appears, it becomes the new sample with probability 1/(k+1). Every prior item currently has a probability of 1/k of being the sample, and therefore remains the chosen sample item with probability k/(k+1) * 1/k = 1/(k+1). Hence, by induction, the procedure works.

Like most induction proofs, this argument leaves something wanting. Here's a less rigorous, but in my mind more appealing argument:

2. Suppose we pick the kth item with probability p_k. Let an adversary suddenly halt the stream at this step. If p_k is not equal to 1/k, then the last item is not picked with a uniform probability over the current stream. So in some sense, the algorithm has no choice, since it doesn't know when the stream ends.

3. A more muscular proof, of the "let's roll up our sleeves and crank things out" variety, goes as follows. Let p_i be the probability that the i^th item is picked for the sample, replacing the current sample element. The probability that item i survives as the final sample is

    p_i (1 - p_{i+1}) (1 - p_{i+2}) ... (1 - p_n)

Since these numbers must be equal for all i, we can equate any two of them, in particular the terms for i and i+1, which yields the expression

    p_i (1 - p_{i+1}) = p_{i+1}

Some simple telescoping later, once we observe that p_1 = 1, gives the desired probabilities.

CS researchers and the practitioners of computer science tend to view each other warily from a distance, each side convinced that the other has absolutely no idea of what "real CS" is about. Don Knuth straddled both worlds effortlessly, gaining respect from 15 year old hackers and 50 year old researchers alike. And that's the most tremendous feat of all.

Update: More tributes, by Bill Gasarch and the Quantum Pontiff.

Monday, January 07, 2008

Levi Conant Prize for Hoory, Linial and Wigderson

The joint meetings of the AMS and MAA are going on right now, and among the prizes being awarded is the Levi L. Conant Prize for the best expository article in the Notices or the Bulletin of the AMS in the last 5 years. One of the two awardees is the group consisting of Shlomo Hoory, Nati Linial and Avi Wigderson, for their article 'Expander graphs and their applications' in the Bulletin of the AMS (here's the related course).

Congratulations ! On a side note, I'm truly impressed by the number of prizes awarded for articles that are expository or targeted at the general public. I don't think that the mere creation of prizes encourages people to write such articles; rather, the existence of such prizes is a post facto demonstration of the intrinsic value attributed to exposition in the mathematics community.

Thursday, January 03, 2008

A call for information and help

In response to Muthu's query about invited speakers at conferences, Iftah Gamzu at Tel Aviv University set up a very nice page containing all the relevant information. He also has the next generation of the 'Accepted papers at theory conferences' page originally maintained by Anupam Gupta, as well as a page with upcoming dates (conference/submission) for most theory conferences.

Iftah sent me a request: he's missing information on speakers/talks for many conference-year combinations, and was hoping for some help from the community. Here's what's missing:
  • Invited talks at
    • STOC 2000.
    • RANDOM-APPROX 2003 and 2005.
    • SoCG 2001 and 2004.
    • IPCO 2007 summer school.
  • Titles of the talks
    • CCC 2001.
    • Gene Myers, Christos Papadimitriou, and Ehud Shapiro at ESA APPROX 2002.
    • Marino Zerial at ESA 2005.
    • Leslie Lamport, and Butler Lampson at PODC 2003.
    • Elias Koutsoupias at SPAA PODC 2005.
    • Robin Thomas at STACS 2004.
    • Daniel Lewin at SPAA 2000.
    • Andrew Chien at SPAA 2002.
    • Adi Shamir at ICALP 2005.
  • Name of the speaker of
    • "Discrete Tomography: Reconstruction under Periodicity Constraints" at ICALP 2002.
    • "Program Debugging and Validation Using Semantic Approximations and Partial Specifications" at ICALP 2002.
  • Can someone confirm that there were no invited talks at
    • FOCS 2004, CCC 2004, and SPAA 2004?
If anyone reading this has the relevant information, they can either post it in the comments, or email him directly.
