## Sunday, May 11, 2008

### P vs NC II: The main theorem, and a (very high level) proof skeleton

After some initial throat-clearing, we're now in a position to state the main result of this paper. To save myself some typing, I'll refer to the model of computation (PRAM without bit operations) as PRAM-wb from now on.

Theorem.
• Min-cost flow on k nodes cannot be solved in the PRAM-wb model deterministically (or randomized) in $\sqrt{k}/b$ (expected) time using $2^{\sqrt{k}/b}$ parallel processors, even if we assume that the costs and capacities are integers with bit length at most $ak$ for some large enough positive constants a, b.
• A similar result holds for max-flow, with the limits on capacities being replaced by $ak^2$
• All lower bounds hold even if we merely desire an additive approximation.
Corollary.
A min-cost-flow or max-flow problem of total input bitlength N cannot be solved in the PRAM-wb model deterministically (or with randomization) in time $N^c$ (expected) using $2^{N^C}$ processors, for some constant c.
Discussion.
Before we dive in, it's useful (or at least it was for me), to take apart the statement of the theorem itself. Firstly, we note that the bounds hold even with randomization, which of course means that the kinds of obstructions that Mulmuley constructs are "frequent". The actual proof goes via the standard Yao-minimax principle, and we'll get to it once we've completed the deterministic lower bound.

Another interesting point is that the lower bound holds for additive approximations as well. I haven't read far enough ahead for this to be obvious, but I suspect that it has something to do with the fact that we'll always be dealing with integers, and so intuitively an approximation that can get us "between" integers will collapse to the right answer.

Finally, a note on the bitlengths. One might argue that if the bitlengths were "constant", the problem would be solvable. This is in fact the case, as Mulmuley discusses: it is actually important that the bitlengths are "long enough". If the bitlengths are short (say O(log n)), then we could read off all the bits efficiently using the procedure described in a previous post, at which point we have access to the full power of PRAM. At this point, we can solve max flow via an RNC algorithm for bipartite matching. So to get the strong bound on the number of processors, we need the bitlengths to be long enough.

But we don't want them to be too long. This constraint on the bitlengths feeds directly into the corollary, since we can relate N and k using the upper bound on the bitlength. However, the larger the bitlengths get, the weaker the bound on the running time expressed in terms of N. So it's actually useful that the problem is hard even for inputs with "smaller" bitlengths.

An overview of the proof strategy.
As I mentioned earlier, the technique used to prove a lower bound for bit extraction is a useful template to follow. Let's consider the basic argument.
1. Examining the operations permitted by the model, come up with a bound on the number of distinct paths any bounded computation can take.
2. Come up with a geometric description of the way the space of inputs is carved out by these paths.
3. Show that if we mark inputs as either being in or out of the language (the decision problem say), that the "intrinsic" complexity of this space prevents the geometric description constructed in (2) from being able to carve out the IN points from the OUT points: loosely, that the model does not have enough geometric expressivity to separate good from bad.
[Aside: stated this way, there's a very VC-dimension feel to the argument. For example, let's say your "model" consists of "things you can do with one hyperplane", and your language consists of "two diagonally opposed points on the unit square", then the famous perceptron result is basically that the "model" can't capture the "language"].

Stated this way, it might not seem to surprising that algebraic geometry starts playing a role in the argument. In order to perform Step (2), we need to talk about invariants under computations that can be expressed as a series of algebraic operations (this is one reason why the omission of bit operations makes things more tractable), and algebraic geometry, which deals (at the most basic level) with the geometry of solutions to algebraic equations, is the right toolkit to use.

It would be remiss of me if I didn't point out that at least some of these ideas are not new (once you're looking at them from high enough). Dobkin and Lipton considered linear decision trees, Steele and Yao generalized these lower bounds to algebraic decision trees, and Ben-Or generalized further to algebraic computation trees (see Jeff Erickson's notes on algebraic lower bounds, and also the postscript on Joel Friedman's more recent work). In all these cases, the rough lower bound argument worked by showing that
1. The computations expressible in the model could be captured geometrically.
2. A bound on the computation could be expressed in terms of an intrinsic complexity of the target function. In all the above, the intrinsic complexity was topological: number of connected components, or even a sum of Betti numbers.
3. The target function had a "high" intrinsic complexity (large number of components etc).
So what's the notion of "intrinsic complexity" in our setting ? This goes to step (3) in the high level sketch, and leads to the notion of parametric complexity. The idea is to consider a specific class of candidate inputs that can be described by parameters: for example, specifying that each edge in a graph has a capacity that's a linear function of some parameter $\lambda$. This is a "magic step", in the sense of Gowers: it seems pulled out of a hat because I don't understand why the parametric lens reveals the lower bound to us (of course, it might be clearer to others: if so, do speak up :)).

One can define a notion of parametric complexity (basically the number of breakpoints in the optimal value as the parameters change), and show that it is high for the problems under consideration. This takes care of one part of step (3): showing that the intrinsic complexity measure is high. The next step is to show that if this is true, there exists a way of parametrizing the inputs so that the IN and OUT inputs are distributed badly (intuitively, we'd want a strategy that doesn't allow large chunks of INs and OUTs to congregate in input space) . Finally, the proof completes by showing that the "low degree" regions that PRAM-wb can carve out cannot separate these badly distributed points.

This is all very vague, and indeed the "details", if I dare call them so, are tremendously complex. It's a good idea to keep this very high level line of attack in one's head as we go forward: in subsequent posts I'll dive down into the proofs and start getting very detailed, and it will be easy to lose the forest for the trees (pun unintended) without a roadmap.

p.s Recent work by Joel Friedman on algebraic geometry-baed methods for proving circuit lower bounds is more directly in the program of research initiated by the work on decision trees. Understanding his work is extremely well beyond my ability, and requires a deep understanding of modern algebraic geometry. I'll leave it to others to try and explain his work for the masses, merely noting that this at least shows ONE other approach based on algebraic geometry for attacking lower bound questions.