Sunday, May 11, 2008

P vs NC II: The main theorem, and a (very high level) proof skeleton

After some initial throat-clearing, we're now in a position to state the main result of this paper. To save myself some typing, I'll refer to the model of computation (PRAM without bit operations) as PRAM-wb from now on.

Theorem.
  • Min-cost flow on k nodes cannot be solved in the PRAM-wb model deterministically (or randomized) in (expected) time using parallel processors, even if we assume that the costs and capacities are integers with bit length at most for some large enough positive constants a, b.
  • A similar result holds for max-flow, with the limits on capacities being replaced by
  • All lower bounds hold even if we merely desire an additive approximation.
Corollary.
A min-cost-flow or max-flow problem of total input bitlength N cannot be solved in the PRAM-wb model deterministically (or with randomization) in time (expected) using processors, for some constant c.
Discussion.
Before we dive in, it's useful (or at least it was for me), to take apart the statement of the theorem itself. Firstly, we note that the bounds hold even with randomization, which of course means that the kinds of obstructions that Mulmuley constructs are "frequent". The actual proof goes via the standard Yao-minimax principle, and we'll get to it once we've completed the deterministic lower bound.

Another interesting point is that the lower bound holds for additive approximations as well. I haven't read far enough ahead for this to be obvious, but I suspect that it has something to do with the fact that we'll always be dealing with integers, and so intuitively an approximation that can get us "between" integers will collapse to the right answer.

Finally, a note on the bitlengths. One might argue that if the bitlengths were "constant", the problem would be solvable. This is in fact the case, as Mulmuley discusses: it is actually important that the bitlengths are "long enough". If the bitlengths are short (say O(log n)), then we could read off all the bits efficiently using the procedure described in a previous post, at which point we have access to the full power of PRAM. At this point, we can solve max flow via an RNC algorithm for bipartite matching. So to get the strong bound on the number of processors, we need the bitlengths to be long enough.

But we don't want them to be too long. This constraint on the bitlengths feeds directly into the corollary, since we can relate N and k using the upper bound on the bitlength. However, the larger the bitlengths get, the weaker the bound on the running time expressed in terms of N. So it's actually useful that the problem is hard even for inputs with "smaller" bitlengths.

An overview of the proof strategy.
As I mentioned earlier, the technique used to prove a lower bound for bit extraction is a useful template to follow. Let's consider the basic argument.
  1. Examining the operations permitted by the model, come up with a bound on the number of distinct paths any bounded computation can take.
  2. Come up with a geometric description of the way the space of inputs is carved out by these paths.
  3. Show that if we mark inputs as either being in or out of the language (the decision problem say), that the "intrinsic" complexity of this space prevents the geometric description constructed in (2) from being able to carve out the IN points from the OUT points: loosely, that the model does not have enough geometric expressivity to separate good from bad.
[Aside: stated this way, there's a very VC-dimension feel to the argument. For example, let's say your "model" consists of "things you can do with one hyperplane", and your language consists of "two diagonally opposed points on the unit square", then the famous perceptron result is basically that the "model" can't capture the "language"].
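
(To spell that aside out: suppose a single hyperplane $w_1 x_1 + w_2 x_2 + b = 0$ had $(0,0)$ and $(1,1)$ strictly on its positive side and $(0,1)$ and $(1,0)$ strictly on its negative side. Adding the first two inequalities gives $w_1 + w_2 + 2b > 0$, while adding the last two gives $w_1 + w_2 + 2b < 0$, a contradiction. So the "model" of single hyperplanes really cannot carve out that "language".)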

Stated this way, it might not seem too surprising that algebraic geometry starts playing a role in the argument. In order to perform Step (2), we need to talk about invariants under computations that can be expressed as a series of algebraic operations (this is one reason why the omission of bit operations makes things more tractable), and algebraic geometry, which deals (at the most basic level) with the geometry of solutions to algebraic equations, is the right toolkit to use.

It would be remiss of me if I didn't point out that at least some of these ideas are not new (once you're looking at them from high enough). Dobkin and Lipton considered linear decision trees, Steele and Yao generalized these lower bounds to algebraic decision trees, and Ben-Or generalized further to algebraic computation trees (see Jeff Erickson's notes on algebraic lower bounds, and also the postscript on Joel Friedman's more recent work). In all these cases, the rough lower bound argument worked by showing that
  1. The computations expressible in the model could be captured geometrically.
  2. A bound on the computation could be expressed in terms of an intrinsic complexity of the target function. In all the above, the intrinsic complexity was topological: number of connected components, or even a sum of Betti numbers.
  3. The target function had a "high" intrinsic complexity (large number of components etc).
So what's the notion of "intrinsic complexity" in our setting ? This goes to step (3) in the high level sketch, and leads to the notion of parametric complexity. The idea is to consider a specific class of candidate inputs that can be described by parameters: for example, specifying that each edge in a graph has a capacity that's a linear function of some parameter $\lambda$. This is a "magic step", in the sense of Gowers: it seems pulled out of a hat because I don't understand why the parametric lens reveals the lower bound to us (of course, it might be clearer to others: if so, do speak up :)).

One can define a notion of parametric complexity (basically the number of breakpoints in the optimal value as the parameters change), and show that it is high for the problems under consideration. This takes care of one part of step (3): showing that the intrinsic complexity measure is high. The next step is to show that if this is true, there exists a way of parametrizing the inputs so that the IN and OUT inputs are distributed badly (intuitively, we'd want a strategy that doesn't allow large chunks of INs and OUTs to congregate in input space). Finally, the proof completes by showing that the "low degree" regions that PRAM-wb can carve out cannot separate these badly distributed points.
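
To make "breakpoints in the optimal value" concrete, here's a toy Python sketch (entirely my own illustration: the four linear cost functions and the sampling grid are made up, and a real instance would have exponentially many candidate solutions). Each candidate solution's cost is a linear function of a single parameter $\lambda$, so the optimal value is the lower envelope of a family of lines, and a breakpoint is a value of $\lambda$ where the optimal solution changes.

# each candidate solution's cost is a linear function a*lam + b of the parameter
solutions = [(1.0, 0.0), (-1.0, 4.0), (0.25, 1.0), (-0.25, 2.5)]

def best_solution(lam):
    # index of the solution with minimum cost at this parameter value
    return min(range(len(solutions)),
               key=lambda i: solutions[i][0] * lam + solutions[i][1])

def count_breakpoints(lo, hi, steps=10000):
    # count how often the optimal solution changes as lam sweeps the interval
    breaks, prev = 0, None
    for i in range(steps + 1):
        cur = best_solution(lo + (hi - lo) * i / steps)
        if prev is not None and cur != prev:
            breaks += 1
        prev = cur
    return breaks

print(count_breakpoints(0.0, 10.0))   # 2 breakpoints for this toy family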

This is all very vague, and indeed the "details", if I dare call them so, are tremendously complex. It's a good idea to keep this very high level line of attack in one's head as we go forward: in subsequent posts I'll dive down into the proofs and start getting very detailed, and it will be easy to lose the forest for the trees (pun unintended) without a roadmap.

p.s Recent work by Joel Friedman on algebraic geometry-based methods for proving circuit lower bounds is more directly in the program of research initiated by the work on decision trees. Understanding his work is well beyond my ability, and requires a deep understanding of modern algebraic geometry. I'll leave it to others to try and explain his work for the masses, merely noting that this at least shows ONE other approach based on algebraic geometry for attacking lower bound questions.

Wednesday, May 07, 2008

A note on bit operations.

In the last post, we saw a lower bound for bit probing (specifically, extracting the lower-most bit of a number) in the PRAM-without-bitops model. Now clearly this can be computed efficiently in P, so why isn't this enough of a proof for the separation of the two classes ?

The answer is subtle. In a PRAM model, excluding bit operations is a nontrivial restriction, because, as we saw, there's no parallel-efficient way of extracting bits from a word without bit operations. However, in P, this is not true. In time linear in the bitlength, we can extract every single bit, and then carry on. Thus, P-with-no-bitops is as powerful as P itself !

Mulmuley argues that the "correct" notion of excluding bit operations in P is that of an algorithm that might run in time that depends on the bit length, but without bitops, in the way that algorithms like the ellipsoid method work. But an easier approach is to exclude algorithms whose run time depends on the bit length, yielding the class of strongly polynomial algorithms. Note that this class (which he calls SP) still contains the P-complete problems max-flow and min-cost flow.

Once we do that, extracting bits is no longer an "easy" operation in SP. The "input size" is 1, and so any bit-extracting algorithm is only permitted a constant number of steps when extracting a bit. Therefore, the apparent separation doesn't really hold.

Formally, what is proved in this paper is that SP strictly contains C (the class of PRAM-without-bitops algorithms), via the separation result for max flow.

More math-blogging

Gil Kalai has a blog (at least for the next year) (HT 0xDE)

Monday, May 05, 2008

P vs NC Part I: Preliminaries

We start our exploration of Mulmuley's result on P vs NC with some definitions, and an "easy" example of a lower bound.

(For the definition of P and NC, visit the Complexity Zoo)

1. P-completeness

NC is our proxy for "can be parallelized efficiently" just like P is our proxy for "efficient algorithm". P contains NC, and as with P vs NP, the interesting question is whether the containment is strict: that is, are there problems that are inherently sequential ?

A theorem that separates P and NC will show that there's some problem in P that cannot be parallelized. Obviously, we want to pick a hard problem in P. P-completeness captures the idea of 'hard-core' problems for P, and the equivalent of SAT for P-completeness is the circuit value problem: given a circuit, evaluate it. We'll actually be looking at a different P-complete problem here, but one that's equally familiar: min-cost flow on a network with n nodes.

Remember, this result DOES NOT show that P strictly contains NC. What it shows is that P strictly contains a fairly general (but still restricted) subclass of NC.

2. The PRAM model

A convenient computational model to work with when talking about NC algorithms is the P(arallel) RAM model. PRAMs were all the rage in the 80s when supercomputers were being built, fell out of favor in the 90s when parallel computing lost its sheen, and are possibly making a comeback today in the world of multicore systems, GPU clusters, and MapReduce.

As the name suggests, a PRAM consists of many processors, each able to run a general purpose program with branchings, indirections, and the arithmetic operations +, −, and ×. A problem is in NC if it can be implemented to run in polylogarithmic parallel steps on polynomially many machines (where in one parallel step, each machine might perform one step of a computation).

A bit operation on a PRAM is any operation that treats data as bit-strings, such as AND, OR, and other boolean operations on bitstrings, or even 'extract-bit': get a particular bit of a string. All such operations will be excluded in the models considered here.

It will be important to pay attention to input size. As always, the input size can either be described by n, the cardinality of the input (the quantity counting the number of "things"), or by N, the bitsize (the quantity measuring the number of bits needed to write everything down). When we talk about polylogarithmic time and polynomial number of processors, unless specified otherwise, we will be referring to polynomials in n AND N.

In the non-uniform setting, we'll assume that we're presented with a PRAM that for fixed n, N, runs in time t(n, N) with p(n, N) processors. Each input consists either of an integer, or a rational with integer numerator and denominator presented separately (so that we can do division more easily). Division of two rationals is encoded via multiplication, so that p/q is computed as p * 1/q, and 1/q is computed by reversing the numerator and denominator of q.
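
A toy Python sketch of how I read this convention (the class and its names are mine, not the paper's): rationals are kept as a separate numerator and denominator, nothing is ever reduced to lowest terms (that would require bit-length-dependent work), and division is just multiplication by the swapped pair.

class Rat:
    # rational kept as a separate integer numerator and denominator
    def __init__(self, num, den=1):
        self.num, self.den = num, den

    def __add__(self, other):
        return Rat(self.num * other.den + other.num * self.den,
                   self.den * other.den)

    def __sub__(self, other):
        return Rat(self.num * other.den - other.num * self.den,
                   self.den * other.den)

    def __mul__(self, other):
        return Rat(self.num * other.num, self.den * other.den)

    def recip(self):
        # 1/q: just swap numerator and denominator
        return Rat(self.den, self.num)

    def __truediv__(self, other):
        # p/q is encoded as p * (1/q)
        return self * other.recip()

half_of_five = Rat(5) / Rat(2)
print(half_of_five.num, half_of_five.den)   # 5 2: no floor, no bits touched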

The cost model is unit-cost, which means that any operation takes constant time, regardless of the relative sizes of the inputs. It is very important to exclude the floor operation when we do this: if not, judicious use of the floor function along with unit cost operations would allow us to solve PSPACE-Complete problems in polynomial time ! Since we are only concerned about lower bounds, assuming unit-cost for operations can only make the lower bound weaker than it actually is, so it's a reasonable thing to do.

It'll actually be useful to consider four variants of the basic PRAM-without-bit operations model.
  1. Arithmetic PRAM: In this model, the running time depends on the cardinality of the input alone, rather than the total bitlength.
  2. Linear PRAM: Here, at least one of the operands in any multiplication operation must be a constant.
  3. Arithmetic Linear PRAM: Combine the two above
  4. PRAM: None of the above restrictions
We will also elide the difference between the various memory access protocols (EREW, CRCW etc) since they only affect the overall running time by a polylogarithmic factor through well-known simulation results.

3. Are these good models ?

The dilemma of lower bounds is the problem of finding models that are rich enough to describe actual algorithms, while being weak enough that we have any hope of proving lower bounds in the first place. Is the PRAM-without-bit-operations a reasonable model of parallel computation ?
Mulmuley argues that it is, by pointing out that this model captures "virtually all" known parallel algorithms for optimization and algebraic problems. Even the weaker models proposed have exemplars: for example, one problem that admits an arithmetic PRAM algorithm is solving linear systems over rationals. I am not aware of any algorithm that appears to need bit operations in any nontrivial way, and would welcome pointers to possible candidates.

4. The problem of bit extraction.

Since we've excluded bit operations from consideration, it's natural to wonder how expensive it really is to extract a bit (or in other words, simulate a step from a general PRAM). It turns out that a very simple version of the general proof argument can be used to show a lower bound here, and although it's not terribly relevant, it's useful as a warmup exercise.

Proposition: The lowest order bit of an n-bit operand cannot be extracted in $\sqrt{n}/a$ time using $2^{\sqrt{n}/a}$ processors in the PRAM-without-bitops model, for some large enough constant a.

Why on earth should this be true ? After all, to test for odd/even, all we have to do is divide the number by 2 and take the remainder ! But how do we do that ? The only operations permitted are on integers or rationals, and the operation "5/2" will merely return the rational (5,2). If we had a floor function, we could compute the remainder as $x - 2\lfloor x/2 \rfloor$, but we don't have it.

On the other hand, is this tight ? Clearly, if the bitlength is log-bounded, you can guess it by doubling, and then extract bits starting with the most significant one. This can be done in O(l) time sequentially, if the bitlength is l. Generalizing, you can split an l-length string into pieces of size roughly $\sqrt{l}$, and work on each piece, starting with the most significant one, for a constant number of parallel time units. On such a piece, you have $2^{\sqrt{l}}$ processors guessing all possible values, and then check which one is correct. Subtracting, you can then proceed to the next piece. This is one way of understanding where the bounds in the proposition come from.
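
Here's a sequential Python simulation of that guessing procedure (my own sketch of the idea above, not code from the paper): the only operations applied to the input are comparison, addition, subtraction and multiplication, with no floor, mod or shifts; the inner loop over all possible piece values is what the parallel machine would do in one round with its $2^{\sqrt{l}}$ processors.

import itertools

def extract_bits_no_bitops(x, l):
    # recover the bits of an l-bit nonnegative integer x, most significant first,
    # by guessing sqrt(l)-sized pieces and subtracting them off
    s = 1
    while (s + 1) * (s + 1) <= l:          # piece size ~ sqrt(l), arithmetic only
        s += 1
    pow2 = [1]
    for _ in range(l):
        pow2.append(pow2[-1] * 2)          # powers of two by repeated doubling

    bits, remaining, pos = [], x, l
    while pos > 0:
        width = min(s, pos)
        pos -= width
        best_guess, best_v = (0,) * width, 0
        for guess in itertools.product((0, 1), repeat=width):   # the parallel guesses
            v = 0
            for b in guess:                # value of the guessed piece
                v = v * 2 + b
            if v > best_v and v * pow2[pos] <= remaining:
                best_guess, best_v = guess, v
        remaining -= best_v * pow2[pos]
        bits.extend(best_guess)            # the guesser knows its own bits for free
    return bits                            # bits[-1] is the lowest-order bit

assert extract_bits_no_bitops(42, 6) == [1, 0, 1, 0, 1, 0]
assert extract_bits_no_bitops(5, 4)[-1] == 1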

So if we're convinced that the 'form' of the proposition makes sense, let's see how we might go about proving it.

Proof outline:
Assume that there actually is a machine that can return the lowermost bit within the desired bounds: $t(n) = \sqrt{n}/a$ time and $p(n) = 2^{\sqrt{n}/a}$ processors. Suppose we simulate the action of this machine on two inputs x, x' up to time t. If the machine, up to time t, behaves the same on both inputs, then we say that they're in the same t-equivalence class.

Can we bound the total number of such classes ? Note that if we can do that, then we might have a hope of using the pigeonhole principle to argue that at least some class might have too many elements in it: since all elements in an equivalence class will yield the same outcome, we might be able to use this to create a contradiction.

Consider the branch operations executed by the machines at the (t+1)^th step for two t-equivalent inputs x, x'. Since the computation is identical for the two inputs up to the previous step, any branching operations consist of evaluating the sign of some polynomial in x (because all the stored memory locations are polynomials in x). Each of these polynomials has degree at most $2^t$ (each multiplication can at most double the degree), and therefore has at most $2^t$ real roots. Collecting all these roots together for all machines, we get a subdivision of the real line into at most $2^t \cdot p(n) + 1$ intervals, and (this is the crucial part) in each of these intervals, the answers to sign queries for any of the polynomials do not change.

[Aside: this trick, of constructing an arrangement over a geometric space such that each cell contains a fixed answer to a problem, is fairly common in computational geometry].
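
A tiny numerical illustration of this step (the three polynomials below are arbitrary stand-ins for branch tests; the real argument collects them over all processors and all time steps):

import numpy as np

branch_polys = [
    np.poly1d([1, -5]),          # x - 5
    np.poly1d([1, 0, -10]),      # x^2 - 10
    np.poly1d([1, -3, 0, 2]),    # x^3 - 3x^2 + 2
]

roots = sorted(r.real for p in branch_polys for r in p.roots if abs(r.imag) < 1e-9)
print(len(roots), "real roots, so at most", len(roots) + 1, "intervals")

def signature(x):
    # the answers to all the sign queries at input x
    return tuple(np.sign(p(x)) for p in branch_polys)

assert signature(3.6) == signature(3.7)   # same interval => same computation path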

What this means is that each t-equivalence class can blow up into only a certain number of (t+1)-equivalence classes (roughly $2^t \cdot p(n)$), and by induction, this means that over a total of t(n) time steps, there can be at most something like $2^{O(t(n)^2 + t(n) \log p(n))}$ equivalence classes.

Now equivalence classes partition the space of possible inputs, but they don't really tell us how: any one partition might have inputs from all over the real line, and knowing the number of equivalence classes in and of itself doesn't tell us why estimating the lowest bit might be difficult. In order to make that leap, we have to further fracture these equivalence classes into actual segments on the real line, such that in each segment, all inputs are t(n)-equivalent.

This is not too hard to do, since we know that all polynomials evaluated in the process have degree at most $2^{t(n)}$. Since there are at most t(n)*p(n) polynomials to consider in any computation for any particular equivalence class, the total number of intervals induced by roots of these polynomials is roughly $t(n) \cdot p(n) \cdot 2^{t(n)}$ per class, which is still $2^{O(t(n)^2 + t(n) \log p(n))}$ overall. Two inputs from any one of these intervals are treated in exactly the same way by the computation.

Now comes a neat application of the pigeonhole principle. Choose $a$ large enough so that this count is less than $2^{n-1}$ (with $t(n) = \sqrt{n}/a$ and $\log p(n) = \sqrt{n}/a$, the exponent above is only $O(n/a^2)$). If we label each of the $2^n$ possible integers with their correct output (the lowermost bit), we get an alternating sequence of zeros and ones, and clearly a correct algorithm must create at least $2^{n-1}$ intervals to ensure that no interval contains numbers labelled both one and zero. But if we choose $a$ large enough, we can ensure by the pigeonhole principle that there will be at least one interval that is "too long" i.e must contain inputs labelled both one and zero. Since by construction, the algorithm must return the same output on ALL inputs in this interval, it will be wrong on at least one input, which yields a contradiction.

We will use many elements of this basic proof structure as we go along.

Tuesday, April 29, 2008

Should we care about geometric complexity theory ?

There's been an interesting debate over on the complexity blog with Lance's first diss of Ketan Mulmuley's program, and Ketan's response. I think Lance essentially trolled himself with the line,
Also while knowing the resolution of P v. NP is very important, knowing the details of the proof, especially if it requires deep and complex mathematics, is not nearly as important. I was excited to see Fermat's Last Theorem resolved in my lifetime but I have no desire to actually understand the proof.
Although I have a hard time imagining how someone could say this and mean it (after all, P vs NP is not Fermat's theorem in either impact, breadth or importance), and even more so how someone steeped in complexity theory could say it, I think the larger point he makes is essentially reasonable. Namely, (and I paraphrase),
if you want to lead people to the top of the mountain, you have to show them the way, or give them some hope that they can get there.
What would make the whole GCT program more plausible (and I think that people would like to believe in it) are some intermediate results. Maybe some weaker lower bounds using more primitive techniques that suggest that more powerful statements are lurking underneath. I don't think more surveys will help, frankly. I've read the Regan survey, and even the initial parts of GCTflip, and it's really heavy going. In fact, I was talking with a professor who spent an entire semester, together with students trained in complexity theory AND representation theory, trying to work through some of the early GCT papers. It was definitely more than a month's worth of effort.

Of course, there is one example of "a weaker lower bound using more primitive techniques": KM's result separating P and (NC without bit-wise operations). This paper appeared in STOC 1994, and was subsequently published in SICOMP in 1999. I've always wanted to read this paper, for one thing because of its intriguing way of mapping computations to algebraic sets (if that isn't cool geometry, I don't know what is).

So since summer is here, and semester is out, I've decided to forcibly inject some content into this blog. I'm going to read the P vs NC paper and blog about it as I go along. I don't know how many installments it will take, and there might be digressions along the way, but at the end I hope to understand this paper, and with it maybe even get a glimmer of the general direction KM is pushing in the GCT work.

For anyone interested in following along, I'll be mainly using the SICOMP paper as a reference, and will tag all posts with the tag 'p-vs-nc'. As far as possible I'll try to do one post a week (more if time permits). Let's see how this goes.

Saturday, April 26, 2008

LaTeX: hyperref/algorithm interaction

According to the hyperref documentation, the algorithm package should be loaded after hyperref:
algorithm:
\usepackage{hyperref}
\usepackage[chapter]{algorithm}% eg.
According to the hyperref documentation, all packages should be loaded before hyperref:
In most cases, therefore, you should load your package before you load hyperref, and hyperref will patch things up so that they work, so you can utilise your (patched) package after loading both:
If you do the first, you get this annoying set of warnings:
! pdfTeX warning (ext4): destination with the same identifier (name{page.1}) has been already used, duplicate ignored
If you do the second, you get an error:
undefined control sequence \theHalgorithm
Clearly, the first is preferable to the second, but even the first is terribly annoying. Does anyone have any ideas on how this can be fixed ?

Update: It works ! A comment from Iain Murray points out that in the hyperref README, one is told to include the float package BEFORE hyperref, and only then include algorithm (after hyperref).
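
In other words, a preamble fragment along these lines (the [chapter] option is just carried over from the snippet above):
\usepackage{float}                % float BEFORE hyperref
\usepackage{hyperref}
\usepackage[chapter]{algorithm}   % algorithm AFTER hyperref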

Red-black trees: An Update

In a previous post, I mentioned a talk by Bob Sedgewick on left-leaning red-black trees, a variant that's easier to analyze, but with the same asymptotic properties. He gave another talk on LLRB trees at the Analysis of Algorithms workshop, and the new slides have (I quote from his email):
somewhat simpler presentation and code than the Dagstuhl version, the discovery that 2-3 trees are also covered, and some fun facts about the analysis that should answer many of the questions that people were asking.
There were many comments on my original post, and possibly these new slides answer some of the questions.

Wednesday, April 23, 2008

Public Intellectuals

Foreign Policy has brought out a "list of top 100 public intellectuals", a list that contains one computer scientist (Neil Gershenfeld of MIT). Before I continue, let me insert two caveats:
  • Yes, I do think lists like this are beauty contests
  • No, I don't think computer scientists should aspire to being on such lists
Having said that, here's what I'm wondering. We are in the middle of possibly the greatest era of technological disruption of all time, a disruption brought about by a mess of tubes called the Internet. We are seeing the fruits of computer science permeate daily life to a degree that relativity hasn't come even close to, whether it's RSA, recommendation systems, peer to peer file sharing, or what have you. The disruptions created by the Web have changed our society in radical ways: consider Facebook, Myspace and the whole array of social networking tools we use today.

And yet, we lack the voices that speak to this time and place. We lack cogent articulation of the tools that brought us here, of the wonders of Turing, Von Neumann, and others, of the fundamentally radical idea of the algorithm as an idiom. Or we lack recognition of those who do articulate such a vision of today's computationally-driven world.

We don't need to be on lists of public intellectuals, but we need to frame the role of computer science and computation in society today, before we get relegated to the role of glorified telephone repairmen.

Sunday, April 20, 2008

Abstracts booklets

The ICDE proceedings is all digital. You get one CD with all the papers from the conference, and another with all the papers from all the workshops. This comes with a very nifty overview PDF that has clickable indices by name and session, as well as a search feature, and links to the PDFs for each paper. Along with this comes an abstracts booklet organized in order of schedule, with one section for titles, and another one for abstracts (the first is handy for planning your schedule).

I've been wishing that we could do something like this for a while now. The good news is that next year, SODA will do the same thing. Personally, I found the abstracts booklet more useful than one might think, and only rarely felt that I needed to look at the paper in detail.

A new (to CS) model for publishing

One of the things I like about the database community is their willingness to play with new ideas in the space of conference publishing. SIGMOD and VLDB have been experimenting with the idea of semi-persistent reviews, where reviews from SIGMOD get passed on to VLDB for papers deemed on the border; SIGMOD went to double-blind mode, over some opposition, and there's been some interesting back-and-forth since on the effectiveness of this (read this, and then this). There's also a weak rebuttal mechanism (where authors can in a limited way respond to reviewer comments in the submission process).

An even more radical idea, which from the sound of it is nearing approval, is described in detail by Panos Ipeirotis. The main points are these:
  • A new journal called the Journal for Database Management Research will be created.
  • It will have a fast review process and turnaround time (akin to the biological journals - 3 months or so)
  • Submission deadlines will be rolling: i.e you submit a paper when it's ready.
  • SIGMOD, VLDB and other database conferences will convert to a by-invitation model, where the conferences choose a sample of the published works in the journal (over the last 12 months I imagine) to be "presented" at the conference.
  • To discourage frivolous submissions, papers rejected from the journal will have to undergo a year-long cooling off before they can be resubmitted.
It's a radical approach, and approximates to a degree the prevailing practice in journal-based publication environments. It does raise some questions (some raised by Panos in the original post):
  • The year-long cooling off seems excessive punishment for what will still by necessity be a less than perfect review process
  • How will this new journal interact with other database journals ?
  • Can one journal hope to keep up with the volume of papers being produced ? Just SIGMOD, VLDB and ICDE take in over 2000 submissions between the three of them. That's close to 6 submissions EACH DAY.
  • What happens when you get into areas that overlap with DB ? For example, what about KDD, and other data mining conferences ?
  • What about all the years and years spent arguing with tenure committees about the primacy of conferences in computer science ? "Oops ! we changed our mind" ?
  • What kind of prestige will now be attached to giving presentations at the "new" conferences ? More importantly, since it was JUST ESTABLISHED that double blind submission helps remove bias at conferences like SIGMOD, isn't this a step backward in terms of which papers are chosen for presentations at conferences ? I can't imagine the process of getting invited for a talk at such a conference getting easier with this process. Are we heading towards (again) the bio model of big plenary talks by bigwigs, and lots of posters, or the math model where anyone who wants to give a talk can ?
Separating the idea of publication and dissemination is dear to my heart (I have always felt that conferences in CS fail by needing to serve both these masters at the same time), and so I'm bullish on proposals like this. But I do see problems in the details, and am curious to see how things pan out over time.

Tuesday, April 15, 2008

Deadlines (tax, conference, ...)

For the first time, I had to rush on the last day to get taxes out in time. It was a rather intense few hours, and I couldn't help but think that most of this would have been impossible in the pre-electronic era: I was able to download missing documents, get forms online, and even download tax software, all in a matter of a few clicks.

I'm also somewhat ashamed to admit that I rather enjoyed the adrenaline rush of getting to the post office with 10 minutes to spare and sending everything off. It reminded me of conference submission deadlines (and I imagine that in the days before electronic submission, people literally had to rush to the post office to send copies of a submission in).

But then I got to wondering. In this hyper-competitive era, with submissions increasing and acceptance rates dropping, is it slowly becoming less likely that papers submitted at the last minute can compete with more polished submissions ?

Now I like data as much as anyone else, and was wondering what kinds of statistics we could glean from information already available. For example, every year PC chairs trot out statistics about acceptance rate as a function of when papers were submitted. Because of the almost hyper-exponential rate of submissions as the deadline approaches, these numbers are necessarily unreliable, but to the extent that one believes in such statistics, there's always a kind of maximum point somewhat away from the actual deadline.

I don't know if this tells me what I need to know though: does a drift of the maximum point away from the deadline prove my point ? Lacking any actual information about which papers are more "polished" i.e maybe re-submissions, or papers prepared earlier, one has to work off the assumption that papers that are ready earlier are submitted earlier, and I'm not willing to buy that argument without more data.

So maybe I should rely on anecdotal evidence. How about it, readers ? In your personal experience (I emphasize this point: I want to distinguish your opinions from what you've encountered in your own work), do you think that it's become less likely that a last-minute paper gets into a conference, especially the top ones ? Feel free to generalize this question to other deadline-based submission fora (even proposals, if you wish).

p.s A blog post without links ! Scandalous...

Saturday, April 12, 2008

A few links

I've been at ICDE in Cancun for the last few days because of this, and it's hard to blog too much from "the best beach in Mexico". There are some interesting trends emerging from this conference, many of which might be interesting to algorithms folks, and I'll try to say more about this later on.

Two links that came across Google Reader and might be of interest:
  • "Shut up and calculate", by Max Tegmark (via Mathematics under the Microscope). Tegmark argues for an extreme Platonic position: not only is the universe governed by mathematics, but that it is mathematics ! There's more to this argument in the (short) paper: buried in there is an interesting robust variant of falsificationism:
    For a theory to be falsifiable, we need not be able to observe and test all its predictions, merely at least one of them
  • Via BB, New Scientist's list of the ten weirdest computing models (for actual computing, that is). They start off tamely with optical and quantum computing, and then move on into slightly more esoteric models like DNA computing, culminating with ooze computing and water wave computing (hey, I'm on a beach here, computing my legs off!). One of the more interesting proposals was something called 'reversible computing':
    Every computing operation involves feeding inputs into logic gates, which produce output signals. Instead of discarding the energy of those signals, Frank's gates run in reverse after every operation. That returns the energy of the output signal to the start of the circuit where it is used to carry a new input signal
    Of course, what's not clear to me is why this isn't a poor cousin of quantum computing, where as far as I understand all computations prior to a measurement are reversible.
But the idea of reversible computing gets back to very deep ideas about entropy and the second law of thermodynamics. Think of a box partitioned into two parts, each containing air at the same (global) temperature. In the partition is a small passage, guarded by a tiny demon, who only lets molecules go from left to right if they are hotter than average, and from right to left if they are colder than average. Over time, the right side of the container gets hotter and hotter, and the left side gets colder and colder. This demon is Maxwell's demon, posited to try and find a contradiction in the Second Law of Thermodynamics.

One of the more famous refutations of this apparent paradox was by Leo Szilard, who argued that in order for the demon to do its job, it must somehow gather information about the average temperature, and this gathering of information costs energy. Rolf Landauer famously showed in 1961 that reversible computations need not increase thermodynamic entropy (and also information entropy), and these facts were expanded by Charles Bennett to argue that eventually, a demon that does the desired job must run out of space and start deleting information, an irreversible (and therefore entropy increasing) operation. It's a streaming algorithm !!

There's a book that collects all this work together: Maxwell's Demon 2: Entropy, Classical and Quantum Information, Computing. It's been on my list of 'books to read' for a long time now.

Monday, March 31, 2008

CiteseerX

Rather quietly, Citeseer appears to have undergone an upgrade (at least an alpha-level one). The new CiteseerX allows you to create collections, associate tags with papers, and has a generally snazzier interface. Whether it solves the problems of the old citeseer (constant crashing, almost laughably bad bibtex entries) remains to be seen.

Saturday, March 22, 2008

On interview puzzles

Michael Mitzenmacher got hammered over at the complexity blog for trying to argue against the 'interview puzzle' style of interviewing so in vogue at places like Google, Microsoft and others.

Here's PhDComics' take on it:



I should add that I've been collecting questions that job-seekers are asked on such interviews.

Thursday, March 20, 2008

The Joys of NAE-SAT

(ed: the point of this post will be completely lost on you if you don't often prove NP-Completeness results)

When all else fails, the "standard" way of proving a problem NP-hard is to go back to the basics, i.e 3SAT. One particularly quirky variant that I quite enjoy is NAE-SAT:
As usual, you're given an instance of satisfiability with clauses and variables, and you want to check if there's a satisfying assignment. In NAE-SAT, the extra catch is that in no clause are you allowed to have all literals set to TRUE. Since no clause can have all literals set to FALSE either (the assignment is satisfying, after all), you get the name 'Not-All-Equal-SAT', or NAE-SAT.
Like SAT, NAE-SAT is NP-Complete, and remains so if you restrict clauses to containing 3 literals (i.e NAE-3SAT). Other interesting properties:
  • NAE-3SAT is still NP-complete in its monotone variant (i.e if all variables appear unnegated). This is in contrast to SAT, which is trivial in that case. This property is particularly handy if you don't want to deal with how to ensure consistency between a variable and its negation when making a gadget.
  • If X is a satisfying assignment for NAE-SAT, then so is X' (the complement of X). This is again because of the NAE-property. A clause that's satisfied with one literal becomes one that's satisfied with two literals, and so on. Since no clause has all three literals set to TRUE, no clause becomes unsatisfied. This is useful because when you assume that the problem has a satisfying instance, you have two instances you can use (we used this trick in a paper a while back).
  • Unusually, and unlike other SAT variants, Planar-NAE-SAT is in P. This is a surprising result due to Bernard Moret, and prevents us from using NAE-SAT for many geometric problems where planarity is a useful tool (but if you can handle crossings, you're ok).
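
The complement property in the second bullet is easy to check mechanically. Here's a small Python sketch (the clause encoding, instance and assignment are my own toy choices): an assignment is NAE-satisfying exactly when every clause has at least one true and at least one false literal, a condition that is obviously invariant under complementing the assignment.

def is_nae_satisfying(clauses, assignment):
    # clauses: lists of nonzero ints, +v for variable v, -v for its negation
    # assignment: dict mapping variable -> bool
    def value(lit):
        v = assignment[abs(lit)]
        return v if lit > 0 else not v
    return all(any(value(l) for l in c) and not all(value(l) for l in c)
               for c in clauses)

# a monotone NAE-3SAT instance (no negated variables) and one NAE assignment
clauses = [[1, 2, 3], [1, 3, 4], [2, 3, 4]]
x = {1: True, 2: False, 3: False, 4: True}
x_bar = {v: not b for v, b in x.items()}   # the complement assignment

assert is_nae_satisfying(clauses, x)
assert is_nae_satisfying(clauses, x_bar)   # the complement is NAE-satisfying too
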
Anyone for a paean to ONE-in-THREE-SAT ?

Tuesday, March 18, 2008

David Gale, 1921-2008

Via Michael Trick comes the news that David Gale has died. To us CS folks, he's probably best known for the Gale-Shapley stable marriage algorithm (that has controlled the lives of thousands of medical interns since - talk about impact !), and the Gale Transform in projective geometry.

Apart from all of that, he was also a tremendously influential economist and expert in optimization. His obituary has much more.

Friday, February 22, 2008

Notes from Dagstuhl I: Red-black trees

I've just returned (well almost: I'm in JFK waiting out the storm) from a Dagstuhl workshop on data structures. For those not in the know, this is one of the common retreat spots for computer scientists to get together and chat about problems in a very relaxed atmosphere (close to the true spirit of a workshop).

In that spirit, people present things that are more "uncooked", half-baked, or incomplete, and so I won't talk about too many of the presentations, except for the ones which are available online.

In this post, I'll discuss a new take on an old warhorse: the red-black tree.

A bit of background (for those unfamiliar with the story thus far):
One of the most fundamental data structures in computer science is the binary search tree. It's built over a set of items with an ordering defined on them, and has the property that all nodes in the left subtree of the root are smaller than all nodes in the right subtree (and so on recursively). BSTs are handy for building structures that allow for quick membership queries, or "less-than"-type queries.

One can build a BST with depth log n for a set of n ordered items. This means that operations on this structure will take log n time in general. However, if the items can change on the fly, then it's more difficult to maintain such a structure while making sure that updates themselves are cheap (O(log n)).

Among many solutions proposed to handle dynamic maintenance of BSTs is the red-black tree, proposed in its current form by Leo Guibas and Bob Sedgewick back in 1978. The red-black tree is the definitive dynamic BST data structure, at least in practice: it has worst-case log n bounds for all operations, and shows up in the implementations of basic structures in the STL and in many language libraries. By virtue of its being in CLRS, it also has stupefied the minds of undergraduates for many years now.

The new story:
Conceptually, the re-balancing operations that are used to maintain a red-black tree are not terribly hard. However, there are numerous cases to consider, especially when doing deletions, and this tends to make the actual code used to write these trees fairly complex, and thus potentially error prone. In the first talk at Dagstuhl, Bob Sedgewick talked about a simpler variant of red-black trees, called left-leaning red-black trees, whose main virtue is that they are simpler to implement (many fewer lines of code) while maintaining the nice properties of red-black trees.

Bob's slides have more details: although this new structure hasn't been published anywhere, it will appear in the next edition of his book. The key idea is to first understand how a red-black tree simulates a 2-3-4 tree (a more complex search tree in which nodes can have two, three or four children, and therefore one, two or three keys). It's possible to construct a direct mapping between a node in a 2-3-4 tree and a subtree in a red-black tree.

Once this is understood, the LLRB tree comes about by restricting the subtrees thus formed to be "left-leaning". Specifically, right-leaning red edges are not allowed, and to prevent the tree from being too skewed, more than three consecutive left-leaning red edges are disallowed as well. Doing this allows us to simplify various cases in the insertion and deletion steps, while not increasing the tree depth by too much (this last statement was argued by analogy with 2-3-4 trees).

It's a simple idea, and simplifies code by a substantial amount without sacrificing asymptotics.
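
For concreteness, here's a minimal Python sketch of the insertion fix-up in the simpler 2-3 flavor of LLRB trees, written from my own reading of the slides (so a sketch, not Sedgewick's code; deletion, the part that benefits most from the simplification, is omitted):

RED, BLACK = True, False

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.color = key, None, None, RED

def is_red(h):
    return h is not None and h.color == RED

def rotate_left(h):          # fix a right-leaning red edge
    x = h.right
    h.right, x.left = x.left, h
    x.color, h.color = h.color, RED
    return x

def rotate_right(h):         # fix two consecutive left-leaning red edges
    x = h.left
    h.left, x.right = x.right, h
    x.color, h.color = h.color, RED
    return x

def flip_colors(h):          # split a (simulated) 4-node
    h.color, h.left.color, h.right.color = RED, BLACK, BLACK

def insert(h, key):
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = insert(h.left, key)
    elif key > h.key:
        h.right = insert(h.right, key)
    # restore the left-leaning invariants on the way back up
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h

def put(root, key):
    root = insert(root, key)
    root.color = BLACK       # the root is always black
    return root

root = None
for k in [5, 1, 9, 3, 7]:
    root = put(root, k)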

Sunday, February 17, 2008

The most apologetic conference rejection I've received:

From the PC Chair:
Please know that I understand how much work it is to do research
and write a paper. Rejection is never fun, happens to all of us, and there is likely always an element of misunderstanding to it.

Because CONFERENCE accepts such a small percentage of papers, however, rejection from CONFERENCE may not at all mean your paper will be rejected at other high quality conferences. To that end, I encourage you to take into account the reviewers' recommendations. They have spent many hours with your paper and their remarks (and even their misunderstandings) may help you to clarify your paper or perhaps to do some more work.
I can just imagine the PC sitting around a table, tears streaming down their eyes, as they pen this dejected missive to me.

Wednesday, February 13, 2008

Metric spaces, VC-dimension, and metric entropy.

For a problem that I've been working on, it turns out that if a related range space has bounded VC-dimension, the problem can be solved exactly (but with running time exponential in the dimension). The range space is constructed from two parameters: a metric space (X, d), and a radius e, and consists of the domain X, and all balls of radius at most e in X.

So a natural question that I've been unable to answer is:
What properties of a metric space ensure that the induced range space has bounded VC dimension ?
Most of what we do know comes from the PAC-learning community. For instance, the doubling dimension of a metric space is the smallest number d such that any ball of radius e can be covered by at most 2^d balls of radius e/2. In recent years, it has been popular to explore the extent to which
"metric space of bounded doubling dimension == bounded dimensional Euclidean space"
is true. Unfortunately, there are spaces of bounded VC-dimension that do not have bounded doubling dimension.

Another related proxy for the "dimension" of a metric space is its metric entropy: Determine the minimum number of balls of radius e needed to cover all points in a metric space. The log of this number is the metric entropy, and among other things is useful as a measure of the number of points needed to "cover" a space approximately (in a dominating set sense). It's known that the metric entropy of concept classes is closely related to their VC-dimension (where the underlying metric is symmetric difference between the classes). I am not aware of any general result that relates the two though.
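
As a quick-and-dirty illustration of covering numbers (everything here, the random point set, the choices of e, and the greedy rule, is my own toy setup): greedily picking uncovered points as centers yields a valid e-cover, so its size upper-bounds the covering number, and the log of that count estimates the metric entropy.

import math, random

def greedy_cover(points, dist, eps):
    # pick centers until every point is within eps of some center;
    # the chosen centers are pairwise more than eps apart, and their
    # count upper-bounds the minimum size of an eps-cover
    uncovered = set(range(len(points)))
    centers = []
    while uncovered:
        c = uncovered.pop()
        centers.append(points[c])
        uncovered = {i for i in uncovered if dist(points[i], points[c]) > eps}
    return centers

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
euclid = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
for eps in (0.5, 0.25, 0.125):
    k = len(greedy_cover(pts, euclid, eps))
    print(f"e = {eps}: cover size {k}, metric entropy ~ {math.log2(k):.1f} bits")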

For more on the dizzying array of numbers used to describe metric space dimension, do read Ken Clarkson's magnificent survey.

[On a side note, I don't quite understand why the term "entropy" is used. It seems to me that if one wanted to use entropy, one would compute the entropy of the resulting set of balls, rather than merely the log of their number. ]

Tuesday, February 12, 2008

SoCG results out

Via compgeom-announce: (papers in order of submission time stamp)
  1. Manor Mendel and Assaf Naor. Markov convexity and local rigidity of distorted metrics
  2. Noga Alon, Robert Berke, Maike Buchin, Kevin Buchin, Peter Csorba, Saswata Shannigrahi, Bettina Speckmann and Philipp Zumstein. Polychromatic Colorings of Plane Graphs
  3. Jinhee Chun, Matias Korman, Martin Nöllenburg and Takeshi Tokuyama. Consistent Digital Rays
  4. Eitan Yaffe and Dan Halperin. Approximating the Pathway Axis and the Persistence Diagram of a Collection of Balls in 3-Space
  5. Naoki Katoh and Shin-ichi Tanigawa. Fast Enumeration Algorithms for Non-crossing Geometric Graphs
  6. Ken Been, Martin Nöllenburg, Sheung-Hung Poon and Alexander Wolff. Optimizing Active Ranges for Consistent Dynamic Map Labeling
  7. Hans Raj Tiwary and Khaled Elbassioni. On the Complexity of Checking Self-duality of Polytopes and its Relations to Vertex Enumeration and Graph Isomorphism
  8. Victor Chepoi, Feodor Dragan, Bertrand Estellon, Michel Habib and Yann Vaxes. Diameters, centers, and approximating trees of delta-hyperbolic geodesic spaces and graphs
  9. Esther Arkin, Joseph Mitchell and Valentin Polishchuk. Maximum Thick Paths in Static and Dynamic Environments
  10. Julien Demouth, Olivier Devillers, Marc Glisse and Xavier Goaoc. Helly-type theorems for approximate covering
  11. Sarit Buzaglo, Ron Holzman and Rom Pinchasi. On $k$-intersecting curves and related problems
  12. Mohammad Ali Abam, Mark de Berg and Joachim Gudmundsson. A Simple and Efficient Kinetic Spanner
  13. Frederic Chazal and Steve Oudot. Towards Persistence-Based Reconstruction in Euclidean Spaces
  14. Ken Clarkson. Tighter Bounds for Random Projections of Manifolds
  15. Krzysztof Onak and Anastasios Sidiropoulos. Circular Partitions with Applications to Visualization and Embeddings
  16. Bernard Chazelle and Wolfgang Mulzer. Markov Incremental Constructions
  17. Kenneth L. Clarkson and C. Seshadhri. Self-Improving Algorithms for Delaunay Triangulations
  18. Frederic Cazals, Aditya Parameswaran and Sylvain Pion. Robust construction of the three-dimensional flow complex
  19. Herbert Edelsbrunner, John Harer and Amit Patel. Reeb Spaces of Piecewise Linear Mappings
  20. Evangelia Pyrga and Saurabh Ray. New Existence Proofs for $\epsilon$-Nets
  21. Lars Arge, Gerth Stølting Brodal and S. Srinivasa Rao. External memory planar point location with logarithmic updates
  22. Eric Berberich, Michael Kerber and Michael Sagraloff. Exact Geometric-Topological Analysis of Algebraic Surfaces
  23. Misha Belkin, Jian Sun and Yusu Wang. Discrete Laplace Operator on Meshed Surfaces
  24. Olivier Devillers, Marc Glisse and Sylvain Lazard. Predicates for 3D visibility
  25. Luca Castelli Aleardi, Eric Fusy and Thomas Lewiner. Schnyder woods for higher genus triangulated surfaces
  26. Erin Chambers, Jeff Erickson and Pratik Worah. Testing Contractibility in Planar Rips Complexes
  27. Rado Fulek, Andreas Holmsen and János Pach. Intersecting convex sets by rays
  28. Noga Alon, Dan Halperin, Oren Nechushtan and Micha Sharir. The Complexity of the Outer Face in Arrangements of Random Segments
  29. Maarten Löffler and Jack Snoeyink. Delaunay triangulations of imprecise points in linear time after preprocessing
  30. Erin Chambers, Éric Colin de Verdière, Jeff Erickson, Sylvain Lazard, Francis Lazarus and Shripad Thite. Walking Your Dog in the Woods in Polynomial Time
  31. Jean-Daniel Boissonnat, Camille Wormser and Mariette Yvinec. Locally Uniform Anisotropic Meshing
  32. Adrian Dumitrescu, Micha Sharir and Csaba Toth. Extremal problems on triangle areas in two and three dimensions
  33. Timothy M. Chan. A (Slightly) Faster Algorithm for Klee's Measure Problem
  34. Timothy M. Chan. Dynamic Coresets
  35. Timothy M. Chan. On Levels in Arrangements of Curves, III: Further Improvements
  36. Minkyoung Cho and David Mount. Embedding and Similarity Search for Point Sets under Translation
  37. Jacob Fox and Janos Pach. Coloring K_k-free intersection graphs of geometric objects in the plane
  38. Timothy G. Abbott, Zachary Abel, David Charlton, Erik D. Demaine, Martin L. Demaine and Scott D. Kominers. Hinged Dissections Exist
  39. Vida Dujmovic, Ken-ichi Kawarabayashi, Bojan Mohar and David R. Wood. Improved upper bounds on the crossing number
  40. Pankaj Agarwal, Lars Arge, Thomas Mølhave and Bardia Sadri. I/O Efficient Algorithms for Computing Contour Lines on a Terrain
  41. Pankaj Agarwal, Bardia Sadri and Hai Yu. Untangling triangulations through local explorations
  42. Ileana Streinu and Louis Theran. Combinatorial Genericity and Minimal Rigidity
