The first is a robust formal definition of efficiency, or tractability. In fact, the idea of a "reasonable" computing model works very well in cryptography, where ever since the days of RSA, the game has been to prevent attacks by a limited adversary, rather than an allpowerful one (which would be much harder).

Which brings up a second core contribution; the idea of worst-case performance, and an adversary that can do anything. One thing that I've often noticed when working with people in other areas is that they are surprised when I can prove that such-and-such technique is optimal, or approximately optimal, or efficient. The power of worst-case analysis is in the ability to make such sweeping guarantees, and it's so ingrained into our psyche that it can be surprising to have some one else note this explicitly.

But worst-case analysis has its limits, and David alludes to this with his question on why anyone would ever assume that an adversary was not worst-case:

This assumption makes fields that assume some kind of distribution on inputs or on errors seem extremely strange. Electrical engineering does this in the context of coding theory, and it's also common in statistical learning theory. These fields have enormous practical impact and relevance, which I acknowledge and appreciate. I just have a hard time with the shock of "well, why do we believe errors must be distributed like that?"In fields like game theory and cryptography, your adversary is assumed malicious; this is often true, especially code hackers, and is a good modelling assumption (think "selfish agents" and the like). Theorems proved here are "pure"; you start off with a set of assumptions about your world, and the results have guarantees within this world.

But when you start getting into areas like machine learning, data mining, and data modelling in general, your "adversary" is Nature, or some unknown indifferent process that generates data. Time and again we see what we would formally think of as heuristics work very well "in practice" and often the explanation is that the algorithm is able to exploit some property of the data (something we can then prove after the fact). Whether it's expectation-maximization, its cousin k-means, ICP, or even the simplex method, we see the same story over and over again.

Worst-case analysis, applied straight up, does not help us in these domains. Worst-case analysis often provides bounds that are weak, and algorithms that are ineffective on many kinds of data sets. To use formal methods, you need to dig deeper into a problem, and try to understand the data sources themselves, and often you come up with hidden structure that can be exploited formally.

I can answer David's question by going all the way back to Gauss. According to Leonard Mlodinow, it was when Gauss was overseeing a surveying project that he discovered that the errors in measurement followed a bell curve (hence, Gaussian distributions). The central limit theorem tells us that all kinds of i.i.d error will end up looking Gaussian after a while. Many natural processes also generate power law or log normal distributions. The point is that if you're dealing with nature, then more often than not you're dealing with data that is stochastic rather than adversarial, and modelling it as such gives you a better handle on algorithm design.

Thank you for your careful reading and your comments!

David Molnar

ReplyDeleteThanks also for bringing up the point that we can empirically determine distributions that might be appropriate for assumptions. What's interesting is that when we do an empirical investigation to find out what distribution is appropriate for modelling a problem, should we require that the distribution have a probabilistic polynomial time algorithm for sampling? My impression is that this is not a common requirement, or at least, not made explicit.

Put another way, in principle, Gauss could have come up with a distribution that is "hard" to sample from, yet explained the data. He didn't. Why?

One response is to postulate that Nature itself is constrained by efficiency requirements. I first saw this formulated explicitly in the "Optimal Error Correction Against Computationally Bounded Noise" paper by Micali, Peikert, Sudan, and Wilson

http://theory.csail.mit.edu/~cpeikert/pubs/mpsw.pdf

To me this is an attractive assumption, but I wonder how it looks to, say, a physicist.

Posted by

It is certainly plausible that Nature is computationally bounded. Indeed if BQP is contained in NP, then this is even more believable. What's interesting though is that most useful distributions (the so-called exponential family) are characterized by exponential tails, and therefore efficient sampling is always possible.

Suresh

ReplyDeleteOn the other hand, certain natural processes generate polynomial tails, which make efficient estimation a lot harder. What do we do then ?

Posted by