## Thursday, September 01, 2005

### No Free Lunch Theorems

Neural networks and genetic algorithms are general search strategies that are often used as heuristics to solve otherwise intractable problems. They provide the luxury of not having to think about the internals of a specific problem, and in certain cases can perform quite well as practical heuristics.

I dislike them greatly though (even though in my misguided youth I used to play with GAs) because they are extremely general hammers that typically don't have provably performance or running time guarantees, and don't reveal any interesting insights into the structure of a problem.

Such search strategies are also victims of their own generality, and the dagger that pierces them is called a "No Free Lunch" theorem. No Free Lunch theorems are generally of the form
...all algorithms that search for an extremum of a cost function perform exactly the same, when averaged over all possible cost functions. In particular, if algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A.
The crucial point here is the "averaged over all cost functions". The NFL theorems are not saying (and this was something I used to be confused about) that search procedures are doomed; rather, they say that blind search procedures (such as those performed by genetic algorithms) are doomed. The original NFL theorem is attributed to Wolpert and Macready; many variants have been proved as well.

Joseph Culberson has a nice perspective on such theorems from an algorithmic point of view, and attempts to frame them in the context of complexity theory.

Tags: ,

1. We work like a horse.
We eat like a pig.
We like to play chicken.
You can get someone's goat.
We can be as slippery as a snake.
We get dog tired.
We can be as quiet as a mouse.
We can be as quick as a cat.
Some of us are as strong as an ox.
People try to buffalo others.
Some are as ugly as a toad.
We can be as gentle as a lamb.
Sometimes we are as happy as a lark.
Some of us drink like a fish.
We can be as proud as a peacock.
A few of us are as hairy as a gorilla.
You can get a frog in your throat.
We can be a lone wolf.
But I'm having a whale of a time!

You have a riveting web log
and undoubtedly must have
atypical & quiescent potential
May I suggest that you do
everything in your power to
Designer/Architect as well
as your revering audience.
As soon as we acknowledge
this Supreme Designer/Architect,
Who has erected the beauteous
fabric of the universe, our minds
must necessarily be ravished with
wonder at this infinate goodness,
wisdom and power.

Please remember to never
restrict anyone's opportunities
for ascertaining uninterrupted
existence for their quintessence.

There is a time for everything,
a season for every activity
under heaven. A time to be
born and a time to die. A
time to plant and a time to
harvest. A time to kill and
a time to heal. A time to
tear down and a time to
rebuild. A time to cry and
a time to laugh. A time to
grieve and a time to dance.
A time to scatter stones
and a time to gather stones.
A time to embrace and a
time to turn away. A time to
search and a time to lose.
A time to keep and a time to
throw away. A time to tear
and a time to mend. A time
to be quiet and a time to
speak up. A time to love
and a time to hate. A time
for war and a time for peace.

Best wishes for continued ascendancy,
Dr. Howdy

'Thought & Humor'

P.S. One thing of which I am sure is
that the common culture of my youth
is gone for good. It was hollowed out
by the rise of ethnic "identity politics,"
then splintered beyond hope of repair
by the emergence of the web-based
technologies that so maximized and
facilitated cultural choice as to make
the broad-based offerings of the old
mass media look bland and unchallenging
by comparison."

{Please note that this letter about your
esteemed site promotes no merchandise -
but is simply a missive of good will to you.}

////////////////////////

Posted by howdy

2. Isn't "blind search" a bit of a misnomer (**)?

Usually the combination operators and fitness function in GAs are highly fine tuned to the specific problem being solved. While it is true that generally such fine tuning never explicitly involves the first derivative in some (many?) cases it implicitly encodes such information.

For example, for a specific (and rather successful) application that I have in mind, one of the combination operators was an exhaustive local search of the neighbourhood of the current solution, with the local maximum in this neighbourhood being the sole survivor (first derivative anyone?)

One of John Koza's students told me once that in their Stanford group they used to spend a month or two fine-tunning the combination operators and fitness function before unleashing the search. The harderst part was doing so without biasing the shape of the solution. In other words, one wants combination operators that preserve whatever is good about the current solution, rather than combination operators that will make the current solution look the way you believe good solutions ought to look like. [Koza's group has the distinction of having filed and obtained a patent for a device discovered by a GA.]

(**) Admittedly the "blind search" moniker was advanced by some over-enthusiastic GA researchers. In principle this would be a "nice" property of GAs, because if true it obviates the need to program a refined search algorithm.

Posted by Alex Lopez-Ortiz

3. I guess what irks me is the claim to 'generality', and the use of these methods as hammers. The way I see it, this is s catch-22. Either you believe that GAs are general, and require no problem specific tuning, in which case the NFL theorems apply. If you don't, and believe that specific fine tuning of a GA is required to obtain local/global maxima, then why on earth wouldn't you spend the same time trying to understand the problem formally ?

TSPs (and I should have linked the paper) are a good example of this. A GA-like algorithm is one of the best practical heuristics for performing TSPs, but this has come after a wealth of understanding of problem, and in fact even to establish the quality of the GA-based heuristic, we need a better formal understanding of TSP lower bounds.

Posted by Suresh

4. The point is that in some instances the formal study of the problem is more difficult than having the computer automatically search for a solution. A good example is the evaluation function of Deep Thought/Deep Blue. AI chess researchers had spent an enormous amount of time trying to understand the ins and outs of evaluation functions. In contrast, the Deep Blue team (which had relatively little knowledge of chess or AI) created a very good evaluation function using a GA search technique. (The function was later on further refined by hand under the efforts of Joel Benjamin).

Another example is the design of an amplifier circuit. This is a well studied, yet quite difficult problem. Koza's team uncovered a new design that was in many ways superior to existing designs.

In both cases the parallel GA search time consumed several years of CPU time.

Posted by Alex Lopez-Ortiz

5. "The point is that in some instances the formal study of the problem is more difficult than having the computer automatically search for a solution. "

I think this is the main point. In other words, the GA search strategy is a "conjecture-generating device", used to generate ideas where humans have missed out.

The problem I have with this is that I can think of far more places where human creativity trumps what a computer has been trained to do. This is not a knock on AI per se. My point is merely that in all kinds of "pattern matching" tasks that require one to make creative leaps, it has been hard to get a computer to do it independently.

So to me, as a conjecture generating machine, the GA doesn't have a good cost-benefit ratio, and as an engineering heuristic, it leaves me, the algorithm designer, in the dark about what's really going on, with no ability to explain its behaviour, or even predict what kinds of problems might be amenable to this kind of approach (local search and PLS maybe?)

Posted by Suresh

6. The moral of the NFL story is that we do not live in a world where all functions are equally likely or equally interesting. No Free Lunch states that in the space of all possible functions, for every one where an algorithm works there will be one where it doesn't work. For instance, for every function with a gradient that leads to an optimum, there will be one where the gradient is deceptive, and leads away from the optimum. Thus a gradient descent algorithm will perform, on average, no better than a gradient-opposing one.

The argument relies on the presumption that we are talking about all possible functions. In reality, though, most possible functions would be considered "random" (a can of worms, yes, but appropriate here). I have a friend who likes to say that if you took the number of functions that have been studied by humans and divided it by the number of possible functions, the result would be, essentially, zero.

It seems to me that there is something different about the subset that we do work with. One clue is that most of those functions are encoded as formulas, rather than look-up tables. In other words, they are orderly, compressible, simple. A function that can only be described by a lookup table is a random one, by several definitions. Even when we are working with raw data, we are almost aways searching for the compressible form, the formula that summarizes the underlying function.

So ... the No Free Lunch theorems apply in a kind of dreamworld, the Land of Infinite Possibilities, and not in the world of real working mathematics, where systems cohere in an orderly way. In the years since their publication, it has seemed to me that the great controversy blew up, blew around a little bit, and then blew away. NFL is not a real challenge to anybody.

Posted by Passerby

7. I don't view the NFL theorems as negative results; more as warnings about merchants bearing snake oil. In that respect, they are useful.

Posted by Suresh

8. The problem I have with this is that I can think of far more places where human creativity trumps what a computer has been trained to do. This is not a knock on AI per se.

For any given problem, my money would be first on formal analysis yielding an efficient solution (that's why I'm a theoretician). If, on the other hand, we have good evidence that finding a solution analitically is very difficult and yet a good feasible solution is needed regardless, then I'd place my money on a computer search.

Depending on the problem, the search heuristic would be different. GA excels in problems where the search space is not isomorphic to a subset of R^n. The fact that non-standard combination operators are an integral part of the heuristic makes GA better suited for these types of problems.

For other problems, other heuristics are better.

I don't view the NFL theorems as negative results; more as warnings about merchants bearing snake oil. In that respect, they are useful.

Sure, there were (are?) some over-enthusiastic proponents of GA, and the NFL theorems help to keep thinks in perspective.

Posted by Alex Lopez-Ortiz

9. "because they are extremely general hammers that typically don't have provably performance or running time guarantees"

Actually, neural network community has done quite a bit of interesting research into performance guarantees. For instance

- "P. L. Bartlett. For valid generalization, the size of the weights is more important than the size of the network", author gives bounds on misclassification rate using size of the weights.

- Work by the school lead by Watanabe studying how algebraic geometry of the set of best fitting models (which is more than 0 dimensional when the model is non-identifiable) relates to statistical efficiency of estimators, in particular neural network

- Work by the school of Amari studying differential geometry of neural networks, and also into statistical apects of neural network estimators.

- John Moody introduced in 1991 "Generalized Prediction Error" penalty which generalized AIC for non-linear models, in particular, neural networks

Neural networks are often used as non-parametric estimators, ie, "general hammers", but they can also be designed for a particular task. Check out Yann LeCun's website for some impressive applications. In particular http://www.cs.nyu.edu/~yann/research/dave/index.html

Posted by Yaroslav Bulatov

10. hi there! enjoyed your post. free personals