Sunday, January 22, 2006

SODA II: Predictions....

Today's highlight was the invited talk by Rakesh Vohra titled 'Predicting the Unpredictable'. It starts from the following very simple questions:

Given a generator that's producing 0s and 1s, your goal is to predict the next bit. You don't know anything about the generator, and you have to make some prediction. The measure of your performance is the asymptotic fraction of predictions you make that are correct.
An amazing result is that if the number of 1s in the output thus far is hn, you can come up with a randomized scheme that achieves max(hn, 1-hn) -eps. This result was proved in 1957 by James Hannan, and has apparently been reproved 13 times !

If you think of 1 as "it will rain tomorrow", then you see why this is an important problem. Of course, weather forecasters usually predict with a probability: there is a 20% chance of rain tomorrow. How do we evaluate the effectiveness of a prediction in that case. The answer is quite elegant, and is called "calibration". The idea is that you look at the subsequence of 0s and 1s where the forecaster predicted a 20% chance of rain, and check if over this sequence, there was indeed a 20% fraction of rain. Repeat for all probability predictions, weighting by the length of the subsequence.

A stunning result is that no matter what measure of discrepancy you use, and what kinds of subsequences you pick, it is possible to design a strategy that drives the calibration error to zero. In other words, a completely clueless weather forecaster might be a perfect predictor under this measure. Rather disturbing...

In the afternoon, there was an interesting talk by Seshadri Comandur on self-improving algorithms. The idea is that as an algorithm, you want to learn from the input, and then be able to generate answers that are optimal with respect to the distribution the input is being drawn from (think of this as a generalization of average-case analysis). Notice the similarities between this work and the invited talk; here also, you want to adapt your behaviour to the input so that you are optimal for the distribution it comes from.

Outtakes:
  • If you walk out of a skywalk, thru a parking lot, down the elevator, thru the parking lot, via tunnel, you can actually find somewhat passable Starbucks-branded coffee. Trust me, it's worth the walk :)
  • Jeff Erickson once again has his business meeting drinking game. I have to say that it's so complicated, I'll need to print out a sheet with all the rules, and will probably forget them once I get drunk enough...
  • Maybe that's the point...
  • Can you get drunk before the business meeting ? By my count David Eppstein needs to take 6 drinks for his hotel complaints, plus a special bonus for his innovative complaint about airplane noise at 1am. I myself am at about 3 drinks...

Categories:

1 comment:

Disqus for The Geomblog