## Saturday, September 18, 2004

### Experimental Work in SODA

Let me suggest another [topic]--the attitude towards experimental algorithmic work, which I think is a recurring issue. The CFP [for SODA] often advocates this theme but then the committee is likely to not appreciate so much the experimentations.

As theoreticians, it is probably not so impressive to us that one can beat the known worst-case analysis in certain data sets (particularly simulated data). Perhaps we are not used to evaluate experimental work and we rarely think about the kind of experiments that would significantly benefit our theoretical work.
David Johnson has written a Theoretician's Guide to the Experimental Evaluation of Algorithms (PDF)(link fixed). The first time I saw this, it was a short 7 page guide, and now it has expanded to a 36 page review. I highly recommend reading it, and keeping a copy around when you start planning your experiments. The article addresses all the points (and more) that I would have hoped to make, so I'll merely excerpt a few points:

–
• Implementation for a specific application
• A 'horse-race' paper, comparing algorithm/implementation to others
• Exploring the strengths/weaknesses etc of various algorithmic ideas in practice
• Generating conjectures about average case behaviour when formal probabilistic analysis is too hard
The first two cases are the most common kinds of implementations, but it is the third one that David suggests is the most interesting from an experimental algorithmics perspective. Indeed, usually papers that combine the first two characteristics are best suited for the area the application came from, rather than a SODA/ALENEX....

2. What makes a paper interesting:

...the results should yield broader insights, either in terms of asymptotics or presumption of robustness.... (this is one way in which an experimental paper aspires to higher goals than a horse race paper...)
3. On reproducibility:
... what does "reproducibility" mean in the context of the experimental analysis of algorithms? In the strictest sense it means that if you ran the same code on the same instances on the same machine/compiler/operating system/system load combination you would get the same running time, operation counts, solution quality (or the same averages, in the case of a randomized algorithm).

This, however, would not be very informative, nor does it correspond to what is meant by "reproducibility" in classical scientific studies. In reproducing such a study, a scientist must use the same basic methods, but will typically use different apparatus, similar but distinct materials, and possibly different measuring techniques. He will be said to have reproduced the original results if the data he obtains is consistent with that of the original experiments and supports the same conclusions. In this way he helps confirm that the results of the original experiment (and the conclusions drawn from them) are independent of the precise details of the experiment itself.

It is this broader notion of reproducibility that I consider to be the most important.
4. On comparability:
You should write your papers (and to the extent possible make relevant data, code, and instances publicly available in a long-term manner) so that future researchers (yourself included) can accurately compare their results for new algorithms/instances to your results. Part of this is simply following the above recommendations for ensuring reproducibility, but there are additional steps necessary.
He goes on to list some issues: the uncalibrated machine, the lost testbed, lost code/data.

Finally, the article lists things you cannot trust:
1. Never trust a random number generator.
2. Never trust your code to be correct.
3. Never trust a previous author to have known all the literature.
4. Never trust your memory as to where you put that data (and how it was generated).
5. Never trust your computer to remain unchanged.
6. Never trust backup media or Web sites to remain readable indefinitely.
And most memorably,

7. Never trust a so-called expert on experimental analysis.

The advice in this paper, if taken, will lead to an excellent piece of experimental work. It does require a lot more work than merely coding up an algorithm, and I think this is something to keep in mind. I have come to realize that unlike when writing a theoretical paper, when the timeline from initial problem conception to paper writing can be as little as a few days depending on the problem, experimental work always takes a lot longer and requires a lot more care to do and to write about. It's worth keeping that in mind; don't expect a published work to come out of an implementation merely by the act of implementing.

Frame the question this way: what can someone take away from reading your paper ? In what way will it guide their work ? When is your empirical work applicable to their situation, and when not ? I actually don't think theory committees have great difficulty evaluating experimental work; it's that I think experimental algorithmics has some ways to go to catch up in quality to the (justified) expectations of it. Over the years, the quality of papers in ALENEX has steadily grown, and I see no reason why we can't get better.

A bit of historical context: SODA has always been interested in examining experimental work. This excerpt comes from the CFP for SODA 1990 (the first one); added emphasis is mine. You can't find it on the web; I had to parse a troff document from David !
Papers are solicited that, by mathematical or experimental analysis, address the ways in which resource usage grows with increasing problem size, with resource'' interpreted broadly to include, for instance, nearness to optimality'' as well as such traditional measures as running time, storage, number of processors, and amount of communication. Mathematical analysis may be worst-case or probabilistic. Lower bound results are appropriate if they apply to real algorithmic problems.

Experimental papers can address a variety of issues, but special consideration will be given to those that put theoretical results into practical perspective or suggest new avenues for theoretical investigation.