Saturday, September 18, 2004

Experimental Work in SODA

A reader writes:
Let me suggest another [topic]--the attitude towards experimental algorithmic work, which I think is a recurring issue. The CFP [for SODA] often advocates this theme, but then the committee tends not to appreciate the experimental work so much.

As theoreticians, we are probably not so impressed that one can beat the known worst-case bounds on certain data sets (particularly simulated data). Perhaps we are not used to evaluating experimental work, and we rarely think about the kinds of experiments that would significantly benefit our theoretical work.
David Johnson has written A Theoretician's Guide to the Experimental Evaluation of Algorithms (PDF) (link fixed). The first time I saw this it was a short 7-page guide; it has since expanded to a 36-page survey. I highly recommend reading it, and keeping a copy around when you start planning your experiments. The article addresses all the points (and more) that I would have hoped to make, so I'll merely excerpt a few:

1. Understand what your paper is about:
  • Implementation for a specific application
  • A 'horse-race' paper, comparing algorithm/implementation to others
  • Exploring the strengths/weaknesses etc of various algorithmic ideas in practice
  • Generating conjectures about average case behaviour when formal probabilistic analysis is too hard
The first two cases are the most common kinds of implementation papers, but it is the third that David suggests is the most interesting from an experimental algorithmics perspective. Indeed, papers that combine the first two characteristics are usually best suited to the area the application came from, rather than to SODA/ALENEX.

2. What makes a paper interesting:

...the results should yield broader insights, either in terms of asymptotics or presumption of robustness.... (this is one way in which an experimental paper aspires to higher goals than a horse race paper...)
3. On reproducibility:
... what does "reproducibility" mean in the context of the experimental analysis of algorithms? In the strictest sense it means that if you ran the same code on the same instances on the same machine/compiler/operating system/system load combination you would get the same running time, operation counts, solution quality (or the same averages, in the case of a randomized algorithm).

This, however, would not be very informative, nor does it correspond to what is meant by "reproducibility" in classical scientific studies. In reproducing such a study, a scientist must use the same basic methods, but will typically use different apparatus, similar but distinct materials, and possibly different measuring techniques. He will be said to have reproduced the original results if the data he obtains is consistent with that of the original experiments and supports the same conclusions. In this way he helps confirm that the results of the original experiment (and the conclusions drawn from them) are independent of the precise details of the experiment itself.

It is this broader notion of reproducibility that I consider to be the most important.
4. On comparability:
You should write your papers (and to the extent possible make relevant data, code, and instances publicly available in a long-term manner) so that future researchers (yourself included) can accurately compare their results for new algorithms/instances to your results. Part of this is simply following the above recommendations for ensuring reproducibility, but there are additional steps necessary.
He goes on to discuss some pitfalls: the uncalibrated machine, the lost testbed, and lost code/data.
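One cheap guard against the "uncalibrated machine" pitfall is to publish, alongside your timings, how long your machine takes on a fixed reference computation, so a later reader can roughly rescale your numbers to their hardware. A minimal sketch of the idea (the workload and function name here are my own illustration, not from Johnson's paper, which treats calibration in more depth):

```python
import time

def calibration_score(reps=5):
    """Time a fixed reference computation, reporting the best of a few
    runs. Publishing this number with your results gives future readers
    a crude conversion factor between your machine and theirs."""
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        # Reference workload: build and sort a fixed pseudorandom list.
        data = [(i * 2654435761) % 2**32 for i in range(200_000)]
        data.sort()
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    print(f"calibration: {calibration_score():.4f} s per reference sort")
```

Taking the minimum over several repetitions reduces noise from background load; the absolute number is meaningless on its own, and only matters relative to the same workload on another machine.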

Finally, the article lists things you cannot trust:
1. Never trust a random number generator.
2. Never trust your code to be correct.
3. Never trust a previous author to have known all the literature.
4. Never trust your memory as to where you put that data (and how it was generated).
5. Never trust your computer to remain unchanged.
6. Never trust backup media or Web sites to remain readable indefinitely.
And most memorably,

7. Never trust a so-called expert on experimental analysis.
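Several of these points (the random number generator, the lost data, the changing computer) can be guarded against mechanically by fixing seeds explicitly and logging the environment alongside every result. A rough sketch of the habit, with a toy "algorithm" standing in for a real one (all names here are illustrative, not from the paper):

```python
import hashlib
import json
import platform
import random
import sys
import time

def run_experiment(seed, instance):
    """Toy 'experiment': deterministically shuffle the instance with an
    explicitly seeded RNG and report a checksum of the output, so that a
    rerun with the same seed is directly comparable."""
    rng = random.Random(seed)  # never rely on hidden global RNG state
    data = list(instance)
    rng.shuffle(data)
    digest = hashlib.sha256(repr(data).encode()).hexdigest()
    return {"seed": seed, "result_digest": digest}

def environment_record():
    """Snapshot of the machine and software, since neither stays
    unchanged (points 4 and 5 above)."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }

if __name__ == "__main__":
    record = run_experiment(seed=42, instance=range(100))
    record["environment"] = environment_record()
    print(json.dumps(record, indent=2))
```

Writing such a record next to each result file costs almost nothing and answers, years later, the "where did this data come from and how was it generated" question before it is asked.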

The advice in this paper, if taken, will lead to an excellent piece of experimental work. It does, however, require far more effort than merely coding up an algorithm, and that is worth keeping in mind. I have come to realize that, unlike a theoretical paper, where the timeline from initial problem conception to write-up can be as short as a few days depending on the problem, experimental work always takes much longer and requires much more care, both to do and to write about. Don't expect a publishable paper to come out of an implementation merely by the act of implementing.

Frame the question this way: what can someone take away from reading your paper? In what way will it guide their work? When is your empirical work applicable to their situation, and when not? I actually don't think theory committees have great difficulty evaluating experimental work; rather, I think experimental algorithmics still has some way to go to meet the (justified) expectations placed on it. Over the years, the quality of papers in ALENEX has steadily grown, and I see no reason why we can't keep getting better.

A bit of historical context: SODA has always been interested in experimental work. This excerpt comes from the CFP for SODA 1990 (the first one); the added emphasis is mine. You can't find it on the web; I had to parse a troff document from David!
Papers are solicited that, by mathematical or experimental analysis, address the ways in which resource usage grows with increasing problem size, with ``resource'' interpreted broadly to include, for instance, ``nearness to optimality'' as well as such traditional measures as running time, storage, number of processors, and amount of communication. Mathematical analysis may be worst-case or probabilistic. Lower bound results are appropriate if they apply to real algorithmic problems.

Experimental papers can address a variety of issues, but special consideration will be given to those that put theoretical results into practical perspective or suggest new avenues for theoretical investigation.


  1. I do not religiously read the SODA proceedings, so I probably have missed some great works. However, the only experimental SODA paper I have admired was:

    J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.

    That is a great piece of experimental work.

  2. I think a horse race paper is quite useful. Perhaps the most
    important side effect is that code for various algorithms actually
    gets written and tested, which happens too infrequently in the
    algorithms community. Asking for much more than this is hardly
    realistic given the amount of work involved.

    At the same time, I am not convinced that the current SODA model,
    where theory papers and experimental papers are evaluated side by
    side, is going to work unless there is some explicit quota for
    experimental papers, which is itself problematic. Basically, I don't
    think there is any career-wise incentive for the theory/algorithms
    community to do experimental work, and that more than anything else
    explains the paucity of papers in the area. I don't think this is a
    bad thing in itself; perhaps there are not enough applications today
    that truly need all the sophisticated algorithms developed in
    theory, so we'll just have to wait for that to happen. In the
    meantime, if there are too many theory people, the market should
    take care of that eventually.


  3. 1. It is not clear that there is a paucity of experimental papers overall: ALENEX has been attracting attention, and both ESA and SoCG have applied tracks.

    2. There is no reason to expect a large number of experimental papers at SODA, just as there is no reason to expect a large number of (say) CG papers at FOCS. Some good ones will get into SODA, and that is fine. There are other avenues, and ALENEX is a good one.

    3. It is not clear to me that "doing experimental work" is detrimental from a "career" point of view. It all depends on how one frames one's work, and ultimately you do what interests you. It is probably true that only a small fraction of the theory community is interested in experimental work, and I see no problem with that. Those who do find experimental work interesting will find ways of placing it in context so that it doesn't harm their careers.

  4. Suresh,

    I think the link to Johnson's paper is broken.
    It should be

