Wednesday, October 12, 2011

O() notation breeds laziness

I will often write "We store the $n^2$ pairs of objects in a data structure" when what I really mean is "We store the $\binom{n}{2}$ pairs ..". Of course there's no "real" difference in asymptotic land. But I've often found that students reading a paper will puzzle over asymptotic handwaving wondering why someone wrote $O(n^2)$ when it should really be over all pairs and whether they're missing some key insight. It's even worse when people throw in gratuitous and specific constants just to make an analysis work ("why do we really need to run for 18.5 n steps???").

While I completely understand the reasons people do this (ranging from the last minute deadline insertion to a misguided attempt at clarity), I think that making these shortcuts often makes a paper harder to read rather than easier. There are places where the asymptotics are all that matter ("Run this randomized strategy $O(\log n)$ times and apply a Chernoff bound") but even there, it doesn't hurt to use $c \log n$ (or a specific constant if you need a specific polynomial bound) instead. And when there actually is a precise expression that conveys meaning (like in the binomial example above), it's far better to use that to signify what's going on.

I think the precept we teach to students in algorithm analysis (don't throw in O() notation till the end) would be well worth paying more heed to ourselves.

Caveat: I'm not advocating this in ALL instances. I'm advocating a greater sensitivity to when and where the use of O() notation is more or less appropriate.

Disqus for The Geomblog