Thursday, April 13, 2006

Author ordering on papers.

We order authors alphabetically in theory papers. The reasons for this are numerous, and not relevant; what's relevant is that this is often different from other CS communities, where the first-author, ..., last-author norm is usually followed (first-author being the person who did all the work, and last-author being the PI on the project).

I'm all for alphabetical ordering: it's easy and avoids annoying and often impossible-to-resolve discussions about who did "more" work on a paper. Note that it's easy to identify who did what; what's harder is comparing the relative merit of contributions, especially in the multi-authored extravaganzas that are common in theory.

It turns out though that alphabetical ordering may have one pitfall, at least in economics:
...a new paper (free, working version, Winter 06, JEP) demonstrates that these effects have important consequences for careers in economics. Faculty members in top departments with surnames beginning with letters earlier in the alphabet are substantially more likely to be tenured, be fellows of the Econometrics Society, and even win Nobel prizes (let's see, Arrow, Buchanan Coase...hmmm). No such effects are found in psychology where the alphabetical norm is not followed.
Well, I don't know about tenure, but I do know about ACM Fellows and Turing Award winners. I'm too lazy to do the linear regression that the authors do for their plots, but I will throw out these two tidbits: 60% of Turing Award winners (30/50) have last names in the first half of the alphabet [A-M], compared to 40% (20/50) in the latter half [N-Z]. With Fellows, the split is 63% (347/554) to 37% (207/554).

I don't know how the authors of this paper normalized against any skew in the base population of economists, and the numbers I quote are subject to the same objection. But just in case, call me Suresh Enkatasubramanian from now on.

Update (4/14): An anonymous poster and D. Sivakumar have been working hard to debunk my claim, successfully so far! Anonymous points out that the percentage of names in [A-M] in their phone book is around 63%, and Siva reports that going by the DBLP database, around 62% of all authors are in the [A-M] range as well (the 50% cutoff is at K, apparently). Now, although this "explains" my quick back-of-the-envelope statistics, it leaves two possibilities:
  • If one did the linear regression that the authors of the original paper used, one would get the same results.
  • Even the authors of the original paper didn't correct for the baseline (and frankly, after reading their paper twice, I don't see where they did, but I can't imagine that they didn't).
A third point worth making is that CS has mixed policies on author ordering. Only theory consistently uses alphabetical ordering, so it can be argued that the data cannot be used to infer anything at all.
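
For anyone who wants to redo the baseline check, here is a rough sketch in Python. It is not the regression from the paper; it is a plain exact binomial test that asks how surprising the counts quoted above are under a given baseline share of [A-M] surnames. The 0.62 baseline is the DBLP figure Siva reported, 0.50 is the naive "half the alphabet" assumption, and the function names are my own. It also treats each surname as an independent draw, which is an assumption.

```python
from math import comb

# Counts quoted in the post: (surnames in A-M, total).
COUNTS = {
    "Turing Award winners": (30, 50),
    "ACM Fellows": (347, 554),
}

# Baseline share of A-M surnames. 0.62 is the DBLP figure reported in the
# update; 0.50 is the naive "half the alphabet" assumption.
BASELINES = (0.50, 0.62)


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of every
    outcome that is no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so floating-point ties count as "no more likely".
    cutoff = observed * (1 + 1e-9)
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1)) if q <= cutoff)
    return min(1.0, total)


for label, (k, n) in COUNTS.items():
    for p in BASELINES:
        print(f"{label}: {k}/{n} = {k / n:.1%} vs baseline {p:.0%}, "
              f"p-value = {two_sided_p(k, n, p):.3g}")
```

Against the 62% baseline neither count looks unusual, which matches the debunking above; only the Fellows split stands out, and only against the naive 50% baseline.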

