Thursday, April 13, 2006

Author ordering on papers.

We order authors alphabetically in theory papers. The reasons for this are numerous, and not relevant; what's relevant is that this is often different from other CS communities, where the first-author, ..., last-author norm is usually followed (first-author being the person who did all the work, and last-author being the PI on the project).

I'm all for alphabetical ordering: it's easy and avoids annoying and often impossible-to-resolve discussions about who did "more" work on a paper. Note that it's easy to identify who did what; what's harder is comparing the relative merit of contributions, especially in the multi-authored extravaganzas that are common in theory.

It turns out though that alphabetical ordering may have one pitfall, at least in economics:
...a new paper (free, working version, Winter 06, JEP) demonstrates that these effects have important consequences for careers in economics. Faculty members in top departments with surnames beginning with letters earlier in the alphabet are substantially more likely to be tenured, be fellows of the Econometrics Society, and even win Nobel prizes (let's see, Arrow, Buchanan Coase...hmmm). No such effects are found in psychology where the alphabetical norm is not followed.
Well, I don't know about tenure, but I do know about ACM Fellows and Turing Award winners. I'm too lazy to do the linear regression that the authors do for their plots, but I will throw out these two tidbits: 60% of Turing Award winners (30/50) have their last names in the first half of the alphabet [A-M], compared to 40% (20/50) in the latter half [N-Z]. With ACM Fellows, the split is 63% (347/554) to 37% (207/554).

I don't know how the authors of this paper normalized against any skew in the base population of economists, and the numbers I quote are subject to the same objection. But just in case, call me Suresh Enkatasubramanian from now on.


Update (4/14): An anonymous poster and D. Sivakumar have been working hard to debunk my claim, successfully so far! Anonymous points out that the percentage of names in [A-M] in their phone book is around 63%, and Siva reports that going by the DBLP database, around 62% of all authors are in the [A-M] range as well (the 50% cutoff is at K, apparently). Now, although this "explains" my quick back-of-the-envelope statistics, it leaves two possibilities:
  • If one did the linear regression that the authors of the original paper used, one would get the same results.
  • Even the authors of the original paper didn't correct for the baseline (and frankly, after reading their paper twice, I don't see where they did, but I can't imagine that they didn't).
A third point worth making is that CS has mixed policies on author ordering. Only theory consistently uses alphabetical ordering, so it can be argued that the data cannot be used to infer anything at all.
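For anyone who wants to redo the baseline check rather than eyeball it, here is a minimal sketch in Python. It is not the regression from the original paper; it swaps in a plain exact binomial test, and the function name and the choice of test are my own assumptions. It compares the two counts quoted above (30/50 and 347/554) against a naive 50% baseline and against the roughly 62% [A-M] share that Siva reports from DBLP.

```python
from math import comb


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value (hypothetical helper, not from the paper).

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Counts quoted in the post: (names in A-M, total).
observed = {
    "Turing Award winners": (30, 50),
    "ACM Fellows": (347, 554),
}

# Baselines: a naive even split, and the ~62% A-M share reported from DBLP.
baselines = {"naive 50%": 0.50, "DBLP 62%": 0.62}

results = {}
for group, (k, n) in observed.items():
    for label, p0 in baselines.items():
        results[(group, label)] = binom_two_sided_p(k, n, p0)
        print(f"{group}: {k}/{n} = {k / n:.1%} vs {label}: "
              f"p = {results[(group, label)]:.3g}")
```

Against the 50% baseline the Fellows split looks wildly significant; against the 62% baseline neither split is distinguishable from the base population, which is the point the commenters were making. This says nothing about whether the economics result survives the same correction.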


10 comments:

  1. Why not just go all out and change your name to "Suresh Aaavenkatasubramanian"? You can choose as many 'a's as you like, depending on how desperate you are to be first author...

    Posted by JeffP

  2. Of course, this is not a scalable solution to the problem :).  

    Posted by Suresh

  3. It is amazing how hard-wired the importance of author order is for researchers in areas where this matters. When we look at faculty candidates in Theory with surnames later in the alphabet, you can bet there will be several faculty members who give negative ratings justified by the comment that the candidate has no first-author papers. This is in spite of the fact that I have told these faculty many dozens of times that the standard author order is alphabetical.

    Posted by Anonymous

  4. Excellent, Suresh. I would let the award committee know about your name change ;). I might also mention that _ comes before A, and capital letters come before small letters, so ____VENKAT... is a much better new name.

    Posted by Anonymous

  5. "60% of Turing award winners (30/50) have their last names in the first half of the alphabet [A-M], compared to 40% (20/50) in the latter half [N-Z]. With Fellows, the split is 63% (347/554) to 37% (207/554)."

    Have you checked what is the split in the phone book? 

    Posted by Herman

  6. I checked the telephone book for our University,
    and A-M surnames constituted 63% of the pages of the phone directory.  

    Posted by Anonymous

  7. I checked the telephone book for our University,
    and A-M surnames constituted 63% of the pages of the phone directory.

    Posted by Anonymous

  8. Good job, Anonymous, who looked in the phone directory.
    From the DBLP data of computer science papers, about 62% of all authors are in the A-M range of last names (whether you consider it a set or a multiset with frequency = # publications). A-K accounts for 50% of all authors (again, with either interpretation).

    Information retrieval folks have two words for this type of normalization: tf idf

    just in case you were wondering, the frequency list is sorted
    S M B C K L H G P R W T D A F N J Z V Y O E I U X Q

    --Siva (working hard to restore the rightful balance of papers from the second half of the alphabet)
     

    Posted by Anonymous

  9. At least there isn't discrimination against names starting late in the alphabet when it comes to publishing poetry in the NYT :) 

    Posted by Rahul

  10. Oh, thanks a lot for the pointer. I just finished reading Martin Tompa's two articles, and they are hilarious. I will have to repost them up at the top of the post.  

    Posted by Suresh

