Tuesday, December 05, 2006

Author ordering and game theory.

There are typically two major ways of ordering authors on a paper. In theoryCS, we (mostly) use the lexicographic ordering (by last name); in many other areas, ordering authors by relative contribution is common. There are variants: advisors might list themselves last by default even within a lexicographic or contribution-based system, middle authors may not always be ordered carefully, and so on.

Since paper authorship conveys many important pieces of information, author ordering is an important problem. It's an even bigger problem if you have hundreds of authors on a paper, some of whom may not even know each other (!). This is apparently becoming common in the HEP (high energy physics) literature, and an interesting article by Jeremy Birnholtz studies the problem of authorship and author ordering in this setting. The study is sociological; the author interviews many people at CERN, and derives conclusions and observations from their responses.

As one might imagine, not too many of the problems of 1000-author papers are translatable to our domain. After all, these papers are bigger than our conferences, and I doubt anyone here has ever needed a "publication committee" when writing their paper. And yet, the interviews reveal the same kinds of concerns that we see all the time. Is a certain ordering scheme shortchanging certain authors? Did a certain author do enough to merit authorship? Who gets to go around giving talks on the work?

Towards the end of the paper, the author makes an interesting (but unexplored) connection to game theory. The players in this game are the authors, and what they are trying to optimize is the community's (the "market's") perception of their individual contributions. Intuitively, lexicographic ordering conveys less information about author contributions and thus "spreads" contributions out. However, it's not symmetric: if we see a paper with alphabetically ordered authors, it could be the product of a genuine contribution-based ordering that happens to coincide with the alphabet, or of a lexicographic convention. A reverse-alphabetical byline, by contrast, can only signal contribution. In that sense, authors with names earlier in the alphabet are disadvantaged, because their first-author position is ambiguous even when they earned it. This seems counter-intuitive.
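The asymmetry can be made concrete with a small Bayes calculation. This is only a sketch under assumptions of my own, not anything taken from the Birnholtz article: two authors, a prior probability q that a team orders by contribution (otherwise alphabetically), and an even prior over which of the two contributed more. The function name and parameters are hypothetical.

```python
from fractions import Fraction

def posterior_first_is_lead(q, observed_alphabetical):
    """P(first-listed author contributed more | observed byline order).

    q is the assumed prior probability that a two-author team orders
    by contribution; with probability 1 - q it orders alphabetically.
    Either author is assumed equally likely to be the lead a priori.
    """
    half = Fraction(1, 2)
    if not observed_alphabetical:
        # A reverse-alphabetical byline can only come from contribution
        # ordering, so the first-listed author is certainly the lead.
        return Fraction(1)
    # An alphabetical byline arises from the alphabetical convention,
    # or from contribution ordering that happens to coincide with it.
    p_contrib_and_alpha = q * half
    p_convention = 1 - q
    p_contrib_given_alpha = p_contrib_and_alpha / (p_contrib_and_alpha + p_convention)
    # Under the convention the byline says nothing: fall back to 1/2.
    return p_contrib_given_alpha + (1 - p_contrib_given_alpha) * half

print(posterior_first_is_lead(Fraction(1, 2), True))   # alphabetical byline
print(posterior_first_is_lead(Fraction(1, 2), False))  # reversed byline
```

With q = 1/2, a first author who is also first alphabetically is believed to be the lead with probability 2/3, while a first author who is last alphabetically is believed to be the lead with certainty.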

As it turns out, there's been some work on the equilibrium behaviour of this system. To cite one example, there's a paper by Engers, Gans, Grant and King (yes, it's alphabetically ordered) that studies the equilibrium behaviour of author ordering systems with a two-author paper in a market. Their setup is this:
  • The two players A and B decide to put in some individual effort.
  • The relative contribution of each (parametrized by the fraction of contribution assigned to A) is determined as a (fixed but hidden) stochastic function of the efforts.
  • The players "bargain" to determine the ordering rule (lexicographic or contribution). The result is a probability of choosing one kind of rule, after which a coin is tossed to determine which rule is actually used.
  • The work is "published", and the market assigns a value to the paper as a whole, and a fraction of this value to A, based on public information and other factors.
Now all of these parameters feed back into each other, and that's where the game comes from. What is a stable ordering strategy for this game? It turns out that lexicographic ordering does yield equilibrium behaviour, and contribution-based ordering does not.
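To see how the four steps fit together, here is a toy walk-through of a single round. It is emphatically not the model from the Engers, Gans, Grant and King paper: the noise term, the market's credit rule, the constants, and every name below are placeholders I made up for illustration.

```python
import random

def play_round(effort_a, effort_b, p_contribution, rng):
    """One pass through the four steps for authors A and B (A sorts first).

    Efforts must be positive. All functional forms are assumptions.
    """
    # Step 1: the efforts are chosen by the players and passed in.
    # Step 2: A's hidden share of the contribution. Effort ratio plus
    # uniform noise is an assumed form, clipped to [0, 1].
    share_a = effort_a / (effort_a + effort_b) + rng.uniform(-0.1, 0.1)
    share_a = min(1.0, max(0.0, share_a))
    # Step 3: bargaining yields p_contribution; a coin toss picks the rule.
    by_contribution = rng.random() < p_contribution
    b_first = by_contribution and share_a < 0.5
    byline = ("B", "A") if b_first else ("A", "B")
    # Step 4: the market sees only the byline. Placeholder rule: an
    # alphabetical byline gets an even split, a reversed one favours B.
    market_share_a = 0.3 if b_first else 0.5
    paper_value = effort_a + effort_b
    return {"byline": byline, "share_a": share_a,
            "credit_a": market_share_a * paper_value}

rng = random.Random(0)
print(play_round(1.0, 9.0, 0.0, rng))  # always alphabetical
print(play_round(1.0, 9.0, 1.0, rng))  # always by contribution
```

Even in this crude version, A's credit under the alphabetical rule does not depend on A's effort relative to B's, which hints at why effort incentives differ between the two rules.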

What's even more interesting is that if we look at merely maximizing research output (the external "quality" of the paper), then this is not maximized by lexicographic ordering, because of the overall disincentive to put in more effort if it's not recognized. However, this does not mean that always using contribution-based ordering is better; the authors have an example where it is not. One intuition could be that if there's a discrepancy between the market's perception of contribution and the actual individual contributions, then there is a disincentive to deviate too much from the "average" contribution level.

It's all quite interesting. Someone made a comment to me recently (you know who you are :)) about how assigning papers to reviewers made them value research into market-clearing algorithms. I like the idea of applying game theory to the mechanisms of our own research.

(HT: Chris Leonard)

Previous posts on author ordering here, and here.



  1. Wow. Quite an interesting read here. I've gotta go grab my dictionary. Anyways, compliments on the blog. Keep at it.

    Matt Menster
    Posted by Matt

  2. Now how come early alphabet authors are at a disadvantage? Lexicographic ordering on a paper could come from true lexicographic ordering, or coincidental merit ordering (which in this case would be a plus). For authors at the end of the alphabet, the ordering could come from true lexicographic ordering, or coincidental merit ordering (which in this case would be a minus).

    If you assume some kind of probability for merit ordering when you see a lexicographically ordered paper, the authors at the end of the alphabet lose. 

    Posted by A

  3. But, if the merit ordering ends up being alphabetic then the first author may lose because it is probably assumed that the ordering was made alphabetically.

    You can thus argue that the early alphabet people have more to lose.

    Posted by JeffP

  4. The advantage of having the lexicographically first name is being quoted as A.... et al.

    "But, if the merit ordering ends up being alphabetic then the first author may lose because it is probably assumed that the ordering was made alphabetically."

    And if it is made alphabetically, one may think that the last place is "deserved" by a ranking by contribution.

    The problem seems to be completely symmetric to me.


    Posted by Michael Greinecker

  5. Amusing.

    Authors early in the alphabet have an advantage, not a disadvantage, because they get to be first authors on major discoveries. I have a friend that this happened to. Yes, you can argue that there's a corresponding disadvantage, because they may be assumed to be lead author sometimes just based on their name, not their contribution, but this is not NEARLY as important as the advantage gained by even once being first author on a major paper where you did 1/200th of the work, but the citation is "your name et al."

    The correct game-theory answer to "who should be first author" is "the guy who actually writes the paper should be first author." This is a stable equilibrium; if somebody else wanted to have been first author, they would have written the damn paper themselves.


