Thursday, August 03, 2006

Timestamping using the arXiv...

When is it appropriate to post an article to the arXiv? You could opt to post when:
  • A journal/conference accepts the paper. You could also just post it on your website at that point, although the arXiv does allow for better dissemination.
  • You submit to a journal/conference. I'm told that this is often the model in areas of physics. Seems reasonable enough, although at least in CS, the fear of being "scooped" might prevent you from doing so, not to mention the often-byzantine "double-blind" policies of some conferences.
  • You've dried yourself off from the shower you were taking when inspiration struck. Ok, so maybe I'm exaggerating slightly, but...
Can the arXiv be a valid reference for time-stamping? In other words, can you use a document posted on the arXiv as proof of 'prior art'? Consider the following scenario: Santa and Banta* are both working furiously (and independently) on a brand new result in synthetic polymorphic differential logic. Santa submits his paper to the ACM Symp. on SPDL, while Banta misses the deadline. Banta, smartly enough, posts his manuscript on the arXiv, while Santa's paper is deemed not algorithmic enough and is rejected summarily from SPDL.

When IEEE SPDL comes along, who claims precedence? Certainly not Santa, since he has no document to be cited. But can Banta claim precedence merely by posting on the arXiv? There has been no peer review, after all, and his proof could be full of holes.

It would be easy enough to declare that Banta has no claim to precedence, since there is no peer-reviewed cited work available. But there are three problems with this:
  • It negates the value of the arXiv! After all, if I cannot claim any kind of precedence, but can have someone pick over my result and improve it, what incentive do I have for posting anything? One answer to this could be that my result is cast-iron and can't be improved, but this happens far less often, and cannot be a useful answer in all situations.
  • It ignores common practice within the community! People will often talk about new results they have, and if the result is important enough, word will spread, and ownership will be (informally) established.
  • People cite technical reports all the time! One of the founding papers of streaming algorithms was (and has remained) a DEC technical report.
Personally, I'd still like to use the "peer-reviewed publication" as the best model for timestamping. But this also explains why the arXiv appears more popular among physicists and mathematicians, who publish primarily in journals. After all, the likelihood of a paper getting rejected by a journal can be reduced to a far lower number than the corresponding number for a conference, and so publishing on the arXiv is a relatively risk-free venture. I also suspect that people hold off on more controversial work, sending it out to the arXiv only when already accepted to a peer-reviewed venue.

* As my Indian readers will know, Santa and Banta often star in jokes making fun of the lack of intelligence of a particular ethnic community in India. At least in this anecdote, they are both smart researchers; consider it my contribution to ethnic amity. :)



  1. I've arxived both at submission time and after acceptance. Also when I've written a paper but as yet have no idea where to send it.

    Re time stamping, I think there are two different questions: (1) Does arxiv (or ECCC, or putting it on your own web site) count as time stamping for the purpose of showing that you had some idea first or independently of someone else? To me, clearly yes. (2) Does it count as time stamping for the purposes of preventing anyone else from publishing the same idea even if they came up with it independently? E.g. should we reject a FOCS submission because someone else already has a too-similar but independent ECCC preprint? Probably not. 

    Posted by Anonymous

  2. For patent law, I know the answer from personal experience.
    "If the invention has been described in a printed publication anywhere, or has been in public use or on sale in this country more than one year before the date on which an application for patent is filed in this country, a patent cannot be obtained."
    Arxiv clearly counts as "in public use" and indeed that is how things seem to work!


    Posted by Dave Bacon

  3. David E: but how do you tell the difference between 1 and 2? Also, how do you know that the FOCS submission is indeed independent of the pre-print? I see no easy way of verifying this.

    Posted by Suresh

  4. The difference is that in (1) you get credit and the right to publish your own work. In (2) you get someone else's too-late work suppressed, or don't.

    "Also, how do you know that the FOCS submission is indeed independent of the pre-print?"

    Sometimes the authors will say something about it in the introduction (having discovered the duplication themselves after already working on the problem). Also, if it's a well known enough problem that two groups came up with a solution independently then there's a reasonable chance that someone on the committee may know more of the story of how there came to be two independent papers. 

    Posted by Anonymous

  5. I like to put everything I write on Arxiv. I like
    the time-stamp it provides, and it keeps me honest.
    It also levels the playing field by giving everyone
    access to my papers at the same time, as opposed to
    just giving a leg up to my privileged friends (well,
    I think they're privileged).

    I do not feel that FOCS or STOC provide a more
    legitimate time-stamp, as neither really reviews
    for correctness.

    Moreover, I think that the idea that one should keep
    a result quiet until submitting (or having it accepted) to a conference is silly, and also hurts science. It's even worse to keep a result quiet
    for a year because you happen to be on a program
    committee. It also creates even more disincentive
    to be on a committee.

    Posted by Dan Spielman

