Thursday, August 03, 2006

Timestamping using the arXiv...

When is it appropriate to post an article to the arXiv? You could opt to post when
  • A journal/conference accepts the paper. You could also just post it on your website at that point, although the arXiv does allow for better dissemination.
  • You submit to a journal/conference. I'm told that this is often the model in areas of physics. Seems reasonable enough, although at least in CS, the fear of being "scooped" might prevent you from doing so, not to mention the often-byzantine "double-blind" policies of some conferences.
  • You've dried yourself off from the shower you were taking when inspiration struck. Ok, so maybe I'm exaggerating slightly, but...
Can the arXiv be a valid reference for time-stamping? In other words, can you use a document posted on the arXiv as proof of 'prior art'? Consider the following scenario: Santa and Banta* are both working furiously (and independently) on a brand new result in synthetic polymorphic differential logic. Santa submits his paper to the ACM Symp. on SPDL, while Banta misses the deadline. Banta, smartly enough, posts his manuscript on the arXiv, while Santa's paper is deemed not algorithmic enough and is summarily rejected from SPDL.

When IEEE SPDL comes along, who claims precedence? Certainly not Santa, since he has no document that can be cited. But can Banta claim precedence merely by posting on the arXiv? There has been no peer review, after all, and his proof could be full of holes.

It would be easy enough to declare that Banta has no claim to precedence, since there is no peer-reviewed cited work available. But there are three problems with this:
  • It negates the value of the arXiv! After all, if I cannot claim any kind of precedence, but someone else can pick over my result and improve it, what incentive do I have for posting anything? One answer could be that my result is cast-iron and can't be improved, but that is rarely the case, so it cannot be a useful answer in all situations.
  • It ignores common practice within the community! People will often talk about new results they have, and if the result is important enough, word will spread, and ownership will be (informally) established.
  • People cite technical reports all the time! One of the founding papers of streaming algorithms was (and has remained) a DEC technical report.
Personally, I'd still like to use the "peer-reviewed publication" as the best model for timestamping. But the scenario above also explains why the arXiv appears more popular among physicists and mathematicians, who publish primarily in journals. After all, the likelihood of a paper being rejected by a journal can be made far lower than the likelihood of rejection by a conference, and so posting to the arXiv is a relatively risk-free venture. I also suspect that people hold off on more controversial work, sending it to the arXiv only once it has been accepted at a peer-reviewed venue.

* As my Indian readers will know, Santa and Banta often star in jokes making fun of the lack of intelligence of a particular ethnic community in India. At least in this anecdote, they are both smart researchers; consider it my contribution to ethnic amity. :)

