Friday, January 08, 2010

Guest Post: Question on posting to the arxiv

ed. note: this post is by Jeff Phillips. For another recent post on arxiv publishing issues, see Hal Daume on the arxiv, NLP and ML.

It seems that over the last few months, the number of papers posted to the arXiv has been noticeably increasing, especially in the categories of Computational Geometry and Data Structures and Algorithms.

I have posted several (but not all) of my papers on the arXiv. I still do not have a consistent set of rules under which I post the papers. Here are a couple of circumstances under which I have posted papers to the arXiv.

A: Along with Proceedings Version:
When the conference version does not have space for full proofs, post the full version to the arXiv in conjunction with the proceedings version. This serves as a placeholder for the full version until the journal version appears. Additionally, the arXiv paper can be updated when the final journal version appears if it has changed.

Sometimes, I link to the arXiv version in the proceedings version. This makes it easy for a reader of the proceedings to find the full proofs.

If more conferences move to the SODA model where proceedings versions can be much longer (~20 pages), then this may not often be necessary.

B: Along with Submitted Version:
When you want to advertise a piece of work, but it has only been submitted, post a version to the arXiv. This is useful if you are giving talks on the work and want a documented time stamp so you can't get scooped, or if, say, you are applying for jobs and want to make your work very available and public.

This is closer to the math philosophy where many (most?) people submit a version of a paper to the arXiv as soon as they submit it to a journal. I think it would be great if CS adopted this policy, as it would be a lot easier to track results. I have a friend who, as a math graduate student, would start every day by perusing the dozen or so new arXiv posts in his area and choosing one paper to read. He told me that almost every paper he read as a grad student was on the arXiv. Wouldn't a world like that be extremely convenient?

However, I have had an issue following this rule. Last year I submitted a paper to a conference and concurrently submitted a longer version to the arXiv. The paper was, unfortunately, not accepted to the conference. My coauthor and I extended the results to the point where it made sense to split the paper. Half was then submitted and accepted to another conference, and full proofs were made available through a tech report at my coauthor's institution, as he was required to do. The second half, which has also been extended, is now under submission.

I might like to post the (full) second half to the arXiv, but do not want to duplicate the part from the previous post. I am not sure if it makes sense to merge the papers at this point either. And I would also like to note on the arXiv page that that version has been extended and part appears as a tech report.

What is the proper arXiv etiquette for this situation?


  1. <hat=arxiv-moderator>

    I suggest submitting the technical report to the ArXiv as a revision for your earlier paper, and then submitting the second result as a second ArXiv paper. You can also update the publication info for an ArXiv submission without changing the actual paper; but since you have an updated paper, why not post it?

    I'm surprised by your unstated assumption that one should only post papers to the ArXiv if they are not available elsewhere. So what if all the proofs appear in the proceedings?


  2. Jeffe's recommendation seems sound to me.

    I have been submitting papers to the arxiv for 10 years now. Most of the papers are published, and an item or two winds up not being published, but anything of scholarly use is acceptable.

    It is exceptionally convenient for your subject of interest to appear in your rss reader. In my area, all the best papers are first put at the arxiv, making the arxiv essential for currency in research!

    So, modify the paper---styles here are very broad, with some papers receiving many different versions, most in math not receiving any.
    All prior versions are available.

    Another lesser-known fact is that the source code is also available: a great tool for those papers you really want to understand, but should translate into your language.

  3. Hi Jeff,

    Thanks for your post! I was wondering if you could say anything about its complement - what reasons do you have for *not* submitting something to the arXiv?

    Of course, in certain fields of computer science, people do post to the arXiv as standard - for example, in quantum computing nearly everything appears on the arXiv sooner or later. This is a great resource for students (and everyone else!).

  4. There is a comments field where you can explain things like "v2: fixed typos. v3: added several new results, and doubled the size."

    Or you can "withdraw" a paper by replacing it with a single sentence, e.g. "This submission has been expanded and split into the two new submissions 1001.xxxx and 1001.yyyy." (Technically the old version is still accessible, but only for people who explicitly ask for it.)

  5. Ashley:

    Honestly, I don't have a good reason for not putting all of my papers on the arXiv. I can give a few excuses/rationalizations, however.

    (1) Most top people in my field do not post their papers on the arXiv. Sometimes I wonder if there is some unspoken reason why they don't.

    As I don't have tenure (or even a tenure track job) yet, I hesitate to call anyone specifically out, but you senior researchers in geometry, you all know who you are, why don't you post all or any of your papers to arXiv?

    (2) Laziness. I assume this is actually the main reason many people don't do it. I know it only takes a few minutes, but I always need to re-figure out which subset of latex-related files I am supposed to upload. Also, I usually feel the need to read over the paper one more time just to double check for mistakes, and that adds a few hours to the process and makes it easier to procrastinate indefinitely. I know if I just posted the longer version immediately, it would not be so bad, but I am usually really busy / tired just when it becomes ready.

    (3) I asked a friend this once and he mumbled something about the arXiv having two purposes, (a) to archive the papers and (b) to announce the papers, and he did not like how they were coupled. I don't fully understand this reasoning, but perhaps others do.

    If anyone else has other reasons, I would also be very interested to hear them.

  6. I am not senior.

    I do not post my papers to arxiv because I do not want to figure out copyright-related things, especially because they seem to depend on the eventual publisher.

  7. Another example of what aram was referring to is an early version of spectral sparsification by Spielman and Teng:

    This paper has been withdrawn and replaced by three other entries.

  8. I post (almost) all of my papers on the arxiv. If you go to my website and click on "papers" it's just a link to the arxiv. I find it enormously convenient for all sorts of reasons. For example, I don't even need to keep backups of the tex source of my old papers. And it's a tremendous courtesy to readers.

    I think that journals may be reluctant to raise a stink about copyright of arxiv papers because of the negative publicity they would get as a result. Of course, this may depend on the field.

  9. I have a related question: I noticed that the fraction of posts that are arxiv uploads on the theory blog aggregator has been increasing steadily. Is it getting spammy? Should I remove one or more arxiv categories? I'd really appreciate feedback on this. Thanks!

  10. Actually I think the arxiv feeds in the aggregator are great

  11. arxiv fans should also check out While it's specialized to quantum papers, it offers a sort of solution to the problem of arxiv feeds becoming "spammy".

  12. I also really like arxiv posts to the aggregator--not spammy at all!

  13. Good to know. Thanks for the feedback!

