Friday, February 12, 2010

Papers and SVN

Way back when, I had promised to do a brief post on the use of SVN (or other versioning systems) for paper writing. Writing this post reminds me of all the times I've sniggered at mathematicians unaware of (or just discovering) BibTeX: I suspect all my more 'systemsy' friends are going to snigger at me for this.

For those of you not familiar with the (cvs, svn, git, ...) family of software, these are versioning systems that (generally speaking) maintain a repository of your files, allow you to check files out, make local changes, and check them back in, simultaneously with others who might be editing other files in the same directory, or even the same file itself.

This is perfect come paper writing time. Rather than passing around tokens, or copies of tex files, (or worse, zip files containing images etc), you just check the relevant files into a repository and your collaborator(s) can check them out at leisure. SVN is particularly good at merging files and identifying conflicts, making it easy to fix things.

My setup for SVN works like this: each research project has a directory containing four subdirectories. Two are easy to explain: one is a "trunk" directory where all the draft documents go, and another is an "unversioned" directory for storing all relevant papers. (I keep these separate so that when you're checking out the trunk, you don't need to keep downloading the papers that get added in.)

The other two come in handy for maintaining multiple versions of the current paper. The 'branches' directory is what I use when it comes close to submission deadline time, and the only changes that need to be made are format-specific, or relate to shrinking text etc. The 'tags' directory is a place to store frozen versions of a paper (i.e., post-submission, post-final version, arxiv-version, journal version, etc.).
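As a rough sketch of the layout just described, the following Python creates a local skeleton with the four subdirectories and builds the `svn copy` command that would freeze the trunk under 'tags'. The helper names, the project name and the repository URL are all hypothetical, chosen only for illustration, and the command is built but not run.

```python
from pathlib import Path
from typing import List
import tempfile

# The four subdirectories named in the post.
SUBDIRS = ("trunk", "branches", "tags", "unversioned")


def make_project_skeleton(root: Path, project: str) -> Path:
    """Create <root>/<project>/{trunk,branches,tags,unversioned}."""
    project_dir = root / project
    for name in SUBDIRS:
        (project_dir / name).mkdir(parents=True, exist_ok=True)
    return project_dir


def tag_command(repo_url: str, tag: str, message: str) -> List[str]:
    """Build (but do not run) an `svn copy` that freezes trunk as tags/<tag>."""
    base = repo_url.rstrip("/")
    return ["svn", "copy", f"{base}/trunk", f"{base}/tags/{tag}", "-m", message]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = make_project_skeleton(Path(tmp), "my-paper")
        print(sorted(p.name for p in project_dir.iterdir()))
    # Hypothetical repository URL, for illustration only.
    cmd = tag_command("svn+ssh://example.org/repos/my-paper",
                      "arxiv-version", "Freeze the arxiv version")
    print(" ".join(cmd))
```

A copy into 'branches' for the format-specific deadline edits would look the same, with `tags` replaced by `branches` in the destination.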

It seems complicated, but it works quite well. The basic workflow near deadline time is simply "check out trunk; make changes, check in trunk; repeat...". A couple of things make the process even smoother:
  • Providing detailed log messages when checking in a version: helps to record what exactly has changed from version to version - helpful when a collaborator needs to know what edits were made.
  • Configuring SVN to send email to the participants in a project whenever changes are committed. Apart from the subtle social engineering ("Oh no! They're editing, I need to work on the paper as well now!"), it helps keep everyone in sync, so you know when updates have been made, and who made them.
  • Having a separate directory containing all the relevant tex style files. Makes it easy to add styles, conference-specific class files etc.
I can't imagine going back to the old ways now that I have SVN. It's made writing papers with others tremendously streamlined.
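The deadline-time loop ("check out trunk; make changes, check in trunk; repeat") can be sketched as below. This is only an illustration: the function names are made up, the commands are built rather than executed, and the refusal of empty log messages is my own way of encoding the "detailed log messages" advice, not something SVN requires here.

```python
from typing import List


def update_command() -> List[str]:
    """Command to pull collaborators' latest changes into the working copy."""
    return ["svn", "update"]


def commit_command(message: str) -> List[str]:
    """Command to check in local edits; refuses an empty log message."""
    message = message.strip()
    if not message:
        raise ValueError("log message must not be empty")
    return ["svn", "commit", "-m", message]


def edit_cycle(message: str) -> List[List[str]]:
    """One round of the loop: update first, then commit with a log message."""
    return [update_command(), commit_command(message)]


if __name__ == "__main__":
    for cmd in edit_cycle("Shortened Section 3 to fit the page limit"):
        print(" ".join(cmd))
```

To actually run a round, each command list could be passed to `subprocess.run(cmd, check=True)` from inside the checked-out trunk.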

Caveat:
  • SVN isn't ideal for collaborations across institutions. Much of my current work is with local folks, so this isn't a big problem, but it can be. Versioning software like git works better for distributed sharing, from what I understand.

14 comments:

  1. You are right about git. Apparently it is even better than SVN, especially if you don't have a server machine to put things on. At the very worst anyone can have a local git repository, or you can use git on your own (this scenario is possible even with SVN).

    You can realize an on-line repository on any directory which is available on the web (without installing software on the remote machine). You just need to check who reads/writes on the directory.

    I use git, coupled with an editor which has tools for managing diff files and versions, in case there are conflicting edits to merge.

    I also think that it is easier to set up than SVN (no server involved) for very basic usage. This is good for reluctant/lazy workmates. :-D

  2. I find Dropbox to be sufficient, and much easier, for paper-writing. SVN has many features that make it useful for writing large software projects, but many of these issues do not come up in paper writing. If you want to edit the same file at the same time, SVN could be more helpful with its ability to merge, but generally in paper-writing there are fewer incidents of conflicts like this, so it is not too bad to simply detect the conflict (which Dropbox does) and then use an external merge tool to fix it. Actually it is even better than SVN because you will detect the conflict immediately rather than after you both try to commit.

    In other situations it is much easier to use than SVN. It automatically syncs so you don't have to remember to do a check out or a commit. It is just client software to install and the servers are already set up by Dropbox.

    After going from email to SVN I would never go back to email, but after later going from SVN to Dropbox, I would never go back to SVN.

  3. This seems far too complicated. I don't want to deal with checking out, checking in, having to maintain some special directory structure to avoid large downloads (I hope you were joking about this). Dropbox works a lot better for distributed paper collaborations, in my experience. It might be nice if there were better versioning inside Dropbox, though.

  4. SVN isn't ideal for collaborations across institutions.

    SVN is all I've ever used, so I don't know any better. What about it don't you like that's better in git?

  5. I use Mercurial (hg) for my collaborative projects - with good results. It has the benefit, much like git, bazaar, darcs, and all the other happy players in the DVCS-world (distributed version control system) that you aren't required to rely on a joint server with common login for everyone - and indeed, for at least one project, we've been passing around Mercurial changesets by email to keep ourselves synchronized.

    It won't work for everyone: a few of the people I've interacted with have been strongly reluctant; but for those who try it, the ease of checking changes and of merging changes (magically does the right thing most of the time!!) and the power of it as a communicative tool have been some of the major selling points.

  6. @Anonymous Note that I tend to use version control even for just myself when I write papers - the big power isn't the communication aspect, it's the _versioning_ aspect. I don't have to worry about which version lives where or what I'm editing - since that's saved away in the Mercurial/SVN/whatever metadata.

    The ability to just work ahead, deleting things with great abandon, and knowing that since I checked in earlier, everything I'm ripping out is still around, should I realize I need it, makes paper writing and editing just SO much more structured, clean and enjoyable.

    (captcha: remakers)

  7. jelani: suppose I'm writing a paper with you. I can't give you access to my svn repo, because you'd need an account to access it (our system doesn't do svn over http). You can't get an account because you're not at Utah.

    that's the attraction of distributed version control.

  8. A collaborator of mine (at a different institution) set up an SVN structure for a paper we were working on. I asked about sending out email notices for updates and he argued that it was against the spirit of SVN that should encourage many easy small updates whenever you get a chance.

    I see his point to some degree, but in retrospect, I would have liked to have known sooner when my coauthors made updates. I think a couple of times this led to us both editing the same sections, and someone needed to run a merge.

  9. It's also possible to admin svn in such a way that security isn't tied to logins on the Unix server where it lives, but rather is hard-coded.

    Whenever I begin working with a new collaborator, I just manually add a new login to my svn's passwd file, so their home institution doesn't matter. This is not at all sophisticated, but for an svn server with O(10) users it's fine.

  10. Did anyone try

    http://projectlocker.com/

    for writing papers? It is supposed to be free for up to 5 users.

  11. Hi Suresh,

    Regarding svn + requiring login, you may or may not be comfortable with it, but this is how I do it: my (remote) collaborators have ssh access into my account, but only for the purpose of using "svn". This is very easy to set up.

    http://subversion.apache.org/faq.html#ssh-authorized-keys-trick

    I am sure git probably does this better and I will switch to it eventually, but after convincing about 10+ collaborators/students to use svn, I am not about to go through that again any time soon.

    A.

  12. Following on my comment above about "ssh", I realized that this is probably a better reference.

    http://svnbook.red-bean.com/en/1.4/svn.serverconfig.svnserve.html

    The issue of giving limited ssh access is discussed near the bottom.

    A.

  13. SVN has its own authentication system. It was a little bit of hair pulling to learn to set it up the first time, but after that it has worked just fine. You should look into it.

  14. To all the folks pointing out that SVN has local authentication methods: you are all right. However, our machines are managed by support staff, and I don't have direct control over the machines, so I only have access to svn via the method they provide (namely svn+ssh://

