Friday, August 05, 2005

DOI and BibTeX

For the last few hours, for reasons I will not get into, I have been trying to track down BibTeX entries for papers. If a paper has an ACM DL entry, there is usually a BibTeX entry that one can web-scrape. For many papers (especially IEEE publications), though, this doesn't work, because IEEE doesn't have BibTeX entries on its website (and its pages are harder to web-scrape).

Most of the complication comes from the fact that I often have only a title and need to match it to an actual citation of some kind. Google Scholar is quite helpful here: I can search for a title, and more often than not it returns the ACM DL link to the paper (and hence its BibTeX entry).
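Matching a bare title to a citation is essentially fuzzy string matching. Here is a minimal sketch, assuming you already have a list of candidate records (say, from a search) as dicts with a "title" key. The function names, the record shape and the 0.9 threshold are illustrative assumptions, not anything Google Scholar or the ACM DL actually provides.

```python
import difflib
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace anything that is not a letter, digit or space with a space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def best_match(title, candidates, threshold=0.9):
    """Return the candidate whose normalized title is most similar to `title`.

    `candidates` is a list of dicts with a "title" key (a hypothetical record
    shape). Returns None when no candidate reaches `threshold`.
    """
    target = normalize_title(title)
    best, best_score = None, 0.0
    for cand in candidates:
        score = difflib.SequenceMatcher(
            None, target, normalize_title(cand["title"])
        ).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None


records = [
    {"title": "An Example Paper Title", "key": "A"},
    {"title": "Another Unrelated Title", "key": "B"},
]
print(best_match("an example paper title.", records)["key"])  # A
```

Normalizing first means differences in case, punctuation and accents don't count against a match; the threshold then decides how much remaining variation (a dropped word, a typo) you are willing to accept.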

But the ACM doesn't have everything, and this is where DOI numbers come in. The Digital Object Identifier is a unique identifier that maps to a document, much as a URL maps to a web page. As with a web page, the document's actual location can be hidden from the user and changed easily by the publisher, which allows both portability and the integration of a variety of sources. There is even a proxy server that you can supply a DOI number to; it redirects you to the web page of the publisher that currently maintains that document.
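The proxy is the DOI resolver (dx.doi.org at the time; doi.org today, with the old host still redirecting). Here is a minimal sketch of using it, assuming only standard-library HTTP. The helper names `doi_url` and `resolve` are hypothetical, and some publisher sites may refuse requests carrying urllib's default User-Agent.

```python
import re
import urllib.parse
import urllib.request

# Assumption: the public DOI resolver host.
RESOLVER = "https://doi.org/"


def doi_url(doi):
    """Build the resolver URL for a DOI such as '10.1000/182'."""
    # Accept DOIs pasted with a "doi:" prefix or as a full resolver URL.
    bare = re.sub(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", "",
                  doi.strip(), flags=re.IGNORECASE).strip()
    # Percent-encode characters that are unsafe in a URL path, but keep
    # the "/" separating the DOI prefix from its suffix.
    return RESOLVER + urllib.parse.quote(bare, safe="/")


def resolve(doi, timeout=10.0):
    """Follow the resolver's redirects and return the publisher's URL (needs network)."""
    with urllib.request.urlopen(doi_url(doi), timeout=timeout) as resp:
        return resp.geturl()


print(doi_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
```

The publisher can move the paper as often as it likes; only the resolver's redirect target changes, so `doi_url` stays stable while `resolve` returns wherever the document currently lives.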

What would be very cool is a DOI-to-BibTeX converter. Note that a BibTeX entry, like a DOI, maps to a single document. DOIs of course address a smaller space, since they cover only published work. If publishers exported some standard format (XML?), writing such a converter would be a trivial matter. Right now, all you get is the web page, from which you must either scrape a BibTeX entry or construct one by hand. Neither option scales, and neither is particularly appealing.
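To see why a standard export format would make the converter trivial, here is a sketch against a made-up XML record. The `<article>` schema, its element names, the DOI and the `record_to_bibtex` helper are all hypothetical; no publisher format is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical publisher export: the schema is invented for illustration.
SAMPLE = """<article doi="10.1000/example.1">
  <title>An Example Paper Title</title>
  <author>Ada Lovelace</author>
  <author>Alan Turing</author>
  <journal>Journal of Examples</journal>
  <year>2005</year>
</article>"""


def record_to_bibtex(xml_text, key):
    """Convert one hypothetical <article> record into a BibTeX @article entry."""
    root = ET.fromstring(xml_text)
    fields = {
        # BibTeX separates multiple authors with " and ".
        "author": " and ".join((a.text or "").strip() for a in root.findall("author")),
        "title": root.findtext("title", "").strip(),
        "journal": root.findtext("journal", "").strip(),
        "year": root.findtext("year", "").strip(),
        "doi": root.get("doi", ""),
    }
    # Emit only non-empty fields, each value wrapped in braces.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items() if value)
    return f"@article{{{key},\n{body}\n}}"


print(record_to_bibtex(SAMPLE, "Lovelace05"))
```

A real converter would also need to escape BibTeX special characters (such as `&` and `%`) and pick the entry type (`@inproceedings` versus `@article`) from the record, but the mapping itself is a handful of lines.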

7 comments:

  1. Won't MathSciNet be helpful? You may not find every CS paper there, but you will find many. For instance, here's the BibTeX entry for one of your papers.

    Posted by Anand

  2. Just curious: how do you decide on the key for your BibTeX entries? I mean the key you use in \cite{key}.

    I know this is largely a personal preference, but it may help me think a bit more about building a community database of BibTeX entries.

    Posted by Maverick Woo

  3. Maverick: the DBLP format is fine: DBLP/conf/name/LastNameOfFirstAuthorXYZYear.

    Personally I am fairly random, and this was not for a personal database.

    Anand: MathSciNet is a good source. I had forgotten about it because the papers I was collating were CS papers and not math papers (though there is often overlap as you point out :)) 

    Posted by Suresh

  4. Thanks. I will keep that format in mind. I have already met quite a few people who would like a better bib database with a web service in front that emits XML. One day when I have job security...

    For now I just wish everyone used a text editor that can easily insert long \cite keys. I do, but the Pico users will insist on using one- or two-letter keys...

    Posted by Maverick Woo

  5. Two more non-DOI tools that seem useful in your situation:

    Apart from MathSciNet, AMS provides a free tool called MRef, which seems to have the functionality you are describing.

    AMS blurb:
    "MRef is a tool for creating standard references with links to MathSciNet. The reference (with the author names first) should be typed or copied and pasted into the box above. Often, only a portion of the reference is necessary for MRef to recognize the corresponding entry in MathSciNet."

    Also, there is a web service called CiteULike, which scrapes bibliographic information from a web page describing a paper on a journal or archive site, stores it in an internal format, and can optionally export it in BibTeX format. Of course, scrapers are site-specific; they have all the scrapers I usually need, but you can write your own if you need support for a site they don't cover.

    CiteULike is a "social bookmarking service" (similar to del.icio.us, but specializing in bookmarking scientific publications), but that is another story.

    Posted by Andrei Sobolevskii

  6. Zotero does something similar. Since a paper can always be reached through a redirect from dx.doi.org/[insert DOI here], it should in theory be fairly easy to write a Zotero extension that converts a list of DOIs to a BibTeX file.

    Scientific Commons, a scraped and self-archived database, may also be useful.
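
The last comment's idea (a list of DOIs in, a BibTeX file out) can be combined with the DBLP-style keys Suresh mentions. This sketch assumes the DOI resolver's content negotiation, a later addition: for Crossref and DataCite DOIs, requesting `application/vnd.citationstyles.csl+json` from doi.org returns structured metadata instead of a redirect. Reading the "XYZ" in the key format as co-author initials is likewise an assumption, and every helper name here is hypothetical.

```python
import json
import re
import unicodedata
import urllib.request

# Assumption: doi.org content negotiation returns CSL JSON for this media type
# (supported for Crossref and DataCite DOIs; other agencies may not support it).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def fetch_csl(doi):
    """Fetch CSL JSON metadata for one DOI (requires network access)."""
    req = urllib.request.Request("https://doi.org/" + doi, headers={"Accept": CSL_JSON})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def first(value):
    """CSL JSON fields may be a string or a list of strings; take the first."""
    if isinstance(value, list):
        return value[0] if value else ""
    return value or ""


def ascii_letters(name):
    """Strip accents and non-letters so a surname is safe inside a cite key."""
    flat = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z]", "", flat)


def dblp_style_key(surnames, year):
    """First author's surname + co-authors' initials + two-digit year.

    Assumed reading of the format: Smith, Jones, Lee (2005) -> "SmithJL05".
    """
    lead = ascii_letters(surnames[0]) if surnames else "Anon"
    initials = "".join(ascii_letters(s)[:1] for s in surnames[1:])
    return f"{lead}{initials}{year % 100:02d}"


def csl_to_bibtex(meta, doi):
    """Turn one CSL JSON record into a BibTeX @article entry (a simplification)."""
    people = meta.get("author", [])
    year = meta["issued"]["date-parts"][0][0]
    authors = " and ".join(
        f"{p.get('family', '')}, {p.get('given', '')}".strip(", ") for p in people
    )
    fields = {
        "author": authors,
        "title": first(meta.get("title")),
        "journal": first(meta.get("container-title")),
        "year": str(year),
        "doi": doi,
    }
    key = dblp_style_key([p.get("family", "") for p in people], year)
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{key},\n{body}\n}}"


def dois_to_bib(dois, fetch=fetch_csl):
    """Build the text of a .bib file; `fetch` can be swapped out for testing."""
    return "\n\n".join(csl_to_bibtex(fetch(doi), doi) for doi in dois)
```

Passing `fetch` as a parameter keeps the network out of the conversion logic, so the BibTeX formatting can be checked against canned metadata. The sketch always emits `@article`; a fuller version would map the CSL `type` field to `@inproceedings` and friends, and would escape BibTeX special characters.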

Disqus for The Geomblog