Wednesday, July 15, 2009

Consistent BibTeX formatting

I try not to write BibTeX by hand any more: it's too easy to introduce errors. So I usually use either DBLP or the ACM digital library to get BibTeX for papers. Sometimes the journal has BibTeX, or some format that can be converted. As an aside, IEEE is extremely lame: you have to log in to their digital library even to get a citation!

For the most part, I don't need to go beyond ACM or DBLP, which is great. But here's the problem: their formats are different! I needed the BibTeX for a recent paper of mine, and found it on both sites. Here's what ACM gave me:
author = {Ahmadi, Babak and Hadjieleftheriou, Marios and Seidl, Thomas and Srivastava, Divesh and Venkatasubramanian, Suresh},
title = {Type-based categorization of relational attributes},
booktitle = {EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology},
year = {2009},
isbn = {978-1-60558-422-5},
pages = {84--95},
location = {Saint Petersburg, Russia},
doi = {},
publisher = {ACM},
address = {New York, NY, USA},

and here's what DBLP gave me:
author = {Babak Ahmadi and
Marios Hadjieleftheriou and
Thomas Seidl and
Divesh Srivastava and
Suresh Venkatasubramanian},
title = {Type-based categorization of relational attributes},
booktitle = {EDBT},
year = {2009},
pages = {84-95},
ee = {},
crossref = {DBLP:conf/edbt/2009},
bibsource = {DBLP,}

editor = {Martin L. Kersten and
Boris Novikov and
Jens Teubner and
Vladimir Polutin and
Stefan Manegold},
title = {EDBT 2009, 12th International Conference on Extending Database
Technology, Saint Petersburg, Russia, March 24-26, 2009,
booktitle = {EDBT},
publisher = {ACM},
series = {ACM International Conference Proceeding Series},
volume = {360},
year = {2009},
isbn = {978-1-60558-422-5},
bibsource = {DBLP,}
So as you can see, we have a problem. The formats are not consistent, which means that if I need to get some references from DBLP, and others from the ACM, my references file is going to look very irregular.
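
One of the visible differences between the two entries is the author format: ACM writes "Last, First" while DBLP writes "First Last" with a line break after each "and". A minimal Python sketch of normalizing both to one form is below. The function names are made up for illustration, and the sketch assumes plain two-part names: it does not handle "von" particles, "Jr." suffixes, or brace-protected names that contain the word "and".

```python
import re

def normalize_author(name):
    """Convert 'Last, First' to 'First Last'; leave 'First Last' unchanged."""
    name = " ".join(name.split())  # collapse newlines and runs of spaces
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return f"{first} {last}"
    return name

def normalize_author_field(value):
    """Normalize a BibTeX author field whose names are joined by 'and'."""
    names = re.split(r"\s+and\s+", value.strip())
    return " and ".join(normalize_author(n) for n in names)

# Shortened versions of the two author fields shown above.
acm = "Ahmadi, Babak and Hadjieleftheriou, Marios and Seidl, Thomas"
dblp = "Babak Ahmadi and\nMarios Hadjieleftheriou and\nThomas Seidl"

print(normalize_author_field(acm))
print(normalize_author_field(acm) == normalize_author_field(dblp))
```

Either direction would work; "First Last" is chosen here only because it needs no guessing about where the surname starts in the ACM form.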

Other critiques:
  • I have never understood why DBLP splits up the conference and the paper: with BibTeX's default settings, if you cite two or more papers that use the same crossref, the crossref is itself included as a reference, which is just strange.
  • Unless you use double curly braces, capitalizations inside a string get removed, which is mucho annoying: It's "Riemannian", not "riemannian".
  • The DBLP name for the conference is too cryptic: who'd even know what EDBT is outside the database community? On the other hand, the ACM citation is clunky, and is a page-length disaster waiting to happen.
Thoughts?


  1. There is no way to avoid editing it by hand. The refs from these places contain information I do not care about (ISBN, publisher, publisher city, etc). The conference thingy is usually completely useless. Also, page numbers are usually mistyped (it should be -- and not - between the numbers). Here is what my BibTeX entries look like:
    author = "A. C. Yao and F. F. Yao",
    title = "A general approach to {$D$}-dimensional geometric
    booktitle = STOC_1985,
    year = 1985,
    pages = "163--168"
    In fact, I have the whole geombib in this style...
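
The two mechanical parts of that cleanup, dropping unwanted fields and fixing the page-range dash, can be sketched in a few lines of Python. This assumes the entry has already been parsed into a dictionary of field names and values; the `UNWANTED` set and the `clean_entry` name are illustrative choices, not part of any existing tool.

```python
import re

# Fields this sketch treats as noise; the list is an assumption, adjust to taste.
UNWANTED = {"isbn", "publisher", "address", "location"}

def clean_entry(fields):
    """Drop unwanted fields and turn hyphenated page ranges into 'a--b'."""
    cleaned = {k: v for k, v in fields.items() if k.lower() not in UNWANTED}
    if "pages" in cleaned:
        # Replace any run of hyphens between two digits with exactly '--'.
        cleaned["pages"] = re.sub(r"(\d)\s*-+\s*(\d)", r"\1--\2", cleaned["pages"])
    return cleaned

entry = {
    "title": "Type-based categorization of relational attributes",
    "booktitle": "EDBT",
    "year": "2009",
    "pages": "84-95",
    "isbn": "978-1-60558-422-5",
    "publisher": "ACM",
}

print(clean_entry(entry))
```

A range that already uses `--` passes through unchanged, so the function can be run over a mixed file without harm.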


  2. One of your complaints has an easy solution: the bibtex command-line option -min-crossrefs lets you set a lower bound on the number of papers from the same conference it takes for bibtex to generate a separate bibitem for the conference.


    bibtex -min-crossrefs=99 yourfilename

    As for the conference name and such: I use different conventions depending on the venue (e.g. I frequently use abbreviations in conference proceedings versions, but I try to spell things out fully when page length isn't an issue). It would be nice to have ACMDL/DBLP-aware bibtex style files that will automatically format these auto-generated bibtex entries in just the right way. Seems not so hard for someone who understands the syntax of bibtex styles.


  4. I generally use BibTeX with conference names expanded. If the paper exceeds the length restriction, I revisit the BibTeX and use abbreviations. But it would be great if all the associations agreed upon a single format.

  5. At least as useful a consistency for me would be in the citation tags (not the content of the citation) in order to facilitate work with co-authors. The tags that come from ACM or DBLP are too long or have no meaning - I want to have some clue what the reference is by reading the .tex file. What Sariel has done solves both problems. If only we had such things for broader sections of the field.

    The verbosity problem that Sariel complains about is annoying but in principle one should not be penalized for including more information. The real problem is that most .bst styles include too much information by default when you run bibtex.

    It isn't simple. In principle the use of cross-ref citations for conferences is better because you are not repeating redundant information. However, as a stand-alone reference you probably want the information there but as a cross-ref you probably want to suppress some of it. For example, ICALP and other LNCS proceedings sometimes have long lists of PC chairs as editors and these names will show up in every cross-ref.
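
The trade-off described here can be sketched as a small Python function that flattens a cross-referenced paper into a stand-alone entry: fields missing from the paper are filled in from the proceedings entry, while a configurable set of fields (here the editor list) is suppressed. The function name, the `suppress` parameter, and the dictionary representation are assumptions made for illustration; this is not how bibtex itself is implemented.

```python
def resolve_crossref(child, entries, suppress=("editor",)):
    """Return a copy of child with missing fields filled in from its crossref parent."""
    parent = entries.get(child.get("crossref"))
    if parent is None:
        return dict(child)  # nothing to inherit from
    # Start from the parent's fields, minus the ones we never want to inherit.
    merged = {k: v for k, v in parent.items() if k not in suppress}
    merged.update(child)          # the child's own fields take precedence
    merged.pop("crossref", None)  # the flattened entry stands alone
    return merged

proceedings = {
    "DBLP:conf/edbt/2009": {
        "editor": "Martin L. Kersten and Boris Novikov",
        "booktitle": "EDBT",
        "publisher": "ACM",
        "year": "2009",
    }
}
paper = {
    "title": "Type-based categorization of relational attributes",
    "booktitle": "EDBT",
    "pages": "84-95",
    "crossref": "DBLP:conf/edbt/2009",
}

flat = resolve_crossref(paper, proceedings)
print(sorted(flat))
```

Passing `suppress=()` keeps everything from the proceedings entry, which corresponds to the stand-alone case where you want the full information.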

  6. The ACM version looks quite clean to me. Other than the uninformative tag, the only changes I would make would be to double-bracket the title (not necessary in this case but I routinely do this to prevent annoying uncapitalizations) and strip the url from the doi: doi={10.1145/1516360.1516372}.

    I agree that crossrefs are more trouble than they're worth. I don't like the abbreviated booktitle in both of the DBLP entries, the nonstandard ee tag confuses me, and the page number hyphenation is just wrong.

    I don't much care whether the author names are last-first or first-first, but I think they should be spelled out and that the bibstyle should take care of abbreviating them (although I don't always actually do it that way).

  7. Seems like there are some more people here that don't understand the meaning of braces in BibTeX title fields.

    The idea is that (non-)capitalization is determined by the style file. If the style file says that titles are to be written in sentence style, then BibTeX will down-cap all words in the title. The alternative is to have a style file that leaves the title alone. (A style file will not be able to up-cap your titles, since it won't know which words to capitalize and which not.)

    Since there are words (acronyms, names, etc.) whose capitalization should be changed under no circumstances, you can protect individual words (at least, that's the idea) using curly braces, so they'll remain unchanged even if the style file requests all-lower case.

    What that means is that you

    (a) should use curly braces only for acronyms, etc. (but not "double-bracket the title") and

    (b) should use a proper style file if you don't want all titles to be written in lower case.

    Unfortunately, many of the default style files do that stupid conversion into all-lower (which probably is the reason why so many people get the idea wrong). You can easily fix a style file yourself. Watch out for '"t" change.case$' instructions in the .bst file and remove them. E.g., in plain.bst replace

    { title "t" change.case$ }

    with

    { title }
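
To make the effect of brace protection concrete, here is a rough Python imitation of what a title-lowering style does. It is a simplified sketch, not a reimplementation of BibTeX: it keeps the first character and anything inside braces, lowercases everything else, and ignores further details of the real function such as its handling of colons and special characters. The name `sentence_case` is made up for illustration.

```python
def sentence_case(title):
    """Lowercase a title, keeping the first character and brace-protected text."""
    out = []
    depth = 0  # current brace nesting depth
    for i, ch in enumerate(title):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if i == 0 or depth > 0:
            out.append(ch)          # first character or protected: leave alone
        else:
            out.append(ch.lower())  # unprotected: lowercase
    return "".join(out)

print(sentence_case("Clustering on Riemannian Manifolds"))
print(sentence_case("Clustering on {R}iemannian Manifolds"))
print(sentence_case("{Clustering on Riemannian Manifolds}"))
```

The second call shows the intended use, protecting one name. The third shows why double-bracketing the whole title works as a blunt instrument: everything is at brace depth one, so nothing is touched, whatever the style file asked for.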

