Tuesday, July 17, 2007

SODA vs STOC/FOCS

Michael Mitzenmacher has an interesting post up comparing citation counts for papers at SODA/STOC/FOCS 2000. The results are quite stunning (and SODA does not fare well). For a better comparison, we'd need more data of course, but this is a reasonable starting point.

As my (minor) contribution to this endeavour, here are the paper titles from 1997-2006 (last 10 years) for STOC, FOCS, and SODA. There might be errors: this was all done using this script. Someone more proficient in Ruby than I am might try to hack this neat script written by Mark Reid for stemming and plotting keyword trends in ICML papers; that's something I've been hoping to do for STOC/FOCS/SODA papers.
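For anyone who wants a feel for what the keyword-trend idea involves, here is a minimal Python sketch. It is not Mark Reid's Ruby script: the stopword list, the crude suffix-stripping rule, and the function names are all assumptions made up for illustration, and a real run would want a proper stemmer.

```python
import csv
import io
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not taken from any existing script).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "via", "with"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def keyword_counts_by_year(titles_by_year):
    """Map each year to a Counter of stemmed, non-stopword title words."""
    counts = defaultdict(Counter)
    for year, titles in titles_by_year.items():
        for title in titles:
            for word in re.findall(r"[a-z]+", title.lower()):
                if word not in STOPWORDS:
                    counts[year][crude_stem(word)] += 1
    return counts


def to_csv(counts):
    """Render the counts as CSV text: one row per keyword, one column per year."""
    years = sorted(counts)
    keywords = sorted({k for counter in counts.values() for k in counter})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["keyword"] + years)
    for keyword in keywords:
        writer.writerow([keyword] + [counts[year][keyword] for year in years])
    return out.getvalue()
```

Given a dict mapping each year to its list of titles, `to_csv(keyword_counts_by_year(...))` produces a table that can be loaded into a spreadsheet and plotted.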

If there are any Google researchers reading this, is there some relatively painless way to use the Google API to return citation counts from Google Scholar? In fact, if people were even willing to do blocks of 10 papers each, we might harness the distributed power of the community to get the citation counts! If you're so inspired, email me stating which block of 10 (by line number, starting from 1) you're planning to do and for which conference. Don't let Michael's effort be in vain!
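To make the block numbering concrete, here is a small Python sketch of how a title list could be carved into blocks of 10 by 1-based line number. The function name and the placeholder titles are made up for illustration; only the FOCS count of 681 comes from the title lists.

```python
def blocks_of(titles, size=10):
    """Return (first_line, last_line, block_titles) tuples with 1-based line numbers."""
    blocks = []
    for start in range(0, len(titles), size):
        chunk = titles[start:start + size]
        blocks.append((start + 1, start + len(chunk), chunk))
    return blocks


# Placeholder titles standing in for the 681 FOCS lines.
focs_titles = [f"placeholder title {i}" for i in range(1, 682)]
for first, last, _chunk in blocks_of(focs_titles)[:3]:
    print(f"FOCS lines {first}-{last}")
```

With 681 titles this gives 69 blocks, the last of which holds the single line 681.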

There are 681 titles from FOCS, 821 from STOC, and 1076 from SODA.

11 comments:

  1. A community obsessed with rankings, stars, acceptance rates, citation counts, etc, is a community that has lost its way.

    All these voodoo ceremonies will not hide the fact that the theory community has lost its connection with reality. Adrift with its conviction of its own importance, brainwashed with its own newspeak, it stands appalled as nobody else seems to recognize its importance.

    I have a suggestion - why don't we abandon all these pagan rituals and return to the only important thing - that is - doing good research?

    And if you think this is a loser's lament, well, think twice.

  2. I don't think it qualifies as an "obsession" to wonder what is the better conference.

    And what field do you know that does not consider such issues? Biology, physics, and chemistry certainly do, even more so. Maybe math doesn't (I don't know), but math may have the same problems of disconnectedness as TCS, only more so.

  3. Anonymous #1 --

    In my opinion, you're way off-base.

    A community obsessed with rankings, stars, acceptance rates, citation counts, etc, is a community that has lost its way.

    I don't think we're obsessed. However, we have this information available. Trying to figure out what it means and how to best use it seems worthwhile.

    Also, as a practical point, things like citation counts do matter. At the individual level, jobs and promotions depend on them. At the larger level of theory CS as a whole, funding depends on them. It might be nice in theory to live in a world where these things don't matter, but in practice, they do.

    All these voodoo ceremonies will not hide the fact that the theory community has lost its connection with reality.

    I'm a firm proponent of applied theory, but even I think this statement is just wacky. On the applied theory side, just go look at any major networking, database, or security conference to see the impact of theory. On the less applied theory side, I can't deny the intellectual importance of, say, quantum computing and algorithmic mechanism design (just off the top of my head); their fundamental connection to reality seems unquestionable, and may in fact prove stronger than we can currently imagine.

    I have a suggestion - why don't we abandon all these pagan rituals and return to the only important thing - that is - doing good research?

    I think you're limiting your thinking. Rankings, acceptance rates, citation counts and such are all about the meta-questions of how we judge and promote good research within our community. Surely, somebody's got to be paying attention to the questions of understanding and fostering the environment in which we do research? I understand if that's not your interest, but if anything, I think as a community theoretical CS has not paid enough attention to these issues.

  4. Thanks for pulling the titles, Suresh! If you get enough interest, feel free to assign me a block of 20!

  5. Thanks for mentioning my Ruby script. Just a quick note: it doesn't actually do the plotting, it just creates a large comma-separated value file that can be easily imported into Excel or Keynote. The latter is the application I used to create the plots.

    If you can point me to a web page or set of web pages that have a consistent format for conferences and paper titles it shouldn't be much work for me to modify my scripts for your purposes.

    Regarding the earlier "voodoo ceremonies" comment: surely conferences and the material published by them are interesting objects of study in their own right? Asking questions about these objects is, at worst, a fun exercise driven by a sense of curiosity and, at best, may actually shed some light on discussions about trends in our disciplines. Hardly an obsession, I would think.

  6. This page (http://libra.msra.cn/conf_category_1.htm) gives the number of publications and citations for each of the conferences. I am not sure about the accuracy of the information given.

  7. Surely this task is an ideal candidate for setting up on Amazon's Mechanical Turk (or similar)? Paying 2c/citation count, say...

    Graham

  8. Thanks for the link to the libra.msra.cn site. I didn't know about that.

    As it turns out, the DBLP site where my script got all the ICML data from also has a large number of other conferences (including STOC and SODA) in the same format as the ICML conferences, so porting the script should be very easy.

    I should have a bit of time in the next couple of weeks to rerun my script on these other conferences. I may also try to set up a site to let people explore stats from DBLP.

    DBLP Conferences Index

  9. You can also get the DBLP in XML format from http://dblp.uni-trier.de/xml/

    You can then simply open the XML file with MS Access, which does a great job in parsing and organizing the results in tables.

    You can then run a simple selection query to get the data that you need and feed the titles of the papers to the Extractor that will return the citations for each paper.

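As an alternative to MS Access, the title-extraction step suggested in the last comment could be sketched in Python along these lines. This is a rough illustration under stated assumptions: that proceedings papers appear as `inproceedings` records whose `key` attribute starts with a prefix such as `conf/stoc/`, with `title` and `year` child elements. The sample records are invented. The full DBLP dump also relies on character entities declared in its DTD, which the standard-library parser may not resolve without extra handling.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample in an assumed DBLP-like layout; not real DBLP data.
SAMPLE = b"""<dblp>
<inproceedings key="conf/stoc/Example97">
<author>A. Author</author>
<title>A Made-Up STOC Title.</title>
<year>1997</year>
<booktitle>STOC</booktitle>
</inproceedings>
<inproceedings key="conf/stoc/Example90">
<title>An Older Made-Up STOC Title.</title>
<year>1990</year>
<booktitle>STOC</booktitle>
</inproceedings>
<inproceedings key="conf/icml/Example05">
<title>A Made-Up ICML Title.</title>
<year>2005</year>
<booktitle>ICML</booktitle>
</inproceedings>
</dblp>"""


def titles_for(source, key_prefix, first_year, last_year):
    """Stream the XML and collect (year, title) for matching conference papers."""
    results = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "inproceedings":
            continue
        year = int(elem.findtext("year") or "0")
        title_elem = elem.find("title")
        if (
            title_elem is not None
            and elem.get("key", "").startswith(key_prefix)
            and first_year <= year <= last_year
        ):
            # itertext() also picks up text nested inside markup within the title.
            results.append((year, "".join(title_elem.itertext()).strip()))
        elem.clear()  # release the record's children once it has been handled
    return results


print(titles_for(io.BytesIO(SAMPLE), "conf/stoc/", 1997, 2006))
```

Streaming with `iterparse` rather than loading the whole tree is a deliberate choice here, since the full dump is large; the same function should accept a file path in place of the in-memory sample.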
