
Sunday, January 07, 2018

Report on double blind reviewing in ALENEX 2018

Rasmus Pagh and I chaired ALENEX 2018, and we decided to experiment with double blind review for the conference. What follows is a report that we wrote on our experiences doing this. There are some useful notes about logistics, especially in the context of a theoretically-organized conference on experimental algorithms.

ALENEX 2018 Double Blind Review

For ALENEX 2018, we decided to experiment with a double blind review process, i.e., one in which authors and reviewers were unaware of each other's identity. While double blind review is now almost standard in most computer science conferences, it is still relatively uncommon in conferences that focus on theoretical computer science and related topics.

The motivation

In our original proposal to the ALENEX Steering Committee, we presented the following reasons for adopting double blind review:
1. Eliminating bias.
Andrew Tomkins did an experiment for WSDM this year and wrote a report on it: https://arxiv.org/abs/1702.00502

One particular observation:

"Reviewers in the single-blind condition typically bid for 22% fewer papers, and preferentially bid for papers from top institutions. Once papers were allocated to reviewers, single-blind reviewers were significantly more likely than their double-blind counterparts to recommend for acceptance papers from famous authors and top institutions. The estimated odds multipliers are 1.66 for famous authors and 1.61 and 2.10 for top universities and companies respectively, so the result is tangible”

2. Common practice.

Virtually every CS community except theory uses double blind review, including much of ML (NIPS, ICML), databases (VLDB, SIGMOD), systems (NSDI), and so on. While theory papers have their own idiosyncrasies, we argued that ALENEX is much closer in spirit and paper structure to experimental venues like the ones listed.

3. Limited burden on authors and reviewers for an experiment.

There was no real logistical burden. We were not blocking people from posting on the arXiv or talking about their work; we merely required that submissions be blinded (which is easy to do). For reviewers, too, this was not a problem: typically one declares conflicts based on domains, and that takes care of figuring out who is conflicted with which paper (but more on this later).

4. Prototyping.

While theoryCS conferences in general do not use double blind review, ALENEX is a small but core venue where such an experiment might reveal useful insights about the viability of double blind review overall. This way, we would not have to advocate for changes at SODA/STOC/FOCS without first learning how the process might work.

5. PC submissions.

We are allowing PC members to submit papers, and this has been done before at ALENEX. In this case, double blind review is important to prevent even the appearance of a conflict.

The process

Before submission: We provided a submission template for authors that suppressed author names. We also instructed authors on how to refer to prior work or other citations that might leak their identity: in brief, they were asked to treat these as they would any third-party reference. We also asked authors to declare conflicts with PC members.
After submission/before reviews: Recognizing that authors might not be used to submitting articles in double blind mode, we went over each submission after it was received, and before we opened bidding to PC members, to make sure it was properly blinded. In a few cases (fewer than 10 of 49) we had to ask authors to make modifications.
During the review process: The next step was to handle requests for subreviewers. Since PC members could not determine CoIs (conflicts of interest) on their own, all such requests were processed through the PC chairs: a PC member would give us a list of names and we would pick one (so more private information retrieval than a zero knowledge protocol!).

Issues

A number of issues came up that appear to be unique to the theory conference review process. We document them here along with suggested mitigations.
  1. Managing the CoI process: In theoryCS conferences, subreviewing happens outside the umbrella of the PC. PC members have the power to request any number of subreviewers for papers, and this process happens after the papers are submitted. In contrast, in other venues subreviewers essentially function as members of the PC: they are invited to be reviewers ahead of time, and are listed when authors declare conflicts of interest. This means that under the process we used, PC members could not determine for themselves whether a subreviewer had a CoI with a paper, whereas in the alternate process this is taken care of automatically. One possible mitigation is to ask PC members to list potential subreviewers ahead of time and have them registered in the system for authors to declare CoIs with. While this might generate a long list of subreviewers for authors to scan, this is customarily handled by a) allowing authors to declare conflicts by affiliation (domain name) and then b) presenting them with a filtered set of reviewers to mark individual conflicts with (see the sketch after this list). Domain-based filtering is probably the most effective method for handling conflicts based on current or recent affiliation: it allows for reviewers to be added after the fact, and systems like Microsoft's CMT support it.
  2. The difficulty of hiding identity based on prior work: In experimental work, a group will often write a series of papers that build on infrastructure the group has developed. The relative difficulty of building such infrastructure also means that groups become "known" for working in certain areas. This made it a little difficult for authors to honestly blind their identity, because their papers clearly built on software that was not publicly available and therefore had to have come from their group. Blinding the reference itself does not always work, because then it is hard to evaluate the quality of the work described in the paper. We note that this problem occurs in other, more experimental parts of CS; the typical solution is to continue with the blinding effort in any case, and to make an effort to release code publicly, so that anyone could have used the code being built on. In our view, this is a less significant problem than the first point. CHI and CSCW (both ACM conferences) provide useful guidelines to this end.
  3. Is the paper provably double blind? A common complaint about double blind review is that it is not perfect: with some effort, it is often possible to determine some subset of the authors with some degree of certainty. The response we gave when asked, and that is usually given, is that the goal of double blind review is not to provide a zero knowledge protocol, but to prevent the immediate implicit bias that comes from seeing the author names before reading the paper. This is a common complaint from people in the theory community; however, our experience with double blind review in other venues has been that after a while one gets accustomed to reviewing papers without first attempting to determine the authors, and the process works as intended.
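
To make the domain-based filtering mentioned in point 1 concrete, here is a toy sketch (hypothetical data and names; actual systems like CMT differ in detail) of the two-pass conflict declaration:

    # Toy sketch of domain-based conflict filtering (hypothetical data
    # model, not CMT's actual API): authors first rule out whole
    # affiliation domains, and only the remaining reviewers are shown
    # for individual conflict marking.
    reviewers = {
        "alice": "cs.univa.edu",
        "bob": "univb.edu",
        "carol": "company.com",
    }

    def candidate_reviewers(conflicted_domains, individual_conflicts=frozenset()):
        """Pass 1: drop reviewers at conflicted domains.
        Pass 2: drop individually named conflicts."""
        return sorted(r for r, dom in reviewers.items()
                      if dom not in conflicted_domains
                      and r not in individual_conflicts)

    print(candidate_reviewers({"univb.edu"}, {"carol"}))  # -> ['alice']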

Feedback

We also solicited feedback from the program committee after the review process was complete. We asked them three questions:
  1. What did you like (and what worked) about the double blind review process instituted this year for ALENEX?
  2. What in your opinion did NOT work?
  3. Is there any other feedback you'd like to provide about the double blind review process?
The responses we got for question 1 were uniformly of the form "I'm glad that we did it, and felt that papers got a fairer shake than they would have otherwise". One PC member even said unequivocally that the double blind review process made it more likely that they would submit to ALENEX in the future.
On question 2, PC members brought up the issues we raised above, recommending that we make it clearer to authors how they need to blind their submissions, and also mentioning the difficulty of assigning subreviewers.
On question 3, the feedback was uniformly positive in favor of continuing with double blind review, in spite of the issues raised.

Summary

On balance, our experience with double blind review was positive. While there were logistical issues, many of these can be resolved by the methods we describe above. Further, there is now a wealth of knowledge accumulated in other areas of computer science that we can learn from. SIGPLAN has built a comprehensive FAQ around this issue that answers many of the questions that arise.
We recommend continuing with double blind review for at least the next two years at ALENEX: firstly because this brings us in line with common practice in most other parts of computer science, and secondly because many of the logistical issues people face with double blind review will go away with habituation. At that point the inconvenience of the process will diminish and be outweighed by the benefits in transparency and lack of bias.


Saturday, January 19, 2008

ALENEX/ANALCO 2008: Invited talks

I'm in San Francisco for the ACM-SIAM Symposium on Discrete Algorithms (SODA), and as is tradition, Saturday is the day for ALENEX (the Workshop on Algorithm Engineering and Experiments) and ANALCO (the Workshop on Analytic Algorithmics and Combinatorics).

ALENEX:

The ALENEX invited talk today was on routing problems. Rolf Möhring from TU Berlin presented a pair of case studies of routing problems in the "wild".

(A video illustrating the first case study was embedded here.)
Actually, the problem he was looking at was a tad simpler: you have containers coming in at Hamburg port, and you have automated robot vehicles that ferry the containers from the loading dock to their temporary storage locations. The problem is to create routes for the vehicles (on the fly) so that they reach their destination at a reasonable time without colliding with each other.

The actual solution involved successive shortest paths on a "time-expanded" flow graph, which in their case was a relatively simple grid graph. The time expansion refers to the fact that an "edge" is occupied between two time instants, and of course travel is blocked on an edge when it's occupied.
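
To make the mechanics concrete, here is a minimal sketch (my own toy illustration, not the authors' system) of this approach on a grid: states are (cell, time) pairs, each vehicle is routed by a shortest path in the time-expanded graph, and the slots along its path are reserved so that later vehicles must route around them.

    # Toy time-expanded routing on a grid (illustrative only). A state is
    # (x, y, t); moving or waiting advances t by 1; `reserved` holds the
    # (cell, time) slots already claimed by previously routed vehicles.
    from collections import deque

    def route(width, height, start, goal, reserved, max_t=200):
        """Shortest path for one vehicle in the time-expanded graph
        (BFS, since every step takes unit time), avoiding reserved slots."""
        src = (start[0], start[1], 0)
        parent = {src: None}
        q = deque([src])
        while q:
            x, y, t = q.popleft()
            if (x, y) == goal:
                path, s = [], (x, y, t)
                while s is not None:      # walk back to the start
                    path.append(s)
                    s = parent[s]
                path.reverse()
                reserved.update(path)     # claim the slots we use
                return path
            if t == max_t:
                continue
            # Wait in place, or move to one of the four grid neighbours.
            for dx, dy in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (x + dx, y + dy, t + 1)
                if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                        and nxt not in reserved and nxt not in parent):
                    parent[nxt] = (x, y, t)
                    q.append(nxt)
        return None  # no collision-free route within max_t steps

    reserved = set()
    p1 = route(5, 5, (0, 0), (4, 4), reserved)  # first vehicle
    p2 = route(5, 5, (4, 0), (0, 4), reserved)  # routes around p1's slots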

The second application, which to my mind was more impressive, was an instance of large-scale optimization: redesigning the Berlin subway system. Perhaps the most striking aspect of this work is that it's actually being used ! The latest incarnation of the Berlin subway schedule is the one designed by their scheme, and the speaker ended with a neat 4-minute Numb3rs-style video, aimed at the layman, that explained their work.

This is a tremendous achievement, both technically and politically. And I can't help but wonder whether it's something about Germany itself, the home of classic engineering, that allows optimizations designed by university researchers to actually be implemented on a subway system. I'm probably wrong here, in that there must be numerous success stories of OR in the US as well, but it does make me wonder.

Analysis: As an example of "applying algorithms in the real world", the talk was great. As an example of the intricacies of doing "applied algorithms", it was less impressive: most of the "compromises with the real world" were the same kinds of modifications to theoretical problems that one might expect to see in papers at ALENEX, and there was little takeaway and few lessons to be learned. I'd have loved to hear about how exactly they were able to convince the Berlin subway authority to implement their schemes.

ANALCO:
This was either planned or a happy coincidence, but the ANALCO invited speaker was Don Knuth, just a few days after his 70th birthday. The title of his talk was "Puzzling problems", and as befits a Knuth talk, it was a random collection of cute problems in combinatorics. He had promised five such problems, with an exponentially increasing amount of time spent on each successive one, but he only got through three. David has already described them in more detail here; all I can say is that it was a typical Knuthian talk :)

And yes, there were talks ! One talk that I'll draw attention to is the work by Coleman and Wirth on the feedback arc set problem on tournaments. Given a directed graph with a directed edge one way or the other between each pair of vertices (i.e., a tournament), can we delete the fewest edges so that the resulting graph is a DAG ? One important application of this problem is aggregating rankings. Think of all the non-bribed figure skating judges at a competition, each producing a ranking of the performers: how do we aggregate these rankings into a "consensus" ranking ?

What is interesting about this problem is that it relates closely to sorting algorithms. In a sense, you can think of the rank aggregation problem as attempting to "sort" the participants into a total order, much as in a sorting problem. An earlier paper by Ailon, Charikar and Newman showed that a quicksort-like heuristic achieves a 3-approximation to the optimal solution of the feedback arc set problem, and other local heuristics for feedback arc set can also be related to sorting algorithms (although the analogue of insertion sort performs quite badly).
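
For concreteness, here is a minimal sketch of that quicksort-like heuristic (my own illustration of the pivoting idea, with the tournament given as an assumed edge set): pick a random pivot, place every other vertex before or after it according to the direction of its edge with the pivot, and recurse on the two sides.

    # Quicksort-style pivoting for feedback arc set on tournaments
    # (illustrative sketch). beats(u, v) is True iff the tournament
    # contains the edge u -> v.
    import random

    def kwik_sort(items, beats):
        """Return an ordering of `items`; the edges pointing backwards in
        this ordering are the feedback arcs the heuristic pays for."""
        if len(items) <= 1:
            return list(items)
        pivot = random.choice(items)
        before = [v for v in items if v != pivot and beats(v, pivot)]
        after = [v for v in items if v != pivot and beats(pivot, v)]
        return kwik_sort(before, beats) + [pivot] + kwik_sort(after, beats)

    # A small tournament on 5 vertices, as a set of directed edges.
    edges = {(0, 1), (0, 2), (3, 0), (4, 0), (1, 2), (1, 3),
             (4, 1), (2, 3), (2, 4), (3, 4)}
    order = kwik_sort([0, 1, 2, 3, 4], lambda u, v: (u, v) in edges)
    backward = sum(1 for i, u in enumerate(order)
                   for v in order[i + 1:] if (v, u) in edges)
    print(order, "feedback arcs:", backward)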

Mergesort has not been translated to this framework: it's not obvious (though not hopeless) how to reformulate mergesort as an algorithm for feedback arc set, and indeed one open question from the earlier Ailon et al. paper is an analysis of the behaviour of a mergesort-like method.

That's all for today, folks ! Tomorrow the action starts in earnest...

After notes:
  • The hotel wireless works in the lecture rooms (!). It's also free. This must be a first...
  • The location is great. The hotel is on Van Ness and there are tons of restaurants (and Peets) within a block, as well as a drugstore and a Whole Foods.
  • The ALENEX business meeting was the shortest business meeting in recorded history: it took all of 5 minutes. I commend the organizers for their good taste, and can only hope that SODA emulates them.

Saturday, January 06, 2007

SODA 2007: Day 0

Getting into New Orleans at 1 am because of "mechanical trouble" means that I haven't been at my best so far. But I've already heard one amazing talk today.

Luc Devroye gave the ANALCO plenary lecture on "Weighted Heights of Random Trees", based on work with his students Erin McLeish and Nicolas Broutin. Having sat through many talks with titles like this, I generally approach them with great caution and a clear escape route. But...

This was an amazing exposition of a topic that could have become dry, terse, and essentially incomprehensible within a slide or two. He had jokes (that were funny), a global plan for the material, enough technical content that I went away feeling like I'd learnt something, and intuition galore. And the work itself is very beautiful.

So what was it all about ? The problem is really quite simple to state. Suppose I give you a (random) weighted binary tree, where nodes attach to parents randomly, and edges may have weights chosen randomly. What is the maximum height of such a tree ?

The standard application for such a tool is in analyzing binary search trees. The height of such a tree controls the running time of algorithms that use it. And there's now a vast literature analyzing both the asymptotics of the height distribution (the depth of a typical node is sharply concentrated around 2 ln n) and the specific constants (the maximum height of a random binary search tree is roughly 4.31 ln n, and the minimum is around 0.37 ln n).
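
As a quick sanity check of those constants, here is a small simulation sketch (my own, not from the talk) that builds a binary search tree from a random permutation and compares its height to 4.311 ln n:

    # Insert a random permutation of 0..n-1 into a BST and measure its
    # height; for large n this should be close to 4.311 * ln(n).
    import math
    import random

    def random_bst_height(n):
        """Height of a BST built from a random permutation of 0..n-1."""
        keys = list(range(n))
        random.shuffle(keys)
        left, right = {}, {}      # child pointers, keyed by node value
        root, height = keys[0], 0
        for k in keys[1:]:
            node, depth = root, 0
            while True:           # walk down, inserting k at a leaf
                depth += 1
                child = left if k < node else right
                if node in child:
                    node = child[node]
                else:
                    child[node] = k
                    break
            height = max(height, depth)
        return height

    n = 10000
    print(random_bst_height(n), "vs 4.311 ln n =", round(4.311 * math.log(n), 1))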

The "master goal" that Devroye described in his talk was this: Suppose I have a general way of attaching nodes to parents (that leads to a general distribution on subtree sizes), and a general way of attaching weights to edges (rather than being deterministically 1 for binary search trees). Such a general model captures the analysis of tries (trees on strings that are very important in text searching), geometric search structures like k-d trees, and even restricted preferential attachment models in social network analysis (Think of the edges as hyperlinks, and the height of the tree as the diameter of a web tree).

Is there a generic theorem that can be applied to all of these different situations, so that you can plug in a set of distributions describing your process, and out pops a bound on the height of your tree ? It turns out that there is (with some technical conditions). The method uses two-dimensional large-deviation theory: can you estimate the probability that a sum of random variables is bounded by some function, while at the same time ensuring that some other sum of random variables (which might depend slightly on the first) is also bounded ?

An example of a 1D large deviation result is of course a Chernoff bound. Devroye showed that a 2D large deviation bound for the height of such trees can be expressed in a similar form using the so-called Cramér exponent, something that will probably not surprise experts in large deviation theory. After that, the analysis for any tree process becomes a whole lot easier: you analyze the corresponding Cramér function for your distributions, and a bound (with a constant; no big-O nonsense here!) pops out.
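
For readers who want to see the 1D statement, the standard Cramér-Chernoff bound (textbook material, not specific to the talk) for i.i.d. random variables $X_1, \dots, X_n$ and $a > \mathbb{E}[X_1]$ reads:

    \[
      \Pr\Big[\sum_{i=1}^{n} X_i \ge n a\Big] \le e^{-n I(a)},
      \qquad
      I(a) = \sup_{\lambda > 0} \big(\lambda a - \log \mathbb{E}\big[e^{\lambda X_1}\big]\big),
    \]

where $I$ is the Cramér (rate) function; the 2D version used in the talk controls two such sums simultaneously.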

He also talked about a neat extension of this method to analyzing the "skinniness" of k-d tree decompositions, showing that for a kind of "relaxed" k-d tree construction, the skinniest cell can be extremely skinny (having a super-linear aspect ratio). It's the kind of result that I imagine would be very difficult to prove without such a useful general theorem.
