Monday, September 10, 2018

Hello World: A short review

A short review of Hannah Fry's new book 'Hello World'

Starting with Cathy O'Neil's Weapons of Math Destruction, there's been an onslaught of books sounding the alarm about the use of algorithms in daily life. My Amazon list that collects these together is even called 'Woke CS'. These are all excellent books, calling out the racial, gender, and class inequalities that algorithmic decision-making can and does exacerbate and the role of Silicon Valley in perpetuating these biases.

Hannah Fry's new book "Hello World" is not in this category. Not exactly, anyway. Her take is informative as well as cautionary. Her book is as much an explainer of how algorithms get used in contexts ranging from justice, to medicine, to art, as it is a reflection on what this algorithmically enabled world will look like from a human perspective.

And in that sense it's a far more optimistic take on our current moment than I've read in a long time. In a way it's a relief: I've been mired for so long in the trenches of bias and discrimination, looking at the depressing and horrific ways in which algorithms are used as tools of oppression, that it can be hard to remember that I'm a computer scientist for a reason: I actually do marvel at and love the idea of computation as a metaphor, as a tool, and ultimately as a way to (dare I say it) do good in the world.

The book is structured around concepts (power, data) and domains (justice, medicine, cars, crime and art). After an initial explainer on how algorithms function (and also how models are trained using machine learning), and how data is used to fuel these algorithms, she very quickly gets into specific case studies of both the good and the bad in algorithmically mediated decision making. Many of the case studies are from the UK and were unknown to me before this book. I quite liked that: it's easy to focus solely on examples in the US, but the uses (and misuses) of algorithms are global (Vidushi Marda's article on AI policy in India has similar locally-sourced examples).

If you're a layman looking to get a general sense of how algorithms tend to show up in decision making systems, how they hold out hope for a better way of solving problems and where they might go wrong, this is a great book. It uses a minimum of jargon, while still being willing to wade into the muck of false positives and false negatives (in a very nice illustrative example in the section on recidivism prediction and COMPAS), and it even attempts to welcome the reader into the "Church of Bayes".

If you're a researcher in algorithmic fairness, like me, you start seeing the deeper references as well. Dr. Fry alludes to many of the larger governance issues around algorithmic decision making that we're wrestling with now in the FAT* community. Are there better ways to integrate automated and human decision-making that take advantage of what we are good at? What happens when the systems we build start to change the world around them? Who gets to decide (and how) what level of error in a system is tolerable, and who might be affected by it? As a researcher, I wish she had called out these issues a little more, and there are places where issues she raises in the book have actually been addressed (and in some cases, answered) by researchers.

While the book covers a number of different areas where algorithms might be taking hold, it takes very different perspectives on the appropriateness of algorithmic decision-making in these domains. Dr. Fry is very clear (and rightly so) that criminal justice is one place where we need very strong checks and balances before we can countenance the use of any kind of algorithmic decision-making. But I feel that maybe she's letting the medical profession off a little easy in the chapter on medicine. While I agree that biology is complex enough that ML-assistance might lead us to amazing new discoveries, I think some caution is needed, especially since there's ample evidence that the benefits of AI in medicine might only accrue to the (mostly white) populations that dominate the clinical trials.

Similarly, the discussion of creativity in art and what it means for an algorithm to be creative is fascinating. The argument Dr. Fry arrives at is that art is fundamentally human in how it exists in transmission -- from artist to audience -- and that art cannot be arrived at "by accident" via data science. It's a bold claim, and of a kind with many claims about the essential humanness of certain activities that have been pulverized by advances in AI. Notwithstanding, I find it very appealing to posit that art is essentially a human endeavour by definition.

But why not extend the same courtesy to the understanding of human behavior or biology? Algorithms in criminal justice are predicated on the belief that we can predict human behavior and how our interventions might change it. We expect that algorithms can pierce the mysterious veil of biology, revealing secrets about how our body works. And yet the book argues not that these systems are fundamentally flawed, but that precisely because of their effectiveness they need governance. I for one am a lot more skeptical about the basic premise that algorithms can predict behavior to any useful degree beyond the aggregate (and perhaps Hari Seldon might agree with me).

Separately, I found it not a little ironic, in a time when Facebook is constantly being yanked before the US Congress, Cambridge Analytica might have swayed US elections and Brexit votes, and YouTube is a dumpster fire of extreme recommendations, that I'd read a line like "Similarity works perfectly well for recommendation engines" in the context of computer-generated art.

The book arrives at a conclusion that I feel is JUST RIGHT. To wit, algorithms are not authorities, and we should be skeptical of how they work. And even when they might work, the issues of governance around them are formidable. But we should not run away from the potential of algorithms to truly help us, and we should be trying to frame the problem away from the binary of "algorithms good, humans bad" or "humans good, algorithms bad" and towards a deeper investigation of how human and machine can work together. I cannot read this passage

"Imagine that, rather than exclusively focusing our attention on designing our algorithms to adhere to some impossible standard of perfect fairness, we instead designed them to facilitate redress when they inevitably erred; that we put as much time and effort into ensuring that automatic systems were as easy to challenge as they are to implement."

without wanting to stand up and shout "HUZZAH!!!". (To be honest, I could quote the entire conclusions chapter here and I'd still be shouting "HUZZAH").

It's a good book. Go out and buy it; you won't regret it.

This review refers to an advance copy of the book, not the released hardcover. The advance copy had a glitch where a fragment of LaTeX math remained uncompiled. This only made me happier to read it.

Thursday, August 30, 2018

Clustering: a draft of a part!

For the last X years (X being a confidential and never to be revealed number, but large enough that AI was more than just deep learning at the time), Sergei Vassilvitskii and I have been toiling away at a book on clustering.

The book isn't ready yet, but we do have a draft of part I (the core of the book). Check it out, and send any comments you might have to clusteringbook@gmail.com.

Wednesday, February 14, 2018

A #metoo testimonial that hits close to home...

This is a guest post by a colleague in the TCS community, a person I know. If you read other TCS blogs you might come across this there. This is by design. Please do read it. 

Every #MeToo story over the last several months has made me pause. My heart races and my concentration fails. The fact that the stories have largely focused on the workplace adds to my difficulty.

Do I speak out too?

I have shared a few stories with colleagues about things that have happened to me in school and at work. But these stories have been somewhat lighthearted events that have been easy to share without outing the perpetrators.

For example, I have told a story about a university employee telling me, in so many words, that I should be barefoot and pregnant and not in the office. What I didn't share is that the same employee, later that year -- despite the fact that our common boss knew about this story because I did indeed report it -- was awarded a best employee award. How do you think that made me feel? Like my experience didn't matter and that such comments are condoned by our department. Why didn't I share that information widely? Because I was worried that folks would then be able to figure out who the culprit was. And isn't that even worse? Shouldn't it be the sexist who is worried and not the woman who, yet again, is made to feel like she doesn't belong?

---

Let me go off on a tangent for a bit. For years I have not flown. Ostensibly I stopped flying because of its contribution to the climate crisis. When I travel, I go by train. It takes longer, but has been surprisingly pleasant. And when travel takes 3-4 times as long, you don't do it as often, further reducing your carbon footprint. Of course, that means that I don't go to conferences unless they are nearby.

But when I really think about it, is this really the reason I stopped going to conferences? A conference I would normally go to was held nearby a few years ago and I didn't go. Sure, I suffered a grievous injury two weeks before, but I hadn't even registered. I had planned to not go long before that injury.

So, really, why do I no longer attend conferences? Partly I don't feel that I need to anymore, now that I have tenure. When I stopped attending conferences, I was able to "coast into" tenure. Letter writers would remember me. I essentially stopped going to conferences and workshops as soon as I possibly could.

---

Back to the beginning, or close to.

I was nervous at the first conference I attended as a graduate student. One of the reasons I was nervous was that I was athletic at the time and planned on daily runs while I was attending -- I was worried that it might be viewed as a waste of time. My advisor, who also went to the conference, found out about my athleticism and suggested we run together. This was a relief to me. That is, until we were running and he started talking about his lackluster sex life with his wife. I responded by picking up the pace and feigning an illness on the remaining days. On the last day of the conference we were out for dinner with a large group of people and dinner went late into the night. I excused myself, as I had a 4AM bus to catch. My advisor walked me out of the restaurant and awkwardly said something about wanting me to stay and that we should talk. I stuck to leaving, knowing that I needed some sleep before the long trip home the next day. He said we should talk when we were back in the office. Honestly, at the time I thought he was going to complain about my talk or my professional performance in some way. I worried about it all through the weekend until we met next. I brought it up at the end of our meeting, asking what he wanted to talk about, naively expecting professional criticism. When he said I must surely know, in a certain voice, I knew he wasn't talking about work. I feigned ignorance, and he eventually brushed it off and said not to worry. In the coming months, he would cancel meetings and otherwise make himself unavailable. After a half year I realized I wouldn't be able to be successful without having a supportive advisor and, despite first planning to quit grad school, found a new advisor and moved on. That former advisor barely made eye contact with me for the remainder of my time in graduate school.

Fast forward many years. I was at a small workshop as a postdoc. A senior and highly respected researcher invited me to dinner. I was excited at the opportunity to make a stronger connection that would hopefully lead to a collaboration. However, at dinner he made it very clear that this was not professional by reaching across the table and stroking my hands repeatedly. I don't even recall how I handled it. Perhaps I should have expected it -- a grad school friend of mine had a similar, and probably worse, interaction with this same researcher. Shortly after I got to my room at the hotel, my hotel room phone rang. It was him. He wanted to continue our conversation. I did not.

Perhaps a year later, still as a postdoc, I was at a party and a colleague from another university was there too. At the end of the party, we were alone. We flirted, mutually. Flirting led to kissing, kissing led to him picking me up in a way that asserted how much stronger he is than me, which led to my utter discomfort, which led to me saying no, stop, repeatedly. Which he didn't listen to. Which led to a calculation in my head. I could either resist and risk physical injury or I could submit. I chose to submit, without consent.

For the record, that is called rape.

For a long while, I suppressed it. I pretended in my own head that it didn't happen that way, that it was consensual. I even tried to continue working with him -- always in public places, mind you. The wall in my mind gradually broke down over the years until several years later, we were at the same workshop where the doors of the rooms didn't have locks. You could lock them from the inside, but not the outside. I remember worrying that he would be lurking in my room and took to making sure I knew where he was before I ventured back to sleep.

---

So why would I continue to go to workshops and conferences when that is the environment I know I will face? Even if I felt safe, if 95% of the attendees are men, how many look at me as a colleague and how many look at me as a potential score? When I was going up for tenure, I thought long and hard about listing the senior-and-highly-respected researcher on a do-not-ask-for-a-letter list. But where would it stop? Do I include all the people who hit on me? All the people who stared at my breasts or commented on my body? All the people who had given me clear signals that they didn't see me as a colleague and an equal member of the research community, but as a woman meant to be looked at, hit on, touched inappropriately?

Should I have quit grad school when I had the chance? We all know it isn't any better in industry. Should I have pursued another discipline? No discipline, it seems, is immune to sexualization of women. But I think the situation is uniquely horrible in fields where there are so few women. At conferences in theoretical computer science, 5-10% of the attendees are women, as a generous estimate. The numbers aren't in women's favor. The chances that you will get hit on, harassed, assaulted are much higher. There is a greater probability that you will be on your own in a group of men. You can't escape working with men. It is next to impossible to build a career when you start striking men off your list of collaborators in such a field. That is not to say there aren't wonderful men to work with. There are many men in our field that I have worked with and turned to for advice and spent long hours with and never once detected so much as a creepy vibe. But you can't escape having to deal with the many others who aren't good. When you meet someone at a conference, and they invite you for a drink or dinner to continue the conversation, how do you know that they actually want to talk about work, or at least treat you as they would any colleague? How do you make that decision?

I hung on until I no longer needed to go to conferences and workshops to advance my career to the stability of tenure. But surely my career going forward will suffer. My decision is also hard on my students, who go to conferences on their own without someone to introduce them around. It is hard on my students who can't, for visa difficulties, go to the international conferences that I am also unwilling to go to, so we roll the dice on the few domestic conferences they can go to.

And now I am switching fields. Completely. I went to two conferences last summer. The first, I brought the protective shield of my child and partner. The second, I basically showed up for my talk and nothing else. I wasn't interested in schmoozing. It'll be difficult, for sure, to establish myself in a new field without fully participating in the expected ways.

Is all this why I am switching fields? Not entirely, I'm sure, but it must have played a big role. If I enjoyed conferences as much as everyone else seems to, and didn't feel shy about starting new collaborations, I might be too engrossed to consider reasons to leave. And certainly, the directions I am pursuing are lending themselves to a much greater chance of working with women.

Why am I speaking out now? The #MeToo moment is forcing me to think about it, of course. But I have been thinking about this for years. I hope it will be a relief to get it off my chest. I have been "getting on with it" for long enough. 1 in 5 women will deal with rape in their lifetime. 1 in 5! You would think that I would hear about this from friends. But I hadn't told anyone about my rape. And almost no one has told me about theirs. I think it would help, at the very least therapeutically, to talk about it.

---

I thought about publishing this somewhere, anonymously, as a "woman in STEM". I considered publishing it non-anonymously, but shied away from dealing with the trolls. I didn't want to deal with what many women who speak out about their experiences face: having their lives scrutinized, hearing excuses made on behalf of the predators, and generally having their experiences denied. But I think by posting it here, many people in theoretical computer science will read it, rather than a few from the choir. I am hoping that you will talk to each other about it. That you will start thinking of ways to make our community better for others. In all my years of going to conferences and workshops, of all the inappropriate comments and behaviors that others have stood around and witnessed, never once did any of the good ones call that behavior out or intervene. Maybe they did so in private, but I think it needs to be made public. Even the good ones can do better.

What can you do?

While you couldn't have protected me from being raped, you can think about the situations we are expected to be in for our careers -- at workshops in remote locations, where we're expected to drink and be merry after hours. I hope not many of us have been raped by a colleague, but even if you haven't, it doesn't take many instances of being hit on or touched inappropriately to begin to feel unsafe.

I remember being at a conference and, standing in a small group, an attendee interrupted a conversation I was having to tell me that my haircut wasn't good, that I shouldn't have cut my hair short. I tried to ignore it, and continue my conversation, but he kept going on about it. Saying how I would never attract a man with that haircut. No one said anything. Speak up. Just say -- shut up! -- that's not appropriate. Don't leave it up to the people who have to deal with this day in, day out to deal with it on their own. Create a culture where we treat each other with respect and don't silently tolerate objectification and worse.

I regret never reporting my first graduate advisor's behavior, but is it my fault? I had no idea who to report it to. Nor, as an undergrad, did I know who I would report such behavior to. Where I am now is the first place I've been that has had clear channels for reporting sexual harassment and other damaging situations. The channels are not without problems, but I think the university is continuing to improve them. Perhaps we should have a way of reporting incidents in our field. I have a hard time believing, given that a grad school friend and I had similar experiences with the same senior-and-highly-respected researcher, that others in the field don't know that he is a creep. It is up to you to protect the vulnerable of our community from creeps and predators. Keep an eye on them. Talk to them. Don't enable them. As a last resort, shame and isolate them.

Monday, January 22, 2018

Double blind review: continuing the discussion

My first two posts on double blind review triggered good discussion by Michael Mitzenmacher and Boaz Barak (see the comments on these posts for more). I thought I'd try to synthesize what I took away from the posts and how my own thinking has developed.

First up, I think it's gratifying to see that the basic premise, "single blind review has the potential for bias, especially with respect to institutional status, gender and other signifiers of in/out groups", is now granted. There was a time in the not-so-distant past when I wouldn't have been able to establish even this baseline in my conversations.

The argument therefore has moved to one of tradeoffs: does the installation of DB review introduce other kinds of harm while mitigating harms due to bias?

Here are some of the main arguments that have come up:

Author identity carries a valuable signal for evaluating the work.

This argument manifested itself in comments (and I've heard it made in the past). One specific version of it that James Lee articulates is that all reviewing happens in a resource-limited setting (the resource here being time) and so signals like author identity, while not necessary to evaluate the correctness of a proof, provide a prior that can help focus one's attention. 

My instinctive reaction to this is "you've just defined bias". But on reflection I think James and others who've said this are pointing out that abandoning author identity is not free. I think that's a fair point to make. But I'd be hard pressed to see why this increase in effort negates the fairness benefits of double blind review (and I'm in general a little uncomfortable with this purely utilitarian calculus when it comes to bias).

As a side note, I think that focusing on paper correctness is a mistake. As Boaz points out, this is not the main issue with most decisions on papers. What matters much more is "interestingness", which is very subjective and much more easily bound up with prior reactions to author identity. 

Some reviewers may be aware of author identity and others might not. This inconsistency could be a source of error in reviewing.

Boaz makes this point in his argument against DB review. It's an interesting argument, but I think it also falls into the trap of absolutism: i.e., the assumption that imperfections in this process will cause catastrophic failure. The counterpoint was made far more eloquently in a comment on a blog post about ACL's double blind policy (emphasis mine).

I think this kind of all-or-nothing position fails to consider one of the advantages of blind review. Blind review is not only about preventing positive bias when you see a paper from an elite university, it’s also about the opposite: preventing negative bias when you see a paper from someone totally unknown. Being a PhD student from a small group in a little known university, the first time I submitted a paper to an ACL conference I felt quite reassured by knowing that the reviewers wouldn’t know who I was. 
In other words, under an arXiv-permissive policy like the current one, authors still have the *right* to be reviewed blindly, even if it’s no longer an obligation because they can make their identity known indirectly via arXiv+Twitter and the like. I think that right is important. So the dilemma is not a matter of “either we totally forbid dissemination of the papers before acceptance in order to have pure blind review (by the way, 100% pure blind review doesn’t exist anyway because one often has a hint of whom the authors may be, and this is true especially of well-known authors) or we throw the baby out with the bathwater and dispense with blind review altogether”. I think blind review should be preserved at least as a right for the author (as it is now), and the question is whether it should also be an obligation or not.

Prepublication on the arXiv is a desirable goal to foster open access and the speedy dissemination of information. Double blind review is irreconcilably in conflict with non-anonymous preprint dissemination.

This is perhaps the most compelling challenge to implementing double blind review. The arXiv as currently constructed is not designed to handle (e.g.) anonymous submissions that are progressively deanonymized. The post that the comment above came from has an extensive discussion of this point, and rather than try to rehash it all here, I'd recommend that you read the post and the comments.

But the comments also question the premise head on: specifically, "does it really slow things down?" and "so what?". Interestingly, Hal Daumé made an attempt to answer the "really?" question. He looked at arXiv uploads in 2014-2015 and correlated them with NIPS papers. The question he was trying to ask was: is there evidence that more papers were being uploaded to the arXiv before submission to NIPS in order to get feedback from the community? His conclusion was that there was little evidence to support the idea that the arXiv had radically changed the normal submit-revise cycle of conferences. I'd actually think that theoryCS might be a little better in this regard, but I'd also be dubious of such claims without seeing data.

In the comments, even the question of "so what?" is addressed. And again this boils down to tradeoffs. While I'm not advocating that we ban people from putting their work on the arXiv, ACL has done precisely this, by asserting that the relatively short delay between submission and decision is worth it to ensure the ability to have double blind review.

Summary

I'm glad we're continuing to have this discussion, and talking about the details of implementation is important. Nothing I've heard has convinced me that the logistical hurdles associated with double blind review are insurmountable or even more than inconveniences that arise out of habit, but I think there are ways in which we can fine tune the process to make sense for the theory community. 

Tuesday, January 09, 2018

Double blind review at theory conferences: More thoughts.

I've had a number of discussions with people both before and after the report that Rasmus and I wrote on the double-blind experiment at ALENEX. And I think it's helpful to lay out some of my thoughts on both the purpose of double blind review as I understand it, and the logistical challenges of implementing it.

What is the purpose of double blind review? 

The goal is to mitigate the effects of the unconscious, implicit biases that we all possess and that influence our decision making in imperceptible ways. It's not a perfect solution to the problem. But there is now a large body of evidence suggesting that

  • All people are susceptible to implicit biases, whether regarding institutional status, individual status, or demographic stereotyping. And what's worse, we are incredibly bad at assessing or detecting our own biases. At this point, a claim that a community is not susceptible to bias is the one that needs evidence.
  • Double blind review can mitigate this effect. Probably the most striking example of this is the case of orchestra auditions, where requiring performers to play behind a screen dramatically increased the number of women in orchestras. 
What is NOT the purpose of double blind review? 

Double blind review is not a way to prevent anyone from ever figuring out the author identity. So objections to blinding based on scenarios where author identity is partially or wholly revealed are not relevant. Remember, the goal is to eliminate the initial biases that come from the first impressions. 

What makes DB review hard to implement at theory venues? 

Theory conferences do two things that are different from other communities. We
  • require that PC members do NOT submit papers
  • allow PC members to initiate queries for external subreviewers. 
These two issues are connected. 
  1. If you don't allow PC members to submit papers, you need a small PC. 
  2. If you have a small PC, each PC member is responsible for many papers. 
  3. If each PC member is responsible for many papers, they need to outsource the effort to be able to get the work done. 
As we mentioned earlier, it's not possible to have PC members initiate review requests if they don't know who might be in conflict with a paper whose authors are invisible. So what do we do? 

There's actually a reasonably straightforward answer to this. 


  • We construct the PC as usual with the usual restrictions.
  • We construct a list of “reviewers”. For example, “anyone with a SODA/STOC/FOCS paper in the last 5 years” or something like that. Ideally we will solicit nominations from the PC for this purpose.
  • We invite this list of people to be reviewers for SODA, and do this BEFORE paper submission
  • Authors will declare conflicts with reviewers and domains (and reviewers can also declare conflicts with domains and authors).
  • At bidding time, the reviewers will be invited to bid on (blinded) papers. The system will automatically assign people.
  • PC members will also be in charge of papers as before, and it’s their job to manage the “reviewers” or even supply their own reviews as needed. 
Any remaining requests for truly external subreviewing will be handled by the PC chairs. I expect this number will be a lot smaller.
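To make the bidding and automatic assignment step a bit more concrete, here's a minimal sketch of one way a system could assign blinded papers from reviewer bids while honoring declared conflicts. It's a toy under stated assumptions: the function name `assign_papers`, the greedy highest-bid-first strategy, and the `reviews_per_paper` and `max_load` parameters are all hypothetical, and real reviewing systems may well use different (often optimization-based) assignment methods.

```python
def assign_papers(bids, conflicts, reviews_per_paper=3, max_load=6):
    """Greedily assign reviewers to blinded papers, highest bids first.

    Hypothetical sketch, not the algorithm of any particular reviewing system.
    bids:      dict mapping (reviewer, paper) -> bid score (higher = more eager)
    conflicts: set of (reviewer, paper) pairs declared as conflicts
    Returns a dict mapping paper -> list of assigned reviewers.
    """
    assignment = {}  # paper -> reviewers assigned so far
    load = {}        # reviewer -> number of papers assigned so far
    # sorted() is stable, so equal bids keep their original order.
    for (reviewer, paper), _score in sorted(bids.items(), key=lambda kv: -kv[1]):
        if (reviewer, paper) in conflicts:
            continue  # never hand a paper to a conflicted reviewer
        if len(assignment.get(paper, [])) >= reviews_per_paper:
            continue  # this paper already has enough reviewers
        if load.get(reviewer, 0) >= max_load:
            continue  # this reviewer is at capacity
        assignment.setdefault(paper, []).append(reviewer)
        load[reviewer] = load.get(reviewer, 0) + 1
    return assignment


if __name__ == "__main__":
    # Hypothetical reviewers and papers, for illustration only.
    bids = {
        ("alice", "p1"): 3, ("bob", "p1"): 2, ("carol", "p1"): 1,
        ("alice", "p2"): 1, ("bob", "p2"): 3,
    }
    conflicts = {("bob", "p1")}
    print(assign_papers(bids, conflicts, reviews_per_paper=2, max_load=2))
    # {'p1': ['alice', 'carol'], 'p2': ['bob', 'alice']}
```

Papers that end up short of reviewers (for example because everyone who bid on them is conflicted) are exactly the ones the PC member in charge, or failing that the PC chairs, would need to handle by hand.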

Of course all of this is pretty standard at venues that implement double blind review. 

But what if a sub-area is so small that all the potential reviewers are conflicted?

Well, if that's the case, then it's a problem we already face right now, and DB review doesn't really change it.

What about if a paper is on the arXiv? 

We ask authors and reviewers to adhere to double blind review policies in good faith. Reviewers are not expected to go hunting for the author names, and authors are expected to not draw attention to information that could lead to a reveal. Like with any system, we trust people to do the right thing, and that generally works. 

But labeling CoI for so many people is overwhelming.

It does take a little time, but less time than one expects. Practically, many CoIs are handled by institutional domain matching, and most of the rest are handled by explicit listing of collaborators and looking for them in a list. Most reviewing systems allow for this to be automated. 
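Here's a rough sketch of the two mechanisms mentioned above, institutional domain matching plus explicit collaborator lists, written as a small Python function. The data layout, the function names, and the `PUBLIC_DOMAINS` exclusion list are hypothetical choices for illustration; actual reviewing systems have their own formats and rules.

```python
# Hypothetical list of shared webmail domains that should never trigger a conflict.
PUBLIC_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}


def email_domain(email):
    """Return the lowercased domain part of an email address."""
    return email.rsplit("@", 1)[-1].lower()


def find_conflicts(papers, reviewers):
    """Return the set of (reviewer, paper_id) pairs that are in conflict.

    papers:    paper_id -> {"authors": [names], "author_emails": [emails]}
    reviewers: reviewer name -> {"email": email, "collaborators": set of names}
    """
    conflicts = set()
    for paper_id, paper in papers.items():
        authors = set(paper["authors"])
        # Institutional domains of the authors, ignoring shared webmail providers.
        domains = {email_domain(e) for e in paper["author_emails"]} - PUBLIC_DOMAINS
        for name, info in reviewers.items():
            same_institution = email_domain(info["email"]) in domains
            coauthor = bool(info["collaborators"] & authors)
            is_author = name in authors
            if same_institution or coauthor or is_author:
                conflicts.add((name, paper_id))
    return conflicts


if __name__ == "__main__":
    # Hypothetical people and institutions (.example is a reserved TLD).
    papers = {
        "p1": {"authors": ["Ada", "Ben"],
               "author_emails": ["ada@univ-a.example", "ben@gmail.com"]},
    }
    reviewers = {
        "Cy": {"email": "cy@univ-a.example", "collaborators": set()},
        "Di": {"email": "di@univ-b.example", "collaborators": {"Ben"}},
        "Ed": {"email": "ed@gmail.com", "collaborators": set()},
    }
    print(sorted(find_conflicts(papers, reviewers)))
    # [('Cy', 'p1'), ('Di', 'p1')]
```

Exact domain matching is deliberately naive here: it would miss a conflict between, say, cs.univ-a.example and univ-a.example, and it has to skip shared webmail domains or it would flag nearly everyone. That's one reason explicit collaborator lists are still needed alongside it.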

But how am I supposed to know if the proof is correct if I don't know who the authors are?

Most theory conferences are now comfortable with asking for full proofs. And if the authors don't provide full proofs, and I need to know the authors to determine if the result is believable, isn't that the very definition of bias? 

And finally, from the business meeting....

Cliff Stein did an excellent job running the discussion on this topic, and I want to thank him for facilitating what could have been, but wasn't, a very fraught discussion. He's treading carefully, but forward, and that's great. I was also quite happy to see that in the straw poll, there was significant willingness to try double blind review (more votes in favor than opposed). There were still way more abstentions, so I think the community is still thinking through what this might mean.


Sunday, January 07, 2018

Report on double blind reviewing in ALENEX 2018

Rasmus Pagh and I chaired ALENEX 2018, and we decided to experiment with double blind review for the conference. What follows is a report that we wrote on our experiences doing this. There are some useful notes about logistics, especially in the context of a theoretically-organized conference on experimental algorithms.

ALENEX 2018 Double Blind Review

For ALENEX 2018, we decided to experiment with a double blind review process, i.e., one in which authors and reviewers were unaware of each other’s identity. While double blind review is now almost standard in most computer science conferences, it is still relatively uncommon in conferences that focus on theoretical computer science and related topics.

The motivation

In our original proposal to the ALENEX Steering Committee, we gave the following reasons for wanting double blind review:
1. Eliminating bias.

Andrew Tomkins did an experiment for WSDM this year and wrote a report on it: https://arxiv.org/abs/1702.00502

One particular observation:

“Reviewers in the single-blind condition typically bid for 22% fewer papers, and preferentially bid for papers from top institutions. Once papers were allocated to reviewers, single-blind reviewers were significantly more likely than their double-blind counterparts to recommend for acceptance papers from famous authors and top institutions. The estimated odds multipliers are 1.66 for famous authors and 1.61 and 2.10 for top universities and companies respectively, so the result is tangible”

2. Common practice.

Virtually every CS community except theory is doing double blind review, including most of ML (NIPS, ICML), DB (VLDB, SIGMOD), Systems (NSDI), etc. While theory papers have their own idiosyncrasies, we argued that ALENEX is much closer in spirit and paper structure to more experimental venues like the ones listed.

3. Limited burden on authors and reviewers for an experiment.

There was no real logistical burden. We were not blocking people from posting on the arXiv or talking about their work; we were merely requiring that submissions be blinded (which is easy to do). For reviewers, too, this is not a problem: typically you merely declare conflicts based on domains, and that takes care of figuring out who’s conflicted with which paper (but more on this later).

4. Prototyping.

While theoryCS conferences in general do not make use of double blind review, ALENEX is a small but core venue where such an experiment might reveal useful insights about the viability of double blind review more broadly. This way we don’t have to advocate for changes at SODA/STOC/FOCS outright without first learning how it might work.

5. PC submissions.

We are allowing PC members to submit papers, and this has been done before at ALENEX. In this case double blind review is important to prevent even the appearance of conflict.
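The odds multipliers in the WSDM quote under reason 1 are easy to misread as probability multipliers. Here is a minimal sketch of what they mean, assuming each multiplier acts as an odds ratio on a reviewer's decision to recommend acceptance with everything else held fixed, and using a hypothetical 30% baseline recommendation rate (an illustrative number, not one from the study). Converting to odds, scaling, and converting back gives p' = m·p / (1 − p + m·p).

```python
def apply_odds_multiplier(baseline_prob: float, multiplier: float) -> float:
    """Convert a probability to odds, scale the odds by `multiplier`,
    and convert the result back to a probability."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    odds = baseline_prob / (1.0 - baseline_prob)
    scaled = odds * multiplier
    return scaled / (1.0 + scaled)


# Multipliers quoted from the WSDM report; the 0.30 baseline is hypothetical.
for label, m in [("famous authors", 1.66),
                 ("top universities", 1.61),
                 ("top companies", 2.10)]:
    print(f"{label}: {apply_odds_multiplier(0.30, m):.3f}")
# famous authors: 0.416
# top universities: 0.408
# top companies: 0.474
```

So under these assumptions a 1.66 odds multiplier moves a 30% recommendation rate to roughly 42%, not to the roughly 50% a naive reading would suggest: smaller than the raw number implies, but still a substantial shift.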

The process

Before submission: We provided a submission template for authors that suppressed author names. We also instructed authors on how to refer to prior work or other citations that might leak author identity - in brief, they were asked to treat these as any third-party reference. We also asked authors to declare conflicts with PC members.
After submission/before reviews: We recognized that authors might not be used to submitting articles in double blind mode, and so we went over each submission after it was submitted and before we opened up bidding to PC members to make sure that the submissions were properly blinded. In a few cases (fewer than 10 of the 49 submissions) we had to ask authors to make modifications.
During review process: The next step was to handle requests for subreviewers. Since PC members could not determine CoIs (conflicts of interest) on their own, all such requests were processed through the PC chairs. A PC member would give us a list of names and we would pick one who had no conflict with the paper. (So this was more private information retrieval than a zero knowledge protocol!)
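The chair-mediated subreviewer step can be sketched as a single function run by the PC chair. This is an illustration under assumptions: the name `pick_subreviewer`, the "first non-conflicted name in the PC member's order" rule, and representing a paper's conflicts as a set of names are all hypothetical, not the exact procedure we followed.

```python
from __future__ import annotations

from collections.abc import Iterable


def pick_subreviewer(
    candidates: Iterable[str],
    paper_conflicts: set[str],
    already_reviewing: set[str] | None = None,
) -> str | None:
    """Run by the PC chair: return the first suggested name (in the PC
    member's order of preference) that has no declared conflict with the
    paper and is not already reviewing it, or None if no one qualifies.

    Only the chosen name goes back to the PC member, so the paper's
    conflict list (which could hint at its authors) stays with the chair.
    """
    excluded = paper_conflicts | (already_reviewing or set())
    return next((name for name in candidates if name not in excluded), None)


if __name__ == "__main__":
    # The PC member suggests three names; the chair knows Ann is conflicted.
    print(pick_subreviewer(["Ann", "Ben", "Cat"], {"Ann"}))
```

The design point is the information flow: the PC member learns only which name was chosen, not which names were skipped or why.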

Issues

A number of issues came up that appear to be unique to the theory conference review process. We document them here along with suggested mitigations.
  1. Managing the CoI process: In theoryCS conferences, subreviewing happens outside the umbrella of the PC. PC members can request any number of subreviewers for a paper, and this happens after the papers are submitted. In contrast, in other venues, subreviewers essentially function as members of the PC: they are invited to be reviewers ahead of time, and are listed when authors declare conflicts of interest. This means that under the process we used, PC members cannot determine for themselves whether a subreviewer has a CoI with a paper, whereas in the alternate process, this is taken care of automatically. One possible mitigation is to ask PC members to list potential subreviewers ahead of time and have them registered in the system so that authors can declare conflicts with them. While this might generate a long list of subreviewers for authors to go through, this is customarily handled by a) allowing authors to declare conflicts by affiliation (domain name) and then b) presenting them with a filtered set of reviewers to mark conflicts with. Domain-based filtering is probably the most effective method for handling conflicts based on current or recent affiliation: it allows reviewers to be added after the fact, and systems like Microsoft’s CMT support it.
  2. The difficulty of hiding identity based on prior work: In experimental work, a group will often write a series of papers that build on infrastructure they’ve developed. The relative difficulty of building such infrastructure also means that groups become “known” for working in certain areas. This made it a little difficult for authors to honestly blind their identity, because their papers clearly built on software that wasn’t publicly available and therefore had to come from their own group. Simply blinding that reference does not always work, because then it is hard to evaluate the quality of the work described in the paper. We note that this problem also occurs in other, more experimental parts of CS. The typical solution is to continue with the blinding effort in any case, and to make an effort to release code publicly, so that anyone could plausibly have built on it. In our view, this is a less significant problem than the first point. CHI and CSCW (both ACM conferences) have published guidelines that can help here.
  3. Is the paper provably double blind? A common complaint about double blind review is that it is not perfect: with some effort, it’s possible to determine some subset of the authors with some degree of certainty. The response that we gave when asked, and that is usually given, is that the goal of double blind review is not to provide a zero knowledge protocol, but to prevent the immediate implicit bias that comes from seeing the author names before reading the paper. We note that this complaint comes up often in the theory community; however, our experience with double blind review in other venues has been that after a while, one gets accustomed to reviewing papers without first attempting to determine the authors, and the process works as intended.

Feedback

We also solicited feedback from the program committee after the review process was complete. We asked them three questions:
  1. What did you like (and what worked) about the double blind review process instituted this year for ALENEX?
  2. What in your opinion did NOT work?
  3. Is there any other feedback you'd like to provide about the double blind review process?
The responses we got for question 1 were uniformly along the lines of “I’m glad that we did it, and felt that papers got a fairer shake than they would have otherwise”. One PC member even said unequivocally that the double blind review process made it more likely that they would submit to ALENEX in the future.
On question 2, PC members brought up the issues we raised above, recommending that we make it clearer to authors how they need to blind their submissions and also mentioning the difficulty of assigning subreviewers.
On question 3, the feedback was uniformly in favor of continuing with double blind review, in spite of the issues raised.

Summary

On balance, our experience with double blind review was positive. While there were logistical issues, many of these can be resolved by the methods we describe above. Further, there is now a wealth of knowledge accumulated in other areas of computer science that we can learn from. SIGPLAN has built a comprehensive FAQ around this issue that answers many of the questions that arise.
We recommend continuing with double blind review for at least the next two years at ALENEX, firstly because this brings us in line with common practice in most other parts of computer science, and secondly because many of the logistical issues people face with DB review will go away with habituation. At that point, the inconvenience of the process diminishes and is outweighed by the benefits in transparency and reduced bias.

