Tuesday, July 28, 2015

The 2nd Workshop on Fairness, Accuracy and Transparency in Machine Learning: A review

I was one of the organizers of the 2nd workshop on Fairness, Accuracy and Transparency in Machine Learning (FATML) at ICML 2015, and in my alternate career as moderator of data mining panels, I moderated the closing panel. The panelists were Fernando Diaz from MSR New York, Sorelle Friedler from Haverford College, Mykola Pechenizkiy from Eindhoven University of Technology and Hanna Wallach from UMass-Amherst and MSR.

While my original intent was to do a review of the panel, it became clear that the panel discussion touched on themes that were bubbling up throughout the day. So what follows is organized by panel questions, but weaves in discussion from outside the panel as well.

This year's workshop, unlike the one at NIPS 2014, had a bit more of a technical focus: we had some of the early researchers in fairness work from Europe and Japan give talks on their work. So as a counterweight, I thought I'd ask the panel to look beyond the presentations for the first question:

Question 1
What is one thing you think we are missing (or should be paying more attention to) in our current discussions of fairness and discrimination in terms of interaction with the social/policy/legal world OUTSIDE CS ?

Some themes that emerged:

...on the difference between Europe and the US

Mykola made an interesting observation based on his experience in educational data mining. Governments around Europe are very concerned about the use of student data (including health records and academic information) for any kind of data driven tools, and have severely regulated use of such data. As a result, the nascent field of educational data mining has been crippled by lack of access to data.

This is almost the opposite of the situation in the US, where data driven policy is the newest buzzword in town, and those of us who are interested in issues of fairness and transparency feel like we're constantly on the outside looking in (though attention to these issues is increasing).

...connecting to other communities

It's been clear from the beginning that discourse on fairness and transparency in machine learning must draw on the corresponding discourses in society at large. Which means that before we can start solving problems, we have to understand what the problems really are. This came through very strongly in the discussions. To paraphrase one of the panelists,  "Computer science likes to solve problems, and that's the problem !" (also known as  "slap a metric on it and optimize").

So what are the different communities we should be connecting to, and how ?

a) Connecting with social science

A major concern is the "prediction vs understanding" problem. For the most part, machine learning is about prediction: you classify, label, cluster, regress, rank and so on. But in the study of society and the dynamics of human interactions, the goal is not just to predict how humans might behave, but to understand their behavior. Which is to say, data analysis (or even fairness-aware analysis) has to be the first step in a larger conversation, rather than the last one.

While I don't think this issue is specific to fairness and transparency, it plays a role in understanding the sources of inequality and discrimination. It's not enough to detect examples of bias: what must happen next is an investigation of why the bias is happening.

(ed: personally, while I understand this concern, I don't think it's necessarily something computer scientists need to prioritize. This is after all what the social sciences do, and it doesn't make sense for us as computer scientists to merely try to acquire those skills. I think we need to be aware of the deeper issues of understanding a domain, but we also have strengths that we bring to the table and I'll say more about that later)

"galaxies don't care how they are studied, but people do"

Another point that was made over and over is that  issues of fairness and bias are not abstract: they affect actual people. Keeping the human in focus is important for the ethical underpinning of what we do, and even how we might design experiments.

b) connecting with journalists

Nick Diakopoulos gave a talk on "algorithmic accountability" in journalism. In addition to talking about what made research on fairness newsworthy:

  • discriminatory/unfair practices
  • mistakes that deny a service
  • censorship
  • activities that break the law or social norms
  • false prediction

he made the strong argument that (government) legitimacy comes from transparency, and talked about what that might entail in the age of data driven policy, including transparency involving data collection, the algorithms used, the inferences generated, and the humans involved in the process.

(ed: I don't think that our demands on transparency should be limited to government entities: the sad fact is that at least in the US, much of what would be considered basic internet infrastructure is controlled by private corporations, and they should be held to similar standards: if not for legitimacy, at least for fairness)

c) connecting with the law

Our fearless leader Solon Barocas made a number of interesting observations on the connection between algorithmic fairness  and the law, all the while disclaiming IANAL :). But his point (which he's made before) is worth repeating. One of the things that computer science can do well is  make precise concepts that might be defined vaguely or only indirectly through case law. And then we can get to work teasing out the relationships between different concepts (both abstractly and computationally). Indeed, the idea of a "reduction" between concepts in fairness might be one of the most useful things that computer science can uniquely contribute.

It's clear we're in a "let a thousand definitions bloom" phase in fairness research. And it's interesting to see the different reactions to this: on the social science side, there appears to be some nervousness that we're "playing games with math", but from Solon's comments this doesn't seem like a bad thing as long as we're also trying to connect the definitions together.

Question 2
In your view, what’s the next most pressing question we should be asking (limited to INSIDE CS to distinguish from the previous question) ?

...better definitions

It was very clear from the discussion that we need broader definitions of F-A-T beyond what's mathematically plausible. One particular example that's reminiscent of the metrics for privacy: There's a notion of "utility": how much can we make the data or the task "fair" without changing the underlying results produced by the "unfair" data/algorithm. The problem is that utility itself is not very well defined. Firstly, you might be benefiting from discriminatory policies, so your perceived "utility" itself is a problem. Trying to maintain this defeats the purpose of fairness. Secondly, even if this is not the case, the framing of the question as a tradeoff implies that these two notions are necessarily in opposition. That shortchanges the moral imperative of fairness and is different from the parallel situation in privacy. Finally, we measure utility in terms of classifier accuracy. But that's a very poor notion of overall task effectiveness. For example, is there a Bayesian perspective to bring to this ?

At any rate, since we are good at understanding tradeoffs in computer science, we should understand the different dimensions of the space of fairness preserving methods, rather than limiting ourselves to a one-dimensional false dichotomy of "fairness vs utility".
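To make the "more than one dimension" point concrete, here is a minimal sketch that scores the same set of predictions along three axes at once: accuracy, the ratio of positive-prediction rates between groups (the quantity behind "disparate impact"), and the gap in true-positive rates. The data is made up for illustration, and the function names and the 0/1 encoding of the protected group are assumptions of this sketch, not anything presented at the workshop.

```python
def rate(values):
    """Fraction of 1s in a list of 0/1 values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def evaluate(y_true, y_pred, group):
    """Return accuracy plus two fairness measures for binary predictions.

    group[i] is 1 for the protected group and 0 otherwise. This encoding is
    an assumption of the sketch; real protected attributes are rarely binary.
    """
    accuracy = rate([int(t == p) for t, p in zip(y_true, y_pred)])

    # How often each group receives the positive prediction.
    pos_protected = rate([p for p, g in zip(y_pred, group) if g == 1])
    pos_other = rate([p for p, g in zip(y_pred, group) if g == 0])
    ratio = pos_protected / pos_other if pos_other > 0 else float("inf")

    # How often truly positive members of each group are predicted positive.
    tpr_protected = rate(
        [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    )
    tpr_other = rate(
        [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    )

    return {
        "accuracy": accuracy,
        "positive_rate_ratio": ratio,
        "tpr_gap": abs(tpr_protected - tpr_other),
    }


# Invented toy data: eight individuals, four in each group.
group = [1, 1, 1, 1, 0, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

print(evaluate(y_true, y_pred, group))
```

On this toy data the predictions look respectable on accuracy while doing poorly on both fairness measures, and the two fairness measures need not move together when the predictions change. That is the sense in which a single "fairness vs utility" axis is too coarse.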

...better usable artifacts

Nick asked us the following question at the panel:

When a CEO or an agency head comes to us and asks "what should we do about this fairness stuff", what do we tell them ?

We didn't have a good response, and that was interesting. While we're beginning to explore the space of what's possible, we don't have clear examples of artifacts to hand over and say "use this".

As usual, the topic of benchmarking came up. I joke that when industry folks bring up the issue of benchmarking, I always ask "so where's the data" and they usually go very silent. But I do think there are useful data sets to be explored that come to us from the government. Some common data sets that get used are the US census data on salaries and a German data set on consumer credit. The entire data set from the Ricci court case is also available (even though it's tiny), and there are Bureau of Justice recidivism data sets to play with.

Of course this goes against the imperative coming from the social scientists to look at specific domains and ask meaningful questions in that domain. And I think we need to look more at the literature on fairness and bias over the decades and extract data that people have studied.

...better problems

For the most part, researchers have been considering binary classification as the suspect task. But of course there are much more general tasks that we could be considering: what about unsupervised learning ? what about structured prediction ? Is there a way to define fairness when you don't have a simple binary response variable and binary attributes ?

One final question I asked was this:

Question 3
do we have to solve the causality problem in order to talk about fairness ? 

This question was possibly not as well-posed as I would have liked, but it led to interesting discussions.

The law deals with intent, because the goal of the law is to assign responsibility. Algorithms are not agents and can't exhibit intent. Causality is a proxy for intent, in that if we can say that something caused something else, we can assign blame in a different way. In fact there were two talks at the workshop that talked about causality directly in the context of fairness.

But causality is a very hard problem. It's extremely subtle (if you doubt this, read through some of the examples Judea Pearl discusses in his book), and extremely controversial: different camps have their own view of how to mechanize causal inference, and the battles there make frequentists and Bayesians look like life-long friends.

In the discussion that followed, it became clear that there were really two ways of thinking about causality as it relates to fairness. The first way is to think about the underlying causal mechanisms by which otherwise innocent features lead to biased outcomes: for example, how zip code might correlate with racial identity. The second way, which is closer to what I had in mind, is to think about the behavior of an algorithm causally: the use of these inputs or this algorithm *caused* a particular decision to be made. This second idea is not as far-fetched as it seems: some work in the database community has looked at trying to find which tuples "caused" a certain output to be generated from a query.
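One simple reading of "which tuples caused this output" is counterfactual: a tuple is a cause if deleting it alone changes the answer to the query. Here is a minimal sketch of that reading, assuming a hypothetical table and query invented for illustration. The database work is richer than this (for instance, it also considers tuples that only matter once other tuples are removed), and this sketch does not attempt that.

```python
def counterfactual_causes(rows, query):
    """Return the rows whose individual removal changes query(rows)."""
    baseline = query(rows)
    causes = []
    for i, row in enumerate(rows):
        without_row = rows[:i] + rows[i + 1:]
        if query(without_row) != baseline:
            causes.append(row)
    return causes


# Hypothetical table of loan applications: (applicant, zip_code, approved).
applications = [
    ("a", "zip_A", True),
    ("b", "zip_A", False),
    ("c", "zip_B", False),
    ("d", "zip_B", False),
]


def some_approval_in_zip_a(rows):
    """Boolean query: was anyone in zip_A approved?"""
    return any(zip_code == "zip_A" and ok for _, zip_code, ok in rows)


causes = counterfactual_causes(applications, some_approval_in_zip_a)
print(causes)
```

Here only applicant "a" is a counterfactual cause: removing any other row leaves the answer unchanged.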

If you think it's not important to understand causality as it comes to automated methods, you might not want to drive a self-driving car or fly a plane. But as Solon suggested in the discussion, one way of getting around causality is to think about negligence with respect to algorithms: can we design reasonable best practices for predictive tools and argue that a failure to use these methods is negligence ? The legal ramifications of this idea have been explored in the context of robotics (article, and response) but more work is yet to be done.

...back to narratives

Another comment by Nick D, again connecting to the journalism perspective: narratives and story telling are a powerful way to explain the results of data mining. I haven't talked much about interpretability, which is an important part of the larger discussion of transparency and accountability. But one way to communicate the results of (say) a fairness audit would be to provide a human-interpretable linkage between the problematic attributes being used for prediction and the protected attribute. For more on this, see Michael Nielsen's very timely new Quanta article on machine-generated explanations.
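As a rough illustration of what a "linkage" between a predictive attribute and a protected attribute could look like in numbers, here is a minimal sketch that asks how much better one can guess the protected attribute after seeing a single feature. The function name and the toy data are assumptions made up for this example, and the comparison is done in-sample, so it is a crude diagnostic rather than a proper audit.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, protected_values):
    """Compare guessing the protected attribute with and without a feature.

    Returns (baseline, with_feature): the accuracy of always guessing the
    most common protected value overall, and the accuracy of guessing the
    most common protected value within each feature value.
    """
    n = len(protected_values)
    baseline = Counter(protected_values).most_common(1)[0][1] / n

    # Count protected values separately for each value of the feature.
    by_feature = defaultdict(Counter)
    for f, p in zip(feature_values, protected_values):
        by_feature[f][p] += 1

    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return baseline, correct / n


# Invented toy data: a zip-code-like feature and a protected attribute.
zip_codes = ["A", "A", "A", "A", "B", "B", "B", "B"]
groups = ["x", "x", "x", "y", "y", "y", "y", "x"]

print(proxy_strength(zip_codes, groups))
```

A large jump from the first number to the second suggests the feature is acting as a proxy for the protected attribute, which is the kind of statement that can be turned into a sentence a non-specialist can follow.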

It's clear from all the discussion that there's a lot of work to be done and a small but active community of people interested in pushing these issues forward. Fairness, and algorithmic bias, are hot topics in the news nowadays, and it's a good time to take advantage of this burst of interest.

Friday, July 24, 2015

Racism/sexism in algorithms

For my sins, or more specifically because of my interest in fairness, I was asked to be on a panel discussing algorithmic racism/bias on The Stream, a web/tv show on Al Jazeera English (as opposed to Al Jazeera America, in case you're not already confused).

I've never done live TV before, so it was quite nerve-wracking. While my "role" was to be the tech explainer, I was worried about saying something I wouldn't be able to rephrase or take back, and I was worried about the 'shouting talking heads' nature of TV discussions.

I'm glad to say that none of this transpired. We had a very genteel discussion on the merits of the issue, and there was a lot of informed commentary on the topic. Personally I thought that the discussion leaned more heavily on dealing with bias via diversity in hiring rather than on structural bias in algorithm design, but it's a very important part of the bigger picture of fairness, and it's definitely a more accessible discussion than the arcana of algorithmic disparate impact (something that I hope to change:)).

Here's the show in its entirety. And irony of ironies, I went to twitter later to change my user name back to my full name, and Twitter wouldn't let me fill in my whole last name !

Saturday, July 04, 2015

On the different stages of learning and teaching (algorithms)

Descending a rabbit hole of links prompted by a MeFi discussion (thanks, +David Eppstein) of Steven Pinker's essay on the curse of knowledge (thanks, +Jeff Erickson), I came across an article by Alistair Cockburn on a learning framework inspired by aikido called 'Shu-Ha-Ri'.

In brief,

  • In the Shu stage, you're a beginning learner trying to find one way to solve a problem. It doesn't matter that there might be multiple ways. The goal is to learn one path, and learn it well. 
  • In the Ha stage, you understand one way well enough to realize its limits, and are ready to encounter many different strategies for reaching your goal. You might even begin to understand the pros and cons of these different approaches. In effect, you have detached from commitment to a single approach. 
  • In the Ri stage, you have "transcended" the individual strategies. You might use one, or another, or mix and match as needed. You'll create new paths as you need them, and move fluidly through the space of possibilities.

Reading through this article while I ponder (yet again) my graduate algorithms class for the fall, I realize that this three-stage development process maps quite well to what we expect from undergraduates, masters students and Ph.D. students learning about an area.

The undergraduate is learning a tool for the first time (recurrence analysis say) and if they can understand the master theorem and apply it, that's pretty good. 

At the next level, they realize the limitations of the master theorem, and might learn about the Akra-Bazzi method, or annihilators, or even some probabilistic recurrence methods. 
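To see both the "one path" and its limits, here is a minimal sketch of the master theorem restricted to recurrences of the form T(n) = a T(n/b) + Theta(n^d). The function name and the output format are choices made for this example. The restriction to a polynomial driving term is deliberate: it is the version most students first learn, and it is exactly what stops working on a recurrence like T(n) = T(n/3) + T(2n/3) + n, where the subproblems have unequal sizes and something like Akra-Bazzi is needed.

```python
import math


def master_theorem(a, b, d):
    """Solve T(n) = a*T(n/b) + Theta(n^d) for a >= 1, b > 1, d >= 0.

    Compares d with the critical exponent log_b(a) and returns the
    asymptotic bound as a string.
    """
    if a < 1 or b <= 1 or d < 0:
        raise ValueError("need a >= 1, b > 1 and d >= 0")

    critical = math.log(a) / math.log(b)

    if math.isclose(d, critical, rel_tol=1e-9, abs_tol=1e-9):
        # Work is balanced across all levels of the recursion tree.
        return f"Theta(n^{d:g} log n)"
    if d < critical:
        # The leaves of the recursion tree dominate.
        return f"Theta(n^{critical:g})"
    # The root of the recursion tree dominates.
    return f"Theta(n^{d:g})"


print(master_theorem(2, 2, 1))  # merge sort
print(master_theorem(3, 2, 1))  # Karatsuba multiplication
print(master_theorem(2, 2, 2))  # quadratic work at the root
```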

Of course, once you're dealing with some thorny recurrence for the analysis in your next SODA submission, then the standard templates are helpful, but you'll often have to do something creative and nontrivial to wrestle the analysis into a form where it makes sense. 

Pick your own topic if you don't like recurrences. 

Which also explains why it's hard to explain how to prove things. Beginning students expect a standard formula (which is why induction and proof by contradiction get taught over and over). But once you go beyond this, there aren't really good templates. In effect, there's no good second level with a set of proof techniques that you can throw at most problems, which explains why students taking a grad algorithms class tend to struggle with exactly this step. 
