Saturday, December 12, 2015

White Elephant parties and fair division

It's holiday party season, which means that it's time for white elephant parties. For those who don't know what a white elephant party is, here's a description from Wikipedia:
Each participant supplies one wrapped gift. The gifts are placed in a central location, and participants determine in which order they will take turns selecting them. The first person opens a wrapped gift, and the turn ends. On subsequent turns, each person can open a new present or gets the choice to "steal" another person's gift. The gift cannot be stolen once the third participant touches the gift (i.e. - it is stolen for the 2nd time). When a person's gift is stolen, that person can either choose another wrapped gift to open or can steal from another player. The game is over when the last person goes and the first person goes again
I just finished my algorithms class with the "traditional" (n=2) discussion of fair division algorithms (aka cake cutting, but with actual cake). White elephant parties are also a form of fair division with indivisible goods.

A few years ago, I started wondering about the theory behind white elephant exchanges and posted a question on cstheory. The answer I got (from usul) was excellent (tl;dr: worst-case definitions of fairness don't work). But it's also a few years old, and I wonder if there are new results on this topic.

Tuesday, December 01, 2015

Fairness and The Good Wife

The Good Wife is a long-running legal TV show with a Chicago politics backdrop. It's one of the most popular shows on network TV, and so it was particularly fascinating to see an episode devoted in part to issues of algorithmic fairness.

It's helpful to know that the show frequently features a Google-like company called ChumHum. In this episode, they're being sued because of a feature in their maps application called "Safe Filter" that marks regions of a city (Chicago) as safe or not safe. A restaurant owner claims that she went out of business because Chummy Maps (their horribly named maps application) marked the region around her restaurant as unsafe.

The writers of this episode must have been following some of the news coverage of fairness in algorithms over the past year: a number of news items were referenced in passing. What follows is an annotated list of references.

  • "Math is not racist". This is what the character playing the COO of Chumhum says when he's first deposed, arguing that the Safe Filter uses only algorithms to determine which regions are safe. This is reminiscent of the overly uncritical article that NPR did about the use of machine learning in hiring: the CEO of Jobaline (one of the companies doing this) happily proclaimed that "math is blind"
  • "Tortious intent": Disparate impact is one vehicle that might be used to determine bias-by-algorithm. But in the episode, the lawyers argue a stronger claim, that of tortious intent, which is interesting because they then have to show deliberate racial bias. 
  • "Objective third party statistics and user generated content": The initial line of defense from Chumhum is that they use third party statistics like crime rates. The lawyers immediately point out that this could introduce bias itself. They also say they use user-generate content as a defense ("We're not racist: our users are"). This is then rebutted by the lawyers pointing out that the users of the maps app skew heavily Caucasian (bringing up another good point about how bias in training data can leech into the results)
  • "Full discovery": Chumhum wanted to hide behind its algorithm: the opposing lawyers made a successful argument for discovery of the algorithm. I doubt this could ever happen in real life, what with trade secrets and all. More on this later. 
  • "Home ownership rates as proxy for race": One subplot involved determining whether home-ownership rates were being used in the Safe Filter. The characters immediately realized that this could be a  proxy for race and could indicate bias. 
  • "The animal incident": This was a direct reference to the image-tagging fiasco of a few months ago when Google's photo app started labelling pictures of African-Americans as 'gorillas'. While at first this is a throw-away incident (including a line "Even google did it!"), it comes back later to haunt the company when a lawyer looks at the code (ha!) and discovers a patch that merely removes the 'animal' tag (instead of fixing the underlying problem). This appears to also be what Google did to "solve" its problem. 
  • "Differential ad placement": A hat tip to the work by Latanya Sweeney and the CMU team, another plot point turned on the discovery that ads in the maps application were targeting the white lawyer with ads for skiing and the black lawyer with ads for soul food.  This in and of itself was not a problem for the case, but it led to a much more fascinating argument: that Chumhum was drawing on user profile data from all its properties (search, email etc) to target ads, and so discovery could not be limited solely to maps-related data and code. This is in general the problem with asking for code to do an audit: if you don't know where the training data is coming from, the code is basically useless. Remember, an algorithm isn't always just a piece of code :)
  • "Bad training data/non-diverse workforce": One of the employee characters made the argument that the bad image tagging results were the result of "bad training data", which is an accurate statement and is part of the fairness concerns with algorithms. The lawyer also made the point that a less-homogenous workplace might have helped as well (which brings to mind the Al Jazeera panel I participated on a few months ago)
  • "IMPLICIT BIAS": I was happy when this phrase was used correctly to argue for how even "non-racist" people can help perpetuate a racist system. I would have been happier if someone had said "Disparate impact" though :). 
If you're wondering, the final resolution of the case did NOT turn on a determination of bias or not. It turned out that the restaurant had been losing money before the filter was even put into place. But it was interesting to see an example (albeit on TV) of how a court case on this might pan out. A lot of the side show involved trying to claim that programmers on the Maps app were racist (or had racist inclinations) to argue for why the code might be biased as well. 

Sunday, November 01, 2015

Rock bands with CS names.

You all know (or should know) about Positive Eigenvalues, the band formed by Michael Jordan that has included (among others) Christos Papadimitriou. But I'm not talking about that kind of band.

I'm looking for bands with names that have a CS connection (accidentally or otherwise). I was playing a Spotify playlist called (coincidentally) Deep Focus and the first band on the list was called Random Forest.

So here's my list so far:

Any more ? 

Saturday, October 31, 2015

Data and Civll Rights II

I just got back from the one-day Data and Civil Rights conference organized by Data and Society. As I mentioned in my previous post, the conference operated under Chatham House Rules, which means I can't really reveal any of the specific discussions that went on in the main sessions or the breakout groups.

This was a conference full of civil rights activists, lawyers, policy types, and even police folk. It was a conference where people could get up and proclaim that they hate numbers, to cheers from the audience. Feeling like the odd one out is unfamiliar to me.

But it was full of passion and fire. And very depressing if you think of data analysis and ML as a GOOD THING. Because in this context, it is at best a blunt weapon that is being wielded carelessly and violently.

We've had a good run designing algorithms that tell people what to buy and what to watch. But when these same algorithms start deciding whether people can live their lives on their own terms, then as the cool kids are wont to say:


Friday, October 30, 2015

Eschew obfuscation: write clearly

There's an article in the Atlantic about the "needless complexity of academic writing". Apart from learning that there's a Plain Writing Act (who says Congress is gridlocked!), I wasn't too surprised by the points made. Yes, academic writing can be turgid and yes, part of this is because we want to "impress the reviewers", and no academics can't be coerced into changing the way they do things - at least not easily.

Steven Pinker has proposed an alternate theory of why academic writing is so jargon-heavy. Paraphrasing from the Atlantic article:
Translation: Experts find it really hard to be simple and straightforward when writing about their expertise. He calls this the “curse of knowledge” and says academics aren’t aware they’re doing it or properly trained to identify their blindspots—when they know too much and struggle to ascertain what others don’t know. In other words, sometimes it’s simply more intellectually challenging to write clearly.
For me, blogging has always been a way out of this blind spot. First of all, I can be more conversational and less stilted. Secondly, even if I'm writing for a technical audience, I'm forced to pare down the jargon or go crazy trying to render it.

But I wonder how hard it really is for experts to write clearly about their work. I wonder this because these same experts who write prose that you can clobber an elephant with are remarkably colorful and vivid when describing their work in person, on a board, or at a conference (though not at a talk itself: that's another story).

While it's common to assume that the obfuscation is intentional (STOC papers need to be hard!), I think it's more a function of deadline-driven writing and last-minute proof (or experiment) wrangling.

I'm thinking about this because I'm planning to run a seminar next semester that I'm calling 'Reading with Purpose'. More on that in a bit...

Monday, October 26, 2015

Data and Civil Rights

I'm in DC right now for a one-day conference on Data and Civil Rights, run by the Data and Society Institute.

This is an annual event (this is the second such conference). Last year's conference was themed "Why Big Data is a civil rights issue", and this year's conference focuses on the very hot-button topic of big data and criminal justice.

Needless to say, issues of fairness and discrimination are front and center in an area like this, and so I'm hoping to learn a lot about the state of play (and maybe contribute as well).

This is more of a working meeting than a traditional conference: all material is private during the conference and we're expected not to talk about the discussions outside the event (a la Chatham House rules). Digested material from the different working groups will be posted in November.


JHU Workshop on Sublinear Algorithms

The latest in a long line of workshops on sublinear algorithms (streaming! sketching ! property testing ! all of the above !) will be held at JHU this year just before SODA 2016. The message from the organizers is below: do consider attending if you're planning to attend SODA. (Disclaimer: I'm giving one of the 20+ talks, but I will not promise that it's excellent). 

Dear colleagues,

We are organizing a Sublinear Algorithms workshop that will take place at Johns Hopkins University, January 7-9, 2016. The workshop will bring together researchers interested in sublinear algorithms, including sublinear-time algorithms (e.g., property testing and distribution testing), sublinear-space algorithms (e.g., sketching and streaming) and sublinear measurements (e.g., sparse recovery and compressive sensing).

The workshop will be held right before SODA’16, which starts on January 10 in Arlington, VA (about 50 miles from JHU).

Participation in this workshop is open to all, with free registration. In addition to 20+ excellent invited talks, the program will include short contributed talks by graduating students and postdocs, as well as a poster session. To participate in the contributed talk session and/or the poster session, apply by December 1.

For further details and registration, please visit
http://www.cs.jhu.edu/~vova/sublinear2016/main.html .



Best,
Vladimir Braverman, Johns Hopkins University
Piotr Indyk, MIT
Robert Krauthgamer, Weizmann Institute of Science
Sofya Raskhodnikova, Pennsylvania State University

Friday, October 02, 2015

An algorithm isn't "just code"

I've been talking to many people about algorithmic fairness of late, and I've realized that at the core of pushback against algorithmic bias ("algorithms are just math! If the code is biased, just look at it and you can fix it !") is a deep misunderstanding of the nature of learning algorithms, and how they differ fundamentally from the traditional idea of an algorithm as "a finite set of well-defined elementary instructions that take an input and produce an output".

This misunderstanding is crucial, because it prevents people from realizing why algorithmic fairness is actually a real problem. And that prompted me to write a longer note that takes the "algorithm == recipe" analogy and turn it on its head to capture how machine learning algorithms work.


Disqus for The Geomblog