## Friday, September 29, 2017

### "X is a social construct" and the perils of mining behavior.

After the infamous Google memo (and frankly for much longer if you work in algorithmic fairness), the idea of something being a "social construct" has popped up again, and I will admit that I've struggled to understand what that means (damn you, focused engineering education!).

Ta-Nehisi Coates' article about race is a short and excellent read. But I also want to highlight something much closer to home. BYU Radio's Julie Rose did an interview with Jacqueline Chen (at the U) on her recent work on perceptions of race in the US vs Brazil.

The interview is here (it's short, starting at around 20 minutes in), and in it Prof. Chen masterfully lays out how race is perceived and how that perception changes with context. The interview is based on a recently published paper.

One important takeaway: the way in which one's racial identity is perceived varies greatly between the US (where it appears to be influenced by parental information) and Brazil (where skin color appears to be the dominant factor). More importantly, whether race is seen as immutable or changeable, and as a categorical attribute or a continuous one, also varies.

And that's what we mean by saying that X (here, race) is a social construct. It's not saying that it's fictitious or less tangible, but that it's defined by the way we talk about it in society.

Why is this important? When we collect data as a way to predict behavior, we're making an implicit claim that behavior can be predicted (and explained) by intrinsic and often immutable descriptors of an individual. We use (or don't use) "race" as a feature when building models.

But this itself is a huge assumption! It assumes that we can intelligently ascribe features to individuals that capture these notions, and that they are defined solely by the individual and not by context. The brilliant Medium article about the paper that claimed to predict criminality from facial features makes this point very well.

But do we capture the entire history of environmental factors that make up the story of an individual? Of course not. We essentialize an individual into a collection of features that we decide captures all their relevant traits for the purpose of prediction, and then we build a model that rests on this extremely problematic idea.

Much of the work I do on fairness can be reduced to "check your data, and check your algorithm". What we're also thinking about (and what speaks directly to this issue) is "check your features".

It turns out that way back in 1921, Walter Lippmann had something interesting to say about all of this. From a longer essay that he wrote on the importance of frames as mediating how we perceive the world (and it says something about fake news and "true facts" as well):

> And so before we involve ourselves in the jungle of obscurities about the innate differences of men, we shall do well to fix our attention upon the extraordinary differences in what men know of the world. I do not doubt that there are important biological differences. Since man is an animal it would be strange if there were not. But as rational beings it is worse than shallow to generalize at all about comparative behavior until there is a measurable similarity between the environments to which behavior is a response.