Saturday, February 02, 2019

More FAT* blogging

Session 3: Representation and Profiling

Session 4: Fairness methods. 

Monday, January 28, 2019

FAT* Session 2: Systems and Measurement.

Building systems that have fairness properties and monitoring systems that do A/B testing on us.

Session 2 of FAT*: my opinionated summary.

Sunday, January 27, 2019

FAT* blogging

I'll be blogging about each session of papers from the FAT* Conference. So as not to clutter your feed, the posts will be housed at the fairness blog that I co-write along with Sorelle Friedler and Carlos Scheidegger.

The first post is on Session 1: Framing and Abstraction.

Thursday, December 20, 2018

The theoryCS blog aggregator REBORN

(will all those absent today please email me)

(if you can't hear me in the back, raise your hand)

The theoryCS blog aggregator is back up and running at its new location -- cstheory-feed.org -- which of course you can't know unless you're subscribed to the new feed, which....

More seriously, we've announced this on the cstheory twitter feed as well, so feel free to repost this and spread the word, so that all the theorists living in caves plotting their ICML, COLT and ICALP submissions hear the news. 

Who's this royal "we"? Arnab Bhattacharyya and I (well, mostly Arnab :)). 

For anyone interested in the arcana of how the sausage (SoCG?) gets made, read on: 

Arvind Narayanan had set up an aggregator based on the Planet Venus software for feed aggregation (itself based on python packages for parsing feeds). The two-step process for publishing the aggregator works as follows:
  1. Run the software to generate the list of feed items and associated pages from a configuration file containing the list of blogs
  2. Push all the generated content to the hosting server. 
Right now, both Arnab and I have git access to the software and config files and can edit the config to update blogs etc. The generator is run once an hour and the results are pushed to the new server. 
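To make step 1 concrete: Planet Venus reads a single ini-style configuration file with one global section and one section per subscribed feed. A minimal sketch looks like this (all names, URLs, and paths here are hypothetical placeholders, not the actual cstheory config):

```ini
# Sketch of a Planet Venus config file (hypothetical values).
[Planet]
name = Theory of Computing Blog Aggregator
link = http://cstheory-feed.org/
owner_name = The maintainers
output_dir = output
items_per_page = 60

# Each subscribed blog gets its own section, keyed by its feed URL.
[http://blog.example.com/feed.xml]
name = An Example Theory Blog
```

Step 2 then amounts to copying the generated output directory to the hosting server (e.g., with rsync from an hourly cron job), so adding a blog is just a one-line edit to the config followed by the next scheduled run.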

So if you have updates or additions, either of us can make the changes and they should be reflected fairly soon on the main page. The easiest way to verify this is to wait a few hours, reload the page and see if your changes have appeared. 

The code is run off a server that Arnab controls and both of us have access to the domain registry. I say this in the interest of transparency (PLUG!!) but also so that if things go wonky as they did earlier, the community knows who to reach. 

Separately, I've been pleasantly surprised at the level of concern and anxiety over the feed -- mainly because it shows what a valuable community resource the feed is, and I'm glad to be one of its curators. 

If you've read this far, then you really are interested in the nitty gritty, and so if you'd like to volunteer to help out, let us know. It would be useful, for example, to have a volunteer in Europe so that we have different time zones covered when things break. And maybe our central Politburo (err. I mean the committee to advance TCS) might also have some thoughts, especially in regard to their mission item #3:
To promote TCS to and increase dialog with other research communities, including facilitating and coordinating the development of materials that educate the general scientific community and general public about TCS.

Thursday, December 06, 2018

The theoryCS aggregator

As you all may know, the cstheory blog aggregator is currently down. Many people have been wondering what's going on and when it will be back up, so here's a short summary.

The aggregator has thus far been maintained by Arvind Narayanan, who deserves a HUGE thanks for setting it up, writing lots of custom code, and running the linked twitter account. Arvind had been planning to hand it over, and the domain going down was a good motivator for him to do that.

Currently I have all the code that is used to generate the feed, as well as control over the twitter feed. Arnab Bhattacharyya has kindly volunteered to be the co-manager of the aggregator. What remains to be done now is

  • set up a new location to run the aggregator code from
  • set up hosting for the website
  • link this to the twitter account. 
None of these seem too difficult, and the main bottleneck is merely having Arnab and me put together a few hours of work to get this all organized (we have a domain registered already). We hope to have it done fairly soon so you can all get back to reading papers and blogs again. 

Saturday, November 24, 2018

Should credit scores be used for determining residency?

It's both exhilarating and frustrating when you see the warnings in papers you write play out in practice. Case in point, the proposal by DHS to use credit scores to ascertain whether someone should be granted legal residence.

Josh Lauer at Slate does a nice analysis of the proposal and I'll extract some relevant bits for commentary. First up: what does the proposal call for? (emphasis mine)
The new rule, contained in a proposal signed by DHS Secretary Kirstjen Nielsen, is designed to help immigration officers identify applicants likely to become a “public charge”—that is, a person primarily dependent on government assistance for food, housing, or medical care. According to the proposal, credit scores and other financial records (including credit reports, the comprehensive individual files from which credit scores are generated) would be reviewed to predict an applicant’s chances of “self-sufficiency.”
So what's the problem with this? What we're seeing is an example of the portability trap (from our upcoming FAT* paper). Specifically, scores designed in a different context (for deciding who to give loans to) are being used in this context (to determine self-sufficiency). Why is this a problem?
Unfortunately, this is not what traditional credit scores measure. They are specialized algorithms designed for one purpose: to predict future bill-paying delinquencies, for any reason. This includes late payments or defaults caused by insurmountable medical debts, job loss, and divorce—three leading causes of personal bankruptcy—as well as overspending and poor money management.
That is, the reason the portability trap is a problem is that you're using one predictor to train another system. And if you're trying to make any estimations about the validity of the resulting process, then you have to know whether the thing you're observing (in this case the credit score) has any relation to the thing you're trying to observe (the construct of "self-sufficiency"). And this is something we harp on a lot in our paper on axiomatic considerations of fairness (and ML in general).

And in this case there's a clear disconnect:
Credit scores do not predict whether an individual will become a public charge. And they do not predict financial self-sufficiency. They are only useful in this context if one believes credit scores reveal something about a person’s character. In other words, if one believes that people with low credit scores are moochers and malingerers. Given the Trump administration’s hostility toward (brown-skinned) immigrants, this conflation of credit scores and morality is not surprising.
And this is a core defining principle of our work: that beliefs about the world control how we choose our representations and learning procedures: the procedures cannot be justified except in the context of the beliefs that underpin them. 

I think that if you read anything I've written, it will be clear where I stand on the normative question of whether this is a good idea (tl;dr: NOT). But as a researcher, it's important to lay out a principled reason for why, and this sadly merely confirms that our work is on the right track.


Friday, November 02, 2018

What do I work on ?

So, what do you work on? 

As questions go, this is one of the most rudimentary. It's the conference equivalent of "Nice weather we're having", or "How about them Broncos!". It's a throat-clearer, designed to start a conversation in an easy non-controversial way. 

And yet I'm always having to calculate and calibrate my answers. There's a visible pause, a hesitation as I quickly look through my internal catalog of problems and decide which one I'll pull out. On the outside, the hesitation seems strange: as if I don't quite know what I work on, or if I don't know how to explain it. 

It's an occupational hazard that comes from living on the edge of many different areas. I go to data mining conferences, machine learning conferences, theory/geometry conferences, and (now) conferences on ethics, society and algorithms. And in each place I have a different circle of people I know, and a different answer to the question

So, what do you work on?  

It makes me uncomfortable, even though it shouldn't. I feel like I can only share a part of my research identity because otherwise my answer will make no sense or (worse!) seem like I'm trying to impress people with incomprehensible words. 

I don't doubt that most people share some form of this feeling. As researchers, none of us are one-dimensional, and most of us work on many different problems at a time. Probably the easiest answer to the question is the problem that one has most recently worked on. But I sense that my case is a little unusual: not the breadth per se, but the range of topics (and styles of problem solving) that I dabble in. 

So, what do you work on? 

I often joke that my research area is a random walk through computer science and beyond. I started off in geometry, dabbled with GPUs (alas, before they were popular), found my way into information theory and geometry (and some differential geometry), slipped down the rabbit hole into data mining, machine learning, and a brief side foray into deep learning, and then built a nice little cottage in algorithmic fairness, where I spend more time talking to social scientists and lawyers than computer scientists.

Being an academic nomad has its virtues: I don't really get bored with my work. But it also feels like I'm always starting from square one with my learning and that there are always people who know way more about every topic than I do. And my academic roamings seem to mirror my actual nomadic status. I'm a foreigner in a land that gets stranger and less familiar by the day, and the longest time I've spent in any location is the place I'm in right now.
So, what do you work on? 

Maybe, in a way that's so American, "What do you work on" is really a question of "Who are you" in the way we bind together our work and our identity. When my students come and ask me what they should work on, what they're really asking me is to tell them what their research identity is, and my answer usually is, "whatever you want it to be right now". It's a frustrating answer no doubt, but I feel that it lowers the import of the question to a manageable level. 

So, what DO you work on?

I do algorithmic fairness, and think about the ethics of automated decision-making. I bring an algorithmic (and geometric) sensibility to these questions. I'm an amateur computational philosopher, a bias detective, an ML-translator for lawyers and policy folk, and my heart still sings when I see a beautiful lemma. 

