Thursday, December 06, 2018

The theoryCS aggregator

As you all might know, the cstheory blog aggregator is currently down. Many people have been wondering what's going on and when it will be back up, so here's a short summary.

The aggregator has thus far been maintained by Arvind Narayanan, who deserves a HUGE thanks for setting up the aggregator, writing lots of custom code, and running the linked twitter account. Arvind has been planning to hand it over, and the domain going down was a good motivator for him to do that.

Currently I have all the code that is used to generate the feed, as well as control over the twitter feed. Arnab Bhattacharyya has kindly volunteered to be the co-manager of the aggregator. What remains to be done now is to

  • set up a new location to run the aggregator code from
  • set up hosting for the website
  • link this to the twitter account. 
None of these seem too difficult, and the main bottleneck is merely having Arnab and me put together a few hours of work to get this all organized (we have a domain registered already). We hope to have it done fairly soon so you can all get back to reading papers and blogs again.
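For the curious, here's a minimal sketch of the kind of feed-merging a planet-style aggregator does: pull a list of blog feeds, merge the entries by date, and emit a combined listing. This is not the actual aggregator code, and the feed URLs are just illustrative stand-ins for the curated list of theory blogs.

```python
# Minimal, hypothetical sketch of a planet-style aggregator.
# NOT the actual theoryCS aggregator code; feed URLs are placeholders.
import time
import feedparser  # pip install feedparser

FEEDS = [
    "https://blog.computationalcomplexity.org/feeds/posts/default",
    "https://rjlipton.wordpress.com/feed/",
]

entries = []
for url in FEEDS:
    parsed = feedparser.parse(url)
    blog = parsed.feed.get("title", url)
    for e in parsed.entries:
        entries.append({
            "blog": blog,
            "title": e.get("title", "(untitled)"),
            "link": e.get("link", ""),
            # struct_time if the feed provides a date, else the epoch
            "published": e.get("published_parsed") or time.gmtime(0),
        })

# Newest entries first, across all blogs.
entries.sort(key=lambda e: e["published"], reverse=True)

for e in entries[:20]:
    print(f'{e["blog"]}: {e["title"]}\n  {e["link"]}')
```

The real thing also renders an HTML page and pushes new items to the twitter account, but the core loop is just this kind of fetch-merge-sort.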

Saturday, November 24, 2018

Should credit scores be used for determining residency?

It's both exhilarating and frustrating when you see the warnings in papers you write play out in practice. Case in point: the proposal by DHS to use credit scores to ascertain whether someone should be granted legal residence.

Josh Lauer at Slate does a nice analysis of the proposal and I'll extract some relevant bits for commentary. First up: what does the proposal call for? (emphasis mine)
The new rule, contained in a proposal signed by DHS Secretary Kirstjen Nielsen, is designed to help immigration officers identify applicants likely to become a “public charge”—that is, a person primarily dependent on government assistance for food, housing, or medical care. According to the proposal, credit scores and other financial records (including credit reports, the comprehensive individual files from which credit scores are generated) would be reviewed to predict an applicant’s chances of “self-sufficiency.”
So what's the problem with this? What we're seeing is an example of the portability trap (from our upcoming FAT* paper). Specifically, scores designed in a different context (for deciding who to give loans to) are being used in this context (to determine self-sufficiency). Why is this a problem?
Unfortunately, this is not what traditional credit scores measure. They are specialized algorithms designed for one purpose: to predict future bill-paying delinquencies, for any reason. This includes late payments or defaults caused by insurmountable medical debts, job loss, and divorce—three leading causes of personal bankruptcy—as well as overspending and poor money management.
That is, the portability trap is a problem because you're taking a predictor built for one purpose and plugging it into a different decision-making process. And if you're trying to make any claims about the validity of the resulting process, then you have to know whether the thing you're observing (in this case the credit score) has any relation to the thing you're trying to measure (the construct of "self-sufficiency"). And this is something we harp on a lot in our paper on axiomatic considerations of fairness (and of ML in general).
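To make this concrete, here is a purely synthetic sketch (invented data and numbers, not from our paper and not a real credit model) of how a score fit to one target can look good on that target while saying much less about a different construct:

```python
# Purely synthetic illustration of the portability trap (invented data):
# a score trained to predict delinquency can be accurate on that target
# yet much less informative about a different construct like needing
# public assistance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10000
stability = rng.normal(size=n)   # latent financial stability
shocks = rng.normal(size=n)      # medical debt, job loss, divorce, ...

# In this toy world, delinquency is driven mostly by shocks,
# while needing assistance is driven mostly by (lack of) stability.
delinquent = (0.3 * stability - 1.0 * shocks + rng.normal(0, 0.5, n)) < 0
needs_assistance = (-1.0 * stability + 0.1 * shocks + rng.normal(0, 0.5, n)) > 0

X = np.column_stack([stability, shocks])
credit_model = LogisticRegression().fit(X, delinquent)
score = credit_model.predict_proba(X)[:, 1]  # stand-in for a "credit score"

print("AUC on the original target (delinquency):       "
      f"{roc_auc_score(delinquent, score):.2f}")
print("AUC on the repurposed target (public assistance): "
      f"{roc_auc_score(needs_assistance, score):.2f}")
```

The score is a perfectly reasonable measurement of the thing it was built to measure, and a much weaker one of the thing it's being repurposed for.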

And in this case there's a clear disconnect:
Credit scores do not predict whether an individual will become a public charge. And they do not predict financial self-sufficiency. They are only useful in this context if one believes credit scores reveal something about a person’s character. In other words, if one believes that people with low credit scores are moochers and malingerers. Given the Trump administration’s hostility toward (brown-skinned) immigrants, this conflation of credit scores and morality is not surprising.
And this is a core defining principle of our work: that beliefs about the world control how we choose our representations and learning procedures: the procedures cannot be justified except in the context of the beliefs that underpin them. 

I think that if you read anything I've written, it will be clear where I stand on the normative question of whether this is a good idea (tl;dr: NOT). But as a researcher, it's important to lay out a principled reason why, and this episode sadly just confirms that our work is on the right track.


Friday, November 02, 2018

What do I work on ?

So, what do you work on? 

As questions go, this is one of the most rudimentary. It's the conference equivalent of "Nice weather we're having", or "How about them Broncos!". It's a throat-clearer, designed to start a conversation in an easy non-controversial way. 

And yet I'm always having to calculate and calibrate my answers. There's a visible pause, a hesitation as I quickly look through my internal catalog of problems and decide which one I'll pull out. On the outside, the hesitation seems strange: as if I don't quite know what I work on, or if I don't know how to explain it. 

It's an occupational hazard that comes from living on the edge of many different areas. I go to data mining conferences, machine learning conferences, theory/geometry conferences, and (now) conferences on ethics, society and algorithms. And in each place I have a different circle of people I know, and a different answer to the question

So, what do you work on?  

It makes me uncomfortable, even though it shouldn't. I feel like I can only share a part of my research identity because otherwise my answer will make no sense or (worse!) seem like I'm trying to impress people with incomprehensible words. 

I don't doubt that most people share some form of this feeling. As researchers, none of us are one-dimensional, and most of us work on many different problems at a time. Probably the easiest answer to the question is the problem that one has most recently worked on. But I sense that my case is a little unusual: not the breadth per se, but the range of topics (and styles of problem solving) that I dabble in. 

So, what do you work on? 

I often joke that my research area is a random walk through computer science and beyond. I started off in geometry, dabbled with GPUs (alas, before they were popular), found my way into information theory and geometry (and some differential geometry), slipped down the rabbit hole into data mining, machine learning, and a brief side foray into deep learning, and then built a nice little cottage in algorithmic fairness, where I spend more time talking to social scientists and lawyers than computer scientists.

Being an academic nomad has its virtues: I don't really get bored with my work. But it also feels like I'm always starting from square one with my learning and that there are always people who know way more about every topic than I do. And my academic roamings seem to mirror my actual nomadic status. I'm a foreigner in a land that gets stranger and less familiar by the day, and the longest time I've spent in any location is the place I'm in right now.



So, what do you work on? 

Maybe, in a way that's so American, "What do you work on" is really a question of "Who are you" in the way we bind together our work and our identity. When my students come and ask me what they should work on, what they're really asking me is to tell them what their research identity is, and my answer usually is, "whatever you want it to be right now". It's a frustrating answer no doubt, but I feel that it lowers the import of the question to a manageable level. 

So, what DO you work on?

I do algorithmic fairness, and think about the ethics of automated decision-making. I bring an algorithmic (and geometric) sensibility to these questions. I'm an amateur computational philosopher, a bias detective, an ML-translator for lawyers and policy folk, and my heart still sings when I see a beautiful lemma. 


Monday, October 22, 2018

On teaching ethics to tech companies

Kara Swisher (who is unafraid to call it like it is!) has a new op-ed in the NYT titled "Who will teach Silicon Valley to be ethical". She asks
How can an industry that, unlike other business sectors, persistently promotes itself as doing good, learn to do that in reality? Do you want to not do harm, or do you want to do good? These are two totally different things. 
And how do you put an official ethical system in place without it seeming like you’re telling everyone how to behave? Who gets to decide those rules anyway, setting a moral path for the industry and — considering tech companies’ enormous power — the world.

There are things that puzzle me about this entire discussion about ethics and tech. It seems like an interesting idea for tech companies to incorporate ethical thinking into their operations. Those of us who work in this space are clamoring for more ethics education for budding technologists.

There is of course the cynical view that this is merely window dressing to make it look like Big Tech (is that a phrase now?) cares without actually having to change their practices.

But let's put that aside for a minute. Suppose we assume that tech companies are indeed (in some shape or form) concerned about the effects of technology on society, and that their leaders do want to do something about it.

What I really don't understand is the idea that we should teach Silicon Valley to be ethical. This seems to play into the overarching narrative that tech companies are trying to do good in the world and slip up because they're not adults yet -- a problem that can be resolved by education that will allow them to be good "citizens" with upstanding moral values.

This seems rather ridiculous. When chemical companies were dumping pesticides on the land by the ton and Rachel Carson wrote Silent Spring, we didn't shake our heads sorrowfully at the companies and send them moral philosophers. We founded the EPA!

When the milk we drink was being adulterated with borax and formaldehyde and all kinds of other horrific additives that Deborah Blum documents so scarily in her new book 'The Poison Squad', we didn't shake our heads sorrowfully at food vendors and ask them to grow up. We passed a law that led eventually to the formation of the FDA.

Tech companies are companies. They are not moral agents, or even immoral agents. They are amoral profit-maximizing vehicles for their shareholders (and this is not even a criticism). Companies are supposed to make money, and do it well. Facebook's stock price didn't slip when it was discovered how their systems had been manipulated for propaganda. It slipped when they proposed changes to their newsfeed ratings mechanisms to address these issues.

It makes no sense to rely on tech companies to police themselves, and to his credit, Brad Smith of Microsoft made exactly this point in a recent post on face recognition systems. Regulation, policing, and whatever else we might imagine has to come from the outside. While I don't claim that regulation mechanisms all work as they are currently conceived, the very idea of checks and balances seems more robust than merely hoping that tech companies will get their act together on their own.

Don't get me wrong. It's not even clear what has to be regulated here. Unlike with poisoned food or toxic chemicals, it's not clear how to handle poisonous speech or toxic propaganda. And that's a real discussion we need to have.

But let's not buy into Silicon Valley's internal hype about "doing good". Even Google has dropped its "Don't be evil" credo.

Thursday, October 11, 2018

Google's analysis of the dilemma of free speech vs hate speech

Breitbart just acquired a leaked copy of an internal Google doc taking a cold, hard look at the problems of free speech, fake news, and censorship in the current era. I wrote a tweet storm about it, but also wanted to preserve it here because tweets, once off the TL, cease to exist.

Breitbart acquired an internal Google doc discussing the misinformation landscape that the world finds itself in now: https://www.scribd.com/document/390521673/The-Good-Censor-GOOGLE-LEAK#from_embed
I almost wish that Google had put this document out for the public to read. It's a well-thought-out exploration of the challenges all of us face in dealing with information dissemination, fake news, censorship, and the like. And to my surprise, it is (mostly) willing to point fingers back at Google and other tech companies for their role in it (although there are some glaring omissions, like the building of the new censored search tool in China). It's not surprising that people inside Google are thinking carefully about these issues, even as they flail around in public. And the analysis is comprehensive without attempting to provide glib solutions.

Obviously, since this is a doc generated within Google, the space of solutions is circumscribed to those that have tech as a major player. For example, the idea of publicly run social media isn't really on the table, nor are better ways to decentralize value assignment for news, or alternate models for search that don't require a business model. But with those caveats in mind, the analysis of the problems is reasonable.

Monday, October 08, 2018

A new sexual harassment policy for TCS conferences.

One of my most visited posts is the anonymous post by a theoryCS colleague describing her own #metoo moments inside the TCS conference circuit. It was a brutal and horrific story to read.

Concurrently (I don't know if the blog post had an effect, but one can but hope it helped push things along), a committee was set up under the auspices of TCMF (FOCS), ACM, SIAM, and EATCS to
Draft a proposal for joint ToC measures to combat discrimination, harassment, bullying, and retaliation, and all matters of ethics that might relate to that.
That committee has now completed its work, and a final report is available. The report was also endorsed at the FOCS business meeting this week. The report is short, and you should read it. The main takeaways/recommendations are that every conference should
  • adopt a code of conduct and post it clearly
  • recruit and train a group of advocates to provide confidential support to those facing problems at a conference
  • have mechanisms for authors to declare a conflict of interest without needing to be openly specific about the reasons.
There are many useful references in the report, as well as more concrete suggestions about how to implement the above recommendations. This committee was put together fast, and generated a very useful report quickly. Well done!

Monday, September 10, 2018

Hello World: A short review

A short review of Hannah Fry's new book 'Hello World'

Starting with Cathy O'Neil's Weapons of Math Destruction, there's been an onslaught of books sounding the alarm about the use of algorithms in daily life. My Amazon list that collects these together is even called 'Woke CS'. These are all excellent books, calling out the racial, gender, and class inequalities that algorithmic decision-making can and does exacerbate, and the role of Silicon Valley in perpetuating these biases.

Hannah Fry's new book "Hello World" is not in this category. Not exactly, anyway. Her take is informative as well as cautionary. Her book is as much an explainer of how algorithms get used in contexts ranging from justice to medicine to art as it is a reflection on what this algorithmically enabled world will look like from a human perspective.

And in that sense it's a far more optimistic take on our current moment than I've read in a long time. In a way it's a relief: I've been mired for so long in the trenches of bias and discrimination, looking at the depressing and horrific ways in which algorithms are used as tools of oppression, that it can be hard to remember that I'm a computer scientist for a reason: I actually do marvel at and love the idea of computation as a metaphor, as a tool, and ultimately as a way to (dare I say it) do good in the world.

The book is structured around concepts (power, data) and domains (justice, medicine, cars, crime, and art). After an initial explainer on how algorithms function (and how models are trained using machine learning), and how data is used to fuel these algorithms, she very quickly gets into specific case studies of both the good and the bad in algorithmically mediated decision making. Many of the case studies are from the UK and were unknown to me before this book. I quite liked that: it's easy to focus solely on examples from the US, but the use (and misuse) of algorithms is global (Vidushi Marda's article on AI policy in India has similar locally sourced examples).

If you're a layman looking to get a general sense of how algorithms tend to show up in decision-making systems, how they hold out hope for a better way of solving problems, and where they might go wrong, this is a great book. It uses a minimum of jargon, while still being willing to wade into the muck of false positives and false negatives in a very nice illustrative example in the section on recidivism prediction and COMPAS, and also attempting to welcome the reader into the "Church of Bayes".
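To give a flavor of that kind of reasoning (these are my own invented numbers, not the book's), the false-positive worry is really just Bayes' rule: even a fairly accurate risk score flags many people who would never have reoffended, and how many depends heavily on the base rate.

```python
# Invented numbers, just to illustrate the Bayes' rule point about
# false positives in recidivism prediction (not figures from the book).
def fraction_flagged_wrongly(base_rate, sensitivity, specificity):
    """P(would not reoffend | flagged as high risk), by Bayes' rule."""
    p_flag = base_rate * sensitivity + (1 - base_rate) * (1 - specificity)
    return (1 - base_rate) * (1 - specificity) / p_flag

for base_rate in (0.1, 0.3, 0.5):
    frac = fraction_flagged_wrongly(base_rate, sensitivity=0.8, specificity=0.8)
    print(f"base rate {base_rate:.0%}: {frac:.0%} of those flagged would not have reoffended")
```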

If you're a researcher in algorithmic fairness, like me, you start seeing the deeper references as well. Dr. Fry alludes to many of the larger governance issues around algorithmic decision making that we're wrestling with now in the FAT* community. Are there better ways to integrate automated and human decision-making that take advantage of what each is good at? What happens when the systems we build start to change the world around them? Who gets to decide (and how) what level of error in a system is tolerable, and who might be affected by it? As a researcher, I wish she had called out these issues a little more, and there are places where issues she raises in the book have actually been addressed (and in some cases, answered) by researchers.

While the book covers a number of different areas where algorithms might be taking hold, it takes very different perspectives on the appropriateness of algorithmic decision-making across these domains. Dr. Fry is very clear (and rightly so) that criminal justice is one place where we need very strong checks and balances before we can countenance the use of any kind of algorithmic decision-making. But I feel that maybe she's letting the medical profession off a little easy in the chapter on medicine. While I agree that biology is complex enough that ML assistance might lead us to amazing new discoveries, I think some caution is needed, especially since there's ample evidence that the benefits of AI in medicine might accrue only to the (mostly white) populations that dominate clinical trials.

Similarly, the discussion of creativity in art and what it means for an algorithm to be creative is fascinating. The argument Dr. Fry arrives at is that art is fundamentally human in how it exists in transmission -- from artist to audience -- and that art cannot be arrived at "by accident" via data science. It's a bold claim, and of a kind with many claims about the essential humanness of certain activities that have been pulverized by advances in AI. Nevertheless, I find it very appealing to posit that art is essentially a human endeavour by definition.

But why not extend the same courtesy to the understanding of human behavior or biology? Algorithms in criminal justice are predicated on the belief that we can predict human behavior and how our interventions might change it. We expect that algorithms can pierce the mysterious veil of biology, revealing secrets about how our body works. And yet the book argues not that these systems are fundamentally flawed, but that precisely because of their effectiveness they need governance. I for one am a lot more skeptical about the basic premise that algorithms can predict behavior to any useful degree beyond the aggregate (and perhaps Hari Seldon might agree with me).

Separately, I found it not a little ironic, at a time when Facebook is constantly being yanked before the US Congress, Cambridge Analytica might have swayed US elections and the Brexit vote, and YouTube is a dumpster fire of extreme recommendations, that I'd read a line like "Similarity works perfectly well for recommendation engines" in the context of computer-generated art.

The book arrives at a conclusion that I feel is JUST RIGHT. To wit, algorithms are not authorities, and we should be skeptical of how they work. And even when they might work, the issues of governance around them are formidable. But we should not run away from the potential of algorithms to truly help us, and we should be trying to frame the problem away from the binary of "algorithms good, humans bad" or "humans good, algorithms bad" and towards a deeper investigation of how human and machine can work together. I cannot read
Imagine that, rather than exclusively focusing our attention on designing our algorithms to adhere to some impossible standard of perfect fairness, we instead designed them to facilitate redress when they inevitably erred; that we put as much time and effort into ensuring that automatic systems were as easy to challenge as they are to implement.
without wanting to stand up and shout "HUZZAH!!!". (To be honest, I could quote the entire conclusions chapter here and I'd still be shouting "HUZZAH").

It's a good book. Go out and buy it - you won't regret it.

This review refers to an advance copy of the book, not the released hardcover. The advance copy had a glitch where a fragment of latex math remained uncompiled. This only made me happier to read it.
