Wednesday, April 19, 2017

TheoryFest at STOC 2017.

(ed: I can't believe it's been four months since my last post. Granted, I've been posting over at algorithmicfairness.wordpress.com, but still. Time to turn in my theory card...)

One of the things I've been doing is helping out with the STOC TheoryFest this summer. Specifically, I've been on the plenary talks committee that was tasked with identifying papers from other parts of CS and beyond that a STOC audience might find interesting to hear about. After many months of deliberations, we're finally done! Boaz has more details over at his blog.

Do take a look at the papers. Boaz has a great summary for each paper, as well as links to the sources. You'll see pretty decent coverage of different topics in CS, including even some theory B!!

But more importantly, please do register for STOC! This year, the organizers are making a real effort to restructure the conference to make it more accessible, with a diverse slate of events: plenary talks, tutorials, and workshops. There should be something for everyone. Early registration ends on May 21, so click that link now.

In this day and age, I don't need to emphasize the importance of standing up and being counted. If you've ever grumbled about the old and stultifying format of STOC, this is your chance to vote with your feet. A big success for TheoryFest 2017 will surely encourage more along these lines in future years.

Monday, December 12, 2016

A short course on FATML -- and a longer course on ethics in data science.

I'm at IIIT Allahabad teaching a short course on fairness, accountability and transparency in machine learning. This is part of the GIAN program sponsored by the Government of India to enable Indian researchers from outside the country to come back and share their knowledge. The clever pun here is that GIAN -- which stands for Global Initiative of Academic Networks -- also means 'knowledge' in Hindi (ज्ञान).
Anyway, I'm excited to be teaching a course on this topic, and it has forced me to organize my thoughts on the material as well (which has already led to some interesting insights about how the literature fits together). I'm writing out my lecture notes, and one of the energetic students at the course has kindly offered to help me clean them up, so I hope to have a set of notes in less than a year or so. 

Note that the syllabus on the linked page is evolving: it will change over the course of the next few days as I add more things in. 

I'm also excited to be teaching a (brand-new) undergraduate course on ethics and data science next fall. I'll have more to say about this as I prepare material, but any suggestions/materials are welcome. 

Thursday, October 20, 2016

Modeling thorny problems.


Today, I got into a nice twitter discussion with former colleague and NLPer extraordinaire (even when he hates on algorithms) Hal Daumé. Here's what he said:

To which I replied:
It seemed to be worth fleshing this out a little.

In the work we've been doing in algorithmic fairness, one of the trickiest issues has been attempting to mathematically capture the different (and very slippery) notions of fairness, bias, and nondiscrimination that are used in common parlance and in the more technical literature on the subject. As Hal noted in his tweet above, these are very difficult problems for which no one really has the right answer.

How then does one make forward progress? In my view, framing is an essential part of the discussion, and one that we tend not to consider very explicitly in computer science. That is to say, deciding how to ask the question has much more impact on forward progress than the specific nature of the solutions.

This is not news if you look at politics, the law, policy discussions or the social sciences in general. But it matters a great deal as CS starts dealing with these issues, because the choices we make in how to pose questions change the kinds of progress we can hope to make. While I realize that "questions matter, not solutions" is one of the oldest tropes in all of research, modeling issues like ethics and fairness requires a whole other dimension of framing -- see, for example, this nifty reframing of the editorial role of Facebook.

Specifically, I think of these problems as points in a partial order. It's relatively easy to talk about progress along a chain: that's what CS is really good at. But it's important to realize that we're living in a partial order, and that when we argue about different approaches, we might actually be arguing about fundamentally different and incomparable issues.

So then what happens when we reach different maximal elements (or imagine ideal maximal elements)? That's where the arguments center on beliefs, axioms and worldviews (not surprisingly, that's how we're thinking about fairness). But then the arguments are of a different nature altogether, and realizing that is very important. It's no longer a question of what running time, what approximation ratio, or what generalization bound you can get, but rather what aspects you're trying to capture and what you're leaving out.

So while it might seem that the problems we face in the larger realm of algorithms and ethics are irreducibly complex (they are), there's value in understanding and creating a partial order of formulations and making progress along different chains, while also arguing about the maximal elements.
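To make the partial-order metaphor concrete, here's a toy sketch (the property names are entirely hypothetical, chosen only for illustration): if we model each formulation by the set of concerns it captures, then one formulation dominates another only when it captures a superset of those concerns, and two formulations capturing different concerns are simply incomparable.

```python
# Toy model: a problem formulation is the set of concerns it captures.
# Domination (the partial order) is just set containment, so some pairs
# of formulations are genuinely incomparable -- arguing over which one
# is "better" is then an argument about worldviews, not about bounds.

def dominates(a, b):
    """a >= b in the partial order iff a captures everything b does."""
    return b <= a

# Hypothetical concern sets, purely for illustration.
group_fairness = {"statistical parity", "group outcomes"}
individual_fairness = {"similar treatment", "individual outcomes"}
combined = group_fairness | individual_fairness

# Progress along a chain: a formulation that strictly extends another.
assert dominates(combined, group_fairness)

# An antichain: neither formulation dominates the other.
assert not dominates(group_fairness, individual_fairness)
assert not dominates(individual_fairness, group_fairness)
```

The point of the sketch is only the structure: chains support the usual "strictly better" style of CS progress, while incomparable elements force a different kind of argument.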

Coda: 

One day, when I have nothing to do (ha!) I might attempt to write down some thoughts on how to model ill-structured problems mathematically. And it's not just about algorithmic fairness -- although that tests my skills royally. Any kind of mathematical modeling requires certain skills that we don't learn when we learn to solve crisply defined problems.

For example, one aphorism I found myself spouting some weeks ago was
Model narrowly; solve broadly. 
This was in the context of a discussion I was having with my undergraduate student about a problem we're trying to formalize. She had a nice model for it, but it was very general, and I worried that it was capturing too much and that the essential nature of the problem wasn't coming through. Of course, when it comes time to solve a problem, general hammers are a good place to start (and sometimes finish!).

Monday, September 26, 2016

Axiomatic perspective on fairness, and the power of discussion

Sorelle Friedler, Carlos Scheidegger and I just posted a new paper on the arXiv where we try to lay out a framework for talking about fairness in a more precise way. 

The blog post I link to says more about the paper itself. But I wanted to comment here on the tortuous process that led to this paper. In some form or another, we've been thinking about this problem for two years: how do we "mathematize" discussions of fairness and bias? 

It's been a long and difficult slog, more than any other topic in "CS meets X" that I've ever bumped into. The problem here is that the words are extremely slippery, and refract completely different meanings in different contexts. So coming up with notions that aren't overly complicated, and yet appear to capture the different ways in which people think about fairness was excruciatingly difficult. 

We had a basic framework in mind a while ago, but then we started "testing" it in the wild, on papers, and on people. The more conversations we had, the more we refined and adapted our ideas, to the point where the paper we have today owes deep debts to many people that we spoke with and who provided nontrivial conceptual challenges that we had to address. 

I still have no idea whether what we've written is any good. Time will tell. But it feels good to finally put out something, however half-baked. Because now hopefully the community can engage with it. 

Friday, August 26, 2016

Congrats to John Moeller, Ph.D

My student John Moeller (moeller.fyi) just defended his Ph.D thesis today! And yes, there was a (rubber) snake-fighting element to the defense.

John's dissertation work is in machine learning, but his publications span a wider range. He started off with a rather hard problem: attempting to formulate a natural notion of range spaces in a negatively-curved space. And as if dealing with Riemannian geometry wasn't bad enough, he was also involved in finding approximate near neighbors in Bregman spaces. He's also been instrumental in my more recent work in algorithmic fairness.

But John's true interests lie in machine learning, specifically kernels. He came up with a nice geometric formulation of kernel learning by way of the multiplicative weight update method. He then took this formulation and extended it to localized kernel learning (where you don't need each kernel to work with all points - think of it like a soft clustering of kernels).
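Multiplicative weights for kernel learning, in caricature (this is a generic sketch of the MWU update, not John's actual formulation; the base kernels and the kernel-alignment "gain" below are illustrative choices): maintain a weight per base kernel, and exponentially boost the kernels whose contributions look more useful for the labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: labels determined by the first coordinate.
X = rng.standard_normal((40, 3))
y = np.sign(X[:, 0])

# Two base kernels (hypothetical choices): one built from the
# informative coordinate, one built from pure noise.
K_good = np.outer(X[:, 0], X[:, 0])
K_bad = np.outer(X[:, 2], X[:, 2])
kernels = [K_good, K_bad]

# Kernel-target alignment serves as each "expert"'s gain.
yyT = np.outer(y, y)
def alignment(K):
    return np.sum(K * yyT) / (np.linalg.norm(K) * np.linalg.norm(yyT))

# Multiplicative weights update: exponentiate the gains, renormalize.
# (In real kernel learning the gain would depend on the current learner;
# here it's static, so the loop simply sharpens the weight vector.)
w = np.ones(len(kernels)) / len(kernels)
eta = 0.5
for _ in range(20):
    gains = np.array([alignment(K) for K in kernels])
    w *= np.exp(eta * gains)
    w /= w.sum()

# The informative kernel ends up dominating the learned combination.
assert w[0] > 0.9
```

The learned combination is then the weighted sum of base kernels, which is itself a valid kernel since the weights stay nonnegative.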

Most recently, he's also explored the interface between kernels and neural nets, as part of a larger effort to understand neural nets. This is also a way of doing kernel learning, in a "smoother" way via Bochner's theorem.
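The Bochner route can be sketched with random Fourier features (the Rahimi-Recht construction; the dimensions and bandwidth below are arbitrary illustrative choices): Bochner's theorem says a shift-invariant kernel is the Fourier transform of a probability distribution, so sampling random frequencies from that distribution yields an explicit feature map whose inner products approximate the kernel.

```python
import numpy as np

rng = np.random.default_rng(1)

def rff_features(X, D, gamma, rng):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2). By Bochner's theorem the
    frequencies are drawn from the kernel's spectral distribution,
    which for this kernel is Gaussian with variance 2 * gamma."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.standard_normal((5, 3))
Z = rff_features(X, D=4000, gamma=0.5, rng=rng)

# Compare the explicit Gram matrix of the features against the exact kernel.
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
K_approx = Z @ Z.T
assert np.max(np.abs(K_exact - K_approx)) < 0.15
```

The appeal is that an implicit kernel becomes an explicit (random) feature layer, which is one way the kernel and neural-net views start talking to each other.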

It's a great body of work that required mastery of a range of different mathematical constructs and algorithmic techniques.  Congratulations, John!

Wednesday, August 17, 2016

FOCS workshops and travel grants.

Some opportunities relating to the upcoming FOCS.

  • Alex Andoni and Alex Madry are organizing the workshops/tutorials day at FOCS, and are looking for submissions. The deadline is August 31, so get those submissions ready!
  • Aravind Srinivasan and Rafi Ostrovsky are managing travel and registration grants for students to go to FOCS. The deadline for applications is September 12. 
Of course FOCS itself has an early registration deadline of September 17, which is also the cutoff for hotel registrations. 

Monday, July 25, 2016

Pokégyms at Dagstuhl

Yes, you read that correctly. The whole of Dagstuhl is now a Pokégym and there are Pokémon wandering the streets of Wadern (conveniently close to the ice cream shop that has excellent ice cream!)

Given this latest advancement, I was reminded of Lance Fortnow's post about Dagstuhl from back in 2008 where he wistfully mourned the fact that Wifi now meant that people don't hang out together.

Times change. I am happy to note that everything else about Dagstuhl hasn't changed that much: we still have the book of handwritten abstracts for one thing.

Carl Zimmer's series on exploring his genome

If you haven't yet read Carl Zimmer's series of articles (one, two, three), you should go out and read it now!

Because after all, it's Carl Zimmer, one of the best science writers around, especially when it comes to biology.

But even more so because when you read the story of his personal quest to understand his genetic story in all its multifaceted glory, you understand the terrifying opportunities and dangers in the use of genetic information for predictive and diagnostic medicine. You also realize the intricate way that computation is woven into this discovery, and how sequences of seemingly arbitrary choices lead to actual conclusions about your genome that you now have to evaluate for risk and likelihood.

In a sense, this is the tale of the use of all computational approaches right now, whether it be in science, engineering, the social sciences, or yes, even algorithmic fairness. Zimmer uses the analogy with telescopes to describe his attempts to look at his genome, and this explanation is right on the money:
Early telescopes weren’t terribly accurate, either, and yet they still allowed astronomers to discover new planets, galaxies, and even the expansion of the universe. But if your life depended on your telescope — if, for example, you wanted to spot every asteroid heading straight for Earth — that kind of fuzziness wouldn’t be acceptable.
And this quote from Robert Green, one of the geneticists who was helping Zimmer map out his genome:
Ultimately, the more you know, the more frustrating an exercise it is. What seemed to be so technologically clear and deterministic, you realize is going through a variety of filters — some of which are judgments, some of which are flawed databases, some of which are assumptions about frequencies, to get to a best guess.
There is a message in this for all of us doing any kind of data mining.
