The Geomblog: 10/01/2015

Saturday, October 31, 2015

Data and Civll Rights II

I just got back from the one-day Data and Civil Rights conference organized by Data and Society. As I mentioned in my previous post, the conference operated under Chatham House Rules, which means I can't really reveal any of the specific discussions that went on in the main sessions or the breakout groups.

This was a conference full of civil rights activists, lawyers, policy types, and even police folk. It was a conference where people could get up and proclaim that they hate numbers, to cheers from the audience. Feeling like the odd one out is unfamiliar to me.

But it was full of passion and fire. And very depressing if you think of data analysis and ML as a GOOD THING. Because in this context, it is at best a blunt weapon that is being wielded carelessly and violently.

We've had a good run designing algorithms that tell people what to buy and what to watch. But when these same algorithms start deciding whether people can live their lives on their own terms, then as the cool kids are wont to say:

Friday, October 30, 2015

Eschew obfuscation: write clearly

There's an article in the Atlantic about the "needless complexity of academic writing". Apart from learning that there's a Plain Writing Act (who says Congress is gridlocked!), I wasn't too surprised by the points made. Yes, academic writing can be turgid and yes, part of this is because we want to "impress the reviewers", and no academics can't be coerced into changing the way they do things - at least not easily.

Steven Pinker has proposed an alternate theory of why academic writing is so jargon-heavy. Paraphrasing from the Atlantic article:

Translation: Experts find it really hard to be simple and straightforward when writing about their expertise. He calls this the “curse of knowledge” and says academics aren’t aware they’re doing it or properly trained to identify their blindspots—when they know too much and struggle to ascertain what others don’t know. In other words, sometimes it’s simply more intellectually challenging to write clearly.

For me, blogging has always been a way out of this blind spot. First of all, I can be more conversational and less stilted. Secondly, even if I'm writing for a technical audience, I'm forced to pare down the jargon or go crazy trying to render it.

But I wonder how hard it really is for experts to write clearly about their work. I wonder this because these same experts who write prose that you can clobber an elephant with are remarkably colorful and vivid when describing their work in person, on a board, or at a conference (though not at a talk itself: that's another story).

While it's common to assume that the obfuscation is intentional (STOC papers need to be hard!), I think it's more a function of deadline-driven writing and last-minute proof (or experiment) wrangling.

I'm thinking about this because I'm planning to run a seminar next semester that I'm calling 'Reading with Purpose'. More on that in a bit...

Monday, October 26, 2015

Data and Civil Rights

I'm in DC right now for a one-day conference on Data and Civil Rights, run by the Data and Society Institute.

This is an annual event (this is the second such conference). Last year's conference was themed "Why Big Data is a civil rights issue", and this year's conference focuses on the very hot-button topic of big data and criminal justice.

Needless to say, issues of fairness and discrimination are front and center in an area like this, and so I'm hoping to learn a lot about the state of play (and maybe contribute as well).

This is more of a working meeting than a traditional conference: all material is private during the conference and we're expected not to talk about the discussions outside the event (a la Chatham House rules). Digested material from the different working groups will be posted in November.

JHU Workshop on Sublinear Algorithms

The latest in a long line of workshops on sublinear algorithms (streaming! sketching ! property testing ! all of the above !) will be held at JHU this year just before SODA 2016. The message from the organizers is below: do consider attending if you're planning to attend SODA. (Disclaimer: I'm giving one of the 20+ talks, but I will not promise that it's excellent).

Dear colleagues,

We are organizing a Sublinear Algorithms workshop that will take place at Johns Hopkins University, January 7-9, 2016. The workshop will bring together researchers interested in sublinear algorithms, including sublinear-time algorithms (e.g., property testing and distribution testing), sublinear-space algorithms (e.g., sketching and streaming) and sublinear measurements (e.g., sparse recovery and compressive sensing).

The workshop will be held right before SODA’16, which starts on January 10 in Arlington, VA (about 50 miles from JHU).

Participation in this workshop is open to all, with free registration. In addition to 20+ excellent invited talks, the program will include short contributed talks by graduating students and postdocs, as well as a poster session. To participate in the contributed talk session and/or the poster session, apply by December 1.

For further details and registration, please visit
http://www.cs.jhu.edu/~vova/sublinear2016/main.html .

Best,
Vladimir Braverman, Johns Hopkins University
Piotr Indyk, MIT
Robert Krauthgamer, Weizmann Institute of Science
Sofya Raskhodnikova, Pennsylvania State University

Friday, October 02, 2015

An algorithm isn't "just code"

I've been talking to many people about algorithmic fairness of late, and I've realized that at the core of pushback against algorithmic bias ("algorithms are just math! If the code is biased, just look at it and you can fix it !") is a deep misunderstanding of the nature of learning algorithms, and how they differ fundamentally from the traditional idea of an algorithm as "a finite set of well-defined elementary instructions that take an input and produce an output".

This misunderstanding is crucial, because it prevents people from realizing why algorithmic fairness is actually a real problem. And that prompted me to write a longer note that takes the "algorithm == recipe" analogy and turn it on its head to capture how machine learning algorithms work.

The Geomblog

Pages