Wednesday, January 20, 2016

On math in job talks

It's job talk season around the country, and this note from +Lance Fortnow  popped up on my twitter feed.
I was a little puzzled by this claim, and soon the following response from +Ryan Williams appeared
At which point +Lev Reyzin chimed in with
My response to all of this was:
Above all, a job talk should be comprehensible to a general CS audience, for many of whom the last algorithms class they talk was 25 years ago, and for whom IP, PCPs or even approximation algorithms are strange unknown beasts. And so I understand the sentiment expressed in Lance's post ("don't get too technical").

But avoiding formulas seems excessive. If you're expressing mathematical concepts, sometimes you need to be precise. Especially if your work hinges on being careful about these concepts. Thankfully, a good result has a nice intuitive explanation, and it's usually possible to do what Ryan suggested. For example, my favorite pitch for IP, PSPACE and the like is "A conversation is likely exponentially better than a monologue". (For why I'd be talking about IPs and PSPACE, you'll have to wait for my next post or two). 

But couple that with the formal expression! Otherwise the more intuitive explanation seems like a giant hand wave that doesn't make any sense.

One of my standard rants when I'm explaining to students how to read theory papers is that they have to be able to go fluidly between notation and intuition, and that they have to mentally translate formalisms in a paper into intuition in order to really get what the paper is saying. But if we can't present both forms in a single setting, how is any non-expert to realize that the two are related ?

In short, if you're doing a job talk where I'm in attendance, don't be shy to put in formulas, but make sure to give some intuition for what the formalism is saying.


Friday, January 15, 2016

Reading With Purpose: A grand experiment

This is the preface to a set of notes I'm writing for a seminar this semester. It will be a fun and bumpy ride !
Reading research papers in mathematical disciplines takes training. A student’s natural tendency is to read a paper from beginning to end in a depth-first manner. Every time they get stuck at a concept, they branch down another path, and then another, to the point where even getting through an abstract for a paper can take days and is exhausting.
Needless to say, this is not the best way to read technical material of any kind, mathematical or otherwise. 
I have spent many hours explaining to my students how they should be reading papers, and if you’re reading this and you have students of your own, you’ve probably done the same. While I don’t claim to have the perfect method for reading any given paper, I’ve learnt over the years how to quickly skim a paper, flying over it and scanning the terrain as it were. I’ve learnt when to make a deep dive all the way to ground level, and when to resurface. And above all, I’ve learnt to distinguish the different reasons for reading technical material in the first place, and that each mode requires a different treatment. 
I’ve looked all over the web for good material to give to students to learn how to read papers in (theoretical) computer science. I’ve found a few short articles here and there: S. Keshav’s note on reading systems papers, Michael Mitzenmacher’s helpful document, a page by Jason Eisner, and the many blog posts by Terry Tao, to name a few. There are also a number of good short articles on reading mathematics that I’ve drawn some inspiration from. 
But none of these capture the essence of what I try to achieve in my reading and what I want my students to be able to do. It’s a kind of Thurstonian ideal of understanding: where the proof falls away and what you’re left with is a deep conceptual appreciation of how the parts of a proof (or a theorem, or a set of definitions) fit together, and what they really mean. 
Moreover, if one is to teach the art of reading, it is important not just to describe the goal, but to find a way to get there: with exercises, little practices, and muscle-strengthening activities. This is after all the difference between teaching and showing.

That is my motivation for this collection of notes. It is my attempt to demystify and systematize a set of tools for reading papers. It will not be complete, and it will likely not resemble how you read papers. But hopefully it provides one way of getting to the promised land.

Thursday, January 07, 2016

SoCG-STOC Joint Workshops Call

As you all might know, in an event sure to get someone a Nobel Peace Prize, the Symposium on Computational Geometry (SoCG) and the Symposium on the Theory of Computing (STOC) will "sympose" together in Boston this year.

As part of this peace offensive designed to unite points and graphs, the combinatorial and the geometric, the Laplacian and the Hilbertian, (someone please stop me now...), there will be a day of overlap between the two conferences. The intention is to have a day of workshops on topics of interest to both communities, as well as two invited talks (by Timothy Chan and Santosh Vempala).

I'm one of the co-organizers of this joint day of workshops (together with Yusu Wang, Chandra Chekuri and Alex Andoni). And I strongly encourage you to consider putting together a proposal for a workshop/tutorial/other event that brings together folks from these communities.

For more details, please visit the workshops page. The deadline for submitting proposals is Feb 22, 2016. Proposals do not need to be very long - they merely need to answer specific questions as detailed in the CFP.

This is a great opportunity to expose the communities to topics that they might not ordinarily bump into, but could enjoy and benefit from. I'm very excited to see what ideas get tossed around, and am looking forward to the event. Kudos to the STOC and SoCG steering committees for making this happen.

Saturday, December 12, 2015

White Elephant parties and fair division

It's holiday party season, which means that it's time for white elephant parties. For those who don't know what a white elephant party is, here's a description from Wikipedia:
Each participant supplies one wrapped gift. The gifts are placed in a central location, and participants determine in which order they will take turns selecting them. The first person opens a wrapped gift, and the turn ends. On subsequent turns, each person can open a new present or gets the choice to "steal" another person's gift. The gift cannot be stolen once the third participant touches the gift (i.e. - it is stolen for the 2nd time). When a person's gift is stolen, that person can either choose another wrapped gift to open or can steal from another player. The game is over when the last person goes and the first person goes again
I just finished my algorithms class with the "traditional" (n=2) discussion of fair division algorithms (aka cake cutting, but with actual cake). White elephant parties are also a form of fair division with indivisible goods.

A few years ago, I started wondering about the theory behind white elephant exchanges and posted a question on cstheory. The answer I got (from usul) was excellent (tl;dr: worst-case definitions of fairness don't work). But it's also a few years old, and I wonder if there are new results on this topic.

Tuesday, December 01, 2015

Fairness and The Good Wife

The Good Wife is a long-running legal TV show with a Chicago politics backdrop. It's one of the most popular shows on network TV, and so it was particularly fascinating to see an episode devoted in part to issues of algorithmic fairness.

It's helpful to know that the show frequently features a Google-like company called ChumHum. In this episode, they're being sued because of a feature in their maps application called "Safe Filter" that marks regions of a city (Chicago) as safe or not safe. A restaurant owner claims that she went out of business because Chummy Maps (their horribly named maps application) marked the region around her restaurant as unsafe.

The writers of this episode must have been following some of the news coverage of fairness in algorithms over the past year: a number of news items were referenced in passing. What follows is an annotated list of references.

  • "Math is not racist". This is what the character playing the COO of Chumhum says when he's first deposed, arguing that the Safe Filter uses only algorithms to determine which regions are safe. This is reminiscent of the overly uncritical article that NPR did about the use of machine learning in hiring: the CEO of Jobaline (one of the companies doing this) happily proclaimed that "math is blind"
  • "Tortious intent": Disparate impact is one vehicle that might be used to determine bias-by-algorithm. But in the episode, the lawyers argue a stronger claim, that of tortious intent, which is interesting because they then have to show deliberate racial bias. 
  • "Objective third party statistics and user generated content": The initial line of defense from Chumhum is that they use third party statistics like crime rates. The lawyers immediately point out that this could introduce bias itself. They also say they use user-generate content as a defense ("We're not racist: our users are"). This is then rebutted by the lawyers pointing out that the users of the maps app skew heavily Caucasian (bringing up another good point about how bias in training data can leech into the results)
  • "Full discovery": Chumhum wanted to hide behind its algorithm: the opposing lawyers made a successful argument for discovery of the algorithm. I doubt this could ever happen in real life, what with trade secrets and all. More on this later. 
  • "Home ownership rates as proxy for race": One subplot involved determining whether home-ownership rates were being used in the Safe Filter. The characters immediately realized that this could be a  proxy for race and could indicate bias. 
  • "The animal incident": This was a direct reference to the image-tagging fiasco of a few months ago when Google's photo app started labelling pictures of African-Americans as 'gorillas'. While at first this is a throw-away incident (including a line "Even google did it!"), it comes back later to haunt the company when a lawyer looks at the code (ha!) and discovers a patch that merely removes the 'animal' tag (instead of fixing the underlying problem). This appears to also be what Google did to "solve" its problem. 
  • "Differential ad placement": A hat tip to the work by Latanya Sweeney and the CMU team, another plot point turned on the discovery that ads in the maps application were targeting the white lawyer with ads for skiing and the black lawyer with ads for soul food.  This in and of itself was not a problem for the case, but it led to a much more fascinating argument: that Chumhum was drawing on user profile data from all its properties (search, email etc) to target ads, and so discovery could not be limited solely to maps-related data and code. This is in general the problem with asking for code to do an audit: if you don't know where the training data is coming from, the code is basically useless. Remember, an algorithm isn't always just a piece of code :)
  • "Bad training data/non-diverse workforce": One of the employee characters made the argument that the bad image tagging results were the result of "bad training data", which is an accurate statement and is part of the fairness concerns with algorithms. The lawyer also made the point that a less-homogenous workplace might have helped as well (which brings to mind the Al Jazeera panel I participated on a few months ago)
  • "IMPLICIT BIAS": I was happy when this phrase was used correctly to argue for how even "non-racist" people can help perpetuate a racist system. I would have been happier if someone had said "Disparate impact" though :). 
If you're wondering, the final resolution of the case did NOT turn on a determination of bias or not. It turned out that the restaurant had been losing money before the filter was even put into place. But it was interesting to see an example (albeit on TV) of how a court case on this might pan out. A lot of the side show involved trying to claim that programmers on the Maps app were racist (or had racist inclinations) to argue for why the code might be biased as well. 

Sunday, November 01, 2015

Rock bands with CS names.

You all know (or should know) about Positive Eigenvalues, the band formed by Michael Jordan that has included (among others) Christos Papadimitriou. But I'm not talking about that kind of band.

I'm looking for bands with names that have a CS connection (accidentally or otherwise). I was playing a Spotify playlist called (coincidentally) Deep Focus and the first band on the list was called Random Forest.

So here's my list so far:

Any more ? 

Saturday, October 31, 2015

Data and Civll Rights II

I just got back from the one-day Data and Civil Rights conference organized by Data and Society. As I mentioned in my previous post, the conference operated under Chatham House Rules, which means I can't really reveal any of the specific discussions that went on in the main sessions or the breakout groups.

This was a conference full of civil rights activists, lawyers, policy types, and even police folk. It was a conference where people could get up and proclaim that they hate numbers, to cheers from the audience. Feeling like the odd one out is unfamiliar to me.

But it was full of passion and fire. And very depressing if you think of data analysis and ML as a GOOD THING. Because in this context, it is at best a blunt weapon that is being wielded carelessly and violently.

We've had a good run designing algorithms that tell people what to buy and what to watch. But when these same algorithms start deciding whether people can live their lives on their own terms, then as the cool kids are wont to say:


Friday, October 30, 2015

Eschew obfuscation: write clearly

There's an article in the Atlantic about the "needless complexity of academic writing". Apart from learning that there's a Plain Writing Act (who says Congress is gridlocked!), I wasn't too surprised by the points made. Yes, academic writing can be turgid and yes, part of this is because we want to "impress the reviewers", and no academics can't be coerced into changing the way they do things - at least not easily.

Steven Pinker has proposed an alternate theory of why academic writing is so jargon-heavy. Paraphrasing from the Atlantic article:
Translation: Experts find it really hard to be simple and straightforward when writing about their expertise. He calls this the “curse of knowledge” and says academics aren’t aware they’re doing it or properly trained to identify their blindspots—when they know too much and struggle to ascertain what others don’t know. In other words, sometimes it’s simply more intellectually challenging to write clearly.
For me, blogging has always been a way out of this blind spot. First of all, I can be more conversational and less stilted. Secondly, even if I'm writing for a technical audience, I'm forced to pare down the jargon or go crazy trying to render it.

But I wonder how hard it really is for experts to write clearly about their work. I wonder this because these same experts who write prose that you can clobber an elephant with are remarkably colorful and vivid when describing their work in person, on a board, or at a conference (though not at a talk itself: that's another story).

While it's common to assume that the obfuscation is intentional (STOC papers need to be hard!), I think it's more a function of deadline-driven writing and last-minute proof (or experiment) wrangling.

I'm thinking about this because I'm planning to run a seminar next semester that I'm calling 'Reading with Purpose'. More on that in a bit...

Monday, October 26, 2015

Data and Civil Rights

I'm in DC right now for a one-day conference on Data and Civil Rights, run by the Data and Society Institute.

This is an annual event (this is the second such conference). Last year's conference was themed "Why Big Data is a civil rights issue", and this year's conference focuses on the very hot-button topic of big data and criminal justice.

Needless to say, issues of fairness and discrimination are front and center in an area like this, and so I'm hoping to learn a lot about the state of play (and maybe contribute as well).

This is more of a working meeting than a traditional conference: all material is private during the conference and we're expected not to talk about the discussions outside the event (a la Chatham House rules). Digested material from the different working groups will be posted in November.


JHU Workshop on Sublinear Algorithms

The latest in a long line of workshops on sublinear algorithms (streaming! sketching ! property testing ! all of the above !) will be held at JHU this year just before SODA 2016. The message from the organizers is below: do consider attending if you're planning to attend SODA. (Disclaimer: I'm giving one of the 20+ talks, but I will not promise that it's excellent). 

Dear colleagues,

We are organizing a Sublinear Algorithms workshop that will take place at Johns Hopkins University, January 7-9, 2016. The workshop will bring together researchers interested in sublinear algorithms, including sublinear-time algorithms (e.g., property testing and distribution testing), sublinear-space algorithms (e.g., sketching and streaming) and sublinear measurements (e.g., sparse recovery and compressive sensing).

The workshop will be held right before SODA’16, which starts on January 10 in Arlington, VA (about 50 miles from JHU).

Participation in this workshop is open to all, with free registration. In addition to 20+ excellent invited talks, the program will include short contributed talks by graduating students and postdocs, as well as a poster session. To participate in the contributed talk session and/or the poster session, apply by December 1.

For further details and registration, please visit
http://www.cs.jhu.edu/~vova/sublinear2016/main.html .



Best,
Vladimir Braverman, Johns Hopkins University
Piotr Indyk, MIT
Robert Krauthgamer, Weizmann Institute of Science
Sofya Raskhodnikova, Pennsylvania State University

Friday, October 02, 2015

An algorithm isn't "just code"

I've been talking to many people about algorithmic fairness of late, and I've realized that at the core of pushback against algorithmic bias ("algorithms are just math! If the code is biased, just look at it and you can fix it !") is a deep misunderstanding of the nature of learning algorithms, and how they differ fundamentally from the traditional idea of an algorithm as "a finite set of well-defined elementary instructions that take an input and produce an output".

This misunderstanding is crucial, because it prevents people from realizing why algorithmic fairness is actually a real problem. And that prompted me to write a longer note that takes the "algorithm == recipe" analogy and turn it on its head to capture how machine learning algorithms work.


Tuesday, July 28, 2015

The 2nd Workshop on Fairness, Accuracy and Transparency in Machine Learning: A review

I was one of the organizers of the 2nd workshop on Fairness, Accuracy and Transparency in Machine Learning (FATML) at ICML 2015, and in my alternate career as moderator of data mining panels, I moderated the closing panel. The panelists were Fernando Diaz from MSR New York, Sorelle Friedler from Haverford College, Mykola Pechenizkiy from Eindhoven Instt. of Technology and Hanna Wallach from UMass-Amherst and MSR.

While my original intent was to do a review of the panel, it became clear that the panel discussion touched on themes that were bubbling up throughout the day. So what follows is organized by panel questions, but weaves in discussion from outside the panel as well.

This year's workshop, unlike the one at NIPS 2014, had a bit more of a technical focus: we had some of the early researchers in fairness work from Europe and Japan give talks on their work.  So as a counterweight, I thought I'd ask the panel to look beyond the presentations for the first question:
Question 1 
What is one thing you think we are missing (or should be paying more attention to) in our current discussions of fairness and discrimination  in terms of interaction with the social/policy/legal world OUTSIDE CS ?

Some themes that emerged:

...on the difference between Europe and the US


Mykola made an interesting observation based on his experience in educational data mining. Governments around Europe are very concerned about the use of student data (including health records and academic information) for any kind of data driven tools, and have severely regulated use of such data. As a result, the nascent field of educational data mining has been crippled by lack of access to data.

This is almost the opposite of the situation in the US, where data driven policy is the newest buzzword in town, and those of us who are interested in issues of fairness and transparency feel like we're constantly on the outside looking in (though attention to these issues is increasing).

...connecting to other communities


It's been clear from the beginning that discourse on fairness and transparency in machine learning must draw on the corresponding discourses in society at large. Which means that before we can start solving problems, we have to understand what the problems really are. This came through very strongly in the discussions. To paraphrase one of the panelists,  "Computer science likes to solve problems, and that's the problem !" (also known as  "slap a metric on it and optimize").

So what are the different communities we should be connecting to, and how ?

a) Connecting with social science


A major concern is the "prediction vs understanding" problem. For the most part, machine learning is about prediction: you classify, label, clustering, regress, rank and so on. But in the study of society and the dynamics of human interactions, the goal is not just to predict how humans might behave, but to understand their behavior. Which is to say, data analysis (or even fairness-aware analysis) has to be the first step in a larger conversation, rather than the last one.

While I don't think this issue is specific to fairness and transparency, it  plays a role in understanding the sources of inequality and discrimination. It's not enough to to detect examples of bias: what must happen next is an investigation of why the bias is happening.

(ed: personally, while I understand this concern, I don't think it's necessarily something computer scientists need to prioritize. This is after all what the social sciences do, and it doesn't make sense for us as computer scientists to merely try to acquire those skills. I think we need to be aware of the deeper issues of understanding a domain, but we also have strengths that we bring to the table and I'll say more about that later)

"galaxies don't care how they are studied, but people do"

Another point that was made over and over is that  issues of fairness and bias are not abstract: they affect actual people. Keeping the human in focus is important for the ethical underpinning of what we do, and even how we might design experiments.

b) connecting with journalists


Nick Diakopoulos gave a talk on "algorithmic accountability" in journalism. In addition to talking about what made research on fairness newsworthy:

  • discriminatory/unfair practices
  • mistakes that denies a service
  • censorship
  • activities that break the law or social norms
  • false prediction

he made the strong argument that (government) legitimacy comes from transparency, and talked about what that might entail in the age of data driven policy, including transparency involving data collection, the algorithms used, the inferences generated, and the humans involved in the process.

(ed: I don't think that our demands on transparency should be limited to government entities: the sad fact is that at least in the US, much of what would be considered basic internet infrastructure is controlled by private corporations, and they should be held to similar standards: if not for legitimacy, at least for fairness)

c) connecting with the law


Our fearless leader Solon Barocas made a number of interesting observations on the connection between algorithmic fairness  and the law, all the while disclaiming IANAL :). But his point (which he's made before) is worth repeating. One of the things that computer science can do well is  make precise concepts that might be defined vaguely or only indirectly through case law. And then we can get to work teasing out the relationships between different concepts (both abstractly and computationally). Indeed, the idea of a "reduction" between concepts in fairness might be one of the most useful things that computer science can uniquely contribute.

It's clear we're in a "let a thousand definitions bloom" phase in fairness research. And it's interesting to see the different reactions to this: on the social science side, there appears to be some nervousness that we're "playing games with math", but from Solon's comments this doesn't seem like a bad thing as long as we're also trying to connect the definitions together.


 Question 2 
In your view, what’s the next most pressing question we should be asking (limited to INSIDE CS to distinguish from the previous question) ?

...better definitions


It was very clear from the discussion that we need broader definitions of F-A-T beyond what's mathematically plausible. One particular example that's reminiscent of the metrics for privacy: There's a notion of "utility": how much can we make the data or the task "fair" without changing the underlying results produced by the "unfair" data/algorithm. The problem is that utility itself is not very well defined. Firstly, you might be benefiting from discriminatory policies, so your perceived "utility" itself is a problem. Trying to maintain this defeats the purpose of fairness. Secondly, even if this is not the case, the framing of the question as a tradeoff implies that these two notions are necessarily in opposition. That shortchanges the moral imperative of fairness and is different from the parallel situation in privacy. Finally, we measure utility in terms of classifier accuracy. But that's a very poor notion of overall task effectiveness. For example, is there a Bayesian perspective to bring to this ?

At any rate, since we are good at understanding tradeoffs in computer science, we should understand the different dimensions of the space of fairness preserving methods, rather than limiting ourselves to a one-dimensional false dichotomy of "fairness vs utility".

...better usable artifacts


Nick asked us the following question at the panel:

when a CEO or an agency head comes to us and asks "what should we do about this fairness stuff". what do we tell them ?

We didn't have a good response, and that was interesting. While we're beginning to explore the space of what's possible, we don't have clear examples of artifacts to hand over and say "use this".

As usual, the topic of benchmarking came up. I joke that when industry folks bring up the issue of benchmarking, I always ask "so where's the data" and they usually go very silent. But I do think there are useful data sets to be explored that come to us from the government. Some common data sets that get used are the US census data on salaries and a German data set on consumer credit. The entire data set from the Ricci court case is also available (even though it's tiny), and there are Bureau of Justice recidivism data sets to play with.

Of course this goes against the imperative coming from the social scientists to look at specific domains and ask meaningful questions in that domain. And I think we need to look more at the literature on fairness and bias over the decades and extract data that people have studied.

...better problems


For the most part, researchers have been considering binary classification as the suspect task. But of course there are much more general tasks that we could be considering: what about unsupervised learning ? what about structured prediction ? Is there a way to define fairness when you don't have a simple binary response variable and binary attributes ?

One final question I asked was this:
 Question 3 
do we have to solve the causality problem in order to talk about fairness ? 

This question was possibly not as well-posed as I would have liked, but it led to interesting discussions.

The law deals with intent, because the goal of the law is to assign responsibility. Algorithms are not agents and can't exhibit intent. Causality is a proxy for intent, in that if we can say that something caused something else, we can assign blame in a different way. In fact there were two talks at the workshop that talked about causality directly in the context of fairness.

But causality is a very hard problem. It's extremely subtle (if you doubt this, read through some of the examples Judea Pearl discusses in his book), and extremely controversial: different camps have their own view of how to mechanize causal inference, and the battles there make frequentists and Bayesians look like life-long friends.

In the discussion that followed, it became clear that there were really two ways of thinking about causality as it relates to fairness. The first way is to think about the underlying causal mechanisms that might lead to otherwise innocent features leading to biased outcomes: that is, how might zip code correlate with racial identity for example. The second way, which is closer to what I had in mind, is to think about the behavior of an algorithm causally: the use of these inputs or this algorithm *caused* a particular decision to be made. This second idea is not as far-fetched as it seems: some work in the database community has looked at trying to find which tuples "caused" a certain output to be generated from a query.

If you think it's not important to understand causality as it comes to automated methods, you might not want to drive a self-driving car or fly a plane. But as Solon suggested in the discussion, one way of getting around causality is to think about negligence with respect to algorithms: can we design reasonable best practices for predictive tools and argue that a failure to use these methods is negligence ? The legal ramifications of these idea have been explored in the context of robotics (article, and response) but more work is yet to be done.

...back to narratives


Another comment by Nick D, again connecting to the journalism perspective: narratives and story telling are a powerful way to explain the results of data mining. I haven't talked much about interpretability, which is an important part of the larger discussion of transparency and accountability. But one way to communicate the results of (say) a fairness audit would be to provide a human-interpretable linkage between the problematic attributes being used for prediction and the protected attribute. For more on this, see Michael Nielsen's very timely new Quanta article on machine-generated explanations.

It's clear from all the discussion that there's a lot of work to be done and a small but active community of people interested in pushing these issues forward. Fairness, and algorithmic bias, are hot topics in the news nowadays, and it's a good time to take advantage of this burst of interest.

Friday, July 24, 2015

Racism/sexism in algorithms

For my sins, or more specifically because of my interest in fairness, I was asked to be on a panel discussing algorithmic racism/bias on The Stream, a web/tv show on Al Jazeera English (as opposed to Al Jazeera America, in case you're not already confused).

I've never done live TV before, so it was quite nerve-wracking. While my "role" was to be the tech explainer, I was worried about saying something I wouldn't be able to rephrase or take back, and I was worried about the 'shouting talking heads' nature of TV discussions.

I'm glad to say that none of this transpired. We had a very genteel discussion on the merits of the issue, and there was a lot of informed commentary on the topic. Personally I thought that the discussion leaned more heavily on dealing with bias via diversity in hiring rather than on structural bias in algorithm design, but it's a very important part of the bigger picture of fairness, and it's definitely a more accessible discussion than the arcana of algorithmic disparate impact (something that I hope to change:)).

Here's the show in its entirety. And irony of ironies, I went to twitter later to change my user name back to my full name, and Twitter wouldn't let me fill in my whole last name !


Saturday, July 04, 2015

On the different stages of learning and teaching (algorithms)

Descending a rabbit hole of links prompted by a MeFi discussion (thanks, +David Eppstein) of Steven Pinker's essay on the curse of knowledge (thanks, +Jeff Erickson), I came across an article by Alistair Cockburn on a learning framework inspired by aikido called 'Shu-Ha-Ri'.

In brief,

  • In the Shu stage, you're a beginning learner trying to find one way to solve a problem. It doesn't matter that there might be multiple ways. The goal is to learn one path, and learn it well. 
  • In the Ha stage, you understand one way well enough to realize its limits, and are ready to encounter many different strategies for reaching your goal. You might even begin to understand the pros and cons of these different approaches. In effect, you have detached from commitment to a single approach. 
  • In the Ri stage, you have "transcended" the individual strategies. You might use one, or another, or mix and match as needed. You'll create new paths as you need them, and move fluidly through the space of possibilities. 
Reading through this article while I ponder (yet again) my graduate algorithms class for the fall, I realize that this three-stage development process maps quite well to what we expect from undergraduates, masters students and Ph.D students learning about an area. 

The undergraduate is learning a tool for the first time (recurrence analysis say) and if they can understand the master theorem and apply it, that's pretty good. 

At the next level, they realize the limitations of the master theorem, and might learn about the Akra-Bazzi method, or annihilators, or even some probabilistic recurrence methods. 

Of course, once you're dealing with some thorny recurrence for the analysis in your next SODA submission, then the standard templates are helpful, but you'll often have to do something creative and nontrivial to wrestle the analysis into a form where it makes sense. 

Pick your own topic if you don't like recurrences. 

Which also explains why it's hard to explain how to prove things. Beginning students expect a standard formula (which is why induction and proof by contradiction get taught over and over). But once you go beyond this, there aren't really good templates. In effect, there's no good second level with a set of proof techniques that you can throw at most problems, which explains why students taking a grad algorithms class tend to struggle with exactly this step. 

Sunday, June 21, 2015

On Pixar, creativity and advising

I'm in Bertinoro for the Algorithms and Data Structures workshop organized by Camil Demetrescu, Andrew Goldberg and Valerie King. I will try to post updates from the event, but with the density of talks, no promises :). I'm still waiting to hear more about the STOC theoryfest deliberations from the conference: come on, bloggers !

In the meantime, I wanted to point to an excerpt from Ed Catmull's book on the Pixar process.

I don't generally enjoy "behind the scenes" books about "genius companies" or "genius" individuals. I feel that the word "genius" gets bandied around far too often to be useful, and there's too much 'gee whiz smart people do smart things' that perpetuates the 'Great Man' (and yes, Man) theory of intellectual discovery.

But I enjoyed the excerpt, and am now interested in reading the book (Disclaimer: Ed Catmull is ONE OF US). Catmull doesn't appear to trot out trite recipes for success. If the excerpt is any indication, it's an extremely thoughtful and nuanced take on what worked at Pixar and why, and brings in many voices of the actual people doing the hard work to make movies. Here's a paragraph on leading a team as a director:
Andrew likens the director’s job to that of a ship captain, out in the middle of the ocean, with a crew that’s depending on him to make land. The director’s job is to say, “Land is that way.” Maybe land actually is that way and maybe it isn’t, but Andrew says that if you don’t have somebody choosing a course—pointing their finger toward that spot there, on the horizon—then the ship goes nowhere. It’s not a tragedy if the leader changes her mind later and says, “Okay, it’s actually not that way, it’s this way. I was wrong.” As long as you commit to a destination and drive toward it with all your might, people will accept when you correct course.
This reminds me very much of things I say to my students as an advisor. To whit, "It's important to have some direction you're heading towards with a problem, even if that direction turns out to be a bad one". First of all, it keeps you moving forward instead of around in circles. Second, (and this is particularly true in research), even a failed exploration of a direction teaches you more about a problem than going in circles without a plan. This manifests itself in a few different ways:

  • If you go this way, are you getting somewhere interesting (an interesting problem, an interesting structural insight, something that is novel regardless of success or failure)
  • Do you have some sense that this approach might work ? In other words, a fishing trip with a new tool is fine, but a fishing trip with a new tool that you have some good feelings about is even better. This reminds of of a comment that I've heard before but took me a while to understand: "I don't try to solve a problem till I think I know how to solve it". 
  • Can you test the direction quickly ? This is a corollary of the 'hurry up and fail' concept that startups like to preach, without the valorization of failure. That is, since we accept that a direction might not pan out, it would be best to find ways to test this quickly so we can move on. This doesn't mean that repeated failure is good, or as Gavin Belson would say, "FAILURE = PRE-SUCCESS".  

Tuesday, May 19, 2015

ITA, or a conference I really enjoy.

Continuing my thoughts on the STOC 2017 reboot, I went back to Boaz's original question:

What would make you more likely to go to STOC?

And thought I'd answer it by mentioning an event that I really enjoy attending. I didn't post it as a comment because it's a little out of scope for the blog post itself: it doesn't make concrete recommendations so much as relay anecdotal evidence. 

The Information Theory and Applications workshop is a workshop: it doesn't have printed proceedings, and it encourages people to present work that has been published (or is under review) elsewhere. Keep that caveat in mind: the structure here might not work for a peer-reviewed venue like STOC. 

Having said that, the ITA is a wonderful event to go to. 
  • It's in San Diego every year in February - what's not to like about that
  • It runs for 5 days, so is quite long. But the topics covered change over the course of the 5 days: the early days are heavy on information theory and signal processing, and the algorithms/ml/stats shows up later in the week. 
  • There are multiple parallel sessions: usually 5. And lots of talks (no posters)
  • There are lots of fun activities. There's an irreverent streak running through the entire event, starting with the countdown clock to the invitations, the comedy show where professional comedians come and make fun of us :), various other goofy events interspersed with the workshop, and tee-shirts and mugs with your name and picture on them. 
The talks are very relaxed, probably precisely because there isn't a sense of "I must prove my worth because my paper got accepted here". Talk quality varies as always, but the average quality is surprisingly high, possibly also because it's by invitation. 

But the attendance is very high. I think the last time I attended there were well over 600 people, drawn from stats, math, CS, and EE. This had the classic feel of a 'destination workshop' that STOC wants to emulate. People came to share their work and listen to others, and there was lots of space for downtime discussions. 

My assertion is that the decoupling of presentation from publication (i.e the classical workshop nature of ITA), makes for more fun talks, because people aren't trying to prove a theorem from the paper and feel the freedom to be more expansive in their talks (maybe covering related results, or giving some larger perspective). 

Obviously this would be hard to do at STOC. But I think the suggestions involving posters are one way of getting to this: namely, that you get a pat on the back for producing quality research via a CV bullet ("published at STOC") and an opportunity to share your work (the poster). But giving a talk is a privilege (you're occupying people's time for slice of a day), not a right, and that has to be earned. 

I also think that a commenter (John) makes a good point when they ask "Who's the audience?". I'm at a point where I don't really enjoy 20 minutes of a dry technical talk and I prefer talks with intuition and connections (partly because I can fill in details myself, and partly because I know I'll read the details later if I really care). I don't know if my view is shared by everyone, especially grad students who have the stamina and the inclination to sit through hours of very technical presentations. 




Monday, May 18, 2015

STOC 2017 as a theory festival

Over on Windows on Theory, there's a solid discussion going on about possible changes to the format for STOC 2017 to make it more of a 'theory festival'. As Michael Mitzenmacher exhorts, please do go and comment there: this is a great chance to influence the form of our major conferences, and you can't make a change (or complain about the lack of change) if you're not willing to chime in.

I posted my two comment there, and you should go and read number one  and number two. Two things that I wanted to pull out and post here are in the form of a 'meta-suggestion':
1. Promise to persist with the change for a few years. Any kind of change takes time to get used to, and every change feels weird and crazy till you get used to it, after which point it’s quite natural. 
Case in point: STOC experimented one year with a two-tier committee, but there was no commitment to stick to the change for a few years, and I’m not sure what we learned at all from one data point (insert joke about theorists not knowing how to run experiments). 
Another case in point: I’m really happy about the continued persistence with workshops/tutorials. It’s slowly becoming a standard part of STOC/FOCS, and that’s great. 
2. Make a concerted effort to collect data about the changes. Generate surveys, and get people to answer them (not as hard as one might think). Collect data over a few years, and then put it all together to see how the community feels. In any discussion (including this one right here), there are always a few people with strong opinions who speak up, and the vast silent majority doesn’t really chip in. But surveys will reach a larger crowd, especially people who might be uncomfortable engaging in public.

Friday, May 15, 2015

Higher order interactions and SDM 2015

This year I'm one of the PC Chairs for SIAM Data Mining (along with Jieping Ye), and so I've been spending time in decidedly-not-sunny Vancouver. Being a PC Chair, or even being on a PC, is an exercise in constant deja vu: I hear a talk and think "Where have I heard this before" before realizing that I've probably reviewed the paper, or looked at its reviews or meta-reviews.

Being the PC chair means though that I can float around the conference freely without feeling the pressure to attend talks, network or otherwise be social: I've earned my keep :). Following standard conference networking maxims though, I made an effort to meet at least one coffee shop that I've met before and introduce myself to at least one new coffee shop, and another.

But I am attending talks ! And I enjoyed listening to some work on tensor spectral clustering by Benson, Gleich and Leskovec. It got me thinking about the larger issue of modeling higher-order interactions and what appear to be many different ways of modeling the problem.

The problem.


Imagine that you have a number of interacting entities. These could be points in a space, or vertices in a graph, or even dimensions of a point (aka variables). The easiest way to model a collection of such entities is to assume they're independent of each other. For example, I might draw i.i.d samples from a distribution. Or I might be looking at a collection of independent features describing an object, and so on. 

Independence assumptions are powerful. They allow us to "factorize" structure into combinations of individual elements, which either leads to simple summations directly, or eventually after a log transformation of a product. This means we can deal with entities independently, and the "inference complexity blowup" is linear in the number of entities. A good example of this is the Naive Bayes approach to learning, where assuming all entities are independent leads to a likelihood cost function that's just a sum of terms, one for each entity.

I'm necessarily being rather loose with my statements here. Making them precise is possible in specific contexts, but it gets unwieldy. 

But independence is too restrictive an assumption. It limits modeling power, and therefore will be unable to capture interactions that might make understanding structure a lot easier. For one thing, you'd never find correlations if you assumed that all features are independent.

Graphs. 


The easiest form of interaction is a pairwise interaction. Modeling pairwise interactions gets us to a graph, and who doesn't love a graph ! More importantly for what follows,

a graph and its associated structures is a rich representation of a system of pairwise interactions

in that we have a colorful vocabulary and an arsenal of algorithms for talking about pairwise interactions and structures built on them.

Of course we've paid a price - in complexity. Instead of the linear cost incurred by independent entities, we now have quadratically many potential pairwise interactions to model. But (and here's the key), we can interpret a sparse graph as capturing weak interactions, and it's still a rich model to model different phenomena.

Higher-order interactions.


But what happens if we want to model interactions that aren't just pairwise ? What is the correct higher-order structure to model such interactions as effectively as graphs ? It turns out that there are many different ways to do this, and they all can be reduced to a sentence (pace Saunders MacLane) of the form

A graph is just a very special kind of X
for different values of X. 

1. The graphical model view.


A graph is just a special kind of clique intersection structure, if you only have 2-cliques.

One way to manage a collection of higher order interactions is to factorize them in a more general way. This is the basis for the clique-tree idea in graphical models, where the interaction structure is viewed as a set of complete joint interactions (aka cliques) all connected together in a tree structure for easy inference. Another name for this would be the 'bounded treewidth' model, but it misses the fact that we are allowing higher-order interactions, but in a controlled way. 

The advantage of this representation is that in a true parametrized complexity way, it isolates where the true complexity is coming from (the size of each clique) from the overall complexity (the size of the graph). 

A spectral perspective.


When graphs arise from natural sources of data (social networks and the like), we have to deal with noise and spurious signals. And the simple language of connectivity and paths is no longer robust enough. For example a graph might be connected, but only because there's one edge connecting two huge components. If this edge was spurious, we've just made a huge mistake on modeling this graph structure. 

Spectral methods are currently our best way of dealing with noisy interactions. By focusing not on the topological structure of connectivity but on the amount of connectivity measured via cuts, spectral analysis of graphs has become perhaps the best way of finding structures in large graphs.

The spectral lens sees a graph through random walks on the edges. This is great for modeling a collection of pairwise interactions between entities, but not for modeling interactions among sets of entities. We have to be careful here. Spectral methods are actually quite good at finding community structure in graphs (i.e a partition into sets of vertices). What they can't do is find higher order partitionings in graphs (i.e sets of triangles or sets of special 4-vertex structures). And that's where the next three higher-order methods enter the picture

2. The algebraic topology view.


A graph is just the 1-skeleton of a simplicial complex. 

If we're looking to model higher-order interactions, we need a language for describing a collection of well-defined higher order structures. That's what a simplicial complex is. I'll skip the formal definition, but the basic idea is that if you have a simplex (an interacting group of entities), then all subsets must be simplices as well. But you declare the simplices first, which means that the simplex $\{a,b,c\}$ is different from the three simplices $\{a,b\}, \{b, c\}, \{c, a\}$, even though the first must contain the second. 

A simplicial complex is a topological object. It generalizes a graph because a graph is what you get if you limit yourself to simplices of size at most 2. Because it's a discrete topological object, you can now play with it using all the tools of topology, and in particular very powerful tools like homology and homotopy that reveal all kinds of hidden structure not accessible via a graph metaphor.

While simplicial complexes allow you to express higher order interactions (tunnels! holes!), they don't remove the problem of noise: one edge/simplex can still change the structure in nontrivial ways. There are two approaches that researchers have taken to this problem: one spectral, and one not. 

The non-spectral approach is by far the most well-established one. It is based on the idea of persistence:  a way to determine from the homology groups of the simplicial complex what structures are interesting and which ones are not. Persistence is the dominant weapon in the arsenal of topological data analysis, and I'll say no more about it here, so as to keep focus on spectral methods. 

The spectral approach is less well developed, but is quite interesting. The idea is to generalize the notion of expansion from a graph to higher-order simplices, as well as generalizing the Laplacian operator to higher order simplices (or homology groups). Then a random walk on the simplicial complex can be linked to the Laplacian operator, and the eigenstructure of the operator can be linked to the existence (or nonexistence) of certain homology groups. Two places to start with this topic are one on capturing generalizations of edge expansion, and another on building random walks on simplicial complexes and connecting them to a combinatorial Laplacian. 

3. The differential geometry view.


A graph is just a discrete approximation of (scalar functions on) a manifold. 

The graph Laplacian is a discrete approximation of the Laplacian second order differential operator, and more generally the Laplace-Beltrami operator on a manifold. Indeed, one way to build intuition for what the graph Laplacian means is that it's capturing heat diffusion on an implicit manifold that the graph is merely approximating.

The Laplace-Beltrami operator is a "zeroth-order" operator in that it applies to the zero-dimensional entities on a manifold: namely, scalar fields over the points of the manifold. Suppose you were to build a vector field instead over the manifold, and wished to reason about it. Then the generalization of the L-B operator that you'd need is called the Laplace-de Rham operator which formally acts like a Laplacian on the higher order differential forms defined over the manifold (formally, on sections of the tangent bundle). Writing down the L-R operator is a little tricky: it involves a combination of the exterior derivative and its dual (via the Hodge * operator). But one useful observation is that the L-R operator on graphs amounts to a Laplacian on the set of edges, rather than vertices.

This means that you can now treat edges as first-class objects for grouping, rather than vertices. And this is useful for higher-order clustering. Whether this can be generalized even further remains to be seen.

4. The linear algebraic view.


A graph (adjacency matrix) is just a special case of a tensor structure on the entities.

This is perhaps the most well-known of the different higher-order approaches to modeling interactions, and is making the most waves right now. The idea is very simple. If we think of a graph in terms of its adjacency matrix, then each entry encodes a relation between two basic entities. If we wished to encode relations between (say) three entities, then we need a "3D matrix", or more precisely, a tensor

Of course a tensor is more than just a way to assemble a collection of triples into a box, just like a matrix is much more than just a grid of numbers. The most basic question in tensor modeling is factorization: just like we can use the SVD to write down a matrix as a linear combination of outer products of basis vectors, can we write a tensor as 3-way wedge product of basic vectors ? If so, then we've been able to identify the key 3-way factors controlling an interaction. 

Tensor factorization is a hard problem: unlike the SVD, most formulations of tensor factorization are NP-hard. I won't get into this very rich topic right now, and instead will point you to some slides by Ravi Kannan as well as an older paper of his

But can we cluster using tensors ? Or more generally, is there a spectral theory of tensors, and maybe even the analog of Cheeger's inequality ? It turns out that it's possible to define eigenvalues and eigenvectors of tensors, and at least some of the standard spectral theory can be made to carry over - for more on this see these slides by Lek-Heng Lim. But we're still looking for a theory can truly work on hypergraphs or tensors in general. In the meantime, tensor-based approaches to clustering boil down to factorization and PCA-like methods, of which there are many. 


5. Coda


Are these approaches in any sense equivalent ? It's hard to say in general, although for a particular problem there might be connections. Certainly, the de Rham cohomology yields a nice filtration over homology groups that could be wrestled into a persistence framework (research question !!!) thus allowing us to extract robust higher-order structures from both geometric and topological considerations. The tensor approach is closely linked to probabilistic models, not surprisingly. But whether the spectral perspective (and all its attendant intuition and value) will extend nicely across these different frameworks remains to be seen. 




Tuesday, May 12, 2015

Special Issue of Internet Mathematics on Evolving Networks

Aris Gionis and I are co-editing a special issue of Internet Mathematics on evolving networks.
The widespread adoption of digitization has opened the way to gathering large amounts of data that record detailed information for many systems of interest. Examples include telecommunication networks, online social-media platforms, and biological systems. Many of these systems are typically represented as networks, and graph-theoretic techniques are used to analyze the available data. Furthermore, as our data-gathering capacity has increased, it is now possible to collect data the record not only a static aggregate view of the underlying network, but a continuous stream of events that captures the full dynamic behavior of the network. Such events may take the form of structural changes, or they may encode different types of actions and interactions performed by the network entities. This view of time-evolving networks poses new challenges and opens new research directions. The objective is to develop the theoretical foundations and to design the algorithmic principles that will allow to efficiently manage and analyze such evolving networks.

Read more about it here.  Submissions are due Oct 15, 2015. 

Thursday, April 30, 2015

LaTeX, Word, ggplot, and n00b-ness

I've been forcing myself to learn enough R and ggplot to make nice looking plots. It's taking me painfully long to learn to do even the most basic things, and +Carlos Scheidegger has been a big help !

What makes learning ggplot hard is grasping its underlying design philosophy. There's a notion of layers in a plot that I'm still getting used to. Once you get the hang of it it's quite elegant, and makes modifying plots very easy -- once you get the hang of it.

All of which makes me a little more sympathetic to people who struggle with using LaTeX. LaTeX has many of the same unnatural design principles, starting with the lack of a WYSIWIG interface (and no, LyX doesn't really count). It's an incredibly powerful interface (like ggplot), but it's darned hard to do simple things.

In fact, I've been wishing for a "Word equivalent" for ggplot2. Just like overleaf.com is now trying to make LaTeX easier for the general public, I would love to see some kind of interactive interface to ggplot2 that can generate the code automatically for standard tasks.

Disqus for The Geomblog