## Tuesday, May 04, 2010

### Metrics on distributions defined over metric spaces

Semester is over, which means I hope to get back into blogging form again, picking up my clustering series where it left off, and also starting some new rants on problems in shape matching (which more and more to me look like problems in clustering).

But for today, just a little something to ponder. The following situation often occurs in data analysis. You have some data that inhabits a metric space - usually Euclidean space, but that doesn't really matter. You also have distributions over the data, by which I mean some kind of weight vector with one "component" for each data point, and components summing to one. The goal now is to compare these weight vectors in a way that takes into account the structure of the space.

The standard construction that one uses here is the earthmover distance, also known as the Wasserstein distance, or the Monge-Kantorovich distance, or the Mallows distance, or the transportation metric (you pick your favorite one). It's very intuitive (which is probably why it's been invented so many times) and works like this. Imagine piles of earth at each data point, with each pile having mass equalling the weight at that point. We have a "starting configuration" consisting of the first weight vector, and an "ending configuration" consisting of the second weight vector. The goal is to figure out how to "transport" the earth with minimum effort (effort = weight X distance moved) so that the first configuration becomes the second. Formally, this amounts to a generalized matching problem that can be solved via the Hungarian algorithm. The earthmover distance is very popular in computer vision (see the Wikipedia article for details)

Another metric over distributions of the kind above is the Levy-Prokhorov metric, which for two distributions u and v is defined as the smallest e such that on any neighborhood A, the measure of u is within e of the measure of v on A inflated by e (i.e by constructing a ball of size e around A), and vice versa. I haven't seen this metric used much in practice, and it seems hard to compute.

Another approach that I realized recently uses a method that thus far has been used only in shape analysis. I've been working with a shape matching measure called the current distance of late (more on this in another post - it's a fascinating measure). Roughly speaking, it works like this. It starts with a "shape" defined anyway you like (clouds of points, a curve, a surface, whatever), and a similarity function (a positive definite kernel actually) defined on this space. It then allows you to compare these shapes by using the kernel to create a "global signature" that lifts each shape to a point in a Hilbert space, where the induced distance captures the shape distance.

It also works with weighted point sets, which is the relevant point here. Suppose I give you a space with distance defined indirectly via a kernel similarity function (rather than via a distance function). The current distance then gives me a way of comparing distributions over this shape just like the above measures, and the kicker is that this approach is WAY more efficient then any of the above methods, taking near-linear time instead of needing the rather expensive Hungarian algorithm. Moreover, the current distance has a built-in isometric embedding into a Hilbert space, something the earthmover distance cannot have.

If you're curious for more details, wait for my post on the current distance - in the mean time, there are two papers we've written (one online, the other you should email me for) that explore the theory and practice behind the current distance. I'm curious now as to whether the current distance can be used an efficient replacement for the earthmover distance in applications that rely on the EMD, but don't have a natural relationship to shape analysis.

p.s Shape matching in particular, and data analysis in general, is a rich source of ever more exotic metric spaces. I've been working with a number of these, especially in non-Euclidean spaces, and there are lots of interesting algorithms questions here.

1. One reason for interest in the Mallows and Prohorov metrics on the part of probabilists and statisticians is that they metrize convergence in distribution, i.e., for a sequence of probability measures P_n and a limit measure P, d(P_n,P) -> 0 in these metrics iff P_n converges to P in distribution. A bunch of other metrics or divergences on probability distributions which are more intuitive (like Kullback-Leibler divergence) actually induce finer topologies than that of convergence in distribution. Do you happen to know if the current distance also metrizes convergence in distribution?

2. For the earth mover distance, say I have u,v, and EMD(u,v) = e.

Can I say given v, e, that the inverse of EMD=e given v is exactly u? If so, do I get the same guarantee from Levy-Prokhorov?

3. Ooh, Cosma nailed what I was thinking about.

4. Cosma, that's an excellent question. I don't have an answer for you off the top of my head. One good thing about the current distance is that it embeds in an RKHS which is "nice" in many ways, so there's some hope that a metrizing result exists. I'll have to think more about it.

5. Hey, look what just popped up (literally) in my RSS feeds:
Hilbert Space Embeddings and Metrics on Probability Measures. I guess this fills my monthly allowance of one-in-a-million coincidences.

6. ah cool. it looks like it's exactly what I was talking about. excellent. and I note that they have metrizing results as well.