Showing posts with label distributions.

Monday, September 16, 2013

SIGACT CG Column: Moving heaven and earth

My new column for SIGACT News is up (non pay-walled version here). In it, I discuss two different ways of defining distances between distributions.

The earthmover distance is a familiar measure to TCS folk, and the kernel distance (also known as the maximum mean discrepancy, and closely related to the distance covariance) is a familiar character in machine learning and statistics.

It turns out that, while the two are not as different statistically as one might think, they behave quite differently when it comes to computing them and understanding their geometry.

The column describes both measures and their properties, and compares and contrasts them. More generally, I try to describe how one should think about the process of constructing a distance function between two distributions.
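
To make the contrast concrete, here is a minimal sketch (not from the column) that computes both quantities for two one-dimensional samples. It assumes Python with numpy and scipy, a Gaussian kernel with bandwidth 1, and made-up data; the kernel distance is estimated by the plug-in formula sqrt(E k(X,X') + E k(Y,Y') - 2 E k(X,Y)).

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    # Two small 1-d samples standing in for empirical distributions (toy data).
    P = rng.normal(0.0, 1.0, size=200)
    Q = rng.normal(0.5, 1.2, size=300)

    # Earthmover (1-Wasserstein) distance on the line.
    emd = wasserstein_distance(P, Q)

    # Kernel distance with a Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 s^2)).
    def mean_kernel(X, Y, s=1.0):
        D = X[:, None] - Y[None, :]
        return np.exp(-D ** 2 / (2 * s ** 2)).mean()

    kdist = np.sqrt(mean_kernel(P, P) + mean_kernel(Q, Q) - 2 * mean_kernel(P, Q))
    print(emd, kdist)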

Wednesday, February 09, 2011

My talks at ITA and at the college of engineering in Montana State

This is the abstract for the talk I'm giving in brief at the ITA Workshop and in expanded form at a College of Engineering colloquium at Montana State. Thanks to the ITA program committee and Brendan Mumey (at Montana State) for the invitations.

Dimensionality reduction for distributions: the good and the bad

In many application areas, the natural representation of data is in the form of an empirical probability distribution. Documents are represented as normalized frequency counts of words, images are represented by color (or other feature) histograms, and speech signals can be represented by spectral histograms.

There are many natural and meaningful ways of measuring similarity (or distance) between such distributions, and these define different geometries in which we might wish to analyze collections of distributions for interesting patterns. However, a practical bottleneck is the high dimensionality of these representations: for example, a 256 x 256 image might be represented by a vector of over 1000 features, and a document might be represented as a sparse vector with hundreds of attributes.

Thus, a key problem is: can we reduce the dimensionality of collections of distributions to make data analysis tractable while preserving the distances in the collection?
In this talk, I'll discuss a collection of recent results centered around this theme that provide both good news and bad for dimensionality reduction on distributions, in theory and in practice.
The above draws mostly on this paper with Arvind Agarwal and Jeff Phillips, and this work-in-progress with Rasmus Kyng and Jeff Phillips.
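
As a generic illustration of the flavor of question here (and not the constructions from the papers just mentioned), the following sketch builds normalized word-count histograms for a made-up corpus, projects them with a Johnson-Lindenstrauss-style Gaussian random map, and reports the worst relative distortion of pairwise Euclidean distances. The corpus size, vocabulary size, and target dimension are arbitrary toy values; the code assumes Python with numpy.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy stand-ins for high-dimensional empirical distributions: 50 "documents",
    # each a normalized word-frequency histogram over a 5000-word vocabulary.
    counts = rng.poisson(0.2, size=(50, 5000)).astype(float)
    hists = counts / counts.sum(axis=1, keepdims=True)

    # A Johnson-Lindenstrauss-style Gaussian random projection to k dimensions;
    # this approximately preserves Euclidean distances between the histograms.
    k = 100
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(5000, k))
    low = hists @ R

    def pairwise(X):
        G = X @ X.T
        d = np.diag(G)
        return np.sqrt(np.maximum(d[:, None] + d[None, :] - 2 * G, 0.0))

    orig, proj = pairwise(hists), pairwise(low)
    mask = ~np.eye(50, dtype=bool)
    print(np.abs(proj[mask] / orig[mask] - 1).max())  # worst relative distortion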

Tuesday, May 12, 2009

Joint distributions aren't that bad

I was reading Noam Nisan's exposition of correlated equilibria. It's interesting to note that finding a correlated equilibrium is easy: you solve a linear program whose variables are the probabilities of the underlying joint distribution over action profiles. This is in contrast to Nash equilibria.
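
As a concrete sketch of that linear program (my own illustration, not from Nisan's post): for a two-player game the variables are the joint probabilities p(i, j) over action profiles, the constraints say that obeying a recommended action is at least as good as any deviation, and any linear objective can be optimized over the resulting polytope. The example below uses the standard game of Chicken and scipy's linprog.

    import numpy as np
    from scipy.optimize import linprog

    # Game of Chicken (a standard illustration): action 0 = Dare, 1 = Chicken.
    UR = np.array([[0, 7],
                   [2, 6]])  # row player's payoffs
    UC = np.array([[0, 2],
                   [7, 6]])  # column player's payoffs
    n, m = UR.shape

    # Variable x[i * m + j] = p(i, j), the joint probability of profile (i, j).
    A_ub, b_ub = [], []

    # Row player: obeying a recommended i must beat deviating to any i'.
    for i in range(n):
        for ip in range(n):
            if ip == i:
                continue
            row = np.zeros(n * m)
            for j in range(m):
                row[i * m + j] = UR[ip, j] - UR[i, j]
            A_ub.append(row)
            b_ub.append(0.0)

    # Column player: the symmetric incentive constraints.
    for j in range(m):
        for jp in range(m):
            if jp == j:
                continue
            row = np.zeros(n * m)
            for i in range(n):
                row[i * m + j] = UC[i, jp] - UC[i, j]
            A_ub.append(row)
            b_ub.append(0.0)

    # Probabilities sum to 1; maximize total welfare (minimize its negation).
    res = linprog(-(UR + UC).flatten(), A_ub=A_ub, b_ub=b_ub,
                  A_eq=[np.ones(n * m)], b_eq=[1.0],
                  bounds=[(0, None)] * (n * m))
    print(res.x.reshape(n, m))  # a welfare-maximizing correlated equilibrium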

What struck me is that this is a good example of where a joint distribution isn't that bad. In machine learning especially, trying to "learn" a joint distribution is hard, both because the number of variables blows up and because of the nasty correlations one has to account for. In fact, the area of variational methods often uses the trick of replacing a joint distribution by the "closest" product distribution, and optimizing over that instead.
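
A tiny worked example of that trick, under one specific reading of "closest": if closeness is measured by KL(p || q) and we optimize over product distributions q, the minimizer is simply the product of p's marginals, and the leftover divergence is the mutual information of p. (Mean-field methods usually minimize the reverse divergence KL(q || p), where the answer is different in general; the sketch below, with a made-up 2x2 joint, only illustrates the forward direction.)

    import numpy as np

    # Made-up joint distribution p over two binary variables.
    p = np.array([[0.40, 0.10],
                  [0.05, 0.45]])

    # The product distribution closest to p in KL(p || q): the product of marginals.
    q = np.outer(p.sum(axis=1), p.sum(axis=0))

    def kl(a, b):
        return float(np.sum(a * np.log(a / b)))

    # The irreducible gap equals the mutual information of the joint.
    print(kl(p, q))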

Here though, "replacing by a product distribution" takes you from a correlated equilibrium problem to a Nash equilibrium problem, and now the individual probabilities actually multiply, yielding a nonlinear problem that's PPAD-complete.
