Wednesday, February 13, 2008

Metric spaces, VC-dimension, and metric entropy.

For a problem that I've been working on, it turns out that if a related range space has bounded VC-dimension, the problem can be solved exactly (but with running time exponential in the dimension). The range space is constructed from two parameters: a metric space (X, d), and a radius e, and consists of the domain X, and all balls of radius at most e in X.

So a natural question that I've been unable to answer is:
What properties of a metric space ensure that the induced range space has bounded VC dimension ?
Most of what we do know comes from the PAC-learning community. For instance, the doubling dimension of a metric space is the smallest number d such that any ball of radius e can be covered by at most 2^d balls of radius e/2. In recent years, it has been popular to explore the extent to which
"metric space of bounded doubling dimension == bounded dimensional Euclidean space"
is true. Unfortunately, there are spaces of bounded VC-dimension that do not have bounded doubling dimension.

Another related proxy for the "dimension" of a metric space is its metric entropy: Determine the minimum number of balls of radius e needed to cover all points in a metric space. The log of this number is the metric entropy, and among other things is useful as a measure of the number of points needed to "cover" a space approximately (in a dominating set sense). It's known that the metric entropy of concept classes is closely related to their VC-dimension, (where the underlying metric is symmetric difference between the classes). I am not aware of any general result that relates the two though.

For more on the dizzying array of numbers used to describe metric space dimension, do read Ken Clarkson's magnificent survey.

[On a side note, I don't quite understand why the term "entropy" is used. It seems to me that if one wanted to use entropy, one would compute the entropy of the resulting set of balls, rather than merely the log of their number. ]

Disqus for The Geomblog