Saturday, June 19, 2010

The Shape of Shape Analysis Research, Part III

(a brief series on shape analysis: for earlier episodes, click here)

Shape matching research in computational geometry is fundamentally distance-based. In other words, we start with a distance function, and then design algorithms to compute it, or minimize it under transformations, or approximate it, and so on.

There's an important problem with this point of view. While computing the distance between two shapes is an important tool in shape analysis, it's not the only problem. Other equally important problems include:
  • Finding a shape similar to a query shape
  • Matching pieces of shapes together
  • Organizing shapes into groups (i.e clustering)
And so the problem with the distance-based viewpoint is that all you get at the end is an abstract metric space. You can compute d(x,y) in an appropriate amount of time (maybe), but you lack all the additional structure needed to solve these other problems efficiently. With our modern knowledge of metric embeddings, it's always possible to ask if these distances can be embedded in a more tractable space, but it turns out for measures of interest (Hausdorff, Frechet, earthmover), this cannot be done without incurring huge errors.

The idea of shape spaces turns this process around. Rather than starting with the distance, and trying to find a space to embed it in, shape-space based methods start with a mapping that takes a shape to a single point in a (usually curved) space, and use an induced metric (usually some kind of geodesic) as the distance.

By at least one unsourced account, this view of shape dates back to Riemann, but the modern formulation of this approach started with David Kendall, in the 70s. His idea was extremely elegant.

Consider a collection of closed simply connected regions of the plane (the shapes), each shape described by k points on its boundary. Each of these points can be described by the two coordinates (x,y), which we will write as the complex number x+iy. By a shifting transformation,  we can ensure that the centroid of each shape lies at the origin. This loses one (complex) degree of freedom, yielding a k-1 dimensional complex vector.

Next, consider what it means to rotate the shape around the origin. In the complex plane, this corresponds to multiplying by the complex number z = exp(i theta). Doing the appropriate projective transformation, this means that we can identify a shape with a single point in k-2 dimensional complex projective space.
The distance between two shapes is now defined as the geodesic distance between two points in this space.

There are a few important points to note here:
  1. Each shape of k points is mapped to a single point in a k-2 dimensional space.
  2. All shapes are assumed to have the same number of points, which correspond across shapes. 
  3. The space is constructed by quotienting the original representation (the k-dimensional complex vector) by the special orthogonal group.
This last point is particularly crucial: the invariance under transformations is folded directly into the representation, rather than being something to "solve" via minimization.

The general program outlined by Kendall (map shapes to points on a manifold quotiented by a suitable set of transformations) has led to many other constructions, among the more notable being Bookstein's shape space and the Michor-Mumford representation for planar closed curves invariant under diffeomorphisms (which bears a strong resemblance to a summed variant of the Frechet distance). These methods have (for reasons unknown to me) taken up residence primarily in the computer vision community.

A Critique.

There is much to like about the shape space approach to shape analysis. Fundamentally, by embedding shapes in a space with structure, it gives us both a distance measure and a geometry to play with, and this is invaluable. However, there are serious limitations to the ideas developed thus far.
  • Computation: It's all very well to come up with a mathematically elegant formulation of a distance as a geodesic, but it's a lot harder to actually compute these distances. In practice, researchers often resort to heuristics with no guarantees beyond local convergence. To me, this is like building a beautiful mansion in a pit of mud: it's hard to get in and out with a lot of dirt and pain. 
  • Scalability: the mathematical complexity also makes it harder to do scalable computations on shapes.
  • Global vs local features: I'll have more to say about this later, but these approaches (generally speaking) construct a global signature for a shape, which limits one's ability to do partial matching. 
  • Correspondences: The Kendall method at least requires explicit correspondences between points in each shape. Finding correspondences is one of the most annoying parts of shape analysis (and affect most methods for comparing shapes). 
Next: We examine the problem of hearing shape, or how the Laplacian starts to figure in. 


  1. A minor comment:

    ask if these distances can be embedded in a more tractable space, but it turns out for measures of interest (Hausdorff, Frechet, earthmover), this cannot be done without incurring huge errors.

    This is true for most such measures. However, Hausdorff is actually an exception: it is possible to embed Hausdorff over planar point-sets of size up to s into l_infty with dimension about s^2, with arbitrary small distortion (reference on demand).

    This, of course, does not affect the general point you are making.

  2. I'd like to hear your comments on your colleagues' ideas of using principal geodesics in shape spaces...

  3. I like the idea. in fact, I've been working with them on related matters. Overall, what I think this direction needs is *more* algorithmic work to make the procedures more rigorous.


Disqus for The Geomblog