Ruminations on computational geometry, algorithms, theoretical computer science and life

## Friday, June 08, 2007

### My Biased Coin: A New Blog !!

It's been a while since I've been able to announce a new CS blog. Please point your bookmarks/RSS newsreaders/browsers to Michael Mitzenmacher's new blog, My Biased Coin. Michael has occasionally posted over here, and has often had comments and suggestions for me. When he's not brokering peace agreements between power law distributions and log normal distributions, Michael corrupts young grad students at Harvard for a living, filling their minds with all kinds of random bits that are occasionally error corrected.

Labels:
blogosphere,
community

### SoCG 2007: CUP takes over all of geometry...

The good news from today is that the Demaine-O'Rourke book on folding and linkages is finally out. Erik had a copy of the book at the conference, and it's a handsome volume that will hopefully soon reside on my bookshelf.

The authors have a web page associated with the book, and it has a number of applets that go along with constructions from the text.

The publisher of this book is Cambridge University Press, which is great, because I love their fonts :). CUP clearly has a plan for domination of the entire computational geometry catalog: along with the folding book, they are publishing:

- Tamal Dey's book on curve and surface reconstruction.
- Subir Ghosh's new book on visibility algorithms in the plane.
- Narasimhan and Smid's book on geometric spanners.

Labels:
community,
miscellaneous,
socg

## Wednesday, June 06, 2007

### SoCG 2007: Approximate Clustering

I was listening to a couple of talks that improve known bounds for various kinds of approximate clustering in high dimensions, and I got to thinking.

One of the revolutions in geometry over the last 10 years has been the development of nontrivial tools for dealing with approximations in high dimensions. This is of course necessitated by the curse of dimensionality, and the hardness of most high-D data analysis problems (most exact solutions are exponential in the dimension). So rather than computing the optimal solution to some problem on n points in d dimensions, we compute a (1+ε)-approximation to the optimal solution.

One problem this creates is that every algorithm is now described by a complicated expression involving three parameters (n, d, ε). Some algorithms are exponential in 1/ε but polynomial in d. Some are polynomial in 1/ε and exponential in d. The exponent could have a base of n, or 2, or even d, or 1/ε.

In short, it's an unholy mess of strictly incomparable methods that trade off different parameters against each other.

This makes life for me, the "user" of approximation technology, rather difficult. What I'd really like to understand are the gadgets that transform one kind of bound into another (and there are many such gadgets: discretization, enumeration, random sampling, etc.). But it's hard to glean these from the papers directly: these gadgets (the really useful tools) are buried deep inside lemmas, and inside people's heads.
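To make one of these gadgets concrete, here's a minimal sketch of the discretization trick (my own toy illustration, not taken from any particular paper): snap the n points to a grid of cell width ε, so that each point moves by at most (ε/2)√d while the number of distinct representatives depends only on the number of occupied cells, not on n.

```python
from collections import defaultdict

def snap_to_grid(points, eps):
    """Discretization gadget: snap each d-dimensional point to the center
    of its grid cell of side length eps. Each point moves by at most
    (eps / 2) * sqrt(d), and the number of distinct representatives is
    bounded by the number of occupied cells, independent of n."""
    cells = defaultdict(int)
    for p in points:
        cell = tuple(int(coord // eps) for coord in p)
        cells[cell] += 1
    # One weighted representative (the cell center) per occupied cell.
    return [([(c + 0.5) * eps for c in cell], count)
            for cell, count in cells.items()]

# Two nearby points fall into the same cell: three points, two representatives.
reps = snap_to_grid([(0.1, 0.2), (0.15, 0.22), (3.0, 4.0)], eps=0.5)
```

Running a clustering algorithm on the weighted representatives is one way a bound of the form poly(n) + f(1/ε, d) arises: the expensive step touches a point set whose size no longer depends on n.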

What I'd like to see is some kind of "taxonomization" or classification of the different "tricks of the trade" in high-dimensional geometric approximation, with some sense of which techniques apply when, and why. In fact, I like this idea so much I might even try to run a seminar on it...


### SoCG 2007: "Magic Hausdorff Lions Have Happy Endings"

(It's at the point now where people complain if the business meeting post is NOT up 5 minutes after the actual meeting. Sigh...)

Summary for people in a hurry:

- Attendance this year was a record low
- Next PC Chair: Monique Teillaud
- SoCG 2008 will be at the University of Maryland (rather than Virginia: long story)
- SoCG 2009 (by a 4-1 margin) will be in Aarhus, Denmark. BEER !!!!!!!!!

Local News:

Otfried Cheong was in charge of local arrangements. I have to say that the hotel we are at is quite amazing: it's on Bomun Lake, and right outside the hotel is a lake-side path that leads to all the restaurants one can imagine, a tourist village with acres of motorized scooters (!), and some not-disgusting coffee. The hotel facilities themselves are excellent (again, modulo the coffee), and much better, at a cheaper price, than a comparable US hotel. And as I sit here writing this, I can hear the rehearsals for tonight's Korean classical music concert.

Unfortunately, there was no one here to enjoy it ! Attendance is down tremendously (111 people), a drop exacerbated by FCRC, which starts in a day or two in the exotic locale of San Diego (fish tacos ! blech !), and by other related conferences. The relative remoteness of Gyeongju played no small role in this, I imagine.

There was much discussion about whether we should continue to be sponsored by ACM or not (the main issue is cost, and logistics when organizing in a non-US location); no resolution on this point.

PC Stuff: (obvious caveat: I was on the PC this year)

Jeff Erickson presented the usual stats (45/139 papers accepted, out of 286 submitting authors, and with 115+ external reviews). It turns out that the formula for success at SoCG this year involves submitting seven papers, with 4 co-authors, from an email address in Pakistan. (The title of this post was composed from words in accepted papers.)

Lots of other stats, nothing particularly enlightening.

The Near Future.

Monique Teillaud will chair the SoCG 2008 program committee. The committee has 17 people on it, an unusually large number. She's hoping to get 4 reviews per paper, so maybe that's why.

David Mount explained why we had to move from Alexandria, Virginia to UMD for SoCG 2008. Apparently the main hotel we would have used was bought out and is now much more expensive. Boo hoo. On the bright side, UMD will be a lot cheaper in terms of accommodation.

SoCG 2009.

We had two competing bids for SoCG 2009. Aarhus (Beer ! Lego ! Beer ! City life ! Beer!) and Portland (Roses ! Beer ! More Roses ! Powell's Books ! Microbreweries!).

No, we are not a bunch of boozing alcoholics.

Lars did a bang-up job with his Aarhus presentation. John Hershberger did a great presentation on Portland (a great city to visit, btw), but it was no contest. By a 4-1 margin, Aarhus it is !

Open Discussion.

It was getting rather late by the time we got to open discussions. Guenter Rote initiated the question, "Should we have a rebuttal process for reviewing papers?", apparently in response to some aggrieved set of rejected authors.

We had a heated discussion on this point, and although we ultimately went nowhere, it was felt that we should continue things electronically (i.e., on blogs). I'll post at greater length on this issue later, but to summarize the main points pro and con:

Pro:

- Creates an improved sense of fairness
- A safety net for potential errors

Con:

- Time waste for reviewers
- Time waste for authors (!) (according to Jack Snoeyink), but I am not convinced that this is a valid argument
- Won't make a significant difference

### SoCG 2007: Geometric Views of Learning

I've been fairly remiss in my SoCG blogging; I'll blame it on having session chair duties, and not wanting to lug my laptop around.

The invited talk was by Partha Niyogi from the University of Chicago on 'A Geometric Perspective on Machine Learning'. You may remember his work from my earlier post on the estimation of the surface area of a convex body (read the comments). More importantly, he is part of the group that developed a method known as Laplacian Eigenmaps for learning a manifold from a collection of data.

Manifold learning is a new set of problems in machine learning that has interesting connections to algorithms and geometry. The basic problem is as follows. Given a collection of (unlabelled) data inhabiting some high dimensional space, can you determine whether they actually lie on some lower dimensional manifold in this space ?

Why do we care ? The trouble with any data analysis problem in high dimensions is the rather poetically named 'curse of dimensionality', which basically says that any interesting data analysis algorithm runs in time exponential in the dimension of the data. For data that lives in 100s of dimensions, this is rather bad news.

However, "data that lives in 100 dimensions" is really an artifact of the way we represent data, slapping on dimensions willy-nilly for every attribute that might be relevant. What one often expects is that the data doesn't really fill all 100 dimensions, but lies on some lower dimensional manifold within the space. A beautiful example of this was given by Partha in his talk: he described the problem of inferring the sound wave signal generated at one end of a tube by listening in at the other end. By Fourier analysis, you can think of both signals as living in an infinite dimensional space; but suppose we now vary the tube length, for a fixed input signal. Then the output signal varies smoothly along a curve (i.e., a 1-d manifold) in this infinite dimensional space.

"So what ?" one might ask. The problem is that the standard method of doing data analysis is to translate the problem of interest into some property of the distances between points in the high dimensional space. If the data really lies on some kind of curved surface, then the "distance in ambient space" does not reflect the true structure of the data. What we really need is "distance along the manifold", which we could do if we could reconstruct the manifold !
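One standard way to approximate "distance along the manifold" without reconstructing the manifold explicitly (this is the idea behind methods like Isomap, not something claimed in the talk) is to build a nearest-neighbor graph on the samples and use shortest-path distances. A toy sketch:

```python
import heapq
import math

def knn_graph(points, k):
    """Connect each point to its k nearest neighbors (symmetrized),
    with edge weights equal to ambient Euclidean distance."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in sorted(range(n), key=lambda j: dist(i, j))[1:k + 1]:
            adj[i].add(j)
            adj[j].add(i)
    return {i: {j: dist(i, j) for j in adj[i]} for i in range(n)}

def geodesic(graph, src):
    """Dijkstra: shortest-path distances in the neighborhood graph
    approximate distances along the manifold when sampling is dense."""
    d = {v: math.inf for v in graph}
    d[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d[u]:
            continue
        for v, w in graph[u].items():
            if du + w < d[v]:
                d[v] = du + w
                heapq.heappush(pq, (d[v], v))
    return d

# Points sampled along a quarter circle: the straight-line (ambient)
# distance between the endpoints understates the distance along the curve.
pts = [(math.cos(i * math.pi / 20), math.sin(i * math.pi / 20)) for i in range(11)]
along = geodesic(knn_graph(pts, k=2), 0)[10]   # close to arc length pi/2
straight = math.dist(pts[0], pts[10])          # chord length, sqrt(2)
```

With dense enough samples the graph distance converges to the true geodesic distance, which is exactly the "distance along the manifold" the ambient metric fails to capture.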

The key idea of the Laplacian Eigenmaps work is this: if you set up an appropriate weighted graph on the data points (where each edge has a weight that is exponentially related to the distance between the points) and compute the Laplacian of this graph, you get an approximation that converges (as the data size increases) to the Laplacian of the underlying manifold !! This assumes that the data was sampled uniformly (or mostly uniformly) from the manifold. Moreover, the convergence happens at a rate that depends only on the dimension of the manifold, rather than the dimension of the ambient space.
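In case it helps to see the construction spelled out, here is a bare-bones numerical sketch of the pipeline (my simplification of the Belkin-Niyogi method; the neighborhood radius and heat-kernel parameter t below are illustrative choices, not prescribed values):

```python
import numpy as np

def laplacian_eigenmaps(X, radius, t, k):
    """Minimal Laplacian eigenmaps: heat-kernel weights exp(-|x-y|^2 / t)
    on a radius-neighborhood graph, graph Laplacian L = D - W, then the
    eigenvectors of the k+1 smallest eigenvalues (dropping the constant
    one) give a k-dimensional embedding."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    W = np.exp(-D2 / t) * (D2 <= radius ** 2)            # heat-kernel weights
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W                            # graph Laplacian
    vals, vecs = np.linalg.eigh(L)                       # ascending eigenvalues
    return vecs[:, 1:k + 1]                              # skip constant eigenvector

# Points on a circle in the plane: a 1-manifold sitting in 2-d ambient space.
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Y = laplacian_eigenmaps(X, radius=0.5, t=0.1, k=1)
```

The graph Laplacian here is the combinatorial stand-in for the Laplace-Beltrami operator of the manifold, which is precisely the convergence statement the talk described.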

There are many ramifications of this idea, connecting to shape modelling, homology, and even volume estimation. But it reinforces the idea of the Laplacian as a key geometric construct that can be modelled combinatorially in a consistent way.

## Sunday, June 03, 2007

### 9th Carnival of Mathematics

From A to Z, at JD2718.
