Comments on The Geomblog: SoCG 2007: Approximate Clustering

2007-06-08T06:29:00.000-06:00

Oh yes, another separation: it is known that you can solve c-approximate near neighbor with subquadratic space and dn^{1/c^2+o(1)} query time, while there is a dn^{Omega(1/c^2)} query time lower bound. However, the lower bound applies only to a restricted class of "hashing-based algorithms". See Andoni et al., FOCS'06, and Motwani et al., SoCG'06.

Piotr
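To make "hashing-based algorithm" concrete, here is a minimal sketch of the classic bit-sampling locality-sensitive hashing index for Hamming space. It is the simpler Indyk–Motwani scheme, not the Andoni et al. construction: in Hamming space bit sampling gives a query exponent of roughly 1/c rather than the 1/c^2 obtained for Euclidean space. The names `BitSamplingLSH`, `k` (bits sampled per table) and `num_tables` are hypothetical, and the parameter values in the usage lines are illustrative rather than tuned.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Hamming distance between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Bit-sampling LSH index for Hamming space (illustrative sketch)."""

    def __init__(self, points, k, num_tables, seed=0):
        rng = random.Random(seed)
        self.points = points
        d = len(points[0])
        # Each table hashes a point to its bits at k random coordinates,
        # so points that differ in few coordinates tend to collide.
        self.coords = [rng.sample(range(d), k) for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        for idx, p in enumerate(points):
            for coords, table in zip(self.coords, self.tables):
                table[self._key(p, coords)].append(idx)

    @staticmethod
    def _key(p, coords):
        return tuple(p[i] for i in coords)

    def query(self, q, r, c):
        """Return the index of some point within distance c*r of q, or None.

        Only points that collide with q in at least one table are checked,
        so a point within distance r of q is found only with some
        probability (boosted by using more tables).
        """
        for coords, table in zip(self.coords, self.tables):
            for idx in table.get(self._key(q, coords), []):
                if hamming(self.points[idx], q) <= c * r:
                    return idx
        return None


if __name__ == "__main__":
    rng = random.Random(1)
    data = [tuple(rng.randint(0, 1) for _ in range(32)) for _ in range(50)]
    index = BitSamplingLSH(data, k=8, num_tables=10)
    print(index.query(data[5], r=2, c=2))
```

Because every candidate is checked against the threshold c*r before it is returned, any reported point is a valid answer; the only possible error is missing a near point, and that failure probability is what LSH analyses (and the hashing-based lower bounds) are about.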

2007-06-08T02:37:00.000-06:00

To add to what Piotr pointed out: although this is not exactly the same thing, Luca Trevisan showed that for geometric TSP, the doubly exponential dependence on dimension in approximation algorithms cannot be removed unless NP has subexponential-time algorithms.

There are other results showing that if the dimension is part of the input, various clustering problems are NP-hard to solve exactly, and some of them also have hardness-of-approximation results. But there is still no nuanced understanding of the relation between n, d, and eps. One thing to keep in mind is that these aren't independent parameters, in the sense that any lower bound will have to trade one parameter off against the others.

2007-06-08T01:40:00.000-06:00

Hi,

There are some provable separations between the exact and approximate nearest neighbor problems. It is actually easier to talk about the *near* neighbor problem: you are given a parameter r, and for any query q, the goal is to check whether there is any data point within distance r of q. (The approximate version allows a wrong answer if the distance from q to its nearest neighbor is in the range [r, (1+eps)r].)
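The approximate near neighbor problem is a promise problem, and the set of acceptable answers can be written down directly. The function below is a specification sketch, not a data structure: it finds the true nearest distance by a linear scan and reports which answers a (1+eps)-approximate structure would be allowed to give. The name `acceptable_answers` is hypothetical, and using Euclidean distance via `math.dist` (Python 3.8+) is an assumption; a Hamming distance could be substituted. Boundary handling follows the closed interval [r, (1+eps)r] stated above.

```python
import math


def acceptable_answers(points, q, r, eps):
    """Answers a (1+eps)-approximate near neighbor structure may return for q.

    If the nearest neighbor of q lies at a distance in [r, (1+eps)r],
    either answer is acceptable; outside that range the answer is forced.
    """
    nearest = min(math.dist(p, q) for p in points)
    if nearest < r:
        return {"yes"}           # some point strictly within r: must say yes
    if nearest > (1 + eps) * r:
        return {"no"}            # nothing within (1+eps)r: must say no
    return {"yes", "no"}         # gray zone: either answer is allowed
```

The exact problem corresponds to collapsing the gray zone, i.e. answering `nearest <= r` with no slack, which is what makes it harder in the separations listed below.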

What is known:

* (1+eps)-approximate near neighbor in d-dimensional spaces (Hamming, etc.) can be solved using only one query to a data structure of size n^{O(1/eps^2)}.

* exact near neighbor in d-dimensional Hamming space requires roughly Omega(d) queries to any data structure of polynomial size.
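One standard way to see where the n^{O(1/eps^2)} size comes from (offered here as an assumption, not necessarily the exact construction in the survey cited below): map each point to a random sketch of k = O(log n / eps^2) bits that approximately preserves the relevant distances, and precompute the answer for every possible sketch, so a query computes its own sketch and reads a single table cell. With C an unspecified constant:

```latex
k = \frac{C \log_2 n}{\varepsilon^{2}}
\qquad\Longrightarrow\qquad
\text{table size} = 2^{k} = 2^{C \log_2 n / \varepsilon^{2}} = n^{C/\varepsilon^{2}} = n^{O(1/\varepsilon^{2})}
```

For example, with the purely illustrative choice C = 1, taking n = 2^20 and eps = 1/2 gives k = 80 and a table of 2^80 = n^4 cells, which shows how quickly the space grows as eps shrinks.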

For references, see:
"Nearest Neighbors in High-dimensional Spaces" at
http://theory.lcs.mit.edu/~indyk/39.ps

Piotr

2007-06-07T13:33:00.000-06:00

Good questions, Suresh.

Like Gareth, I am curious what complexity theory and lower-bound techniques have to say about this mess of parameters, specifically for problems where NP-completeness is not the appropriate concept.

For instance, exact nearest-neighbor search can be done in time polynomial in the size of the database, but is there work suggesting that it will necessarily suffer more from high dimensionality than the clever algorithms for approximate nearest-neighbor search?

2007-06-07T04:25:00.000-06:00

What I'd like to see is some kind of "taxonomization" or classification of the different "tricks of the trade" in high-dimensional geometric approximation

Sounds like a job for parameterized complexity.

2007-06-06T22:29:00.000-06:00

Look at the notes from the first half of the class Pankaj taught last semester:
http://www.cs.duke.edu/education/courses/spring07/cps296.2/lectures.html

Also, Sariel's book-to-be should have some nice tricks as well:
http://valis.cs.uiuc.edu/~sariel/teach/notes/aprx/lec/

These two resources should be a great place to start for someone who already has a basic understanding of algorithms and geometry.

Jeff