You'd think the title of this post is a joke paper title, but it actually isn't. There are now at least two papers (the second one even uses the word 'revisited' in the title!) on quantum algorithms for geometric problems.

Neither paper (and they basically admit this themselves) really pushes the envelope in the QC dimension. The first one, by Sadakane, Sugawara and Tokuyama, uses Grover's quantum search algorithm as a subroutine to solve a number of search problems in geometry, many of these reduced to the problem of finding the lowest point in the upper envelope of an arrangement of halfplanes. The second, by Bahadur, Durr, Lafaye and Kulkarni, uses a quantum algorithm for element distinctness based on a random walk method by Ambainis to solve some of the above problems. They also show lower bounds for some of these problems. One interesting result is an n^{2/3} lower bound, and nlog n upper bound (!) for 3SUM.

I'm not sure what to make of these results: apart from the fact that I have a hard time seeing a need for quantum algorithms for geometric problems, I don't know whether these are anything more than (insert model of computation) algorithm for (insert name of problem) type results (not that such results are bad, but they are not always interesting).

Ruminations on computational geometry, algorithms, theoretical computer science and life

## Thursday, January 31, 2008

### Quantum Algorithms for Computational Geometry

## Monday, January 28, 2008

### Plotting implicit functions: a bleg..

Does anyone know of good tools for plotting implicit functions ? Ideally, I'd like something that could plot functions of the form f(x,y) = 0, and even contours of the form f(x,y) = c.

(for those who aren't hip with web jargon, a bleg is a beg on a blog ;)

(for those who aren't hip with web jargon, a bleg is a beg on a blog ;)

### Important heuristics: Alternating optimization

The Kleinberg/Tardos algorithms textbook has a nice chapter on local search heuristics. As has been discussed earlier, learning heuristics as part of an algorithms class is important so that we know what to do when a problem is NP-hard and approximations are either provably hard or elusive.

However, perhaps one of the most important heuristics that is used in practice is not discussed, and I have yet to find a source that examines its power fully. This heuristic is "Alternating minimization", and appears "in the wild" in two wildly popular heuristics, the k-means algorithm in clustering and the Iterated Closest Pair algorithm in model registration.

Suppose you have a cost function D defined over the product of domains P and Q. For concreteness, let's say that P is the space of all assignments of points to clusters, and Q is the space of all assignments of centers to clusters. The standard k-means cost function asks for an assignment of points to clusters, and an assignment of centers to clusters, so that the sum of squared distances of points to their cluster centers is minimized.

In general, such an optimization is hard to perform. However, it is often the case that for a fixed p in P, we can minimize D(p, Q), and vice versa. Continuing the example above, using Euclidean distance as the underlying distance, it's easy to see that for a fixed set of points in a cluster, the minimizing center is their centroid. Moreover, for a fixed set of centroids, the assignment is a straight nearest neighbour assignment.

This is the idea behind alternating minimization. Starting with arbitrary p0 and q0, we repeat the two-step procedure:

Now there are three questions that one usually has to answer for an alternating minimization. The first one is relatively straightforward:

Does it converge ? In general, it's often fairly easy to write down a potential function that decreases strictly with each iteration and is always positive. Thus, convergence can often be established for this procedure.

Does it converge to the optimal solution ? In general, no. This procedure can often get stuck in local minima. However, (and this is one of the more attractive aspects of this heuristic), Csiszar and Tusnady showed that as long as a certain "5-point" property holds (a simple inequalities involving iterates of the procedure), the process will converge to the optimal solution. In their paper, this result is used to show convergence when P and Q are convex sets of distributions, and D is the Kullback-Leibler distance.

Another useful example where the procedure converges is when P and Q are convex sets, and D computes the closest distance between the two sets. In that case, each iterative step determines the closest point in a convex set closest to a given point.

A variant of the above question is: does it converge to a close-to-optimal solution ? Again, in general, one cannot say anything useful about this. However, for k-means, recent work by Arthur and Vassilvitskii, and Ostrovsky, Rabani, Schulman and Swamy, has shown that choosing start points carefully can yield provable bounds on the quality.

Does it converge in polynomial time ? Perhaps the most important question about this heuristic is whether it converges in polynomial time. For k-means, it can be shown that there are examples that will force the heuristic to take superpolynomial to converge. However, recent results in smoothed complexity have shown that if the input is perturbed ever so slightly, then the iterative procedure converges in polynomial time (with high probability). In a sense, one can view this result as explaining the success of the method in practice.

Alternating minimization appears in many domains, and is a common trick for "solving" complex optimizations. It should definitely be taught wherever heuristics are discussed.

However, perhaps one of the most important heuristics that is used in practice is not discussed, and I have yet to find a source that examines its power fully. This heuristic is "Alternating minimization", and appears "in the wild" in two wildly popular heuristics, the k-means algorithm in clustering and the Iterated Closest Pair algorithm in model registration.

Suppose you have a cost function D defined over the product of domains P and Q. For concreteness, let's say that P is the space of all assignments of points to clusters, and Q is the space of all assignments of centers to clusters. The standard k-means cost function asks for an assignment of points to clusters, and an assignment of centers to clusters, so that the sum of squared distances of points to their cluster centers is minimized.

In general, such an optimization is hard to perform. However, it is often the case that for a fixed p in P, we can minimize D(p, Q), and vice versa. Continuing the example above, using Euclidean distance as the underlying distance, it's easy to see that for a fixed set of points in a cluster, the minimizing center is their centroid. Moreover, for a fixed set of centroids, the assignment is a straight nearest neighbour assignment.

This is the idea behind alternating minimization. Starting with arbitrary p0 and q0, we repeat the two-step procedure:

- p_i = arg min D(p, q_{i-1})
- q_i = arg min D(p_i, q)

Now there are three questions that one usually has to answer for an alternating minimization. The first one is relatively straightforward:

Does it converge ? In general, it's often fairly easy to write down a potential function that decreases strictly with each iteration and is always positive. Thus, convergence can often be established for this procedure.

Does it converge to the optimal solution ? In general, no. This procedure can often get stuck in local minima. However, (and this is one of the more attractive aspects of this heuristic), Csiszar and Tusnady showed that as long as a certain "5-point" property holds (a simple inequalities involving iterates of the procedure), the process will converge to the optimal solution. In their paper, this result is used to show convergence when P and Q are convex sets of distributions, and D is the Kullback-Leibler distance.

Another useful example where the procedure converges is when P and Q are convex sets, and D computes the closest distance between the two sets. In that case, each iterative step determines the closest point in a convex set closest to a given point.

A variant of the above question is: does it converge to a close-to-optimal solution ? Again, in general, one cannot say anything useful about this. However, for k-means, recent work by Arthur and Vassilvitskii, and Ostrovsky, Rabani, Schulman and Swamy, has shown that choosing start points carefully can yield provable bounds on the quality.

Does it converge in polynomial time ? Perhaps the most important question about this heuristic is whether it converges in polynomial time. For k-means, it can be shown that there are examples that will force the heuristic to take superpolynomial to converge. However, recent results in smoothed complexity have shown that if the input is perturbed ever so slightly, then the iterative procedure converges in polynomial time (with high probability). In a sense, one can view this result as explaining the success of the method in practice.

Alternating minimization appears in many domains, and is a common trick for "solving" complex optimizations. It should definitely be taught wherever heuristics are discussed.

## Tuesday, January 22, 2008

### SODA 2008: Summaries

I've written before about the perils of blogging at the conference: another problem is that if you don't blog, people come by and start asking you when the posts are going to appear.

Before you read further, do check out posts by DavidE (II), Michael Mitzenmacher (I, II), and a "stranger in a strange land" take by Hal Daume, my colleague at Utah and an ML/NLP person whose blog you should definitely follow if you're interested in such topics.

There's been a lot of work on clustering in high dimensional spaces, and I want to say a lot more about this, especially the "abstract framework" results of Kumar, Sabherwal and Sen, but that will have to wait for another post. For now, I want to highlight the paper, "Clustering for Metric and Non-Metric distance measures" by Ackermann, Blomer and Sohler. Essentially what they do is take the framework for approximate linear time clustering in high dimensions, and remove the need for either symmetry or triangle inequality in the distance measure, replacing it by a set of sampling conditions that have a more statistical feel to them. Doing this allows them to cluster with more general distance functions, including the relative entropy or Kullback-Leibler divergence. I'm a sucker for results that illustrate "what's going on", especially in the realm of approximate high-D geometry, and this is a great example of attempting to get to the core strucutural properties needed to make an algorithm work.

Outtakes:

Before you read further, do check out posts by DavidE (II), Michael Mitzenmacher (I, II), and a "stranger in a strange land" take by Hal Daume, my colleague at Utah and an ML/NLP person whose blog you should definitely follow if you're interested in such topics.

There's been a lot of work on clustering in high dimensional spaces, and I want to say a lot more about this, especially the "abstract framework" results of Kumar, Sabherwal and Sen, but that will have to wait for another post. For now, I want to highlight the paper, "Clustering for Metric and Non-Metric distance measures" by Ackermann, Blomer and Sohler. Essentially what they do is take the framework for approximate linear time clustering in high dimensions, and remove the need for either symmetry or triangle inequality in the distance measure, replacing it by a set of sampling conditions that have a more statistical feel to them. Doing this allows them to cluster with more general distance functions, including the relative entropy or Kullback-Leibler divergence. I'm a sucker for results that illustrate "what's going on", especially in the realm of approximate high-D geometry, and this is a great example of attempting to get to the core strucutural properties needed to make an algorithm work.

Outtakes:

- After many years, I actually missed the business meeting this time, so no drinking games or updates. From what I hear, Austin is the first choice for the venue in 2010 (we're in NYC for 2009). Claire Mathieu is the PC chair for SODA 2009.
- Overall, I really enjoyed SODA this year: somehow I felt that there were far too many interesting papers (too many for me to attend the talks, that is), and the papers were of really high quality.

## Monday, January 21, 2008

### SODA 2008: Day 1

I've been terribly remiss in blogging, because I've actually been trying to do some work (!). At any rate, do read David's take on Day 1.

The first session I attended was on embeddings (which was placed parallel to the geometry session: NOT A GOOD IDEA). The first talk was by Edo Liberty, on work with Nir Ailon on "Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes".

The Johnson-Lindenstrauss lemma, which tells us that any set of n points in an d dimensional l_2 space "approximately" lives in a space of dimension k = log n, has had a huge effect on data analysis, as a tool for reducing the dimensionality of a set of points. Although the construction is very simple (project each point using a matrix populated with random entries), it doesn't scale, because the matrix required is dense, and so is hard to maintain for large high-dimensional data sets. Also the transformation itself takes time O(dk) per point, and is therefore expensive if d and k are large.

Earlier work by Ailon and Chazelle showed that the JL-transform can be implemented in time O(min(d log d,k^3), which improves the "trivial" bound for certain values of k (as a function of d). The paper presented above improves the embedding speed for much larger ranges of k, and draws on a number of tools from Banach spaces and functional analysis, as well as using some coding theory to "derandomize" the construction. It's a good example of practice (how do we implement the JL-transform efficiently) driving new and nontrivial theory results.

The first session I attended was on embeddings (which was placed parallel to the geometry session: NOT A GOOD IDEA). The first talk was by Edo Liberty, on work with Nir Ailon on "Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes".

The Johnson-Lindenstrauss lemma, which tells us that any set of n points in an d dimensional l_2 space "approximately" lives in a space of dimension k = log n, has had a huge effect on data analysis, as a tool for reducing the dimensionality of a set of points. Although the construction is very simple (project each point using a matrix populated with random entries), it doesn't scale, because the matrix required is dense, and so is hard to maintain for large high-dimensional data sets. Also the transformation itself takes time O(dk) per point, and is therefore expensive if d and k are large.

Earlier work by Ailon and Chazelle showed that the JL-transform can be implemented in time O(min(d log d,k^3), which improves the "trivial" bound for certain values of k (as a function of d). The paper presented above improves the embedding speed for much larger ranges of k, and draws on a number of tools from Banach spaces and functional analysis, as well as using some coding theory to "derandomize" the construction. It's a good example of practice (how do we implement the JL-transform efficiently) driving new and nontrivial theory results.

## Saturday, January 19, 2008

### ALENEX/ANALCO 2008: Invited talks

I'm in San Francisco for the SIAM-ACM Symposium on Discrete Algorithms (SODA), and as is tradition, Saturday is the day for ALENEX (the workshop on experimental algorithms) and ANALCO (the workshop on analytic combinatorics).

ALENEX:

The ALENEX invited talk today was on routing problems. Rolf Mohring from TU Berlin presented a pair of case studies of routing problems in the "wild".

The following video explains the first case study:

Actually, the problem he was looking at was a tad simpler: you have containers coming in at Hamburg port, and you have automated robot vehicles that ferry the containers from the loading dock to their temporary storage locations. The problem is to create routes for the vehicles (on the fly) so that they reach their destination at a reasonable time without colliding with each other.

The actual solution involved successive shortest paths on a "time-expanded" flow graph, which in their case was a relatively simple grid graph. The time expansion refers to the fact that an "edge" is occupied between two time instants, and of course travel is blocked on an edge when it's occupied.

The second application, which in my mind was more impressive, was an instance of large-scale optimization to redesign the Berlin subway system. Perhaps the most impressive aspect of this work was that it's actually being used ! The latest incarnation of the Berlin subway schedule uses the schedule designed by their scheme, and the speaker showed a neat 4 minute Numb3rs-style video at the end targeting the layman that explained their work.

This is a tremendous achievement, both technically and politically. And I can't but wonder whether it's something about Germany itself, the home of classic engineering, that allows optimizations designed by researchers in a university to actually be implemented on a subway system. I'm probably wrong here, in that there must be numerous examples of success stories of OR in the US as well, but it does make me wonder.

Analysis: As examples of "applying algorithms in the real-world", the talk was great. As an example of the intricacies of doing "applied algorithms", it was less impressive: most of the "compromises with the real-world" were the same kinds of modifications to theoretical problems that one might expect to see in papers at ALENEX, and there was little takeaway to be had or lessons to be learned: I'd have loved to hear about how exactly they were able to convince the Berlin subway authority to implement their schemes.

ANALCO:

This was either planned, or a happy coincidence, but the ANALCO invited speaker was Don Knuth, just a few days after his 70th birthday. The title of his talk was 'Puzzling problems', and as befits a Knuth talk, it was a random collection of cute problems in combinatorics. He had promised 5 such, with an exponentially increasing amount of time spent on each successive problem, but he only got through three of them. David's already described them in more detail here; all I can say is that it was a typical Knuthian talk :)

And yes, there were talks ! One talk that I'll draw attention to is on the work by Coleman and Wirth on the feedback arc set problem on tournaments. Given a directed graph in which for each pair of vertices, there's a directed edge one way or the other (i.e a tournament graph), can we delete the fewest number of edges so that the resulting graph is a DAG ? One important application of this problem is for aggregating rankings. Think of all the non-bribed figure skating judges at a competition, each creating a ranking of the performers. How do we aggregate these rankings into a "consensus" ranking ?

What is interesting about this problem is that it relates closely to sorting algorithms. In a sense, you can think of the rank aggregation problem as attempting to "sort" the participants into a total order, much like we have with a sorting problem. An earlier paper by Ailon, Charikar and Newman showed that a quick-sort like heuristic actually achieves a 3-approximation to the optimal solution to the feedback arc set problem, and other local heuristics for feedback arc set can also be related to sorting algorithms (although the analog of insertion sort performs quite badly).

Mergesort has not been translated to this framework: it's not obvious (but not hopeless) to see how to reformulate mergesort as an algorithm for feedback arc set, and indeed one open question from the earlier Ailon et al paper is an analysis of the behaviour of a mergesort-like method.

That's all for today, folks ! Tomorrow the action starts in earnest...

After notes:

ALENEX:

The ALENEX invited talk today was on routing problems. Rolf Mohring from TU Berlin presented a pair of case studies of routing problems in the "wild".

The following video explains the first case study:

Actually, the problem he was looking at was a tad simpler: you have containers coming in at Hamburg port, and you have automated robot vehicles that ferry the containers from the loading dock to their temporary storage locations. The problem is to create routes for the vehicles (on the fly) so that they reach their destination at a reasonable time without colliding with each other.

The actual solution involved successive shortest paths on a "time-expanded" flow graph, which in their case was a relatively simple grid graph. The time expansion refers to the fact that an "edge" is occupied between two time instants, and of course travel is blocked on an edge when it's occupied.

The second application, which in my mind was more impressive, was an instance of large-scale optimization to redesign the Berlin subway system. Perhaps the most impressive aspect of this work was that it's actually being used ! The latest incarnation of the Berlin subway schedule uses the schedule designed by their scheme, and the speaker showed a neat 4 minute Numb3rs-style video at the end targeting the layman that explained their work.

This is a tremendous achievement, both technically and politically. And I can't but wonder whether it's something about Germany itself, the home of classic engineering, that allows optimizations designed by researchers in a university to actually be implemented on a subway system. I'm probably wrong here, in that there must be numerous examples of success stories of OR in the US as well, but it does make me wonder.

Analysis: As examples of "applying algorithms in the real-world", the talk was great. As an example of the intricacies of doing "applied algorithms", it was less impressive: most of the "compromises with the real-world" were the same kinds of modifications to theoretical problems that one might expect to see in papers at ALENEX, and there was little takeaway to be had or lessons to be learned: I'd have loved to hear about how exactly they were able to convince the Berlin subway authority to implement their schemes.

ANALCO:

This was either planned, or a happy coincidence, but the ANALCO invited speaker was Don Knuth, just a few days after his 70th birthday. The title of his talk was 'Puzzling problems', and as befits a Knuth talk, it was a random collection of cute problems in combinatorics. He had promised 5 such, with an exponentially increasing amount of time spent on each successive problem, but he only got through three of them. David's already described them in more detail here; all I can say is that it was a typical Knuthian talk :)

And yes, there were talks ! One talk that I'll draw attention to is on the work by Coleman and Wirth on the feedback arc set problem on tournaments. Given a directed graph in which for each pair of vertices, there's a directed edge one way or the other (i.e a tournament graph), can we delete the fewest number of edges so that the resulting graph is a DAG ? One important application of this problem is for aggregating rankings. Think of all the non-bribed figure skating judges at a competition, each creating a ranking of the performers. How do we aggregate these rankings into a "consensus" ranking ?

What is interesting about this problem is that it relates closely to sorting algorithms. In a sense, you can think of the rank aggregation problem as attempting to "sort" the participants into a total order, much like we have with a sorting problem. An earlier paper by Ailon, Charikar and Newman showed that a quick-sort like heuristic actually achieves a 3-approximation to the optimal solution to the feedback arc set problem, and other local heuristics for feedback arc set can also be related to sorting algorithms (although the analog of insertion sort performs quite badly).

Mergesort has not been translated to this framework: it's not obvious (but not hopeless) to see how to reformulate mergesort as an algorithm for feedback arc set, and indeed one open question from the earlier Ailon et al paper is an analysis of the behaviour of a mergesort-like method.

That's all for today, folks ! Tomorrow the action starts in earnest...

After notes:

- The hotel wireless works in the lecture rooms (!). It's also free. this must be a first...
- The location is great. The hotel is on Van Ness and there are tons of restaurants (and Peets) within a block, as well as a drugstore and a Whole Foods.
- The ALENEX business meeting was the shortest business meeting in recorded history: it took all of 5 minutes. I commend the organizers for their good taste, and can only hope that SODA emulates them.

Labels:
alenex,
analco,
conferences,
soda

## Thursday, January 17, 2008

### A Post-Doc opening

I'll post this on the usual forums shortly, but thought I'd place it here since SODA is coming up.

I'm looking to hire a post-doctoral researcher to work with me at the School of Computing. The position is initially for one year, with the possibility of extension for another year based on mutual consent. Applicants must have expertise in algorithms and computational geometry, and preferably should have experience with high-dimensional geometry, approximations, and large data algorithms (external memory/streaming/etc). Some familiarity with differential geometry is a definite plus.To add to the above: if you think you might be interested, and will be at SODA, drop me a note and we'll try to meet up.

For more on my research, visit my webpage. To apply, please send me a CV, a statement, and three letters of reference. Feel free to contact me if you have questions about the position.

## Wednesday, January 16, 2008

### Readable math

Bill Gasarch asks about math books you can actually read. This reminded me of my post from over 3 years ago, "Books that read like a thrilller". I'd update that list with

- Information theory, by Cover and Thomas
- Randomized Algorithms (Motwani-Raghavan): it's an old book, but even when I read it today, I realize just how well it flows.
- Convex Optimization (Boyd/Vandenberghe): there's just something about the typesetting :)
- Computational Complexity (Arora/Barak)
- Convex Analysis (Rockafellar): A model of precision and clarity
- (almost, but not quite in the pantheon): Approximation algorithms, (Vazirani)

### Two surveys/sites on string matching and text processing.

1. A survey on string algorithms and data structures by Paolo Ferragina.

2. A paper surveying the theory and practice of compressed text algorithms, by Ferragina, Gonzalez, Navarro, and Venturini.

3. The "Pizza and Chili" site for test data and sources for compressed text indexing (associated with the paper above).

2. A paper surveying the theory and practice of compressed text algorithms, by Ferragina, Gonzalez, Navarro, and Venturini.

3. The "Pizza and Chili" site for test data and sources for compressed text indexing (associated with the paper above).

## Tuesday, January 15, 2008

### The CACM is dead ! Long live the CACM

Readers of this blog will know that I love to HATE the CACM, the embodiment of all that is wrong in our IT-obsessed, SCIENCE-ignoring culture. Well, the main source of my angst is now gone ! vanished ! replaced by something that <shudder> resembles an actual scientific magazine. Here's the story:

Way back in 2005 (before Web 2.0 ! Prehistoric!), the ACM realized that something needed to be done about the CACM. As Peter Denning tells it,

~~Moshe Vardi ~~David Patterson, and Moshe Vardi, who is now the editor-in-chief (EIC) of the new and improved CACM~~. He ~~ recounts a story that is familiar to all of us:

Another important aspect of the new incarnation is a complete graphic design overhaul:

Way back in 2005 (before Web 2.0 ! Prehistoric!), the ACM realized that something needed to be done about the CACM. As Peter Denning tells it,

When David Patterson was president of ACM, many researchers told him they thought CACM had never regained its vaunted glory of the 1970s. Patterson set up a committee to review the current model and propose ways to recharge its content and scopeThis committee was chaired by

While the format envisioned in the early 1980s was that of a content-rich magazine, the reduced role of the editorial advisory board combined with a relatively small professional staff meant that most of the content came from submitted articles. Over the years those articles have evolved to be strongly slanted toward Management Information Systems. Over time, a significant segment of the ACM membership lost interest in the publication.The committee was working off the model of Science, as a fast turnaround-high-prestige magazine that showcased work of broad interest, without necessarily being too technical in focus. The main conclusions of the committee were to leave the more "practice" sides of computing to Queue magazine, and focus on a more research and news structure. The new CACM has the following basic outline:

- News: The news section will have a very distinct “voice,” covering research and practice in computing on a global scale.
- Practice: CACM will offer coverage of cutting-edge, emerging technology issues.
- Breakthrough Research: The goal is to bring research articles, covering the broad spectrum of computing research, back to CACM. This will be a combined effort of conference and program chairs (ACM as well as non-ACM) and the CACM editorial board to select the best research material coming out of conferences and share that information with the wide readership of CACM. Each selected research paper will be adapted to the broad CACM readership and will be preceded by a short “Technical Perspective,” giving readers an overview of the important points and significance of the research.
- Refereed Articles: CACM will continue to publish, as it has since its early days, peer reviewed articles.
- Opinions and Views: The final component in the content model of the new CACM is a section dedicated to opinions and views, typically of a nontechnical nature.

Another important aspect of the new incarnation is a complete graphic design overhaul:

There is also a need for a complete redesign of CACM’s Web site, with the goal of creating a dynamic, rich Web site that is much more than an online store of the magazine’s content. The aim is to think of CACM as consisting of a print publication, a rich Web site, and email channel to readers.And so we now have the 50th edition of CACM, in its new form. It's extremely unfair to judge a major revamp like this after one issue, but let's see how it looks anyway.

- The design: It's pretty slick, with a web-based contents and navigation mechanism inside what is called a "Published Web format" designed by a company called Texterity. The feel is PDF-like, but on the web. There are all kinds of nice features for extracting interesting web links from pages, or from sections. Overall, quite snazzy (but then I'm not a professional graphic designer, so what do I know). I will add that given the general 30 year lag between technology in the "real world" and on the ACM sites, this is a ginormously humongous improvement. The interface does seem a little slow to respond to things like page turns and menu options; I'm using the actual PDF for quick reading and text cut-and-pasting.
- General content: Given the 50 year anniversary, there are many retrospectives, as well as articles by each of the past CACM EICs. These tell a very interesting picture of the rise and fall of CACM, especially in the period when Peter Denning was in charge. He tells of how budget cuts and restructuring led to the decision to remove research content from it. There are perspectives from well known luminaries in the field, including Jon Bentley, Rodney Brooks, Jeanette Wing among others. There's a longer article by Gordon Bell on "a theory of the evolution of a computer"
- Research content: There are two main "breakthrough research" articles. A first on MapReduce, the Jeffrey Dean/Sanjay Ghemawat algorithmic design paradigm that runs much of Google's internals, with an introduction by David Patterson. The second is on an old friend, locality sensitive hashing, (in a new incarnation by Alex Andoni and Piotr Indyk) and the introduction is by Bernard Chazelle, which basically means you all should IMMEDIATELY stop reading this and go read what he wrote, gems like, "Sharp measure concentrations and easy spectral predictions are the foodstuffs on which science feasts."

Both of these are excellent choices for "broad interest" articles. MapReduce has revolutionized the way we think (or should think) about large-scale computing, and there's even been some attempt at an algorithmic model around it. Nearest neighbour search is one of the most basic operations in any kind of data analysis, and I know for a fact that many people actually use LSH for their data mining problems. More of this, please ! - The opinion section: Not much to remark upon in this issue: definitely nothing that would satisfy a focus group attendee's wish for "blood on the pages". But again, time will tell.

## Saturday, January 12, 2008

### Core Sets As Grids

This is another post in my ongoing series of missives from the algorithms seminar I ran last semester (yes I know, I'm late: I have a good excuse though, and it currently weighs 6 lb 11oz).

One of the most exciting developments in the area of high dimensional geometry was the introduction of the idea of core sets. The basic premise is this: suppose we have a geometric optimization problem on a set of points P of size n (Amen!) that can be solved exactly in time p(n) (where p might be some relatively unpleasant polynomial in n). Suppose we can select a set of points S (not necessarily contained in P) of size g(1/epsilon) (the "core-set") such that the value of the optimal solution on S is close (i.e within a 1+epsilon factor) to the value on P, then we can run our expensive algorithm on S, rather than on P. The new running time is n (typically) to compute S, plus p(g(1\epsilon)) to obtain the approximate solution. What we've done here is construct a linear-time approximation scheme for our optimization.

There's a bit of nice magic here: we are essentially claiming that a set of points with size independent of the input can suffice to give a good approximation to the desired answer.

There are many ways of understanding the real power of core sets, and when they exist: perhaps the most surprising result is that for certain high dimensional problems, it is possible to find a core-set that has size independent of both the input size and the input dimension, thus circumventing the famed curse of dimensionality. But that's not what I want to discuss here. Rather, I want to present the view of core sets as a kind of generalized grid: I find this viewpoint less "magical" and often useful when trying to figure out when a core-set might actually exist.

Sparsifying via a grid:

Suppose we have a large collection of points in the plane, and we wish to "shrink" the input to some manageable size. The first thing we can try is to lay down a grid, with side length R, and snap each point to the center of the grid cell it's in. If n is large compared to R, we'd expect many points to lie in each cell. If the diameter of the point set is D, then we need (D/R)^2 grid cells.

There are two problems with this: firstly, D could be much larger than n. In fact, since D is a function of the coordinates of the input, it could be exponentially larger than n ! Secondly, the perturbation of each point to the center of the grid incurs an error of possibly R/\sqrt{2}, and so R needs to be very smalll.

Consider the problem of computing the minimum enclosing ball of a collection of points. The radius of this ball is within a constant factor of D by the triangle inequality. This means that if we're willing to tolerate a multiplicative error in our bounds, we can perturb each point by epsilon*D, while maintaining an epsilon-approximation. Here, setting R = epsilon' D suffices, and we only need O(1/epsilon^2) cells. Voila ! a core-set.

Of course, we're not going to be this lucky all the time: for example, if we want to compute the width (the smallest distance between parallel lines that bound the point set), then the diameter has no relevance: the width of a point set can be zero even if its diameter is large (if all points are on a line). In that case, how would we choose a grid size that would work for us ?

Approximating all directions:

The answer, as it turns out, is to try and approximate the "diameter" in all directions. More precisely, we can define the extent of a point set along a certain direction as the diameter of the projection along that direction. Having done so, the property we desire of our "grid" is that it approximates this extent function along each direction. For many interesting optimization problems, the extents suffice to compute the optimal solution, and so approximating the extents suffices to approximate the solution.

The easiest way of approximating extents would be to compute the convex hull of the point set. This might not be a sparse representation in general. A better answer is to place a grid, and pick out all non-empty grid cells along the boundary (actually, even picking the extreme cells along each vertical direction suffices).

Notice that by concerning ourselves only with extents, we have essentially eliminated the problem of large diameters (because we can always scale the point set). What remains is the problem of variation in the extents. If a point set has a very small extent along a direction (say the direction that defines its width), then in order to capture that extent correctly, we'd need a very fine grid resolution.

Exploiting fatness:

The problem here appears to arise from the fact that a point set can be skinny. The notion of fatness in computational geometry is a very powerful one: there are many ways of defining it, but in all cases, the measure captures the degree to which a convex shape is "rounded" versus "skinny" (one definition is the ratio between the minimum enclosing ball and the largest inscribed ball). If a point set is alpha-fat, then upto a (constant) factor alpha, it's round, and therefore has comparable extents in all directions.

Performing an affine transform can make a point set alpha-fat (for any constant alpha). Although an affine transform will play havoc with distances, it preserves ratios: in particular, the ratio between exact and approximate extents.

And with this, we are done. We transform the point set so that it's alpha-fat, and choose a grid size that's a function of alpha and epsilon (after scaling the point set so that the diameter is 1). Then we snap all points to grid cells as before, and pick up the highest and lowest grid cells in each vertical column. What we get is a point set that captures the important characteristics of the input, but whose size depends solely on the error parameters: in other words, a core-set.

Extensions:

What makes this idea particularly powerful is that it can be extended. For example, we can construct core sets for the "extents" of a set of linear functions by going to the dual space. Similarly, we can construct core sets for collections of polynomials by using the linearization trick and then going to the dual space (in higher dimensions). But ultimately, it boils down to constructing a clever grid.

Note: most of this exposition comes from a reading of Geometric Approximation via Coresets, by Pankaj Agarwal, Sariel Har-Peled and Kasturi Varadarajan.

One of the most exciting developments in the area of high dimensional geometry was the introduction of the idea of core sets. The basic premise is this: suppose we have a geometric optimization problem on a set of points P of size n (Amen!) that can be solved exactly in time p(n) (where p might be some relatively unpleasant polynomial in n). Suppose we can select a set of points S (not necessarily contained in P) of size g(1/epsilon) (the "core-set") such that the value of the optimal solution on S is close (i.e within a 1+epsilon factor) to the value on P, then we can run our expensive algorithm on S, rather than on P. The new running time is n (typically) to compute S, plus p(g(1\epsilon)) to obtain the approximate solution. What we've done here is construct a linear-time approximation scheme for our optimization.

There's a bit of nice magic here: we are essentially claiming that a set of points with size independent of the input can suffice to give a good approximation to the desired answer.

There are many ways of understanding the real power of core sets, and when they exist: perhaps the most surprising result is that for certain high dimensional problems, it is possible to find a core-set that has size independent of both the input size and the input dimension, thus circumventing the famed curse of dimensionality. But that's not what I want to discuss here. Rather, I want to present the view of core sets as a kind of generalized grid: I find this viewpoint less "magical" and often useful when trying to figure out when a core-set might actually exist.

Sparsifying via a grid:

Suppose we have a large collection of points in the plane, and we wish to "shrink" the input to some manageable size. The first thing we can try is to lay down a grid, with side length R, and snap each point to the center of the grid cell it's in. If n is large compared to R, we'd expect many points to lie in each cell. If the diameter of the point set is D, then we need (D/R)^2 grid cells.

There are two problems with this: firstly, D could be much larger than n. In fact, since D is a function of the coordinates of the input, it could be exponentially larger than n ! Secondly, the perturbation of each point to the center of the grid incurs an error of possibly R/\sqrt{2}, and so R needs to be very smalll.

Consider the problem of computing the minimum enclosing ball of a collection of points. The radius of this ball is within a constant factor of D by the triangle inequality. This means that if we're willing to tolerate a multiplicative error in our bounds, we can perturb each point by epsilon*D, while maintaining an epsilon-approximation. Here, setting R = epsilon' D suffices, and we only need O(1/epsilon^2) cells. Voila ! a core-set.

Of course, we're not going to be this lucky all the time: for example, if we want to compute the width (the smallest distance between parallel lines that bound the point set), then the diameter has no relevance: the width of a point set can be zero even if its diameter is large (if all points are on a line). In that case, how would we choose a grid size that would work for us ?

Approximating all directions:

The answer, as it turns out, is to try and approximate the "diameter" in all directions. More precisely, we can define the extent of a point set along a certain direction as the diameter of the projection along that direction. Having done so, the property we desire of our "grid" is that it approximates this extent function along each direction. For many interesting optimization problems, the extents suffice to compute the optimal solution, and so approximating the extents suffices to approximate the solution.

The easiest way of approximating extents would be to compute the convex hull of the point set. This might not be a sparse representation in general. A better answer is to place a grid, and pick out all non-empty grid cells along the boundary (actually, even picking the extreme cells along each vertical direction suffices).

Notice that by concerning ourselves only with extents, we have essentially eliminated the problem of large diameters (because we can always scale the point set). What remains is the problem of variation in the extents. If a point set has a very small extent along a direction (say the direction that defines its width), then in order to capture that extent correctly, we'd need a very fine grid resolution.

Exploiting fatness:

The problem here appears to arise from the fact that a point set can be skinny. The notion of fatness in computational geometry is a very powerful one: there are many ways of defining it, but in all cases, the measure captures the degree to which a convex shape is "rounded" versus "skinny" (one definition is the ratio between the minimum enclosing ball and the largest inscribed ball). If a point set is alpha-fat, then upto a (constant) factor alpha, it's round, and therefore has comparable extents in all directions.

Performing an affine transform can make a point set alpha-fat (for any constant alpha). Although an affine transform will play havoc with distances, it preserves ratios: in particular, the ratio between exact and approximate extents.

And with this, we are done. We transform the point set so that it's alpha-fat, and choose a grid size that's a function of alpha and epsilon (after scaling the point set so that the diameter is 1). Then we snap all points to grid cells as before, and pick up the highest and lowest grid cells in each vertical column. What we get is a point set that captures the important characteristics of the input, but whose size depends solely on the error parameters: in other words, a core-set.

Extensions:

What makes this idea particularly powerful is that it can be extended. For example, we can construct core sets for the "extents" of a set of linear functions by going to the dual space. Similarly, we can construct core sets for collections of polynomials by using the linearization trick and then going to the dual space (in higher dimensions). But ultimately, it boils down to constructing a clever grid.

Note: most of this exposition comes from a reading of Geometric Approximation via Coresets, by Pankaj Agarwal, Sariel Har-Peled and Kasturi Varadarajan.

## Thursday, January 10, 2008

### Happy Birthday, Don Knuth !

Today is Don Knuth's birthday, and by way of tribute, there are posts from David, Luca and Scott that commemorate him. Among the many obsessions of Knuth is one that went into the new editions of TACP: typesetting the names of cited authors in their native language. This meant a laborious process of converting names into their original scripts and then doing a Unicode translation. I was at Stanford for some of this, and did my (tiny) share of translations in Hindi (and even once in Tamil!).

But what I wanted to highlight is a very innocuous result from Vol 2 of TACP that's both well known, elementary to prove, and provides a key tool for much of the work on streaming that came much later.

This is the kind of problem you might encounter on a Google/Microsoft interview: it's simple to state, and so is the answer, but it requires at least some "lateral" thinking. Without further ado, here's the answer (so stop right here if you want to puzzle it out yourself)

There are at least three different ways of arguing the correctness of this procedure - another reason why I find it so elegant.

1. Suppose the sampling procedure is correct for the first k items. Now when item k+1 appears, it becomes the new sample with probability 1/(k+1). Every prior item currently has a probability of 1/k of being picked, and therefore remains the chosen sample item with probability k/(k+1) * 1/k = 1/(k+1). Hence, by induction, the procedure works

Like most induction proofs, this argument leaves something wanting. Here's a less rigorous, but in my mind more appealing argument:

2. Suppose we pick the kth item with probability p_k. Let an adversary suddenly halt the stream at this step. If p_k is not equal to 1/k, then the last item is not picked with a uniform probability over the current stream. So in some sense, the algorithm has no choice, since it doesn't know when the stream ends.

3. A more muscular proof of the "let's roll up our sleeves and crank things out" variety, goes as follows: Let p_i be the probability that the i^th item is picked for the sample, replacing the current sample element. The probability that i survives is

Since these numbers must be equal for all i, we can equate any two of these, in particular the terms for i and i+1, which yields the expression

Some simple telescoping later, once we observe that p_1 = 1, gives the desired probabilities.

CS researchers and the practitioners of computer science tend to view each other warily from a distance, each side convinced that the other has absolutely no idea of what "real CS" is about. Don Knuth straddled both worlds effortlessly, gaining respect from 15 year old hackers and 50 year old researchers alike. And that's the most tremendous feat of all.

Update: More tributes, by Bill Gasarch and the Quantum Pontiff.

But what I wanted to highlight is a very innocuous result from Vol 2 of TACP that's both well known, elementary to prove, and provides a key tool for much of the work on streaming that came much later.

Can we pick a random element from a set of items whose cardinality we do not know ?In this setting, we can examine an item and decide whether to retain it or discard it, but at the end of the process, we must hold in our hands an element chosen uniformly at random.

This is the kind of problem you might encounter on a Google/Microsoft interview: it's simple to state, and so is the answer, but it requires at least some "lateral" thinking. Without further ado, here's the answer (so stop right here if you want to puzzle it out yourself)

Pick an item and store it. Pick the next one, and replace the first one with it with probability 1/2. Pick the third one, and do a replace with probability 1/3, and so on. At the end, the item you've stored has a probability of 1/n of being any particular element.This is of course a streaming algorithm. You can imagine the items arriving one by one, and using a local counter and a single word of storage, you can execute the algorithm. First of all, this can be generalized to maintaining a random sample of size k: in general, the technique is called reservoir sampling, and has spawned many generalizations.

There are at least three different ways of arguing the correctness of this procedure - another reason why I find it so elegant.

1. Suppose the sampling procedure is correct for the first k items. Now when item k+1 appears, it becomes the new sample with probability 1/(k+1). Every prior item currently has a probability of 1/k of being picked, and therefore remains the chosen sample item with probability k/(k+1) * 1/k = 1/(k+1). Hence, by induction, the procedure works

Like most induction proofs, this argument leaves something wanting. Here's a less rigorous, but in my mind more appealing argument:

2. Suppose we pick the kth item with probability p_k. Let an adversary suddenly halt the stream at this step. If p_k is not equal to 1/k, then the last item is not picked with a uniform probability over the current stream. So in some sense, the algorithm has no choice, since it doesn't know when the stream ends.

3. A more muscular proof of the "let's roll up our sleeves and crank things out" variety, goes as follows: Let p_i be the probability that the i^th item is picked for the sample, replacing the current sample element. The probability that i survives is

Some simple telescoping later, once we observe that p_1 = 1, gives the desired probabilities.

CS researchers and the practitioners of computer science tend to view each other warily from a distance, each side convinced that the other has absolutely no idea of what "real CS" is about. Don Knuth straddled both worlds effortlessly, gaining respect from 15 year old hackers and 50 year old researchers alike. And that's the most tremendous feat of all.

Update: More tributes, by Bill Gasarch and the Quantum Pontiff.

## Monday, January 07, 2008

### Levi Conant Prize for Hoory, Linial and Wigderson

The joint meetings of the AMS and MAA are going on right now, and among the prizes being awarded is the Levi L. Conant Prize for the best expository article in the Notices or the Bulletin of the AMS in the last 5 years. One of the two awardees is the group consisting of Shlomo Hoory, Nati Linial and Avi Wigderson, for their article 'Expander graphs and their applications' in the Bulletin of the AMS (here's the related course).

Congratulations ! On a side note, I'm truly impressed by the number of prizes awarded for articles that are expository or targeted at the general public. I don't think that the mere creation of prizes encourages people to write such articles; rather, the existence of such prizes is a post facto demonstration of the intrinsic value attributed to exposition in the mathematics community.

Congratulations ! On a side note, I'm truly impressed by the number of prizes awarded for articles that are expository or targeted at the general public. I don't think that the mere creation of prizes encourages people to write such articles; rather, the existence of such prizes is a post facto demonstration of the intrinsic value attributed to exposition in the mathematics community.

## Thursday, January 03, 2008

### A call for information and help

In response to Muthu's query about invited speakers at conferences, Iftah Gamzu at Tel Aviv University set up a very nice page containing all the relevant information. He also has the next generation of the 'Accepted papers at theory conferences' page originally maintained by Anupam Gupta, as well as a page with upcoming dates (conference/submission) for most theory conferences.

Iftah sent me a request: he's missing information on speakers/talks for many conference-year combinations, and was hoping for some help from the community. Here's what's missing:

Iftah sent me a request: he's missing information on speakers/talks for many conference-year combinations, and was hoping for some help from the community. Here's what's missing:

- Invited talks at
- STOC 2000.
- RANDOM-APPROX 2003 and 2005.
- SoCG 2001 and 2004.
- IPCO 2007 summer school.
- Titles of the talks
- CCC 2001.
- Gene Myers, Christos Papadimitriou, and Ehud Shapiro at ESA APPROX 2002.
- Marino Zerial at ESA 2005.
- Leslie Lamport, and Butler Lampson at PODC 2003.
- Elias Koutsoupias at SPAA PODC 2005.
- Robin Thomas at STACS 2004.
- Daniel Lewin at SPAA 2000.
- Andrew Chien at SPAA 2002.
- Adi Shamir at ICALP 2005.
- Name of the speaker of
- "Discrete Tomography: Reconstruction under Periodicity Constraints" at ICALP 2002.
- "Program Debugging and Validation Using Semantic Approximations and Partial Specifications" at ICALP 2002.
- Can someone confirm that there were no invited talks at
- FOCS 2004, CCC 2004, and SPAA 2004?

Subscribe to:
Posts (Atom)