Wednesday, January 16, 2008

Two surveys and a site on string matching and text processing.

1. A survey on string algorithms and data structures by Paolo Ferragina.
2. A paper surveying the theory and practice of compressed text algorithms, by Ferragina, Gonzalez, Navarro, and Venturini.
3. The "Pizza and Chili" site for test data and sources for compressed text indexing (associated with the paper above).

Tuesday, January 15, 2008

The CACM is dead ! Long live the CACM

Readers of this blog will know that I love to HATE the CACM, the embodiment of all that is wrong in our IT-obsessed, SCIENCE-ignoring culture. Well, the main source of my angst is now gone ! vanished ! replaced by something that <shudder> resembles an actual scientific magazine. Here's the story:

Way back in 2005 (before Web 2.0 ! Prehistoric!), the ACM realized that something needed to be done about the CACM. As Peter Denning tells it,
When David Patterson was president of ACM, many researchers told him they thought CACM had never regained its vaunted glory of the 1970s. Patterson set up a committee to review the current model and propose ways to recharge its content and scope
This committee was chaired by Moshe Vardi, who is now the editor-in-chief (EIC) of the new and improved CACM. He recounts a story that is familiar to all of us:
While the format envisioned in the early 1980s was that of a content-rich magazine, the reduced role of the editorial advisory board combined with a relatively small professional staff meant that most of the content came from submitted articles. Over the years those articles have evolved to be strongly slanted toward Management Information Systems. Over time, a significant segment of the ACM membership lost interest in the publication.
The committee was working off the model of Science, as a fast-turnaround, high-prestige magazine that showcased work of broad interest, without necessarily being too technical in focus. The main conclusions of the committee were to leave the more "practice" sides of computing to Queue magazine, and to focus on a research-and-news structure. The new CACM has the following basic outline:
  • News: The news section will have a very distinct “voice,” covering research and practice in computing on a global scale.
  • Practice: CACM will offer coverage of cutting-edge, emerging technology issues.
  • Breakthrough Research: The goal is to bring research articles, covering the broad spectrum of computing research, back to CACM. This will be a combined effort of conference and program chairs (ACM as well as non-ACM) and the CACM editorial board to select the best research material coming out of conferences and share that information with the wide readership of CACM. Each selected research paper will be adapted to the broad CACM readership and will be preceded by a short “Technical Perspective,” giving readers an overview of the important points and significance of the research.
  • Refereed Articles: CACM will continue to publish, as it has since its early days, peer reviewed articles.
  • Opinions and Views: The final component in the content model of the new CACM is a section dedicated to opinions and views, typically of a nontechnical nature.
Of course, the middle two bullets are the new ones, and hopefully they will drive CACM back to its roots as a technical magazine. What's interesting to me is that, as Peter Denning tells it, almost this exact same restructuring was first proposed in 1983, and foundered after a visit to the offices of Science, where the ACM folks realized that a true research-driven magazine required far more technical manpower than the ACM of the time was willing to commit. Moshe Vardi indicates plans to expand the staff of CACM in order to make sure that the new incarnation can be effective in this mode.

Another important aspect of the new incarnation is a complete graphic design overhaul:
There is also a need for a complete redesign of CACM’s Web site, with the goal of creating a dynamic, rich Web site that is much more than an online store of the magazine’s content. The aim is to think of CACM as consisting of a print publication, a rich Web site, and email channel to readers.
And so we now have the 50th anniversary issue of CACM, in its new form. It's extremely unfair to judge a major revamp like this after one issue, but let's see how it looks anyway.
  • The design: It's pretty slick, with a web-based contents and navigation mechanism inside what is called a "Published Web format" designed by a company called Texterity. The feel is PDF-like, but on the web. There are all kinds of nice features for extracting interesting web links from pages, or from sections. Overall, quite snazzy (but then I'm not a professional graphic designer, so what do I know). I will add that given the general 30 year lag between technology in the "real world" and on the ACM sites, this is a ginormously humongous improvement. The interface does seem a little slow to respond to things like page turns and menu options; I'm using the actual PDF for quick reading and text cut-and-pasting.
  • General content: Given the 50 year anniversary, there are many retrospectives, as well as articles by each of the past CACM EICs. These paint a very interesting picture of the rise and fall of CACM, especially in the period when Peter Denning was in charge. He tells of how budget cuts and restructuring led to the decision to remove research content from the magazine. There are perspectives from well known luminaries in the field, including Jon Bentley, Rodney Brooks, and Jeannette Wing, among others. There's a longer article by Gordon Bell on "a theory of the evolution of a computer".
  • Research content: There are two main "breakthrough research" articles. The first is on MapReduce, the Jeffrey Dean/Sanjay Ghemawat algorithmic design paradigm that runs much of Google's internals, with an introduction by David Patterson. The second is on an old friend, locality sensitive hashing (in a new incarnation by Alex Andoni and Piotr Indyk), and the introduction is by Bernard Chazelle, which basically means you all should IMMEDIATELY stop reading this and go read what he wrote, gems like, "Sharp measure concentrations and easy spectral predictions are the foodstuffs on which science feasts."

    Both of these are excellent choices for "broad interest" articles. MapReduce has revolutionized the way we think (or should think) about large-scale computing, and there's even been some attempt at an algorithmic model around it. Nearest neighbour search is one of the most basic operations in any kind of data analysis, and I know for a fact that many people actually use LSH for their data mining problems. More of this, please !
  • The opinion section: Not much to remark upon in this issue: definitely nothing that would satisfy a focus group attendee's wish for "blood on the pages". But again, time will tell.
For years, we've all been complaining about the CACM, and the new revamp definitely deserves our attention. It'll take some time before we can tell if the new version of CACM is truly better, but the signs are looking good.

Saturday, January 12, 2008

Core Sets As Grids

This is another post in my ongoing series of missives from the algorithms seminar I ran last semester (yes I know, I'm late: I have a good excuse though, and it currently weighs 6 lb 11oz).

One of the most exciting developments in the area of high dimensional geometry was the introduction of the idea of core sets. The basic premise is this: suppose we have a geometric optimization problem on a set of points P of size n (Amen!) that can be solved exactly in time p(n) (where p might be some relatively unpleasant polynomial in n). Suppose we can select a set of points S (not necessarily contained in P) of size g(1/epsilon) (the "core-set") such that the value of the optimal solution on S is close (i.e. within a (1+epsilon) factor) to the value on P. Then we can run our expensive algorithm on S, rather than on P. The new running time is n (typically) to compute S, plus p(g(1/epsilon)) to obtain the approximate solution. What we've done here is construct a linear-time approximation scheme for our optimization.

There's a bit of nice magic here: we are essentially claiming that a set of points with size independent of the input can suffice to give a good approximation to the desired answer.

There are many ways of understanding the real power of core sets, and when they exist: perhaps the most surprising result is that for certain high dimensional problems, it is possible to find a core-set that has size independent of both the input size and the input dimension, thus circumventing the famed curse of dimensionality. But that's not what I want to discuss here. Rather, I want to present the view of core sets as a kind of generalized grid: I find this viewpoint less "magical" and often useful when trying to figure out when a core-set might actually exist.

Sparsifying via a grid:
Suppose we have a large collection of points in the plane, and we wish to "shrink" the input to some manageable size. The first thing we can try is to lay down a grid, with side length R, and snap each point to the center of the grid cell it's in. If n is large compared to R, we'd expect many points to lie in each cell. If the diameter of the point set is D, then we need (D/R)^2 grid cells.

There are two problems with this: firstly, D could be much larger than n. In fact, since D is a function of the coordinates of the input, it could be exponentially larger than n ! Secondly, snapping each point to the center of its grid cell incurs an error of up to R/\sqrt{2}, and so R needs to be very small.

Consider the problem of computing the minimum enclosing ball of a collection of points. The radius of this ball is within a constant factor of D by the triangle inequality. This means that if we're willing to tolerate a multiplicative error in our bounds, we can perturb each point by epsilon*D while maintaining an epsilon-approximation. Here, setting R = c*epsilon*D for a suitable constant c suffices, and we only need O(1/epsilon^2) cells. Voila ! a core-set.
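Here's a minimal Python sketch of this construction (the function name and the factor-2 diameter estimate are my own); it keeps one representative input point per non-empty cell rather than the cell center, which is the same idea up to constants:

```python
import numpy as np

def grid_coreset(points, eps):
    # Sketch: snap points in the plane to a grid of side ~ eps * diameter and
    # keep one representative per non-empty cell. The diameter is only
    # estimated up to a factor of 2 (farthest point from an arbitrary point),
    # which is fine up to constants.
    points = np.asarray(points, dtype=float)
    d_est = np.max(np.linalg.norm(points - points[0], axis=1))  # within 2x of the true diameter
    cell = max(eps * d_est, 1e-12)
    cells = {}
    for p in points:
        key = tuple(np.floor(p / cell).astype(int))
        cells.setdefault(key, p)          # keep the first point seen in this cell
    return np.array(list(cells.values()))  # O(1/eps^2) points, up to constants
```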

Of course, we're not going to be this lucky all the time: for example, if we want to compute the width (the smallest distance between parallel lines that bound the point set), then the diameter has no relevance: the width of a point set can be zero even if its diameter is large (if all points are on a line). In that case, how would we choose a grid size that would work for us ?

Approximating all directions:
The answer, as it turns out, is to try and approximate the "diameter" in all directions. More precisely, we can define the extent of a point set along a certain direction as the diameter of the projection along that direction. Having done so, the property we desire of our "grid" is that it approximates this extent function along each direction. For many interesting optimization problems, the extents suffice to compute the optimal solution, and so approximating the extents suffices to approximate the solution.

The easiest way of approximating extents would be to compute the convex hull of the point set, but this might not be a sparse representation in general. A better answer is to place a grid and pick out all non-empty grid cells along the boundary (actually, even picking the extreme cells in each vertical column suffices).

Notice that by concerning ourselves only with extents, we have essentially eliminated the problem of large diameters (because we can always scale the point set). What remains is the problem of variation in the extents. If a point set has a very small extent along a direction (say the direction that defines its width), then in order to capture that extent correctly, we'd need a very fine grid resolution.

Exploiting fatness:
The problem here appears to arise from the fact that a point set can be skinny. The notion of fatness in computational geometry is a very powerful one: there are many ways of defining it, but in all cases the measure captures the degree to which a convex shape is "rounded" versus "skinny" (one definition is the ratio between the radii of the minimum enclosing ball and the largest inscribed ball). If a point set is alpha-fat, then up to a (constant) factor alpha it's round, and therefore has comparable extents in all directions.

Performing an affine transform can make any point set alpha-fat, for some constant alpha. Although an affine transform will play havoc with distances, it preserves ratios: in particular, the ratio between exact and approximate extents.

And with this, we are done. We transform the point set so that it's alpha-fat, and choose a grid size that's a function of alpha and epsilon (after scaling the point set so that the diameter is 1). Then we snap all points to grid cells as before, and pick up the highest and lowest grid cells in each vertical column. What we get is a point set that captures the important characteristics of the input, but whose size depends solely on the error parameters: in other words, a core-set.
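Here's a minimal Python sketch of that final step, assuming the point set has already been made fat and scaled so that its diameter is roughly 1; it keeps the extreme input points in each column rather than snapped cell centers, which only changes the constants:

```python
import numpy as np

def extent_coreset(points, eps):
    # Sketch of the "highest and lowest cell per column" construction for a
    # fat point set of diameter about 1: bucket x-coordinates into columns of
    # width eps, and keep the lowest and highest point in each column.
    points = np.asarray(points, dtype=float)
    cols = {}
    for p in points:
        c = int(np.floor(p[0] / eps))
        lo, hi = cols.get(c, (p, p))
        if p[1] < lo[1]:
            lo = p
        if p[1] > hi[1]:
            hi = p
        cols[c] = (lo, hi)
    out = [q for pair in cols.values() for q in pair]
    return np.unique(np.array(out), axis=0)   # O(1/eps) points
```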

Extensions:
What makes this idea particularly powerful is that it can be extended. For example, we can construct core sets for the "extents" of a set of linear functions by going to the dual space. Similarly, we can construct core sets for collections of polynomials by using the linearization trick and then going to the dual space (in higher dimensions). But ultimately, it boils down to constructing a clever grid.

Note: most of this exposition comes from a reading of Geometric Approximation via Coresets, by Pankaj Agarwal, Sariel Har-Peled and Kasturi Varadarajan.

Thursday, January 10, 2008

Happy Birthday, Don Knuth !

Today is Don Knuth's birthday, and by way of tribute, there are posts from David, Luca and Scott that commemorate him. Among the many obsessions of Knuth is one that went into the new editions of TACP: typesetting the names of cited authors in their native language. This meant a laborious process of converting names into their original scripts and then doing a Unicode translation. I was at Stanford for some of this, and did my (tiny) share of translations in Hindi (and even once in Tamil!).

But what I wanted to highlight is a very innocuous result from Vol 2 of TACP that's well known, elementary to prove, and provides a key tool for much of the work on streaming that came much later.
Can we pick a random element from a set of items whose cardinality we do not know ?
In this setting, we can examine an item and decide whether to retain it or discard it, but at the end of the process, we must hold in our hands an element chosen uniformly at random.

This is the kind of problem you might encounter on a Google/Microsoft interview: it's simple to state, and so is the answer, but it requires at least some "lateral" thinking. Without further ado, here's the answer (so stop right here if you want to puzzle it out yourself)

Pick an item and store it. Pick the next one, and replace the first one with it with probability 1/2. Pick the third one, and do a replace with probability 1/3, and so on. At the end, the item you've stored has a probability of 1/n of being any particular element.
This is of course a streaming algorithm. You can imagine the items arriving one by one, and using a local counter and a single word of storage, you can execute the algorithm. Moreover, this can be generalized to maintaining a random sample of size k: in general, the technique is called reservoir sampling, and it has spawned many generalizations.
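Here's a minimal Python sketch of the general k-sample version (often called Algorithm R); the i-th item replaces a uniformly chosen slot with probability k/i:

```python
import random

def reservoir_sample(stream, k=1, rng=random):
    # Sketch of reservoir sampling: keep the first k items, then let the
    # i-th item (1-indexed) overwrite a random slot with probability k/i.
    sample = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            sample.append(item)
        else:
            j = rng.randrange(i)      # uniform in {0, ..., i-1}
            if j < k:
                sample[j] = item
    return sample

# e.g. reservoir_sample(range(10**6), k=5) returns 5 items, each set of 5
# equally likely, using one pass and O(k) memory.
```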

There are at least three different ways of arguing the correctness of this procedure - another reason why I find it so elegant.

1. Suppose the sampling procedure is correct for the first k items. Now when item k+1 appears, it becomes the new sample with probability 1/(k+1). Every prior item currently has a probability of 1/k of being picked, and therefore remains the chosen sample item with probability k/(k+1) * 1/k = 1/(k+1). Hence, by induction, the procedure works.

Like most induction proofs, this argument leaves something wanting. Here's a less rigorous, but in my mind more appealing argument:

2. Suppose we pick the kth item with probability p_k. Let an adversary suddenly halt the stream at this step. If p_k is not equal to 1/k, then the last item is not picked with a uniform probability over the current stream. So in some sense, the algorithm has no choice, since it doesn't know when the stream ends.

3. A more muscular proof of the "let's roll up our sleeves and crank things out" variety, goes as follows: Let p_i be the probability that the i^th item is picked for the sample, replacing the current sample element. The probability that the i^th item survives to the end of a stream of n items is

p_i * (1 - p_{i+1}) * (1 - p_{i+2}) * ... * (1 - p_n)

Since these survival probabilities must be equal for all i, we can equate any two of them, in particular the terms for i and i+1, which yields the expression

p_i * (1 - p_{i+1}) = p_{i+1}, or equivalently, 1/p_{i+1} = 1/p_i + 1.

Some simple telescoping later, once we observe that p_1 = 1, gives the desired probabilities.

CS researchers and the practitioners of computer science tend to view each other warily from a distance, each side convinced that the other has absolutely no idea of what "real CS" is about. Don Knuth straddled both worlds effortlessly, gaining respect from 15 year old hackers and 50 year old researchers alike. And that's the most tremendous feat of all.

Update: More tributes, by Bill Gasarch and the Quantum Pontiff.

Monday, January 07, 2008

Levi Conant Prize for Hoory, Linial and Wigderson

The joint meetings of the AMS and MAA are going on right now, and among the prizes being awarded is the Levi L. Conant Prize for the best expository article in the Notices or the Bulletin of the AMS in the last 5 years. One of the two awardees is the group consisting of Shlomo Hoory, Nati Linial and Avi Wigderson, for their article 'Expander graphs and their applications' in the Bulletin of the AMS (here's the related course).

Congratulations ! On a side note, I'm truly impressed by the number of prizes awarded for articles that are expository or targeted at the general public. I don't think that the mere creation of prizes encourages people to write such articles; rather, the existence of such prizes is a post facto demonstration of the intrinsic value attributed to exposition in the mathematics community.

Thursday, January 03, 2008

A call for information and help

In response to Muthu's query about invited speakers at conferences, Iftah Gamzu at Tel Aviv University set up a very nice page containing all the relevant information. He also has the next generation of the 'Accepted papers at theory conferences' page originally maintained by Anupam Gupta, as well as a page with upcoming dates (conference/submission) for most theory conferences.

Iftah sent me a request: he's missing information on speakers/talks for many conference-year combinations, and was hoping for some help from the community. Here's what's missing:
  • Invited talks at
    • STOC 2000.
    • RANDOM-APPROX 2003 and 2005.
    • SoCG 2001 and 2004.
    • IPCO 2007 summer school.
  • Titles of the talks
    • CCC 2001.
    • Gene Myers, Christos Papadimitriou, and Ehud Shapiro at ESA APPROX 2002.
    • Marino Zerial at ESA 2005.
    • Leslie Lamport, and Butler Lampson at PODC 2003.
    • Elias Koutsoupias at SPAA PODC 2005.
    • Robin Thomas at STACS 2004.
    • Daniel Lewin at SPAA 2000.
    • Andrew Chien at SPAA 2002.
    • Adi Shamir at ICALP 2005.
  • Name of the speaker of
    • "Discrete Tomography: Reconstruction under Periodicity Constraints" at ICALP 2002.
    • "Program Debugging and Validation Using Semantic Approximations and Partial Specifications" at ICALP 2002.
  • Can someone confirm that there were no invited talks at
    • FOCS 2004, CCC 2004, and SPAA 2004?
If anyone reading this has the relevant information, they can either post it in the comments, or email him directly.

Monday, December 31, 2007

Comte de Buffon, and a happy new year !

My personal resolution for the new year is that I will actually try to write down a larger fraction of the blog posts that are floating around in my head, especially the many that were born while I was sitting in my algorithms seminar.

The NYT has a travel feature on visiting the birthplace of Comte de Buffon, a contemporary and competitor to Carolus Linnaeus, who by this account played Hooke to Linnaeus' Newton. Of course, I (and probably most of my readers) know Buffon not for his work in biology and the natural world, but for the famous problem that bears his name and by some accounts gave rise to the area of geometric probability.

The Buffon needle problem is this: Suppose we drop a needle of length l onto a grid with lines spaced d units apart. What is the probability that the needle touches a grid line ? (Let's assume for now that l <= d; if l > d, the expression is more involved.)
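For the classical setup (a single family of parallel lines spaced d apart, with l <= d), the answer is 2l/(pi*d), and it's easy to check by simulation; here's a minimal Monte Carlo sketch:

```python
import numpy as np

def buffon_estimate(l=1.0, d=2.0, trials=1_000_000, rng=None):
    # Sketch: drop a needle of length l onto parallel lines spaced d apart
    # (l <= d). By symmetry it suffices to sample the distance from the
    # needle's center to the nearest line and the needle's acute angle.
    rng = np.random.default_rng(rng)
    x = rng.uniform(0, d / 2, trials)           # distance to the nearest line
    theta = rng.uniform(0, np.pi / 2, trials)   # acute angle with the lines
    hits = x <= (l / 2) * np.sin(theta)
    return hits.mean()                          # should approach 2*l / (pi*d)
```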

Buffon's needle problem is a textbook example of the area of geometric probability: doing probability theory on entities defined geometrically. This is a harder thing than one might imagine: as Bertrand demonstrated much later on, the very notion of a uniform random sample can be called into question. He showed this by posing a simple question:

Draw an equilateral triangle and circumscribe a circle around it. Pick a chord at random. What is the probability that the chord length is longer than the side of the triangle ?

It turned out that the "Pick a chord at random" step leads to all kinds of problems. Bertrand presented three different equally reasonable ways of picking a chord, and showed that each of the three ways led to a different probability. At the time, this was viewed as a paradox: it seemed strange that what appeared to be a well defined question would lead to three different answers. Of course, the modern view is that there is no paradox, merely a different choice of measure space to work with in each case.
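If you want to see the paradox numerically, here's a quick Monte Carlo sketch of the three constructions (random endpoints, random perpendicular distance along a radius, random midpoint); the three estimates come out near 1/3, 1/2 and 1/4 respectively:

```python
import numpy as np

def bertrand(trials=1_000_000, rng=None):
    # Sketch of Bertrand's three "random chord" constructions on the unit
    # circle; a chord is longer than the inscribed equilateral triangle's
    # side iff its length exceeds sqrt(3).
    rng = np.random.default_rng(rng)
    side = np.sqrt(3.0)

    # 1. Random endpoints: two uniform angles on the circle.
    a, b = rng.uniform(0, 2 * np.pi, (2, trials))
    len1 = 2 * np.abs(np.sin((a - b) / 2))

    # 2. Random radius: chord perpendicular to a fixed radius, at a uniform
    #    distance from the center.
    r = rng.uniform(0, 1, trials)
    len2 = 2 * np.sqrt(1 - r ** 2)

    # 3. Random midpoint: chord midpoint uniform in the disk.
    m = np.sqrt(rng.uniform(0, 1, trials))   # radial coordinate of a uniform point
    len3 = 2 * np.sqrt(1 - m ** 2)

    return [(len1 > side).mean(),   # -> about 1/3
            (len2 > side).mean(),   # -> about 1/2
            (len3 > side).mean()]   # -> about 1/4
```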

But if that's the case, is there a "natural" choice of measure ? One elegant way of thinking about this was to think of this as a physical process that must obey certain transformation invariants. This leads inexorably to notions like the Haar measure: can we assign a measure to subsets of a group that is invariant over left (or right) group actions ?

The Haar measure has practical utility for anyone working in geometry. Consider the problem of choosing a random rotation (this comes up when doing Johnson-Lindenstrauss embeddings). In 2 dimensions, this is easy, since the space of rotations is the unit circle. However, in three dimensions or higher, it gets a bit trickier to define the correct measure to sample with respect to. Since rotations form a group, we really need a measure that is invariant under the group action, and a Haar measure (unique up to scaling) is guaranteed to exist in this case, yielding the desired measure to sample from (it's actually fairly easy to generate a random rotation algorithmically: the easiest way is to populate a matrix with independent normally distributed random variables, and then orthogonalize it).
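Here's a minimal numpy sketch of that recipe; the sign fix-up on the QR factors is what makes the resulting distribution uniform (Haar) on the orthogonal group, and the final column flip forces a proper rotation (determinant +1):

```python
import numpy as np

def random_rotation(d, seed=None):
    # Sketch: QR-decompose a Gaussian matrix, fix the signs so the result is
    # Haar-distributed on O(d), then flip one column if needed to land in
    # SO(d), i.e. an honest rotation.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((d, d))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))   # make the factorization canonical
    if np.linalg.det(Q) < 0:      # reflect one axis to get determinant +1
        Q[:, 0] = -Q[:, 0]
    return Q
```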

But it all goes back to a needle, a grid, and a frustrated naturalist. Happy new year !

Wednesday, December 26, 2007

DARPA Mathematical challenges

Via Ars Mathematica and Not Even Wrong, a DARPA-issued list of challenges in mathematics for the 21st century. Some that puzzled me:
  • Computational Duality: Duality in mathematics has been a profound tool for theoretical understanding. Can it be extended to develop principled computational techniques where duality and geometry are the basis for novel algorithms?

    I'm not sure what this is trying to say, or whether I'm reading it wrong, because the story of linear programming, primal dual schemes, and the Lagrangian is the story of using "duality and geometry as the basis for novel algorithms"
  • What are the Physical Consequences of Perelman’s Proof of Thurston’s Geometrization Theorem?
    Can profound theoretical advances in understanding three-dimensions be applied to construct and manipulate structures across scales to fabricate novel materials?

    Thurston's geometrization conjecture talks about the structure of 3-manifolds: i.e 3 dimensional surfaces living in higher dimensional spaces ? What kinds of materials could be fabricated using this ?
  • Computation at Scale: How can we develop asymptotics for a world with massively many degrees of freedom?

    This sounds like there's a nice computational question lurking somewhere, but I'm not quite sure where.
Nice to see algorithmic origami and self-assembly also mentioned. I am particularly intrigued by the reference to the geometry of genome spaces.

Tuesday, December 11, 2007

On the dominance of the Ivies..

Via Cosmic Variance:

Exhibit A:

“One thing we all must worry about — I certainly do — is the federal support for scientific research. And are we all going to be chasing increasingly scarce dollars?” says Drew Gilpin Faust, Harvard’s new president.

Not that Faust seems worried about Harvard or other top-tier research schools. “They’re going to be—we hope, we trust, we assume—the survivors in this race,” she says. As for the many lesser universities likely to lose market share, she adds, they would be wise “to really emphasize social science or humanities and have science endeavors that are not as ambitious” as those of Harvard and its peers.

Exhibit B:
Mario Capecchi: 2007 Nobel Laureate, Ph.D Harvard (1967). Associate Prof. Biochemistry (Harvard) (1969-1971)
Moved to U. Utah (1973).

Through a series of bold experiments begun in the 1980s, Capecchi demonstrated that he could alter any gene in a mouse cell by replacing it with a modified version. At the time, scientists were skeptical that such altered DNA could be targeted to a particular gene. But Capecchi was not to be deterred. Indeed, his studies demonstrated that it is possible to replace an intact, functional gene with a modified version that can zero in on the corresponding DNA in the chromosome.
And finally:

Some 50 universities are located in the Boston area. Rather than collaboration, Capecchi felt that the thousands of researchers were working in isolation on projects that promised "immediate gratification." As he explained, "Everyone is so aware of what everyone else is doing. 'What's new?' was asked every day. That limits you to short-term returns, posing questions that you know can be answered in six months."

In contrast, the University of Utah in Salt Lake City offered "a relaxed atmosphere, where you could work on projects whose outcome may take 10 years. The relative isolation tends to make you more focused on the biological question you're working on.

"It was a good choice," said Capecchi of his decision to relocate to the U of U in 1973.

Friday, December 07, 2007

VLDB dogfight, and peer review

I guess this had to happen sooner or later, given how easy it is to publish a critique. I'm only surprised that more people aren't doing it.

Via Daniel Lemire comes a link to a page by John Wu where he rakes a VLDB 2006 paper over the coals and, in doing so, mounts a serious criticism of the review process that led to the paper being accepted. I don't have a dog in this fight, and know next to nothing about the area, but the criticisms are prima facie plausible, even if mitigated by the fact that he's the "injured party" in this argument. One idea that he proposes, and that Daniel endorses, is this:
Our first recommendation is for the program committee to assign one member to be responsible for a paper. To encourage committee members to take this responsibility seriously, the accepted paper should be printed with the name of the responsible person. Many journal articles are printed with a footnote to the effect of “Accepted on the recommendation of So-and-So.” Similar footnote could be printed on the final copy of the conference proceedings.
It's an interesting idea. I've heard this being proposed before, and apart from the fact that papers can often slip in without strong champions, I don't see too much that's wrong with it.

Neighborly polytopes and compressed sensing

I ran a seminar on approximate high dimensional geometry this semester, and there are many posts queued up based on interesting observations from the papers that were presented. This post is about polytopes, compressed sensing, and lower bounds for convex hulls.

Compressed sensing is a fascinating topic that (depending on how you look at it) is influenced by approximation theory, sketching techniques, compression, and geometry. Posts by Terry Tao and Igor Carron are good places to learn more about this, and the paper repository at Rice is another great resource. What I wanted to talk about in this post however, is an interesting result by Donoho and Tanner that connects back to structures that geometers are intimately familiar with, as well as helping me understand the nature of convex hull lower bounds.

Let's briefly sketch the motivating problem. Suppose you're given a signal s that you want to express in terms of linear combinations of basis functions drawn from a dictionary. A key idea in compressed sensing is that the dictionary itself is redundant, in that there are multiple ways of representing a signal via linear combinations of basis functions. A useful analogy is that of barycentric coordinates. We can represent a point in the plane uniquely via a linear combination of three non-degenerate points (i.e a simplex), but this stops being true if we have more than three points that form the "basis".

Given that there are multiple representations, which is the best one ? The next important idea is that of sparsity: namely, that the number of coefficients needed is very small compared to the size of the dictionary. Again, returning to our analogy, it would be as if the dictionary had 100 points in it, and all we needed to determine was a representation using a linear combination of three points.

So one rendition of the compressed sensing problem is: Suppose there's a signal that you interact with via a 'measurement matrix' (which means you supply a vector of coefficients, and get the value of the inner product of the signal coefficients with the measurement coefficients; each row of the matrix is one such measurement). You retain the output values y; formally, you can write this as y = Ax, where x is the unknown signal vector. Can you find an x with the fewest nonzero coefficients (i.e. can you minimize the l_0 norm) that satisfies this equation ?

This problem is NP-hard; intuitively, you can encode SUBSET SUM in it by setting up an appropriate y vector. Suppose however that we relax the optimization: rather than asking to minimize the l_0 norm, we ask to minimize the l_1 norm of the target vector x. It turns out that this problem can be solved via linear programming.
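Here's a minimal sketch of that relaxation as a linear program, using scipy's linprog and the standard variable split x = u - v with u, v >= 0 (the function name is mine):

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    # Sketch: minimize ||x||_1 subject to Ax = y by writing x = u - v with
    # u, v >= 0 and minimizing sum(u) + sum(v) as a linear program.
    m, n = A.shape
    c = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
    assert res.success, res.message
    u, v = res.x[:n], res.x[n:]
    return u - v
```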

Under what conditions then can the l_1 solution yield the l_0 solution ? Suppose that we could reconstruct the vector x uniquely from y. In that case, it doesn't really matter what cost function we use: if there's only one solution, then we'll find it either way.

This is where neighborly polytopes enter the picture. A polytope in R^d is said to be m-neighborly if any m vertices lie on a face of the polytope. Note that the simplex is always d-neighborly: essentially this definition tries to capture the idea of a "simplicial core" of a polytope: crudely put, an m-neighborly polytope has an m-simplex encoded within it somewhere.

If we think of the measurement y as being a point reconstructed from m coefficients, then if y lies on the m-simplex, there are unique coefficients we can assign to the vertices of the simplex to reconstruct it. Donoho and Tanner showed that this is essentially sufficient: namely that the polytope AT^d (where T^d is the d-simplex) is m-neighborly if and only if there's a unique reconstruction of y (implying that the l_1 optimization yields the desired sparse vector).

This is quite an elegant result in its own right: further, they generalize this to a "robust" variant, where the polytope might be approximately m-neighborly, in which case they show that with high probability a unique solution can be reconstructed (this latter result resembles the Johnson-Lindenstrauss theorem). What's interesting about these results is that they connect a very old concept in geometry (neighborly polytopes) to a reconstruction algorithm in compressed sensing.

A side effect of understanding neighborly polytopes from this perspective is that I finally saw why cyclic polytopes "work" as a lower bound for estimating the size of a convex hull. We can ask what the largest value of m is for which some d-dimensional polytope (with arbitrarily many vertices) can be m-neighborly. It turns out that the answer is roughly d/2 (more precisely, floor(d/2)), and the example is the cyclic polytope.

Why is this interesting ? If we go back to the intuition of m-neighborly polytopes containing an m-dimensional simplicial core, it becomes clear. All vertices of a simplex lie on the convex hull. If a polytope is m-neighborly, it should come as no surprise that its convex hull has complexity roughly n^m. Taking m = d/2, we recover the n^{d/2} lower bound for the size of a convex hull in d dimensions.
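To see this growth concretely, here's a small sketch (using scipy's ConvexHull, i.e. Qhull; the numerics on the moment curve blow up quickly, so keep n and d modest) that builds a cyclic polytope and counts its facets:

```python
import numpy as np
from scipy.spatial import ConvexHull

def cyclic_polytope_facets(n, d):
    # Sketch: take n points on the moment curve t -> (t, t^2, ..., t^d);
    # their convex hull is the cyclic polytope, whose facet count grows like
    # n^(d/2) up to constants. Floating point gets flaky for large n or d.
    t = np.arange(1, n + 1, dtype=float)
    pts = np.vstack([t ** i for i in range(1, d + 1)]).T
    return len(ConvexHull(pts).simplices)

# e.g. cyclic_polytope_facets(20, 4): theory says C(20, 4) has
# 20 * 17 / 2 = 170 facets, i.e. quadratic growth in n when d = 4.
```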

Thursday, December 06, 2007

Leibniz Prize

Via an anonymous commenter comes the news that Susanne Albers is one of the 2008 Leibniz Prize winners. This is the most prestigious science award in Germany (and that's all of science, not just computer science). Going back over the list of past winners, it seems (and everything's in German, so I could be wrong here) that the last theory prize winner was Gunter Ziegler, back in 2001. Susanne Albers, for those who don't know, works in online algorithms, approximations, and algorithmic game theory.

From the Leibniz Prize page:
The Leibniz Programme, established in 1985, aims to improve the working conditions of outstanding scientists and academics, expand their research opportunities, relieve them of administrative tasks, and help them employ particularly qualified young researchers. Up to ten prizes are awarded annually with a maximum of €2.5 million per award. The prize is awarded from a slate of nominations put forward by third parties. The prizewinner is selected by the Joint Committee on the basis of a recommendation from the Nominations Committee for the Leibniz Programme.
The Leibniz Prize is administered by the DFG, which according to Wikipedia translates to the German Research Foundation, which would make it the equivalent of the NSF, and the Leibniz Prize roughly comparable to the NSF's Waterman Award.

Update: here's the announcement (in German)

Academia vs Industry: Hierarchies...

Universities, and the people within them, often like to project the image of an egalitarian society. "This is your university", "A university is made up of professors", "WE run the place", and so on. Nowadays, in the era of 'university-as-business-enterprise', these buzzphrases are often supplemented by, "we're all partners in this enterprise", and other business metaphors.

There's one obvious way in which this is fiction: the tenured/non-tenured divide is vast, and dominates the thinking of the untenured in many ways. Continuing the business metaphors, the tenured/untenured divide is like the partner-associate divide at law firms or investment banks.

But factoring that out, there are more subtle hierarchies: the administrative pathway (chair to dean to provost) is one obvious one, and there are class hierarchies based on funding output, fame, and other parameters like that.

In itself, this is not particularly noteworthy: where humans exist, hierarchies are sure to follow. What I find interesting to observe is how the lack of an institutionalized hierarchical structure (that we see in a company) removes some of the obvious external cues that help us slot into our various positions in the hierarchies. In other words, you have to be reasonably socially aware to pick up on the signals that indicate hierarchical patterns in academia, and once you do, you have to constantly remind yourself that they exist, otherwise you run the risk of silently crossing lines that you shouldn't be crossing.

The key issue here is that the natural tendency to form hierarchies is in this case going counter to the professed semi-fiction of equality, and it's the conflict between the two views that makes for an interesting social dynamic.

In all the ink that gets spilled about academia vs industry, this is one of the more subtle issues that crops up. I doubt it makes a difference to anyone's decision on which way to go, but it's curious nonetheless.

Wednesday, December 05, 2007

Tenure-track position in geometric computation at McGill

Contact McGill Geometry for more details. Here's the summary:
The School of Computer Science at McGill University invites applications for one tenure-track position at the assistant professor level, to begin August 1, 2008, in the general area of geometric computation.

The successful candidate must have a strong theoretical foundation and should be active in research on geometric problems and their applications.

Complete pdf format applications, including a curriculum vitae, a list of publications with copies of one or two sample reprints, a research statement as well as a teaching statement, and the names and e-mail addresses of three references should be sent to geometry@cs.mcgill.ca.

Applications will be reviewed as soon as they are received. No applications will be accepted after January 15, 2008.

The School of Computer Science offers a collegial environment with opportunities for interaction with world class researchers in areas including (but not limited to): computational geometry, discrete mathematics, mobile robotics, computer vision, computer graphics, bioinformatics, cryptography and quantum information, reasoning and learning, and scientific computing.

For further information on the School, see: http://www.cs.mcgill.ca.
Deadline is Jan 15, 2008.

2007 ACM Fellows

A huge crop of algorithms/complexity folks have been named ACM Fellows in this year's batch of awards: among them
  • Rajeev Alur
  • Avrim Blum
  • Andrei Broder
  • Danny Dolev
  • Rodney Downey
  • Tao Jiang (thanks, Anon)
  • Lance Fortnow
  • Rajeev Motwani
Congratulations to all !

(12/6): A special congratulations also to Rajeev Motwani (my research "father") and Lance (my blog "father") :)

Monday, December 03, 2007

A question on quantum debugging

As my algorithms class winds down, I've been doing some "topics" lectures, and today I attempted to not make a fool of myself giving students a very high level view of quantum computing, with much help from Dave Bacon and Umesh Vazirani's lecture notes, as well as the Nielsen/Chuang book.

One question came up that I thought was interesting, and I'm hoping that Pontiffs in Shtetls can help answer it :). One of the "features" of programming a randomized algorithm with pseudorandom bits generated by a seed is that you can reproduce a program run by reusing the same seed (and hence replaying the same random trace). This is of course handy when debugging.

Is there any way of doing this with a quantum algorithm ? I.e., is there any way to run a quantum algorithm twice and get the same answer ? Maybe using entangled bits in some strange way ?

Friday, November 30, 2007

The true meaning of peer review

from Cosma Shalizi:
passing peer review is better understood as saying a paper is not obviously wrong, not obviously redundant and not obviously boring, rather than as saying it's correct, innovative and important.
Of course, the contrapositive of this is that an obviously wrong, redundant and boring paper has some nonzero chance of being rejected ;)

Interesting new blogspam method ....

When you've been writing a blog for a while, you'll discover what appear to be automated blogs that shamelessly steal your posts and repost them as new posts verbatim. I can only imagine that they do this to get high hits (and therefore advertising revenue) from Google, although why someone would take *my* posts for this purpose escapes me.

A new form of blog plagiarism combines the best of spam techniques with post copying, so as to evade the obvious duplicate detection methods. Consider the following examples:

The original post:
There's been a lot of grief over the reduction in price for iPhones only a few months after they were released. Wired magazine interviewed people who don't regret paying the higher price for the virtue of being an early adopter/arbiter-of-cool, and this comment caught my eye
The modified post (and no, I'm not going to link to it):
There's been a number of visibility at the decline master p scholomance for online in very couple weeks and we were released. Wired magazine interviewed people who don't regret paying a low frequency of the world as both an early riser and perhaps comment and the eye
The original post:
Suppose you're given a metric space (X, d) and a parameter k, and your goal is to find k "clusters" such that the sum of cluster costs is minimized. Here, the cost of a cluster is the sum (over all points in the cluster) of their distance to the cluster "center" (a designated point).
The modified post:
Lil romeo you're given a multidimensional space (X, d) and a parameter k, and a buff being use find k "clusters" such as lump- end of the effect is minimized. Here, lil romeo use 70million a target wikipedia, the sum of these items out the amount of the need codify the cluster "center" (a designated point).
And so on. What's more mysterious is that this isn't straight plagiarism: the post credits my post as the "source", although given the garbage that's generated, I'm not sure I want that "source" credit :).

Very mystifying...

Saturday, November 24, 2007

Black Friday -- online

I'm on a cable modem, and thus I share my network connection with my neighbours. Ever since Friday morning, my connection has been awfully slow during the day (and much faster in the early AM). I wonder if the whole 'Black Friday, crowded mall' phenomenon is moving online.

Tuesday, November 20, 2007

Author contributions

In this conference season, it seems like a good time to think about the issue of author contributions. Here are some of the methods for author listing used in computer science:
  • Alphabetical (the norm in theoryCS and communities heavily influenced by it)
  • Advisor-student: student name comes first, advisor comes last.
  • Work-ordering: first author did all the work, last author provided the funding, intermediate authors ordered by contribution (common in systems work)
Of course, there are combinations: I've seen alphabetical modified by advisor-student ordering, or alphabetical with sub-level work-ordering pieces etc, although the latter is not common in theoryCS. The arguments for and against the various methods are well known I imagine:
  • alphabetical ordering conceals who did the real work ("but the cream rises to the top")
  • Work ordering allows advisors to slap their names on anything coming out of their lab ("But they did all the work needed to fund this research: grant writing, selling, etc")
  • alphabetical gives undue eminence to authors whose names begin with 'A', (but see here for a counter-theory)
  • alphabetical ordering makes authors with names in the end of the alphabet lazy (Who're you looking at, punk ?)
  • theory papers can't be work-ordered, because ideas "float in the air without ownership" (yes, I have heard this argument many times, so don't snicker)
But what is perhaps more interesting is the problem of assigning author contributions, especially in systems like the alphabetical one where author contributions are not signalled by placement. Luca Aceto points out an unusually detailed description of author contributions, and speculates as to what a similar note might look like on theory papers.

He also links to an interesting set of axioms that Hardy and Littlewood developed before their famous collaboration started. I actually admire this approach. As a general rule, theory folks tend to be very allergic towards detailed discussions about contributions and what-not, but in collaborations, it's critically important to lay out the ground rules in advance, and the Hardy-Littlewood axioms are good because they lay out rules in advance that eliminate any problems later on: for example,
And, finally, the fourth, and perhaps most important axiom, stated that it was quite indifferent if one of them had not contributed the least bit to the contents of a paper under their common name . . .
Agreeing to such an axiom requires great trust between the collaborators, because of the immense potential for abuse, and maybe that's exactly the point: a good collaboration requires trust, and you can't agree to such an axiom unless trust already exists.

Monday, November 19, 2007

My first ipelet: 1-medians

I just wrote my first ipelet, and it was an amazingly pleasant experience. Not to mention the thrill I got from marking a collection of points and whoosh! getting the answer right there on the screen.

The ipelet is for computing the discrete and geometric 1-median of a set of points. The geometric 1-median of a set of points (also called the Fermat-Weber-Steiner-Torricelli-< insert your name here > point) is the point that minimizes the sum of distances to a set of points. The discrete 1-median is the same minimization, restricted to points in the input.

Computing the discrete 1-median of a set of points in the plane is trivial in n^2 time, and I didn't do anything clever there. Computing the geometric 1-median is decidedly nontrivial though: there is no closed form solution beyond 4 points, and it can be shown that the point cannot be computed using arithmetic and k^th roots.

There are various approximation schemes: the most pertinent one here (in the plane) is a scheme by Bose, Maheshwari and Morin that uses a cone decomposition of the plane to yield an approximation scheme. In practice, researchers often use an iterative scheme developed by Weiszfeld; it's a simple iterative scheme built around the fact that the optimal solution must be a point such that the sum of unit vectors from it to all other points is zero (formally, this is the condition on the gradient).

The cost function is convex and differentiable except at the input points themselves. Thus, the Weiszfeld iteration is guaranteed to give the optimal solution as long as it doesn't get stuck at one of the data points. This algorithm is often referred to as the "founding algorithm of location science", and there are numerous modifications, extensions, and refinements. The ipelet implements the basic Weiszfeld scheme with a fairly brain-dead termination condition.
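For concreteness, here's a minimal Python sketch of both the discrete brute-force computation and the Weiszfeld iteration (this is not the ipelet code, just an illustration, and its termination condition is no less brain-dead):

```python
import numpy as np

def weiszfeld(points, iters=200, tol=1e-9):
    # Sketch of Weiszfeld's iteration for the geometric 1-median: repeatedly
    # take a weighted average of the points, with weights 1/distance to the
    # current estimate. Coincidence with an input point is handled crudely.
    P = np.asarray(points, dtype=float)
    x = P.mean(axis=0)                       # starting guess: the centroid
    for _ in range(iters):
        d = np.linalg.norm(P - x, axis=1)
        if np.any(d < tol):                  # stuck on an input point: just return it
            return P[np.argmin(d)]
        w = 1.0 / d
        x_new = (w[:, None] * P).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

def discrete_1_median(points):
    # The discrete version: brute force over the input points, using O(n^2)
    # distance evaluations.
    P = np.asarray(points, dtype=float)
    costs = [np.linalg.norm(P - p, axis=1).sum() for p in P]
    return P[int(np.argmin(costs))]
```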

What possessed me to do such a thing ? Well, I'm trying to find counter examples for a conjecture involving 1-medians, and needed a way of visualizing examples :).

The code is here: feel free to use it, comment on it, modify it, etc.

IKEA = NP ?

Brian Hayes, in a post with a very catchy title, makes this painful, and yet not-off-the-mark comment about complexity class nomenclature:

The sad truth is, the naming conventions for furniture at Ikea make for a more consistent language than those of complexity theory.

Ouch ! For more on this, read Lance's post from way back when.

Tuesday, November 13, 2007

A day in the life...

Here's a list of things I did today:
  • I taught one lecture of my class
  • I attended a departmental committee meeting
  • I had a meeting with collaborators to work out a paper outline for something we're submitting
  • I met with my student and did some nontechnical advising
  • I had a (brief) discussion with a collaborator about a pending grant proposal.
In other words, I covered the entire gamut of activities one might expect of an academic - with one significant exception. Did you notice it ?

Nowhere in there was any actual research done ! Gaaaah !

Tuesday, November 06, 2007

"As good as your best result"

Mihai makes the argument that in theoryCS, you're as famous as your best result. I think the number of good results does matter a bit more than that, but getting well known for your best result is the best way to make a first splash and get on the radar.

I've actually heard the opposite claim made by systems researchers, and this has been extended to other empirical researchers as well. Namely, "you're as good as your worst result" (presumably thresholded to ignore grad school). The rationale here appears to be that in the conservative empirical sciences, where a badly designed experiment can cast a shadow on all your later work, it's your worst result that matters.

I can see how this works (sort of): in the more mathematical disciplines, a "proof" can to a large extent be validated independent of the author, so mistakes in past proofs can be tolerated (though if this becomes an endemic problem, then ...). Louis de Branges is famous for his proof of a conjecture in complex analysis known as the Bieberbach conjecture, and is currently well known for his claimed proof of the Riemann Hypothesis (Karl Sabbagh's book on this has more on de Branges). His proof of the Bieberbach conjecture was not without some initial skepticism from other mathematicians, because of some earlier false starts. However, it was soon seen as correct, and has opened up new areas of mathematics as well. As a consequence, his proofs of the Riemann hypothesis have received more cautious consideration as well (although I am not aware of the current status of his claims).

On the other hand, validating empirical work is largely on trust: you trust that the scientists have made faithful measurements, have kept proper lab notes etc, and I can see how a weakening of that trust can lead to much higher skepticism.

In this light, it's interesting to read Malcolm Gladwell's latest article for the New Yorker. He profiles the "profilers": the psych experts who help the FBI track down serial killers, and comes to this conclusion:
He seems to have understood only that, if you make a great number of predictions, the ones that were wrong will soon be forgotten, and the ones that turn out to be true will make you famous.

Sunday, November 04, 2007

Wednesday, October 24, 2007

Catchy: "Put the Turing in Manufac-turing"

From an op-ed piece by Ken Goldberg and Vijay Kumar:

But U.S. manufacturing today is where database technology was in the early 1960's, a patchwork of ad hoc solutions that lacked the rigorous methodology that leads to scientific innovation. That all changed in 1970 when Ted Codd, an IBM mathematician, invented relational algebra, an elegant mathematical database model that galvanized federally funded research leading to today's $14 billion database industry.

Manufacturing needs the same treatment. Just as the method to add two numbers together doesn't depend on what kind of pencil you use, manufacturing abstractions can be wholly independent of the product one is making or the assembly line systems used to assemble it. Another precedent is the Turing Machine, an elegant abstract model invented by Alan Turing in the 1930s, which established the mathematical and scientific foundations for our now-successful high-tech industries. Without Turing's theoretical work, the system that typeset this line wouldn't exist.

What's needed today is an analogy to the Turing Machine for design, automation and manufacturing. Recent developments in computing and information science have now made it possible to model and reason about physical manufacturing processes, setting the stage for us to "put the Turing into Manufacturing". The result, as was the case with databases and computers, would be higher quality, more reliable products, reduced costs, and faster delivery

As they say, RTWT.

Untangling a geometric graph

Planarity is a rather addictive game developed by John Tantalo. The idea is that you're given a planar graph, drawn with lots of crossings, and your goal is to move vertices around so as to recover a planar embedding. The game can get surprisingly challenging, and has even spawned a photoshop tribute called, 'Planarity, My Anti-drug'.

If we put on our theory hats, the first question that pops out is: how many vertices might you need to move in order to move from an embedding of a planar graph to a planar embedding ? This process, of moving vertices around to remove crossings, is called untangling. Technically, H is an untangling of G if H is crossing-free and is isomorphic to G (in other words, untangling might allow more general moves than the sliding allowed by Planarity). A vertex of G is said to be fixed if it is mapped (by the isomorphism) to a vertex of H with the same coordinates.

Back in 1998, it was first asked whether a graph could be made planar while keeping a constant fraction of vertices fixed. The answer, shown by Pach and Tardos, was no, via a graph in which at most roughly n^{2/3} of the vertices could stay fixed (which is one way of explaining the difficulty of Planarity). Of course, the next question was whether it was the case that all planar graphs could be untangled by fixing at least some n^epsilon vertices.

A recent paper by Bose, Dujmovic, Hurtado, Langerman, Morin and Wood answers this question in the affirmative, showing that any geometric planar graph can be untangled while keeping at least n^{1/4} vertices fixed. The proofs involved are intricate, but are graph-theoretic and "elementary" in nature.

Wolfram (2,3) machine exists

Back in May, Bill and David discussed the new Wolfram challenge: Show that a particular 2-state 3-color TM is universal. Apparently, this question has now been resolved affirmatively, by Alex Smith, an undergraduate at the University of Birmingham.

Read more about here, and here.

(via BB)

Monday, October 15, 2007

I foresee mass murders of cell phone users...

From the NYT:
It’s been a long time, but Led Zeppelin, one of the last superstar acts to refrain from selling its music online, is finally offering its catalog to digital-music fans. The shift by Led Zeppelin, whose reunion concert in London next month has already incited a frenzy for tickets, highlights the clout of digital sales in the music market as mass merchants reduce the shelf space devoted to compact discs. Under a series of new agreements expected to be announced today, the band will make its songs available first as ringtones and similar mobile features starting this week in an exclusive deal with Verizon Wireless.
And if you don't believe me, watch Wayne's World.

Thursday, October 11, 2007

Popularizing computing

Brian Hayes writes for the American Scientist (and on his blog, bit-player), and in my mind is one of the best writers of popular material about computing. Every time I read something he's written, I learn something new, even if it's about a topic I'm familiar with. What's even more impressive is that he often focuses on the mathematical aspects of computing and finds interesting ways to explain them to his audience.

Two recent articles of his are exemplars:

1. Sorting out the genome, from September-October 2007. Here's a key extract:
Given two arrangements of a set of genes—say a b c d e f g and f e b a g c d—how do you determine what sequence of reversals produced the transformation? This example has a three-step solution, which you can probably find in a few minutes with pencil and paper. For larger genomes and longer chains of reversals, however, trial-and-error methods soon falter. Is there an efficient algorithm for identifying a sequence of reversals that converts one permutation into another?

The genetic reversal problem lies at the intersection of biology, mathematics and computer science. For some time, the prospects for finding a simple and efficient solution seemed dim, even with the most powerful tools of all three disciplines. But the story has a happy ending. A little more than a decade ago, computing gene reversals was still a subtle research problem; now it can be done with such ease that it's a matter of routine technology. If you need to know the "reversal distance" between two genomes, you can go to a Web site and get the answer in seconds.
The article proceeds to lay out the entire history of sorting with reversals, pancake sorting, and related problems. It's a great case study for the study of algorithms in an application domain: how to model a problem, come up with algorithms, discover that the model doesn't quite capture what you want, rethink the model, come up with more algorithms, and so on. It touches on dynamic programming, greedy algorithms, NP-hardness, and approximations. I liked it so much I'll be devoting a lecture in my algorithms class to it, to illustrate how algorithms research works "in the real world".
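As a small taste of this family of problems, here's a sketch of pancake sorting (sorting by prefix reversals), one of the related problems mentioned above: the simple strategy of flipping the largest out-of-place element to the front and then flipping it into position. It uses O(n) reversals, but finding the minimum number of reversals is the hard question the article is really about.

```python
def pancake_sort(perm):
    # Sketch of sorting by prefix reversals for a sequence of distinct
    # elements: bring the largest unsorted element to the front with one
    # flip, then flip it into its final position.
    a = list(perm)
    flips = []                                     # record the prefix lengths flipped
    for size in range(len(a), 1, -1):
        i = max(range(size), key=lambda j: a[j])   # position of the largest unsorted element
        if i == size - 1:
            continue                               # already in place
        if i > 0:
            a[:i + 1] = a[i::-1]                   # flip it to the front
            flips.append(i + 1)
        a[:size] = a[size - 1::-1]                 # flip it into position
        flips.append(size)
    return a, flips
```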

2. Conquering divide, bit-player, October 9th, 2007.
The "hook" for this article is Eric Allender's review article on the true complexity of division (short summary: division is complete for DLOGTIME-uniform TC^0), which is a must-read in its own right. What Hayes discusses in his article is the reason to even ask this question, starting with the general lack of symmetry between the grade-school methods for multiplication and division. I found this particularly insightful:

Why is division so hard?

At times I feel this is a dumb question, and that the universe doesn’t owe us an answer to such questions. Some things are just harder than others; that’s the way the world works. We can count ourselves lucky that multiplication has a polynomial-time algorithm; we have no right to expect that our luck will hold when we turn to division.

But then I get to thinking about physical systems that multiply and divide, where the two operations seem perfectly symmetrical. An example is an electrical circuit governed by Ohm’s law.

If we attach the terminals of a voltmeter to opposite ends of a resistor, we observe Ohm’s law in the form E = IR: Voltage is equal to the product of current and resistance. If we install an ammeter in series with the resistor, we see the effects of the law I = E/R: Current is equal to voltage divided by resistance. If you prefer pipes to wires, you can do the hydraulic version of the experiment, and there are also lots of mechanical schemes, where the same principle is illustrated with levers, gears, pulleys and such. In all of these systems, nature seems to divide just as easily as it multiplies. Why can’t we do the same with pencil and paper, or transistors, or neurons?

Saturday, October 06, 2007

CoM #18

The latest installment of the Carnival of Mathematics is out, hosted by jd2718. It's an amazing effort, with 52 posts from 49 different authors. The first few installments of the CoM were quite sparse, and for a while I wondered whether there was enough chatter in the blathosphere to sustain it. The answer, looking at the variety and quality of posts that Jonathan has put together, is clearly yes !

Well done !

Sunday, September 30, 2007

"Related Work" vs "Prior Work"

(ed: As deadlines draw near, I get increasingly caught up with the minutiae of paper-writing. Be warned)

"Related work" or "Prior work" ? I've seen both section titles used when describing all the work that came before the magnificient specimen of research that I'm reading right now. Personally, I prefer 'Prior work': it seems somehow closer to the spirit of 'I stand on the shoulders of giants', even if that was really just a dig at Hooke. 'Related Work' to me has the tone of a grudging 'yeah yeah, these guys did something that might vaguely resemble what we do here, and I guess I should include it, but don't imagine that their work in any way influences my brilliant thoughts, which are sui generis'.

An issue that appears to be related to this is the placement of the Related/Prior Work section. I personally prefer to have prior work discussions just after the introduction, unless discussing prior work requires the use of notation that I introduce in a definitions section. In many papers (and some of mine as well), the prior work gets shunted off to the no-man's land before the conclusions. I wonder if there's some correlation between the choice of the title, 'Related Work' and the banishing of the section to the very end, a kind of double dismissal of the value of the cited works.

Wednesday, September 26, 2007

The "Voronoi trick"

Only a few days after talking about work on redistricting based on computing a geometric clustering, I came across (via Andrew Gelman) yet another paper on automatic redistricting by Roland Fryer and Richard Holden.

It's a very interesting paper, and at the risk of sounding patronizing, I have to say that I'm impressed at their use of power diagrams and the Voronoi trick, something that many theoryCS folks may not have encountered. Once again, the problem they're looking at is redistricting: can you design voting districts to satisfy certain fairness criteria ?

One specific question that they ask is: can you evaluate the quality of a state's redistricting plan ? Their answer is in the form of an approximation ratio. They fix a measure of quality for a redistricting, and take the ratio of the actual plan's cost to the cost of the optimal plan. Their cost, in this case, is a well-known clustering measure: the cost of a cluster (in this case, a district) is the all-pairs sum of distances squared, and the cost of a clustering (in this case, a redistricting) is the sum of cluster costs. One extra twist is that they require all clusters to be equally sized, up to rounding errors. That is, a 2-clustering of 9 people must have one cluster with 5 people and one cluster with 4.

The problem of computing an optimal clustering is NP-hard if k, the number of clusters, is part of the input. If k is a fixed constant, however, there's an elegant method for solving these kinds of problems, which I'll call the "Voronoi trick".

The Voronoi Trick:

A clustering of n items into k clusters is a partitioning problem, and as such there are roughly k^n ways of assigning items to clusters (k^n comes from treating the clusters as labelled; ignoring labels reduces the count by a factor of at most k!). But for geometric problems, we fully expect that not all partitions are candidates for optimality, because of proximity constraints and the triangle inequality. Is there some way of restricting the set of partitions we need to consider ?

Consider the following set of possible partitions of a set of n points in a Euclidean space. We put down k centers, and assign each point of the input to its nearest center. Not all placements of the k centers lead to distinct partitions: I can always perturb one center slightly without changing the combinatorial structure of the partition. We'll call such a partition a Voronoi partition, from the connection to Voronoi diagrams.

Now it's clear that not all partitions of the input are realizable as Voronoi partitions. For example, with k=2 centers in the plane, there are only O(n^2) distinct Voronoi partitions, rather than the 2^n ways of splitting a set of items into two groups. In general, for a set of n points in d dimensions, there are roughly n^{kd} distinct Voronoi partitions, which is polynomial in n for fixed k and d, rather than exponential.
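As a sanity check on that counting argument, here's a small sketch of my own: assign ten points to the nearer of two randomly thrown centers, many times over, and count how many distinct partitions ever show up.

```python
import random

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(10)]

def voronoi_partition(points, centers):
    """Assign each point to its nearest center; return the induced partition
    as a frozenset of frozensets so distinct partitions can be counted."""
    cells = [set() for _ in centers]
    for p in points:
        i = min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
        cells[i].add(p)
    return frozenset(frozenset(cell) for cell in cells if cell)

# Throw down many random pairs of centers and record the distinct partitions that arise.
seen = set()
for _ in range(20000):
    centers = [(random.uniform(-1, 2), random.uniform(-1, 2)) for _ in range(2)]
    seen.add(voronoi_partition(pts, centers))

# Far fewer than the 2^(n-1) = 512 arbitrary ways of splitting 10 points into two groups.
print(len(seen), "distinct two-center Voronoi partitions observed")
```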

Now here comes the trick: for many clustering problems, including the one mentioned above, we can show that the optimal clustering must be realizable as a Voronoi partition. This kind of result has appeared in many different forms: Inaba et al showed that for two different kinds of minimum variance clustering (one where the cost of a cluster is the sum of squared distances from the center, and the other where the cost is the pairwise sum of squared distances of points in the cluster), the optimal clustering is a Voronoi partition. Other forms of this trick were used by Aurenhammer et al in their paper on least-squares partitioning where they showed that we could compute the optimal RMS matching between two point sets using Voronoi partitions.

One aspect of this trick that I elided: often, the Voronoi partitioning doesn't come from a straight distance function, but from a weighted one. One way this is done is to associate a weight w(s) with each site and fold it into the comparison: taking the bisector to be the set of points satisfying d(x,s) - w(s) = d(x,s') - w(s'), rather than the usual d(x,s) = d(x,s'), gives the additively weighted Voronoi diagram, while using squared distances, d(x,s)^2 - w(s) = d(x,s')^2 - w(s'), gives the power diagram, which is the one relevant for the clustering results above. If you recall the view of a Voronoi diagram as the lower envelope of a collection of cones in one higher dimension, with apexes at the site locations, then additive weighting corresponds to lifting the cones by different (even negative) heights given by the weight function w(s).
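In code, the only change from plain nearest-center assignment is the quantity each point compares: it goes to the site minimizing squared distance minus weight. A tiny sketch of my own:

```python
def power_cell_assignment(points, sites, weights):
    """Assign each point to the site minimizing squared distance minus weight,
    i.e., to its cell in the power diagram of the weighted sites."""
    assignment = []
    for x in points:
        best = min(range(len(sites)),
                   key=lambda i: sum((xj - sj) ** 2 for xj, sj in zip(x, sites[i])) - weights[i])
        assignment.append(best)
    return assignment

# With equal weights this is the ordinary Voronoi assignment; raising a site's
# weight grows its cell.
print(power_cell_assignment([(0, 0), (1, 0), (2, 0)], [(0, 0), (2, 0)], [0.0, 0.0]))  # [0, 0, 1]
print(power_cell_assignment([(0, 0), (1, 0), (2, 0)], [(0, 0), (2, 0)], [0.0, 3.0]))  # [0, 1, 1]
```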

The Voronoi trick and clustering:
Apart from the fact that the Voronoi trick allows us to reduce the space of possible partitions to consider, it also allows us to isolate the influence of points on clusters. This idea was first exploited in the above-mentioned paper by Inaba et al, and has since been used in approximate clustering work by Badoiu, Har-Peled and Indyk, as well as the recent linear-time k-means algorithms by Kumar, Sabharwal and Sen. The argument goes something like this:

Since we know that each center is "influenced" by points in its Voronoi cell, we can focus on the problem of finding a single center out of the k. This is typically solved using a (simpler) 1-median or 1-means algorithm, run on a random sample of points that is likely to contain a large number of points from a single cluster. Once this is done, we can "peel" off this center, and repeat. The ability to do random sampling and peeling comes from the idea of irreducibility: namely, that either a k-clustering can be approximated using a (k-1)-clustering, or if not, it means that the k^th cluster is "reasonably well-separated" from the rest, and can be isolated using the above ideas.

Returning to redistricting:
Coming back to the redistricting paper, what they do is essentially show that their all-pairs measure of the cost of a cluster reduces to the 'cost to the centroid' measure, since all clusters are required to have the same size. In this case, they can make use of the power diagram to search efficiently for solutions. If I may nitpick, I wonder why they didn't look into the many approximation schemes available for these problems (but this is really a nitpick).
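The identity behind that reduction is worth spelling out: for a cluster C with centroid mu, the all-pairs sum of squared distances equals 2|C| times the sum of squared distances to mu, so when every cluster has the same size the two objectives agree up to a constant factor. A quick numerical sanity check (my own sketch, counting ordered pairs):

```python
import random

random.seed(1)
cluster = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(7)]
n = len(cluster)
mu = (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

def sqdist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

all_pairs = sum(sqdist(p, q) for p in cluster for q in cluster)   # over ordered pairs
to_centroid = sum(sqdist(p, mu) for p in cluster)

# The two numbers agree up to floating-point error.
print(all_pairs, 2 * n * to_centroid)
```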

Postscript:
There are other tricks that are also worthy of being called a 'Voronoi trick'. One such idea dates back to a paper by Dobkin, Drysdale and Guibas from 1983 ("D. P. Dobkin, R. L. Drysdale, III, and L. J. Guibas. Finding smallest polygons. In Advances in Computing Research, Vol. 1, JAI Press, 1983, 181-214."), and works as follows:

Suppose you want to find the k points from a point set of size n that optimize some function: for example, you want the k points with the smallest enclosing ball, or the k points with the smallest convex hull area, and so on. Further, let's say you have some expensive algorithm that can solve this problem in time O(n^c). For many of these functions, it can be shown that the optimal set of k points must lie within a single cell of the O(k)^th order Voronoi diagram of the set of points. In case you hadn't encountered this notion before, the k^th order Voronoi diagram is what you get if instead of taking the lower envelope of your cones, you take the k^th level of the arrangement. Intuitively, it consists of cells in which for any point, its k nearest neighbours are a fixed set of points.

The k^th order Voronoi diagram of points in the plane has complexity O(kn). Using the above fact, we can enumerate over all cells and run the expensive algorithm on the O(k) points in each cell, for a running time of O(k^c) per cell over O(kn) cells, i.e., O(k^{c+1} n) rather than n^c. This idea works up to a point: see this paper by fellow bloggers David and Jeff for improvements that bypass the Voronoi construction in favor of iterated nearest neighbors.

Friday, September 21, 2007

Manifold learning and Geometry

In June, Partha Niyogi gave the invited lecture at SoCG. The subject was manifold learning:
Given a collection of (unlabelled) data inhabiting some high dimensional space, can you determine whether they actually lie on some lower dimensional manifold in this space ?
This talk was a great window into an area of machine learning with strong geometric connections, an area where geometers could profitably play and make useful contributions.

Now NIPS (one of the major machine learning conferences) is returning the favor, with a workshop devoted to the topic of Topology Learning:

There is a growing interest in Machine Learning, in applying geometrical and topological tools to high-dimensional data analysis and processing.

Considering a finite set of points in a high-dimensional space, the approaches developed in the field of Topology Learning intend to learn, explore and exploit the topology of the shapes (topological invariants such as the connectedness, the intrinsic dimension or the Betti numbers), manifolds or not, from which these points are supposed to be drawn.

Applications likely to benefit from these topological characteristics have been identified in the field of Exploratory Data Analysis, Pattern Recognition, Process Control, Semi-Supervised Learning, Manifold Learning and Clustering.

However it appears that the integration in the Machine Learning and Statistics frameworks of the problems we are faced with in Topology Learning, is still in its infancy. So we wish this workshop to ignite cross-fertilization between Machine Learning, Computational Geometry and Topology, likely to benefit to all of them by leading to new approaches, deeper understanding, and stronger theoretical results about the problems carried by Topology Learning.

The list of invited speakers bridges geometry, topology and learning. It looks like a great forum for continuing the machine learning-computational geometry cross-talk that Partha kicked off at SoCG.

Thursday, September 20, 2007

Modelling via the algorithmic lens

Much of my time is spent on the interface between algorithms research and other areas, in the dreaded trenches of 'algorithmic modelling'. Algorithmic modelling can be a tremendously useful tool, if wielded skillfully, and in the few cases where one hits the jackpot, you have an instant argument for the value of theoryCS outside computer science. Most importantly, it provides a kind of rationale for the intrinsic value of the algorithmic method: that thinking algorithmically leads to new ways of thinking about computational problems, and hooks problems in other domains into the engine of algorithmic improvements, so that better algorithm design leads quickly to better solutions to "real-world" problems.

Such work however can be an endless source of frustration. Most commonly, there are cultural differences between areas, and terminology barriers that need to be overcome. But most of all, there's a sense of apathy towards any effort at producing formal guarantees for computational methods (whether for running time or for quality). In general, methods based on formal algorithmic analysis must still prove themselves in the experimental setting, rather than possessing any intrinsic value by virtue of being correct, having running time and quality guarantees, and so on.

This in itself can be defended: if the main bottlenecks to an effective practical solution are not computational (though they often are), then it's not clear that a practitioner needs to care about esoteric notions like formal guarantees (which can often be quite weak). What is harder to surmount is the 'I've dug my hole and I like it' phenomenon.

For example, the RMS (root-mean-square) distance is commonly used to compare two 3D protein models. But I've never heard any fundamental argument as to why this measure is superior to other possible candidates for matching shape. What's worse, the RMS measure is harder to minimize (under transformations) than other equally plausible measures of shape similarity. However, as the status quo, it's hard to do anything to change this, short of voluminous studies that argue for the relative merits of a different measure based on comparisons of shapes.
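For concreteness, here's what the standard computation looks like: given two models with known point correspondences, the rotation minimizing the RMS deviation comes from an SVD (the Kabsch method). A rough numpy sketch, assuming the correspondence is given and only rigid motions are allowed:

```python
import numpy as np

def rmsd(P, Q):
    """RMS deviation between corresponding rows of P and Q, minimized over all
    rigid motions (rotation + translation), via the Kabsch/SVD method."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)                    # factor out translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                              # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))         # guard against reflections
    D = np.diag([1.0] * (H.shape[0] - 1) + [d])
    R = Vt.T @ D @ U.T                         # optimal rotation taking Pc onto Qc
    aligned = Pc @ R.T
    return np.sqrt(np.mean(np.sum((aligned - Qc) ** 2, axis=1)))

# A rotated and shifted copy of a point set should sit at RMSD ~ 0 from the original.
rng = np.random.default_rng(0)
P = rng.standard_normal((20, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])
print(rmsd(P, Q))                              # ~1e-15
```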

Lest it seem that I'm merely whining, Barbie-style, because 'Modelling is hard', let me summarize my main point:

The computational efficacy and guarantees associated with a particular modelling technique are not in and of themselves sufficient to convince a practitioner that this approach is valid, even if in all other respects, the method is identical to other, more entrenched methods.

This makes me wonder about the best way to do algorithmic modelling, if one chooses to do it at all.

Sunday, September 16, 2007

Sunday afternoon MathTubing

Via John Baez, learning category theory on YouTube ! Now what we need is a YouTube video for Scott Aaronson's soap film demonstrations.

The videos are listed under the category, 'How-to and DIY'. Indeed ! Make your own monads, in 9 minutes or less !

Saturday, September 15, 2007

The hardness of fair redistricting

Through a series of pointers, I ended up at Brian Olson's page on redistricting, where he proposes an algorithm to do "fair redistricting" of districts so as to avoid problems with gerrymandering. The measure proposed appears to be a discrete k-median variant: namely, cluster "blocks" into k districts such that the average distance to the center of the district (defined as the centroid of the collection of blocks) is minimized. His code is online, and appears to use a GA-type heuristic. This is essentially a planar k-median problem, and is solvable approximately by a variety of methods.
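Concretely, the objective is simple to write down. Here's a small sketch of my own that just evaluates the measure for a given assignment (it ignores population weights and is not Olson's GA search):

```python
def redistricting_cost(blocks, assignment, k):
    """Average distance from each block to the centroid of its assigned district.
    `blocks` is a list of (x, y) locations; `assignment[i]` is the district of block i."""
    districts = [[] for _ in range(k)]
    for block, d in zip(blocks, assignment):
        districts[d].append(block)
    total, count = 0.0, 0
    for members in districts:
        if not members:
            continue
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        for x, y in members:
            total += ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            count += 1
    return total / count

print(redistricting_cost([(0, 0), (1, 0), (10, 0), (11, 0)], [0, 0, 1, 1], 2))   # 0.5
```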

In reading some of the (critical) comments on the original posts (mostly of the form: (a) this has been proposed ever since the early 60s, and (b) automation is not the hardest problem in redistricting), I was led to an award-winning political science dissertation by Micah Altman, in which (among other things) he shows that most of the reasonable redistricting formulations (although not the one above) are NP-hard (they tend to be variants of set packing). It was interesting to read a lengthy discourse on computational complexity and NP-hardness in the middle of such a dissertation.

Monday, September 10, 2007

An interesting blog

Derrick Coetzee writes a blog at MSDN titled, 'Developing for Developers'. There's lots of stuff related to code development, but also articles on general computing and theory. Some recent posts:
None of this is necessarily new to people reading this blog, but it's heartening to see an audience for this material in the developer community. And of course, anyone who says,
In reality, computing has nothing to do with computers. Computing is fundamentally about solving problems using strictly defined processes under resource constraints. The theory of computing, by design, has little regard for what device you use to carry those processes out.
will always have a special spot in my blogroll :)

Sunday, September 09, 2007

The TheoryCS blogosphere flourishes !

Jeez. I leave my computer for two hours, and four different bloggers announce the SODA accepts. If for some extremely obscure reason you haven't yet seen the postings, here's the list.

Saturday, September 08, 2007

Carnival of Mathematics XVI

is up, courtesy Kurt van Etten @ Learning Computation. The theme is 'Good, Bad and Ugly', and for some reason, computer science is "Ugly" :). Given that two of the entries involve mathematical feuds, maybe this is fitting.

Physician, heal thyself ?

There's been a lot of grief over the reduction in price for iPhones only a few months after they were released. Wired magazine interviewed people who don't regret paying the higher price for the virtue of being an early adopter/arbiter-of-cool, and this comment caught my eye (emphasis mine):
"If they told me at the outset the iPhone would be $200 cheaper the next day, I would have thought about it for a second - and still bought it," said Andrew Brin, a 47-year-old addiction therapist in Los Angeles. "It was $600 and that was the price I was willing to pay for it."
I don't know: I think there are some other people who should get their money back.

Wednesday, September 05, 2007

Heuristics for NP-hard problems

A while back, Michael was discussing the teaching of heuristics in his undergraduate algorithms class, and had this request:
I wish the standard textbooks would devote a solid chapter on heuristic techniques; in real life, most students are probably more likely to have to implement a heuristic for some NP-hard problem than to have to implement a minimum spanning tree algorithm.
If you remove the word 'standard', he may just have got his wish. From the press blurb for the Handbook of Approximation Algorithms and Metaheuristics:
Delineating the tremendous growth in this area, the Handbook of Approximation Algorithms and Metaheuristics covers fundamental, theoretical topics as well as advanced, practical applications. It is the first book to comprehensively study both approximation algorithms and metaheuristics. Starting with basic approaches, the handbook presents the methodologies to design and analyze efficient approximation algorithms for a large class of problems, and to establish inapproximability results for another class of problems. It also discusses local search, neural networks, and metaheuristics, as well as multiobjective problems, sensitivity analysis, and stability.
Browsing the table of contents, the reader encounters an entire section on heuristics, with chapters on local search, stochastic local search, neural networks, genetic algorithms, tabu search, simulated annealing, ant colony optimization and the like. The book is published by CRC, which stands for "Organization-That-Singlehandedly-Destroys-Rain-Forests-While-Making-Your-Biceps-Extremely-Strong".
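To give a flavor of what those chapters teach, here's a minimal simulated-annealing sketch for max-cut; this is a toy example of my own, not taken from the handbook:

```python
import math
import random

def anneal_max_cut(n, edges, steps=20000, t0=2.0, t1=0.01):
    """Simulated annealing for max-cut: keep a 0/1 side for each vertex, propose
    single-vertex flips, and accept worse moves with a probability that shrinks
    as the temperature cools."""
    random.seed(0)
    side = [random.randint(0, 1) for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def cut_value(s):
        return sum(1 for u, v in edges if s[u] != s[v])

    best = cur = cut_value(side)
    best_side = side[:]
    for step in range(steps):
        t = t0 * (t1 / t0) ** (step / steps)      # geometric cooling schedule
        v = random.randrange(n)
        # flipping v changes the cut by (same-side neighbours) - (cross neighbours)
        delta = sum(1 if side[v] == side[u] else -1 for u in adj[v])
        if delta >= 0 or random.random() < math.exp(delta / t):
            side[v] ^= 1
            cur += delta
            if cur > best:
                best, best_side = cur, side[:]
    return best, best_side

# A 5-cycle: the optimal cut has 4 edges.
print(anneal_max_cut(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])[0])   # typically prints 4
```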

I generally have mixed feelings about CRC books: they are good reference texts, but tend to skim over content way too easily, and CRC has this unhealthy obsession with handbooks for everything. But for teaching heuristics, this might be an invaluable reference.

I should mention that the vast majority of the handbook deals with "proper" approximation algorithms, as well as applications to specific areas. The problem is that approximation algorithms is a fast-moving field (though maybe not as fast as in the mid-90s), and so a handbook, being a static snapshot in time, will get dated quickly.

Tuesday, September 04, 2007

Teacher's Day

I've been out of India for far too long, or so it seems. I had to be reminded by this article that Sep 5, Wednesday, is Teacher's Day in India. This day was chosen for its coincidence with the birthday of India's second president, S. Radhakrishnan, a statesman and politician famous for his erudition and learning. According to Wikipedia, only 21 countries have an official celebration of Teacher's Day. Much to my surprise, the US is among them, with the day being part of a week called Teacher Appreciation Week.

Teacher's day in my school was always a big event; teachers did not teach that day, and came to school mainly to be entertained and feted in an assembly with variety shows/skits/gentle mockery. Senior students would mind the junior classes. When I was younger, this meant a day of partying in school. When I became one of the seniors, it meant I was able to come to school in "civilian clothing" (most schools in Delhi had strict uniform policies) to boss over the younger students. Good times...

Now I'm a teacher myself (after a fashion), and find it both easy (and fashionable) to grumble about the lack of respect shown by students in the US, as compared to students in India. The truth of course is far more complicated than that. Schools in Delhi appear to have acquired many of the "bad traits" of the worst high schools here; my mother has taught in Delhi schools for nearly 20 years and has seen them acquire more "western" habits (which is a shorthand for more boy-girl interaction, less "studiousness", less fear/reverence for the teachers, take your pick). And ultimately, as a university professor, I'm not even on the frontlines of education here in the US, and have no business complaining about the largely excellent students I teach.

In any case, here's a cheer for the teachers I had; the ones who tolerated my brashness, general arrogance, and constant questions, and helped me reach the teaching pedestal I find myself at today. And here's hoping that one day there'll be some student who might think as fondly of me as I think of all my teachers long past.

Monday, September 03, 2007

NP hardness of Euclidean k-median clustering

Suppose you're given a metric space (X, d) and a parameter k, and your goal is to find k "clusters" such that the sum of cluster costs is minimized. Here, the cost of a cluster is the sum (over all points in the cluster) of their distance to the cluster "center" (a designated point).

This is the well-known k-median problem (which differs from the also popular k-means problem by the use of distances, rather than squares of distances). In a general metric space, the k-median problem is known to be NP-hard, as well as hard to approximate to within an arbitrary constant factor. The current best known approximation ratio is 3 + eps, via a local search heuristic due to Arya, Garg, Khandekar, Meyerson, Munagala and Pandit (thanks, Chandra), improving on the earlier factor of 4 due to Charikar and Guha.
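Here's a bare-bones rendition of the single-swap local search idea (my own simplification, with centers restricted to input points and no attention paid to running time; the approximation analysis is of course where the real work lies):

```python
def kmedian_local_search(points, k, dist):
    """Single-swap local search for discrete k-median: start with any k centers
    chosen from the points, and repeatedly swap one center for one non-center
    whenever doing so lowers the total assignment cost."""
    def cost(centers):
        return sum(min(dist(p, c) for c in centers) for p in points)

    centers = list(points[:k])
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in centers:
                    continue
                candidate = centers[:i] + [p] + centers[i + 1:]
                if cost(candidate) < cost(centers):
                    centers, improved = candidate, True
    return centers, cost(centers)

# Toy example on the line: two obvious clusters around 0 and 10.
pts = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
print(kmedian_local_search(pts, 2, lambda a, b: abs(a - b)))   # ([11.0, 1.0], 4.0)
```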

If the underlying metric is the Euclidean metric, then the problem complexity changes: in fact a PTAS for the Euclidean k-median can be obtained, due to the results of Arora, Raghavan and Rao, and then Kolliopoulos and Rao (who provide an almost-linear time algorithm). But as far as I am aware, there is still no NP-hardness proof for the Euclidean k-median problem, and I'd be interested in knowing if I am wrong here.

Note that the related problem of Euclidean k-means is known to be NP-hard from an observation by Drineas, Frieze, Kannan, Vempala and Vinay.

Update: As pointed out in the comments, the NP-hardness of k-median in the plane was shown in 1984 by Megiddo and Supowit. Once again, the wisdom of the internet beats independent research ;)

Tenure committee is to Colombian drug cartel as ...

Via HWTW, a hilarious advice piece to new profs by Phil Ford at Inside Higher Ed, adapted from a Notorious B.I.G rap, "The Ten Crack Commandments". The article is great, but the comments are even funnier.

"Data Mining" = voodoo science ?

On the Statistical Modeling blog, Aleks Jakulin has a rant on the virtues of data mining:

I view data analysis as summarization: use the machine to work with large quantities of data that would otherwise be hard to deal with by hand. I am also curious about what would the data suggest, and open to suggestions. Automated model selection can be used to list a few hypotheses that stick out of the crowd: I was not using model selection to select anything, but merely to be able to quantify how much a hypothesis sticks out from the morass of the null.

The response from several social scientists has been rather unappreciative along the following lines: "Where is your hypothesis? What you're doing isn't science! You're doing DATA MINING !"
I had almost the same reaction a while back when I was visiting JPL: the climatologists there were horrified at the idea of trolling for patterns in climate data, and to a person asked me the dreaded 'But what is the science question?' question. Of course, given the general hot-potato-ness of climatology right now, one might sympathize with their skittishness.

Data mining is a tricky area to work in, and I've discussed this problem earlier. It's a veritable treasure-chest of rich algorithmic problems, especially in high dimensional geometry, and especially over large data sets. However, it's often difficult to get a sense of forward progress, especially since the underlying analysis questions often seem like elaborate fishing expeditions.

In that context, the distinction Aleks makes between confirmatory data analysis (check whether the data validates or invalidates a hypothesis) and exploratory data analysis (play with the data to create a non-uniform distribution on plausible hypotheses) is quite helpful. It also emphasizes the interactive and very visual nature of data mining; interactive tools and visualizations are as important as the underlying analysis machinery.

Update: Chris Wiggins points me to some of the earlier references to 'data mining'. One of the most vituperative is a paper by Michael Lovell in 1983 in The Review of Economics and Statistics. This paper drips with scorn for 'data miners', but makes a point that is at the very least worthy of consideration: namely, that because of the large dimensionality of the space of hypotheses that a data mining application typically explores (here couched in terms of explanatory variables for a regression), patterns that appear statistically significant might not actually be so (or, stated another way, in high dimensional spaces there are many seemingly rare patterns that aren't rare at all).
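The point is easy to see in simulation: scan enough irrelevant variables against a response and some of them will look significant at the 5% level purely by chance. A small sketch of my own, using numpy and the crude large-sample approximation that a correlation is "significant" when |r| exceeds roughly 1.96/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 200                       # 100 observations, 200 candidate explanatory variables
y = rng.standard_normal(n)            # the response: pure noise
X = rng.standard_normal((n, m))       # the candidate predictors: also pure noise

# correlation of each candidate variable with the response
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(m)])

# under the null, |r| above ~1.96/sqrt(n) looks "significant at the 5% level"
threshold = 1.96 / np.sqrt(n)
spurious = int((np.abs(r) > threshold).sum())
print(spurious, "of", m, "pure-noise predictors look significant at the 5% level")
# expect on the order of 0.05 * 200 = 10 such false discoveries
```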
