Thursday, October 28, 2010

FOCS Day 2: (much delayed)

I had to return to Salt Lake the evening of day 2, so blogging got left by the wayside. Piotr is to blame for this post :).

As usual, by day 2 (day 3 for me because of the tutorials day), my desire to attend talks far outstripped my ability to attend them, and I especially wanted to hear more about the Kumar/Kannan clustering with spectral norm paper, and the stability for clustering result of Awasthi, Blum and Sheffet. If I ever have control of setting the program for a conference, I'm going to decree that all talks start no earlier than 11am (they can go late in the evening, for all I care).

The first talk that I attended was by Alex Andoni on the new polylog approximation for the edit distance that he has with Robert Krauthgamer and Krzysztof Onak. Dick Lipton actually spends a good deal of time on the problem and background in this post, and also highlights their earlier paper that gave a sub-polynomial approximation in subquadratic time.

This paper goes one better, producing an algorithm that runs in time $O(n^{1+\epsilon})$ while yielding a $(\log n)^{O(1/\epsilon)}$ approximation. A key idea in this result is a very neat trick: rather than treating the strings x and y symmetrically, the algorithm first compresses x down to a much smaller string (of length $n^\epsilon$), and then runs a crude exact algorithm on this compressed string and y.

This compression is randomized: the algorithm subsamples small pieces of x. However, a straightforward sampling doesn't preserve the distance, so the algorithm actually starts by designing a careful hierarchical partitioning of x, which induces a modified distance function (that they relate to the edit distance). The sampling then takes place in this tree. Obviously I'm skipping many of the details that I didn't get, but Alex gave a very nice talk that laid out the high-level picture (his talks are always a model of clarity and intuition).
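To make the asymmetric trick concrete, here is a minimal Python sketch of the framework only (all names and parameters below are mine, for illustration): compress x down to roughly $n^\epsilon$ characters and run the standard quadratic edit-distance DP (the "crude exact algorithm") on the compressed string and y; the DP step alone then costs about $n^\epsilon \cdot n = n^{1+\epsilon}$. The compression here is deliberately naive uniform sampling, which, as noted above, does not preserve the distance; the real algorithm samples within its hierarchical partition and works with the modified distance.

```python
import random

def edit_distance(a, b):
    """Standard O(|a| * |b|) dynamic program -- the 'crude exact algorithm'."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # delete a[i-1]
                         cur[j - 1] + 1,                        # insert b[j-1]
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitute
        prev = cur
    return prev[n]

def naive_asymmetric_estimate(x, y, eps=0.5, seed=0):
    """Shape of the asymmetric framework: keep ~n^eps characters of x (chosen
    here uniformly at random, in order) and run the exact DP against all of y.
    Because the sampling is naive, the value returned is NOT a valid
    approximation of ed(x, y); the point is only the structure of the
    computation and its roughly n^(1+eps) cost."""
    random.seed(seed)
    n = len(x)
    k = min(n, max(1, int(round(n ** eps))))
    keep = sorted(random.sample(range(n), k))
    return edit_distance("".join(x[i] for i in keep), y)
```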

Another talk that I found intriguing was the work by Saks and Seshadhri on computing the longest increasing subsequence of a sequence. While there's a straightforward dynamic program for this problem, they showed that the length of the longest increasing subsequence can be estimated in "sublinear" time (sublinear in the size of the dynamic program). In fact, their technique is quite general, and might lead to a general method for approximating the output of a dynamic program in time sublinear in the size of the table. A key structural property that they develop is that either the dynamic program can easily be broken up into two balanced subrectangles, or it can be subsampled O(log n) times, yielding smaller copies that can be combined to find a solution (it's possible I'm misrepresenting the last bit). Sesh gave an excellent talk, but his taste in talk jokes was so terrible it was actually funny :) (sorry, Sesh!)
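For reference, here is the straightforward dynamic program mentioned above, in a minimal Python version (the $O(n \log n)$ patience-sorting method is the other standard exact approach); their result estimates this quantity without running anything like the full computation.

```python
def lis_length(a):
    """Classic quadratic DP: L[i] = length of the longest increasing
    subsequence of a that ends at position i."""
    n = len(a)
    L = [1] * n
    for i in range(n):
        for j in range(i):
            if a[j] < a[i]:
                L[i] = max(L[i], L[j] + 1)
    return max(L, default=0)

# lis_length([3, 1, 4, 1, 5, 9, 2, 6]) == 4   (e.g., 1, 4, 5, 9)
```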

I deeply regret having to leave before the evening plenary session, and especially regret missing Dan Spielman's talk, which by all accounts was magnificent. All the talks were videotaped, so they'll eventually be online and I can watch it then.

Monday, October 25, 2010

FOCS Day 1: some notes

Three newsworthy items from the FOCS business meeting (for more details, follow the focs2010 tag on twitter):
  1. Petros Drineas is now an AF program director at the NSF. Congrats, Petros !
  2. STOC 2011 might experiment with a poster session with a different call for submissions. That's a great idea - I hope they decide to go ahead with it
  3. Not one person felt that FOCS should continue to have printed proceedings. As recently as a few years ago, at least a third of the audience at theory conferences would have asked for printed proceedings. Times they are a'changin' !!!

Journals, Conferences, and non-galactic algorithms

Does our lack of attention to journal publications influence the "impracticality" of our algorithms ? 

I've been thinking about algorithms, applications and the dreaded "practicality" issue. And the following thought occurred to me.

We have a well-known aversion to journal papers in our community. I won't rehash the arguments for and against. But it is definitely true that in a conference paper we elide many details of the algorithm construction and proof. It's not uncommon for a 10-page conference paper to blow up to a nearly 100 page journal paper (cough cough MST cough cough).

So suppose you have this snazzy new algorithm for a problem. You're happy with your handwavy conference version, and you don't bother to write the journal version. If you did sit down to write it, you might realize to your horror that your snazzy 1-page procedure has become a 12-page lumbering beast of an algorithm that's so insanely complicated you'd be ashamed to even call it 'galactic'. Now this might motivate you to come up with a simpler algorithm, or it might motivate someone else !

But if you don't write the journal version with full details, how is anyone to know ? 

While this is in part about practicality (since practical almost always equals simple), it's more than that. Clarity in algorithm design is useful even if there's no practical value, because it aids in understanding. So this is in my mind a completely different reason for why we should write more journal papers. It might make our algorithms simpler !!
p.s. Yes, I am aware that very often the hard part of a paper is PROVING that an algorithm is correct, and not the algorithm itself. Obviously this argument would not apply to those cases. But even there, having to contemplate the monstrosity of the proof you end up writing would motivate you to simplify it. After all, proofs are about understanding, not fireworks and fountains at the Bellagio.

FOCS Day 1: Clustering

A long time ago I realized that if I took blogging duties too seriously, I'd be exhausted trying to report on talks that I attended. So I'm not going to even try. I will talk about a pair of talks on clustering though.

A core question of clustering is this: learn a mixture of $k$ $d$-dimensional Gaussians with different means and covariance matrices from samples drawn from the mixture. A long line of research, starting with a paper by Sanjoy Dasgupta back in 1999, produced variations on the following result:
If the Gaussians are "well separated" and have "nice" shapes, then a "small number" of samples suffices to determine the parameters of the system. 
Here, "parameters" would be means, covariance matrix elements, and relative weights. As time went on, the separations got smaller, the niceness reduced, and the "small number" got better.

These two papers take a slightly more general tack, in different ways. They both eliminate the conditional nature of prior bounds ("if the Gaussians are well separated"), instead replacing this by an unconditional bound ("poly in a parameter capturing the separation"). 

The first, by Moitra and Valiant (both Ph.D. students!), measures the separation between the distributions via their statistical distance, and gives a bound for learning in terms of the reciprocal of this distance. Their main insight is to project the distributions onto a few directions (i.e., down to 1D), learn what they can there, and then recurse on the data that doesn't appear to be well-separated enough to learn properly. This takes a lot of technical skill to do correctly.
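Here's a toy illustration of the projection step (only the general idea; the actual algorithm chooses its directions carefully, learns from the 1D projections, and recurses on what remains, none of which is reproduced here):

```python
import numpy as np

def project_1d(X, direction):
    """Project d-dimensional samples onto a single unit direction,
    turning the d-dimensional mixture into a 1D mixture along that line."""
    v = np.asarray(direction, dtype=float)
    return X @ (v / np.linalg.norm(v))

# Toy data: two spherical Gaussians in 5 dimensions (made-up parameters).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 5)),
               rng.normal(2.0, 1.0, size=(500, 5))])
proj = project_1d(X, rng.standard_normal(5))
# 'proj' is a 1D two-component mixture; the algorithm learns what it can from
# such projections and recurses on whatever isn't well separated along them.
```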

The second paper, by Belkin and Sinha, takes a different approach. They pay more attention to the parameters of the distribution. This can be a problem in general because closeness in distribution does not imply closeness in parameter space. However, they show that the set of all distributions with fixed moments forms a nice algebraic variety, and that the above problem can be quantified by determining how closely this variety folds in on itself (intuitively, if the variety intersects itself, then two far-away parameter sets yield the same distribution). Using some standard (but not trivial) results in algebraic geometry, they can bound the complexity of the resulting structures (so that they can estimate the number of samples needed to get to a distinct region of the space). There's more detail involved in the estimation of parameters and moments and so on.

While both papers are impressive achievements that essentially resolve the theoretical aspects of learning mixtures of distributions, I came away not entirely sure how to compare them. On the one hand, the Moitra/Valiant paper is more explicit and is likely to be less "galactic" in nature. But it appears to be limited to Gaussians. The Sinha/Belkin paper is more general, but uses more abstract machinery that is unlikely to be useful in practice.

What still remains, then, is an effective way of learning Gaussians or other families of distributions, and a better understanding of how these bounds relate to each other.

Sunday, October 24, 2010

FOCS Day 0

I'm in Las Vegas for FOCS 2010. It's been a while since I've attended FOCS, and the short distance from Salt Lake was definitely a factor.

Kudos to the organizing committee for the tutorial sessions today. I've long whined about the lack of non-paper-presenting events at theory conferences, and the topics for today were great ones. Unfortunately, a spectacular debacle on the part of the hotel restaurant meant that I (and 11 other poor souls) missed most of Mihai's tutorial on data structures.

But the first tutorial by Ketan Mulmuley was fantastic. Let's be clear: it's impossible to convey any depth of material on GCT in 1.5 hours, but he hit (to me) just the right mix of high-level exposition and low-level technical conjectures. I also learnt something brand new from the presentation, which I'll try to summarize.

In Godel's incompleteness theorem, we all know that the key step is phrasing a statement of the form "I am not provable". The self-referentiality of this statement is what drives the paradox within the logical system that allows you to frame it.

The part that's important to remember is that the logical system has to be powerful enough in the first place for self-reference to even be possible. A weak logical system can't do that, which is why Godel's theorem only kicks in once the system is strong enough (a very Greek-tragedy notion, this: it is your own power that sows the seeds of your destruction).

It turns out that a similar mechanism comes into play with P vs NP. Mulmuley starts with a "trivial proof of hardness" for a function f: write down all circuits of small complexity, and for each circuit C write down some input x for which C(x) != f(x). This table is of course of exponential size, but never mind that.
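As a toy rendering of this table (entirely my own, with a far humbler "circuit" class so that it's actually enumerable): take parity on n bits as the hard function f, take all linear threshold functions with weights and bias in {-1, 0, 1} as the small circuits, and record for each one an input where it disagrees with f. Since parity is not a linear threshold function for n >= 2, every row of the table gets a witness.

```python
from itertools import product

def parity(x):
    """The 'hard' function f."""
    return sum(x) % 2

def threshold_fn(weights, bias):
    """A toy stand-in for a 'small circuit': a {-1,0,1}-weight threshold gate."""
    return lambda x: int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)

def hardness_table(n):
    """For every circuit C in the toy class, record one input x with
    C(x) != f(x).  The table is tiny here, but for real circuit classes it
    is exponentially large -- which is exactly what the flip has to tame."""
    inputs = list(product([0, 1], repeat=n))
    table = []
    for weights in product([-1, 0, 1], repeat=n):
        for bias in (-1, 0, 1):
            C = threshold_fn(weights, bias)
            witness = next(x for x in inputs if C(x) != parity(x))
            table.append(((weights, bias), witness))
    return table

print(len(hardness_table(3)))   # 81 toy circuits, one counterexample each
```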

He now introduces the flip operation. This is the idea that instead of trying to manipulate this large table, we store a smaller table of certificates, with the property that from any one certificate you can efficiently reconstruct a set of circuit-witness pairs from the table, and the set of all certificates covers the entire table. You also need to be able to decode these certificates efficiently and quickly verify that an input produced in this manner actually does demonstrate a counterexample.

There's a problem with the flip operation though. We're using this flip structure to show that P != NP (nonuniformly for now). On the other hand, the success of the flip operation requires us to construct a mapping from the certificates to the table that can also be verified in polynomial time. But since f is an NP-complete problem, what we are essentially saying is that we can efficiently construct short witnesses of membership or non-membership, which means that we've collapsed P and NP !

You could of course retort by saying, "well I don't want to use the flip operation then !". It turns out that you can't ! He states a result saying that ANY proof separating P and NP will essentially look like a flip-based proof. So no matter what you do, you have to deal with the self-contradicting structure of the flip. The geometric construction (GCT) is his program for constructing the flip so that this cycle of referencing can be broken.

But that's not where the analogy to Godel's theorem comes from. When we start digging into separating the permanent from the determinant, we can translate the problem into an embedding problem over algebraic varieties. This is because both the permanent and determinant have unique characterizations as the only polynomials satisfying certain properties over matrices.
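For concreteness, the two polynomials are $\mathrm{perm}(X) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} x_{i\sigma(i)}$ and $\det(X) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} x_{i\sigma(i)}$: identical except for the signs of the terms, and yet conjectured to be wildly different in complexity.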

Now the separation problem can be framed as the problem of finding an efficient obstruction that demonstrates the separation between these two varieties. What's even more bizarre is that ANY proof that separates the permanent and the determinant will end up producing such an algebraic-geometric obstruction, even if the proof itself doesn't use algebraic geometry.

So let's review. Any proof of P vs NP will induce a "flip" construction that generates small certificates that explicitly show hardness of a problem. The permanent is one such hard problem, and lends itself to an invariant representation that yields a key obstruction that we need to construct in polynomial time in order to make the flip work.

In other words, we need to be able to construct in polynomial time an object (something we currently don't know how to do in less than triply exponential time) in order to show that these very same poly-time computations CANNOT solve the permanent.

And this is the core paradox at the heart of the program. Our inability to prove that P != NP is because we can't show that P is strong enough to express within itself the certificate that it is NOT strong enough to capture NP.

Mulmuley makes another very interesting point at the end of the tutorial. Many people have wondered why the GCT framework can't be used to prove "easier" bounds for other separations. While he mentions some examples of problems (matrix multiplication) where people are attempting to use the GCT framework to show poly lower bounds, he also points out that it is only because of the beautiful invariant representations of the permanent and determinant that we can bring the machinery of algebraic geometry to bear on the problem. As he says, we got lucky.

Wednesday, October 20, 2010

On outreach to applied communities

There's an argument playing out on cstheory (yes, we need a better name) over whether we should encourage or discourage questions like "how can I model X phenomenon theoretically", or, more broadly, whether applied questions are off-topic altogether.

While Scott Aaronson posted an eloquent defense of why we should always entertain such questions, the general tone of the responses has been negative. The main fear has been a 'slippery slope'  - that if we allow one modelling question, then the site would be overrun by modelling questions.

I think this sentiment is particularly interesting in the context of David Johnson's post requesting examples where approximation algorithms have been deployed usefully in practice (and noting that he couldn't come up with many good examples himself).

The truth is that outreach to more applied communities is not about taking your algorithms hammer and beating their heuristic over the head with it. As I've said before, you have to have a DRAMATIC improvement in performance before people will change their standard practices.

It's firstly about communicating a style of asking and answering questions, even if the actual answers are "trivial". A collaboration where you persuade someone to use max flows might not be theoretically very splashy, but it has a huge impact in how people think about algorithms, theory and problem solving.

Doing this first prepares the ground for future collaborations. If you want someone to use your new and snazzy approximation for vertex cover, you have to first convince them that vertex cover makes sense, and before that you have to convince them that even modelling the problem as a graph makes sense. Believe me when I say that the very notion of defining a problem formally before solving it is not how work gets done in many applied areas.

Michael Mitzenmacher makes a similar point in a comment to the above post, where he says:
I do think, however, that sometimes the focus of theorists -- to get the best constant-approximation or to shave log or log log factors off -- obscures rather than highlights a good part of what I see as the real "practical value" of this line of research. Focusing on what algorithmic ideas underlie the approximation, and more implementation and experimentation, would (I think) give more real-world power to this theoretical work.

Tuesday, October 19, 2010

Practical Applications of Approximation Algorithms

(Guest Post by David Johnson)
In my Knuth Prize Lecture on approximation algorithms at the last STOC conference, I celebrated the many great theoretical advances we have seen in the area of approximation algorithms over the last four decades, but lamented the apparent paucity of practical applications of this work. We motivate our work in this area as a way of coping with NP-hardness, but how many people who actually have to do such coping in practice are using our ideas?

In my talk I could only come up with a few examples, these from our work at AT&T Labs - Research, where we have adapted the Goemans-Williamson primal-dual approximation algorithm for the Prize Collecting Steiner Tree problem for use in tools that help design access networks, and where exponential potential function methods for approximating multicommodity flows are being exploited in a proposed content-distribution scheme. I also noted that of the 14 Kanellakis "Theory and Practice Awards" given out so far, of which some 6 or 7 have been for algorithms, only one, that to Freund and Schapire for their statistical "boosting" technique, could be viewed as related to approximation algorithms, and the boosting research is typically classified instead as Learning Theory.

So, I think many viewed my talk as a rather negative exercise. Certainly, practical impact is only one dimension along which to evaluate a field, and, if we relied only on practical motivations, we would not have developed the deep and informative theory we now have. However, practicality IS an important dimension, and I hope that my inability to identify many examples of practical impact was more a fault of my own ignorance than of the field. Giving a talk about this to a primarily theoretical audience did not develop many new leads, so I'm hoping that a blog posting will be more successful.

In particular, I would like to collect

(A) Stories about algorithms or algorithmic approaches developed by approximation algorithm theorists that were implemented and applied, either directly or in modified form, by people to solve real-world problems.
(B) Stories about the analysis of algorithms developed by others (for example, the greedy algorithm for SET COVER) that influenced the use of these algorithms in practice, or at least helped explain their performance and hence was cited in "Applications" papers.
The algorithms need not be restricted to NP-hard problems -- I am also interested in approximation algorithms for problems with polynomial-time optimization algorithms that can be too slow for some practical applications.

For both types of stories, I'd like as much detail as possible - who did the work, where, etc., and citations if known. You can post your nominations/comments on this blog, or send them directly to me (dsj@research.att.com). If you post as a comment, please do not do so anonymously, as I may want to contact you for more information.

Assuming I get a good response, I plan to prepare a survey paper (or at least another blog posting) summarizing what I learn.

Thanks! David Johnson (dsj@research.att.com)

SoCG deadline, and a new NSF rule

SoCG CFP is out:
  • Titles/abstracts: Nov 22
  • Papers: Dec 1
  • Notification: Feb 14
  • Conference: June 13-15
As Sariel noted,  the CFP makes an explicit request for more diverse submissions and plans to accept more papers:
Anticipating the usual high overall quality of submissions, the program committee intends to accept more papers than was typical in previous years. Also, we intend to interpret the scope of the conference broadly, and will accept all papers of high quality that are of significant interest to our research community. The conference venue in Paris is attractive and allows a bigger conference than previous years.
My optimistic side is excited by this, and I really hope we can see more papers on diverse topics. My pessimistic side is grumping "yes, that means any papers that aren't mine". sigh...

In unrelated news, the NSF appears to have a new policy for 'data management' and will require a new supplementary document to this effect. Note that since it's a mandatory document, it will affect theory proposals as well:

Plans for data management and sharing of the products of research. Proposals must include a supplementary document of no more than two pages labeled “Data Management Plan.” This supplement should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (see AAG Chapter VI.D.4), and may include:

* The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
* The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
* Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
* Policies and provisions for re-use, re-distribution, and the production of derivatives
* Plans for archiving data, samples, and other research products, and for preservation of access to them


Tuesday, October 05, 2010

Absolutely terrible news: Partha Niyogi passes away

(Via Lance) Partha Niyogi just passed away.

I first met Partha at SoCG 2007 in Gyeongju, where he gave an invited presentation on geometric perspectives in machine learning. We also spent a very pleasant day playing hookey from the conference, wandering around the nearby tourist spots, and talking about geometry and machine learning. He was a brilliant researcher with an impatience and drive that led him to do some truly excellent work.

Anyone who's ever dabbled in machine learning knows about the strong connections between ML and geometry, dating back to the very notion of VC dimension. But while the connections between the areas are manifest and plentiful, there haven't been many people who could successfully navigate both realms. Partha was one of those unusual people, who coupled deep mathematical insight with a strong vision for problems in ML. Most recently, he organized a wonderful summer school in machine learning and geometry, whose syllabus is worth perusing to get a sense of the depth of this interaction.

In many circles, he's best known for his work in manifold learning, where in collaboration with Misha Belkin, Steve Smale and others, he laid out a series of rigorous results on the problem of how to "learn" a low-dimensional manifold given access to samples of points from it. The work on Laplacian Eigenmaps is a key marker in this area, and has had tremendous influence in the realm of "non-Euclidean" data analysis. Among his other recent contributions in theory-land is a neat result from FOCS 2006 with Belkin and Narayanan on sampling the surface of a convex polytope using heat flow and the Laplace operator.

It's a great loss.

Monday, October 04, 2010

We're hiring !

The School of Computing is looking for three ! count 'em ! THREE ! tenure-track faculty this year, in a broad search with interests in:
algorithms, concurrent software, databases, data mining, formal verification, large data analysis, machine learning, performance verification, parallel systems, robotics, and theory.
We're looking for interdisciplinary folks with interests in possibly more than one of the above, and there are nominally three "areas of interest" that we are hiring in: parallel computing, robotics and big data.

For theory folks, probably the most relevant of the three areas of interest is the big-data search, especially if you're interested in data mining/machine learning/statistics and other forms of data analysis. But if you have strengths in either of the other two areas, make sure to highlight that as well.

Email me if you have any questions about this. I'm very excited about growing our large-data program by getting strong algorithms folks.
