Thursday, June 28, 2012
Geometry in the Field: Talk Slides
Jeff Phillips and I organized a workshop on Geometry in the Field at SoCG 2012. I haven't blogged about the workshop yet because Jeff and I are working on a more detailed report for the SIGACT Computational Geometry column; I'll post it here when it's done.
Many people have asked when the talk slides would be available, and I'm happy to announce that all the talk slides are now up at the workshop site.
I strongly recommend looking over the slides if you have the time: these were excellent talks by great speakers, giving masterful overviews of their areas as well as challenges for the future.
Friday, June 22, 2012
CGWeek II
Some highlights from day 2 of CGWeek:
On Laplacians of Random Complexes, by Anna Gundert and Uli Wagner.
The Cheeger inequality is a well-known inequality in spectral graph theory that connects the "combinatorial expansion" of a graph with its "spectral expansion". Among other things, it's useful for clustering: you can use it to find a "thin cut", split the graph, and repeat. There's been recent work on "higher-order" Cheeger inequalities that let you cut a graph into $k$ pieces instead of two, by connecting the quality of such partitions to the first $k$ eigenvectors of the Laplacian.
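To make the clustering recipe concrete, here's a minimal sketch of the classical sweep cut (my own illustration, assuming a numpy adjacency matrix; this is the textbook procedure, not anything from the paper): take the second eigenvector of the normalized Laplacian, sort the vertices by their coordinate, and return the best prefix cut.

```python
import numpy as np

def cheeger_sweep_cut(A):
    """Given a symmetric adjacency matrix A, return the sweep cut (S, phi(S))
    obtained from the second eigenvector of the normalized Laplacian."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                        # eigenvalues in ascending order
    v2 = D_inv_sqrt @ vecs[:, 1]                       # second eigenvector, rescaled
    order = np.argsort(v2)
    vol_total = d.sum()
    in_S = np.zeros(len(d), dtype=bool)
    best_S, best_phi = None, np.inf
    for i in order[:-1]:                               # sweep over prefix cuts
        in_S[i] = True
        cut = A[in_S][:, ~in_S].sum()                  # weight of edges leaving S
        phi = cut / min(d[in_S].sum(), vol_total - d[in_S].sum())
        if phi < best_phi:
            best_phi, best_S = phi, in_S.copy()
    return best_S, best_phi
```

Recursing on the two sides of `best_S` gives the repeated-splitting heuristic mentioned above; Cheeger's inequality guarantees that the cut found this way has conductance at most $\sqrt{2\lambda_2}$.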
The above paper generalizes the notion of combinatorial versus spectral expansion in a very different way. Think of a graph as the 1-skeleton of a simplicial complex (i.e., the set of faces of dimension at most 1). Is there a way to define the Laplacian of the complex itself? And is there then an equivalent of Cheeger's inequality?
It turns out that this can indeed be done. Recent work by Gromov and others has shown how to define a notion of "edge expansion" for a simplicial complex. Roughly speaking, you compute the expansion of a cochain (a function on chains) by endowing it with a norm and looking at the norm of its coboundary relative to the norm (sort of) of the cochain itself. What's interesting is that if you choose the underlying coefficient field to be $\mathbb{R}$ and the norm to be $\ell_2$, you get the spectral notion, and if you instead choose $\mathbb{Z}_2$ and the Hamming norm, you get the combinatorial one.
It's known that for 1-skeletons, and even for things like persistence, the underlying field used to define homology doesn't really matter. For this problem, however, it matters a lot: the authors show that there is no equivalent of Cheeger's inequality for simplicial complexes! They also analyze the properties of random complexes, just as one can for random graphs.
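As a concrete reading of "the Laplacian of the complex itself": the standard answer is the Hodge Laplacian, built from signed boundary matrices. Here's a toy sketch (my own illustration, not the paper's definitions) of the 1-dimensional Laplacian of a small complex:

```python
import numpy as np

# A hypothetical toy complex: 4 vertices, all 6 edges, and 2 triangles.
vertices = [(0,), (1,), (2,), (3,)]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2), (1, 2, 3)]

def boundary_matrix(faces, subfaces):
    """Signed boundary operator taking k-faces to (k-1)-faces;
    faces are given as sorted vertex tuples."""
    index = {s: i for i, s in enumerate(subfaces)}
    D = np.zeros((len(subfaces), len(faces)))
    for j, face in enumerate(faces):
        for i in range(len(face)):
            D[index[face[:i] + face[i + 1:]], j] = (-1) ** i
    return D

B1 = boundary_matrix(edges, vertices)      # edges -> vertices
B2 = boundary_matrix(triangles, edges)     # triangles -> edges

# L0 = B1 @ B1.T is exactly the graph Laplacian of the 1-skeleton; the
# 1-dimensional Hodge Laplacian adds the "up" term coming from triangles.
L1 = B1.T @ B1 + B2 @ B2.T
print(np.linalg.eigvalsh(L1))              # spectrum of the complex's 1-Laplacian
```

The spectrum of $L_1$ plays the role here that the graph Laplacian spectrum plays in the classical Cheeger inequality.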
Add Isotropic Gaussian Kernels at Own Risk: More and More Resilient Modes in Higher Dimensions, by Edelsbrunner, Fasy and Rote
Suppose you have three Gaussians in the plane, and you look at the resulting normalized distribution. You expect to see three bumps (modes), and if the Gaussians merge together, you'd expect to see the modes come together in a supermode.
Can you ever get more than three modes?
This is the question the above paper asks. It was conjectured that this cannot happen, but in 2003 it was shown that three Gaussians can in fact produce four modes (a little bump can appear in the middle as the three Gaussians pull apart). In this paper, the authors show that $n$ Gaussians can produce a super-linear number of "bumps" (critical points), and that these modes are not transient: they "persist" in a certain sense.
This is quite surprising in and of itself. But it's also important. A plausible approach to clustering a mixture of Gaussians might look for density modes and assign cluster centers there. What this paper says is that you can't really do that, because of the "ghost" centers that can appear.
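The phenomenon is easy to poke at numerically. Below is a small experiment (my own sketch, assuming numpy; not code from the paper) that counts grid local maxima of a mixture of three unit Gaussians placed at the corners of an equilateral triangle, as a function of their separation. Per the 2003 construction mentioned above, there is a window of separations where a fourth, shallow central mode appears.

```python
import numpy as np

def num_grid_modes(centers, sigma=1.0, lim=4.0, n=400):
    """Count strict local maxima, on an n x n grid over [-lim, lim]^2, of an
    equal-weight mixture of isotropic Gaussians; a crude numerical stand-in
    for counting the modes of the density."""
    xs = np.linspace(-lim, lim, n)
    X, Y = np.meshgrid(xs, xs)
    f = np.zeros_like(X)
    for cx, cy in centers:
        f += np.exp(-((X - cx) ** 2 + (Y - cy) ** 2) / (2 * sigma ** 2))
    inner = f[1:-1, 1:-1]
    is_max = np.ones_like(inner, dtype=bool)
    for dx in (-1, 0, 1):                  # compare against all 8 neighbours
        for dy in (-1, 0, 1):
            if dx or dy:
                is_max &= inner > f[1 + dx:n - 1 + dx, 1 + dy:n - 1 + dy]
    return int(is_max.sum())

# Three unit Gaussians at the corners of an equilateral triangle: scan the
# circumradius and watch the mode count change as the components merge.
for r in np.linspace(0.5, 3.0, 11):
    angles = 2 * np.pi * np.arange(3) / 3
    tri = [(r * np.cos(a), r * np.sin(a)) for a in angles]
    print(f"circumradius {r:.2f}: {num_grid_modes(tri)} grid mode(s)")
```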
Other quick hits:
- Chris Bishop ran a very interesting workshop on Analysis and Geometry. While I couldn't attend all the talks, the first one, by Raanan Schul, gave an overview of the analytic TSP, in which you're given a (possibly continuous) set of points in the plane and want to know whether there's a finite-length curve that passes through all of them. The techniques used here relate to multiscale analysis of data and things like local PCA.
- One of the cool innovations in CGWeek was the fast-forward session: each afternoon before the workshops started, speakers were allowed to give a 1-slide overview of what would happen in their events. The slides were all placed in a single presentation ahead of time, and it was clocked so that people couldn't run on. It was great: I didn't have to do careful planning, and I got a much better idea of what was coming up. We should do this for all talks at the conference!
Sunday, June 17, 2012
CGWeek Day I
CGWeek day 1 has my head spinning. Attending a mix of conference talks and workshops is something I'm used to in the DB/data mining world, but not in theory conferences. It's refreshing, exhilarating, and exhausting, all at the same time.
There were a number of great talks today at the conference and the workshops -- too many to do justice to. Here are some random thoughts:
- This year, there's a best student presentation award. My student Amirali Abdullah is one of the competitors, so I don't want to say too much about it, but I will say that everyone upped their game: the presentations by students were great! Maybe we should have best presentation awards in general, to improve everyone's presentation skills!
- Amir's talk went pretty well (if I say so myself). It was his first conference talk, but I thought it was well above the average (maybe this says something about the average :)). The talk was about approximate Bregman near neighbors.
- Very intriguing paper by Adamaszek and Stacho on the connection between homology structure of flag complexes and matchings that induce maximal independent sets.
- The workshop on geometric learning was fantastic. Lots to think about on multiscale representations, and distances to measures.
- Jeff Erickson has a four part liveplussing (one, II, trois, fear) of the business meeting
- Most importantly: Rio in 2013!! Kyoto in 2014 (boo)!! OSU no way (I kid, I kid).
- Even more importantly: Herbert proposes changing the name of the conference to "Symposium on Computational Geometry and Topology". Major concern: what about the sausage? Audience votes confusedly after Herbert promises a Klein bottle of beer for everyone.
A Reflection On Big Data (Guest post)
Ed: Graham Cormode kindly agreed to send a missive from the recently concluded big data workshop at Duke.
There seems to be no shortage of Big Data events at the moment, reflecting some of the enthusiasm around this topic. Last week there were meetings at NIST and at Duke, and there are upcoming events at Stanford and SAMSI, to name but a few. (Ed: also the new Simons Foundation program on big data.)
I attended the Duke event and was hoping to blog about the meeting, but suffered a big data problem of my own: with 25 speakers, plus panels and breakout sessions, how to pull out any meaningful summary? So instead, I'll pick one issue that was the subject of much discussion: what exactly is new about big data? In particular, there has been much interest in what we called "massive" data over the last several years (as captured by MADALGO, the center for massive data algorithmics, and the accompanying MASSIVE workshop). Is Big Data just a rebranding of Massive Data? Which is bigger: Big Data, Massive Data, or Very Large Data?
What came out from our discussions is that we believe that Big Data is qualitatively different from the challenges that have come before. The major change is the increasing heterogeneity of data that is being produced. Previously, we might have considered data that was represented in a simple form, as a very high-dimensional vector, over which we want to capture key properties. While Big Data does include such problems, it also includes many other types of data: database tables, text data, graph data, social network data, GPS trajectories, and more. Big Data problems can consist of a mixture of examples of such data, not all of which are individually "big", but making sense of which represents a big problem. Addressing these challenges requires not only algorithmics and systems, but machine learning, statistics and data mining insights.
This is by no means a new observation: the axes of Volume, Velocity and Variety have been widely discussed before. But it did remind me that we should not get too hung up on how big our big data is, since size is not the sole defining characteristic of handling big data (although it is a necessary characteristic).
Many other familiar topics came up in discussions, such as: how can big data sets be made more accessible to researchers? how can privacy concerns with big data be addressed? what are the right models for big data, for both the data and the computation? what are the key techniques for working with big data? do we even know enough to teach a course on big data?
What struck me about these discussions is that not only were we far from reaching a consensus on any point, but also no one seemed to think they had any strong candidate solutions. Consequently, I left assured that Big Data is much more than a rebranding of existing work. It represents a significant research agenda that will challenge us for years to come.
Friday, June 08, 2012
Geometry in the Field: Abstracts are up!
As Jeff mentioned, we're organizing a workshop at SoCG called "Geometry in the Field" that looks at the impact computational geometry has had "in the field", both past and future. It was quite difficult to narrow things down to 8 speakers, and even harder to list only four areas where CG has already had impact.
The abstracts for the talks are now up at the workshop page. We hope to see you there!
Wednesday, June 06, 2012
Mihai Patrascu, R.I.P.
I just got word from Mikkel Thorup that Mihai Patrascu passed away. I had heard that he was ill, and only realized the seriousness of his condition when I saw him at STOC 2012. He came in a wheelchair to be honored with the Presburger Award, and the entire room gave him a standing ovation.
I can only hope that he's out there proving lower bounds in the sky, and arguing fiercely with the clouds while he does so.
Update: There is now a memorial page for Mihai at mipmemorial.blogspot.com. Please visit there to post your comments, and have a drink for Mihai!
Friday, June 01, 2012
Minimum language complexity needed to construct exponential time algorithm
(while this could have been a question on cstheory, I think it's a little too vague - but it's perfect for a blog post!)
I was at dinner with some faculty colleagues (assistant professor beer night - best invention known to mankind!) and we were in the traditional "all our students are terrible" phase of the evening *. The discussion turned to "what programming concepts do students understand after an intro-CS class", which led to the following claim:
"you can't write an exponential time algorithm without knowing recursion"While this seemed plausible, I was skeptical. Thankfully my PL colleague (and super blogger) Matt Might was at the table, and he pointed out that the lambda calculus formally has no recursive constructs and yet is Turing-complete, via the magic of the Y-combinator (which essentially computes fixed points directly).
Of course one might argue that a fixed point operator does give you recursion (indeed the Y-combinator is often referred to as anonymous recursion). So my question really is:
Is there a well-defined notion of "programming languages without recursion or recursion-like substances", and if so, what is the most expensive algorithm that can be expressed in such a language?
p.s. BDDs seem like an obvious candidate...
* Other regularly discussed phases include "I can't believe <latest cryptic thing> senior faculty member said", "How many frequent flyer miles do you have", and "chickens!". Beer night is awesome!
Extracurricular events at STOC
I didn't do any STOC blogging this time, but I do want to say something about the extracurricular events.
For a while now, people (including me) have been clamoring for an expansion of the traditional three-days-of-talks format at theory conferences, and now STOC and FOCS (and SoCG soon!) are doing workshops, tutorials and poster sessions on a regular basis. Chandra Chekuri, Sergei Vassilvitskii and I organized the poster session at STOC this year, and Avrim Blum and Muthu (Muthu!) were in charge of workshops.
Workshops
How did they go? The workshops, held the day before the conference, looked extremely successful. Michael Kearns' morning tutorial on computational finance was standing room only in a large auditorium; I don't know the exact numbers, but there were easily more than 150 people in the room. The workshops went well too: I spent much of my time at the distributed streaming workshop, and that was also standing room only, with at least 60 people in the room at all times.
I'm really glad that we had these events, and even happier that FOCS 2012 plans to continue them. Speaking of which, Avrim is looking for workshop proposals for FOCS 2012, and specifically asked me to suggest that geometry folks propose a workshop. The deadline is June 20, and all you need is a 2-page proposal and 2-3 people who'd organize it.
Posters
I'm of course biased about the poster session, but I do think it went off quite well, fueled by Howard Karloff's brilliant idea to provide ice cream sandwiches as refreshments during the event. There were 30 posters in all, and a ton of discussion. It was really nice to walk around looking at the posters and seeing the level of engagement. I polled a number of presenters and attendees, and they all seemed to enjoy the session.
The only complaint was that the poster session was too short, and that we should have left the posters up for people to browse during breaks. I think this is an excellent idea that the next posters committee (at FOCS 2012) should take into account.
The business meeting
We had an incredibly long business meeting; even our most rambunctious SODA meetings haven't gone on this long (almost 3 hours). For those not yet in the know, STOC 2013 is going to a two-tier format. Joan Feigenbaum is the chair, and the "must-not-be-called-senior" PC will have 9 people on it. Their primary role will be managing the review process, with the "certainly-not-junior" PC consisting of 70-80 people who will review no more than 10 papers each, AND will be allowed to submit papers.
This is a major change for a theory conference, albeit one that was brought up for discussion at SODA 2012. I'm particularly curious to see how the whole "letting PC members submit papers" goes. Before everyone starts freaking out, it's worth pointing out that all this is effectively doing is bringing the unofficial reviewers into the fold, thereby giving them a chance to do reviews in context of the entire pool, rather than making isolated decisions. Since the unofficial reviewers were mostly drawn from the author pool, this is not too different from how it's always been. I'm hoping that the reduced load per reviewer will bring in better reviews and more consistency in decisions.
All in all, a time of experimentation - workshops, posters, tutorials, two-tier PCs - the theory community moves slowly, but when it does move, things happen quickly :)