The Geomblog

Tuesday, November 24, 2009

Locating SODA outside the US

David Johnson forwards a note from the SODA Steering Committee about the possibility of having SODA outside the US. Summary: it can be done, with the kind of bid information that accompanies bids to other conferences with independent local arrangement. What follows is the note:

To those who might be interested in having a future SODA conference take place outside North America:

As you may know, local arrangements for SODA have to date always been handled by SIAM. The conference Department at SIAM has wide experience in organizing conferences in North America, but typically relies on help from local organizers when a conference they sponsor takes place elsewhere.

This presented difficulties when, at last January's SODA, the Business Meeting voted a preference for holding the 2011 SODA in Paris. SIAM and the SODA steering committee were unable to find any French organization that was prepared to handle the meeting in Paris. One organization might have been willing to host the meeting in the Parisian suburbs, but we judged that most of the voters for Paris would not have voted for "suburbs of Paris".

(No hotel was available in the second vote-getter, the Virgin Islands, and so SODA will be back at San Francisco, the 3rd-place choice, in 2011, January 22-25.)

In light of the experience in finding a location for the 2011 SODA, we are going to change the ground rules a bit for the 2012 site selection. Proposals of sites outside of North America will still be welcome, but to be considered, they must include the details of what group will handle local arrangements, where precisely the conference will be held, and what the expected hotel costs will be. In other words, the kind of information typically provided by groups proposing to host STOC or FOCS. (Proposals from groups that can collect registrations and take financial responsibility are preferred because of the added costs of SIAM providing these services in a country outside North America.)

If you are interested in making such a proposal at the SODA 2009 Business Meeting, please contact Kirsten Wilden (wilden@siam.org) for more details about what kinds of information SIAM will need about your proposal.

There has also been some thought that costs for SODA in North America could be reduced if local arrangements were placed in the hands of local volunteers, as they are for FOCS and STOC. If anyone wishes to propose to host SODA 2012 in this way, their detailed proposals will also be considered at the Business Meeting. (Again, contact Kirsten Wilden at SIAM in advance.)

And of course, proposals for sites that the SIAM conference staff can handle on their own can be made as usual.

Hope to see you all at SODA 2009 in Austin, January 16-19.

David Johnson, SODA Steering Committee Chair.

Sunday, November 08, 2009

Anathem, and mathematical stereotypes

Neal Stephenson is (or should be !) a familiar figure in the sci-fi/speculative fiction landscape: his Cryptonomicon is a retelling of the story of Turing, along with much modern day drama involving advanced crypto and security. His new book, Anathem, came out with much fanfare, and is a vast tale set in a place where mathematics is a pursuit conducted in monastery-like places with strong religious overtones.

I'm reading Anathem right now, and am only at the beginning (so no spoilers please), but there's already a beautifully rendered discourse on stereotypes of the mathematican (or scientist in general). Inside the 'concent' (the monastery), these are termed 'Iconographies', as formal templates by which to understand how the "saecular" world perceives the mathematicians. I was reminded of this when writing I was writing the post on Soviet-style mathematics and realized the stereotypes at the heart of the referenced WSJ article.

So here are some of the iconographies discussed early on (you get one point for each one that you've encountered in real life):

Tennestrian: seemingly clownish eccentric figures that have a darker side, luring impressionables and innocents into folly (a play on the story of Socrates)
Doxan: brilliant, unemotional and cold, but as a consequence subordinate to passionate leaders with true human feeling. (can you say 'Spock'?)
Yorran: Criminally insane mad scientist shut up in a lab with plans to conquer the world.
Rhetorian: Sinister conspiracy plotters, out to take over the world by planting minions out in the secular world to attain positions of power.
Muncostran: Eccentric, lovable, dishevelled theoreticians, absent-minded and good-natured (the subtext being: ignorable)
Pendarthan: High-strung nervous meddling know-it-alls, unable to understand the realities of life, always subordinate to more 'masculine' (his words, not mine) secular folk.
Klevan: Awesomely wise elder statesman who can solve all the problems of the world (apart from Einstein, I don't know of anyone who achieves such stature in our world)
Baudan: Cynical frauds living in luxury at the expense of the common man (this sounds like the viewpoint of letter-writers to the Salt Lake Tribune)
Penthabrian: keepers of mystical secrets handed down from above, with 'theory' as a smokescreen used to fool the lay folk (if only...)
Moshianic: A combination of Klevan and Penthabrian - viewed as the most dangerous iconograph because of the high expectations placed on the theorist shoulders.

It's a neat trick to identify the ways in which the outside world perceives the 'theors', as they are called, and in doing so understand where the outsider is coming from, and what kind of threat they pose. I suspect I'm going to start classifying people in "the real world" the same way when I describe what I do.

Saturday, November 07, 2009

Soviet-style mathematics

Via Anand Kulkarni (aka polybot) comes an interesting article in the WSJ by Masha Gessen on Grigori Perelman, Soviet-era mathematics and the question of 'big math'. The premise of the article (Masha Gessen has a book out on Perelman and the Poincare conjecture) is that special environments are needed to prove big results, and the Soviet-era mathematical enclaves fostered this environment both because of, and inspite of the Soviet political system.

It is indeed true that amazing work came out of the isolated confines of Soviet mathematical institutes, often parallel to or well before similar work in the Western world. There's a joke that goes around theoryCS circles that for every theorem proved before the 80s in the west, there's an equivalent result proved 10 years earlier by a Russian mathematician. We need look no further than the Cook-Levin theorem, the Koebe-Andreev-Thurston theorem (on circle packings), Kolmogorov-Chaitin-Solomonoff complexity (and according to some, the Cauchy-BUNYAKOVSKY-Schwarz inequality, though this is disputed).

But in the article is a more thought-provoking claim:

The flow is probably unstoppable by now: A promising graduate student in Moscow or St. Petersburg, unable to find a suitable academic adviser at home, is most likely to follow the trail to the U.S.

But the math culture they find in America, while less back-stabbing than that of the Soviet math establishment, is far from the meritocratic ideal that Russia's unofficial math world had taught them to expect. American math culture has intellectual rigor but also suffers from allegations of favoritism, small-time competitiveness, occasional plagiarism scandals, as well as the usual tenure battles, funding pressures and administrative chores that characterize American academic life. This culture offers the kinds of opportunities for professional communication that a Soviet mathematician could hardly have dreamed of, but it doesn't foster the sort of luxurious, timeless creative work that was typical of the Soviet math counterculture.

For example, the American model may not be able to produce a breakthrough like the proof of the Poincaré Conjecture, carried out by the St. Petersburg mathematician Grigory Perelman.

This is a reflection of one of the enduring myths of mathematical research, "a mathematician would be happy in jail if they had paper and pen", with a bit of the 'a mathematician is a solitary (and slightly crazy) genius'. I can see the allure in the idea: mathematics requires great concentration, and removal of distractions would surely make it easier to focus on a big problem.

But is this really impossible to achieve in the Western model of research ? After all, even Perelman's work built heavily on a program first outlined by Richard Hamilton from Columbia. Andrew Wiles proved Fermat's theorem while at Princeton. Ketan Mulmuley has been banging away at P vs NP while shuttling between Chicago and IIT Bombay (yes, I know it's not a perfect comparison because it hasn't been resolved yet). Stephen Cook proved that SAT is NP-Complete while at Toronto. And so on and so forth.

Possibly one argument in favor of the 'isolation: good' theory is that Perelman didn't need to prove himself for 6-7 years, maintain a steady stream of funding, and teach lots of classes in order to "earn" the right to study such a hard problem. It's hard to imagine a researcher in the US being able to do this before they get some kind of job security (tenure, or otherwise).

Monday, November 02, 2009

Innovation in Computer Science

As the polylogblogdogslogglog blog points out, the ICS results are out. 39 papers were accepted in all - at some point I knew the number of submissions, but I've forgotten since.

The ICS folks didn't make life easy for themselves by explicitly stating that they wanted "conceptual contributions". But looking over the list of papers, a few things come to mind:

It's a great list of papers. Nothing to complain about really, and any of these could have been a credible paper at FOCS/STOC
The Arora et al paper on designing derivatives using NP-hard problems has already received so much chatter, one might argue that the conference mandate has already been satisfied. Similarly for the quantum money followup.
Even if the whole 'conceptual contributions' thing doesn't pan out, I see no harm in having a third conference inserted between FOCS and STOC - the more the merrier.
I guess innovation = "game theory + crypto + quantum + misc" :)

A side note. 8 of the 12 PC members have papers at the conference. This is a departure from the theory "norm". While I don't necessarily think it's a problem (everyone except theoryCS does this, and this is also a new conference), it's worth discussing. Is this something we'd like to have happen in other theory conferences as well ? Recall that the first year of the SODA 2-page experiment also had PC-submitted papers (and that was mainly to ensure submission volume, from what I recall).

Update: Shiva Kintali has PDFs for the accepted papers.

Sunday, November 01, 2009

What is computational topology ?

Jeff's post on his computational topology class (and the wonderful class outline), prompted me to write about something that I've tried to explain to people before.

Computational topology is arguably the hottest thing in SoCG-land right now, and has been so for a number of years (for curious folk, the "other" hot topic is high dimensional approximate geometry). But if you ask different people, you'll get different answers to the question "what is computational topology ?". I've even had to explain to local students why my computational geometry class is different from the computational topology class being offered. So here goes:

CT I: Using topological ideas to understand the 'shape' of data
This is the version of CT that has taken up the largest fraction of CT mindshare in the SoCG land. Dating back to work by Edelsbrunner and Mucke on alpha-shapes, the field now encompasses a range of ideas for extracting topological structure from data. New topological constructs that have come from this area include various notions of persistence, as well as related work on reconstruction, data smoothing, and visualization.

Afra Zomorodian's thesis (now a book) is a nice introduction to these concepts for a CS audience. Herbert Edelsbrunner is coming out with a book on this topic very soon (Jan 16, 2010! Mark your amazons!), and Valerio Pascucci teaches a class on computational topology at the U.

CT II: Asking computational questions about fundamental topological objects and concepts.
This is closest to the 'Computational X' flavor of work, where a computational perspective is brought to the study of X. There are many interesting problems here, and they have a nice discrete flavor that makes them 'friendly' for theoryCS folk. For example, computing the Betti numbers of a simplicial complex efficiently, or finding homotopy equivalent paths on a surface, or lots of work on graphs on surfaces.

I don't think there's a single book on this topic. JeffE has put together a fantastic course outline (with reading material), and there's also a great book on graphs on surfaces by Mohar and Thomasson. It's worth noting that some of the deeper tools in the Robertson-Seymour graph minor results take us into graphs-on-surfaces-of-large-genus land.

CT III: Using topological ideas in the heart of complexity theory.
Almost no one I know uses 'computational topology' in this sense, and there isn't a coherent and connected body of work to talk about as such. But there are some fascinating results at the core of complexity theory that rely on topological constructions. There's the algebraic complexity work of Yao/Steele/Ben-Or and others, showing that lower bounds for algebraic complexity of certain problems can be related to the sum of Betti numbers of associated surfaces. There's the Kahn-Saks-Sturtevant (and Chakrabarti-Khot-Shi) work on evasiveness of graph properties, and then the work (that I don't quite understand) on topology in distributed computing (that got Herlihy, Saks, Shavit and Zahoroglou the Godel Prize)

This is one of the reasons I think a course in topology (of some flavor, whether it be combinatorial, algebraic or point-set) should be required mathematical background for all theoryCS students.

Wednesday, October 28, 2009

Clustering interlude: Time series

In the absence of an actual post on clustering, I encourage all of you to go and read Sorelle's (or should I say ***SORELLE***'s) post on time series clustering.

Monday, October 19, 2009

"Wave"ing...

(while I wait for actual inspiration to strike)

I just acquired an invite to Google Wave (thanks to John Moeller), and have been finding it quite intriguing (note: you can't "get" Google Wave unless you're invited, and no, I don't have any invites to give out).

Google Wave is a mysterious combination of email, chat, wikis and browser extensions that's hard to describe - you can read descriptions of it all over the web, but unless you are able to get in and start playing, it's like trying to buy wine based on text descriptions of the taste.

So far I've used it to:

discuss an outline for next semester's seminar with my students (Topics in Graph Algorithms, for those curious)
Start working with a collaborator on an outline for a tutorial we want to do
set up administrative struts for our research group svn server (I've started using svn for writing papers - it's an amazing experience - more on that some other time)

By far the coolest feature of Wave for me is that you can include a LaTeX extension into any "wave"or conversation, and it automatically converts things between $$...$$ marks into latex, which makes me hopeful that it will be useful for more substantive discussions (since I often wish I had such capabilities in Skype/Google chat)

Although the preview is painfully slow, and is crippled in various ways, the potential is clearly there, and as more people get on it, it will only start getting more effective. I'm looking forward to it.

Monday, October 12, 2009

On the virtue of NOT working on a problem

Semester has hit with a vengeance, and while I've been busy (among other things) with this and this, my clustering series has gone on temporary hiatus, hopefully to return shortly.

In all the pages and pages of advice given to grad students, postdocs, and starting faculty, I think one item tends to get left by the wayside, or at least is not explicitly stated.

You always underestimate the time spent managing a project from start to finish.

What I mean is this: problems (at least in theoryCS) are easy to state, and fun to work on. Sometimes they take a while to crack, and sometimes they give up their secrets easily. But the time you spend on any given project is much more than the actual time spent thinking about it. There's

Writing up the first few drafts
Iterating to get a polished submission version
(...this step repeats until the paper is accepted)
Preparing the final (often very cramped) version
Making slides for the talk/talks you'll be giving
Preparing a full/journal/arxiv version, which often involves simplifying, rewriting, reproving, adding new references, etc etc.
Submitting to a journal, and waiting endlessly for updates on its status.
Addressing reviewer concerns, and resubmitting
And finally, getting it into print.

The few-months thrill of actually thinking about the problem and solving it ends up being a multi-year odyssey filled with many fallow periods punctuated by bursts of activity.

It's not so much the time involved - papers tend to time-multiplex quite well so you're usually in different phases of the above sequence for different papers.

It's more a matter of motivation. I don't think I'm the only person who feels this, but once I have some nice results, and especially if there isn't follow-on work to be done, I get bored with a paper. Having to deal with it for months and months afterwards is then as excruciating as killing off zombies that keep coming back (not to mention what happens if it keeps getting rejected).

So be careful when you choose a project: make sure it can last through at least a few papers, or you'll be spending a lot of time cursing yourself for the time you spend.

Sunday, September 27, 2009

Rajeev Motwani Memorial

I just returned from the technical workshop and memorial in honor of Rajeev Motwani. The workshop was excellently run by Ashish Goel, and designed (extremely well, I thought) as a mix of retrospective and technical content, with one speaker presenting a brief retrospective and introduction, and a technical speaker laying out a body of work.

The topics were varied: they started with Sanjeev Khanna discussing matchings in random graphs (Rajeev's thesis work), and going onto brand new results in this area: for example, an expected time O(n log n) bound for matchings in d-regular bipartite graphs. Piotr Indyk talked about locality-sensitive hashing, David Karger talked about randomized min cuts, Aneesh Sharma discussed monetization problems in social networks, and Sudipto Guha concluded with a overview of data streams.

The talks were surprisingly technical: if I closed my eyes, I could have easily imagined being in a SODA conference room. The only difference was that people were actually paying attention, as opposed to clicking away on laptops (or tweeting!). It was a large crowd: over 100 people, by my casual count.

There were many retrospectives, given by Dick Karp, Jeff Ullman, Chandra Chekuri, and Ron Conway. Chandra spoke in particular about the experience of being Rajeev's student, and as a former student myself, his words touched me the most. He talked with feeling, compassion and honesty, and drew a compelling picture of a man that we began to know all over again.

There was a beautiful memorial service in the Stanford Church, with words in English and Sanskrit, a hauntingly beautiful hymn from the Vedas sung by Rajeev's elder daughter, and testimonials from colleagues and friends old and new. Don Knuth was the organist for the entire ceremoney, and played pieces you didn't think could be played on a church organ. After the service, and a reception, there was a concert by one of Rajeev's favorite bands, Indian Ocean. They played amazing music, and I'm downloading their songs as we speak, but that's a tale for another time.

It was good to go back and meet people who I knew so well for a brief period of time, and then lost touch with. Many (if not all) of Rajeev's former students were there, and there were many others who cohabited the Gates Building along with me. All of us older, a little grayer, but still recognizable :). Spread-apart families often only get together at weddings or at funerals, and this was one of those occasions where it was great to see everyone, but as we all kept murmuring "unfortunately it had to happen like this".

If I had to describe the feeling that dominated my thinking that day, it was a sense of being robbed. Upon hearing testimonial after testimonial, anecdote after anecdote, listening to this divine rock group that Rajeev listened to and loved, I could only wonder at the many sides of this person whom I knew so little of. I wished I had known more about him: that our interactions had been more multidimensional than that of advisor and student, and that I (and my fellow students at the time) had seen more of the ebullience and vivacity that others spoke so vividly of.

By the end, a new picture began to emerge, of a 'hub', a 'connector' and a 'facilitator', someone who had the clarity to know what people really needed to succeed, and the self-effacement to stand back and make it happen, by connecting people together. He helped legions, and legions came to bid him farewell.

It therefore seems oddly fitting that his career in research started with studying random matchings, and ended with new explorations of social networks. His life, one might think, has always been about creating, analyzing and enriching connections.

Thursday, September 24, 2009

Recent items

Deadlines are keeping me busy, and away from blogging.

Speaking of which, the Fall Workshop on Computational Geometry has a submission deadline this Friday. The Fall workshop is a "true" workshop, in that you go, give a talk on whatever you're working on, and there's no messing around with proceedings, publications and things like that. It's a pure meet-and-chat kind of venue, which means the pressure is low, and the visibility is quite high (many of the east-coast geometry folks show up).

This year it's in Tufts, so the location is even better. So get those 2-page abstracts in !

In other news, FOCS registration deadline is looming. Bill mentions a workshop to celebrate 50 years of FOCS (which dates back to when it was the conference on switching and circuits (!)) and 20 years of the GATech ACO program.

The workshop is NOT like the Fall workshop :). It's a series of invited talks by a star-studded cast: KARP !! YANNAKAKIS !! ALON !! BLUM !! All together, for one brief engagement !!

Sounds like a great idea: the kind of thing we probably need to do more of to get more folks to the conference.

Wednesday, September 16, 2009

Memorial Workshop for Rajeev Motwani: Update

I had mentioned a memorial workshop for Rajeev Motwani to be held at Stanford Friday Sep 25. Registration is now open for the workshop and the memorial service to follow.

Registration is free, but mandatory. So if you plan on attending either the workshop or the service or both, make sure you register.

Beamer + Ipe + views...

At this point, mostly everyone is aware of how to use beamer to make presentations in LaTeX. However, many fewer people (mostly only geometry folk) are aware of Ipe, a kind of next-generation xfig.

You may stop reading right now if

you always use powerpoint for slides OR
you rarely have to use LaTeX in slides OR
you really hate non-WYSIWYG presentation software

Still here ? ok...

I wanted to share a workflow tip that I found quite useful when making slides with step-through animations. Suppose you have a sequence of slides in a presentation that unfold a figure step by step, or animate some algorithm execution etc. Ipe, coupled with a few LaTeX commands, provides a really nifty way of rendering the animation without jumps, misalignments, or misdrawings.

Ipe (and many other drawing programs) has the notion of a layer. More powerfully, Ipe also has the notion of a 'view', which you can think of as a (sub)set of layers. For example, if you have a drawing with layers 1,2,3,4,5, then view 1 might consist of {1,2,3}, and view 2 might consist of {1,2,5}, and so on.

What this means is that when you want to do step-animation, it's really easy. Each time you move to a new step, you create a new view in Ipe (which also usually creates a new layer), and you can select whichever subset of the current set of layers you want to render, as well as drawing new ones.

Ipe stores all the views as separate pages in a PDF file, so your final animation consists of a multi-page PDF file. And now comes the cool part.

Suppose you want to include this sequence of views in a beamer slide, with each view appearing after the next in response to a mouse click. You need two things:

pdftk (which comes standard in most linux installations), which allows you to split a PDF file into multiple files (one per page), with any format for the filename that you specify. For example, I have a command called 'splitpdf' that does this:
pdftk $1.pdf burst output $1-%d.pdf

which takes a file name.pdf and splits it into name-1.pdf, name-2.pdf and so on.

Next, you need the (standard) LaTeX package 'xmpmulti' which gives you the command \multiinclude. It allows you to include multiple figures that share a common prefix. So for example, to include all the figures created by the previous pdftk command, you merely write
\multiinclude[start=1,format=pdf]{name}
.The 'start=1' starts counting from 1 instead of 0, and you can also specify an end-counter.

But the best part is when you instead use

\multiinclude[<+>][start=1,format=pdf]{name}

Now, the files are included with an automatic 'replace each by the next' mode (<+> is standard beamer notation for this). At this point, you have a sequence of animations ready to go. In fact, when I give lectures, I have a number of slides that just look like this:

\begin{frame}
\frametitle{Title}
\mi{name}
\end{frame}

where \mi is a macro I defined for the above multiinclude. Ipe does all the layering and viewing work for me, and multiinclude takes care of the rest. This has made making complicated animations really simple and fast.

p.s if you're still wondering why one should use Ipe instead of xfig, the LaTeX integration in Ipe is superb. No nonsense with special flags, and pstex_t and craziness like that. You get WYSIWYG LaTeX inside Ipe, you can use whatever macros you have in your text, and the various nifty alignment tools make an Ipe drawing look really clean.

Friday, September 11, 2009

Memorial Workshop for Rajeev Motwani

Plans have been in the offing for a memorial workshop for Rajeev Motwani, and now the details are available. Below is the announcement (sent by Ashish Goel):

Dear friends,

As you might know, our dear friend and colleague, Rajeev Motwani, passed away in a tragic accident on June 5, 2009. We are holding a technical workshop titled

Randomized Algorithms: Theory and Applications

at Stanford University on Sep 25th, 2009, from 10am - 2:30pm to honor Rajeev's research in the area of algorithms and their applications. If you did not know Rajeev's research, please see http://www.stanford.edu/~ashishg/cgi-bin/RememberingRajeev/ for a brief introduction.

The workshop will take place at the Bechtel Conference Center in Encina Hall. Workshop attendees are also invited to a memorial celebration starting at 4:00pm at the Stanford Memorial church, followed by a performance by one of Rajeev Motwani's favorite bands, Indian Ocean. The registration URL, web-page information, parking directions, and talk titles will follow in a later email.

Registration will be free, but mandatory. Please feel free to bring this to the attention of any colleagues/students who you think might wish to attend, and send me an email if you have any questions.

Workshop program:

10 - 10:45 am
Welcome remarks
Retrospective by Richard Karp, UC Berkeley
Technical talk by Sanjeev Khanna, University of Pennsylvania

10:45 - 11:30 am
Retrospective by Jeff Ullman, Stanford University
Technical talk by Piotr Indyk, Massachusetts Institute of Technology

11:30 am - 12:15 pm
Retrospective by Chandra Chekuri, University of Urbana-Champaign
Technical talk by David Karger, Massachusetts Institute of Technology

12:15 - 1:30 pm
Lunch for registered attendees

1:30 - 2:30 pm
Retrospective by Ron Conway, Venture Capitalist
Technical talk by Sudipto Guha, University of Pennsylvania
Technical talk by Aleksandra Korolova, Stanford University

The Scientific committee for the workshop consisted of:
Moses Charikar
Ashish Goel
Richard Karp
Prabhakar Raghavan
Tim Roughgarden

I'll update this post with the registration URL when it becomes available.

Wednesday, September 09, 2009

Geometry at ESA (guest post)

(ed. note: Jeff Phillips is a researcher in computational geometry, data analysis and statistics. He's also one of the newly minted CIFellows, and will be working with me starting in a week. He kindly agreed to file this report from ESA.)

ESA (along with ICALP) is one of the top two European conferences for Theory A. And probably since the ICALP deadline is typically between the SoCG submission and notification, ESA is the home of many of the best computational geometry talks. This year was no exception, with two "geometry" sessions, as well as many other geometry talks mixed in other sessions.

Another interesting pattern at ESA which makes it different to other algorithms conferences is the juxtaposition of theoretical and experimental work. The two tracks (A: Design and Analysis, B: Engineering and Applications) have separate submissions, but are not treated separately in the program. This leads to the fun game of trying to determine whether a talk was from track A or track B. Sometimes you are surprised when a talk starts with a nice theoretical results, and then the speaker presents several experiments to show the algorithm works well in practice, or mentions that it has already been added to CGAL.

I think this a great practice and should be considered by other conferences such as SODA or SoCG. For instance ALENEX talks could be mixed in with SODA talks. This would encourage more algorithms people to actually implement their algorithms, because it would allow them to apply to a separate track that would give more value to practical algorithms. How would this not be a positive for our community and its perception from other parts of computer science!

A couple of related data points: There were 56 track A papers and 10 track B papers, and both were accepted at a rate of about 25%.

The invited talks followed the theme of theory and practice as well. Michael Mitzenmacher began with a very clear talk explaining many open problems in Cuckoo hashing. He has been working on small variations in the algorithm that lead to large differences in performance, and then has been going to great lengths to explain why these variations make such a large difference.

Erik Demaine gave a very entertaining talk describing how a certain line of his work has alternated between art and the theory behind the art. He also performed a couple magic tricks, reminding us that most of the best magic requires lots of research, and sometimes algorithms!

On the third day, Noam Nisan presented a great high level view of how Google's TV ad auctions work. Much of his talk served to point out all of the important details often ignored in theoretical analysis, but critical in implementing such a system effectively.

I'd like to note a few papers I found interesting. This is not by any means an exclusive list, but just enough to give a taste of the (geometric) results.

Manuel Caroli and Monique Teillaud. Computing 3D Periodic Triangulations.

They presented a set of rules and an algorithm for computing triangulations in 3D which are periodic, that is they tesselate to fill an infinite space, but can be defined as a complex within a unit cube. These triangulations are needed for many simulations where it is difficult to deal with boundary conditions, so these can be avoided, but effectively having no boundary. Despite, the usefulness of these triangulations, they had not really been properly formally defined until this paper. Furthermore, their algorithm has been implemented and will soon me in CGAL, if not already.

Mohammad Ali Abam, Mark de Berg, Mohammad Farshi, Joachim Gudmundsson and Michiel Smid. Geometric Spanners for Weighted Point Sets.

They study Euclidean a size n point sets with weights so that a weighted distance between two points p and q is described
d_w(p,q) = w(p) + d(p,q) + w(q)
where d(.,.) is the Euclidean distance and w(.) is the weight of a point. This may, for instance, represent a potential road network where it takes w(p) time to get to the center of a city from its outskirts. This paper shows that typical spanner-type results can be found for a weighted point set using this weighted distance. I.e. in R^d a (5+eps)-spanner can be found with O(n) edges, and (2+eps)-spanner can be found with O(n log n) edges.

Sumit Ganguly and Christian Sohler. d-dimensional Knapsack in the Streaming Model.

This paper studies the d-dimensional knapsack problem, that is given a set of n items with a value v_i and a size in d-dimensions s_{i,1}...s_{i,d} find a set of items I with maximum total value such that for all j \in [1,d] that sum_{i in I} s_{i,j} <= 1. This problem is studied in the streaming model, which I found interesting, because they were unable to store the actual solution, because it might have linear size. Instead, they approximate the optimal cost and present a "template" solution where they give an approximate size of each element in their approximate solution I'.

Bernhard Haeupler, Siddhartha Sen, and Robert E. Tarjan. Rank-Pairing Heaps.

A data structure paper, but for one very useful in geometry, among other places. This paper does not present any new results from a classical theoretical perspective. They describe a heap data structure which has the same running time for all operations as Fibonacci heaps. The contribution is that their data structure is considerably simpler than any variant of Fibonacci heaps, in particular, the structure of the heap never needs to be restructured. As an added benefit, their data structure easily outperforms Fibonacci heaps.

Other notes (mainly from business meeting): See also Michael Mitzenmacher's post.

There were 37 computational geometry submissions, 10 were accepted.
Mark deBerg (next years track A chair), declared that next year the page size will be STRICTLY ENFORCED and he is not opposed to automatically rejecting papers if they play too many games to squeeze more text in the alloted number of pages. Be warned.
next year, ESA 2010 (and ALGO) will be in Liverpool, England, and it was voted for ESA 2011 to be in Saarbrucken. Which won over a bid from Greece. (ed: HOW ON EARTH ? !!!!!!)
Proceedings were not included in registration, but could be bought. On the second day, after politely asking Springer, participants were able to download a single pdf of the proceedings from a couple USB keys. This was very useful!, but it would have been better earlier in the conference.

Things I missed (I had to leave partway through the third day).

- best student paper: Heidi Gebauer. Disproof of the Neighborhood Conjecture with Implications to SAT.
- best paper: Christoph Dürr, Flavio Guiñez and Martín Matamala. Reconstructing 3-colored grids from horizontal and vertical projections is NP-hard.

Attached workshops which are part of ALGO on Thursday and Friday:

Friday, September 04, 2009

SODA 2010

Ironically (I'm on the PC), I'm probably the last one to post about the SODA acceptances (although I did tweet it a while ago)

In case you don't already know, one major change this year is the 20 page limit on final versions, brought about by the all-electronic proceedings. It's worth pointing out here that this is a major change that I don't think ANY conference (theoretical or otherwise) has put into place (I can say this because I had nothing to do with it :)). 20 pages is huge: there's no way you can wiggle out of writing full proofs at this point, and I hope that this trend will continue as more of our conferences go all-electronic.

One caveat: I don't know what the final version format is. It seems unnecessary to go with the butt-ugly Sheridan 2-column format, but we'll see.

On the papers:
I'm not going to comment on the papers, since I reviewed many of them. Suffice it to say that (even though it sounds like a cliche), I was impressed by the quality of the submissions, and there were many good papers that unfortunately couldn't make it. We're in the process of writing decision summaries for the authors: these are intended to capture more of the discussion surrounding a paper. Hopefully, they will give a better sense of what the reviewers liked and disliked about the paper than just the actual reviews.

On the process:
I felt a bit more disconnected from the overall deliberations this time, and maybe others felt this way too. I spent a lot of time on "my pile", but I didn't have a lot of time to look over papers that I wasn't in some way responsible for. Given the number of submissions, this is unavoidable I guess.

I actually think the reason I felt this way was because of the software interface. Instead of the (clunky) SIGACT server interface used in years gone by, we used the snazzy HotCRP interface. In almost every way, this is a vastly superior interface (and I've used EasyChair, the Microsoft CMT, and other packages as well). It feels lightweight, it has great searching and tagging capabilities, and most of the interfaces are one or two clicks from the home page. It cleanly combines email notifications and uploads/downloads with web-based editing, and I'd recommend it very strongly for anyone organizing a conference in the future.

The only feature the SIGACT server had which this doesn't, was a landing page where you got a stream of all comments on all papers. It was a serendipitious way of picking up on discussions not related to papers you were in charge of, and I remember in the past getting interested in a discussion and a paper and actually reading and submitting a review myself. In HotCRP, you land at a page containing your reviews, and it doesn't give you a direct view into the global picture (except maybe for the PC chair).

One amazing feat of social engineering that HotCRP also does: at this landing page, it tells you how many reviews you've put in compared to the average. There was a time during the review process where I'd reload the page every few hours and see myself falling behind the PC average, increasing the pressure on me to submit reviews :).

Sunday, August 30, 2009

Spectral Clustering.

(this is part of an occasional series of essays on clustering: for all posts in this topic, click here)

I don't like you, and I like them, but maybe if we change, we can get along.

Spectral clustering is not clustering: it's actually a metric modification method. It's another way of capturing things that are close versus things that far, using a more global method.

We saw how correlation clustering could be used to capture both the ``I don't like you'' and ``I like them'' aspects of clustering by assigning positive and negative weights to individual element pairs. However, it's not easy to come by such a labelling of pairs. More often than not, all we have is a distance function. We could threshold it, declaring all distances within some parameter r to be positive, and all distances larger to be negative, but this is completely arbitrary and depends on the choice of r.

Let's think of the two-class problem for now: we merely want to divide the points into two clusters. Suppose we were able to guess a function that assigns one label (0) to points in one cluster and another label (1) to points in the other cluster. Then we could write down the cost of the clustering (in a correlation sense) by summing up, over all pairs, the difference

$f_i - f_j$ . In fact, just to make sure it's always positive, we'll square it, and thus will sum up the function $(f_i - f_j)^2$ . We'll also assume a weight function $w_{ij}$ on the edge $ij$ , so that what we're really summing up is $\sum w_{ij}(f_i - f_j)^2$ .

This is where things start to get very interesting. It's not hard to see that this can be written in term of a variant of the graph Laplacian, treating the f values as a vector. If we now allow the f values to be reals, rather than 0 or 1, we get a classical eigenvalue problem on the Laplacian, in effect determining the second smallest eigenvalue of the Laplacian.

Once we do that, the corresponding eigenvector gives us an assignment for f. we can either merely take positive or negative values of f to group the data, or we can use the f values as a feature representation and recluster the data into two groups based on that.

This is the key insight of spectral ``clustering''. We perform a spectral decomposition of the Laplacian, and take the top k eigenvectors. Those vectors give us a k-dimensional feature represntation of the data in an inner product space, and we can now recluster. The real question is: why should this feature representation help us in any way at all ?

Here's my take on what's really going on. We know that the graph Laplacian is a discrete equivalent of the Laplace-Beltrami operator on a manifold, which in turn (via the heat equation) captures the ``flow'' of material in a manifold. Return for a second to our cut-based view of the distance function. In essence, two points are close if there are many ways of getting from one to the other (i.e the min cut between them is large) and they are far if not. This is a flow-based way of thinking about distances, and in effect, what the Laplacian does is map the original data into a new metric space in which distance is based on flow, rather than just on metric distance.

This intuition can be formalized: if we switch now to the stochastic viewpoint, in which we're doing random walks on the graph, then the commute time between two vertices turns out to be precisely the Euclidean distance (squared: thanks, dsivakumar) between the feature vectors constructed from the eigenvalues of the Laplacian.

... take a deep breath...

There are a number of different concepts swirling around here that are worth keeping track of. The cut-based viewpoint of node similarity in graphs lends itself naturally to a Laplacian-based treatment, which in turn leads us to random walks and the heat equation on manifolds. At a deeper level, the Laplacian of a graph, by virtue of how it acts on functions, tells us something very basic about the structure of the distances involved. For more on this particular aspect, you should check out the discussion 'Can you hear the shape of a drum'.

But it's very important to keep in mind that spectral clustering is not so much a clustering technique as a data transformation method. Although k-means is the clustering algorithm of choice once we get down to the feature representation, other methods could be used as well. Also, as relaxations go, this paricular one (relaxing from the hypercube to the reals), isn't that great: the ``integrality gap'' can be quite high. It turns out that this doesn't matter terribly though.

As an aside, the idea of using the graph Laplacian to define a ``stretched'' distance that looks Euclidean is not limited to this problem. The most "famous" version of this idea is the Laplacian eigenmaps work by Belkin and Niyogi in the realm of manifold learning. There's also work by Guibas, Ovsjanikov and Sun that uses similar ideas to create signatures of shapes. The key idea, once again, is that if you're on a manifold of some kind, the Laplacian gives you a handle on the shape of this manifold, as well as how to ``stretch it out'' with local coordinates to make it look flat (and therefore Euclidean). It's a useful trick to keep in your data analysis toolbox.

For further reading, Ulrich von Luxburg has a great survey on spectral clustering.

Saturday, August 08, 2009

Negative-type distances and kernels

This is a story of two constructions that actually end up being essentially the same thing, as I recently discovered.

1. Kernels

The story of kernels in machine learning goes somewhat like this:

Take a positive definite function $$K(x,y)$$ that captures some notion of similarity between objects (a standard example of such a kernel is the Gaussian $$K(x,y) = \exp(-\|x - y\|^2)$$). A positive definite function, btw, is like the generalization of a p.d matrix: the integral $$\int f(x)f(y)k(x,y)dx dy \ge 0$$.

You can then construct a Hilbert space that captures the structure of the kernel. Specifically, for a fixed set of points S, construct a vector space from the basis $$\{k_x | x \in S\}$$, where $$k_x(y) = K(x,y)$$, and then define an inner product of two vectors in this space in the usual way: If $$v_a = \sum a_x k_x$$ and $$v_b = \sum b_x k_x$$, then $$v_a \cdot v_b = \sum a_x b_y K(x,y)$$.

You get nice properties from this construction: the so-called reproducing property is one of them, which tells you that $$K(x,y) = k_x \cdot k_y$$. In other words, we can capture the similarity function K(x,y) by a "standard" inner product in a Hilbert space.

What's even neater is that by invoking Mercer's theorem, you can construct an orthogonal basis, and make sure that the Hilbert space is actually a Euclidean space. The squared Euclidean distance in this space can be written in kernel form, as
\[d_K(x,y) = K(x,x) + K(y,y) - 2K(x,y)\]
which is what you'd expect when treating K(.) as an inner product.

2. Negative-type distance.
The second story comes from Deza and Laurent. When can a distance function be embedded in Euclidean space ? It turns out that it's more convenient to talk about the square of the distance function, which we'll call D.

There's an elegant characterization of when D can be embedded isometrically into $$\ell^2_2$$: this can be done if and only if D satisfies the negative-type inequality:
\[ \sum b_i b_j D(x_i, x_j) \le 0, \sum b_i = 0\]
for all possible assignments to $$b_i$$.

The proof works via a construction called a covariance mapping that takes $D$ to the function $$k_D$$ defined as:
\[ k_D(x,y) = (1/2)(d(x, x_0) + d(y, x_0) - d(x,y)) \]
which differential geometry folks will recognize as the Gromov product.

The proof completes by showing that the negative-type condition on D implies positive definiteness of $$k_D$$, and this in turn means that $$k_D$$ can be expressed as an R-covariance:
\[ k_D(x,y) = \int f_x(\omega)f_y(\omega) d\mu(\omega) \]
for some measure space $\mu$.

Note that the RHS of the equation is an infinite-dimensional inner product.

3. Where the two stories come together
The mapping that takes a kernel to a distance is the inverse of the covariance mapping used to map a distance to a metric. In other words, if we take a kernel K, compute $$d_K$$, and then use the distance to kernel mapping to compute $$k_{d_K}$$, we get back K. Further, since we can show that the negative-type condition on D implies a positive-definite condition on $$k_D$$, we can start off with either "D satisfies negative-type inequalities" or "K is a positive definite kernel" and yield the same conclusion on $$\ell_2^2$$ embeddability.

What's interesting is that the two ways of thinking about this don't seem to have been connected together explicitly before.

p.s This is not a fully rigorous correspondence except possibly in the finite dimensional case, but I find that it's often convenient to replace the negative-type argument with a kernel positive-definiteness argument in my head.

Sunday, August 02, 2009

Correlation Clustering: I don't like you, but I like them...

(this is part of an occasional series of essays on clustering: for all posts in this topic, click here)

Whether it's k-clustering, or any kind of hierarchical clustering, the world we live in is still the world of ``I don't like you'' clustering, where the geometric landscape is defined by distances between points. Consider the following example:

There is an intuitive sense in which the first clustering is not ``real'' and the second one is: the idea of 'well-separatedness'' is a pervasive component of how a good clustering is perceived. But if we only measure distances between points, and only measure the cost of a clustering in terms of how costly each cluster is, we'll never be able to distinguish between these two examples.

What's needed is a way of declaring likes (similarity) as well as dislikes (distance), and then, critically:

penalizing similar items in different clusters AS WELL AS different items in the same cluster.

By that measure, we'd be able to distinguish the first and second clusterings, because in the first case, presumably elements close to each other that lie in different clusters will make the clustering look more expensive. This point is worth reiterating. Unless we have some way of penalizing mis-separations as well as mis-groupings, we'll always be at the mercy of the tradeoff between k and cost.

Continuing the "clustering via self-help metaphors" theme to these essays, I call this the "I don't like you, but I like them" way of modelling data.

Correlation Clustering

This is where correlation clustering enters the picture. The correlation clustering model is as follows: every pair of elements is assigned a 1 or -1, encoding a similarity or dissimilarity. The goal is to find a clustering (note: no k!) in which any pair of points in a cluster is penalized for being dissimilar, and any pair of points in two different clusters is penalized for being similar. This can be generalized to arbitrary weights: for each pair of elements you assign weights w+ and w-, with the possible caveat that w+ + w- = 1.

So now the goal is merely to minimize the cost of a clustering. The elegance of correlation clustering lies in the natural way that clusters merge or split depending on the number of similar or dissimilar pairs. Do note though that you need to have input data that can be written in that manner: thresholding a distance function will give you the desired input, but is an ad hoc way of doing it, since the thresholding is arbitrary.

There are a number of different algorithms for correlation clustering, and also a very simple one that yields a good approximation guarantee: pick a point at random, and pull in all its neighbors (all points similar to it). Repeat this process with a new unpicked point, until all points have been selected. This algorithm gives a 3-approximation for correlation clustering.

Correlation clustering is also useful as a way of combining clusterings. We'll talk about this later (ed: how many times have I said that !), but the problem of conensus clustering is to ``cluster the clusterings'', or aggregate them into a single average clustering. An easy way to see the connection is this: given a collection of clusterings of a data set, create an instance of correlation clustering with the positive weight for a pair corresponding to the fraction of clusterings that ``vote'' for that pair being in the same cluster, and the negative weight being the fraction of clusterings that ``vote'' for that pair being separated.

Thursday, July 23, 2009

Clustering: Hierarchical methods

(this is part of an occasional series of essays on clustering: for all posts in this topic, click here)

k-center, k-median, k-means, k-medoids, .... The list is endless, but that pernicious k comes up everywhere. What can we do about it ? There are really two ways to go:

Figure out the "right" k for a problem. This is a complicated matter, and will be the topic of a later post
Don't choose: give the user a universal representation from which they can decide what k they want.

This latter formulation takes us into the world of hierarchical clustering, the topic of this post.

There are two different ways to think about hierarchical clustering: a representational view and an algorithmic view. The algorithmic view is quite simple: let's try to design a clustering via greedy operations. In the top-down view, we find a good split of the data into two parts, and then recurse on each side. In the bottom-up view, we select two clusters for merging (in the beginning, all items are in separate clusters), and merge our way up.

Perhaps the most well known of the bottom-up methods is the 'single link' clustering algorithm, in which at each step, you merge the two clusters that are closest to each other. If this sounds familiar, it should: this is merely Kruskal's MST algorithm run partially to completion.

The "single-link" method is optimistic: it creates clusters via connected components, assuming that transitivity is enough to construct a cluster. The "complete-link" approach is more pessimistic: rather than defining the distance between two clusters as their nearest pair distance, it defines it as the furthest pair distance, ensuring a clique-like structure for clusters. Merging happens as before.

Other variants that are more robust to noise average the pairwise distances to obtain the clustering distance. With the right choice of distance, these amount to comparing the centroids of clusters.

You'll notice that I didn't actually define a problem that these algorithms solve, keeping in with the grand tradition of clustering :). There is a way of defining a general optimization function that single-link clustering solves optimally. For the others though, although we can define a cost function, we can't show that they're optimal.

So much for the algorithmic view of HC. The representational view goes somewhat deeper.

I've had at least one person (you know who you are) tell me that one of the only two clustering algorithms that worked for them is hierarchical clustering. And indeed, the idea is very seductive. Rather than present the user with a single k-clustering - a snapshot, if you will, of the data - we give them a tree of merges, in which there is one leaf for each object. The central idea here, and one that bears emphasizing, beacuse it's so different to how we've thought about clustering thus far, is this:

we understand the structure of data from its transitions, rather than from its states.

What I mean is this: rather than defining a clustering as a static partitioning of the data, we watch the data ``evolve'' as we start merging points together, and use the merge history to figure out what the real structure is. There are a couple of ways of doing this: the most common way is to imagine drawing the tree from top to bottom, and then cutting it by a y-monotone curve (one that intersects any vertical line in exactly one point). This can be done to ensure we have k clusters, or it can be done at points where the merge appears to be ``stable'' (the subtree doesn't merge with any other subtrees for a long time).

A particularly effective visual metaphor for this is the dendrogram, which is very popular in evolutionary tree analysis.

The dendogram is visually appealing because it does two things: firstly, it depicts the tree of merges, permuting the nodes so that there aren't crossings. Secondly, and more importantly, it uses the lengths of edges as a visual marker to indicate the 'time of merge' for clusters. If we think of starting at time 0 and merging clusters, then a cluster that sticks around for a long time will have a tall edge connecting it to its parent, in comparison with a cluster that gets merged quickly. (as an aside, visual metaphors are possibly underrated when it comes to thinking about clustering algorithms. My firm belief is that one of the reasons soft clusterings aren't used as much as hard clusterings is because it's hard to visualize them)

Sidebar:
And this is the key operational idea behind ``clustering from the transitions'': clusters that stay unmerged longer are likely to be more interesting than clusters that get merged quickly. Till recently, this was merely a heuristic way of thinking about hierarchical clusterings. However, we now have the idea of persistence that comes from the realm of computational topology. In short, persistence is a way of quantifying topological features of a shape that persist across different scales. Persistence has been studied extensively as a rigorous way of quantifying the triviality (or non-triviality) of features in a shape, and there's a recent paper that applies persistence-like concepts to clustering as well. It's still early days for this direction, but I think it's a promising one.

Returning to hierarchical clustering, one major problem with this approach is that it's local: make the wrong choice of merge early on, and you'll never get to the optimal solution for a k-clustering problem. And since there's no easy way to change your mind (i.e split a clustering), it's hard to reverse bad decisions. If you're so inclined (and I am nowadays), this is the problem of finding monotone paths in the merge-split lattice on partitions of [1..n], but I digress...

Ultimately, the value of hierarchical clustering comes in its ability to represent an entire history of clusterings, rather than just one, and the relative ease with which a bottom-up algorithm can be written. That, coupled with its uses where you really do want to find a tree (evolutionary or otherwise) is what makes it a popular implement in the clustering toolbag.

Wednesday, July 15, 2009

Consistent BibTeX formatting

I try not to write BibTeX by hand any more: too easy to introduce errors. So I usually use either DBLP or the ACM digital library to get BibTeX for papers. Sometimes the journal has BibTeX, or some format that can be converted. As an aside, IEEE is extremely lame: you have to login to their digital library even to get a citation !

For the most part, I don't need to go beyond ACM or DBLP, which is great. But here's the problem: their formats are different ! I needed the BibTeX for a recent paper of mine, and found it on both sites. Here's what ACM gave me:

@inproceedings{1516372,
author = {Ahmadi, Babak and Hadjieleftheriou, Marios and Seidl, Thomas and Srivastava, Divesh and Venkatasubramanian, Suresh},
title = {Type-based categorization of relational attributes},
booktitle = {EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology},
year = {2009},
isbn = {978-1-60558-422-5},
pages = {84--95},
location = {Saint Petersburg, Russia},
doi = {http://doi.acm.org/10.1145/1516360.1516372},
publisher = {ACM},
address = {New York, NY, USA},
}

and here's what DBLP gave me:

@inproceedings{DBLP:conf/edbt/AhmadiHSSV09,
author = {Babak Ahmadi and
Marios Hadjieleftheriou and
Thomas Seidl and
Divesh Srivastava and
Suresh Venkatasubramanian},
title = {Type-based categorization of relational attributes},
booktitle = {EDBT},
year = {2009},
pages = {84-95},
ee = {http://doi.acm.org/10.1145/1516360.1516372},
crossref = {DBLP:conf/edbt/2009},
bibsource = {DBLP, http://dblp.uni-trier.de}
}

@proceedings{DBLP:conf/edbt/2009,
editor = {Martin L. Kersten and
Boris Novikov and
Jens Teubner and
Vladimir Polutin and
Stefan Manegold},
title = {EDBT 2009, 12th International Conference on Extending Database
Technology, Saint Petersburg, Russia, March 24-26, 2009,
Proceedings},
booktitle = {EDBT},
publisher = {ACM},
series = {ACM International Conference Proceeding Series},
volume = {360},
year = {2009},
isbn = {978-1-60558-422-5},
bibsource = {DBLP, http://dblp.uni-trier.de}
}

So as you can see, we have a problem. The formats are not consistent, which means that if I need to get some references from DBLP, and others from the ACM, my references file is going to look very irregular.

Other critiques:

I have never understood why DBLP splits up the conference and the paper: with BibTeX, if you cite three or more papers that use the same crossref, the crossref is included itself as a reference, which is just strange.
Unless you use double curly braces, capitalizations inside a string get removed, which is mucho annoying: It's "Riemannian", not "riemannian".
The DBLP name for the conference is too cryptic: who'd even know what EDBT is outside the database community. On the other hand, the ACM citation is clunky, and is a page-length disaster waiting to happen.

Thoughts ?

Pages