Sunday, May 19, 2013

Coding, Complexity and Sparsity (SPARC) 2013.

Atri Rudra reminds us that the 3rd incarnation of SPARC is soon to be upon us (disclaimer: I'll be speaking at the event):
Efficient and effective transmission, storage, and retrieval of information on a large-scale are among the core technical problems in the modern digital revolution. The massive volume of data necessitates the quest for mathematical and algorithmic methods for efficiently describing, summarizing, synthesizing, and, increasingly more critical, deciding when and how to discard data before storing or transmitting it. Such methods have been developed in two areas: coding theory, and sparse approximation (SA) (and its variants called compressive sensing (CS) and streaming algorithms). 
Coding theory and computational complexity are both well established fields that enjoy fruitful interactions with one another. On the other hand, while significant progress on the SA/CS problem has been made, much of that progress is concentrated on the feasibility of the problems, including a number of algorithmic innovations that leverage coding theory techniques, but a systematic computational complexity treatment of these problems is sorely lacking. The workshop organizers aim to develop a general computational theory of SA and CS (as well as related areas such as group testing) and its relationship to coding theory. This goal can be achieved only by bringing together researchers from a variety of areas. 
The Coding, Complexity and Sparsity workshop (SPARC 13) will be held in Ann Arbor, MI on Aug 5-7. 
These will be hour-long lectures designed to give students an introduction to coding theory, complexity theory/pseudo-randomness, and compressive sensing/streaming algorithms. We will have a poster session during the workshop and everyone is welcome to bring a poster but graduate students and postdocs are especially encouraged to give a poster presentation. 
This is the third incarnation of the workshop and the previous two workshops were also held in Ann Arbor in August of 2011 and 2012. 
Confirmed speakers:
* Jin-Yi Cai (University of Wisconsin, Madison)
* Shafi Goldwasser (MIT)
* Piotr Indyk (MIT)
* Swastik Kopparty (Rutgers University)
* Dick Lipton (Georgia Tech)
* Andrew McGregor (University of Massachusetts, Amherst)
* Raghu Meka (IAS)
* Jelani Nelson (Harvard)
* Eric Price (MIT)
* Christopher Ré (University of Wisconsin, Madison)
* Shubhangi Saraf (Rutgers University)
* Suresh Venkatasubramanian (University of Utah)
* David Woodruff (IBM)
* Mary Wootters (Michigan)
* Shuheng Zhou (Michigan)

We have some funding for graduate students and postdocs with preference given to those who will be presenting posters. For registration and other details, please look at the workshop webpage: http://eecs.umich.edu/eecs/SPARC2013/

Friday, May 10, 2013

On GPU algorithms

Lance talks about GPU algorithms in his latest post:
The theory community hasn't seem to catch on yet. There should be some nice theoretical model that captures the vector and other operations of a GPGPU and then we should search for algorithms that make the best use of the hardware. The theory world writes algorithms for non-existent quantum computers but not for the machines that we currently use.
Well.

As someone who was doing GPU algorithms back when this meant flipping state bits on the fixed function pipeline (this is the GPU analog of "I wrote video games in assembly"), maybe a little perspective is in order.

Aside: I actually gave a talk on GPU algorithms at Chicago when Lance was there back in 2004. Clearly I wasn't interesting enough to get him to take note :).

Back in the early 2000s, I got quite interested in GPU algorithms. This came about from a summer project that Nabil Mustafa was doing with Shankar Krishnan and me on real-time map simplification. Shankar had this cool idea to try and accelerate the Douglas-Peucker algorithm by implementing it on a GPU, which at the time meant trying to retrofit the algorithm to a very strange OpenGL rendering pipeline.

One thing led to another, and it got us thinking more abstractly about what kinds of computations could be done on the GPU. This was pre-CUDA, which meant that essentially we could abstract the GPU machine model as a massively parallel SIMD architecture in which each "processor" ran a simple straight line program (no loops, limited conditionals, and small memory - much like a streaming algorithm). The parallelism lent itself nicely to geometric problems, and we wrote papers on map simplification, depth contours, and general geometric optimization. We also designed an algorithm for a CSG rendering problem that had buried inside it a simple streaming model for proving lower bounds: in fact we were able to show an exponential gap (in the number of streaming passes needed) between deterministic and randomized median finding in this model (the conference wasn't interested in the lower bounds though :)).

In a bout of perfect timing, I stopped working on GPUs a year or so before CUDA. At the time, my reasons were simple. I was tired of the computational model changing on me every six months, and didn't have the resources to keep buying the latest and greatest graphics cards (although AT&T was very generous in getting me a state of the art card originally). It was also annoying that in order to really exploit the power of the hardware, I needed secret sauce that was only revealed if you had NVIDIA inside connections (or went to the NVIDIA U events).

Then CUDA came along and everyone went nuts. If you go to hgpu.org (originally gpgpu.org) you'll see the number of papers being published every year on GPU-accelerated methods. CUDA changed (quite radically) the way in which we thought about GPU algorithms: the memory model became more sophisticated, and the SIMD component was more powerful, although it was still a good idea to avoid excessive looping and conditionals.

Eventually I got dragged back into the GPU world. I collaborated on a paper on graph coloring which was very instructive in demonstrating the differences between a GPU and a straight-up parallel machine. I also did a series of lectures on GPU algorithms at a 2012 MADALGO summer school on new models of computing. Incidentally, folks at Utah spend a lot of time working on issues relating to GPUs: Mary Hall's group has some nifty autotuning tools for mapping algorithms to the GPU, and Ganesh Gopalakrishnan has been thinking about memory consistency issues for GPUs.

Is there a good "GPU model" that theoreticians can study ? I think this is the wrong question. I think the GPU is one point (multi-core CPUs are another) in a space of tradeoffs between local computation, memory latency, and communication. I've been saying for a while now that communication appears to be the key resource to understand theoretically when dealing with our new networked/distributed/multiprocessor world. GPU algorithms are interesting to study as one example of how communication affects a computation. But the "right" way to deal with this involves focusing on communication, and not the GPU model per se. The latter has many idiosyncrasies and what I'd consider distractions that take away from a clean formal understanding.

I should add that I'm not even the only person from theory-land who's been thinking about GPU-derived models. There's a great body of work on multicore models (see Phil Gibbons' lecture notes at the same summer school), and I recall Michael Mitzenmacher having recent work on GPU-accelerated hashing. Uzi Vishkin had a STOC workshop a few years ago on multicore models as well. (Turing Award winner) Valiant's bridging model clearly needs to get a lot more attention as well so that people don't think no one's paying attention to the models.

For more on this, also see my colleague Jeff Phillips' lecture notes on GPUs from his class on new models of computation.

Saturday, March 23, 2013

Free, Freemium, and Paid

There was a time when I'd bridle at the idea of having to pay for software or services. But I browse the iTunes app store now, and see people pleading to have the chance to pay for an app that they like, so that the authors won't stop updating it. The whole kerfuffle with Google Reader, Keep and Evernote is another example of how people have begun to prefer to pay for products, rather than rely on something free.

It feels like the end of an era where open source and free software (not the same thing, but often referred to in the same breath) were the default. Maybe we've come to the realization that nothing is really ever free, and that it's more realistic to get the costs out in the open rather than "being the product".

Friday, March 15, 2013

The SIGACT CG column

As you might have just discovered, I'm the second half of the two-headed monster that's taken over the SIGACT Geometry Column after Joe O'Rourke stepped down (Adrian Dumitrescu, who hopefully does not mind being referred to as the head of a monster, is the other half). My first column is up, and it talks about some recent fascinating developments in nonnegative matrix factorization.

My hope with the pieces I write is to cover areas of geometry that may not have had sufficient representation in the past (especially things closer to problems I'm interested in). My next column is due in August, and apart from doing a wrapup on SoCG, other things that come to mind include Laplacians and graph geometry, reproducing kernels, or even Bregman geometry.

But everything is now mobile, crowdsourced and social networked, so I'm looking for your suggestions on interesting topics to cover, new emerging areas, results that I'm not tracking, and so on. So post away here or on G+.

Monday, February 25, 2013

Data analysis, interpretation and explanations

There was a recent data-related kerfuffle between the New York Times and the makers of the Tesla electric car. If you haven't read the articles, (and the NYT public editor's post on this has good links), the crude summary is this:

  • NYT reporter takes Tesla on long test drive, and reports problems with running out of charge.
  • Tesla Motors CEO Elon Musk writes a long takedown of the reporter's review, complete with graphs generated from data the car recorded during the drive.
  • NYT comes back with their own interpretation of data, rebutting Musk's claims.
  • Others attempt to reproduce the reporter's experience and fail, but arguably in different weather conditions that might or might not make a difference.
In an insightful meta-analysis of the dustup, Taylor Owen of the Tow Center for Digital Journalism discusses the implications for journalism in a data-driven world. He also references an article by David Brooks that makes the point:
People are really good at telling stories that weave together multiple causes and multiple contexts. Data analysis is pretty bad at narrative and emergent thinking...
I've felt for a while now that it's time to design mechanisms for providing context and narrative to data analysis*. Some of the research my student Parasaran does is on metaclustering: essentially the meta-analysis of different perspectives (clusterings) of data to draw out a larger meaning. We've also just submitted a paper on how to derive local markers of prediction validity in clustering (rather than just relying on global measures of quality). And my seminar this semester is about the larger problems of explanations and accounting in data mining.

I think as computer scientists, we have a lot to offer in the realm of data mining - not just in the design of tools for prediction, but in the design of tools to facilitate better understanding.


* None of this is surprising to experimental scientists, who will routinely attack a problem from multiple directions in order to acquire a larger understanding of a phenomenon rather than just the ability to predict. 


Friday, February 08, 2013

TCS+ hangout on the TSP polytope.

+Thomas Vidick and the folks at the +TCS+ community have started a new experiment in G+ hangout talks. The first talk in this series was by Ronald de Wolf on the STOC 2012 paper that showed that there was no polynomial-sized lifted representation of the TSP polytope.

Overall, it was a very pleasant experience. I was able to reserve one of the 10 slots (a Hangout limit) for myself and my students at the University of Utah to attend and interact with the speaker, and from Thomas's post-review it seems that many more were signed on for "view-only" access to the stream.

There were very few technical hiccups, and +Oded Regev did a great job making sure that people were muted when not talking, and had a chance to ask questions in an orderly way. The chat sidechannel was a great help.

The talk itself was the best part: de Wolf did an excellent job conveying the main ideas of the proof without getting bogged down in details, and it felt as comfortable as listening to a talk live at a conference. Given the number of people listening in, this was already approaching medium-sized-workshop levels.

I'm looking forward to more of these events, and I'm glad that the +TCS+  folks are doing this. I also hope they can try more experiments with Google Hangout. For example, two ideas come to mind:

  • A reddit-style AMA ("Ask Me Anything"). One way to make this work is that the speaker would do a short presentation (maybe 5-10 minutes) and then would open up the floor for questions. To keep things manageable, people could write in questions on chat, and the moderator could filter them and ask the questions live. With sufficient preparation, some questions could be submitted ahead of time.
  • A live panel discussion with a moderator and a few participants, which again could have questions from the audience moderated by the moderator.

Monday, January 21, 2013

Accountability in Data Mining: A new seminar

Glencora Borradaile makes a number of interesting points about the implications of Aaron Swartz's life and work for us academic computer scientists. 
As computer science academics we are in a very powerful position. We are trusted with shaping the next generation that will make very important decisions that will have far-reaching social implications. Decisions like those over Facebook’s privacy defaults, motivating technology that enables autonomous private vehicles at the expense of the public interest, defining ownership of electronic media. We make those decisions ourselves in our research; what we research, how we allow our research to be used.
 Whether or not you agree with her particular policy preferences, the point remains that the technologies we develop can have lasting consequences for the "shape" of the world to come. And if we don't speak up (either through our work, or through our own advocacy), others will take the technologies we develop and find their own ways of using or misusing them.

All of this leads up to my current interests in data mining. I've been thinking about the problems of accountability in data mining for a while now: initially mostly in private, but now more and more in public (along with +Graham Cormode and +Andrew McGregor) as I see the importance of the issue.

What is accountability in data mining ? It means many things really, but for me, for now, it means this:

The fruits of data mining pervade every aspect of our life. We are recommended books and movies; given differential pricing for insurance; screened for potential terror threats; diagnosed with various diseases; and targeted for political advertising. The ability to sift through massive data sets with sophisticated algorithms has resulted in applications with impressive predictive power. 
Since the internals of a learning process can be complex and opaque, it is important to verify that the results satisfy the properties claimed by the algorithm. The importance of this goes beyond merely checking the algorithm itself. Validation mechanisms also provide a way for users to demand accountability from authorities who might use the results of mining to affect users in some way (by changing their insurance rates, or even putting them on a terrorism watchlist). As the results of data mining affect more and more of our lives, the more crucial it is that the user be able to validate decisions made on their behalf and that affect them. 
The above is an introduction to a seminar I'm running this semester on this topic. I'm a little nervous about it, because the topic is vast and unstructured, and almost anything I see nowadays on data mining appears to be "in scope". I encourage you to check out the outline, and comment on topics you think might be missing, or on other things worth covering. Given that it's a 1-credit seminar that meets once a week, I obviously can't cover everything I'd like, but I'd like to flesh out the readings with related work that people can peruse later.

It's entirely possible that we don't care about our privacy any more (as Facebook seems to think*). But we will care about the consequences of the use of our data. My perspective is that a better understanding of what is technically possible (and not possible) to demand and get accountability will be critical to informing larger policy discussions on this topic.

* In an earlier version of this post I wrongly attributed this sentiment to an individual. Apologies.

Wednesday, January 16, 2013

A sampling gem: sampling from $\ell_p$ balls.

A while back, I had written about uniform sampling from the $d$-simplex. In brief, sample exponentially distributed random variables, and normalize them. Note that the simplex is one piece of the $d$-dimensional $\ell_1$ unit sphere.

I had also mentioned a similar trick to sample from the ($\ell_2$) unit sphere: sample Gaussian random variables, and normalize them.

As I discovered today, these are really just special cases of a much more general result that includes both of these approaches, as well as providing a general way to sample from the $d$-dimensional unit ball (as well as sphere) under any $\ell_p$ norm.

The result is due to Barthe, Guedon, Mendelson and Naor, and is breathtakingly elegant. Here it is:

To sample from the unit ball under the $\ell_p$ norm in $d$ dimensions, randomly sample $d$ coordinates $x_1, \ldots, x_d$ i.i.d from a density proportional to $\exp(-|x|^p)$. Also sample $Y$ from the exponential distribution with parameter $\lambda = 1$. Then the desired sample is
$$\frac{(x_1, x_2, \ldots, x_d)}{\Bigl(\sum_i |x_i|^p + Y\Bigr)^{1/p}}$$
Notice that merely eliminating the $Y$ recovers the procedure for sampling from the unit sphere, and includes both of the above sampling procedures as special cases. It's far more efficient than either rejection sampling or the Metropolis-style methods from the theory of sampling from convex bodies. Also, if you only want to sample from an $\ell_2$ ball you can do it without this hammer, using the sphere sampling technique and a nonuniform sampling of the length, but this seems so much more elegant.
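
To make this concrete, here's a minimal numpy sketch of the recipe (my own toy implementation with made-up function names, not code from the paper): each coordinate gets density proportional to $\exp(-|x|^p)$ by drawing $|x_i|^p$ from a Gamma$(1/p)$ distribution and attaching a random sign, and then the whole vector is scaled as above.

```python
import numpy as np

def sample_lp_ball(d, p, rng=None):
    """Sample a point (approximately) uniformly from the unit l_p ball in R^d."""
    rng = np.random.default_rng() if rng is None else rng
    # Each coordinate has density proportional to exp(-|x|^p):
    # draw |x_i|^p ~ Gamma(1/p, 1) and attach a random sign.
    g = rng.gamma(shape=1.0 / p, scale=1.0, size=d)
    x = rng.choice([-1.0, 1.0], size=d) * g ** (1.0 / p)
    y = rng.exponential(scale=1.0)  # the extra exponential variable Y
    return x / (np.sum(np.abs(x) ** p) + y) ** (1.0 / p)

# Dropping y (i.e. normalizing by (sum_i |x_i|^p)^(1/p) alone) puts the point on the sphere,
# which for p = 2 is the Gaussian trick and for p = 1 the exponential/simplex trick.
```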

Thursday, January 10, 2013

LEGO and math teaching

I was discussing Paul Tough's book with my wife yesterday at breakfast, and we somehow got onto my pet peeve: the mechanical way in which math is taught in schools, and how math gets reduced to arithmetic and counting. Of course the definitive word on this is Paul Lockhart's A Mathematician's Lament.

I looked over the floor of the living room, strewn with random bits of deadly adult-foot-stabbing LEGO, and had a minor epiphany. As anyone with small children knows, LEGO sells a lot more Hollywood tie-in kits now compared to the "bags of bricks" that it used to sell. You can buy the Millennium Falcon, an X-wing fighter, a Spider-Man action scene, or some kit related to any cartoon on Nickelodeon.

Which is a pain. Lots of specialized parts, key pieces that if misplaced cause massive bouts of tears and anguished searching, and so on.

But how do kids play with them ? They put it together carefully from the instructions the first time. This lasts about a day or so. Then they get bored, take it apart and mix up all the pieces with the OTHER kits they have, and spend many happy days building all kinds of weird contraptions.

Here's the thing about the kits. They show you how to build fairly complicated devices and doohickeys. The kids pick up on this, and they mix and match the doohickeys in their own new constructions. Essentially they figure out the functions of the different parts, and realize how to build brand new things with them.

But suppose they were told that all they could do was repeatedly assemble the kits (and maybe even different kits) over and over again. It's unlikely they'd learn anything more than just how to assemble kits really quickly. They'd also get terribly bored. In fact, the popular toys in our house are multipurpose objects: any single-purpose toy gets discarded rather quickly.

To cut a long story short, it feels to me that a lot of math (and theoryCS) education at the school and college level involves looking at various "kits", and seeing how they get put together. You get rewarded for putting kits together correctly, but not for playing with the kits and creating bizarre new objects. But finding a way to let students (in an unstructured way) play with mathematical concepts is the key to having them really understand the concepts and their different facets.

Tuesday, January 08, 2013

ICERM and Simons postdocs

Two upcoming TCS postdoc deadlines:

ICERM Program on Network Science and Graph Algorithms

This is a program out of Brown organized by Kelner, Klein, Mathieu, Shmoys and Upfal. It sounds quite fascinating if you're doing anything with graph data and spectral analysis. The deadline for postdoc applications is Jan 14.

Simons special program on the theory of big data. 

As I've mentioned before, the Simons Institute is running a semester-long program on the theory of big data. The deadline for applying for postdocs is soon (middle of January).

These two programs are coordinating, and if you're so inclined you might be able to spend one semester at Berkeley and another in Providence. Please let the organizers know if this is something you're interested in.

Monday, January 07, 2013

SODA 2013 4/n: Business

If you haven't been following my live-tweets at the SODA business meeting, here's a summary of the unusually quiet meeting:
  • Attendance at the conference was 311, which is quite low, but is a local high for New Orleans (third time in the last 10 years). 
  • 135 papers were accepted out of a net 459 submissions. This is an acceptance rate of nearly 30%, which should strike fear into the hearts of tenure-track assistant professors everywhere. 
  • PC members had to review "only" 42 papers each. Yes, I know this is insane. And no, we're not going to do anything about it. 
  • Shiri Chechik received one of two best student papers for her paper "New Additive Spanners". Bernhard Haeupler received the other for "Simple, fast and deterministic gossip and rumor spreading".
  • I've already mentioned the two best papers, on graph minors and dynamic connectivity.
  • SODA 2014 is in Portland, land of beer, books and birds. Chandra Chekuri is chair. 
  • After Salt Lake City observers had to withdraw because of serious voting irregularities, the trumped up unfair winner of the SODA 2015 sweepstakes was San Diego. But since we never go with our first choice location (Honolulu, anyone?), San Francisco was listed as a second choice, with DC as a third choice. 
  • David Johnson is handing the baton over to Cliff Stein, after running SODA since before I knew it wasn't a fizzy drink. 

SODA 2013: Part 3/n

I just attended Valerie King's talk on her paper with Bruce Kapron and Ben Mountjoy that does dynamic connectivity in worst-case polylog time (randomized). This paper received a Best Paper award (along with the Grohe et al paper on graph minors I mentioned yesterday).

The problem is a classic: given a stream of updates (edge insertions/deletions) to a graph, maintain a data structure that can answer "Are u and v connected" efficiently. There's been a slew of work on the problem: if you want worst-case bounds for updates, the best till now was $O(\sqrt{n})$ by Eppstein, Galil, Italiano and Nissenzweig (from 1992). There are polylogarithmic amortized bounds, but they can incur worst-case update times of $\Theta(n)$.

In this paper, after 20 years, they knock down the worst-case update times to polylogarithmic: the algorithms are randomized (with 1-sided error: it might occasionally declare two nodes disconnected when they were connected).

The idea itself is very elegant, and it connects to techniques in streaming algorithms in a way that I think is neat. WARNING: IANAE in this area, so my rendition might be slightly naïve, and is drawn from my understanding of Valerie King's excellent presentation.

The basic problem is this: I can maintain connectivity by maintaining a collection of trees. Adding an edge can be done without too much difficulty (indeed, if all you want to do is insert things, then you're back to union-find). It's insertion and deletion together that makes things hard: imagine that you delete an edge and disconnect a tree: you need to quickly determine if there's some other non-tree edge that can be used to reconnect the pieces, or if the two sides are now really disconnected.

More generally, consider the cutset maintenance problem. You have updates as before, but now a query is: given a set S, find a cut edge if one exists between S and V - S. It's not hard to see that this is the core routine you need for the tree.

The first elegant idea is this: suppose you represent each edge by a bit string consisting of the encoding of one endpoint followed by an encoding of the other. For example, the edge 2-3 might be written as 010011. Now for each vertex, compute the XOR of all the edges adjacent to it and call this its signature.

If you XOR all the signatures of vertices in a set, and if the set has exactly one cut edge, then what you get is the signature of that edge ! This is because each edge that's inside S will occur twice and therefore will zero out.

So this works if your cut set is of size 1. But suppose it's not ? Now you maintain $O(\log n)$ sets for each vertex. Each edge adjacent to the vertex is thrown into the $i^{\text{th}}$ set with probability $1/2^i$. With some appropriate duplication, you can show that if the cut set is of size $k$, then w.h.p. the $(\log k)$-th set will have exactly one element in it, and that's the element you can retrieve when you need to find a replacement edge.
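
To see the single-cut-edge case in action, here's a toy illustration (entirely my own, with an ad hoc edge encoding, and ignoring all the level/duplication machinery): XORing the signatures over $S$ cancels every internal edge and leaves exactly the signature of the lone crossing edge.

```python
def edge_id(u, v, bits=16):
    # Encode an edge as (smaller endpoint, larger endpoint) packed into one integer.
    a, b = min(u, v), max(u, v)
    return (a << bits) | b

def signatures(vertices, edges):
    sig = {v: 0 for v in vertices}
    for u, v in edges:
        sig[u] ^= edge_id(u, v)
        sig[v] ^= edge_id(u, v)
    return sig

vertices = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (2, 3)]  # (2, 3) is the only edge leaving S
S = {0, 1, 2}

sig = signatures(vertices, edges)
x = 0
for v in S:
    x ^= sig[v]
print(x == edge_id(2, 3))  # True: edges inside S appear twice and cancel
```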

There's a lot more technical work to do to get the whole algorithm working, and I won't go into that. But those of you who are familiar with streaming will recognize something here.

In essence, they've created a sketch for the set of edges adjacent to the vertex, and they have a way to represent the set compactly and retrieve individual elements from it. The trick with exponentially decaying levels is standard in streaming count estimation, and the GF(2) trick for storing the sketch is very similar to the trick used by Ahn, Guha and McGregor in their SODA 2012 paper on graph sketching.

And that's what I thought was the best part: that ideas that really have power in streaming can be used to help solve a longstanding open problem in graph algorithms. I should point out that the connection is post-facto: this paper was developed independently of the streaming work. But one does have to wonder what other connections we might be able to establish between sketching tricks on graphs and basic graph algorithms.

SODA 2013: Part 2/n

My previous post is here.

Day 2 of SODA, and the tenth time I've been asked "are you chairing all the sessions" ? No, it's just that many of my PC colleagues didn't (or couldn't) show up :), so those of us who did are doing more of the lifting. As a reward, we got a nice dinner in the French Quarter, and I tasted boudin for the first time (and maybe the last).

An interesting talk today morning by Dror Aiger on reporting near neighbors. They were able to show a not-super-exponential relation between the number of points at unit $\ell_\infty$ distance from a query, and the number of points at unit $\ell_2$ distance. This was wrapped into a fast algorithm for reporting Euclidean near neighbors in high dimensions that has some interesting (if preliminary) experimental backing as well in comparison with ANN, FLANN and LSH. 

Jan Vondrák gave the invited talk on submodular optimization. I mentioned Satoru Fujishige's talk at NIPS, and this was an excellent complement. Fujishige's talk focused on the geometric intuition behind submodular functions (especially the associated polymatroid). Vondrák's talk focused on the algorithmic implications of submodularity, and he gave very convincing arguments for why it can be viewed as discrete convexity OR discrete concavity, or even neither. He pointed out how the Lovász extension is useful for minimization and the multilinear extension is more useful for maximization, and gave a number of "recipes" for designing algorithms that optimize submodular functions. I hope the slides go online at some point: they were very clear and well-balanced.

There was some discussion over whether next year's SODA should adopt the two-tier PC that STOC is currently experimenting with. The jury's still out on that, and since the STOC PC is not done with their work, we don't yet have formal feedback. I will admit to being a little frustrated with the level of conservativeness on display here: it's not as if EVERY OTHER COMMUNITY IN CS doesn't do this and doesn't have best practices that we can learn from, and given our reviewing loads, it's really crazy that we aren't desperately trying things to alleviate the problem.

Sunday, January 06, 2013

SODA 2013, Part I/n

On twitter, it's common to post longer thoughts in 140 character bursts. If you don't know how long the thought will be, you label them as 1/n, 2/n, and so on, so as not to exceed an arbitrary limit, but also imply that there is a finite limit. 

So we're back in the Big Easy for SODA. This time the conference has merged ALENEX and ANALCO with the main program, so at least for today and tomorrow, we have four parallel tracks. Having just come back from NIPS, SODA feels like a small cozy cocktail party in comparison (ed: not that I know anything about cocktail parties: I have small children)

The highlight was Bob Sedgewick's tribute to Philippe Flajolet. I've talked about their work on analytic combinatorics before, but it was nice to hear some of it again. Flajolet's contributions to the field are immense and valuable. He did big data before it was cool: his $\ell_0$ estimation work with Nigel Martin is now a seminal paper in the world of streaming, and his HyperLogLog paper on counting a stream (with Fusy, Gandouet and Meunier) is the best-in-class on this problem. Bob quoted one of Flajolet's students as saying, "Read any of his papers. You will learn something".
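
(For readers who haven't seen the idea, here's a stripped-down toy version of the Flajolet-Martin trick for counting distinct elements in a stream: hash each item, track the maximum number of trailing zeros seen, and report a power of two. This is my own illustrative sketch; the real estimators add the averaging and bias corrections that make them accurate.)

```python
import random

def trailing_zeros(x):
    # Position of the lowest set bit of a nonzero integer.
    return (x & -x).bit_length() - 1

def fm_estimate(stream, seed=0):
    """Crude Flajolet-Martin style estimate of the number of distinct items."""
    salt = random.Random(seed).getrandbits(64)
    max_r = 0
    for item in stream:
        h = hash((salt, item)) & ((1 << 64) - 1)
        if h == 0:
            continue
        max_r = max(max_r, trailing_zeros(h))
    return 2 ** max_r  # the distinct count is roughly 2^(max trailing zeros)

print(fm_estimate(["a", "b", "a", "c", "b"] * 1000))  # stream with 3 distinct items
```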

I chaired a geometry session in the morning, and one of the papers there caught my eye a while back. The Fréchet distance problem (commonly known as the dog-walking problem) is the problem of computing a monotone mapping between two curves that has minimum maximum length (you can think of this as a man on one curve walking a dog on another, and asking for the minimum leash length when both man and dog can walk forward at arbitrary speeds).

There's a relatively straightforward dynamic program that can compute the Fréchet distance between two polygonal chains in time $O(nm\log (nm))$ (if $n$ and $m$ are the number of links in the two chains). But it's been a big open problem to figure out whether this can be done in subquadratic time. The paper (by Agarwal, Ben Avraham, Kaplan and Sharir) shows that for the discrete version of the problem (that Rinat Ben Avraham in her talk calls the frog hopping problem: the frogs hop from vertex to vertex) you can get a slightly subquadratic time algorithm. The main idea involves adapting "string-like" methods for the problem, which is hard because the "alphabet" is infinitely large.
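
For reference, the "straightforward" quadratic recurrence for the discrete variant looks like this (a standard textbook dynamic program, not the subquadratic algorithm from the paper):

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """Classic O(nm) dynamic program for the discrete Frechet distance
    between polygonal chains P and Q, given as lists of (x, y) points."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = math.dist(P[i], Q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i, j - 1), c(i - 1, j - 1)), d)

    return c(len(P) - 1, len(Q) - 1)

print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 2)]))  # 2.0
```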

It will be interesting to see if this spurs more progress. There's already a newer paper by Buchin et al that gets a slightly improved (but still super-quadratic) algorithm for the continuous Fréchet distance (i.e. the original problem). If for no other reason, please click the link to see their awesome title.

Martin Grohe gave an award talk for his work with Kawarabayashi and Reed on an improved algorithm for graph minor decompositions. The problem is as follows: given a graph G and a graph H supposedly excluded by G, compute a decomposition of G promised by the graph minor theorem, or produce an H-minor.

The improvement reduces the dependency on the size of $G$ to quadratic (from cubic) and makes use of the wonderful and mysterious Courcelle's theorem. The dependence on the size of $H$ is equally mysterious (at least Martin declined to tell us), but is nowhere near as wonderful.

Please post papers you thought were interesting (and why) in the comments, and I'll add them as updates.

Tuesday, January 01, 2013

A SODA announcement, and a happy new year !

While you're all recovering from New Year's eve revelries, here's an important note from the ALENEX/ANALCO organizers regarding the debauchery in New Orleans otherwise known as SODA:


Dear SODA attendees,
We want to make sure that you are aware of a change in the format of SODA/ALENEX/ANALCO.  In recent years, ALENEX and ANALCO have been held on the Saturday before SODA and had separate registration.  This year we are trying a different format.  ANALCO and ALENEX will both be held in parallel with SODA; ANALCO on Sunday and ALENEX on Monday.  There is no longer a separate registration for ALENEX and ANALCO, everyone is automatically registered for all three.  That is, all SODA attendees are welcome to attend ANALCO and ALENEX talks.  We encourage you to look at the program and attend any talks that interest you.  We also welcome any feedback on this new format, after you have attended the conference. 
Regards,
Bob Sedgewick (ANALCO steering committee chair)
Cliff Stein (ALENEX steering committee chair)

Saturday, December 15, 2012

NIPS II: Deep Learning and the evolution of data models

(tl;dr: some rambles and musings on deep learning and data, as I attempt to sort out in my head what this all means)

Over the years, as we've engaged with "big data" more and more, the way we construct mental models of data has changed. And as I've argued before, understanding how we think about data, and what shape we give it, is key to the whole enterprise of finding patterns in data.

The model that one always starts with is Euclidean space. Data = points, features = dimensions, and so on. And as a first approximation of a data model, it isn't terrible.

There are many ways to modify this space. You can replace the $\ell_2$ norm by $\ell_1$. You can normalize the points (again with $\ell_2$ or $\ell_1$, sending you to the sphere or the simplex). You can weight the dimensions, or even do a wholesale scale-rotation.

But that's not all. Kernels take this to another level. You can encode weak nonlinearity in the data by assuming that it's flat once you lift it. In a sense, this is still an $\ell_2$ space, but you now have a much larger class of spaces that you can work with. The entire SVM enterprise was founded on this principle.

But that's not all either. The curse of dimensionality means that it's difficult to find patterns in such high dimensional data. Arguably, "real data" is in fact NOT high dimensional, or is not generated by a process with many parameters, and so sparsity-focused methods like compressed sensing start playing a role.

But it gets even more interesting. Maybe the data is low-dimensional, but doesn't actually lie in a subspace. This gets you into manifold learning and variants: the data lies on a low-dimensional curved sheet of some kind, and you need to learn on that space.

While the challenge for geometry (and algorithms) is to keep up with the new data models, the challenge for data analysts is to design data models that are realistic and workable.

So what does this have to do with deep learning ?

Deep learning networks "work" in that they appear to be able to identify interesting semantic structures in data that can be quite noisy. But to me it's not entirely clear why that is. So I've been mulling the question of what kind of structure in data might be "visible" to such a network. In the absence of formal results ("if your data can be separated in some RKHS, then an SVM will find it"), what follows is some speculation, based on talks I attended and conversations I had with NIPS attendees.

Observation 1: Both Stéphane Mallat's talk and a nice paper by Coates, Karpathy and Ng discuss the idea of "first-level" features that identify (low-dimensional) invariants and eliminate noise. In the Coates et al paper, they start with a very fine $k$-means clustering ($k$ very large), and attempt to "glue" the centers together into low dimensional subspace pieces. These are then propagated up a network of higher-order feature processors.

Observation 2: Yoshua Bengio's survey of deep learning (a very readable account) points to work by Geoff Hinton as part of the reinvigoration of the field. A central idea of this work is that deep belief networks can be trained "layer by layer", where each layer uses features identified from the previous layer.

If you stare at these things long enough, you begin to see a picture not of sparse data, or low-rank data, or even manifold data. What you see is a certain hierarchical collection of subspaces, where low-dimensional spaces interact in a low dimensional way to form higher level spaces, and so on. So you might have a low-level "lip" feature described by a collection of 2-3 dimensional noisy subspaces in an image space. These "lip" features in turn combine with "eye" features and so on.

This might seem rather far fetched, and a strange way to think about data. But I can't claim originality even here. Back in June, Mauro Maggioni gave a talk at CGWeek in Chris Bishop's workshop on the connection between analysis and geometry. In this talk, he described a multi-resolution view on data that admits structure at different scales, and admits different structures at these scales.

The upshot of all of this: it is possible that deep learning is trying to capture hierarchies of low dimensional subspaces that interact in a low dimensional way. This would explain how one is able to avoid the curse of dimensionality, and might also explain why it sometimes can find structure that other methods (kernels, manifold methods, etc) might miss.

Problem is: I'm not sure how one tests this hypothesis. Almost any data set could be viewed this way if you allow enough features and enough "depth" in the hierarchy.

Monday, December 10, 2012

NIPS ruminations I

(tl;dr a bucket load of trends/papers that I found interesting at the conference)


I just returned from NIPS in Lake Tahoe. For those not in the know, NIPS is one of the premier machine learning conferences, and has been growing rapidly over the last few years. This year's incarnation (the first of at least 7 in Lake Tahoe) has over 1600 attendees, a gazillion workshops, 4 tutorials, and more papers each DAY than all of STOC.

The format of the conference is very interesting. There are keynotes each morning and afternoon, a few selected 20-minute talks, and a few more 5-minute "spotlight" talks, all in single session mode. All accepted papers get a poster slot on one of four days, and the poster session is a mammoth event from 7pm to midnight each night (yes, you read that correctly).

I'll say more about the format in a later post. For the next few posts, I'll highlight some of the papers and trends that were interesting to me. For more conference blogging, check out Anand Sarwate's sequence (I, II), and Hal Daumé's recap.

Perhaps the most obvious trend at the conference was deep learning. While it's been in the news recently because of Google's unsupervised network that learned to recognize YouTube cats, the methods of deep learning (basically neural nets without lots of back propagation) have been growing in popularity over a long while, and I was awash in autoencoders, pooling, and other jargon from the area. It was amusing to see speakers apologize for NOT talking about deep learning.

Stéphane Mallat's keynote on this topic was a thing of beauty. While I understood very little of the technical details, the central idea was that by using certain convolution-based encodings, and looking for "invariant substructures", you could essentially filter out irrelevant noise in the feature space, generating new features for the next level of the learning network. This simple idea is quite powerful: there was a paper on learning faces from videos from Coates, Karpathy and Ng that used a simple version of this idea by using k-means with lots of clusters to identify these low-D subspaces and factor them out.

Another trend that's been around for a while, but was striking to me, was the detailed study of optimization methods. Optimization is a major theme in machine learning. This is primarily because so many machine learning problems can be formulated as convex/nonconvex optimizations. Solving these problems takes a lot of ingenuity and engineering, and after you've been swimming in a pond of regularization, sparsity, mixed norms, stochastic gradient descent, and alternating optimization for a while, things can get crazy. There are at least two different workshops on optimization in machine learning (DISC and OPT), and numerous papers that very carefully examined the structure of optimizations to squeeze out empirical improvements.

If I had to mount a critique of this large body of work, it would only be that a lot of this appears to be black magic (a criticism that seems even more true for deep learning). There are number of tricks in the optimization toolbox, and you can always try any number of them, but it's not always clear which methods work best, and why.

Quick hits on papers that interested me:

Bregman divergences: A paper by Iyer and Bilmes shows how to extend Bregman divergences to the case when the generating function $\phi$ is not convex, but submodular. This is a little tricky, because in such cases there's no unique gradient, and so technically the Bregman divergence is a function of both the function and its subdifferential. One interesting variant is the Lovász-Bregman divergence, which comes from using the Lovász extension of a submodular function as the convex generator for a Bregman divergence. An application of these constructions comes in ranking, and it's interesting that things like the Hamming distance can be constructed.

On a separate note, Satoru Fujishige gave a wonderful invited lecture at DISC on submodular functions. I particularly liked the geometric interpretation of submodularity via its polymatroid and how a matroid can be viewed as the object you get when the submodular function is monotone. Of course this has been known for over 40 years, but I "got it" for the first time. If his book is anything like his talk, I really want to get it.

Kernel distances: Ever since I discovered the kernel distance (as in, found it, not invented it) I've been fascinated by how it behaves more or less like the earth mover distance, but is so much easier to compute. Scott Aaronson (at his NIPS invited talk) made this joke about how nature loves $\ell_2$. The kernel distance is "essentially" the $\ell_2$ variant of EMD (which makes so many things easier).

There's been a series of papers by Sriperumbudur et al. on this topic, showing that (a) the kernel distance captures the notion of "distance covariance" that has become popular in statistics as a way of testing independence of distributions, (b) as an estimator of distance between distributions, the kernel distance is more efficient than (say) the EMD because its estimator can be computed in closed form instead of needing an algorithm that solves a transportation problem, and (c) the kernel that optimizes the efficiency of the two-sample estimator can also be determined (the NIPS paper).
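
As a concrete anchor, the plug-in estimate of the (squared) kernel distance between two samples is just three kernel averages; here's a tiny numpy sketch with a Gaussian kernel (the bandwidth is an arbitrary placeholder of mine, not the optimized choice from the NIPS paper):

```python
import numpy as np

def sq_kernel_distance(X, Y, sigma=1.0):
    """Plug-in estimate of the squared kernel distance (MMD^2) between
    samples X (n x d) and Y (m x d), using a Gaussian kernel."""
    def K(A, B):
        sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * sigma**2))
    return K(X, X).mean() + K(Y, Y).mean() - 2 * K(X, Y).mean()

rng = np.random.default_rng(0)
print(sq_kernel_distance(rng.normal(0, 1, (200, 2)), rng.normal(1, 1, (200, 2))))
```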

Metrics for PSD: In my continuing quest to find strange geometries to design algorithms for, I had this paper some time ago on doing geometry on the Riemannian manifold of positive definite matrices. It's a tricky business, and life gets sad quickly when you can't describe the convex hull of three points finitely.

Suvrit Sra proposed a new metric for the space of $n \times n$ positive definite matrices. It's what you get if you apply the Jensen-Shannon construction to the Burg entropy on matrices. It has the property of being more easily computable than the standard Riemannian metric, and also shares with that metric a nice closed form for the midpoint of two matrices. What remains open is the kind of geometry this metric induces: a weird property is that the associated kernel $\exp(-\gamma d^2)$ is positive definite iff $\gamma$ is a half-integer below $n-1$, or any real value above it.
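
If I'm remembering the definition correctly, the quantity in question is the Jensen-Bregman LogDet ("S-") divergence, and the proposed metric is its square root; here's a small numpy sketch under that assumption (treat the formula as my recollection, not gospel):

```python
import numpy as np

def s_divergence(A, B):
    """Jensen-Bregman LogDet (S-) divergence between positive definite A and B,
    as I recall it: log det((A+B)/2) - (1/2) log det(A B)."""
    _, ld_mid = np.linalg.slogdet((A + B) / 2.0)
    _, ld_a = np.linalg.slogdet(A)
    _, ld_b = np.linalg.slogdet(B)
    return ld_mid - 0.5 * (ld_a + ld_b)

def s_metric(A, B):
    # The claimed metric is the square root of the divergence.
    return np.sqrt(max(s_divergence(A, B), 0.0))
```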

Distance metric learning and MKL.

Our presentation at OPT was about multiple kernel learning, which is closely related to the general problem of distance metric learning (especially when the distance metric is defined by a kernel). There were a good number of distance metric learning papers at NIPS - different approaches that either did local learning of "multi-metrics", learned a metric based on k-NN points, or even did Hamming distance metric learning.

Distributed learning.

Distributed learning (in a communication-restricted environment) seems to be a growing concern, which of course makes me happy. There were a few papers on different kinds of communication-limited learning, including one on doing this for probabilistic PCA (which didn't have formal bounds on the amount of communication though) and one on distributed "non-stochastic" experts.



Wednesday, November 07, 2012

Data, Dimensions and Geometry oh my !

The following is a summary of a talk I gave to undergraduates interested in going on to graduate school. It's targeted at the layperson, and tries to convey a sense of the interplay between data mining and geometry. I gave this talk partly because I realized that things that we take utterly for granted in the rarified world of high dimensional data mining are completely foreign to people who don't think about this for a living. 

tl;dr: High dimensional geometry (and non standard geometry) is the unifying language that binds data mining problems together.

We're trying to find patterns all over the place, with very different kinds of data. A search engine is trying to find patterns in web pages that might indicate that they have similar content. Brain researchers are throwing MRIs of patients with diseases into an algorithm that attempts to infer patterns in brain scans that might yield clues about pathology and diagnosis. Genome-wide analysis takes what are essentially long strings of letters and tries to explain why certain populations might be susceptible to certain diseases.

Pandora, Shazam and other music sites analyze fragments of music to find related artists, or even just match a tune. While an infinite Gangnam Style video might be a frivolous use of data mining on video streams, video feeds are being analyzed by robots trying to drive unmanned cars or even just play soccer. Social networks are all the rage: Facebook, Twitter and Google are desperate to understand your circle of friends in order to sell things to you more effectively.

How are we able to find patterns, clusters and trends in such different kinds of data ? The key step in all of this is the idea of features. The trick is to describe each object we are studying as a sequence (or set) of features. For example, a feature set for a document could be the number of times each particular word appears. The feature set for an image could be the count of different colors. For a tune, it might be a collection of features identified by hiring artists to list out which of more than 700 characteristics a piece of music has.

And so on, and so on. What this allows us to do is represent each object as a set of features, whether it's a web page, a brain scan, a video, or even a social graph. Once we do that, we no longer have to worry about the original data (well, kind of !), and different types of data are all on an equal footing.
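
As a toy illustration of this featurization step (a made-up three-"document" corpus, nothing more), here is the word-count trick in a few lines of Python:

```python
from collections import Counter

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "brain scans and genomes are feature vectors too"]

vocab = sorted({w for d in docs for w in d.split()})
# Each document becomes a vector of word counts: one coordinate per vocabulary word.
vectors = [[Counter(d.split())[w] for w in vocab] for d in docs]
for vec in vectors:
    print(vec)
```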


But what now ? Here's where a cool trick that dates back to the 1600s comes in.

I doubt that René Descartes ever heard of the term "data mining". But in a strange way, he's the father of the field. One of Descartes' claims to fame was establishing a link between geometry and algebra. He said that if we wanted to represent points, lines and other shapes, we could do so not abstractly as Euclid and other classical geometers did, but using algebra. A "point" in the plane could be described by two coordinates (x,y), and a line in the plane could be described by the equation y = mx + c.

This is a very simple idea - children learn it in middle school. And yet like most simple ideas, it was revolutionary, and completely changed our perspective on geometry. The unification of algebra and geometry is so powerful and so natural, we use it almost unconsciously today.

But what does Descartes have to do with data mining ?

Remember those features I was telling you about ? That every object can be represented by a sequence of numbers, each number describing some aspect of the object.

Let's do the opposite of what Descartes proposed ! In other words, let's pretend that the objects are actually points in a space. Now this space is not the plane (unless we had only two features). It's a high dimensional space, with one dimension for each feature.

This process is entirely artificial. There is no real reason why we must think of objects as points in a high dimensional space. But here's the cool thing that happens. We can now express all basic mining primitives as geometric concepts, by translating the language of the primitive to this high dimensional space.

A clustering of data becomes a grouping of points so that "nearby" points are grouped together. A classification of reviews into positive and negative statements becomes a way to separate "positive" points and "negative" points by a line. Finding trends in data becomes the problem of fitting a straight line to a collection of points.

It is hard to overstate how utterly bizarre this is. There is no underlying "geometry" in the problem we're solving. We're essentially creating a castle out of thin air in order to understand the problem. And yet it works, and is the bedrock of how we think about data mining.

But wait ! There's more. What exactly does it mean to say that points are "nearby" or they are "separated" ? To answer this question, it's not enough to view objects as points in a high dimensional space. You have to give this space a shape - a geometry (and also a topology, but I'll skip that for now).

For example, if I have two feature lists, how do I measure the distance between them ? If they were points on a map, I could do the usual straight line distance. But does that really capture how far apart they are ? After all, if you're in downtown Salt Lake with its grids, a "crow flies" distance doesn't help estimate how far things are. If you're on the surface of the earth, you can't really tunnel through the center to get from one point to another.
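
Here's that point in a few lines of numpy (my own toy numbers): the same pair of feature vectors is 5 units apart "as the crow flies" but 7 units apart along the grid.

```python
import numpy as np

p, q = np.array([1.0, 2.0, 0.0]), np.array([4.0, 6.0, 0.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))  # straight-line ("crow flies") distance: 5.0
manhattan = np.sum(np.abs(p - q))          # grid ("downtown Salt Lake") distance: 7.0
print(euclidean, manhattan)
```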

So we have to create the shape of the space by defining how far apart two points are. And this is one of the trickiest parts of data mining. Either we have to use some domain knowledge to estimate which features are significant and control more of the distance between objects, or we have to try and learn what seems like the right distance between objects based on user estimates or other information.

The good thing is that we have a huge collection of shapes to play with, and different shapes tend to work well for different classes of objects. Some are easy to interpret, others are easy to compute, and so on. So a good part of the "art" in data mining is understanding the data and estimating the right geometry for it, so that our tasks (expressed geometrically) give us meaningful answers.

Or as Dorothy famously said to her dog, "Toto, I don't think we're in the plane any more!"

Tuesday, November 06, 2012

On the elections, Nate Silver, and lessons for data mining

One interesting side story from this election has been the intense focus on Nate Silver's election predictions, and the matter of aggregate polling statistics. While there's certainly a partisan element to much of the discussion, there's also a larger sense of unease about what the predictions are actually saying.

There are lessons here for the greater goal of using data mining for prediction and modelling, and this will get more and more important the more predictive analytics intrude on our lives.

People don't quite understand probability, even though they understand odds. 
There's a lot of confusion about what it means for a one-time event to have a probability associated with it, even though people are quite comfortable with the idea of odds in (for example) sports. This to me reflects a deeper confusion between the frequentist and Bayesian view of probability: namely, is probability the long-run average of the frequency of an event, or an a priori expression of uncertainty in the likelihood of an event ?

I'm not going to say anything here that hasn't been said for more than a century, but from the point of view of interpreting the results of mining, this distinction will be important.

Aggregation is a GOOD THING. 
It is entirely likely that all the poll aggregators will have egg on their face tomorrow when the results come in. But it does seem that aggregation helps smooth out the natural irregularities and biases in individual polls. This is a theme that comes up time and again in data mining, and especially in clustering: rather than trusting a single algorithm, you should try to run algorithms that return diverse answers and aggregate them in some fashion.

It's not enough to predict: you must also explain. 
Among the good reasons to feel uneasy about the aggregate predictions is that they don't give much insight into why things are going the way they are. To be fair, Nate Silver laid out some economic markers back in the spring, and tied them to possible outcomes via regression. While this provides some insight, it's still pattern matching as opposed to a mechanism.

More generally, it is very difficult to convince people that an answer is pertinent or "correct" unless there's some way to explain the answer without listing a sequence of coefficients or writing down a collection of centers. Much of the popularity of decision trees comes from the ease of explanation they seem to provide.

Conclusion. 

Most of the controversy around data mining in the public sphere has centered around privacy issues. Indeed, privacy issues become a concern precisely because people worry that the mining algorithms are too accurate. But in fact we don't really understand the behavior of many algorithms that we use, and that is dangerous regardless of privacy concerns. The angst over the methods used to predict this election are an illustration of what will happen when the predictions we make start to matter, and matter to many people.

Monday, October 08, 2012

On why I'm excited about "big data"

I was in Aarhus recently for a MADALGO workshop on large-scale parallel and distributed models, where I did a sequence of lectures on GPU algorithms. I was briefly interviewed by a university reporter for an article, and did a little video on why I think big data/big iron problems are interesting.

At the risk of embarrassing myself even more than I usually do, here's the video. Note that this was recorded at a time of great crisis across the globe, when all hair styling products had mysteriously disappeared for a few days.


Wednesday, October 03, 2012

We're hiring FIVE (count 'em, FIVE) faculty this year.

We had an incredible hiring season two years ago, making seven offers and hiring seven new faculty. And now we're doing it again !

Our department is looking to hire five new faculty (at least four are at the assistant professor level). I'm particularly excited that we're hiring two people in the general area of big data (following up on our data mining and database hires from two years ago).

One slot is in what I'll call "big data meets big performance": I'll have more to say about this shortly, but the challenges of large data analysis are not just about managing the data, but about managing large numbers of machines to crunch this data (MapReduce is perhaps the most well known example of this). We're looking for people who can "speak extreme data and extreme processors" fluently - these could be on the data/systems management side, or on the analysis side, or the modelling side.

Utah has a strong presence in high performance computing (the Supercomputing conference is happening in Salt Lake, and Mary Hall is the general chair), and we're one of the few places that has a good understanding of both sides of the large data story (i.e., machines and bits).

The second slot is in machine learning, with ties to language. Text (and language) provides one of the best large data sources for scalable machine learning, and we're looking for people interested in the challenges of doing ML at scale, especially when dealing with NLP. If you're that person, you'll be coming into a department that has the entire range of data analysis folks, from algorithms to analysis to systems to NLP (with many other faculty who are heavy users of ML technology).

Our plan, once we fill these slots, is to make Utah a one-stop shop for large-scale data analysis and visualization - in addition to the above slots, we're also looking to hire in information visualization to complement our strong viz presence.

In addition to the above slots, we are also hiring in computer security and HCI/user interfaces. While I'm equally excited about these positions, I know much less about the areas :). I will point out that we have a big systems group that covers many aspects of security (language security, verification, and network security) already. We've also had strong demand from students and industry for research in HCI, which will complement our info-viz efforts (and even our data mining work).

For more details on how to apply, see our ad. We'll start reviewing applications after Dec 1. Feel free to email me if you have questions about the slots (but don't send me your application material - send it in directly).

Disclaimer: the above views are my own personal views, and don't represent the views of the department or the hiring subcommittees.


Monday, October 01, 2012

Things a _____ should do at least once...

Without quite realizing it, I managed to create a (tiny) meme in the rarefied circles of TCS/math with my post "Things a TCSer should have done at least once".

Firstly, you should check out the G+ post for even more scathing commentary on my original post.

Next, you should see the followups (in case you haven't already):
Let me know if there are more - I'm still waiting for a quantum computing version. (Thanks, Pontiff!)

Wednesday, September 19, 2012

FOCS 2012 Student Support

I've been asked by Rafail Ostrovsky and David Shmoys to highlight that student support for travelling to FOCS 2012 is still available, and the deadline is this Friday.

They note that they can fund 24-25 people, so people in the US should apply EVEN IF they don't have a paper.

The NSF has been very generous of late in supporting student travel to conferences, and the best way to encourage them to continue is to show that we use the money they give us :). My students have taken advantage of these opportunities in the past, and it's been a great experience for them.

So if you're a student reading this (and I know there are many of you!) or if you're an advisor who may not have the funding to send your student(s) to FOCS, do take advantage of this chance.


Sunday, September 09, 2012

Things a TCSer should have done at least once

There are many discussions about what things a TCS grad student should know. I thought it might be useful instead to list out some things a theoretician should have done at some point early in their career.

Rules of the game:
  • You lose points if you do it as part of a class. If you decide to round an LP in a class on approximations, that's something you're being taught to do. But if you do it as part of solving a problem, then you've also done the difficult job of recognizing that an LP needs to be rounded. That latter skill demonstrates some mastery.
  • The goal is not complete coverage of all of TCS; rather, it's coverage of techniques that will come up fairly often, no matter what kinds of problems you look at.
  • This list is necessarily algorithms-biased. I doubt you'll need many of these if you're doing (say) structural complexity. 
  • A similar caveat applies to classical vs quantum computing. While it seems more and more important, even for classical computation, to know a little quantumness, I don't know enough about quantum algorithm design to add the right elements. Comments?
With those caveats out of the way, here's my list:
  • Show that a problem is NP-hard (for bonus points, from some flavor of SAT via gadgets)
  • Show a hardness of approximation result (bonus points for going straight from a PCP)
  • Prove a lower bound for a randomized algorithm
  • Prove a lower bound via communication complexity or even information theory
  • Round an LP (bonus points for not just doing the obvious rounding)
  • Show an integrality gap for an LP
  • Design a primal-dual algorithm
  • Use projective duality to solve a problem (bonus points for using convex duality)
  • Apply a Chernoff bound (bonus for using negative dependence, Janson's inequality, and an extra bonus for Talagrand's inequality); the standard form is restated just after this list
  • Design an FPT algorithm (maybe using treewidth, bonus for using bidimensionality or kernelization)
  • Design a nontrivial exponential time algorithm (i.e., an algorithm that doesn't just enumerate, but does something clever)
  • Do an amortized analysis (for extra bonus get a $\log^* n$ bound)
  • Use an advanced data structure - something beyond van Emde Boas trees (extra bonus for exploiting word size)
  • Invoke VC dimension to solve a problem
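Since it comes up so often, here's the basic multiplicative Chernoff bound for the record: if $X_1, \ldots, X_n$ are independent random variables taking values in $\{0,1\}$, $X = \sum_i X_i$, and $\mu = \mathbb{E}[X]$, then for any $\delta > 0$,
$$\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},$$
and for $0 < \delta < 1$,
$$\Pr[X \le (1-\delta)\mu] \le e^{-\mu \delta^2 / 2}.$$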
What else ? 

Tuesday, August 28, 2012

FOCS 2012 Call for Posters

Four times makes it an institution ? 

After the poster events at STOC 2011, STOC 2012 and SoCG 2012, FOCS 2012 makes it four times. Tim Roughgarden notes that the deadline for submitting posters to FOCS 2012 is in two weeks (Sep 11). So if you have interesting work you'd like to air out in the theory community, and a deep longing to visit New Brunswick, New Jersey, then submit your entry.

By this time, there's a good chance that you've already experienced the posters event either as presenter or audience at one of these venues. If not, I'll reiterate what I've said before: presenting a poster is a great way to disseminate your work with more personalized interaction than you often get with a paper presentation.

Wednesday, July 25, 2012

Data Streaming in Dortmund: Day II

Andrew McGregor and I are tag-blogging the workshop. Read his post on day 1 here.

Day II could roughly be summarized as:

Sitting by the stream recovering from a sparse attack of acronyms. 

There were a number of talks on sparse recovery, which includes compressed sensing, and asks the question: how best can we reconstruct a signal from linear probes if we know the signal is sparse?

Eric Price led things off with a discussion of his recent breakthroughs on sparse FFTs. While computing the regular DFT takes $n \log n$ time (via the FFT), it's reasonable to ask whether we can do better if we know the signal has only $k$ nonzero Fourier coefficients. He talked about a sequence of papers that do this "almost optimally", in that they improve on the $n\log n$ running time whenever the sparsity parameter $k = o(n)$.
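To fix the setting, here's a toy numpy illustration of the problem (not of the sparse FFT algorithms themselves): the signal below is exactly $k$-sparse in the frequency domain, and the baseline finds its support by paying for the full $n \log n$ FFT; the sparse FFT results aim to find those $k$ coefficients without ever touching most of the samples.

```python
import numpy as np

n, k = 1 << 12, 8
rng = np.random.default_rng(1)
freqs = rng.choice(n, size=k, replace=False)            # the k "active" frequencies
spectrum = np.zeros(n, dtype=complex)
spectrum[freqs] = rng.normal(size=k) + 1j * rng.normal(size=k)
x = np.fft.ifft(spectrum)                               # time-domain signal, k-sparse in frequency

xhat = np.fft.fft(x)                                    # the O(n log n) baseline
top_k = np.argsort(np.abs(xhat))[-k:]                   # keep the k largest coefficients
print(set(int(i) for i in top_k) == set(int(i) for i in freqs))
```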

Anna Gilbert provided an interesting counterpoint to this line of work. She argued that for real analog signals, the process of sensing and recording them, even if the original signal was extremely sparse, can lead to discrete signals that have $\Theta(n)$ nonzero Fourier coefficients, and this is in some way intrinsic to the sensing process. This was part of an attempt to explain why a long line of sublinear methods (including Eric's work) don't do very well on real signals. This line of attack is called "off the grid" reconstruction, because you're off the (discrete) grid of a digital signal.

In sparse recovery, there are two parameters of interest: the number of measurements you make of the signal, and the time for reconstruction. Obviously, you'd like both to be as small as possible, and information-theoretic arguments show that you have to spend at least $k \log(n/k)$ measurements. Martin Strauss focused on speeding up the reconstruction time while maintaining measurement-optimality, in a setting known as the "for-all" formulation, where the adversary is allowed to pick a signal after seeing the probe matrix (there's a related "foreach" model that is subtly different, and I'm still a little confused about the two).
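As a concrete (and deliberately generic) baseline for the measurement/recovery trade-off, here is a short orthogonal matching pursuit sketch in numpy; with a constant times $k \log(n/k)$ random Gaussian measurements it typically recovers a $k$-sparse vector exactly. This is just a standard greedy method for illustration, not any of the algorithms from the talks.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x from y = A x."""
    n = A.shape[1]
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))      # column most correlated with residual
        if j not in support:
            support.append(j)
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs           # re-fit on the current support
    x_hat = np.zeros(n)
    x_hat[support] = coeffs
    return x_hat

n, k = 1000, 5
m = int(6 * k * np.log(n / k))                          # a constant times k log(n/k) measurements
rng = np.random.default_rng(0)
A = rng.normal(size=(m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
x_hat = omp(A, A @ x, k)
print(np.linalg.norm(x - x_hat))                        # typically ~1e-14
```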

On a slightly different note (but not that different as it turns out), Atri Rudra talked about a fun problem: Given the "shadows" of a set of 3D points along three orthogonal planes, can you find the minimum number of points that could yield the specific shadows ? If all projections have complexity $n$, it's well known that the right bound is $n^{3/2}$. While this bound was known, it wasn't constructive, and part of Atri's work was providing an actual construction. There are all kinds of interesting connections between this problem, join queries, triangle counting in graphs, and even the scary-hairy 3SUM, but that's a topic for another time.
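If I'm remembering the connection correctly, the classical inequality lurking behind that $n^{3/2}$ bound is the Loomis-Whitney inequality in three dimensions: for a finite set $S \subseteq \mathbb{Z}^3$ with shadows $S_{xy}$, $S_{yz}$, $S_{zx}$ on the three coordinate planes,
$$|S| \le \sqrt{|S_{xy}| \cdot |S_{yz}| \cdot |S_{zx}|},$$
so a point set whose three shadows each have size $n$ can have at most $n^{3/2}$ points.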

Other talks in the afternoon: Alex Andoni talked about finding the eigenvalues of a matrix efficiently on streams (specifically, finding the "heavy" eigenvalues). I learnt about the Cauchy interlacing theorem from his talk - it's a neat result about how the eigenvalues of submatrices behave. Ely Porat talked about the problem of preventing evil entities (read: Hollywood) from poisoning a BitTorrent stream of packets, and presented ideas involving homomorphic signatures for packets via error-correcting codes.
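For the record, the interlacing theorem in its standard form: if $A$ is an $n \times n$ Hermitian matrix with eigenvalues $\alpha_1 \le \cdots \le \alpha_n$, and $B$ is an $(n-1) \times (n-1)$ principal submatrix of $A$ with eigenvalues $\beta_1 \le \cdots \le \beta_{n-1}$, then
$$\alpha_k \le \beta_k \le \alpha_{k+1} \qquad \text{for } k = 1, \ldots, n-1.$$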

Joshua Brody returned to the basic streaming setup. Most stream algorithms that estimate some quantity introduce two-sided error (the true value could be above or below the reported estimate). He asked whether this is necessary to stay within sublinear bounds: it turns out that for some problems, limiting yourself to one-sided error can worsen the space complexity needed to solve the problem (note that for structures like the Count-Min sketch, the estimate is one-sided by design, and so there's no deterioration).
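To make the one-sidedness concrete, here is a minimal Count-Min sketch in Python (a toy version with simple salted hashes rather than proper pairwise-independent hash functions): collisions can only add mass to a counter, so the reported estimate never underestimates the true frequency.

```python
import random

class CountMin:
    """Toy Count-Min sketch: approximate frequency counts in small space."""
    def __init__(self, width=2000, depth=5, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        # one salt per row; a real implementation would use pairwise-independent hashes
        self.salts = [rng.getrandbits(64) for _ in range(depth)]

    def _bucket(self, item, row):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        for r in range(self.depth):
            self.table[r][self._bucket(item, r)] += count

    def estimate(self, item):
        # collisions only inflate counters, so the min over rows is a one-sided (over)estimate
        return min(self.table[r][self._bucket(item, r)] for r in range(self.depth))
```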

Coming up next: Andrew's summary of day three, and a report on our heroic tree-swinging adventures.

Tuesday, July 17, 2012

Permutation tests, graph non-isomorphism and kernels

I was reading Larry Wasserman's post on the modern two-sample test and a few thoughts came to mind.

1. I've known about permutation tests for a long time, but it only just occurred to me that the permutation test is exactly what you do in the AM protocol for Graph non-isomorphism. The principle is that if the two distributions are identical, your test statistic should not be able to tell them apart, and the way to test this is by randomly changing labels. Replace distributions by graphs and test by Merlin, and it looks the same. Is this a trivially trite observation, or are there other examples of protocols that use standard statistical tests in disguise ? (hmm I see a cstheory question here)
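For anyone who hasn't seen one, a two-sample permutation test is only a few lines (a generic sketch, assuming numpy): if the two samples come from the same distribution, relabeling them at random shouldn't change the test statistic much, so the p-value is just the fraction of random relabelings that look at least as extreme as what you observed.

```python
import numpy as np

def permutation_test(x, y, stat=lambda a, b: abs(a.mean() - b.mean()),
                     trials=10000, seed=0):
    """Two-sample permutation test: p-value for the null that x and y are drawn
    from the same distribution, using the given test statistic."""
    rng = np.random.default_rng(seed)
    observed = stat(x, y)
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)                       # randomly relabel the pooled sample
        hits += stat(pooled[:len(x)], pooled[len(x):]) >= observed
    return (hits + 1) / (trials + 1)              # add-one correction keeps the p-value valid
```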

2. The kernel test he mentions, if you look closely, is merely computing the kernel distance between the two distributions. And the energy test he mentions later is doing something akin to the earthmover distance. The kernel distance strikes again!!!
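And for completeness, the quantity in question is easy to compute from samples (a sketch with a Gaussian kernel; the bandwidth choice here is arbitrary): the squared kernel distance is the average within-sample kernel value for each sample, minus twice the average cross-sample value.

```python
import numpy as np

def kernel_distance_sq(X, Y, sigma=1.0):
    """Squared kernel distance (a.k.a. MMD^2) between two samples, Gaussian kernel."""
    def gram(A, B):
        # pairwise squared distances, then the Gaussian kernel
        d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
              - 2 * A @ B.T)
        return np.exp(-d2 / (2 * sigma**2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()
```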

Wednesday, July 11, 2012

Guest Post: Women in Theory Workshop II

This is the second part in Samira Daruki's report from the Women in Theory workshop at Princeton. The first part focused on the technical talks at the workshop, and this part focuses on the non-technical events. 


As an aside, many of the topics discussed would be quite useful to have at a general conference as well: it would be nice to have a similar panel discussion at a STOC/FOCS/SODA.

The opening talk of the workshop was by Joan Girgus (Princeton). She presented statistics about the percentage of women at the undergraduate and graduate levels in different fields of science and engineering over the last five decades. She mentioned that nearly fifty years ago, those who wanted to combine a career with raising a family were seen as anomalies. Today, combining family and career is the norm, with its own complexities and difficulties. However, even now women continue to be underrepresented in science and engineering, from the undergraduate level through faculty and research positions. Joan presented several possible reasons for this and also suggested approaches that universities could take to improve the participation of women in academic and research careers.

The other interesting talk on the first day was by Maria Klawe (Harvey Mudd) who argued (and actually convinced us!) that it is the best time ever to be a woman in theory, and discussed opportunities for the future.

On the second day, there was a "work-life balance" panel led by Tal Rabin. All the speakers and organizers of the workshop gathered to answer questions from students. This panel was one of the most useful sessions of the workshop, because we could hear about the real experiences and challenges that pioneering female researchers faced in their careers and lives.

The panel began with Tal asking the speakers to each give one piece of advice to the audience. Some highlighted points were:
  • " Be aware of young male talkers”
  • “make conscious choices”
  • “Make a right decision and go for it”, 
  • “Go for what really interests you”
  • “The path is difficult, and so you must acquire  the ability to talk about yourself and your work”,
  • “do the work you like and be excited about that”,
  • “Try to be more self promoting”

The floor was then opened for questions. Some of the questions were about family life for female researchers and the right time to have children.

Some speakers believed that having children is an option rather than a default, and that there should be no pressure to have children. While it might seem that everybody expects you by default to raise a family and have children, you don't need to listen to them and do what they want. It was also mentioned that there is no "right time" to have children, and that it is a very personal decision: you should just decide for yourself when the time is right for you.

Valerie King said that some of the conditions of one's job can affect decisions regarding one's family. She pointed out that in Canada there are child-friendly policies for women in academia. But she also mentioned that sometimes you have to sacrifice something because of your family life, and in those cases you should find alternative ways to minimize the impact, like choosing a less taxing problem to work on or...

There were different views on how family life and having children could affect the careers of young female researchers. Some believed that it wasn't a major issue - you could take a few years off and wait until you reach a stable point before going back to your job - but some argued against this, pointing out that research in fields like CS moves very fast, and coming back after being away from research for a while can be very difficult and damaging for your research career without a careful and conscious plan. There were also discussions among the speakers about how choosing a supportive partner can help young female researchers deal with these difficulties more easily (and how actually finding a suitable partner is time-consuming and challenging by itself!).

Another question was about the challenges and pressures of graduate study. One of the highlighted issues was the challenge of working with colleagues in academia. It was mentioned that you should be careful about the people around you, and make sure that some of them talk about your contributions and acknowledge you.

Panelists talked about different experiences they had with male colleagues, some of whom would make sure to acknowledge your contributions explicitly in their presentations, and some of whom would use your ideas without any acknowledgement. Clearly, if you want to be more successful you should avoid being surrounded by the latter group of people. It was mentioned that one technique for dealing with problems you might face with male colleagues (if you find yourself unable to solve them by yourself) is to go to your manager or boss and push them to help you in that situation.

Another challenge that was highlighted was finding a research problem to work on, both during graduate study and for one's research career afterwards. Many of the speakers agreed that this was one of the biggest challenges in their research work. Other challenges discussed were choosing the right advisor and changing research problems or advisors during one's PhD.

It was mentioned that the most common mistake new students make in doing research is to decide on some topic, do a wide search of current and previous work, and then conclude that all the problems have already been solved and there is nothing new to do. But in fact, in most research topics there are always ways to broaden the area and find new directions: this is the creative aspect of research, and the main distinction between doing research and "re-search".

There were also some discussions about the different aspects of working as a researcher at a research lab versus as faculty at a university. Lisa Zhang from Lucent mentioned that research labs offer a good quality of life and encourage a lot of flexibility. However, there are issues relating to job security versus tenure, and there is a trade-off between these two kinds of research positions.

There was discussion about collaboration between researchers. Valerie King mentioned that one should not be afraid to contact people and ask to work with them. In fact, people like it when others come and work with them on interesting research problems. She related experiences where she got stuck solving some problem and found someone more expert in that area to collaborate with. One such collaboration, with two other female researchers, resulted in what she called the only “Three Women SODA paper”.

At the end of the panel, Shubhangi Saraf (IAS, Rutgers) talked about her experiences during graduate study. She said that it was very important for one's research career to do multiple internships, travel and visit different schools and research labs, find good people to work with, and build a good network of connections. Shubhangi was one of the participants who attended the first Women In Theory workshop as a student four years ago, and is now, at the third workshop, one of the organizers. She mentioned that this workshop was one of the ways she was able to meet new people and make the connections that led to internships.

At the end of the second day there was a banquet at Palmer House, where we were able to meet other professors from Princeton University and talk with them.

To conclude this post, I think this workshop was successful in its main goal of bringing together women in theory from different departments and fostering a sense of kinship and camaraderie among them. It was really a good feeling to talk about challenges you faced or times when you got stuck during your research, and to realize that other students and researchers have had the same experience! You feel a lot more powerful, because now when you're stuck on a problem and don't know what to do, you know there are other people in similar situations that you can just shoot an email to and say: “Hey! I'm stuck and need to talk with you!”




Tuesday, July 03, 2012

Guest Post: Women in Theory Workshop I

My student Samira Daruki recently attended the Women In Theory workshop at Princeton. This is the first part of her (lightly edited) report from the workshop, focusing on the technical talks. For more on the workshop (from a presenter's perspective) see Glencora Borradaile's post.

This past week, I was at the third “Women in Theory (WIT)” workshop, held at Princeton University with about 46 female students and 12 speakers. Most of the students came from schools in the USA: University of Maryland College Park, Brown, Oregon State, UMass, MIT, Stony Brook, Berkeley, Princeton, Columbia, Boston U, Washington, Stevens Institute of Technology, UCSD, Northwestern, Harvard, UW-Madison, UCLA, CUNY, GMU, Harvey Mudd and Utah. There were also international students from India (IIT), ETH Zurich, Germany, Canada (Toronto), City University of Hong Kong, the Moscow Engineering Institute and Israel (Technion, Weizmann, Hebrew University).

Participants came from a wide range of research fields: Cryptography and Privacy, Graph theory and algorithms, Computational Geometry, Computational Biology, Complexity, Game theory and mechanism design, and Social Networks. But the interesting thing was that there were more students working on cryptography than on any other field.

Tal Rabin, Shubhangi Saraf and Lisa Zhang were the organizers of the workshop, and there were many junior and senior women in the field as speakers: Gagan Aggarwal (Google), Glencora Borradaile (Oregon State), Joan Girgus (Princeton), Nicole Immorlica (Northwestern University), Valerie King (University of Victoria), Maria Klawe (Harvey Mudd), Vijaya Ramachandran (UT Austin), Jennifer Rexford (Princeton), Dana Ron (Tel Aviv University), Ronitt Rubinfeld (MIT and Tel Aviv University), Shubhangi Saraf (IAS), Rebecca Wright (Rutgers and DIMACS).

Besides the non-technical parts of the workshop, there were many interesting technical talks by different researchers on different topics.

Among them, one of the most interesting talks was presented by Dana Ron (Tel Aviv University, Columbia) on sublinear-time algorithms. When we refer to “efficient algorithms” we usually mean “polynomial-time algorithms”, and at best we can try to lower the degree of the polynomial as much as possible, leading to “linear-time algorithms”. However, when we are dealing with massive data sets, even linear time might be infeasible. In these cases we look to design even more efficient algorithms, namely “sublinear-time algorithms”. These algorithms do not even read the whole input, so we can only expect approximately good answers. They are also probabilistic, and so are allowed a small failure probability (i.e., they are Monte Carlo).

One such class of algorithms is property testing algorithms, in which one has to decide with high success probability whether an input (like a graph) has a certain property (for example, is bipartite), or is relatively far from having that property (in this example, a relatively large fraction of its edges would have to be removed for the graph to become bipartite). Several properties and algorithms of this kind were discussed in the talk. Other types of sublinear algorithms discussed were algorithms for estimating various graph parameters, like the number of connected components, the size of a minimum vertex cover, and so on. I think Dana was able to give a flavor of the analysis techniques for sublinear-time algorithms with great clarity, and it was definitely one of the best talks in this workshop.
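As an illustration of the flavor of these algorithms (my own toy sketch, not from the talk), here is a sublinear-style estimator for the number of connected components, in the spirit of the Chazelle-Rubinfeld-Trevisan approach: sample vertices, run a truncated BFS from each, and average the reciprocals of the (capped) component sizes.

```python
import random
from collections import deque

def estimate_components(adj, eps=0.1, samples=500, rng=random):
    """Estimate the number of connected components of a graph given as an
    adjacency list {v: [neighbors]}. Each sampled vertex contributes
    1/min(|its component|, 2/eps); the scaled average estimates the count
    to within roughly an additive eps*n."""
    n = len(adj)
    cap = max(1, int(2 / eps))
    vertices = list(adj)
    total = 0.0
    for _ in range(samples):
        v = rng.choice(vertices)
        seen, queue = {v}, deque([v])
        while queue and len(seen) < cap:        # BFS, truncated at ~2/eps vertices
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        total += 1.0 / min(len(seen), cap)
    return n * total / samples

# tiny usage example: two triangles plus an isolated vertex -> about 3
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4], 6: []}
print(estimate_components(adj, eps=0.5, samples=2000))
```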

Another great talk was given by Ronitt Rubinfeld (Tel Aviv University and MIT) on estimating properties of distributions. In her talk, she surveyed a body of work on the complexity of testing and estimating various parameters of distributions over large domains, given access to only a few samples from the distributions. Such properties include testing whether two distributions have small statistical distance, testing whether the marginal distributions induced by a joint distribution are independent, testing whether a distribution is monotone, and approximating the entropy of the distribution. For these kinds of problems, classical techniques such as the chi-squared test or the use of Chernoff bounds have sample complexities that are at least linear in the size of the support of the underlying discrete probability distribution. However, algorithms whose sample complexity is sublinear in the size of the support were shown for all of the above problems.
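As a taste of how sublinear sample complexity is possible (my own toy example, not from the talk): the collision statistic below estimates $\|p\|_2^2$ from samples alone. For the uniform distribution on $[n]$ this quantity is $1/n$, while distributions far from uniform produce noticeably more collisions, and thresholding it is the engine behind standard sublinear uniformity testers.

```python
from itertools import combinations

def collision_fraction(samples):
    """Fraction of colliding pairs among the samples; its expectation equals the
    collision probability ||p||_2^2 of the underlying distribution."""
    pairs = list(combinations(samples, 2))
    return sum(1 for a, b in pairs if a == b) / len(pairs)
```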

Nicole Immorlica (Northwestern University) also gave an interesting talk about cooperation in social networks, presented as a game among the students. In this talk, she explored the impact of networks on behavior through a study of the emergence of cooperation in the dynamic, anonymous social networks that occur in online communities. In these communities, cooperation (for example, in a business deal) results in mutual benefits, whereas cheating results in a high short-term gain.

Some of the other interesting talks were given by Gagan Aggarwal (Google) on mechanism design for online advertising, Glencora Borradaile (Oregon State) on designing algorithms for planar graphs (the only talk given on the blackboard, without any slides!), Vijaya Ramachandran (U of Texas, Austin) on parallel and multicore computing theory, and Valerie King (University of Victoria) on dynamic graph algorithms for maintaining connectivity. In her talk, Valerie discussed this problem, which has been studied for over 30 years; she reviewed the area and described a new direction for achieving polylogarithmic worst-case performance. At the end of her talk she also remembered Mihai Patrascu as a pathbreaking researcher who was taken from us too soon.

Unfortunately there were no talks on topics like computational geometry, but a few students working on CG-related topics presented their research problems at the rump session. My presentation at the rump session was on building coresets for uncertain data.

Monday, July 02, 2012

Guest Post: Report from the Snowbird Math Research Communities program

My student Amirali Abdullah attended a recent MRC event on discrete and computational geometry at Snowbird, organized by Satyen Devadoss, Vida Dujmovic, Joe O'Rourke, and Yusu Wang. This is his (lightly edited) report from the event. Note that the organizers requested that the problems discussed not be widely disseminated, so no specific technical questions will be discussed here.

Recently, there was an MRC workshop for Discrete and Combinatorial Geometry held right in my hometown, at Snowbird in Utah. Suresh invited me to share my experience of the event.

Working with trained mathematicians illustrated to me just how many tools are out there that I wasn't familiar with beyond an "oh, I heard that word somewhere" head-nod: ergodic theory, configuration spaces, cohomologies, measure-theoretic ideas, algebraic geometry techniques, variational calculus tools and more. Now, there's always a strong need for self-study and picking up techniques independently in a Ph.D., but I can't help feeling that most theoretical CS students would benefit from required courses and curricula more tailored to supplementing our math backgrounds.

But more than being exposed to the mathematical depth of knowledge out there, I loved the intoxicating energy of a room filled with curious mathematicians. One group would be eagerly folding origami cubes, another playing with colored bricks and crayons on a coloring problem, a trifecta of mathematicians lying passed out by a whiteboard decorated with scribbled figures and gazing dreamily into the distance. I finally understand the cliche of mathematicians having the enthusiasm and boundless energy of kindergarteners, playing with ideas and constructs: no disinterested 'suits' here!

What's more, it was good for me to put real faces and people to many of the papers and fields I had read about. One grad student speaks of how he had been burned out for several months after his last paper, another stares blankly at his beer at 10:30 pm after a twelve-hour session discussing a tricky problem, others discuss the merits of wine vs. scotch, one raves about an advisor who recommends his students go out hiking on a beautiful day instead of staying in their labs (Suresh, hint, hint!!), another talks of handling the two-body problem in an already restricted job market. And of course, there was the common theme of how wonderful, and at times frustrating, math is.

There were many light-hearted moments too, to brighten a few faces. For example, after mathematicians A and B had spent all day on a problem only to realize their approach was all wrong:
Mathematician A: "I guess doing math is all about cursing sometimes."
Mathematician B: "$%#@! F-bomb. #@%#@ F-bomb. Censored here for family audiences."
Or another light-hearted conversation between Organizer A and a student who had invited him to speak at a conference:
Organizer A: "So, am I the entertainment for the high school students then?"
Student: "Yes, we have your talk on geometry the evening after the sword-swallower performs."

Let me give a shout-out to the wonderful facilities provided to us by the organizers, especially the amazing food. We were fed three square meals a day, plus tea twice a day and an informal snacks-and-beer session after nine pm. Most of the meals were supplemented by confections including red velvet cake or pastries, the meals were generally 3-4 courses (including mushroom and cheese garlic pizza, salmon fillet, chicken teriyaki, beef broccoli and more), and there were several rounds of free wine and scotch during the week. I may or may not have been slightly tipsy on a few occasions, and most of us put on at least a couple of pounds in a gluttonous week of math and fine cuisine. Several of us also went on hikes up the nearby trails, or enjoyed the swimming pools. I'm from Utah, of course, so I've been spoiled to always have the outdoors easily available.

There was a lovely talk given by the organizers on the job hunt process, with pointers on finding the best-fit institution. We've all heard the horror tales of how tight the academic job market is, but it's disconcerting nonetheless to hear firsthand of several hundred applicants for a single faculty position, or of how many of the workshop participants had sent in over 100 applications to various universities for their first job. Despite this, the independence of a research university position is still THE holy grail for those of a more mathematical bent: most of those attending seemed uninterested in the compromises involved in a teaching-intensive or industry position, and I can certainly understand that sentiment.

Finally, a shout-out for my favorite screening of the session: Diana Davis showed us her entry for "Dance your Ph.D. thesis", which drew much approval from an audience worn out by the excessive number of dry Beamer and PowerPoint presentations we've all seen.
