## Sunday, May 15, 2011

### Weekend cstheory review.

I haven't done a cstheory review in a while. If you're not there surfing around (and possibly getting addicted!), then you're missing out on being part of a group of more than 4000 people who've asked and answered more than 1800 questions thus far.

Some high-profile interesting questions in the recent past:

### NSF Workshop: 8F (Algorithms in the field)

I'm off to DIMACS for the NSF workshop 8F ("Algorithms in the field", AITF, get it?):
Computer Science (CS) is rapidly evolving. We need to understand the evolving CS systems and their algorithmic foundations. This needs close collaboration between the researchers in algorithmic foundations and expert researchers in various areas. The premise is that researchers from different communities should work jointly “in the field”, constantly informing each other, inventing in their respective areas, and forging systems that are simultaneously validated by both algorithmic and systems communities (and without needing to make independent research threads meet ex post).

The purpose of this workshop is to provide a working vision for examples of Algorithms in the Field, near term goals and directions for research in the future. The outcome of this workshop will be a report contributed by the participants that will inform the academic community and future funding programs. Scientifically, we hope bringing people with widely varied research interest together with algorithms researchers will lead to happy collisions, and unexpected directions of research.
There's a great mix of researchers from within theory and from more applied areas, and of course the topic is near and dear. Along with a long list of talks, the idea is to have a number of breakout sessions on different topics in this area. One that I'm involved with is 'Core Algorithms', whose mandate (roughly speaking) is to figure out how basic algorithms research might evolve over the next many years, in relation to events in the larger arena of computing research and technology.

Of course it's a mug's game to make predictions about the future of technology, or worse still, about the future of research. Unlike many other subfields of CS, theoryCS doesn't really lend itself to long-term prognostications or discussions of the 'this is what we should work on' variety.

But from a funding perspective, if we can identify interesting directions to explore even if we look completely stupid in retrospect, this could be a fun exercise. One framing device that we might try to use is:
View core algorithmic developments in two directions. In one direction, there are many concepts from the world of theoryCS that slowly make their way into the more applied realms. In this context, one might ask about which paradigms are either ripe for "tech tranfer", are likely to do so within a few years, or need more 'baking' before they're ready, but have potential.

The reverse direction is the idea that core algorithmic questions and models are motivated by applications, whether they be models of computation like external memory/streaming/multicore/mapreduce, or conceptual developments like smoothed analysis, complexity classes for iterative algorithms, numerical algorithms and precise computations, and so on.

In this context, interesting points for discussion would be: what kinds of paradigms do we see approaching from "applied land" and how might they influence the kinds of core algorithmic questions we ponder.
If you're at the workshop, you can make your opinions known. But even if you're not, you can voice them right here ! I'll incorporate comments posted here into the discussions (and sign your name if you'd like credit).

## Sunday, May 01, 2011

### SDM 2011: Other thoughts

My student Parasaran Raman did a two part review of SDM 2011 (here, and here), and I thought I'd add my own reflections.

There was an excellent talk on the second day by Haesun Park from Georgia Tech on non-negative matrix factorization methods (NMF). This problem has become all the rage in ML circles, and is defined as follows.
Given a matrix $A$ and parameter $k$, find rank $k$ matrices $U$ and $V$ such that $\|A - UV\|$ is minimized, and $U$ and $V$ contain only nonnegative entries.
The main difference between this problem and the SVD is the non-negativity requirement, and not surprisingly, it changes the complexity quite a bit - you can no longer get an optimal solution, for example.

There appears to be relatively little theory-algorithms work on this problem (there's some mathematical work, including the famous Perron-Frobenius theorem), and her talk presented a good overview of the various heuristic strategies people have used to try and solve the problem.

One of the reasons this formulation is interesting is because for many problems, you'd like an "explanation" of the data that doesn't have negative coefficients (which is the bane of PCA-like methods). She also says that for reasons unknown, the matrices produced by this formulation tend to be quite useful "in practice" at factoring out the different interactions present in data. In fact one of her big open questions is whether there's a more rigorous way of explaining this.

The talk was very enjoyable, and I sat there thinking that it would be perfect as an ALENEX (or even SODA) invited talk.

There was also a panel discussion on the future of data mining. The panelists comprised two industry folks, two academics, and one researcher from a national lab, with questions being thrown at them by Chandrika Kamath from LLNL. My twitter stream gave a blow-by-blow, so I won't rehash it here. I was intrigued (and a little disappointed) when I realized that almost all the answers centered around the needs of business.

In one respect this is not surprising: the primary reason why data mining is so hot right now is because of the vast opportunities for data mining to sell ads, products, or even just model consumer behavior. But it makes the field a little more shallow as a consequence: inevitably, industry needs solutions, not deep problems, and a relentless focus on solutions doesn't help facilitate deeper analysis of the problems. Maybe that's the nature of data mining, and I'm expecting too much to ask it to be more profound. But I wouldn't mind a more ML-like sense of rigor in the formulation of computational questions.

Overall, I quite enjoyed the conference. I've  been to KDD, and have been on PCs for both KDD and ICDM (the third major data mining conference). To me, SDM has the feel of a SODA/SoCG like conference - a smaller, more academic crowd, more mathematics/statistics in the talks, and less of a big industrial presence. I can definitely see myself publishing there and going back again.