## Tuesday, February 23, 2010

### Guest Post: Update from the CRA Career Mentoring Workshop, Day II

(ed note: Jeff Phillips is at the CRA Career Mentoring Workshop. His Day 1 dispatch is here)

It is day two at the CRA Career Mentoring Workshop.

Today was all about funding, with speakers from NIH (Terry Yoo), DARPA (Peter Lee), the Laboratory for Telecommunications Science (Mark Segal), and NSF (Jan Cuny). Jeannette Wing, the assistant director at NSF for CISE, also made an appearance at the reception yesterday.

(ed. note: I just heard that Jeannette Wing is leaving CISE in July. This is sad news - she was a strong and dynamic presence at CISE)

NIH advertised having a lot of money (about $30 billion, compared to $7 billion at NSF). The NIH has many sub-institutes covering many different topics, but all applications are funneled through grants.gov. Terry Yoo was very enthusiastic about us applying for a piece of his large pie. It seems a bit tricky, however, to fit a pure computer science project into one of these institutes: a project seems to need a specific health-related application.

We all (CI Fellows) thanked Peter Lee, who helped spearhead the CI Fellows program. He recently joined DARPA to head the Transformational Convergence Technology Office (TCTO or "tic-toe"), a new office that will oversee many funded computer science programs. See the DARPA_News twitter feed for information on DARPA solicitations. Among the goals of this office is eliminating the harsh "go or no go" conditions associated with DARPA grants.
For young researchers: look for the CSG or YSA programs, similar in some ways to NSF CAREER awards.

The Laboratory for Telecommunications Science is part of the NSA. They hire many, many Ph.D.s for advanced computer science research. Mark Segal could not tell us specifics about what they do, but compared it to an industrial research lab (e.g., AT&T, Yahoo Research, etc.). Movement between the research and non-research parts is fluid, and the work is pretty hands-on. Even the theoretical computer scientists and mathematicians they hire often build systems to implement their work.

To get funding through them: they generally form close, specific collaborations with faculty. The best way to start a relationship is to send a student on a summer internship (maybe even an undergrad) or to visit via a sabbatical.

Jan Cuny from NSF decided she did not need to convince us to apply to NSF; rather, she just assumed we would and gave a how-to on applying for NSF grants. Most important tip: **talk to program officers!** (before you submit). Otherwise, it is hard to give specific summaries from her talk (the slides will eventually be online - definitely look for them). The presentation nicely demystified parts of the reviewing process, such as how grants are reviewed and why she might choose a proposal (for diversity) that scored slightly lower than another, unfunded one. The other key advice: follow the guidelines precisely and carefully, make it easy for reviewers, and focus the content section on proposed work, not existing work.

A parting thought. It has been great to see many friends who are recent faculty or postdocs, in areas I might not encounter at my usual conferences. But it was a bit odd to have such a large fraction of my competition for jobs over the next year or two in the same room. The funding agencies were certainly here advertising how to get their funding, but if we did not already realize this was important, we would probably not have much luck getting jobs. If you were a department looking to hire someone, perhaps it would have made sense to come here to recruit postdocs to apply ? Although I guess that is a bit optimistic, as it is a hirer's market.

Is there some way that having many job seekers all in one place can facilitate the hiring process, or has the electronic age made this obsolete ? I would argue that personal interaction is underrated, and would help universities figure out not just who has a great resume on paper, but also who is great to interact with in person.

## Monday, February 22, 2010

### Guest Post: Update from the CRA Career Mentoring Workshop

(ed note: Jeff Phillips is at the CRA Career Mentoring Workshop today and tomorrow, and filed this dispatch from Day 1.)

I am reporting from the CRA Career Mentoring Workshop. So far it has been excellent.

Frankly, I would not even have thought of coming (I did not even know about it), but it was "highly recommended" for all CI Fellows. Having been here for the first day, though, I would now recommend it to postdocs, early faculty, and senior graduate students set on a career in academia. One could obtain all of the information presented here by asking the right people in one's department or field, but this workshop has really stressed what the right questions are, and whom to ask.

The topics today were "Planning Your Research Career," "Career Networking," "Teaching," "Managing and Mentoring Students," "Preparing a Tenure Dossier," "Time Management and Family Life," and "Advice from Early Career Faculty." An important aspect of the organization was that each topic had at least two speakers. This kept presentations short, and always provided at least two (often differing) perspectives. It ensured that there was not just someone lecturing us on their opinions on a topic, but instead a demonstration that there is no one right way to approach being a young faculty member. The slides from all of the talks should eventually be online, reachable from one of the above links.

(ed. note: I've also heard the counter-advice "focus on doing the work to get tenure at a high ranking place, rather than just your department" - the rationale being that having a generically strong tenure case makes you more mobile if you need to be)

Also, your department is making a 5-6 year investment in you, so they should be there to help you succeed; if you don't get tenure, then everyone loses. All of this helped me appreciate the great demands of the tenure process, but also made it seem quite possible. Intimidating and comforting at the same time.

## Wednesday, February 17, 2010

### SoCG author feedback

David Eppstein has an analysis of the SoCG author feedback with respect to his papers. Worth perusing: his overall conclusion is that having the rebuttal was a good idea, but he'd like to hear from the committee (perhaps at the business meeting?).

I had two papers rejected. For one there was no feedback requested, and the paper was rejected. The final reviews made it clear that the reviewers understood the main contributions of the paper - what was under contention (among other things) was how the material was presented, and that's obviously not something author feedback can help with.

The other paper had one request for feedback which was basically a long negative review, again focusing on the presentation. We tried to respond as best we could, but it didn't make too much of a difference.

He did concur that the quality of reviewing was very high.

### SoCG Feedback...

It's my 1000th post !! Took longer than I expected, but then my average has been slipping. Here's hoping I get to 2000 !

Just got back my reviews from the Valentine's day massacre. I have to say that I'm stunned, and not for the reason you think.

I've been writing papers and getting reviews for a LONG time now, and I have to say that these are the best reviews I've ever seen, and are way beyond the standard for the typical theory conference. Both papers were rejected, and so the reviews necessarily were negative, but in an extremely constructive way. They critiqued without being critical, were detailed and thorough, and clearly got to the heart of what the paper was about. The comments pointed out what was good in the paper, and more importantly, pointed out what the reviewers felt was missing, and how best to address the problems. I actually felt better about the rejection after reading the reviews, because they came across as genuinely liking the work.

Now it wasn't all good. There was a basic premise at the heart of the rejection that I disagree with, but it's a reasonable point to disagree on, and I can at least see a way to resolving that problem.

At least one other author agrees with me on this assessment - you're welcome to share your experiences in the comments. Congratulations to the PC - theory conference reviews are often slammed, and rightly so, but these reviews stand out for their high quality.

## Tuesday, February 16, 2010

### SoCG accepted papers

After the Valentine's day massacre, comes the list. I'll link to PDFs if I'm pointed to them (add in the comments)
• David Millman and Jack Snoeyink. Computing Planar Voronoi Diagrams in Double Precision: An Example of Degree-driven Algorithm Analysis
• György Elekes and Micha Sharir. Incidences in Three Dimensions and Distinct Distances in the Plane
• Micha Sharir, Adam Sheffer and Emo Welzl. On Degrees in Random Triangulations
• Pankaj K. Agarwal, Rinat Ben Avraham and Micha Sharir. The 2-Center Problem in Three Dimensions
• Florian Berger and Rolf Klein. A Traveller's Problem
• Sergio Cabello and Bojan Mohar. Adding one edge to planar graphs makes crossing number hard
• Tobias Christ, Dömötör Pálvölgyi and Miloš Stojaković. Consistent digital line segments
• Dominique Attali and Andre Lieutier. Optimal reconstruction might be hard
• Dominique Attali and Andre Lieutier. Reconstructing shapes with guarantees by unions of convex sets
• Mark de Berg. Better Bounds on the Union Complexity of Locally Fat Objects
• Marc Glisse and Sylvain Lazard. On the complexity of the sets of free lines and free line segments among balls in three dimensions
• Tamal Dey, Jian Sun and Yusu Wang. Approximating loops in a shortest homology basis from point data
• Mohammad Ali Abam and Sariel Har-Peled. New Constructions of SSPDs and their Applications
• Sergio Cabello, Éric Colin de Verdière and Francis Lazarus. Output-sensitive algorithm for the edge-width of an embedded graph
• Sergio Cabello, Éric Colin de Verdière and Francis Lazarus. Finding Shortest Non-Trivial Cycles in Directed Graphs on Surfaces
• David Eppstein and Elena Mumford. Steinitz Theorems for Orthogonal Polyhedra
• Akitoshi Kawamura, Jiri Matousek and Takeshi Tokuyama. Zone diagrams in Euclidean spaces and other normed spaces
• Eryk Kopczynski, Igor Pak and Piotr Przytycki. Acute triangulations of polyhedra and $\mathbb{R}^N$
• Keiko Imai, Akitoshi Kawamura, Jiri Matousek, Daniel Reem and Takeshi Tokuyama. Distance k-sectors exist
• Joseph Mitchell. A Constant-Factor Approximation Algorithm for TSP with Neighborhoods in the Plane
• Benjamin A. Burton. The complexity of the normal surface solution space
• Karl Bringmann. Klee's Measure Problem on Fat Boxes in Time $O(n^{(d+2)/3})$
• Natan Rubin. Lines Avoiding Balls in Three Dimensions Revisited
• Pankaj K. Agarwal, Jie Gao, Leonidas Guibas, Haim Kaplan, Vladlen Koltun, Natan Rubin and Micha Sharir. Kinetic Stable Delaunay Graphs
• Haim Kaplan, Micha Sharir and Natan Rubin. A Kinetic Triangulation Scheme For Moving Points in The Plane
• Anne Driemel, Sariel Har-Peled and Carola Wenk. Approximating the Fréchet Distance for Realistic Curves in Near Linear Time
• Joachim Gudmundsson and Pat Morin. Planar Visibility: Testing and Counting
• Ken-ichi Kawarabayashi, Stephan Kreutzer and Bojan Mohar. Linkless and flat embeddings in 3-space and the Unknot problem
• Roel Apfelbaum, Itay Ben-Dan, Stefan Felsner, Tillmann Miltzow, Rom Pinchasi, Torsten Ueckerdt and Ran Ziv. Points with Large Quadrant-Depth
• Sunil Arya, David Mount and Jian Xia. Tight Lower Bounds for Halfspace Range Searching
• Don Sheehy, Benoit Hudson, Gary Miller and Steve Oudot. Topological Inference via Meshing
• Gur Harary and Ayellet Tal. 3D Euler Spirals for 3D Curve Completion
• Peyman Afshani, Lars Arge and Kasper Dalgaard Larsen. Orthogonal Range Reporting: Query lower bounds, optimal structures in 3-d, and higher-dimensional improvements
• Abdul Basit, Nabil Mustafa, Saurabh Ray and Sarfraz Raza. Improving the First Selection Lemma in $\Re^3$
• Bernard Chazelle. A Geometric Approach to Collective Motion
• Bernard Chazelle. The Geometry of Flocking
• Pankaj Agarwal, Boris Aronov, Marc van Kreveld, Maarten Löffler and Rodrigo Silveira. Computing Similarity between Piecewise-Linear Functions
• Jean-Daniel Boissonnat and Arijit Ghosh. Manifold Reconstruction using Tangential Delaunay Complexes
• Pankaj K. Agarwal. An Improved Algorithm for Computing the Volume of the Union of Cubes
• William Harvey, Yusu Wang and Rephael Wenger. A Randomized $O(m\log m)$ Time Algorithm for Computing Reeb Graphs of Arbitrary Simplicial Complexes
• Omid Amini, Jean-Daniel Boissonnat and Pooran Memari. Geometric Tomography With Topological Guarantees
• Lars Arge, Morten Revsbaek and Norbert Zeh. I/O-efficient computation of water flow across a terrain
• Timothy M. Chan. Optimal Partition Trees
• Afra Zomorodian. The Tidy Set: Minimal Simplicial Set for Computing Homology of Clique Complexes
• Umut Acar, Andrew Cotter, Benoit Hudson and Duru Türkoğlu. Dynamic Well-Spaced Point Sets
• David Mount and Eunhui Park. A Dynamic Data Structure for Approximate Range Searching
• Janos Pach, Andrew Suk and Miroslav Treml. Tangencies between families of disjoint regions in the plane

## Friday, February 12, 2010

### Papers and SVN

Way back when, I had promised to do a brief post on the use of SVN (or other versioning systems) for paper writing. Writing this post reminds me of all the times I've sniggered at mathematicians unaware of (or just discovering) BibTeX: I suspect all my more 'systemsy' friends are going to snigger at me for this.

For those of you not familiar with the (cvs, svn, git, ...) family of software, these are versioning systems that (generally speaking) maintain a repository of your files, allow you to check files out, make local changes, and check them back in, simultaneously with others who might be editing other files in the same directory, or even the same file itself.

This is perfect come paper writing time. Rather than passing around tokens, or copies of tex files, (or worse, zip files containing images etc), you just check the relevant files into a repository and your collaborator(s) can check them out at leisure. SVN is particularly good at merging files and identifying conflicts, making it easy to fix things.

My setup for SVN works like this: each research project has a directory containing four subdirectories. Two are easy to explain: one is a "trunk" directory where all the draft documents go, and another is an "unversioned" directory for storing all relevant papers. (I keep these separate so that checking out the trunk doesn't mean repeatedly downloading the papers that get added in.)

The other two come in handy for maintaining multiple versions of the current paper. The 'branches' directory is what I use close to a submission deadline, when the only changes to be made are format-specific or relate to shrinking text. The 'tags' directory is a place to store frozen versions of a paper (i.e., post-submission, final version, arXiv version, journal version, and so on).

It seems complicated, but it works quite well. The basic workflow near deadline time is simply "check out trunk; make changes, check in trunk; repeat...". A couple of things make the process even smoother:
• Providing detailed log messages when checking in a version: helps to record what exactly has changed from version to version - helpful when a collaborator needs to know what edits were made.
• Configuring SVN to send email to the participants in a project whenever changes are committed. Apart from the subtle social engineering ("Oh no ! they're editing, I need to work on the paper as well now!"), it helps keep everyone in sync, so you know when updates have been made, and who made them.
• Having a separate directory containing all the relevant tex style files. Makes it easy to add styles, conference specific class files etc.
I can't imagine going back to the old ways now that I have SVN. It's made writing papers with others tremendously streamlined.
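To make the layout concrete, here is a sketch of the setup and the deadline-time loop. It uses a purely local (file://) repository so the commands are self-contained; in practice the repository would live on a shared server (e.g. accessed via svn+ssh), and all the paths, file names, and commit messages below are made up for illustration.

```shell
# Create a repository and the four top-level directories described above.
svnadmin create /tmp/paper-repo
svn mkdir -q -m "project layout" \
    file:///tmp/paper-repo/trunk \
    file:///tmp/paper-repo/branches \
    file:///tmp/paper-repo/tags \
    file:///tmp/paper-repo/unversioned

# The basic workflow: check out the trunk, edit, commit, update.
svn checkout -q file:///tmp/paper-repo/trunk /tmp/paper
echo '\documentclass{article}' > /tmp/paper/draft.tex
svn add -q /tmp/paper/draft.tex
svn commit -q -m "skeleton of the draft" /tmp/paper
svn update -q /tmp/paper   # pull in co-authors' changes; svn merges or flags conflicts

# Freezing the submitted version as a tag (a cheap server-side copy):
svn copy -q -m "as submitted" \
    file:///tmp/paper-repo/trunk file:///tmp/paper-repo/tags/submission
```

The commit-notification emails mentioned above are typically wired up through the repository's post-commit hook script.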

Caveat:
• SVN isn't ideal for collaborations across institutions. Much of my current work is with local folks, so this isn't a big problem, but it can be. Versioning software like git works better for distributed sharing, from what I understand.

## Thursday, February 11, 2010

### Graphs excluding a fixed minor

Families of graphs that exclude a fixed minor H have all kinds of nice properties: for example
• If G excludes a fixed minor H, then G has $O(n)$ edges
• If G excludes a fixed minor H, then G has treewidth $O(\sqrt{n})$
• If G excludes a fixed minor H, then G has a nice decomposition into a few graphs of small treewidth
• If G excludes a fixed minor H, then G can be decomposed into a clique sum of graphs that are almost embeddable on surfaces of bounded genus.
(all of these and more in Jeff Erickson's excellent comp. topology notes).

In all these cases, the $O()$ notation hides terms that depend on the excluded graph H. For example, if H is a clique on k vertices, then G has at most $O(nk\sqrt{\log k})$ edges.

So the question is: given a graph G, what's the smallest graph H that G excludes ? This problem is almost certainly NP-hard, and probably at least somewhat hard to approximate, but some approximation of the smallest graph (measured by edges or vertices) might be useful.
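For a fixed H, testing whether G contains an H-minor is polynomial in n (a consequence of the Robertson-Seymour theory), though with impractical constants; the hard part above is searching over H. Just to make the definition concrete, here is a brute-force sketch for tiny graphs using networkx (the function name `has_minor` and the examples are mine, not from any library): it uses the fact that H is a minor of G iff H appears as a (not necessarily induced) subgraph of some graph obtained from G by edge contractions.

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def has_minor(G, H):
    """Brute-force test of whether H is a minor of G. Exponential time:
    only sensible for very small graphs."""
    if (H.number_of_nodes() > G.number_of_nodes()
            or H.number_of_edges() > G.number_of_edges()):
        return False
    # Does some subgraph of G (vertex + edge deletions) look like H?
    if GraphMatcher(G, H).subgraph_is_monomorphic():
        return True
    # Otherwise, try every single edge contraction and recurse.
    return any(
        has_minor(nx.contracted_edge(G, e, self_loops=False), H)
        for e in G.edges()
    )

# A 4-cycle has no triangle as a subgraph, but contracting one edge makes one:
assert has_minor(nx.cycle_graph(4), nx.complete_graph(3))
# A path has no cycles at all, hence no K3 minor:
assert not has_minor(nx.path_graph(5), nx.complete_graph(3))
```

The recursion explores every contraction sequence, so this is purely a definitional toy; real minor-testing algorithms are far more intricate.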

I was asked this question during our algorithms seminar a few days ago, and didn't have an answer.

## Monday, February 08, 2010

### Good prototyping software

All the code for my recent paper was written in MATLAB. It was convenient, especially since a lot of the prior work was in MATLAB too. I actually know almost no MATLAB, preferring to do my rapid prototyping in C++ (yes, I'm crazy, I know).

Which brings me to the question that I know my lurking software hackers might have an answer to. If I have to invest in learning a new language for doing the empirical side of my work, what should it be ? Here are some desiderata:
• Extensive package base: I don't want to reinvent wheels if I can avoid it. In this respect, the C++ STL is great, as is Boost, and Python has PADS, as well as many nifty packages. MATLAB is of course excellent.
• Portability: I'm sure there's an exotic language out there that does EXACTLY what I need in 0.33 lines of code. But if I write code, I want to be able to put it out there for people to use, and I'd like to use it across multiple platforms. So Lua, not so great (at least in the research community) (sorry Otfried)
• Good I/O modules: if I write code, I'll often want to send output to a graph, or plot some pictures etc. Some systems (MATLAB) are excellent for graphing data.
• Performance: I don't want to sacrifice performance too much for ease of coding. I've always been afraid of things like Java for this reason. Of course, I'm told I'm dead wrong about this.
I deliberately haven't listed 'learning curve' as an option. If the language merits it, I'm willing to invest more time in switching, but obviously the benefits have to pay for the time spent. In terms of background, I'm most familiar with C++, and am a nodding acquaintance of Python and Perl. I occasionally nod at MATLAB in the street when I see it, but usually cross the road to the other side if I see Java approaching. I used to be BFF with OpenGL, but then we broke up over finances (specifically, the cost of GPUs).

Thoughts ?

## Sunday, February 07, 2010

### The challenge of doing good experimental work

I recently submitted (with Arvind Agarwal and Jeff Phillips) a paper on a unified view of multidimensional scaling. It's primarily empirical, in that the main technique is a heuristic that has many nice properties (including providing a single technique for optimizing a whole host of cost measures for MDS). For anyone interested though, there's also a nice JL-style theorem for dimensionality reduction from (hi-D) sphere to (lo-D) sphere, which gets the "right" bound for the number of dimensions.

But this post isn't really about the paper. It's more about the challenges of doing good empirical work when you're trained to think formally about problems. This post is influenced by Michael Mitzenmacher's exhortations (one, two) on the importance of implementations, and Mikkel Thorup's guest lament on the lack of appreciation for simple algorithmic results that have major consequences in practice.

So you're looking at practical ramifications of some nice theory result, or you're trying to apply formal algorithmic tools to some application problem. If you're lucky, the problem doesn't have an existing base of heuristics to compare against, and so even a straight-up implementation of your ideas is a reasonable contribution.

Of course, you're rarely this lucky, and there's usually some mish-mash of heuristics to compare against. Some of them might be head-scratchingly weird, in the "why on earth should this work" category, and some are possibly more principled. At any rate, you go and implement your idea, and you suddenly realize to your horror that worst-case bounds don't mean s*** when your principled method is ten times slower than the crazy heuristics. So now what ?

The central point that I want to make here is that while actually paying attention to implementation issues is of immense value if we want people to actually care about theoretical work, I don't think we get the right kind of training to do it well.

First off, we lack training in various heuristic design strategies. Now I don't actually mean the kinds of heuristics that one might come across in the Kleinberg-Tardos book (local search, annealing, and the like). I mean the much larger body of principle-driven heuristics that the optimization community is quite familiar with. Without even thinking too hard, it's easy to list heuristics like Newton's method, conjugate gradients, alternating optimization, matching pursuit, majorization, the Frank-Wolfe method, iteratively reweighted least squares (and I could keep going...)

Of course you might point out that I seem to know all about these heuristics. Well, not quite. The second problem is that even if one knows about these methods, that's not the same as having a good intuitive feel for when and why they work well. Case in point: one of the thorns in our side in this paper was a heuristic for MDS called SMACOF. It's a nice technique based on majorization, and although it's a heuristic, it works pretty darn well most of the time and takes a lot of effort to beat, even though there's no clear reason (at least to me) why it should work so well. The only real way to get a sense for how different heuristics work is to implement them all, or at least have the right set of MATLAB/C/C++ tools lying around. I notice that ML folks tend to do this a lot.

The third problem that often comes up is by far the most irritating one: the actual cost function you're optimizing doesn't even matter that much at all. Returning again to our paper, there are many ways to define the "error" when embedding a metric into another metric. The traditional theoryCS way looks at dilation/contraction: the worst-case ratio between distances in the original and target space. Most variations on MDS actually look at an average difference (take the difference between the distances, and average some function of this). As anyone who mucks around with metric spaces will tell you, the actual error function used can make a huge difference to the complexity of the problem, the ability to approximate, and so on and so forth.

But here's the thing we discovered: it's actually possible to run heuristics designed explicitly for one kind of error function that do just great for another kind of error function, and it takes a lot of work to construct examples that demonstrate the difference.
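To make the contrast between error measures concrete, here is a toy, entirely made-up example (not from the paper): project the four corners of a square onto a line, and compute both the worst-case distortion and a stress-style average error. The projection collapses one pair of points, so the worst-case measure blows up while the average measure stays modest.

```python
import itertools
import math

# Four corners of the unit square, "embedded" by projecting onto the x-axis.
points_2d = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
points_1d = [(x,) for (x, _) in points_2d]

pairs = list(itertools.combinations(range(4), 2))
orig = [math.dist(points_2d[i], points_2d[j]) for i, j in pairs]
emb = [math.dist(points_1d[i], points_1d[j]) for i, j in pairs]

# Worst-case (theoryCS) view: distortion = (max expansion) * (max contraction).
expansion = max(e / o for o, e in zip(orig, emb))
contraction = max((o / e if e > 0 else math.inf) for o, e in zip(orig, emb))
distortion = expansion * contraction  # infinite here: two points collapsed

# Average (MDS-style) view: raw stress = sum of squared pairwise differences.
stress = sum((o - e) ** 2 for o, e in zip(orig, emb))

print(distortion, stress)  # inf vs roughly 2.34
```

The same embedding is catastrophic under one measure and mediocre-but-fine under the other, which is exactly why heuristics tuned for one can still behave reasonably under another.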

These points tie together in a more general theme. I was reading a post by Jake Abernathy at Inherent Uncertainty, and he makes a valuable point about the difference between algorithms/theory culture and ML culture (although ML could be replaced by other applied areas like db/data mining as well). His point is that in theoryCS, we are problem-centric: the goal is to prove results about problems, and taxonomize them well. Improve asymptotic run-time - great ! Get a better approximation ratio - excellent ! Reimplement the same algorithm with the same running time to get better behavior in practice - hmmm. This is in contrast (as he puts it) to a lot of ML research, where the algorithmic technique comes first, and results are generated later to go along with it.

This of course drives us nuts: NOT EVERY CLUSTERING PROBLEM SHOULD BE SOLVED WITH k-MEANS ! (phew - I feel better now). But if you reexamine this situation from the perspective of applied work, you realize its utility. I want to solve a problem, so I pick up some off-the-shelf algorithm. Maybe it doesn't solve my problem exactly; maybe it solves a related problem. But it's either this, or some very complicated theoretical method that has no extant implementation and is optimizing for a worst case that I might never encounter. What do I do then ?

This is not a rant about worst-case analysis. Far from it. It's not a rant about O() notation either. What I'm merely trying to say is that a focus on worst-case analysis, asymptotic improvements, and provable guarantees, while essential to the enterprise of doing theory, leaves us little room for the kind of experience needed to do effective practical implementation of our ideas.

## Monday, February 01, 2010

### The limerick I used to introduce Emmanuel Candes

I was assigned the task of introducing Emmanuel Candes for his invited talk at SODA. After getting tired of my incessant pacing up and down the hotel room at 3am mumbling about the text of the intro, my roommate took pity on me and composed a cute little limerick for the occasion.

Now whether it was madness, desperation or both, I don't know. But I decided to use that limerick to open the introduction, much to the consternation of my (unnamed) roommate, who hadn't intended his little throwaway to take on such prominence.

But use it I did, and it didn't fall entirely flat, for which I am very grateful. I won't out the composer unless he does it himself :), but here's the limerick:

There once was a professor from Caltech
who represented his signals with curvelets
But upon reflection
realized, he'd prefer projection
As they could better capture sparse sets.

Thank you all: I'll be here all night....

### Could the iPad make computer science obsolete ?

OK fine. It's a provocative title. But hear me out.

Most non-cave-dwelling luddites have heard about the new Apple tablet (aka the iPad). The simplest (and most misleading) way to describe it is as a gigantic iPod Touch, with all the multitouch goodness of the iPod/iPhone, as well as the numerous App Store apps.

There's a vigorous debate going on over the merits and impact of the iPad, and while it's clear that it's not a work laptop/netbook replacement, it's probably the first internet appliance with some mojo.

The word 'appliance' is chosen deliberately. The iPad essentially behaves like a completely sealed-off appliance - you can't hack it or customize it directly, and are only allowed the interface provided to you by Apple and the App Store (also controlled by Apple). This is viewed (correctly, on many levels) as a feature, and not a bug. After all, most people don't care to know how their large and complicated computers really work; all they want is a way to check email, surf the web, watch movies, and so on.

But here's the thing. As long as the computer has been this complicated, hard to manage and yet critically important device, it's been easy to make the case for computer science as an important, lucrative discipline, and one worth getting into. Even in the past few years, with enrollments plummeting (they seem to be recovering now), there's been no argument about the importance of studying computer science (even if it comes across as boring to many).

And yet, how many people enroll in 'toaster science' ? More importantly, how many people are jumping at the chance to become automotive engineers ? As the computer becomes more and more of an appliance that we "just use", the direct connection between the person and the underlying computing engine goes away.

Obviously there'll always be a need for computer scientists. Those social networks aren't going to data mine themselves, and someone needs to design an auction for pricing iPad ads. But it's quite conceivable that computer science will shrink dramatically from its current size down to a much smaller discipline that generates the experts working in the backends of the big internet companies (Microsoft, I'm not optimistic about your chances of survival).

This cuts both ways: a smaller discipline with more specialized skills means that we can teach more complex material early on, and are likely to attract only the dedicated few. However, it'll be a "few": which means that for a while, till we reach a new stable equilibrium, there'll be way fewer jobs at lower salaries.

I make this observation with no great pleasure. I'm an academic and my job might disappear within 10 years for other reasons. But along with rejoicing in the mainstreaming of what appears to be fairly slick use of technology, I'm also worried about what it means for our field as a whole.