Friday, November 30, 2007

The true meaning of peer review

from Cosma Shalizi:
passing peer review is better understood as saying a paper is not obviously wrong, not obviously redundant and not obviously boring, rather than as saying it's correct, innovative and important.
Of course, the contrapositive of this is that an obviously wrong, redundant and boring paper has some nonzero chance of being rejected ;)

Interesting new blogspam method ....

When you've been writing a blog for a while, you'll discover what appear to be automated blogs that shamelessly steal your posts and repost them as new posts verbatim. I can only imagine that they do this to get high hits (and therefore advertising revenue) from Google, although why someone would take *my* posts for this purpose escapes me.

A new form of blog plagiarism combines the best of spam techniques with post copying, so as to evade the obvious duplicate detection methods. Consider the following examples:

The original post:
There's been a lot of grief over the reduction in price for iPhones only a few months after they were released. Wired magazine interviewed people who don't regret paying the higher price for the virtue of being an early adopter/arbiter-of-cool, and this comment caught my eye
The modified post (and no, I'm not going to link to it):
There's been a number of visibility at the decline master p scholomance for online in very couple weeks and we were released. Wired magazine interviewed people who don't regret paying a low frequency of the world as both an early riser and perhaps comment and the eye
The original post:
Suppose you're given a metric space (X, d) and a parameter k, and your goal is to find k "clusters" such that the sum of cluster costs is minimized. Here, the cost of a cluster is the sum (over all points in the cluster) of their distance to the cluster "center" (a designated point).
The modified post:
Lil romeo you're given a multidimensional space (X, d) and a parameter k, and a buff being use find k "clusters" such as lump- end of the effect is minimized. Here, lil romeo use 70million a target wikipedia, the sum of these items out the amount of the need codify the cluster "center" (a designated point).
And so on. What's more mysterious is that this isn't straight plagiarism: the post credits my post as the "source", although given the garbage that's generated, I'm not sure I want that "source" credit :).

Very mystifying...

Saturday, November 24, 2007

Black Friday -- online

I'm on a cable modem, and thus I share my network connection with my neighbours. Ever since Friday morning, my connection has been awfully slow during the day (and much faster in the early AM). I wonder if the whole 'Black Friday, crowded mall' phenomenon is moving online.

Tuesday, November 20, 2007

Author contributions

In this conference season, it seems like a good time to think about the issue of author contributions. Here are some of the methods for author listing used in computer science:
  • Alphabetical (the norm in theoryCS and communities heavily influenced by it)
  • Advisor-student: student name comes first, advisor comes last.
  • Work-ordering: first author did all the work, last author provided the funding, intermediate authors ordered by contribution (common in systems work)
Of course, there are combinations: I've seen alphabetical ordering modified by advisor-student placement, or alphabetical ordering with work-ordered subgroups, etc., although the latter is not common in theoryCS. The arguments for and against the various methods are, I imagine, well known:
  • alphabetical ordering conceals who did the real work ("but the cream rises to the top")
  • Work ordering allows advisors to slap their names on anything coming out of their lab ("But they did all the work needed to fund this research: grant writing, selling, etc")
  • alphabetical ordering gives undue prominence to authors whose names begin with 'A' (but see here for a counter-theory)
  • alphabetical ordering makes authors with names at the end of the alphabet lazy (Who're you looking at, punk ?)
  • theory papers can't be work-ordered, because ideas "float in the air without ownership" (yes, I have heard this argument many times, so don't snicker)
But what is perhaps more interesting is the problem of assigning author contributions, especially in systems like the alphabetical one where author contributions are not signalled by placement. Luca Aceto points out an unusually detailed description of author contributions, and speculates as to what a similar note might look like on theory papers.

He also links to an interesting set of axioms that Hardy and Littlewood developed before their famous collaboration started. I actually admire this approach. As a general rule, theory folks tend to be allergic to detailed discussions of contributions and what-not, but in collaborations it's critically important to lay out the ground rules in advance, and the Hardy-Littlewood axioms do exactly that, heading off problems before they can arise: for example,
And, finally, the fourth, and perhaps most important axiom, stated that it was quite indifferent if one of them had not contributed the least bit to the contents of a paper under their common name . . .
Agreeing to such an axiom requires great trust between the collaborators, because of the immense potential for abuse, and maybe that's exactly the point: a good collaboration requires trust, and you can't agree to such an axiom unless trust already exists.

Monday, November 19, 2007

My first ipelet: 1-medians

I just wrote my first ipelet, and it was an amazingly pleasant experience. Not to mention the thrill I got from marking a collection of points and whoosh! getting the answer right there on the screen.

The ipelet is for computing the discrete and geometric 1-median of a set of points. The geometric 1-median of a set of points (also called the Fermat-Weber-Steiner-Torricelli-<insert your name here> point) is the point that minimizes the sum of distances to the input points. The discrete 1-median is the same minimization, restricted to points of the input.

Computing the discrete 1-median of a set of points in the plane is trivial in O(n^2) time, and I didn't do anything clever there. Computing the geometric 1-median is harder though: there is no closed-form solution for more than 4 points, and it can be shown that the point cannot be computed using arithmetic and k^th roots.
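For concreteness, the brute-force discrete case looks like this. This is a minimal Python sketch (not the ipelet code itself): try every input point as the candidate center and keep the one with the smallest total distance.

```python
import math

def discrete_1_median(points):
    """Discrete 1-median by brute force: return the input point
    minimizing the sum of distances to all input points. O(n^2)."""
    def total_distance(c):
        return sum(math.hypot(c[0] - px, c[1] - py) for (px, py) in points)
    return min(points, key=total_distance)
```

With n points, each of the n candidates is scored against all n points, which is the trivial quadratic bound mentioned above.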

There are various approximation schemes: the most pertinent one here (in the plane) is an algorithm by Bose, Maheshwari and Morin based on a cone decomposition of the plane. In practice, researchers often use an iterative scheme due to Weiszfeld; it's a simple fixed-point iteration built around the fact that at the optimal solution, the unit vectors from it to the input points must sum to zero (formally, this is the condition on the gradient).

The cost function is convex and differentiable except at the input points themselves. Thus, the Weiszfeld iteration is guaranteed to give the optimal solution as long as it doesn't get stuck at one of the data points. This algorithm is often referred to as the "founding algorithm of location science", and there are numerous modifications, extensions, and refinements. The ipelet implements the basic Weiszfeld scheme with a fairly brain-dead termination condition.
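For reference, each Weiszfeld step replaces the current estimate by a weighted average of the input points, with each point weighted by the inverse of its distance to the estimate. Here is a minimal Python sketch (not the ipelet itself; the centroid start, the tolerance, and the movement-based stopping rule are my choices):

```python
import math

def weiszfeld(points, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration for the geometric 1-median (sketch).
    Each step re-weights the input points by inverse distance to the
    current estimate; we stop when the estimate barely moves."""
    # start at the centroid
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        wx = wy = wsum = 0.0
        for (px, py) in points:
            d = math.hypot(x - px, y - py)
            if d < tol:        # landed on a data point: stop here
                return (x, y)
            w = 1.0 / d
            wx += w * px
            wy += w * py
            wsum += w
        nx, ny = wx / wsum, wy / wsum
        if math.hypot(nx - x, ny - y) < tol:  # barely moved: converged
            return (nx, ny)
        x, y = nx, ny
    return (x, y)
```

At the returned point, the unit vectors toward the inputs sum to (nearly) zero, which is exactly the gradient condition described above; the early return when the estimate lands on a data point is the "getting stuck" caveat mentioned in the text.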

What possessed me to do such a thing ? Well, I'm trying to find counter examples for a conjecture involving 1-medians, and needed a way of visualizing examples :).

The code is here: feel free to use it, comment on it, modify it, etc.

IKEA = NP ?

Brian Hayes, in a post with a very catchy title, makes this painful, and yet not-off-the-mark comment about complexity class nomenclature:

The sad truth is, the naming conventions for furniture at Ikea make for a more consistent language than those of complexity theory.

Ouch ! For more on this, read Lance's post from way back when.

Tuesday, November 13, 2007

A day in the life...

Here's a list of things I did today:
  • I taught one lecture of my class
  • I attended a departmental committee meeting
  • I had a meeting with collaborators to work out a paper outline for something we're submitting
  • I met with my student and did some nontechnical advising
  • I had a (brief) discussion with a collaborator about a pending grant proposal.
In other words, I covered the entire gamut of activities one might expect of an academic - with one significant exception. Did you notice it ?

Nowhere in there was any actual research done ! Gaaaah !

Tuesday, November 06, 2007

"As good as your best result"

Mihai makes the argument that in theoryCS, you're as famous as your best result. I think the number of good results does matter a bit more than that, but getting well known for your best result is the best way to make a first splash and get on the radar.

I've actually heard the opposite claim made by systems researchers, and this has been extended to other empirical researchers as well. Namely, "you're as good as your worst result" (presumably thresholded to ignore grad school). The rationale here appears to be that in the conservative empirical sciences, where a badly designed experiment can cast a shadow on all your later work, it's your worst result that matters.

I can see how this works (sort of): in the more mathematical disciplines, a "proof" can to a large extent be validated independent of the author, so mistakes in past proofs can be tolerated (though if this becomes an endemic problem, then ...). Louis de Branges is famous for his proof of a conjecture in complex analysis known as the Bieberbach conjecture, and is currently well known for his claimed proof of the Riemann Hypothesis (Karl Sabbagh's book on this has more on de Branges). His proof of the Bieberbach conjecture was met with some initial skepticism from other mathematicians, because of some earlier false starts. However, it was soon seen to be correct, and has opened up new areas of mathematics as well. As a consequence, his claimed proof of the Riemann Hypothesis has received more cautious consideration as well (although I am not aware of the current status of his claims).

On the other hand, validating empirical work is largely on trust: you trust that the scientists have made faithful measurements, have kept proper lab notes etc, and I can see how a weakening of that trust can lead to much higher skepticism.

In this light, it's interesting to read Malcolm Gladwell's latest article for the New Yorker. He profiles the "profilers": the psych experts who help the FBI track down serial killers, and comes to this conclusion:
He seems to have understood only that, if you make a great number of predictions, the ones that were wrong will soon be forgotten, and the ones that turn out to be true will make you famous.

Sunday, November 04, 2007

Disqus for The Geomblog