The Geomblog: 09/01/2004

Wednesday, September 29, 2004

Blog Carnivals ...

With so many bloggers posting every day, it's hard to keep track of the latest greatest trends. Either you can subscribe to a memepool, or you can read a weekly summary of happenings in what is typically called a 'carnival'. The most famous one is the Carnival of the Vanities; it has been going for two years now, and attempts to collect the best of the blogosphere on a wide variety of topics. Some weekly carnivals are themed: for example, here's an insect-themed one.

But there are many specialized carnivals as well. Perhaps you want to understand the inner workings of the capitalist world. Or maybe you just want to buy stuff, and eat it. Or you like to write and want to see what others are upto. Maybe you live in Canada, and watch the Bush administration activities with great interest.

Even if you are a doctor (medical, or Ph.D), there is a carnival for you. Or maybe you just like cats.

But if you like clickety-clacking on a keyboard for a living, there isn't a carnival for you...yet....

Tuesday, September 28, 2004

Not photoshopped...

but too implausible to be true. Alas, if it only were:

Mathematics can even be a matter of life or death. During the Russian revolution, the mathematical physicist Igor Tamm was seized by anti-communist vigilantes at a village near Odessa where he had gone to barter for food. They suspected he was an anti-Ukranian communist agitator and dragged him off to their leader.

Asked what he did for a living he said that he was a mathematician. The sceptical gang-leader began to finger the bullets and grenades slung around his neck. "All right", he said, "calculate the error when the Taylor series approximation of a function is truncated after n terms. Do this and you will go free; fail and you will be shot". Tamm slowly calculated the answer in the dust with his quivering finger.
When he had finished the bandit cast his eye over the answer and waved him on his way.

Tamm won the 1958 Nobel prize for Physics but he never did discover the identity of the unusual bandit leader. But he found a sure way to concentrate his students' minds on the practical importance of Mathematics!

(via a commenter on Political Animal...)

On a side note, I would never have dreamed of seeing Mean Girls, but computing a limit in one's head in 20 seconds ....

Bloghoo ?

Via Pharyngula, a creative idea from Voyage to Arcturus and Rhetorica:

Recent controversies regarding "legacy media"/"MSM" coverage of the U.S. Presidential campaign, especially the troubling memos regarding the President's experiences during the Vietnam War, have demonstrated a need for conventional media to draw on the vast, dispersed expertise of the blogosphere.

Can this Schumpeterian gale be harnessed? We believe it can. Amidst the jeering, we have formulated a constructive response -- a mechanism whereby a symbiotic relationship between blogging and traditional forms of journalism can be deliberately cultivated.

That mechanism is 411blog.net.

Reporters can use it to quickly authenticate highly technical or specialized story elements with subject-matter experts (SMEs) drawn from the best the blogosphere has to offer, including academics, business people, scientists, and lay experts of all kinds. SMEs on 411blog.net also offer reporters another important advantage: As bloggers in addition to subject experts, they are plugged in to the latest internet conversation regarding their subject areas.

Bloggers can use 411blog.net to nominate subject-matter experts, build trust with traditional media, and increase their standing in the blogosphere.

I had commented earlier about the proliferation of the so-called 'subject blogs', blogs not devoted to popular topics like politics and culture, but devoted to specific subjects. 411blog.net is a good idea to organize these blogs in a single place, much like Yahoo did in the early days. In fact, I may have even predicted this !

Monday, September 27, 2004

Knowledge via democracy

Pharyngula has been frothing at the mouth (quite justifiably) about the latest issue of Wired that covers the intelligent design controversies raging in the public school system. As he points out, the core strategy of the (un)ID folks is to create a debate and then use its existence to establish legitimacy: "see, we are disagreeing, as rational people will about a controversial topic".

The debate's two-on-two format, with its appearance of equal sides, played right into the ID strategy—create the impression that this very complicated issue could be seen from two entirely rational yet opposing views. "This is a controversial subject," Meyer told the audience. "When two groups of experts disagree about a controversial subject that intersects with the public-school science curriculum, the students should be permitted to learn about both perspectives. We call this the 'teach the controversy'
approach."[1]

What is saddest is comments like this coming from school board members:

Another board member, Deborah Owens-Fink, declares the issue already closed. "We've listened to experts on both sides of this for three years," she says. Ultimately, the question of what students should learn "is decided in a democracy, not by any one group of experts."

We know what happens to the value of pi when democracy decides knowledge...

Notes:

Larry Kraus was interviewed in Scientific American about his experiences dealing with the school board mentioned above.
I find it even more tragic that at a time when a grounding in biology appears to be key to having a part in the biotech revolution, these people want to gut the foundations of scientific education in this country.

Saturday, September 25, 2004

Academics locked out by tight visa controls

It may be obviously true, but it needs to be said, and I am glad more people are saying it. Here is Bruce Schneier in the San Jose Mercury News:

Security is always a trade-off, and specific security countermeasures affect everyone, both the bad guys and the good guys. The new U.S. immigration rules may affect the few terrorists trying to enter the United States on visas, but they also affect honest people trying to do the same.

All scientific disciplines are international, and free and open information exchange -- both in conferences and in academic programs at universities -- will result in the maximum advance in the technologies vital to homeland security. The Soviet Union tried to restrict academic freedom along national lines, and it didn't do the country any good. We should try not to follow in those footsteps.

Friday, September 24, 2004

As the weekend approaches....

Your moment of Zen, courtesy OxBlog:

From a distinguished guest lecturer at YLS, on the different benefits of private practice vs. government work vs. academia:

Money can't buy you job satisfaction -- that is true -- but it can buy you lots of other cool stuff!

Thursday, September 23, 2004

Geometric Probability

Probability is a tricky beast, and even more tricky when it comes to analyzing geometric data. The problems arise from how to define accurate measures over geometric spaces; the Buffon needle problem is a classic example of needing to do this kind of calculation carefully. A short (but dense) monograph by Klain and Rota starts with the Buffon needle problem and delves deeper into notions of valuation (a variant of measure) and how to define them correctly, the key problem being the issue of invariance: how to define measures that are invariant under geometric transformations (read this review (PDF)).

To illustrate some examples of the "weirdness" of geometric probability, consider the following three results:

If we pick n points at random from the unit square, the expected size of the convex hull of these points is log n.
If we pick n points at random from a k-gon, the expected size of the convex hull is k log n
If we pick n points at random from a unit disk, the expected size of the convex hull is n^1/3 !

All of these results are collected in a nice manuscript by Sariel Har-Peled.

But a result that I found even more surprising is this one, by Golin, Langerman and Steiger.

An arrangement of n lines in R² induces a vertex set (all intersection points) whose convex hull has expected complexity O(1) !

Their result uses a fact that is well known (Feller vol. 2), but is still remarkably elegant.

Choose points x₁, x₂, ..., x_n at random on the unit interval. Compute their sorted order x₍₁₎, x₍₂₎, ... x_(n), the order statistic. Then the variable x(k) is distributed (roughly) as the sum of k i.i.d exponentially distributed random variables.

Specifically this means that the minimum is distributed as an exponential distribution, and the rest are distributed as the appropriate gamma distributions.

Bill Steiger recently gave a talk at the NYU Geometry seminar. I could not attend, but the abstract of his talk indicates that the above result holds even in R³.

Update: I should clarify that the O(1) result mentioned above was first proved by Devroye and Toussaint; their distribution on lines is different, and their proof is more complex. The above paper presents a simpler proof for an alternate distribution.

My internet fame score is

12 millibrooksies...

Lance, on the other hand, has a score of 47 mb

And Jeff has the whopping score of 760mb. But then again, who is Jeff ?

Wednesday, September 22, 2004

Ernie shows remarkable restraint...

A few months ago, Ernie (of 3D Pancakes) commented on the (now infamous) Annie Jacobsen episode by mentioning his own heart-stopping experience with the dark side of Islamic terrorism:

Like Annie Jacobsen, I recently took a long Northwest Airlines flight, #53 from Amsterdam to Detroit, on which I observed several Middle Eastern passengers. Including one who sat down immediately next to me! Blocking my access to the aisle! Who then began reading a book full of Arabic! And looking nervously around the plane! Oh my god, oh my god, help me, we're all going to DIE! EEAHEEAHEEAHEEAHEEAH!

As I girded my metaphorical loins in anticipation of poisoning my seatmate to death with the remains of my so-called "chicken" "dinner", she turns to me and asks where I'm from. Turns out she was flying to Nashville, my hometown, where she is a nursing student and her husband a mechanical engineer, after a two-month visit to her family in Tehran. We had a lovely chat about living in the South, and the current troubles in Iraq. She showed me several pictures of her family at various gatherings (a birthday party, her cousin's engagement), including several fascinating pictures of the ruins at Persepolis. We landed in Detroit without indicent [sic] and wished each other well.

As he mentions in the comments:

...she wasn't actually reading Arabic, but Farsi. (It was a book about Persepolis.) But the differences between the Farsi, Urdu, and Arabic alphabets are pretty subtle to uneducated westerners like myself, not to mention all those other languages that use Arabic scripts. For all I knew, she could have been reading Turkish or Kashmiri.

Indeed. Consider the case of the friendly folk of Midwest Airlines, who sadly didn't show the same level of restraint:

Midwest Airlines canceled a flight ready to take off for San Francisco after a passenger found Arabic-style handwriting in the company's in-flight magazine and alerted the crew.

The plane, carrying 118 passengers and five crew members, had already pulled away from the gate at Mitchell International Airport Sunday evening. It returned to the gate, the passengers got off, security authorities were notified, all luggage was checked and the aircraft was inspected. Nothing was found. The passengers were put up in nearby hotels and booked on a Monday morning flight.

The writing was in Farsi, the language used in Iran, said airline spokeswoman Carol Skornicka. She said she didn't know exactly what the writing said but was similar to a prayer, ``something of a contemplative nature.''

Monday, September 20, 2004

'Legs of Death'

I think we don't give Lance enough credit for blogging from the war-zone that is Chicago:

Chicago resident Guarav[sic] Bhatia, 25, is a man of science, a chemical engineer working on his second master's degree at the Illinois Institute of Technology.

Scientists are logical people. So he's puzzled about what happened to him on the CTA the other day.

"I've got a problem," he told me on Wednesday. "I'm accused of breaking the law."

Did you dare munch a jelly doughnut on a CTA train? I mentioned this because the CTA rents space to doughnut shops, but prohibits the munching of doughnuts by commuters.

"No, I'm accused of sleeping," Bhatia said. "And now I'm going to have to go to court!"

But wait ! There's more:(emphasis mine)

Officially, [the city is] sticking with the same story they had the other day, that he violated a CTA ordinance by obstructing the operation of a train.

The new spin is that he was not simply sleeping, something common on the CTA, but that he was--get this--sleeping dangerously.

Gaurav is now being called 'Legs of Death' Bhatia. We shall leave him with the last word:

As for "Legs of Death" Bhatia, he explained that he had his face against the train window, so he couldn't have stretched into the aisle, sleeping dangerously as it were.

"It would have been physically impossible," he said. "Even Keanu Reeves from `Matrix' could not do it."

(Via Sepia Mutiny)

Update: Sepia Mutiny announces that Gaurav has now beaten the rap !

The Research Is Not Funny !

From the Annals of Improbable Research:

"The research is not funny at all! ... Dr Stone has just completed initial fieldwork at South Georgia. His preliminary results show that no King Penguins toppled over when overflown by the Lynx helicopter."

This announcement brought to you by the British Antarctic Society.

Sunday, September 19, 2004

The self-absorption of science.

Many months ago, Scott Aaronson initiated a discussion on Lance's blog about the appropriate amount of time to spend on one's research (quick answer: all). Lance was quick to adjust this in a later post, saying "Your success in academics, like any professional endeavor, depends in part on how much effort you put into it with the relationship far more than linear. But by no means is social life and a productive research career incompatible."

I was reminded of this discussion whilst reading a book that excerpted small sections from Charles Darwin's autobiography. Here is Darwin reflecting on his absorption in science:

My mind seems to have become a kind of machine for grinding laws out of large discussions of facts...Up to the age of thirty, or beyond it, poetry of many kinds ... gave me great pleasure... Pictures gave me considerable, and music very great delight. But now for many years I cannot endure to read a line of poetry; I have tried lately to read Shakespeare and have found it so intolerably dull that it nauseated me... Music generally sets me thinking too energetically on what I have been at work on....

Darwin goes on to theorize that a lack of "poetic stimulation" as it were atrophied the sections of his brain that would have appreciated the arts, and felt that if he could have lived his life over again, he would have spent more time cultivating artistic pursuits, because

The loss of these tastes is a loss of happiness, and may be injurious to the intellect, and more probably to the moral character by enfeebling the emotional part of our nature.

Especially in more theoretical areas of computer science, it is the easiest thing in the world to become absorbed in your work: all you really need is a pen and paper. There is also a thrilling sense of abandonment, of being swallowed whole by a collection of lines, or a manifold, or a linear program, as one delves deeper and deeper into the mysteries of one's work. Darwin's sense of regret as he looks back is an interesting perspective on such a life of absorption. I don't think that everyone would agree, but it is a sentiment worth keeping in mind.

Saturday, September 18, 2004

Experimental Work in SODA

A reader writes:

Let me suggest another [topic]--the attitude towards experimental algorithmic work, which I think is a recurring issue. The CFP [for SODA] often advocates this theme but then the committee is likely to not appreciate so much the experimentations.

As theoreticians, it is probably not so impressive to us that one can beat the known worst-case analysis in certain data sets (particularly simulated data). Perhaps we are not used to evaluate experimental work and we rarely think about the kind of experiments that would significantly benefit our theoretical work.

David Johnson has written a Theoretician's Guide to the Experimental Evaluation of Algorithms (PDF)(link fixed). The first time I saw this, it was a short 7 page guide, and now it has expanded to a 36 page review. I highly recommend reading it, and keeping a copy around when you start planning your experiments. The article addresses all the points (and more) that I would have hoped to make, so I'll merely excerpt a few points:

1. Understand what your paper is about:
–

Implementation for a specific application
A 'horse-race' paper, comparing algorithm/implementation to others
Exploring the strengths/weaknesses etc of various algorithmic ideas in practice
Generating conjectures about average case behaviour when formal probabilistic analysis is too hard

The first two cases are the most common kinds of implementations, but it is the third one that David suggests is the most interesting from an experimental algorithmics perspective. Indeed, usually papers that combine the first two characteristics are best suited for the area the application came from, rather than a SODA/ALENEX....

2. What makes a paper interesting:

...the results should yield broader insights, either in terms of asymptotics or presumption of robustness.... (this is one way in which an experimental paper aspires to higher goals than a horse race paper...)

3. On reproducibility:

... what does "reproducibility" mean in the context of the experimental analysis of algorithms? In the strictest sense it means that if you ran the same code on the same instances on the same machine/compiler/operating system/system load combination you would get the same running time, operation counts, solution quality (or the same averages, in the case of a randomized algorithm).

This, however, would not be very informative, nor does it correspond to what is meant by "reproducibility" in classical scientific studies. In reproducing such a study, a scientist must use the same basic methods, but will typically use different apparatus, similar but distinct materials, and possibly different measuring techniques. He will be said to have reproduced the original results if the data he obtains is consistent with that of the original experiments and supports the same conclusions. In this way he helps confirm that the results of the original experiment (and the conclusions drawn from them) are independent of the precise details of the experiment itself.

It is this broader notion of reproducibility that I consider to be the most important.

4. On comparability:

You should write your papers (and to the extent possible make relevant data, code, and instances publicly available in a long-term manner) so that future researchers (yourself included) can accurately compare their results for new algorithms/instances to your results. Part of this is simply following the above recommendations for ensuring reproducibility, but there are additional steps necessary.

He goes on to list some issues: the uncalibrated machine, the lost testbed, lost code/data.

Finally, the article lists things you cannot trust:

1. Never trust a random number generator.
2. Never trust your code to be correct.
3. Never trust a previous author to have known all the literature.
4. Never trust your memory as to where you put that data (and how it was generated).
5. Never trust your computer to remain unchanged.
6. Never trust backup media or Web sites to remain readable indefinitely.

And most memorably,

7. Never trust a so-called expert on experimental analysis.

The advice in this paper, if taken, will lead to an excellent piece of experimental work. It does require a lot more work than merely coding up an algorithm, and I think this is something to keep in mind. I have come to realize that unlike when writing a theoretical paper, when the timeline from initial problem conception to paper writing can be as little as a few days depending on the problem, experimental work always takes a lot longer and requires a lot more care to do and to write about. It's worth keeping that in mind; don't expect a published work to come out of an implementation merely by the act of implementing.

Frame the question this way: what can someone take away from reading your paper ? In what way will it guide their work ? When is your empirical work applicable to their situation, and when not ? I actually don't think theory committees have great difficulty evaluating experimental work; it's that I think experimental algorithmics has some ways to go to catch up in quality to the (justified) expectations of it. Over the years, the quality of papers in ALENEX has steadily grown, and I see no reason why we can't get better.

A bit of historical context: SODA has always been interested in examining experimental work. This excerpt comes from the CFP for SODA 1990 (the first one); added emphasis is mine. You can't find it on the web; I had to parse a troff document from David !

Papers are solicited that, by mathematical or experimental analysis, address the ways in which resource usage grows with increasing problem size, with ``resource'' interpreted broadly to include, for instance, ``nearness to optimality'' as well as such traditional measures as running time, storage, number of processors, and amount of communication. Mathematical analysis may be worst-case or probabilistic. Lower bound results are appropriate if they apply to real algorithmic problems.

Experimental papers can address a variety of issues, but special consideration will be given to those that put theoretical results into practical perspective or suggest new avenues for theoretical investigation.

Friday, September 17, 2004

5-yr anniversaries

I just got my 5-year anniversary congratulations letter in the mail. For those who are curious, it starts off 'Dear Fellow Employee', and goes downhill from there. In what is possibly the most bizarre way of felicitating employees, I was asked to go to a website to "purchase" this letter (for free, thankfully), with all the associated "add to shopping cart" and "check out" activities.

As always, Dilbert has the best take on the situation:

Intellectuals and anti-intellectualism

As I spent more time in the US, an interesting paradox began to present itself to me. On the one hand, people (like me) came to this country because of its role as the driver of technological advancements, as well as its ability to nurture and sustain a dynamic intellectual culture in the academic system. On the other hand, it became more and more apparent that a streak of anti-intellectualism runs fairly close below the surface of American life, manifesting itself in such things as the attitude towards "smarts" in public schools, the focus on practical experience as opposed to "book-learning", and even in undercurrents of the 2000 and 2004 elections (It is not that such phenomena are lacking in the schools I grew up in; it just seemed contrary to what I had viewed the American education system as, before I came here)

One of the definitive texts that talked about this aspect of American life was Richard Hofstadter's Anti-Intellectualism in American Life. It traces back American history to tbe beginning of the colonies, and links this phenomenon to religious and cultural upheavals over the centuries. The book was written in 1966 and thus is limited in its analysis of the current state of affairs in this country, but had a number of insightful observations outlining what appeared to be a strong and long-running tradition (much more so that I had realized).

A more recent book by Franklin Furedi, 'Where Have All the Intellectuals Gone?', discusses the demise of the "intellectual" in modern society (where by 'intellectual' I infer that he means more than an 'academic'. An interesting point that it brings out is the connection between postmodern relativism and the democratization of knowledge in the destruction of the intellectual by paradoxically, placing him front and center. Here is an excerpt from Terry Eagleton's review in the New Statesman (via Oxblog):

A society obsessed with the knowledge economy, Furedi argues, is oddly wary of knowledge. This is because truth is no longer precious for its own sake. Indeed, the idea of doing something just for the hell of it has always put the wind up philistine utilitarians, from Charles Dickens's Mr Gradgrind to our own Mr Blair. At an earlier stage of capitalism, knowledge was not so vital for economic production; once it becomes so, it turns into a commodity, while critical intellectuals turn into submissive social engineers. Now, knowledge is valuable only when it can be used as an instrument for something else: social cohesion, political control, economic production. In a brilliant insight, Furedi claims that this instrumental downgrading of knowledge is just the flip side of postmodern irrationalism. The mystical and the managerial are secretly in cahoots.

In other words, when everyone is in the business of producing knowledge, the pursuit of knowledge becomes a business, in a way that is antithetical to what the pursuit of knowledge should be: something done "just for the hell of it".

In our everyday lives as researchers, we see examples of this all the time, especiallly when having to either justify working on a problem, or when finding problems to investigate. I would go as far as to claim that the loss of public stature of NASA after the moon landings is caused by a serious lack of vision, of the kind that can inspire and elevate a discourse, and a mindless focus on minutiae of shuttle missions and space stations.

Wednesday, September 15, 2004

Academic classifications...

Erdös's vocabulary for people is famous:

Epsilon: a child
Joe/Sam: Soviet Union/USA
Bosses/Slaves: Wives/Husbands
Dead: Stopped doing math

and so on...

A funny description of economists (via this comment on danieldrezner.com):

Rudi [Dornbusch] was a very funny, colorful speaker of English, with a gift for offbeat but perfect turns of phrase. He had a classification system for economists, depending on their research style. "Goldsmiths" were careful, meticulous workers - which Rudi admired. "Pigs" just sort of jumped into an issue and wallowed around. But that was OK too, if it was done with sufficient vigor and originality. Rudi described Larry Summers as a "fearful pig" - and it was a compliment.

On the other hand, "plumbers" were economists who devised intricate contraptions with no clear purpose. I won't tell you who he described as "dreadful plumbers", but he was right.

Some other interesting modes of academic research:

craftsman: worked alone, or with one or two colleagues, to carefully write papers and books.
bureaucrat: have grants, research assistants and a large network of co-authors. This kind of scholar is more like an architect - he designs the overall project, but an army of helpers puts together the final project.
recycler: academics who come up with one big theoretical idea, and then try to use that idea to explain every possible phenomenon under the sun.
importer: academic who engages in intellectual arbitrage. They develop an expertise outside their disciplinary boundaries, and then import the ideas, paradigms, and analytical tools culled from these outside areas to explain phenomenon within their discipline.
theocrat: This academic develops (possibly through a "Bureaucrat" research process) a belief system which then not only infuses everything he, but also causes him to dismiss the work of others. Those other academics, he argues, are wrong and (1) need to study his work more; (2) haven't studied enough to realize he's right; and/or (3) have attempted to replicate his work but aren't skilled enough to do so.
compiler: the academic whose most important contribution is to assemble a textbook, which puts his field all in one place at the fingertips of his colleagues. Often this involves taking dense academic research done by others, and making it tractable. 15 editions later, he is universally cited as one of the Greats of his time.

And my own contribution:

hoarder: has access to some key piece of data/equipment/resource that is hard-to-impossible to recreate, and exchanges access to it for authorship on papers.

I should add that not all of these are pejorative labels. Recycling and importing can be very fruitful research modes, just to name a few.

Tuesday, September 14, 2004

Theory Meets Hollywood

Benjamin Rosenbaum (via BoingBoing) develops a theory of Bacon-Erdös numbers:

...the preliminary work has already been done. Brian Greene, for instance, has an Erdös number of 3, and a Bacon number of 2. Thus, my proposed conversion function (allowing edges in the unified Bacon-Erdös graph to represent two people either appearing together in a movie or coauthoring a paper) is as follows:

Finding: an actor with a Bacon number of N has, at most, a Baconized Erdös number of N+5. Similarly, an academic with an Erdös number of M has, at most, an Erdösinated Bacon number of M + 5.

Boo-yah !

10:34 AM, Tuesday 9/13/04:

3.5 years.....

Friday, September 10, 2004

PCs, new folks, and new authors: some data

My post about SODA generated all kinds of comments: further discussion is in the comment threads. One point that merited another post:

A commenter complained:

I have noticed that most of the accepted papers in SODA/STOC/FOCS seem to be from one of the IVY league schools or from one of the established Labs...

Also, why in the world do we see the same set of names on program committees with little/no permutations.

I'll address both these points in some detail, accompanying my comments with statistics provided by Adam Buchsbaum (SODA 2005 PC Chair). First, let's look at the complaint about author diversity:

1. Diversity of authors:
What follows is a frequency chart of affiliations for a random sample of 27 (20%) of the accepted papers at SODA 2005:

6 max plank inst.
5 mit
4 u. illinois (u-c), tel aviv u.
3 u. bergen (norway), technion, hebrew u.
2 uc berkeley, u. waterloo, u. paris-sud, rutgers, ohio state u.
1 utrecht u., unsw (sydney), uc irvine, u. tokyo, u. leeds, u. glasgow, u. chicago, u. aarhus, stanford, simon fraser u., microsoft, kings college, it u. copenhagen, eth zurich, comenius u., cmu, christian-albrect u., charles u., acad. sci. czech republic

seems fairly diverse to me. It should not be hard to compile such stats for other years/conferences as well...

Next, let's look at the issue of recycling commitee members:

2. PC "freshness":

There is a perception that newer folks have a hard time getting onto S/S/F committees. To remedy this, David Johnson, at the SODA 2004 business meeting, introduced a list of 'neverbeens'. This list is defined somewhat roughly as

People who have never served on a S/S/F committee and plausibly could, where "plausibly could" includes things like

out of school for at least some small amount of time

some reasonable number (>= 1, more is better) of conference papers

regular attendance at such conferences

He also requested that people who want to be on this list should email him. He then provides this list to commitee chairs to do with as they please.

But in reality, how skewed towards "frequent members" are program committees in reality ? Let's look at some numbers in detail.

Freshness of PCs:
For SODA 2005, roughly 1/2 (my earlier comment had said 1/3) of the PC members are new i.e have not served on a SODA/STOC/FOCS committee before (disclaimer: this set includes me). For SODA 2004, the corresponding number appears to be 1/3. Both numbers are reasonable, and might demarcate extreme ends of the "right" ratio.

Historical PC composition (all years):

1. SODA
Served exactly once: 130 people
Served exactly twice: 34
Served three times: 13
Served four times: 1

2. FOCS
1 time: 140
2: 52
3: 24
4: 14
5: 9
6: 4

3. STOC
1: 139
2: 49
3: 19
4: 8
5: 5
6: 2
8: 2

Overall: (SODA U FOCS U STOC)
1: 200
2: 84
3: 55
4: 34
5: 29
6: 5
7: 7
8: 9
9: 11
11: 1
12: 2
16: 1

(My take: SODA is better than STOC/FOCS at integrating new people, but not hugely so)

Adam further points out:

Translating these statistics to reflect recidivism in filling slots reveals that for each of the three conferences, a majority of the PC slots were filled by then-first timers. For all conferences as a group, the majority was filled by 1st- or 2nd-timers.

It thus seems to me that the perception of incestuousness among
FOCS, STOC, and SODA PCs is in fact a mis-perception. Still,
we have to address the mis-perception by ongoing action, not
simply historical statistics. David Johnson's maintenance of
a list of people who have never served on STOC, FOCS, and SODA
committees and a continuing effort to include new people are
positive forces.

Data is a good thing... If you don't like the stats here, do your own digging and I will post the results here.

The power of baseball...

So Tuesday I went to my first live baseball game, a Yankees game no less, courtesy of Adam. It was quite the cultural experience: I had 2 huge hot dogs, stood for the American National Anthem, stretched at the 7th inning stretch, even took cover from a ~~whizzing fly ball~~ screaming line drive (ed. baseball fans are picky...). Not to mention a full complement of home runs, double plays, intentional walks, grounds staff dancing to YMCA, and even a collision that left the catcher comatose for a few minutes.

The Yankees won 11-2 over Tampa, so there was good feeling all around. Clearly more good feeling than I expected....because....

Yesterday I got a letter from the Department of Homeland Security (nothing is guaranteed to strike more fear into a poor non-immigrant's heart) that said:

Your application for permanent resident status has been approved. Please take this notice, your Arrival/Departure rec....zzzzzz

As one might imagine, I tuned out at that point. I now have a green card !!! And I dare anyone to try and convince me that visiting a bonafide icon of the American cultural landscape had nothing to do with this: isn't this what the Patriot Act is for ?

Thursday, September 09, 2004

Data collection...

Chandra asks:

It would be nice to know by some data analysis if we
have more people submitting papers or more papers
being submitted.

and says:

Good survey papers are really needed. CRC handbooks
also seem to do some of this indirectly although I wish the authors put up their articles on the web - who is going to pay hefty sums for these gargantuan sized books?

So I have two responses to this:

firstly, on the data collection issue, it is not as hard as one thinks. the data is available: all we need are PC chairs/members willing to extract and anonymize data appropriately, so no confidentiality is breached (do I really want people to know that I submitted 15 papers to SODA last year and they were all rejected?).

Secondly, although it is not really academically rigorous, I think that even blogs have a small role to play in helping spread the word about interesting developments in the field. Lance has been posting surveys about interesting papers/areas for a long time now, and my recent post on Arrow's theorem forced me to learn at least a bit more about social choice theory than I previously knew (and hopefully had something useful for readers).

I don't think blogs can replace rigorous reviewed surveys though, and somehow ACM Computing Surveys doesn't have the cachet of the various math review journals. There was some site (not arxiv.org) that allowed people to discuss the papers that were posted - that's an interesting idea as well.

What I feel (and people who've been around longer can correct me on this) is that we are at a volume level where we need to explore new strategies for disseminating (as opposed to publishing) research. Chandra makes a good point when he says that people are too busy publishing to write surveys (which are essentially useless from the point of view of tenure/grant commitees etc). Maybe all the tenured faculty out there need to step up :)

Update: Jeff Erickson deservedly swats me upside the head with a copy of Geometric Range Searching and its Relatives (incidentally one of the most useful surveys that I have ever read), and points me to (at least for geometry) a good collection of surveys maintained by Sariel Har-Peled. But of course we knew that geometry folks were ahead of the curve...

More SODA...

The review process was interesting for me: the discussions and reviews exposes one to a diversity of opinion and perspectives that one doesn't always get in one-on-one discussions, and there is a lot to learn.

Probably even more importantly, it gives me a better sense of what kinds of papers get into SODA. With the competition being what it is (this year I believe is a record low for acceptance), the goal changes from finding papers to accept to finding papers to reject, which means that papers have be a lot more polished, a lot more relevant, and a lot more interesting (at least to some subset of reviewers) than before. A paper that is overall decent has a tough battle, because of the sheer size of the submitted set.

The main problem of scale we face (and have been doing so for a few years now), is the reviewer load, (and at 65+ papers/person, we do have an issue). The question that really comes up is: is it that the (roughly) same set of people are writing more papers, or is it that there are just more people in the SODA community ? The answer is important, because it governs how we address the issue of reviewer load.

If there are overall more people entering the community, then it makes sense to continue to increase commitee sizes, expand conference durations and/or number of papers accepted (by shortening presentation times) and other such strategies. If it is that people are writing more papers, it is less clear what one can do about this, other than starting new conferences/journals...

Returning to the issue of reviewer load, I recently heard two different radical strategies that attempt to deal with different aspects of the problem.

* To handle the case of recycled papers that go from conference to conference in the hope of finding reviewers that haven't seen them:
- Create a repository where all submitted papers are registered. When a paper is submitted to a conference, a link is set up, and then when reviews return, they are filed with the paper. If the paper is submitted again, the reviews are there to see.

Main cons: public flagellation of papers is never a good idea, and authors could always create new entries to defeat this scheme

* To handle the issue of review quality and the perception of unfairness
- once papers have been accepted, make reviews for accepted papers public (anonymously).

The Rieman Hypothesis is the key to the Apocalypse

Via Not Even Wrong, an excellent synopsis of progress to date on the Poincare Conjecture.

Also, a link to a truly pointless article in the Guardian. I can sympathize with the plight of science writers; explaining some of the more abstract concepts in mathematics can be extremely difficult. But can you at least not be completely wrong ?

[If]... somebody really has cracked the so-called Riemann hypothesis, financial disaster might follow. Suddenly all cryptic codes could be breakable. No internet transaction would be safe.

Sigh.. later on comes this attempt to explain the first of the seven Millenium problems:

Birch and Swinnerton-Dyer conjecture: Euclid geometry for the 21st century, involving things called abelian points and zeta functions and both finite and infinite answers to algebraic equations

Why even bother...

As an aside, does anyone know why all of a sudden there is renewed interest in the Poincare conjecture ? There was a survey at least 3-4 months ago in the Scientific American about Perelman's work, and his first preprint was posted in 2002.

Wednesday, September 08, 2004

Voting and Geometry

It would be remiss of me (this is the Geomblog, after all) not to remark on the connection between voting and geometry pointed out by a commenter on my earlier post.

Donald Saari has developed an interesting theory of rankings based on mapping rankings to points in simplices. An immediate application of this formulation is the observation that the Kemeny rule determines a ranking minimizes the l₁ distance to the original ranking schemes.

Getting something named after you...

In an earlier post, I talked about the Condorcet criterion. As it turns out, this was discovered by Llull nearly 500 years before Condorcet formulated it.

Voronoi diagrams were first invented by Descartes, and then by Snow, and then by Dirichlet, before being invented by Voronoi.

The Koebe-Andreev-Thurston theorem relating circle packings to planar graphs was first proved by Koebe in 1936. Not knowing this result, Andreev re-proved it in 1970, and subsequently Thurston, who had proved it again, realized that his results followed from Andreev's work. All of this and more on circle packings is described here (PDF).

There are probably many other examples that establish the following meta-theorem.

To get a result named after you, be the last person to prove it.

A corollary might very well be:

To get a result named after you, publicize it everywhere.

SODA 2005...

results are out. The authors have been notified and ~~a list should be available soon~~ is here. A total of 135 accepted papers (the maximum possible given time constraints) out of roughly 490 submissions.

Arrow's Theorem, Voting, and Ranking Schemes.

In the light of Lance's post about voting schemes, it seems a good time to mention Arrow's theorem, the "impossibility theorem" for voting schemes. Arrow's theorem (and the related area of social choice theory) have become useful tools in computer science as well over the past few years.

The setting for Arrow's theorem is a set of rankings. Individuals (voters) order a set of options (candidates) via preferences, and the goal of a ranking scheme (voting system) is to come up with a preference list that respects certain properties:

universality: the output should be a total order
non-imposition: every possible order should be achievable
non-dictatorship: the global preference should not merely follow one of the input rankings
monotonicity: raising the ranking of a choice cannot hurt it in the overall rankings
independence of irrelevant alternatives: the "spoiler" condition - rankings of options outside a given subset should not affect the ordering within the subset.

Arrow showed that all of these cannot be satisfied simultaneously (with at least two voters and three options). Incidentally, Arrow proved this in his Ph.D thesis, and it is one of the contributions mentioned in his Nobel Prize citation from 1972. His major work is most notably in the realm of pricing, and the Arrow-Debreu theorem on market pricing is a fundamental result that is familiar to people working in the area of mechanism design and auctions.

But social choice theory itself has been exploited in a CS context. To understand this, we need another notion, the Condorcet criterion.

One of the criticisms of Arrow's theorem is that the "spoiler" condition is too stringent, and other (weaker) conditions have been proposed that allow for a general voting procedure to exist. The most notable scheme is one that was developed nearly 800 years ago, and was rediscovered by Condorcet in the 18th century (which leads to a topic for another post). The idea is to create a ranking by looking at a series of face-offs. The relative ranking of options A and B is determined by seeing how many voters rank A above B, and taking the majority opinion. This is done for all pairs.

The Condorcet winner is then the option that in a head-to-head matchup beats all other options. Note that a Condorcet winner may not always exist, and this is the problem with using the Condorcet method in its basic form for elections.

The Condorcet criterion is a condition on a ranking scheme.

If any element beats all other elements in head-to-head matchups, it should be ranked first.

A generalization of this, the extended Condorcet criterion, states that:

If there is a partition (C, D) of the set S of options such that for each x in C and each y in D, x beats y head-to-head, then x should be ranked above y.

So what does all of this have to with computer science ? The application comes from the problem of merging ranked lists. The most common example of this arises when doing meta searches in a number of search engines. If Google ranks pages a certain way and Teoma does it a different way, how should the metasearch engine present results ?

The Condorcet criteria provide a condition that any reasonable ranking scheme must satisfy. The algorithms are generated by defining a metric on rankings, after which the consensus ranking is a good "center point" in the induced metric space. Specifically, if we define the "distance" between two rankings as the "bubble-sort" distance between them, or the number of pairs on which they disagree, then the Kemeny optimal ranking is the ranking that minimizes the average distance to all the input rankings. A nice property of the Kemeny optimal ranking is that it is the unique ranking satisfying the extended Condorcet criterion while having other desirable properties as well.

All of this is explained in some detail in the pioneering paper by Dwork, Kumar, Naor and Sivakumar. When I was at VLDB I saw more examples of these methods being used to merge the results of multiple rankings. I should add that the problem of merging different rankings comes up in many different settings, so it is worth knowing the background in social choice theory.

Monday, September 06, 2004

NIH Proposes Free Access For Public to Research Data

This opens up a new area in the evolving landscape of post-internet publishing:

The National Institutes of Health has proposed a major policy change that would require all scientists who receive funding from the agency to make the results of their research available to the public for free.

The proposal, posted on the agency's Web site late Friday and subject to a 60-day public comment period, would mark a significant departure from current practice, in which the scientific journals that publish those results retain control over that information. Subscriptions to those journals can run into the thousands of dollars. Nonsubscribers wishing to get individual articles must typically pay about $30 each -- fees that can quickly add up for someone trying to learn about a newly diagnosed disease in the family.

An interesting idea: Since most CS funding also comes from the NSF and the DoD (both tax-funded agencies), such an idea could easily float our way.

Sunday, September 05, 2004

Sneakernet is validated !

A new article in the NYTimes confirms what we all learnt in networking 101:

...the problem will be familiar to anyone who ever spent time shrinking a digital photograph before trying to send it over the Internet through a dial-up connection. It would be much easier to drive a truck of photo albums across town or put them in an overnight-mail box than to go through the process of scanning and shrinking each photo.

The discussion came up in the discussion of interstellar communication. People at SETI were not amused:

The paper, Nature's cover article, is being received with bemusement by veterans of the Search for Extraterrestrial Intelligence, or SETI.

Dr. Paul Horowitz, a Harvard physicist and SETI expert, called it "a fun and an enjoyable read, but I wouldn't turn off my radio telescope and go out with my butterfly net."

Pages