Monday, September 27, 2004

Knowledge via democracy

Pharyngula has been frothing at the mouth (quite justifiably) about the latest issue of Wired that covers the intelligent design controversies raging in the public school system. As he points out, the core strategy of the (un)ID folks is to create a debate and then use its existence to establish legitimacy: "see, we are disagreeing, as rational people will about a controversial topic".
The debate's two-on-two format, with its appearance of equal sides, played right into the ID strategy—create the impression that this very complicated issue could be seen from two entirely rational yet opposing views. "This is a controversial subject," Meyer told the audience. "When two groups of experts disagree about a controversial subject that intersects with the public-school science curriculum, the students should be permitted to learn about both perspectives. We call this the 'teach the controversy'
approach."[1]
What is saddest is comments like this coming from school board members:
Another board member, Deborah Owens-Fink, declares the issue already closed. "We've listened to experts on both sides of this for three years," she says. Ultimately, the question of what students should learn "is decided in a democracy, not by any one group of experts."
We know what happens to the value of pi when democracy decides knowledge...

Notes:
  1. Larry Kraus was interviewed in Scientific American about his experiences dealing with the school board mentioned above.
  2. I find it even more tragic that at a time when a grounding in biology appears to be key to having a part in the biotech revolution, these people want to gut the foundations of scientific education in this country.

Saturday, September 25, 2004

Academics locked out by tight visa controls

It may be obviously true, but it needs to be said, and I am glad more people are saying it. Here is Bruce Schneier in the San Jose Mercury News:
Security is always a trade-off, and specific security countermeasures affect everyone, both the bad guys and the good guys. The new U.S. immigration rules may affect the few terrorists trying to enter the United States on visas, but they also affect honest people trying to do the same.

All scientific disciplines are international, and free and open information exchange -- both in conferences and in academic programs at universities -- will result in the maximum advance in the technologies vital to homeland security. The Soviet Union tried to restrict academic freedom along national lines, and it didn't do the country any good. We should try not to follow in those footsteps.

Friday, September 24, 2004

As the weekend approaches....

Your moment of Zen, courtesy OxBlog:
From a distinguished guest lecturer at YLS, on the different benefits of private practice vs. government work vs. academia:

Money can't buy you job satisfaction -- that is true -- but it can buy you lots of other cool stuff!

Thursday, September 23, 2004

Geometric Probability

Probability is a tricky beast, and even more tricky when it comes to analyzing geometric data. The problems arise from how to define accurate measures over geometric spaces; the Buffon needle problem is a classic example of needing to do this kind of calculation carefully. A short (but dense) monograph by Klain and Rota starts with the Buffon needle problem and delves deeper into notions of valuation (a variant of measure) and how to define them correctly, the key problem being the issue of invariance: how to define measures that are invariant under geometric transformations (read this review (PDF)).

To illustrate some examples of the "weirdness" of geometric probability, consider the following three results:
  1. If we pick n points at random from the unit square, the expected size of the convex hull of these points is log n.
  2. If we pick n points at random from a k-gon, the expected size of the convex hull is k log n
  3. If we pick n points at random from a unit disk, the expected size of the convex hull is n1/3 !
All of these results are collected in a nice manuscript by Sariel Har-Peled.

But a result that I found even more surprising is this one, by Golin, Langerman and Steiger.
An arrangement of n lines in R2 induces a vertex set (all intersection points) whose convex hull has expected complexity O(1) !
Their result uses a fact that is well known (Feller vol. 2), but is still remarkably elegant.
Choose points x1, x2, ..., xn at random on the unit interval. Compute their sorted order x(1), x(2), ... x(n), the order statistic. Then the variable x(k) is distributed (roughly) as the sum of k i.i.d exponentially distributed random variables.
Specifically this means that the minimum is distributed as an exponential distribution, and the rest are distributed as the appropriate gamma distributions.

Bill Steiger recently gave a talk at the NYU Geometry seminar. I could not attend, but the abstract of his talk indicates that the above result holds even in R3.



Update: I should clarify that the O(1) result mentioned above was first proved by Devroye and Toussaint; their distribution on lines is different, and their proof is more complex. The above paper presents a simpler proof for an alternate distribution.

My internet fame score is

12 millibrooksies...

Lance, on the other hand, has a score of 47 mb

And Jeff has the whopping score of 760mb. But then again, who is Jeff ?

Wednesday, September 22, 2004

Ernie shows remarkable restraint...

A few months ago, Ernie (of 3D Pancakes) commented on the (now infamous) Annie Jacobsen episode by mentioning his own heart-stopping experience with the dark side of Islamic terrorism:

Like Annie Jacobsen, I recently took a long Northwest Airlines flight, #53 from Amsterdam to Detroit, on which I observed several Middle Eastern passengers. Including one who sat down immediately next to me! Blocking my access to the aisle! Who then began reading a book full of Arabic! And looking nervously around the plane! Oh my god, oh my god, help me, we're all going to DIE! EEAHEEAHEEAHEEAHEEAH!

As I girded my metaphorical loins in anticipation of poisoning my seatmate to death with the remains of my so-called "chicken" "dinner", she turns to me and asks where I'm from. Turns out she was flying to Nashville, my hometown, where she is a nursing student and her husband a mechanical engineer, after a two-month visit to her family in Tehran. We had a lovely chat about living in the South, and the current troubles in Iraq. She showed me several pictures of her family at various gatherings (a birthday party, her cousin's engagement), including several fascinating pictures of the ruins at Persepolis. We landed in Detroit without indicent [sic] and wished each other well.

As he mentions in the comments:
...she wasn't actually reading Arabic, but Farsi. (It was a book about Persepolis.) But the differences between the Farsi, Urdu, and Arabic alphabets are pretty subtle to uneducated westerners like myself, not to mention all those other languages that use Arabic scripts. For all I knew, she could have been reading Turkish or Kashmiri.
Indeed. Consider the case of the friendly folk of Midwest Airlines, who sadly didn't show the same level of restraint:
Midwest Airlines canceled a flight ready to take off for San Francisco after a passenger found Arabic-style handwriting in the company's in-flight magazine and alerted the crew.

The plane, carrying 118 passengers and five crew members, had already pulled away from the gate at Mitchell International Airport Sunday evening. It returned to the gate, the passengers got off, security authorities were notified, all luggage was checked and the aircraft was inspected. Nothing was found. The passengers were put up in nearby hotels and booked on a Monday morning flight.

The writing was in Farsi, the language used in Iran, said airline spokeswoman Carol Skornicka. She said she didn't know exactly what the writing said but was similar to a prayer, ``something of a contemplative nature.''

Monday, September 20, 2004

'Legs of Death'

I think we don't give Lance enough credit for blogging from the war-zone that is Chicago:
Chicago resident Guarav[sic] Bhatia, 25, is a man of science, a chemical engineer working on his second master's degree at the Illinois Institute of Technology.

Scientists are logical people. So he's puzzled about what happened to him on the CTA the other day.

"I've got a problem," he told me on Wednesday. "I'm accused of breaking the law."

Did you dare munch a jelly doughnut on a CTA train? I mentioned this because the CTA rents space to doughnut shops, but prohibits the munching of doughnuts by commuters.

"No, I'm accused of sleeping," Bhatia said. "And now I'm going to have to go to court!"
But wait ! There's more:(emphasis mine)
Officially, [the city is] sticking with the same story they had the other day, that he violated a CTA ordinance by obstructing the operation of a train.

The new spin is that he was not simply sleeping, something common on the CTA, but that he was--get this--sleeping dangerously.
Gaurav is now being called 'Legs of Death' Bhatia. We shall leave him with the last word:
As for "Legs of Death" Bhatia, he explained that he had his face against the train window, so he couldn't have stretched into the aisle, sleeping dangerously as it were.

"It would have been physically impossible," he said. "Even Keanu Reeves from `Matrix' could not do it."




(Via Sepia Mutiny)

Update: Sepia Mutiny announces that Gaurav has now beaten the rap !

The Research Is Not Funny !

From the Annals of Improbable Research:

"The research is not funny at all! ... Dr Stone has just completed initial fieldwork at South Georgia. His preliminary results show that no King Penguins toppled over when overflown by the Lynx helicopter."

This announcement brought to you by the British Antarctic Society.

Sunday, September 19, 2004

The self-absorption of science.

Many months ago, Scott Aaronson initiated a discussion on Lance's blog about the appropriate amount of time to spend on one's research (quick answer: all). Lance was quick to adjust this in a later post, saying "Your success in academics, like any professional endeavor, depends in part on how much effort you put into it with the relationship far more than linear. But by no means is social life and a productive research career incompatible."

I was reminded of this discussion whilst reading a book that excerpted small sections from Charles Darwin's autobiography. Here is Darwin reflecting on his absorption in science:
My mind seems to have become a kind of machine for grinding laws out of large discussions of facts...Up to the age of thirty, or beyond it, poetry of many kinds ... gave me great pleasure... Pictures gave me considerable, and music very great delight. But now for many years I cannot endure to read a line of poetry; I have tried lately to read Shakespeare and have found it so intolerably dull that it nauseated me... Music generally sets me thinking too energetically on what I have been at work on....
Darwin goes on to theorize that a lack of "poetic stimulation" as it were atrophied the sections of his brain that would have appreciated the arts, and felt that if he could have lived his life over again, he would have spent more time cultivating artistic pursuits, because
The loss of these tastes is a loss of happiness, and may be injurious to the intellect, and more probably to the moral character by enfeebling the emotional part of our nature.
Especially in more theoretical areas of computer science, it is the easiest thing in the world to become absorbed in your work: all you really need is a pen and paper. There is also a thrilling sense of abandonment, of being swallowed whole by a collection of lines, or a manifold, or a linear program, as one delves deeper and deeper into the mysteries of one's work. Darwin's sense of regret as he looks back is an interesting perspective on such a life of absorption. I don't think that everyone would agree, but it is a sentiment worth keeping in mind.

Saturday, September 18, 2004

Experimental Work in SODA

A reader writes:
Let me suggest another [topic]--the attitude towards experimental algorithmic work, which I think is a recurring issue. The CFP [for SODA] often advocates this theme but then the committee is likely to not appreciate so much the experimentations.

As theoreticians, it is probably not so impressive to us that one can beat the known worst-case analysis in certain data sets (particularly simulated data). Perhaps we are not used to evaluate experimental work and we rarely think about the kind of experiments that would significantly benefit our theoretical work.
David Johnson has written a Theoretician's Guide to the Experimental Evaluation of Algorithms (PDF)(link fixed). The first time I saw this, it was a short 7 page guide, and now it has expanded to a 36 page review. I highly recommend reading it, and keeping a copy around when you start planning your experiments. The article addresses all the points (and more) that I would have hoped to make, so I'll merely excerpt a few points:

1. Understand what your paper is about:
–
  • Implementation for a specific application
  • A 'horse-race' paper, comparing algorithm/implementation to others
  • Exploring the strengths/weaknesses etc of various algorithmic ideas in practice
  • Generating conjectures about average case behaviour when formal probabilistic analysis is too hard
The first two cases are the most common kinds of implementations, but it is the third one that David suggests is the most interesting from an experimental algorithmics perspective. Indeed, usually papers that combine the first two characteristics are best suited for the area the application came from, rather than a SODA/ALENEX....

2. What makes a paper interesting:

...the results should yield broader insights, either in terms of asymptotics or presumption of robustness.... (this is one way in which an experimental paper aspires to higher goals than a horse race paper...)
3. On reproducibility:
... what does "reproducibility" mean in the context of the experimental analysis of algorithms? In the strictest sense it means that if you ran the same code on the same instances on the same machine/compiler/operating system/system load combination you would get the same running time, operation counts, solution quality (or the same averages, in the case of a randomized algorithm).

This, however, would not be very informative, nor does it correspond to what is meant by "reproducibility" in classical scientific studies. In reproducing such a study, a scientist must use the same basic methods, but will typically use different apparatus, similar but distinct materials, and possibly different measuring techniques. He will be said to have reproduced the original results if the data he obtains is consistent with that of the original experiments and supports the same conclusions. In this way he helps confirm that the results of the original experiment (and the conclusions drawn from them) are independent of the precise details of the experiment itself.

It is this broader notion of reproducibility that I consider to be the most important.
4. On comparability:
You should write your papers (and to the extent possible make relevant data, code, and instances publicly available in a long-term manner) so that future researchers (yourself included) can accurately compare their results for new algorithms/instances to your results. Part of this is simply following the above recommendations for ensuring reproducibility, but there are additional steps necessary.
He goes on to list some issues: the uncalibrated machine, the lost testbed, lost code/data.

Finally, the article lists things you cannot trust:
1. Never trust a random number generator.
2. Never trust your code to be correct.
3. Never trust a previous author to have known all the literature.
4. Never trust your memory as to where you put that data (and how it was generated).
5. Never trust your computer to remain unchanged.
6. Never trust backup media or Web sites to remain readable indefinitely.
And most memorably,

7. Never trust a so-called expert on experimental analysis.

The advice in this paper, if taken, will lead to an excellent piece of experimental work. It does require a lot more work than merely coding up an algorithm, and I think this is something to keep in mind. I have come to realize that unlike when writing a theoretical paper, when the timeline from initial problem conception to paper writing can be as little as a few days depending on the problem, experimental work always takes a lot longer and requires a lot more care to do and to write about. It's worth keeping that in mind; don't expect a published work to come out of an implementation merely by the act of implementing.

Frame the question this way: what can someone take away from reading your paper ? In what way will it guide their work ? When is your empirical work applicable to their situation, and when not ? I actually don't think theory committees have great difficulty evaluating experimental work; it's that I think experimental algorithmics has some ways to go to catch up in quality to the (justified) expectations of it. Over the years, the quality of papers in ALENEX has steadily grown, and I see no reason why we can't get better.

A bit of historical context: SODA has always been interested in examining experimental work. This excerpt comes from the CFP for SODA 1990 (the first one); added emphasis is mine. You can't find it on the web; I had to parse a troff document from David !
Papers are solicited that, by mathematical or experimental analysis, address the ways in which resource usage grows with increasing problem size, with ``resource'' interpreted broadly to include, for instance, ``nearness to optimality'' as well as such traditional measures as running time, storage, number of processors, and amount of communication. Mathematical analysis may be worst-case or probabilistic. Lower bound results are appropriate if they apply to real algorithmic problems.

Experimental papers can address a variety of issues, but special consideration will be given to those that put theoretical results into practical perspective or suggest new avenues for theoretical investigation.

Friday, September 17, 2004

5-yr anniversaries

I just got my 5-year anniversary congratulations letter in the mail. For those who are curious, it starts off 'Dear Fellow Employee', and goes downhill from there. In what is possibly the most bizarre way of felicitating employees, I was asked to go to a website to "purchase" this letter (for free, thankfully), with all the associated "add to shopping cart" and "check out" activities.

As always, Dilbert has the best take on the situation:


Intellectuals and anti-intellectualism

As I spent more time in the US, an interesting paradox began to present itself to me. On the one hand, people (like me) came to this country because of its role as the driver of technological advancements, as well as its ability to nurture and sustain a dynamic intellectual culture in the academic system. On the other hand, it became more and more apparent that a streak of anti-intellectualism runs fairly close below the surface of American life, manifesting itself in such things as the attitude towards "smarts" in public schools, the focus on practical experience as opposed to "book-learning", and even in undercurrents of the 2000 and 2004 elections (It is not that such phenomena are lacking in the schools I grew up in; it just seemed contrary to what I had viewed the American education system as, before I came here)

One of the definitive texts that talked about this aspect of American life was Richard Hofstadter's Anti-Intellectualism in American Life. It traces back American history to tbe beginning of the colonies, and links this phenomenon to religious and cultural upheavals over the centuries. The book was written in 1966 and thus is limited in its analysis of the current state of affairs in this country, but had a number of insightful observations outlining what appeared to be a strong and long-running tradition (much more so that I had realized).

A more recent book by Franklin Furedi, 'Where Have All the Intellectuals Gone?', discusses the demise of the "intellectual" in modern society (where by 'intellectual' I infer that he means more than an 'academic'. An interesting point that it brings out is the connection between postmodern relativism and the democratization of knowledge in the destruction of the intellectual by paradoxically, placing him front and center. Here is an excerpt from Terry Eagleton's review in the New Statesman (via Oxblog):
A society obsessed with the knowledge economy, Furedi argues, is oddly wary of knowledge. This is because truth is no longer precious for its own sake. Indeed, the idea of doing something just for the hell of it has always put the wind up philistine utilitarians, from Charles Dickens's Mr Gradgrind to our own Mr Blair. At an earlier stage of capitalism, knowledge was not so vital for economic production; once it becomes so, it turns into a commodity, while critical intellectuals turn into submissive social engineers. Now, knowledge is valuable only when it can be used as an instrument for something else: social cohesion, political control, economic production. In a brilliant insight, Furedi claims that this instrumental downgrading of knowledge is just the flip side of postmodern irrationalism. The mystical and the managerial are secretly in cahoots.
In other words, when everyone is in the business of producing knowledge, the pursuit of knowledge becomes a business, in a way that is antithetical to what the pursuit of knowledge should be: something done "just for the hell of it".

In our everyday lives as researchers, we see examples of this all the time, especiallly when having to either justify working on a problem, or when finding problems to investigate. I would go as far as to claim that the loss of public stature of NASA after the moon landings is caused by a serious lack of vision, of the kind that can inspire and elevate a discourse, and a mindless focus on minutiae of shuttle missions and space stations.

Wednesday, September 15, 2004

Academic classifications...

Erdös's vocabulary for people is famous:
  • Epsilon: a child
  • Joe/Sam: Soviet Union/USA
  • Bosses/Slaves: Wives/Husbands
  • Dead: Stopped doing math
and so on...

A funny description of economists (via this comment on danieldrezner.com):

Rudi [Dornbusch] was a very funny, colorful speaker of English, with a gift for offbeat but perfect turns of phrase. He had a classification system for economists, depending on their research style. "Goldsmiths" were careful, meticulous workers - which Rudi admired. "Pigs" just sort of jumped into an issue and wallowed around. But that was OK too, if it was done with sufficient vigor and originality. Rudi described Larry Summers as a "fearful pig" - and it was a compliment.

On the other hand, "plumbers" were economists who devised intricate contraptions with no clear purpose. I won't tell you who he described as "dreadful plumbers", but he was right.

Some other interesting modes of academic research:
  • craftsman: worked alone, or with one or two colleagues, to carefully write papers and books.
  • bureaucrat: have grants, research assistants and a large network of co-authors. This kind of scholar is more like an architect - he designs the overall project, but an army of helpers puts together the final project.
  • recycler: academics who come up with one big theoretical idea, and then try to use that idea to explain every possible phenomenon under the sun.
  • importer: academic who engages in intellectual arbitrage. They develop an expertise outside their disciplinary boundaries, and then import the ideas, paradigms, and analytical tools culled from these outside areas to explain phenomenon within their discipline.
  • theocrat: This academic develops (possibly through a "Bureaucrat" research process) a belief system which then not only infuses everything he, but also causes him to dismiss the work of others. Those other academics, he argues, are wrong and (1) need to study his work more; (2) haven't studied enough to realize he's right; and/or (3) have attempted to replicate his work but aren't skilled enough to do so.
  • compiler: the academic whose most important contribution is to assemble a textbook, which puts his field all in one place at the fingertips of his colleagues. Often this involves taking dense academic research done by others, and making it tractable. 15 editions later, he is universally cited as one of the Greats of his time.
And my own contribution:
  • hoarder: has access to some key piece of data/equipment/resource that is hard-to-impossible to recreate, and exchanges access to it for authorship on papers.
I should add that not all of these are pejorative labels. Recycling and importing can be very fruitful research modes, just to name a few.

Tuesday, September 14, 2004

Theory Meets Hollywood

Benjamin Rosenbaum (via BoingBoing) develops a theory of Bacon-Erdös numbers:
...the preliminary work has already been done. Brian Greene, for instance, has an Erdös number of 3, and a Bacon number of 2. Thus, my proposed conversion function (allowing edges in the unified Bacon-Erdös graph to represent two people either appearing together in a movie or coauthoring a paper) is as follows:

Finding: an actor with a Bacon number of N has, at most, a Baconized Erdös number of N+5. Similarly, an academic with an Erdös number of M has, at most, an Erdösinated Bacon number of M + 5.


Boo-yah !

10:34 AM, Tuesday 9/13/04:

GC stamp

3.5 years.....

Friday, September 10, 2004

PCs, new folks, and new authors: some data

My post about SODA generated all kinds of comments: further discussion is in the comment threads. One point that merited another post:

A commenter complained:
I have noticed that most of the accepted papers in SODA/STOC/FOCS seem to be from one of the IVY league schools or from one of the established Labs...

Also, why in the world do we see the same set of names on program committees with little/no permutations.
I'll address both these points in some detail, accompanying my comments with statistics provided by Adam Buchsbaum (SODA 2005 PC Chair). First, let's look at the complaint about author diversity:

1. Diversity of authors:
What follows is a frequency chart of affiliations for a random sample of 27 (20%) of the accepted papers at SODA 2005:

6 max plank inst.
5 mit
4 u. illinois (u-c), tel aviv u.
3 u. bergen (norway), technion, hebrew u.
2 uc berkeley, u. waterloo, u. paris-sud, rutgers, ohio state u.
1 utrecht u., unsw (sydney), uc irvine, u. tokyo, u. leeds, u. glasgow, u. chicago, u. aarhus, stanford, simon fraser u., microsoft, kings college, it u. copenhagen, eth zurich, comenius u., cmu, christian-albrect u., charles u., acad. sci. czech republic

seems fairly diverse to me. It should not be hard to compile such stats for other years/conferences as well...

Next, let's look at the issue of recycling commitee members:

2. PC "freshness":

There is a perception that newer folks have a hard time getting onto S/S/F committees. To remedy this, David Johnson, at the SODA 2004 business meeting, introduced a list of 'neverbeens'. This list is defined somewhat roughly as
People who have never served on a S/S/F committee and plausibly could, where "plausibly could" includes things like
  • out of school for at least some small amount of time
  • some reasonable number (>= 1, more is better) of conference papers
  • regular attendance at such conferences
He also requested that people who want to be on this list should email him. He then provides this list to commitee chairs to do with as they please.

But in reality, how skewed towards "frequent members" are program committees in reality ? Let's look at some numbers in detail.

Freshness of PCs:
For SODA 2005, roughly 1/2 (my earlier comment had said 1/3) of the PC members are new i.e have not served on a SODA/STOC/FOCS committee before (disclaimer: this set includes me). For SODA 2004, the corresponding number appears to be 1/3. Both numbers are reasonable, and might demarcate extreme ends of the "right" ratio.

Historical PC composition (all years):

1. SODA
Served exactly once: 130 people
Served exactly twice: 34
Served three times: 13
Served four times: 1

2. FOCS
1 time: 140
2: 52
3: 24
4: 14
5: 9
6: 4

3. STOC
1: 139
2: 49
3: 19
4: 8
5: 5
6: 2
8: 2

Overall: (SODA U FOCS U STOC)
1: 200
2: 84
3: 55
4: 34
5: 29
6: 5
7: 7
8: 9
9: 11
11: 1
12: 2
16: 1

(My take: SODA is better than STOC/FOCS at integrating new people, but not hugely so)

Adam further points out:
Translating these statistics to reflect recidivism in filling slots reveals that for each of the three conferences, a majority of the PC slots were filled by then-first timers. For all conferences as a group, the majority was filled by 1st- or 2nd-timers.
It thus seems to me that the perception of incestuousness among
FOCS, STOC, and SODA PCs is in fact a mis-perception. Still,
we have to address the mis-perception by ongoing action, not
simply historical statistics. David Johnson's maintenance of
a list of people who have never served on STOC, FOCS, and SODA
committees and a continuing effort to include new people are
positive forces.
Data is a good thing... If you don't like the stats here, do your own digging and I will post the results here.

The power of baseball...

So Tuesday I went to my first live baseball game, a Yankees game no less, courtesy of Adam. It was quite the cultural experience: I had 2 huge hot dogs, stood for the American National Anthem, stretched at the 7th inning stretch, even took cover from a whizzing fly ball screaming line drive (ed. baseball fans are picky...). Not to mention a full complement of home runs, double plays, intentional walks, grounds staff dancing to YMCA, and even a collision that left the catcher comatose for a few minutes.

The Yankees won 11-2 over Tampa, so there was good feeling all around. Clearly more good feeling than I expected....because....

Yesterday I got a letter from the Department of Homeland Security (nothing is guaranteed to strike more fear into a poor non-immigrant's heart) that said:
Your application for permanent resident status has been approved. Please take this notice, your Arrival/Departure rec....zzzzzz
As one might imagine, I tuned out at that point. I now have a green card !!! And I dare anyone to try and convince me that visiting a bonafide icon of the American cultural landscape had nothing to do with this: isn't this what the Patriot Act is for ?

Thursday, September 09, 2004

Data collection...

Chandra asks:
It would be nice to know by some data analysis if we
have more people submitting papers or more papers
being submitted.
and says:
Good survey papers are really needed. CRC handbooks
also seem to do some of this indirectly although I wish the authors put up their articles on the web - who is going to pay hefty sums for these gargantuan sized books?
So I have two responses to this:

firstly, on the data collection issue, it is not as hard as one thinks. the data is available: all we need are PC chairs/members willing to extract and anonymize data appropriately, so no confidentiality is breached (do I really want people to know that I submitted 15 papers to SODA last year and they were all rejected?).

Secondly, although it is not really academically rigorous, I think that even blogs have a small role to play in helping spread the word about interesting developments in the field. Lance has been posting surveys about interesting papers/areas for a long time now, and my recent post on Arrow's theorem forced me to learn at least a bit more about social choice theory than I previously knew (and hopefully had something useful for readers).

I don't think blogs can replace rigorous reviewed surveys though, and somehow ACM Computing Surveys doesn't have the cachet of the various math review journals. There was some site (not arxiv.org) that allowed people to discuss the papers that were posted - that's an interesting idea as well.

What I feel (and people who've been around longer can correct me on this) is that we are at a volume level where we need to explore new strategies for disseminating (as opposed to publishing) research. Chandra makes a good point when he says that people are too busy publishing to write surveys (which are essentially useless from the point of view of tenure/grant commitees etc). Maybe all the tenured faculty out there need to step up :)



Update: Jeff Erickson deservedly swats me upside the head with a copy of Geometric Range Searching and its Relatives (incidentally one of the most useful surveys that I have ever read), and points me to (at least for geometry) a good collection of surveys maintained by Sariel Har-Peled. But of course we knew that geometry folks were ahead of the curve...

More SODA...

The review process was interesting for me: the discussions and reviews exposes one to a diversity of opinion and perspectives that one doesn't always get in one-on-one discussions, and there is a lot to learn.

Probably even more importantly, it gives me a better sense of what kinds of papers get into SODA. With the competition being what it is (this year I believe is a record low for acceptance), the goal changes from finding papers to accept to finding papers to reject, which means that papers have be a lot more polished, a lot more relevant, and a lot more interesting (at least to some subset of reviewers) than before. A paper that is overall decent has a tough battle, because of the sheer size of the submitted set.

The main problem of scale we face (and have been doing so for a few years now), is the reviewer load, (and at 65+ papers/person, we do have an issue). The question that really comes up is: is it that the (roughly) same set of people are writing more papers, or is it that there are just more people in the SODA community ? The answer is important, because it governs how we address the issue of reviewer load.

If there are overall more people entering the community, then it makes sense to continue to increase commitee sizes, expand conference durations and/or number of papers accepted (by shortening presentation times) and other such strategies. If it is that people are writing more papers, it is less clear what one can do about this, other than starting new conferences/journals...

Returning to the issue of reviewer load, I recently heard two different radical strategies that attempt to deal with different aspects of the problem.

* To handle the case of recycled papers that go from conference to conference in the hope of finding reviewers that haven't seen them:
- Create a repository where all submitted papers are registered. When a paper is submitted to a conference, a link is set up, and then when reviews return, they are filed with the paper. If the paper is submitted again, the reviews are there to see.

Main cons: public flagellation of papers is never a good idea, and authors could always create new entries to defeat this scheme

* To handle the issue of review quality and the perception of unfairness
- once papers have been accepted, make reviews for accepted papers public (anonymously).


The Rieman Hypothesis is the key to the Apocalypse

Via Not Even Wrong, an excellent synopsis of progress to date on the Poincare Conjecture.

Also, a link to a truly pointless article in the Guardian. I can sympathize with the plight of science writers; explaining some of the more abstract concepts in mathematics can be extremely difficult. But can you at least not be completely wrong ?
[If]... somebody really has cracked the so-called Riemann hypothesis, financial disaster might follow. Suddenly all cryptic codes could be breakable. No internet transaction would be safe.
Sigh.. later on comes this attempt to explain the first of the seven Millenium problems:
Birch and Swinnerton-Dyer conjecture: Euclid geometry for the 21st century, involving things called abelian points and zeta functions and both finite and infinite answers to algebraic equations
Why even bother...

As an aside, does anyone know why all of a sudden there is renewed interest in the Poincare conjecture ? There was a survey at least 3-4 months ago in the Scientific American about Perelman's work, and his first preprint was posted in 2002.

Wednesday, September 08, 2004

Voting and Geometry

It would be remiss of me (this is the Geomblog, after all) not to remark on the connection between voting and geometry pointed out by a commenter on my earlier post.

Donald Saari has developed an interesting theory of rankings based on mapping rankings to points in simplices. An immediate application of this formulation is the observation that the Kemeny rule determines a ranking minimizes the l1 distance to the original ranking schemes.

Getting something named after you...

In an earlier post, I talked about the Condorcet criterion. As it turns out, this was discovered by Llull nearly 500 years before Condorcet formulated it.

Voronoi diagrams were first invented by Descartes, and then by Snow, and then by Dirichlet, before being invented by Voronoi.

The Koebe-Andreev-Thurston theorem relating circle packings to planar graphs was first proved by Koebe in 1936. Not knowing this result, Andreev re-proved it in 1970, and subsequently Thurston, who had proved it again, realized that his results followed from Andreev's work. All of this and more on circle packings is described here (PDF).

There are probably many other examples that establish the following meta-theorem.

To get a result named after you, be the last person to prove it.

A corollary might very well be:

To get a result named after you, publicize it everywhere.

SODA 2005...

results are out. The authors have been notified and a list should be available soon is here. A total of 135 accepted papers (the maximum possible given time constraints) out of roughly 490 submissions.

Arrow's Theorem, Voting, and Ranking Schemes.

In the light of Lance's post about voting schemes, it seems a good time to mention Arrow's theorem, the "impossibility theorem" for voting schemes. Arrow's theorem (and the related area of social choice theory) have become useful tools in computer science as well over the past few years.

The setting for Arrow's theorem is a set of rankings. Individuals (voters) order a set of options (candidates) via preferences, and the goal of a ranking scheme (voting system) is to come up with a preference list that respects certain properties:
  • universality: the output should be a total order
  • non-imposition: every possible order should be achievable
  • non-dictatorship: the global preference should not merely follow one of the input rankings
  • monotonicity: raising the ranking of a choice cannot hurt it in the overall rankings
  • independence of irrelevant alternatives: the "spoiler" condition - rankings of options outside a given subset should not affect the ordering within the subset.
Arrow showed that all of these cannot be satisfied simultaneously (with at least two voters and three options). Incidentally, Arrow proved this in his Ph.D thesis, and it is one of the contributions mentioned in his Nobel Prize citation from 1972. His major work is most notably in the realm of pricing, and the Arrow-Debreu theorem on market pricing is a fundamental result that is familiar to people working in the area of mechanism design and auctions.

But social choice theory itself has been exploited in a CS context. To understand this, we need another notion, the Condorcet criterion.

One of the criticisms of Arrow's theorem is that the "spoiler" condition is too stringent, and other (weaker) conditions have been proposed that allow for a general voting procedure to exist. The most notable scheme is one that was developed nearly 800 years ago, and was rediscovered by Condorcet in the 18th century (which leads to a topic for another post). The idea is to create a ranking by looking at a series of face-offs. The relative ranking of options A and B is determined by seeing how many voters rank A above B, and taking the majority opinion. This is done for all pairs.

The Condorcet winner is then the option that in a head-to-head matchup beats all other options. Note that a Condorcet winner may not always exist, and this is the problem with using the Condorcet method in its basic form for elections.

The Condorcet criterion is a condition on a ranking scheme.
If any element beats all other elements in head-to-head matchups, it should be ranked first.
A generalization of this, the extended Condorcet criterion, states that:
If there is a partition (C, D) of the set S of options such that for each x in C and each y in D, x beats y head-to-head, then x should be ranked above y.
So what does all of this have to with computer science ? The application comes from the problem of merging ranked lists. The most common example of this arises when doing meta searches in a number of search engines. If Google ranks pages a certain way and Teoma does it a different way, how should the metasearch engine present results ?

The Condorcet criteria provide a condition that any reasonable ranking scheme must satisfy. The algorithms are generated by defining a metric on rankings, after which the consensus ranking is a good "center point" in the induced metric space. Specifically, if we define the "distance" between two rankings as the "bubble-sort" distance between them, or the number of pairs on which they disagree, then the Kemeny optimal ranking is the ranking that minimizes the average distance to all the input rankings. A nice property of the Kemeny optimal ranking is that it is the unique ranking satisfying the extended Condorcet criterion while having other desirable properties as well.

All of this is explained in some detail in the pioneering paper by Dwork, Kumar, Naor and Sivakumar. When I was at VLDB I saw more examples of these methods being used to merge the results of multiple rankings. I should add that the problem of merging different rankings comes up in many different settings, so it is worth knowing the background in social choice theory.

Monday, September 06, 2004

NIH Proposes Free Access For Public to Research Data

This opens up a new area in the evolving landscape of post-internet publishing:

The National Institutes of Health has proposed a major policy change that would require all scientists who receive funding from the agency to make the results of their research available to the public for free.

The proposal, posted on the agency's Web site late Friday and subject to a 60-day public comment period, would mark a significant departure from current practice, in which the scientific journals that publish those results retain control over that information. Subscriptions to those journals can run into the thousands of dollars. Nonsubscribers wishing to get individual articles must typically pay about $30 each -- fees that can quickly add up for someone trying to learn about a newly diagnosed disease in the family.

An interesting idea: Since most CS funding also comes from the NSF and the DoD (both tax-funded agencies), such an idea could easily float our way.

Sunday, September 05, 2004

Sneakernet is validated !

A new article in the NYTimes confirms what we all learnt in networking 101:
...the problem will be familiar to anyone who ever spent time shrinking a digital photograph before trying to send it over the Internet through a dial-up connection. It would be much easier to drive a truck of photo albums across town or put them in an overnight-mail box than to go through the process of scanning and shrinking each photo.
The discussion came up in the discussion of interstellar communication. People at SETI were not amused:

The paper, Nature's cover article, is being received with bemusement by veterans of the Search for Extraterrestrial Intelligence, or SETI.

Dr. Paul Horowitz, a Harvard physicist and SETI expert, called it "a fun and an enjoyable read, but I wouldn't turn off my radio telescope and go out with my butterfly net."


Tuesday, August 31, 2004

Maybe it's just a train wreck.

Success catastrophe: a term I first heard in AT&T, used to describe a project that goes so well that higher ups keep demanding more and more (when you work at a company, being on the radar screen of the powers-that-be is a mixed blessing).

Catastrophe of success: Frank Capra's biography.

Catastrophic success: Apparently, the war on Iraq.

Any more suggestions ?

Talk styles

So I just finished my talk, and it appears to have been well received. Not being a DB regular, I took a somewhat more general tone with my slides, and listening to the other talks in my session, I realized that some notions I could have assumed audience knowledge for.

It brings up a duality in talk styles: does one:

* go into lots of details, explaining how specific things were done.
* present a general overview, glossing over details ?

As with all dualities, the answer is "it depends". On the one hand, if you give a talk where you say at some point, "And now we shall solve subproblem B with this trick that I won't get into in this paper", the people who came to your talk precisely because they had been banging their heads over subproblem B to no avail will be rather disappointed.

On the other hand, a lively and spirited discussion among 3 people about the solution to subproblem B might be lost on the 100 or so other people who have no clue as to why it was such a big deal.

My personal preference is what I'll call 'the big tent'. Keep things at a fairly high level, assuming that someone interested in the problem can read the paper with the roadmap one provides.

But (and here is where good judgement and the art of presentation comes in), make good choices about what details to present and what not to present. Some details are illuminating; some are not. Knowing which is which differentiates a good speaker from a not-so-good one.

Monday, August 30, 2004

What do DB people do ?

For people who know how I carp about databases in general, this will be deliciously ironic:

I was attempting to get my laptop to talk to the wireless system in the business center of the VLDB hotel, and the person managing the center comes up to chat. It turns out that he is taking computer classes in night school and asked me (of all people) what it is that database researchers do. It was interesting that his first thought was that they do something with Oracle. I think I successfully defended DB folk against this scurrilous baseless rumor...

As an aside, a friend of mine who used to teach databases to business school folks frequently complained that they all came into the class expecting to learn how to use Micro$oft Access.

Nosy gmail

Blogging will be light: am at VLDB in Toronto.

I was exchanging mail with a colleague about a problem in dynamic graph algos. I was doing this on my gmail acct for various reasons. I just noticed the ads that google placed next to the conversation:

1. Need an algorithm ?
2. Java graph layout library
3. Find shortest paths

Moreover, in the "related links" section, it pointed me to
(a) a paper on maintaining MSTs in graphs by Henzinger and King, and
(b) Information on Dijkstra's algorithm from NIST.

Spooky...

Thursday, August 26, 2004

Graphics

In response to a previous post, Chandra asks:
Turing awards are typically given for work
that induce a paradigm shift. Not being very
knowledgeable about graphics, I would like to
know whether there have been any recent paradigm
shifts in graphics.
Hmm. I am not an expert, but here are some things:
  • The switch from vector to raster graphics
  • The use of specialized hardware (the graphics card) for rendering operations
  • (and this is far too soon to tell) the advent of programmability in graphics cards.
all of these are fairly recent. It would be fair to say that much of the modern state of graphics reflects innovation that happened starting in the late 80s and going forward, and considering that the most recent Turing Award was given for something invented in the late 70s....



As an aside, wouldn't it be nice to have RSS feeds for comments ? I don't have any illusions that people are falling over themselves to hear what I have to say, but the only easy way for me to respond to comments in a way that they can find out is to reply directly (not always possible) or post a new entry.

Haloscan used to do this, but blogger doesn't... yet....

Tuesday, August 24, 2004

Referencing

I was chatting with people at SIGGRAPH and remarking on the fact that there is only one Turing Award winner for graphics (so far). The people there made the interesting argument that one of the reasons theoretical contributions appear to get Turing Awards more frequently (more on this below) is that theoretical papers tends to frame a problem in the context of related work far more meticulously.

Now this is something I have heard in other contexts as well (when seeing reviews of papers, reading papers etc). It is not too hard to see why this might be the case; theoretical problems are usually crisply defined, and it is almost always clear what results/techniques influence a given work (though not completely clear).

As an argument in favor of sound referencing, I like this one. We are taught (implicitly or explicitly) that referencing is important to place work in context, to acknowledge other's work in the "society of research", and ultimately to recognize that one's one work is never done in isolation but builds upon that of others. The argument that proper referencing creates a trail back to a source of inspiration is more than 'citation counts count !': it emphasizes what's important about careful referencing: that it builds a structured edifice of knowledge whose value can be perceived objectively.

Back to Turing Awards:

According to my highly unscientific classification method, theory-like disciplines (Algorithms/Complexity/Logic/Programming Languages) took a large, but not overwhelmingly large share of Turing Awards:

(Contains theory-like substance): 18 (11 Alg/Complexity + 6 PL + 1 Logic)
(We build stuff): 10 (3 DB + 7 Compilers/Arch/OS/)
(We are stuff): 5 (4 AI/1 Robotics)
(We count stuff): 2 Numerical Analysis

What category does Dijkstra fall into ? I left him out of this...

Saturday, August 21, 2004

War and Graphics

A few days ago, on the SIGGRAPH blog, I remarked on the number of simulations directed towards military applications. Now this kind of thing has been going on for a while within the graphics community, and I am sure none of them are surprised by this.

However, in what can only be viewed as a sign of the times (no pun intended), both the New York Times Magazine and the September issue of Wired carry articles describing the level of sophistication military simulations have reached in training soldiers for combat situations. Both articles focus on a particular software package called Full Spectrum Warrior as one of the primary candidate tools for training soldiers (FSW is available as a commercial tool; the military version has more features and more accurate military plans than the commercial version).

The focus (contrary to the popular impression conveyed by games like Doom and Quake) is not on first-person orgies of murder and mayhem; many of these simulations, and indeed many games nowadays, focus more on story telling and scenarios/strategies. Some military training game don't even allow the player to fire a weapon, requiring that they handle situations without the use of force.

The NYT article gets bonus points for managing not to mention Ender's Game. The Wired article stays strong for a while, and then flubs the ending, not only mentioning the book, but mangling the title.


p.s The game industry is paying their PR staff well...here's how games are taking over Hollywood

p.p.s More coverage of this on /. (follow the links in the blurb)

Friday, August 20, 2004

Open Problems (redux)

One of my first posts was about finding interesting problems to work on. Adam Klivans, guest-posting over at the complexity blog, mentions that COLT has an extremely civilized approach for dealing with this:
For the last few years, COLT has glorified the open problems section and allocates about an hour of time for a presentation of open problems. The open problems themselves must be submitted months beforehand and are refereed (how rigorously is anyone's guess); accepted problems appear in the proceedings. A list of this year's open problems can be found on the COLT 2004 program schedule -- the session was held on Friday evening.
Thus I am happy to note that this year's fall workshop on computational geometry will have a similar focus on open problems. From the call for abstracts:
To promote a free exchange of questions and research challenges, there will be a special focus on Open Problems, with a presentation on The Open Problems Project, as well as an Open Problem Session to present new open problems. Submissions are strongly encouraged to include stand-alone open problems, which will be collected into a separate webpage and considered for inclusion in The Open Problems Project.
This workshop is the 14th in a series of CG workshops that are always a pleasure to attend: the focus is on interactions and discussions, rather than presentations, and the atmosphere is always very relaxed. So mark your calendars for Nov 19-20 in Boston.

Thursday, August 19, 2004

Timing a talk

A nifty link I found via the Dynamist Blog:

It's an online stopwatch to time yourself. Handy if you're like me and never carry a watch, and need to time yourself when preparing a talk.

A survey on inapproximability

Not to infringe on the Complexity blog :), but I found this survey by Luca Trevisan at the ECCC to be a very well written birds-eye view of the developments in inapproximability from slightly before the main PCP results till now (in fact some of the results mentioned are to appear at FOCS 2004).

It covers the early attempts at showing approximability by gap-introducing reductions, spends a little time (although not a lot) on the growing body of work relating proof systems to inapproximability, and then spends a lot of time on developments post PCP.

What I like about it, as a non-expert in complexity, is that it explains some of the basic PCP-based methods for proving hardness very lucidly, using examples like clique and set cover. It also helps one understand the feverish developments in the field leading to the slew of optimal inapproximability results in the late 90s.

There has been a recent flurry of breakthrough complexity results, some of which are to appear at FOCS. The survey addresses some of these new results as well, and what they could imply for the knottier problems like vertex cover etc.

Probably one of the most interesting developments over the past few years is a set of hardness results that destroy the 5-class inapproximability hierarchy that people had thought to map out the space of approximations. New results showing tight log* n hardness, log log n hardness, and even log2 n hardness results indicate that approximation classes might be a lot finer than people had imagined.

Tuesday, August 17, 2004

What is it ?

I was wandering the corridors of AT&T one day and picked up a gizmo that looked like this:



Rather puzzled by it, I took it to lunch, where I was peremptorily ridiculed for basically being too young to know what it was. As it turns out, this is the "type ball" of the IBM Selectric, an ingenious gadget that allowed one to change types on the fly (the ball had all the characters of a particular type on it, and would rotate as keys were pressed on the typewriter.

This story would be highly unremarkable if not for the fact that William Gibson, author of Neuromancer and cyberpunk pioneer, wrote a little rhapsody to the Selectric type ball.

[Dead Tech backgrounder, for extra points: In the decade or so prior to the advent of personal computers, IBM produced an electric typewriter called the Selectric; this had a “type-ball”, a metal sphere about the size of a golf ball, which held an entire font; you could switch fonts, which at the time was little short of miraculous; you could also, even more amazingly, power-correct mistakes with a built-in paper-colored ribbon. The IBM Selectric, when I started writing for publication, was the most shit-hot professional writing machine on the planet; by the time I could have afforded one, they were propping up broken barbecue grills in Value Village. The Finn’s shop probably has at least one box of Selectric type-balls, somewhere; they are beautiful sculptural objects, these balls, and won’t be easily thrown away.]

Monday, August 16, 2004

PCPs for geometric problems

Many geometric problems (especially those involving rectangles in the plane) are MAX SNP-hard; some are even log n hard via set cover reductions, and there is at least one problem (matching of point sets) that is polynomially hard.

However, as far as I am aware of, most of the reductions are via L-reductions from other hard-to-approximate problems. In fact the only "from first-principles" hardness results for geometric problems (i.e directly from a PCP) that I am aware of are

1. The result by Brieden showing that width is hard to approximate to any constant
2. The extension by Varadarajan, Venkatesh and Zhang that shows it is hard to approximate to within logd n, for some d > 0.

In both cases, the "source" problem is Quadratic Programming. One might quibble that even in this case, the reduction is really an L-reduction from QP, but in both results some tinkering with the PCP is necessary to get the desired bound.

Are there other problems that have "from first principles" hardness results ?

Sunday, August 15, 2004

A trail of thought...

The knowledge contained in a research paper represents only a small fraction of the trips the author(s) took through the landscape of the subject. There are countless counterexamples, lemmas that didn't work, statements that were trivial, and ideas that fell by the wayside, none of which make it into a written document, and only occasionally see the light of day (in a casual talk, or a one-on-one conversation).

But these lost trails are useful as well. they are landmarks of places not to go, of things unexplored, of directions taken and backed away from. It is a pity that one cannot record these as part of the research log, in some form.

Vannevar Bush talked about these things nearly 60 years ago, in a prescient essay titled 'As We May Think'.
This is the essential feature of the memex. The process of tying two items together is the important thing.

When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces, and a pointer is set to indicate one of these on each item. The user taps a single key, and the items are permanently joined. In each code space appears the code word. Out of view, but also in the code space, is inserted a set of dots for photocell viewing; and on each item these dots by their positions designate the index number of the other item.

Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space. Moreover, when numerous items have been thus joined together to form a trail, they can be reviewed in turn, rapidly or slowly, by deflecting a lever like that used for turning the pages of a book. It is exactly as though the physical items had been gathered together to form a new book. It is more than this, for any item can be joined into numerous trails.

The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. Specifically he is studying why the short Turkish bow was apparently superior to the English long bow in the skirmishes of the Crusades. He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item. When it becomes evident that the elastic properties of available materials had a great deal to do with the bow, he branches off on a side trail which takes him through textbooks on elasticity and tables of physical constants. He inserts a page of longhand analysis of his own. Thus he builds a trail of his interest through the maze of materials available to him.

And his trails do not fade. Several years later, his talk with a friend turns to the queer ways in which a people resist innovations, even of vital interest. He has an example, in the fact that the outranged Europeans still failed to adopt the Turkish bow. In fact he has a trail on it. A touch brings up the code book. Tapping a few keys projects the head of the trail. A lever runs through it at will, stopping at interesting items, going off on side excursions. It is an interesting trail, pertinent to the discussion. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.

There are many more gems in the essay: worth a read.



My link trail: Boing Boing -> Battelle On Search.

Thursday, August 12, 2004

Perl Swearing

In this article, Paul Graham comes up with a really funny description of Perl:

"When I talk about ugly source code, people will of course think of Perl. But the superficial ugliness of Perl is not the sort I mean. Real ugliness is not harsh-looking syntax, but having to build programs out of the wrong concepts. Perl may look like a cartoon character swearing..."


The article is about Python programmers: makes for interesting reading.



p.s SIGGRAPH is finally over. It was overwhelming, fascinating, boring, and LONG; all at the same time. My conference-ending-timer kicked in around Monday, and it has been a long haul since.

Tuesday, August 10, 2004

Real vs "Real"

From the SIGGRAPH blog:
An undercurrent that occasionally bubbles up in talks/comments is the feeling that graphics is "not real", that it is only "smoke and mirrors". Bruce Sterling (when he wasn't busy being pleased with himself) hammered this point home as well during the beginning of his keynote address. Given that from where I stand in theory-land, graphics is about as "real" as it gets, this was interesting and not a little surprising.
Read more...

I've been spotted !

It appears that my SIGGRAPH entries over the past few days have not gone unnoticed. I was invited to post entries to the official SIGGRAPH blog. I will be crossposting my most recent entries there, and will be posting new notes there.

I'll also provide a brief summary (and links) here in case people don't want to switch over.

Monday, August 09, 2004

Demo or Die...

Another interesting concept at SIGGRAPH is called 'Demo or Die'. The premise is as follows:

* you have to conduct a live demo of whatever gizmo you've cooked up.
* you get three minutes to present it
* After your demo, the audience gets to vote.

Now the voting itself is a neat idea. The organizers distributed laser pointers (yes, over a 1000 of them) to everyone in the audience. There was a large screen up near the speaker podium, and after each demo, two large boxes saying Demo and Die were broadcast on the screen. You pointed your laser pointer at the screen (these are very powerful pointers) and with some nifty software/hardware, the number of dots on each side were tallied to decide whether the presentation survived or not.

Surprisingly enough, the Law of Demos only kicked in once; most demos ran seamlessly. Which is not to say that they were all good. Some were quite nifty, involving haptic devices.

An amusing moment: A demo participant from Microsoft showed a haptic device for playing Jenga online. I had dinner with him the night before, and he was describing the demo to me. I couldn't help but remember Uri Zwick's talk on Jenga from SODA 2002. It seemed fitting that we prove theorems about Jenga, while graphics people build demos about Jenga.

Sunday, August 08, 2004

Stranger in a strange land...

So after the somewhat more civilized GP2 workshop, I am now at SIGGRAPH itself. It's being held in the LA convention center, and if any of you have ever been to a home improvement or other such convention, you will get an idea of the scale of this beast.

The main order of business is the Fast Forward Paper Review: the idea is that all authors of the 83 accepted SIGGRAPH papers get 50 seconds to present a capsule preview of their work. It's an interesting concept, and with over 2,000 people (out of 20,000) attending the conference itself, it is probably essential.

The review itself is held in a dark and cavernous curtained-off section of the convention center. Joe Marks, the PC chair, just announced the start of the review, and from the sound of the applause, you'd think you were at a rock concert.

It's quite the setting: there are three huge screens for the presentation and another huge screen displaying the presenter. Not all the presenters are good at this format though: some people run through reduced versions of their actual presentations. Powerpoint never looked so out of place...

Some highlights:
1. Rendered gems that look like real !!
2. Graphical origami: how to make a paper model from a mesh. You CAN Touch this !
3. Triangles are toast; Our very own Nina Amenta renders with point clouds.
4. A Mission-Impossible-themed movie....on tensors.


Random thoughts:
* You can tell that this is Hollywood-driven: many presentations go "suppose you have this picture, and your director wants it modified to that picture"
* It does put paper-writing in perspective: how many paper ideas would survive if you knew you also needed a funky 50 second advertisement for them ? Reminds of the maxim (by Feynman?): if you can't explain what you are doing in one sentence to a child, you are probably not doing anything interesting.

Saturday, August 07, 2004

Heard in passing...

Things heard at the GP2 Workshop:

Fred Brooks:
In the days of IBM 7950 (1961), you would program one 250 byte instruction, go for lunch, program one more, and then leave for the day.

Keith Cooper:
In 1980, a compiler could generate code that ran at 85% of peak CPU power (on a VAX). The figure now is 5-15%.


Friday, August 06, 2004

Hi ho, hi ho,...

it's off to SIGGRAPH I go...

A rite of passage that many fellow geometers have gone through: I will be hanging out with the only group of people who can hope to get both Turing awards and Academy awards (take that, C. P. Snow).

<shameless-shill-for-my-research>
I go to do God'sTuring's work: presenting some results on theoretical analyses of GPU algorithms at the GP2 workshop. If I can make it through the workshop without getting into any arguments about the value of theory, I will consider it a success.
</shameless-shill-for-my-research>

SIGGRAPH gets over 2,000 attendees; considering how we do cartwheels at SODA when we get 300 folks, I expect significant culture shock.

Update: Did I say 2,000 ? I mean 20,000. Sigh....

Wednesday, August 04, 2004

Scott Aaronson is a very patient man...

The latest P=NP saga on comp.theory involved soap bubbles: the first post in the thread:
The paper is the best argument I have heard for P=NP, even though I believe the opposite. It can be found here: http://arxiv.org/abs/cs.CC/0406056. It brings out a great question.

Basically, the argument is that since soap bubbles can be made to solve NP-complete problems, particularly the Steiner tree graph problem, in what appears to be polynomial time and physics on a macroscopic level can be modeled as a Turing machine, it must be true that P=NP.

What I would like to know from any physicists out there is why do soap bubbles work in such a way that they are able to solve the Steiner tree graph problem?How is nature able to quickly solve problems that we cannot solve quickly?
Scott Aaronson, in a post downstream:
Motivated by this newsgroup discussion, this week I did the experiment. At a hardware store I bought two 8"x9" glass plates; paint to mark grid points on the plates; thin copper rods which I cut into 1" pieces; suction cups to attach the rods to the plates; Murphy liquid oil soap; and a plastic tub to hold the soapy water. I also bought work gloves, since at one point I cut my hand handling the glass.
The post continues: read it all. He even has a picture.

Monday, August 02, 2004

Proofs and Reputations

Karl Sabbagh, the author of a recent book on the Riemann Hypothesis, recently wrote an article in the London Review of Books about Louis de Branges and his recent announcement of a proof for the RH.

There is no evidence that, so far, any mathematician has read [de Branges' proof ]: de Branges and his proof appear to have been ostracised by the profession. I have talked to a number of mathematicians about him and his work over the last few years and it seems that the profession has come to the view that nothing he does in this area will ever bear fruit and therefore his work can be safely ignored. It may be that a possible solution of one of the most important problems in mathematics is never investigated because no one likes the solution's author.

My post has nothing to say about the proof itself; I would not dare to presume even a passing familiarity with it. What caught my attention was the sense of surprise in Sabbagh's article; the unstated 'what on earth does a man's reputation have to do with his proof' ?

What indeed ?

Mathematics (and by extension theoretical CS) inhabits a Platonic world of truths and false statements (with a dark brooding Gödel lurking in the background, but that's a different story). As such, either statements are true, and become theorems/lemma/what-have-you, or they are not and fall by the wayside. The pursuit of (mathematical) truth is thus the search for these true statements. The identity (or very existence) of the searcher has no effect on the truth of the statements; there is no observational uncertainty principle.

However, mathematicians live in the real world. In this world, true and false gets a bit murkier. A theorem is no longer true or false, but almost certainly true, or definitely false. They are far closer to the falsifiable theories of natural science, although there is at least a "there" there; a scientific theory can only have overwhelming evidence in support of it, but a mathematical statement (if not too complex) can be categorically true.

The natural sciences have reproducible experiments; if I cannot reproduce the results you claim, all else being equal, the burden of proof is on you to demonstrate that your results are indeed correct. Similarly in mathematics, if a claimed theorem has a complex proof, the burden of proof does reside on the author to demonstrate that it is indeed correct. They can do this by simplifying steps, supplementing with more intuition, or whatever...

In this respect, theorem proving in the real world has a somewhat social flavor to it. And thus, there is also (it seems to me) a social compact: You demonstrate competence and capability above a certain basic threshold, and I deem your proofs worthy of study. The threshold has to be low, to prevent arbitrary exclusion of reasonable provers, but it cannot be nonzero zero, because in the real world it is hard to check a proof with absolute certainty.

This is why the many proofs that P=NP (or P != NP) that float on comp.theory don't get a fair shake: it is not because the "experts" are "elitists" who don't appreciate "intruders" poaching their beloved problems; it is because the social compact has not been met; the writers don't cross the threshold for basic reasonableness, either by choosing to disregard the many approaches to P vs NP that have been ruled out, or by refusing to accept comments from peer review as plausible criticism, and demanding that the burden of proof be shifted from them to the critical reviewer.

Such a compact could be abused mightily to create a clique; this is why the threshold must be set low and is low in mathematics. The notorious cliche that a mathematician's best work is done when young at least reinforces the idea that this club can be entered by anyone. More mundanely, there are awards for best student papers at STOC/FOCS that often go to first-year grad students (like this year's award).

Going back to de Branges' proof, I have no idea what the technical issues are with his proof, and if there are known reasons why they don't work, but going solely on the basis of Karl Sabbagh's article (and I acknowledge that he could be biased) it seems wrong to ignore his manuscripts. He for one has clearly crossed the threshold of the social compact. Reminds me of an attempt I made to read a popular exposition of Mulmuley and Sohoni's recent work on P vs NP; if this work does lead to a claimed proof, I imagine that there would be few people who could comprehend the proof, but it would deserve to be read.

Sunday, August 01, 2004

On empirical research and the Energizer Bunny

From Lee Smolin's article on the anthropic principle:

...to be part of science, X-theorists have to do more than convince other X-theorists that X-theory is true. They have to convince all the other well trained scientists who up till now have been skeptical. If they don’t aspire to do this, by rational arguments from the evidence, then by Popper’s definition, they are not doing science.


From Numerical Recipes in C (Ch. 14, pg 609):

At best, you can substantiate a hypothesis by ruling out, statistically, a whole long list of competing hypotheses, every one that has ever been proposed. After a while your adversaries and competitors will give up trying to think of alternative hypotheses, ro else they will grow old and die, and then your hypothesis will become accepted. Sounds crazy, we know, but that's how science works.


So that's where the creationists get their methods from !

From an interview with Lawrence Kraus in Scientific American:

But then you realize that this is exactly what Phillip Johnson, this lawyer who first proposed the intelligent-design strategy, proposed when he said something like, "We'll just keep going and going and going till we outlast the evolutionists."

Sorting vs Searching:

In a previous post, I had mentioned an upcoming paper in FOCS 2004 by Franceschini and Grossi titled 'No Sorting? Better Searching!'.

The paper is not yet online, but Roberto Grossi posted a long comment in that post detailing the results in the paper. I reproduce the comment below in full; a short summary is:

They show that the natural way to do searching in a set of ordered elements (i.e via sorting and then doing binary search), makes sense when the cost of comparisons is O(1), but does not make sense when elements are larger (formally, when each element of the list actually consists of k characters, where k is super-constant). They do this by demonstrating a new ordering technique that beats known lower bounds on searching a sorted list; what's nice is that their result is tight as well.

His abstract follows (bold-face is mine):

Sorting is commonly meant as the task of arranging keys in increasing or decreasing order (or small variations of this order). Given n keys underlying a total order, the best organization in an array is maintaining them in sorted order. Searching requires Θ(log n) comparisons in the worst case, which is optimal. We demonstrate that this basic fact in data structures does not hold for the general case of multi-dimensional keys, whose comparison cost is proportional to their length. In previous work [STOC94, STOC95, SICOMP00], Andersson, Hagerup, Håstad and Petersson study the complexity of searching a sorted array of n keys, each of length k, arranged in lexicographic (or alphabetic) order for an arbitrary, possibly unbounded, ordered alphabet. They give sophisticated arguments for proving a tight bound in the worst case for this basic data organization, up to a constant factor, obtaining

Θ[ (k log log n)/(log log (4 + (k log log n)/(log n)) + k + log n ]

character comparisons (or probes). Note that the bound is Θ(log n) when k=1, which is the case that is well known in algorithmics.

We describe a novel permutation of the n keys that is different from the sorted order, and sorting is just the starting point for describing our preprocessing. When keys are stored according to this ``unsorted'' order in the array, the complexity of searching drops to Θ( k + log n) character comparisons (or probes) in the worst case, which is optimal among all possible permutations of the n keys in the array, up to a constant factor. Again, the bound is Θ(log n) when k=1. Jointly with the aforementioned result of Andersson et al., our finding provably shows that keeping k-dimensional keys sorted in an array is not the best data organization for searching. This fact was not observable before by just considering k=O(1) as sorting is an optimal organization in this case.


When the paper is available I will link to it here; one interesting question is: how hard is this "other" order to compute ?

Disqus for The Geomblog