Thursday, December 29, 2005

A new theme song for Numb3rs ?

Kate Bush (remember her?) has a new album out. Called 'Aerial', it includes one song, 'Pi', that
is about a "sweet and gentle and sensitive man / With an obsessive nature and deep fascination / For numbers / And a complete infatuation with the calculation / Of Pi"
The song itself (and this is what caught my attention while driving into work listening to WXPN) is a recitation of the digits of pi, done so sonorously that if you weren't paying attention, you wouldn't even notice. Once you do though, you're amazed at how creatively one can say the numbers zero through nine without sounding monotonous. Here's a thought: instead of these annoyingly stilted voices that repeat numbers to you for confirmation when you call customer service, someone should break down the different ways Kate Bush sings each number and randomize them into a string that would sound so much more delightful.
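The mashup idea at the end there is easy to prototype. Here's a hypothetical sketch in Python; the variant names are made up, standing in for however many recordings of each digit one could splice out of the song:

```python
import random

# Hypothetical: suppose each digit 0-9 has several spliced "Kate Bush"
# variants (labeled strings here). To read a number back, pick a random
# variant per digit so the recitation never sounds monotonous.
variants = {d: [f"{d}-take{t}" for t in range(1, 4)] for d in range(10)}

def read_digits(number: str, rng=random):
    """Return one randomly chosen variant for each digit in the input."""
    return [rng.choice(variants[int(c)]) for c in number if c.isdigit()]

print(read_digits("3.14159"))
```

Each call produces a different sequence of takes for the same number, which is the whole point.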

You can send my royalties to my Paypal account at numb3rs@rules.com

Happy New Year !

Update: Michael links to an mp3 file.


Sunday, December 25, 2005

Diaconis lectures on the origins of probability

Persi Diaconis runs a course on the origins and philosophy of probability called 'Probability, Induction and Statistics'. It draws inspiration and content from Edwin Jaynes' book on probability, and lectures cover some of the profound philosophical debates that have raged through the ages on the meaning and interpretation of probability.

In algorithms, most of us use probability unthinkingly in what statisticians would call "a frequentist vein", where the probability of an event can be envisioned as the fraction of times it occurs on average over a large number of trials. The lectures in Diaconis' course (and indeed much of Jaynes' book) explain why even if you believe this view, it is not as simple as it seems, and how the Bayesian perspective (where probabilities can also represent strength of belief) is a consistent alternate view of the world.
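The frequentist picture above is easy to make concrete: estimate an event's probability as its empirical frequency over many independent trials. A minimal simulation, estimating the chance that two fair dice sum to 7 (which is exactly 1/6):

```python
import random

# Frequentist estimate: probability as the long-run fraction of trials
# in which the event occurs.
random.seed(0)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.randint(1, 6) + random.randint(1, 6) == 7)
print(hits / trials)  # close to 1/6 ~ 0.1667
```

Of course, the point of the lectures is that justifying why this fraction converges, and what the probability means before you run any trials, is where the philosophical trouble starts.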

Jaynes also pushed the idea that statistical mechanics can be reformulated via Bayesian reasoning methods; this was a powerful (and controversial) idea, and is too complex to discuss further here. If you're interested in reading more, you can start with Jaynes' papers here and here, and his summary of MaxEnt methods here.



Aside: I've started a QuickLinks section for links that might not necessarily merit a full post, but are interesting nonetheless. It's on the left side of the main page, and has its own RSS feed.

Sunday, December 18, 2005

Computer science vs programming, II

I should add that the view presented in the Globe article I just mentioned is in direct opposition to an evolving perspective about the role of university computer science programs.

Namely, it is a view that university CS programs should be doing more to help students prepare for life in industry: offering classes in programming languages such as Java and Python, training students to write software, and generally addressing the needs of the computer industry.

Although I am strongly against the notion that such training merits a degree in "computer science", I see no reason why different schools might not take different tacks, offering different kinds of degree programs and specializing in different areas. What I do welcome is the idea that there is a pushback from the "computer science as a way of thinking" direction, as well as creative thinking about courses that might more effectively reflect this point of view (for more on this, see Siva's post about alternative courses, and Michael Mitzenmacher's comment at the same post).

Computer science vs programming

Via the CRA, an article that purports to be about the dwindling number of women computer scientists in the field, but is really about much more:
Some computer scientists fear that they may be going in the same direction. They view the dearth of women as symptomatic of a larger failure in their field, which has recently become less attractive to promising young men, as well. Women are ''the canaries in the mine," said Harvard computer science professor Barbara J. Grosz.
The article highlights geometer Diane Souvaine of Tufts, and her work in developing a curriculum that focuses more on the science than the computer in computer science. It makes a point that we in the field are all familiar with, but that I have never seen explained in the media:
Introductory classes zeroed in on programming and other technical aspects of the field, rather than explaining big ideas or talking about how computing can impact society, many professors say. That approach led to a misconception among students that computer science is the same thing as computer programming. Computer scientists say that view shortchanges the field, which is far broader and more intellectually rich. It is applied math and design, they say; it is about modeling human behavior and thinking about the simplest way to accomplish a complex task.
The other point the article makes that I really don't agree with is that a focus on programming and technical aspects of computers is what attracted male programmers (read "nerds") to the field, to the exclusion of females. The implication of course is that if computer science education were focused more on problem solving and "impact on society", that more women would have been inclined to enter the field.

This is debatable. Any higher level "non-programming-centric" approach to teaching computer science would involve heavy dollops of math; linear algebra, graph theory, calculus, probability, geometry, you name it, even if you never ended up doing theoryCS. Math has always had a problem attracting women students, and I don't see why shifting focus away from programming and towards problem solving (which I highly encourage, btw) would make the barrier to entry for women students any lower.

Thursday, December 15, 2005

Physics, complexity and NP

Sometimes I feel like classical physicists must have felt when quantum mechanics first came marching through the door...

Lance Fortnow chats with Scott Aaronson about what physicists need to know about complexity, and Dave Bacon (almost)-snarks that maybe we should be asking what computer scientists need to know about complexity.

He is referring to quantum complexity of course, the premise being that quantum computing represents the definitive description of all "efficient" computing devices, rather than the poly-time (or randomized poly-time) classical TM.

I'd like to think that all my years spent learning classical algorithms and complexity haven't gone to waste ;), but something Scott said on the podcast was very interesting. He argued that one should not expect a polynomial-time quantum algorithm to solve an NP-hard problem. Now I don't know if this is a view shared by the quantum computing community in general, but it's definitely something I didn't expect: I had always imagined that the odds were in favor rather than against BQP containing NP.



Saturday, December 10, 2005

Great conference papers that never made it to a journal, part 327...

It's no secret that theory folk are somewhat allergic to journal papers, only publishing them to kowtow to the tenure gods, and even then grudgingly. On that note, I recently "discovered" that two (more) very influential papers that I was familiar with had never made it to a journal.

Both are by Andy Yao, their influence is staggering, and at least one of them was directly mentioned in his Turing Award citation:
  1. Some complexity questions related to distributive computing: The paper that introduced the idea of communication complexity appeared in STOC 1979.
  2. Lower bounds by probabilistic arguments: This paper introduced the idea of minimax reasoning for lower bounding randomized algorithms, and appeared only in FOCS 1983.
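For reference, the minimax principle from the second paper can be stated roughly as follows: for any distribution μ over inputs and any randomized algorithm R,

```latex
\min_{A \,\text{deterministic}} \; \mathbb{E}_{x \sim \mu}\!\left[\mathrm{cost}(A, x)\right]
\;\le\;
\max_{x} \; \mathbb{E}\!\left[\mathrm{cost}(R, x)\right]
```

so the expected cost of the best deterministic algorithm against any fixed input distribution is a lower bound on the worst-case expected cost of every randomized algorithm. To prove a randomized lower bound, you get to pick a hard distribution and only argue about deterministic algorithms.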


Footnote: If you haven't seen it, it's "new to you" :).

More cartograms

As seen by many all over the blogosphere, here's a cartogram (use the right term, people !) of the countries of the world based on population.


Tuesday, December 06, 2005

ahhhh: or why I became a computer scientist.

I will join in the collective sighs of relief at the completion of yet another deadline. I'd like to officially grumble about anonymous submission procedures, because one of the few pleasures of completing a paper is being able to brag about it afterwards :), and not everyone is as enlightened as the IACR.

Anyhoo, here's what I dispatched to the annual sausage convention. No pdf yet alas, because of paper release procedures and the like.

Lance had an interesting post about why he became a computer scientist: it appears that his story (and that of many in the comments) is of someone with a strong math background ending up in a math program and moving to a computer science program (one commenter even cites a complaint among math profs about losing their students to physics and CS). My story was slightly different: after reading Stephen Hawking's 'Brief History of Time' I was convinced that I wanted to be a physicist. My main interaction with computers was as a hacker, writing video games (in assembly code to boot) on my dinky ZX Spectrum.

But then I found a book in the library (actually it was the British Council Library in New Delhi, for any Delhiites among you all) that talked about Alan Turing and the Turing test. It was a short hop, skip and jump from there to Searle's Chinese Room argument, strong AI, and Gödel's Theorem. AI was my hook into computer science, and remained my major interest up through graduation, although a complexity theory class by Somenath Biswas (and a thesis under his supervision) started drawing me away from the "fuzziness" of AI and into TCS itself. Through all of this, I remained untouched by the "math bug" (though I always enjoyed it), and for the last many years have been playing a desperate game of catchup !

From the point of view of training, I'd argue that a math degree is indispensable, and I fervently wish I had gone that route. My meanderings through AI did give me a more "romantic" notion of the field though, which is not necessarily a bad thing :)

Friday, December 02, 2005

More random mumblings

The "standard" ACM conference style file is one of the most god-awfully ugly designs I have ever seen. Those who know me know that I like to obsess over nice fonts, and the clunky square font stylings that I am often forced to use just depress me.

On the other hand, SIGGRAPH (the graphics SIG of ACM) has an excellent style file, which often shows up in computational geometry proceedings. I guess it's no surprise that the graphics folks are more sensitive to the layout and presentation of a paper.

Obviously, these are subjective judgements. So judge for yourself:

4 days and counting

If you're reading this, don't ! Get those 10pt font submissions ready !

Tuesday, November 22, 2005

Random thought while working on a deadline...

In the suited-and-booted corporate world, dress codes are quite strict. The only place you get to show your "flair" is in your tie.

The title of a submission is our "tie".

Case in point: Knuth's "Toilet Paper Problem".

Update: Of course I forgot to mention the classic example of 'mandated flair'.

Monday, November 21, 2005

SoCG Abstracts deadline.

Now that we're all done with the Fall Workshop, it's time to submit those abstracts to SoCG. This year, the committee has instituted an abstracts deadline of Nov 23 in addition to the paper deadline of Dec 5, so you have till Wednesday !

The submission server is up, and can be found via the conference web site. Here is the note from the conference chairs:

Dear Computational Geometers,

The Web site for submissions to the ACM Symposium on Computational Geometry is now up, and can be accessed through the conference Web site at http://www.cs.arizona.edu/~socg06/.

This year we want people to submit an abstract by Nov 23, before the paper deadline of Dec 5. We will use these abstracts to assign papers to committee members, so that the committee members will have more time to read the papers. Please be aware that if you do NOT submit an abstract by Nov 23, there will be some delay in getting your paper to the committee.

Looking forward to seeing your recent work!

Nina Amenta
Otfried Cheong
If I may be permitted a brief snark, I hope people will submit not-so-recent work as well :).

Friday, November 18, 2005

FWCG 2005: Day I

Organizing a workshop is far more work than being on the committee or giving a talk ! I spent much of the day worrying about things that had to be planned, whether the talks would run over, whether the abstract booklets would show up, and countless other things. No major fiascos, so far....

We had a number of nice talks, and it was interesting to see application themes like statistics, visualization and biology get repeated over and over again. I think it is often hard to see the impact of geometry because there are few clear cut areas where theoretical results in their pristine form are needed (like in cryptography). But concepts like Delaunay triangulations, orthogonal range trees, and binary space partitions are now ubiquitous in many application areas, and this is definitely a success story for the field.

Tamal Dey's talk on surface reconstruction was particularly interesting for a different reason. It illustrated to me firstly how much topology I still need to know to understand some of the more cutting edge methods that are being used in this area, and further illustrated that methods from topology (combinatorial, algebraic) are going to be more and more important in extending the range of geometric problems that we can now deal with.

Thursday, November 17, 2005

Quantum CVS

In a previous post, I had asked for emacs-based approaches to tracking changes in a document. There were many useful suggestions, some of which I might actually try out. The most promising is also the most intriguing. It is called darcs:
Darcs is [..] based on a "theory of patches" with roots in quantum mechanics
There's more.
I have looked at patches as being analogous to the operators of quantum mechanics. I include in this appendix footnotes explaining the theory of patches in terms of the theory of quantum mechanics.
Takes quantum computing to a new level....

I should add that the patch software itself appears quite reasonable, and in fact came highly recommended in the comments to my prior post. The main connection to QM appears to be the fact that patch operations are reversible (there is also some stuff about commutativity of patch operators).
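The reversibility idea is simple to illustrate. Here's a toy sketch (this is not darcs itself, just the flavor of it): a patch records enough information to be applied and inverted exactly, so applying a patch and then its inverse is the identity operation.

```python
# Toy "reversible patch": each patch stores the old and new content of
# one line, so inversion is just swapping the two fields.
def make_patch(line_no, old, new):
    return {"line": line_no, "old": old, "new": new}

def apply_patch(doc, p):
    # A patch only applies if the document matches its recorded context.
    assert doc[p["line"]] == p["old"], "patch does not apply"
    doc = list(doc)
    doc[p["line"]] = p["new"]
    return doc

def invert(p):
    return {"line": p["line"], "old": p["new"], "new": p["old"]}

doc = ["hello", "world"]
p = make_patch(1, "world", "geometry")
patched = apply_patch(doc, p)
restored = apply_patch(patched, invert(p))
print(patched, restored)  # → ['hello', 'geometry'] ['hello', 'world']
```

The commutation theory is about when two such patches can be reordered without changing the result, which is where darcs' "operator" analogy earns its keep.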

Sunday, November 13, 2005

It must be something in the air

There are a number of academic bloggers at the University of Chicago. Lance was blogging about complexity theory a good two years before blogging became the verb of the year. He also just started a complexity podcast (here's the Odeo channel), and now I discover that the University of Chicago itself has a podcast, produced by the Office of the Vice President for Research. The podcasts showcase researchers from the university, spanning the gamut of topics (astronomy, ecology, law, and divinity being the topics of the last four shows).

Tuesday, November 08, 2005

Fall Workshop on Computational Geometry, 2005: Schedule

The schedule for this year's Fall Workshop on Geometry and Visualization is up! Take a look, and if you're in Philly, drop by ! Registration is free, but we do ask that you let us know.

Monday, November 07, 2005

Tracking changes in Emacs/LaTeX

Probably the only thing I like about Word is the Track Changes feature. Once you turn it on, you can track all changes you make to a document, and when you have multiple co-authors editing, this can be handy when attempting to resolve disputes. Also, often what happens is that I make some changes that my co-author doesn't notice because they are buried deep in the middle of a section, and it would help to have some way of marking these up.

Most people I know have some kind of \note mechanism for making margin notes, but that doesn't help with changes that you know you want to make, but also want the co-author(s) to look at. Does Emacs/LaTeX have macros for tracking changes in documents ?

p.s for those of you who will immediately write in suggesting that I use CVS, remember that I am behind a firewall, and even if I weren't, CVS is a heavy hammer for something as simple as this :).

Tuesday, November 01, 2005

Edge of Computation Prize

Edge.org is a "third culture" site that
takes a good deal of its inspiration and its view of the intellectual landscape from C. P. Snow's division of intellectual culture into two camps, the literary and the scientific. Snow's 1959 book The Two Cultures presents a reading of intellectual history which argues, in part, that twentieth-century literary intellectuals attempted to commandeer the title of intellectual from the scientists, delegitimatizing scientists as men and women of letters and attempting to exclude them from the intellectual mainstream. Snow laid the principal blame on the literati, but also chided scientists for failing to defend their rightful cultural prerogatives. Snow eventually came round to the view (presented in his 1963 essay "The Two Cultures: A Second Look") that a "third culture" would emerge, fusing the old dual cultures and placing the literary and the scientific on co-equal terms, communicating and cross-fertilizing each other.
They have a prize called The Edge of Computation, whose mandate they describe as:
Metaphors of information processing and computation are at the center of today's intellectual action. A new and unified language of science is beginning to emerge. Concepts of information and computation have infiltrated a wide range of sciences, from mathematics, physics and cosmology, to cognitive psychology, to evolutionary biology, to genetic engineering. [...]

The Prize recognizes individual achievement in scientific work that embodies extensions of the computational idea — the design space created by Turing. It is a 21st Century prize in recognition of cutting edge work — theoretical, experimental, or both — performed, published, or newly applied within the past ten years.
The prize is worth $100,000, and there are a number of nominations (some multiple nominations as well). The list is interesting, and worth perusing (although the double nomination for Stephen Wolfram might not please Cosma Shalizi :)).

What I noticed though that among the nominations most relevant to theoretical computer science are three quantum computing experts, Charles Bennett, David Deutsch and Peter Shor. David Haussler, Gregory Chaitin and Martin Davis round out the list. Although we all know that prizes are superficial lotteries, it did pique my curiosity to see that a large majority of theoretical computer scientists deemed worthy of recognition this way are people involved with quantum computing.

Should I be reading anything into this ? Is any computing unrelated to quantum computing now a pointless exercise in combinatorics ? Is BQP the true model of effective computation, making all other (classical) models irrelevant and outdated ? Am I just a paranoid android ?

Update (11/14/05): David Deutsch wins !

Monday, October 31, 2005

On this most horrifying of days

Kurt Van Etten explores the most horrifying thing of all: the flawed P vs NP proof.

Friday, October 28, 2005

Darwin's and Einstein's (e)mail correspondence rates, or a rumination on power laws.

Over the last few days, there has been much discussion of a paper in Nature by Oliveira and Barabási on the correspondence patterns of Darwin and Einstein. One of the main conclusions of the paper was that at some point, their response patterns started following a power-law distribution, with coefficients such that a finite fraction of mail went unanswered.

Soon after this, there was a response suggesting that the analysis was flawed, and in fact the data followed a log-normal pattern, rather than a power-law. Cosma Shalizi and Aaron Clauset weighed in on this as well, on the side of the "log-normal folks".

As Geomblog denizens might know, I had a while ago mentioned a fascinating article by Michael Mitzenmacher on the difference (and similarity) between log-normal and power-law distributions, and the natural processes that generate them. I asked Michael if he'd like to weigh in on this controversy, and what follows below is his note. He comments not only on the technical issues involved, but on the whole issue of PR in science, and the contributions that Barabási has made to the field, and his perspective is very interesting indeed.

For those of you who might wonder why computer scientists (and theoretical computer scientists at that) should concern themselves about such matters, I'll refer you to the reading list for Jon Kleinberg's course on Information Networks, where you'll see how many aspects of link structure analysis on the web (which in turn is used for search engines like Google) relate to understanding power law distributions.

And now for the article...
Suresh asked me to comment on a current controversy: a paper by Barabasi et al claims that many human interactions, including the example of Darwin's and Einstein's letter response frequency, are governed by a bursty process that leads to power law tails. A rebuttal by Stouffer et al claims that the distribution is really lognormal.

I believe Suresh asked me to comment because (here comes the blatant self-promoting plug) I had written a survey on lognormal and power law distributions (and a slightly more fun, less edited earlier version). I'm embarrassed to say I had not heard of the controversy, but it looks like a lot of fun for those of us who enjoy this sort of thing. Rather than focus on the specific technical details, let me make two high level comments.

First, as you'll know in more detail if you read the survey, this controversy does not surprise me. Lognormal and power law distributions are so closely related that if somebody wrote a paper claiming some distribution is a power law, you can count on somebody else writing a follow-up paper claiming it is actually lognormal (and vice versa). The only thing that is surprising is how fast the turnaround time is these days. The Web is truly amazing that way.

A more important second point, however, relates to how good Barabasi is at generating PR, and the recent panel discussion at FOCS about popularizing computer science. I think the reason this controversy erupted so quickly was because Barabasi sent a summary note on the work to Nature. It has been argued that we in TCS need a prophet, and Barabasi is indeed a modern prophet in the area of networks and power laws. Say what you like about him (and people say some bad things -- the Stouffer rebuttal does not use the words "technically incompetent", but I think the implication is in there; I and many others have had issues in the past with Barabasi's grandiose claims and lack of sufficient citation of prior work), but he gets his message out there, and people actually know about it. His book Linked was very popular and had a big impact. While I think all his talk about networks is good for computer science in general, I'd rather see a computer scientist out in front of the world enlightening the masses as to why all this stuff is important.

For those who care about the technical issue, I will express some mild opinions. While the rebuttal suggests the data is a better fit for the lognormal distribution, I am not a big believer in the fit-the-data approach to distinguish these distributions. The Barabasi paper actually suggested a model, which is nice, although the problem of how to verify such a model is a challenge, as I talk about in this MSRI video. This seems to be the real problem. Trust me, anyone can come up with a power law model. The challenge is figuring out how to show your model is actually right.

Michael Mitzenmacher

Computer Science in the NYT

but maybe not in the way we wanted.

Nearly a year ago, I had mentioned that Andy Yao had returned to China to teach there. Today, he leads off a NYT article on Chinese universities:
When Andrew Chi-chih Yao, a Princeton professor who is recognized as one of the United States' top computer scientists, was approached by Qinghua University in Beijing last year to lead an advanced computer studies program, he did not hesitate.

It did not matter that he would be leaving one of America's top universities for one little known outside China. Or that after his birth in Shanghai, he was raised in Taiwan and spent his entire academic career in the United States. He felt he could contribute to his fast-rising homeland.

"Patriotism does have something to do with it, because I just cannot imagine going anywhere else, even if the conditions were equal," said Dr. Yao, who is 58.

The rest of the article talks about the challenges of building up world-class universities in China. It did have this interesting line from Prof. Yao:
Dr. Yao said he had expected to concentrate on creating a world-class Ph.D. program but had found surprising weaknesses in undergraduate training and had decided to teach at that level. "You can't just say I'll only do the cutting-edge stuff," he said. "You've got to teach the basics really well first."
Sounds like something you could say about the U.S. system as well.

Wednesday, October 26, 2005

a puzzle

that someone asked me, and I don't know the answer to:

Is there a closed-form expression of some kind corresponding to the formal power series

∑_{k = 0}^{∞} x^(2^k) ?
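I don't know of a closed form, but one standard first observation is that the series satisfies the functional equation f(x) = x + f(x²), which is easy to check numerically with a partial sum for |x| < 1:

```python
# Partial sum of f(x) = sum_{k>=0} x^(2^k); the tail underflows to zero
# very quickly for |x| < 1, so a modest number of terms suffices.
def f(x, terms=30):
    return sum(x ** (2 ** k) for k in range(terms))

x = 0.5
print(f(x), x + f(x * x))  # the two values agree
```

The functional equation alone doesn't yield an elementary closed form; it just organizes the self-similarity of the series.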


The ranks grow..

D. Sivakumar, formerly of IBM Almaden, and now at Google, has started a 'Glob'.

Tuesday, October 25, 2005

Maybe there's hope for theory after all...

Conversation had by your humble blogger in a taxicab:

Driver: So where are you from ?
Humble blogger: From India
D: Ah, so are you a student (this was in Philadelphia) ?
HB: No, I work here, at AT&T
D: Oh, so what do you do ?
HB: I'm a computer scientist...
D: So do you do programming ?
HB: Not exactly, I do research. It's slightly more mathematically oriented.
D: Oh, so you mean algorithms ?
HB (slightly surprised): Yes, actually. That's right.
D: All this P vs NP stuff ?
HB (even more amazed): Yes, that kind of stuff.
D: Yeah, you know some years ago, three Indians showed how to find prime numbers efficiently.
HB (at this point on the edge of his seat): Yup ! So do you have a computer science background ?
D: Yes, I took a class at Brooklyn College

.... some time later....

D: I liked the algorithms class. The instructor was tough, but I passed. I really liked converting NFAs to DFAs; I remember standing in front of a blackboard at 2 in the morning.
HB: (at this point I have nothing more to say, except to congratulate him on his knowledge).
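For anyone who last saw it at 2 in the morning in front of a blackboard, here is a minimal sketch of the subset construction the driver was remembering: the DFA's states are sets of NFA states (the example NFA and its encoding are my own, just for illustration).

```python
from collections import deque

def nfa_to_dfa(nfa_delta, start, accepts, alphabet):
    """Subset construction: nfa_delta maps (state, symbol) -> set of states."""
    start_set = frozenset([start])
    dfa_delta = {}
    queue = deque([start_set])
    seen = {start_set}
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # The DFA successor is the union of NFA successors.
            T = frozenset(q for s in S for q in nfa_delta.get((s, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    dfa_accepts = {S for S in seen if S & accepts}
    return dfa_delta, start_set, dfa_accepts

# Example: NFA over {a, b} accepting strings that end in "ab".
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
dfa, q0, F = nfa_to_dfa(delta, 0, {2}, 'ab')

def run(word):
    S = q0
    for c in word:
        S = dfa[(S, c)]
    return S in F

print(run("ab"), run("aab"), run("ba"))  # → True True False
```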

Monday, October 24, 2005

FOCS Panel Discussion: The Soundbite edition

Shorter Aaronson: Theory needs prophets
Shorter Quantum Pontiff: Theory needs physics
Shorter Vanity of Vanities: Theory needs a merge dance.
Shorter Geomblog: Theory needs metaphors.

I liked Rocco's characterization of Bernard's comments at the panel discussion:
If the broader scientific community comes to use algorithms as a conceptual framework for the problems they are dealing with, we will have done well.
This is an important point: PR is a good thing (whether you publish in Nature or not), and "spreading the word" is also great. But more fundamentally, it is important to spread the idea that when you think about any kind of computational process, you are thinking about an algorithm, and therefore trying to define what it is you are computing will help you tap into the best way of solving the problem. The problem with a lot of the computational conceptualization (phew, there's a tongue twister) that goes on outside of theory is that it is akin to looking at a physical process and trying to come up with your own physics to understand it, rather than invoking (at the very least), Newton's laws of physics.

A legitimate retort to the above comment is that often, theoretical methods fail to address the phenomenon under study. To a degree this is true: theory of computation involves "laws of the limit"; worst-case and asymptotic analyses often fail to detect phenomena that occur when n is not heading off towards infinity. But it is only true to a degree, because there is a lot of theory (maybe not the sexiest kind) that does push back towards non-asymptotic behaviour, for example in the various explanations of why quicksort works so well in spite of its worst-case O(n²) complexity.
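The quicksort story is worth spelling out, since it's the canonical example. A minimal sketch of the randomized variant: choosing the pivot uniformly at random gives expected O(n log n) comparisons on every input, even though some pivot sequence always yields the quadratic worst case.

```python
import random

def quicksort(xs):
    """Randomized quicksort: random pivot, three-way partition."""
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 1, 2, 3, 4, 5, 6, 9]
```

The expected-case analysis is exactly the kind of non-worst-case theory the paragraph above is pointing at.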

The Quantum Pontiff makes another good point:
they need to attempt to convey their results in a way which, while not totally faithful to the science, gives the reader a reason to believe that they understand what is going on. This, of course, is much harder to do as a computer scientist than as a theoretical physicist because in the former rigor is held in higher esteem than in physics where hand-wavy arguments hold a much greater standing.
It's true; viewing theoretical computer science as a form of mathematics (Avi Wigderson's clarification notwithstanding) gives us the same public persona (at best) as mathematicians. Most mathematicians are viewed by the general public as glorified accountants (and mathematics as "counting"). The effect of this on our ability to proselytize and disseminate is even more noxious. After all, mathematics remains under the thrall of G. H. Hardy's apology, in which
There is no scorn more profound, or on the whole more justifiable, than that of the men who make for the men who explain. Exposition, criticism, appreciation, is work for second-rate minds.
How on earth are you supposed to come up with the next Feynman/Hawking/Sagan under such a withering blast of derision ?

Returning to the Quantum Pontiff, he makes another point close to my heart:
CS theory hasn’t, in my opinion, exploited the fact that it is studying a fundamental question: the fundamental limits of computation. This fundamental research direction, to me, is as deep as understanding what the laws of nature are (and if you catch me on some days I might even say that one is deeper than the other).
As Dave Winer might say, 'Bing!'. This is exactly right, and is a point that unfortunately many theoretical computer scientists don't agree with. The evidence in favor of this view of theory would take another post, and ultimately does have a faith-based element to it, but it is hard not to look at quantum computing, information theory, and black hole entropy, and feel that there is something more to computing than computers.

Sunday, October 23, 2005

Algorithms and Technologies, and STOC 2006

Of course, effective demonstrations of the power of algorithms require actual algorithms ! A much closer deadline is STOC 2006, to be held in Seattle from May 21-23, 2006. The deadline is November 3.

Algorithms and Technologies, and ESA 2006

I recently attended a talk on convex programming by Stephen Boyd. He made the interesting point that least-squares minimization, linear programming, and even semidefinite programming are now "technologies", in that they have a well defined theory and polynomial time solutions, efficient black box code that works for very large sized inputs, and overall, a very well worked out understanding of what variants to use for what kinds of problem instances.

What this means is that tomorrow if I have to suggest to someone that their problem can be formulated as a convex program, this not only gives them a solution "in theory", but gives them a solution "in practice", that they can actually implement. Such happenings are surprisingly rare when applying theoretical methods to real-world problems; it is not often that one can invoke a sophisticated algorithm that has known worst-case bounds, and find an implementation for it.
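A toy instance of what "technology" means here: least-squares line fitting has a closed-form solution you can invoke without thinking about it. This pure-Python sketch (the data is made up) solves the normal equations for y = a·x + b:

```python
def fit_line(xs, ys):
    """Ordinary least squares for a line, via the closed-form normal equations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.0]
print(fit_line(xs, ys))  # slope near 2, intercept near 1
```

In practice you would call a black-box solver rather than write even this much, which is precisely Boyd's point: the method is settled enough that formulating your problem as least squares *is* the solution.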

Today is the FOCS panel discussion on 'Exciting the public about (theoretical) computer science'. One way (though certainly not the only way) of exciting people about theory is to demonstrate the power of mathematical rigor and worst-case analysis by showing how algorithms that have good theoretical guarantees actually work well in practice: in other words, showing that algorithms can become "technologies".

This is a really long and roundabout way of mentioning a call for papers: ESA 2006 will be held in Zurich from Sep 11-15, 2006. ESA has both a theoretical and an applied track (disclaimer: I am on the PC for the applied track), and this year the applied track is trying a new approach to submissions:
Authors of papers that include an experimental aspect can make supporting source code or data sets available on a webpage (referenced in the submitted paper) or send them by e-mail to esa06code@mcs.le.ac.uk; such supporting files will be considered by the program committee members at their discretion.
So get your source code ready ! And remember this is not about theory making itself "useful", but about demonstrating that the formal approach to algorithm design is far more effective than any other approach.

Saturday, October 22, 2005

Alternate logics

A Neighbourhood of Infinity writes about alternate logics (apart from "standard" two-valued modus ponens based inference). Among the many alternate logics discussed is intuitionistic logic, which objects to the law of the excluded middle (either X or not X is true) and its connection to constructivist mathematics (where "there exists" is equivalent to "we can construct"). Sigfpe makes the following comment regarding the two:
...intuitionist logic is closely tied up with Constructive Mathematics (not to be confused with a similarly named educational fad) which insists that you always construct things you claim exist. This means that it goes hand in hand with computer science where people's programs are usually expected to produce a result - not just say a result exists.
Which is funny, because "there exists" statements abound in (theoretical) computer science, especially when dealing with combinatorial constructions. Admittedly, the gap between "there exists" and "here it is" is often one of efficiency (an exponential blowup, say) rather than one of principle; the probabilistic method is a good example.

He goes on to discuss higher-order logics, and mentions an interesting fact:
Quine believed that Second Order Logic isn't even Logic.
and goes on to say:
And even though Second Order Logic appears to have greater expressive power mathematicians actually get along fine with First Order Logic. Every time they make a statement about properties there's always a workaround (sometimes quite inelegant) for expressing something that's just as useful in First Order Logic.
Aha, but that's not true for NP, for example. After all, by Fagin's theorem NP is exactly the class of properties expressible in existential second-order logic, while P corresponds to first-order logic augmented with an ordering and a least-fixed-point operator.

Yet another nugget that I learned from this article is the existence of "Quantum Logic". Namely,
Quantum Logic really is the quantization of logic - we have replaced truth values with non-commuting Hermitian operators whose eigenvalues are the results we want!
Overall, a very illuminating article.

FOCS 2005

FOCS starts today in Pittsburgh. Alas, I won't be attending. What sounds most intriguing to me (apart from the papers themselves) is the session on 'Exciting the public about (theoretical) computer science' at the business meeting. I'm looking forward to hearing what folks have to say about this. Since I can't be there, I'll merely reiterate what I said earlier about metaphors for computation.

I also note that Bernard Chazelle is giving a tutorial on Algorithmic Techniques and Tools from Computational Geometry, and Subhash Khot is giving a tutorial on the Unique Games Conjecture. I especially regret not being able to attend the second tutorial, given the frenzy of activity that this conjecture has generated in recent years (see Lance's post for a summary of the problem).

Friday, October 21, 2005

Beauty and ugliness

Via BoingBoing comes this gallery of mathematical art from Justin Mullins. Each piece of art is a mathematical concept or set of equations, with an evocative title. I enjoyed browsing the gallery, and even reading the liner notes below each picture.

The first two in the gallery are Beauty and Ugliness. Beauty is represented by the famous equation

e^(iπ) + 1 = 0

and Ugliness is represented by the 633 graphs representing the different cases to check in the proof of the 4-color theorem (this is the "simplified" proof by Robertson, Sanders, Seymour and Thomas).

Wednesday, October 19, 2005

Books that read like a thriller...a new entry.

A while back, I had talked about books that read like a thriller:
Occasionally one finds technical books that, either because of the subject matter, or the style of writing, or some other inexplicable properties, make for such gripping reading that one almost wants to read from cover to cover in a single sitting.
The latest entry in this category is Convex Optimization, by Stephen Boyd and Lieven Vandenberghe. It's a Cambridge University Press book, and covers a wide range of topics in convex programming, including convex duality, semidefinite programming, interior-point methods, and applications. One special case of convex programming is second order cone programming, which is further a special case of semidefinite programming, and includes the minimum enclosing ball problem and max-norm distance computations as special cases.

Why do I like this book so much ? An underrated feature of a good book is its choice of fonts, and although this seems like a rather shallow reason, I have to say that it makes a big difference. Nothing like a clean layout and nice fonts to make a text easy to read.

The material is organized very well. It starts off with the basic theory of convex functions, and slowly introduces the various convex programming variants, from the simpler kinds to the fully general versions. Applications to problems are given throughout, and the exercises contain all kinds of interesting variants. The book works both as a text, to be read from cover to cover, and as a reference for specific definitions. It also does probably the most difficult thing of all: it hits the right tone in terms of mathematical exposition and intuition. I particularly liked the sections on interior-point methods, as well as its explanation of Legendre duality (the generalization of traditional geometric duality).

Tuesday, October 18, 2005

FWCG: One day and counting

One day and counting: tomorrow is the deadline for submissions to the Fall Workshop on Computational Geometry.

Sunday, October 16, 2005

Sampling from the simplex

This post started off as a call for help, and after some digging, became an answer to a question :).

How do you sample uniformly from the unit simplex ? Specifically, I would like a uniform sample from the set X = { (x1, x2, ..., xD) | 0 <= xi <= 1, x1 + x2 + ... + xD = 1}, where D is the dimension of the simplex. The first class of approaches one might try are "rejection" sampling methods. For example, we can sample a point from the unit hypercube, and reject points that don't lie in the simplex. This of course gets worse and worse as the dimension increases: the expected number of rejections per valid point grows rapidly. Another type of approach is what is called a "projection" method: for example, sample a point from the unit hypercube, and mark off the point on the simplex that intersects the ray connecting the origin and the sampled point. The problem with such approaches is that they generate non-uniform distributions, and you then have to figure out how to correct for the distortion. An interesting hybrid is the idea of sampling from the unit sphere, and then using the coordinate-wise map x -> x^2 to carry the point onto the simplex. However, this mapping is also non-uniform, and generating a uniform sample efficiently from the sphere is itself non-trivial.

Generating a point from the D-dimensional simplex is equivalent to sampling D-1 points from the unit interval, sorting them, and taking the lengths of the D intervals they induce (including the two intervals at the ends). The distribution of the sorted values (the order statistics) can be derived from the distribution the points are sampled from. In general, if the points are drawn from a distribution with pdf f and cdf F, then the pdf of the kth smallest of n samples is given by

f_(k)(x) = n * (n-1 choose k-1) * f(x) * F(x)^(k-1) * (1-F(x))^(n-k)

For the uniform distribution, the kth order statistic follows a Beta distribution with parameters k and n-k+1. Of course, what we want are the differences between adjacent elements.

It turns out that there is an elegant way of doing this. Generate IID random samples from an exponential distribution (which you can do by sampling X uniformly from (0,1] and returning -log(X)). Take D samples, and normalize by their sum. Voila: the resulting list of numbers is a uniform sample from the simplex.

A similar idea works to sample from the unit sphere. Sample elements IID from a normal distribution, and normalize so the resulting vector has norm 1.
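
Both recipes are easy to write down. Here is a minimal Python sketch (standard library only; the function names are mine):

```python
import math
import random

def sample_simplex(d):
    """Uniform sample from {x : x_i >= 0, sum_i x_i = 1} in R^d.

    Draw d i.i.d. exponential variates and normalize by their sum.
    (1 - random()) lies in (0, 1], which avoids taking log of zero.
    """
    e = [-math.log(1.0 - random.random()) for _ in range(d)]
    s = sum(e)
    return [x / s for x in e]

def sample_sphere(d):
    """Uniform sample from the unit sphere in R^d.

    Draw d i.i.d. standard normals and scale the vector to norm 1;
    this works because the Gaussian density is rotationally symmetric.
    """
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]
```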

For more on this, check out Luc Devroye's book, 'Non-Uniform Random Variate Generation'. It was originally published by Springer, and is now out of print. He has the entire book available on his website.

Update: As one of the commenters pointed out, uniform sampling from the simplex is a special case of sampling from a Dirichlet distribution with all parameters equal to 1 (the Dirichlet's main claim to fame is as the conjugate prior for the multinomial distribution). To sample from a general Dirichlet distribution, you take normalized samples from appropriately chosen gamma distributions; for uniform sampling from the simplex this reduces to the exponential sampling above, since the exponential is the Gamma(1,1) distribution.

Saturday, October 15, 2005

Numb3rs

I haven't been posting regular Numb3rs summaries for the new season, because I've been disappointed with the new episodes; they appear to be using math mostly for throwaway comments, rather than actually trying to get into any specifics (Farey sequences, the Art Gallery Theorem, and cellular automata being some examples).

They did mention curvelets though, when doing image reconstruction. This particular reference is interesting because, as Shankar Krishnan informs me, one of the top experts on curvelets, Emmanuel Candes, is actually at Caltech, and Caltech's math department has been providing technical expertise for the show.

Wednesday, October 12, 2005

SoCG 2006: The Vortex edition

The compgeom mailing list has the second CFP for SoCG 2006, to be held in Sedona, Arizona. Paper deadline is Dec 5, 2005, but the abstracts are due Nov 23, 2005. Jeff predicts doom.

However, what he doesn't realize, and what the CFP doesn't mention, is the natural healing powers of the Sedona vortex,
a spot where a concentrated source of energy is flowing from. Vortexes are an accelerator for healing and clearing. Or a place to receive an abundance of energy.
And consider this effect of visiting Sedona:
  • A sense of communion with unseen forces surrounding you, including the intense desire to spend more time with Nature.
Personally, that happens to me whenever I have to review papers.

And for authors, there's even more that Sedona has to offer. Especially if your paper gets rejected:
THE GATEKEEPER VORTEX EXPERIENCE

This experience is designed to accelerate clearing of emotional blocks, letting go of the past, releasing feelings of frustration, anger and injustices.
If that doesn't work, you could always escort reviewers up to
THE EAGLE'S VIEW VORTEX
...follow the course of water up the red rocks to the Eagle’s View Vortex! The view is incredible! [..] stand [reviewers] on the vortex and guide them into shape-shifting into the eagle! They fly ahead into the Dream Weave and explore healing a moment in their life or just fly and fly!!
Once you've done that, and feel better, try out
THE HAWK VORTEX EXPERIENCE

Boulder dancing at it’s purest! Feel yourself let go and fly as you hop from one boulder to the next!
with the extra bonus that
This place has a mystical feel to it and if your spirit is ready, the hawk that lives there might pay you a visit!

7 days and counting...

The deadline for submitting abstracts to the Fall Workshop on Computational Geometry is only 7 days away now.

Saturday, October 08, 2005

The "right" kind of research

Not Even Wrong points to an essay by Sir Michael Atiyah in the Bulletin of the AMS on mathematical beauty. Reading it (and he writes, and speaks, exceedingly well; the interview that he and Isadore Singer gave after winning the Abel prize is a gem to read) reminded me of the constant arguments we have about the "right" kind of theoretical computer science (which the FOCS/STOC/SODA/SoCG brouhaha was really a proxy for).

It reminded me of these arguments because it illustrates how pointless they really are (and I have been as guilty as any in attempting to "define" the right kind of research):
We can identify a long list of desirable qualities: beauty, elegance, importance,
originality, usefulness, depth, breadth, brevity, simplicity, clarity. However,
a single work can hardly embody them all; in fact some are mutually incompatible.
Just as different qualities are appropriate in sonatas, quartets or symphonies, so
mathematical compositions of varying types require different treatment.
He goes on to describe the myriad ways in which mathematical beauty (to him) appears in a work. It's interesting to read it, but what's even more important is the disclaimer:
...what follows are my personal preferences and make no claim to universality. In fact I glory in the diversity of mathematics and the lack of a uniform straightjacket [sic]. This is what makes it live.
At the end of the essay, he briefly dwells on the role of applied work in mathematics (another topic of relevance in theoryCS):
Much of mathematics was either initiated in response to external problems or has subsequently found unexpected applications in the real world. This whole linkage between mathematics and science has an appeal of its own, where the criteria must include both the attractiveness of the mathematical theory and the importance of the applications. [...]. Not only can we mathematicians be useful, but we can create works of art at the same time, partly inspired by the outside world.

Friday, October 07, 2005

Priority Sampling...

Today's AT&T Labs math seminar was by Mikkel Thorup, who talked about a new paper with Nick Duffield and Carsten Lund on priority sampling.

Suppose you are given items 1, 2, 3, ..., n with weights wi. The goal is to be able to answer subset sum queries: for a given set of indices I (the query), compute the sum of the weights wi with i in I. In situations of interest, n is huge, so you can't really store all the weights. What you'd like instead is to sample from the universe, obtaining a representative set of indices S with appropriate weights, so that when a query comes in, you add up the weights of elements in the intersection of I and S, and return that as your answer.

Now you can sample the items uniformly at random, but you'd miss the (typically few) elements with high weight. You could do weighted sampling, but then you'd constantly be hitting the elements with high weight, and that's no good. What you really need is some kind of weighted-sampling without replacement (which you can simulate by throwing out duplicates, but then you waste time trying to find a reasonable sample).

Their approach is an elegant idea called priority sampling. For each item i, generate a random number ai uniformly from (0,1]. Assign item i a priority of wi/ai. With probability 1, all priorities are distinct. Take the top k elements with respect to priority, and let t be the priority of the (k+1)st element. Now give each sampled element a new weight w'i = max(wi, t); all unsampled elements get a new weight of 0.
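
A minimal sketch of the scheme in Python (my own illustrative code, not the authors' implementation; ties among priorities occur with probability zero and are ignored):

```python
import random

def priority_sample(weights, k):
    """Priority sampling: return {index: adjusted weight w'_i} for k samples.

    Each item i gets priority w_i / a_i with a_i uniform in (0, 1]; keep the
    k highest-priority items, let t be the (k+1)-st highest priority, and
    assign each kept item the weight max(w_i, t). Unsampled items get weight
    0, represented here by simply leaving them out of the dictionary.
    """
    n = len(weights)
    if k >= n:  # everything fits in the sample: weights are exact
        return dict(enumerate(weights))
    prios = [(w / (1.0 - random.random()), i, w) for i, w in enumerate(weights)]
    prios.sort(reverse=True)
    t = prios[k][0]  # threshold: the (k+1)-st highest priority
    return {i: max(w, t) for _, i, w in prios[:k]}

def subset_sum_estimate(sample, indices):
    """Estimate sum(w_i for i in I) from a priority sample."""
    return sum(sample.get(i, 0.0) for i in indices)
```

Averaged over many independent samples, subset_sum_estimate converges to the true subset sum, which is the unbiasedness property discussed next.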

The main bit of magic is the following fact:
E[w'i] = wi
In other words, the estimator is unbiased, from which it immediately follows that any subset sum estimate is also unbiased. An even more surprising fact (surprising because the threshold t appears to depend on all k sampled elements) is:

w'i and w'j are uncorrelated (their covariance is zero).

which then means that the variance of any subset sum is merely the sum of the individual variances. A conjecture of the authors, recently resolved in the affirmative by Mario Szegedy, is that this sampling scheme (using k+1 samples) provides the minimum total variance unbiased estimator over all such schemes that sample k elements from the input.

A formal expression for the variance of this sampling scheme is a bit complicated to write down in general, but for simple cases it can be written explicitly. The authors have experimental results showing how the estimates provided by priority sampling converge to the true values.

The sampling trick is very neat, and does appear magical even after you see the proof. There is a fascinating connection to a kind of sampling called threshold sampling, where each element is picked with a specific probability (and thus the sample size is not fixed), and a fixed threshold is used to decide whether an element is retained in the sample or not. Informally, threshold sampling provides a lower bound on the variance of such a scheme, and priority sampling is close to matching it.

What I understand less is the apparent connection to range sum histograms. In the data streaming world, a range sum histogram H is an approximation of an array A[1..n] such that range sum queries on A can be answered approximately using H. Muthukrishnan and Strauss have a SODA paper from 2003 on this topic, where they show how to construct histograms on a*B buckets that are within some approximation error of the best B-bucket histogram. One key difference is that in their formulation, the average squared error is minimized rather than the worst-case error of a single query. Note also that the subset sum formulation is more general, since ranges are a special kind of subset.

Friday, September 30, 2005

Fall Workshop on Computational Geometry, 2005

Not a moment too soon, here's the CFP for this year's CG Fall workshop. Your humble geomblogger has the honor of chairing the organizing committee this year.

For the first time, the workshop will be held at the University of Pennsylvania, in Philadelphia. The submission deadline for 2-page abstracts is Oct 19, with results announced a week later. As always, we encourage submissions describing work in progress, work under submission, or work already published; the workshop has no formal proceedings, and the participatory atmosphere is one of the things I have liked best about past workshops.

Wednesday, September 28, 2005

Somewhere, Edsger Dijkstra is smiling:

From a Saturday WSJ article on Windows Vista (subscription required, alas):
Old-school computer science called for methodical coding practices to ensure that the large computers used by banks, governments and scientists wouldn't break. But as personal computers took off in the 1980s, companies like Microsoft didn't have time for that. PC users wanted cool and useful features quickly. They tolerated -- or didn't notice -- the bugs riddling the software. Problems could always be patched over. With each patch and enhancement, it became harder to strap new features onto the software since new code could affect everything else in unpredictable ways.
The UNIX philosophy makes sense:
A newcomer to the Windows group, Mr. Srivastava had his team draw up a map of how Windows' pieces fit together. It was 8 feet tall and 11 feet wide and looked like a haphazard train map with hundreds of tracks crisscrossing each other.

That was just the opposite of how Microsoft's new rivals worked. Google and others developed test versions of software and shipped them over the Internet. The best of the programs from rivals were like Lego blocks -- they had a single function and were designed to be connected onto a larger whole. Google and even Microsoft's own MSN online unit could quickly respond to changes in the way people used their PCs and the Web by adding incremental improvements.

Responses to journal referee reports:

Then:

Dear Sir,

We (Mr. Rosen and I) had sent you our manuscript for publication and had not authorized you to show it to specialists before it is printed. I see no reason to address the—in any case erroneous—comments of your anonymous expert. On the basis of this incident I prefer to publish the paper elsewhere.

Respectfully,

P.S. Mr. Rosen, who has left for the Soviet Union, has authorized me to represent him in this matter.

And now:

Dear Sir, Madame, or Other:

Enclosed is our latest version of Ms # 85-02-22-RRRRR, that is, the re-re-re-revised revision of our paper. Choke on it. We have again rewritten the entire manuscript from start to finish. We even changed the goddamn running head! Hopefully we have suffered enough by now to satisfy even you and your bloodthirsty reviewers.

I shall skip the usual point-by-point description of every single change we made in response to the critiques. After all, it is fairly clear that your reviewers are less interested in details of scientific procedure than in working out their personality problems and sexual frustrations by seeking some kind of demented glee in the sadistic and arbitrary exercise of tyrannical power over helpless authors like ourselves who happen to fall into their clutches. We do understand that, in view of the misanthropic psychopaths you have on your editorial board, you need to keep sending them papers, for if they weren't reviewing manuscripts they'd probably be out mugging old ladies or clubbing baby seals to death. Still, from this batch of reviewers, C was clearly the most hostile, and we request that you not ask him or her to review this revision. Indeed, we have mailed letter bombs to four or five people we suspected of being reviewer C, so if you send the manuscript back to them the review process could be unduly delayed.

[.....]

Monday, September 26, 2005

Dickson Prize for David Haussler

David Haussler is perhaps best known to geometers for the work he did with Emo Welzl on the theory of epsilon-nets, one of the most influential ideas in the area of randomization and computational geometry. He is also famous for his work in bioinformatics, specifically for the use of hidden Markov models for sequence analysis.

He was just awarded the Dickson Prize by CMU.

Sunday, September 18, 2005

Holes in Sets: The Conclusion

Over a year ago, I had mentioned a fascinating "Ramsey" problem on the plane that dates back to Erdos and Szekeres, namely:
For a fixed k, does there exist an n = n(k) such that any point set in the plane of size at least n(k) contains an empty convex polygon (a "hole") of size k ?
Horton showed that no such n exists for k = 7: for any n, there exists a set of n points in the plane with no empty convex heptagon. Conversely, Bárány and Füredi showed that for k = 5 and sufficiently large n, any point set of size n has an empty convex pentagon.

For k = 6, the matter was unresolved, until just recently. This week's combinatorics seminar at NYU is by Andres Varon, who announces that Tobias Gerken has just closed this gap, showing that such an n exists for k = 6. If and when I get more details, I'll post them here.

Update: David Eppstein and Libertarianoid mention this as well. Libertarianoid has further details:
The conjecture was recently resolved by Tobias Gerken in the affirmative. As Matousek says:

"P. 34, the 6-hole problem was resolved in 2005 by Tobias Gerken, who showed by a clever case analysis that any set containing 9 points in convex positions has a 6-hole. The proof was simplified by Pavel Valtr to about 4 pages (but one needs more than 9 points in convex position to start with, and hence the resulting upper bound for n(6) is worse)."

The original proof is somewhat long (around 39 pages), but elementary and not too hard to read. Valtr gave a talk on his simplified proof a while ago.

Friday, September 16, 2005

"Threshold"

One of the new fall series premiering on CBS is "Threshold". I wouldn't comment on it except for this (referring to one member of an alien pursuing team):
The most fractious of all is Arthur Ramsey (Peter Dinklage), a dwarf with a penchant for strip clubs who is a brilliant mathematician and linguistics expert. "If our E.T. tries to phone home," Molly tells Cavennaugh, "he'll translate the call for us."
Hmm... In any case, other disciplines have been equally abused:
  • Dr. Nigel Fenway (Brent Spiner), a former 1960's radical and embittered NASA microbiologist.
  • Lucas Pegg (Rob Benedict), a meek, somewhat nerdy physicist who reads First Corinthians on his BlackBerry.

Wednesday, September 14, 2005

Herman Goldstine Fellowship

The Mathematical Sciences Department of the IBM Thomas J. Watson Research Center is pleased to announce its 2006-2007 Herman Goldstine Postdoctoral Fellowship. The fellowship provides an excellent opportunity for research in Mathematics and Computer Science, with exposure to technical problems arising in industry. Details, including application procedure, may be found at
http://domino.research.ibm.com/comm/research.nsf/pages/d.math.goldstine.html

This is a great fellowship for graduating Ph.Ds in theoryCS. I wouldn't be surprised if most of the readers of this blog already knew about it, but just in case..

Deadline is Dec 31, 2005, and applications will be received from Oct 1.

And who is Herman Goldstine ?
Goldstine's career has been long and distinguished. Although his early research was in the area of calculus of variations, during WWII he joined John von Neumann in the groundbreaking ENIAC (Electronic Numerical Integrator And Computer) project. ENIAC was one of the early innovations that paved the way for the modern computing industry. Goldstine joined IBM in 1958 where he established the Mathematical Sciences Department and served as its first director. As recognition for his contributions to IBM and to science he was appointed an IBM Fellow in 1967, and served in this position until he retired in 1973. Most recently he served as the executive director of the American Philosophical Society.

Goldstine has received numerous awards for his impact on science and technology. The National Medal of Science, the Harry Goode Award, the IEEE Pioneer Award, and memberships in the National Academy of Science, the American Academy of Arts and Science, and the American Philosophical Society are among these awards.

Friday, September 09, 2005

SODA 2006

As you all probably know by now, SODA results are out. Lots of interesting stuff; a rough survey indicates around 16 geometry papers (depending on how you count them). Lots of embedding results as well.

I hope to track down and review some of the papers once they start appearing online. For now, I will entertain you with one of my own (assuming this doesn't break some academic blogosphere code of etiquette ;)).

Deepak Agarwal, Jeff Phillips, Suresh Venkatasubramanian.

The title (or at least the first part) is a tip of the hat to an earlier paper by Friedman and Fisher titled 'Bump Hunting in High Dimensional Data'. The basic setup is this: you're given points labelled red or blue, and a class of shapes (usually hypercubes, but you could have spheres, pyramids, etc). Given a shape, you count the number of red and blue points inside it, and compute a discrepancy function that could be as simple as |#red - #blue|. The goal is to find the shape that maximizes this function (i.e., that finds the biggest bump).

Where do the red and blue come from ? One of the more popular applications of this problem is in biosurveillance, where the red points represent incidence of disease, and the blue represents a baseline population (the points may be tagged with weights). A bump is an unusually high or low disease rate with respect to the population.

What discrepancy function do we use ? The "simple" combinatorial discrepancy function described above has actually been studied before, in the context of learning. Here, the red points are the "good" elements for a classifier and the blue points the "bad" ones, and the shape is a hypothesis that classifies the most points correctly while misclassifying the fewest. Dobkin, Gunopulos and Maass presented exact and approximate algorithms for this problem in arbitrary dimensions.
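
For intuition, the combinatorial version is easy to state in code. This is a naive brute force of my own over axis-aligned rectangles (not the Dobkin-Gunopulos-Maass algorithm, which is far cleverer):

```python
from itertools import combinations

def max_disc_rectangles(red, blue):
    """Maximize |#red - #blue| over axis-aligned rectangles in the plane.

    Only rectangles whose sides pass through input coordinates matter,
    so we enumerate pairs of x-extents and pairs of y-extents. Fine as a
    reference implementation; hopeless for large inputs.
    """
    xs = sorted({p[0] for p in red + blue})
    ys = sorted({p[1] for p in red + blue})
    best = 0
    for x1, x2 in combinations(xs, 2):
        for y1, y2 in combinations(ys, 2):
            r = sum(1 for p in red if x1 <= p[0] <= x2 and y1 <= p[1] <= y2)
            b = sum(1 for p in blue if x1 <= p[0] <= x2 and y1 <= p[1] <= y2)
            best = max(best, abs(r - b))
    return best
```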

But in the biosurveillance domain, statistical measures are more commonly used. The idea is that we want the discrepancy function to model a likelihood ratio with respect to some underlying population distribution: how likely was it that I saw as many red and blue points as I did ? The resulting function often resembles a Kullback-Leibler distance, or a chi-squared distance, and one of the most popular measures is the Kulldorff scan statistic, developed by Martin Kulldorff in 1997.

Unfortunately, such measures are painful to work with from an algorithmic point of view. They are not as "nice" as the (linear) combinatorial discrepancy, and thus the best algorithms have been either heuristics or (theoretically) brute force methods that have good empirical behaviour.

What we show is that as long as the discrepancy function is convex (which likelihood-ratio based distances tend to be), you can approximately rewrite it as the envelope of a set of linear functions, which you then whack with the DGM algorithm. The question of course is, how many functions do you need ? This is the tricky part, and involves an asymptotic bound on the eigenvalues of the Hessian of the discrepancy function.
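
In one dimension the envelope idea looks like this (a sketch under my own simplifications, not the construction in the paper): a convex function lies above all of its tangent lines and touches each one at its point of tangency, so the pointwise maximum of a few tangents is a close under-approximation.

```python
import math

def tangent_envelope(f, df, anchors):
    """Approximate a convex f by the max of its tangents at the anchors.

    For convex f, each tangent lies below f, so g <= f everywhere and
    g = f exactly at each anchor; maximizing each linear piece separately
    and taking the best approximates maximizing f itself.
    """
    def g(x):
        return max(f(a) + df(a) * (x - a) for a in anchors)
    return g

# Example: f(x) = x log x, a typical term in likelihood-ratio discrepancies.
f = lambda x: x * math.log(x)
df = lambda x: math.log(x) + 1.0
g = tangent_envelope(f, df, [0.1 * i for i in range(1, 21)])  # anchors in (0, 2]
```

How many tangents are needed for a given accuracy is exactly the tricky question, which is where the Hessian eigenvalue bounds come in.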

This is a fairly general crank that allows us to solve efficiently (and approximately) a number of (till now) different statistical discrepancy problems (including maximizing with the Kulldorff scan statistic) with the same basic method. There are other results in the paper, which I hope to have online soon.

Wednesday, September 07, 2005

Michael finds the elephant

..in the room:
I think STOC/FOCS has become progressively less algorithms-oriented over the years. My impression is that STOC/FOCS emphasizes "mathematical depth", "cleverness", and "theory purity" over, say, "usability" and "insight into a practical problem". I know there are a large number of people who think this, and I would therefore encourage people who don't see this or think it is true to simply consider the issue again. I do not wish to make a judgement call as to whether this is good or bad -- this is just what these conferences, I believe, have turned into.

SODA, not coincidentally, has become a refuge for more algorithmically oriented papers. (As have more specialized conferences.) In some ways, this is good -- these papers have a home! In some ways, it is bad, because I think it has exacerbated the division, which in part explains the extreme opinions the post has raised. (I too have had reviews essentially saying, "This is algorithmic, belongs in SODA." Perhaps the reviewer was just being kind and didn't want to negatively comment on the quality of my work. But my impression is that for many FOCS/STOC reviewers "algorithmic" has itself become a quality comment, in the sense that "algorithmic" papers that do not pass the "deep/clever/pure" test are pretty much dismissed out of hand.)

I have been on committees for FOCS/STOC conferences, and I feel that I have seen these biases in action. Strangely, I did not come away with any ideas on how to change things; once a culture develops, it seems very hard to change.
I'd like to point out that there is nothing inherently wrong with FOCS and STOC being exactly the way Michael describes them, as long as these biases are made clear. A conference is defined by its community, and by the papers it accepts. If I know that only "deep/clever/pure" papers should be sent to STOC/FOCS, it makes my life much easier, as long as we are willing to relax the idea that STOC/FOCS alone must represent the best that theory has to offer.

Tuesday, September 06, 2005

On SODA

Lance:
Is it truly a good thing to move from a STOC/FOCS/specialized conferences system towards a STOC/FOCS/SODA/other specialized conferences system?

JeffE:
Yes.

I'm surprised there is even a debate over this. Specializing conferences creates more interest, and you know what to expect. If we think of a conference as a group of people with shared interests, what's not to like?

The problem is the traditional role STOC/FOCS have played in "defining theory". I have actually heard arguments that since CG papers are not sent to STOC/FOCS, CG is not theory. It's about time (and in fact this has happened for many years now) that people realize that STOC/FOCS are slowly becoming specialized conferences. Bill Gasarch asks what their specialization is: a lot of it depends on the current hot topics of the day. It seems to me that as an area of theory gets hot, it starts getting more papers into STOC/FOCS, and as the hubbub dies down, the topics move on. SODA, on the other hand, has a fairly stable set of topics that are covered, with variations from year to year in terms of which areas are represented more.

Update: Our strait-laced theory community is letting it all hang out. Join in the fun :)

New ! FOCS ! Almost as good as SODA !

Ok, I'm kidding. The 46th Annual Symposium on Foundations of Computer Science, otherwise known as FOCS, is in Pittsburgh this year, from Oct 22 to 25. Early registration ends Sep 23, as does the hotel block, so get cracking.

Saturday, September 03, 2005

Tulane students.

Because of the hurricane, Tulane U has had to close down for the semester. For Tulane students (or new students), this is a serious disruption. In a wonderful gesture, numerous universities are coming to the rescue, with offers of admission, classes via correspondence, and other help. The LiveJournal community academics_anon has a list of all the universities that have offered assistance to Tulane students, and this list is constantly being updated. Since classes have already started in many universities, there isn't much time to act, though, so spread the word (and this link).


Friday, September 02, 2005

Kepler's, or the end of an era

(if you've never been to the Bay Area, the post below will make little to no sense. Be warned...)

For me, Kepler's Bookstore defined the quintessential Bay Area experience, before Silicon Valley, before choked highways, and before you could shake every tree in Menlo Park and have a VC fall out. On a bright and cool (and weren't they all) Saturday morning, you'd get on your bike and pedal over (exclusively on bike lanes) to Menlo Park to browse through what was the real "clean, well-lighted place" for books. A happy hour, and a much lighter wallet later, you'd take your stash and walk out the door, and sink into a chair outside, holding your double espresso from Cafe Borrone right next door, and let the hours go by in a pleasant haze.

I am an online maven, and an amazon.com addict. I scorn old technology and the old ways, and am constantly restless for change. But I will mourn the loss of Kepler's, because the Bay Area will never be the same for me again.

Thursday, September 01, 2005

No Free Lunch Theorems

Neural networks and genetic algorithms are general search strategies that are often used as heuristics to solve otherwise intractable problems. They provide the luxury of not having to think about the internals of a specific problem, and in certain cases can perform quite well as practical heuristics.

I dislike them greatly though (even though in my misguided youth I used to play with GAs) because they are extremely general hammers that typically don't have provable performance or running-time guarantees, and don't reveal any interesting insights into the structure of a problem.

Such search strategies are also victims of their own generality, and the dagger that pierces them is called a "No Free Lunch" theorem. No Free Lunch theorems are generally of the form
...all algorithms that search for an extremum of a cost function perform exactly the same, when averaged over all possible cost functions. In particular, if algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A.
The crucial point here is the "averaged over all cost functions". The NFL theorems are not saying (and this was something I used to be confused about) that search procedures are doomed; rather, they say that blind search procedures (such as those performed by genetic algorithms) are doomed. The original NFL theorem is attributed to Wolpert and Macready; many variants have been proved as well.
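The "averaged over all cost functions" clause can even be checked exhaustively on a toy instance. Here is a small sketch (mine, not from the post): enumerate every possible cost function on a four-point domain, and compare two fixed "search algorithms" that differ only in the order they probe points. Their average performance comes out identical, exactly as NFL predicts.

```python
from itertools import product

DOMAIN = range(4)   # a tiny search space of 4 candidate points
VALUES = range(4)   # each point is assigned a cost in {0,1,2,3}

def best_after(probe_order, f, k=2):
    # Best (lowest) cost seen after probing the first k points in order.
    return min(f[x] for x in probe_order[:k])

def average_performance(probe_order, k=2):
    # Average that best-cost over ALL 4^4 = 256 cost functions.
    costs = [best_after(probe_order, f, k)
             for f in product(VALUES, repeat=len(DOMAIN))]
    return sum(costs) / len(costs)

# Two "blind" search strategies: different fixed probe orders.
print(average_performance([0, 1, 2, 3]))  # 0.875
print(average_performance([3, 1, 0, 2]))  # 0.875 -- exactly the same
```

The point of the demonstration: over all functions, the values seen at any fixed set of probe points are distributed identically, so no probe order can win on average.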

Joseph Culberson has a nice perspective on such theorems from an algorithmic point of view, and attempts to frame them in the context of complexity theory.


Wednesday, August 31, 2005

More numerology

The h-index and its cohorts are proposed measures of individual research impact based on citation counts. More traditional measures include raw citation counts and the number of high-quality journal/conference publications. Derek Lowe at In the Pipeline has been blogging about the Impact Factor, the number du jour for measuring the "quality" of a journal. It is defined as the number of citations to articles in a journal divided by the number of papers in the journal (computed over a window of two years).

Apparently, it is now a popular game for journals to try and ramp up their IF (for example, a journal that publishes many review articles will generate very high citation counts). This has caused much angst among researchers, because like any "objective" system, the IF can be gamed to the point of meaninglessness.
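To see how effective the review-article trick is, here is the two-year-window formula with entirely made-up numbers (a hypothetical journal, not real data):

```python
def impact_factor(citations_to_window, papers_in_window):
    # Year-Y impact factor: citations in year Y to items published in
    # years Y-1 and Y-2, divided by the papers published in those years.
    return citations_to_window / papers_in_window

# Hypothetical journal: 100 research papers drawing 300 citations.
print(impact_factor(300, 100))  # 3.0
# Now add 10 review articles pulling 40 citations each.
print(round(impact_factor(300 + 10 * 40, 100 + 10), 2))  # 6.36
```

Ten surveys more than double the number, which is exactly the gaming being complained about.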

It is probably no surprise that we search for "measurable" indicators of quality, whether they be paper ratings, conference acceptance rates, individual impact, journal quality, or what-have-you. On the other hand, I doubt there is anyone who actually takes such ratings seriously (clearly we pay attention to the numbers, but always with the disclaimer that "true quality can't be measured in numbers"). It must be peculiarly frustrating to people in the "hard" sciences (and I will take the liberty of including theoryCS in this group) that we attempt to find objective truths, and yet our attempts to evaluate the quality of our work are so fuzzy and ...(this is hard for me to say..) subjective.

I wonder if our brethren (sistren?siblingen?) in the "not-so-hard" sciences are better able to come to terms with this.

Tuesday, August 30, 2005

Fun with numbers

In all the recent discussion of the h-index (my GSH is 11), it's worth noting that we need some kind of normalization by community. As anyone who's ever written a paper in databases will tell you, it's the best way of getting a HUGE citation count. Why ? Because the field is a lot bigger, and the odds are that many people will end up citing your work. For example, my most cited paper is an old piece of work for the WWW conference.

Maybe you could normalize the index by dividing by the maximum.
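For reference, the index itself is easy to compute, and the normalization floated above is a one-liner on top of it. A quick sketch (the field maximum below is a made-up stand-in):

```python
def h_index(citations):
    # Largest h such that at least h papers have >= h citations each.
    counts = sorted(citations, reverse=True)
    h = 0
    while h < len(counts) and counts[h] >= h + 1:
        h += 1
    return h

def normalized_h(citations, field_max):
    # Normalize by the maximum h-index observed in the field
    # (field_max is hypothetical here).
    return h_index(citations) / field_max

print(h_index([10, 8, 5, 4, 3, 1]))           # 4
print(normalized_h([10, 8, 5, 4, 3, 1], 40))  # 0.1
```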

Monday, August 29, 2005

funny...

Via Chris Leonard, the FETCSWCHG:
If you know the difference between P and NP, and the difference between offsides and icing; if you can properly pronounce Dijkstra and Jagr, Euler and Roy -- this is the page for you.

With sufficient encouragement, we will acquire ice time for the First Ever Theoretical Computer Science World Cup Hockey Game. Please fill out the form below to help us organize this unprecedented and quite possibly unique event.


League Commissioner (but she really wants to coach): Catherine McGeoch, Amherst College ccm@cs.amherst.edu
President of the Players Union: Cliff Stein, Columbia University cliff@ieor.columbia.edu
President of the Players Intersection: Micah Adler, University of Massachusetts micah@cs.umass.edu
We might even get to see the STOC 2006 PC Chair pad up :).

This has never happened to me

but I wish it had :)

Wednesday, August 24, 2005

arxiv and trackbacks.

By now, everyone is probably aware of the fact that the arxiv supports trackbacks. In theoryCS (excluding the quantum folks), we don't use the arxiv that much (with some notable exceptions), so this might not affect us greatly, but the physics bloggers are rather happy. It was interesting to read this at Cosmic Variance:
Most people these days post to the arxiv before they even send their paper to a journal, and some have stopped submitting to journals altogether. (I wish they all would, it would cut down on that annoying refereeing we all have to do.) And nobody actually reads the journals — they serve exclusively as ways to verify that your work has passed peer review.
Now that would be a neat model to adopt.

p.s. I personally am not that enamored of trackbacks. Blogger doesn't support them, trackback spam is rampant, and they can be cumbersome to use. Technorati "who links to me" pages are often more effective. But they do provide a (semi-)automatic comment mechanism that allows for discussions of papers to be carried out in the blogosphere, so it will be interesting to see how effective this is.

Saturday, August 20, 2005

Data Mining and the GPU

I was planning to be at KDD tomorrow to co-present a tutorial on data mining and graphics hardware, but as I am rapidly learning, man proposes and baby disposes ! In any case, if you are in the area, my very able co-conspirators Shankar Krishnan and Sudipto Guha will be pulling extra weight for me.

Mattress flipping

I am told that sleep is something that people do from time to time. I am further told that it is possible to sleep flat, on a structure called a mattress. If sleeping on mattresses is something you do on a regular basis, you might be interested in an amusing reflection on the group theory of mattress flipping.

(Via Ars Mathematica)

Friday, August 19, 2005

I'm back..

A somewhat forced vacation, caused by a 50% increase in family size (sometimes, constants DO matter !). While I recover, look over this set of recommendations on giving scientific talks, by Robert Geroch.

Whenever I see suggestions on how to use powerpoint effectively, I see these inane comments like "never use more than 6 words per slide" and what-not. For high signal-to-noise ratio presentations like conference talks, this hardly makes sense. The suggestions given above are remarkably intelligent, and are directed towards scientific talks. The fact that this was written in 1973 only goes to show that the principles of good public speaking are timeless.

Friday, August 05, 2005

DOI and BibTeX

For the last few hours, for reasons that I will not get into, I have been trying to track down BibTeX entries for papers. Usually if the paper has an ACM DL entry, there is a BibTeX entry that one can web-scrape, but for many papers (especially IEEE publications), this doesn't work because IEEE doesn't have BibTeX entries on their website (and it's harder to web-scrape them).

Most of the complication comes from the fact that often I have a title, and need to match it to an actual citation of some kind. Google Scholar is quite helpful in this regard, allowing me to search for a title and more often than not returning the ACM DL link to the paper (and BibTeX entry).

But the ACM doesn't have everything, and this is where DOI numbers come in. The Digital Object Identifier is a unique identifier that maps to a document entity, analogous to the URL for a web page. As with a web page, the actual location of the document can be hidden from the user, and changed easily by the publisher, allowing for both portability and the ability to integrate a variety of sources. There is even a proxy server that you can supply a DOI number to; it returns the web page of the publisher that currently maintains that document.

What would be very cool would be a DOI to BibTeX converter. Note that a BibTeX entry maps to a single document, like a DOI. DOIs of course address a smaller space, since they govern only published work. If publishers exported some standard format (XML?), then it would be a trivial matter to write such a thing. Right now, all you get is the web page, from which you either have to scrape a BibTeX entry or construct one by hand. Neither option scales, nor is either particularly appealing.
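One plausible way to build such a converter is to lean on the DOI proxy itself: ask the resolver for BibTeX via HTTP content negotiation rather than scraping the publisher's page. This is a sketch under the assumption that the registering publisher serves the `application/x-bibtex` media type (not all do), and the DOI below is a placeholder:

```python
import urllib.request

def bibtex_request(doi):
    # Build a request to the DOI proxy asking for BibTeX instead of
    # HTML. urllib.request.urlopen(req).read() would fetch the entry,
    # assuming the publisher behind this DOI honors the media type.
    url = f"https://doi.org/{doi}"
    return urllib.request.Request(
        url, headers={"Accept": "application/x-bibtex"})

req = bibtex_request("10.1000/xyz123")  # placeholder DOI
print(req.full_url, req.get_header("Accept"))
```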

Thursday, August 04, 2005

geeks who eat...

Anyone who's ever sat through any theoryCS conference lunch/dinner knows that as the meal wears on, the probability that someone will make some silly joke about dining philosophers tends to one.

I am therefore gratified to see the possible emergence of another dining problem in a different community, in this case statistics:
2 groups of statisticians want to lunch together, but have managed to travel to 2 different restaurants. There is a third similar restaurant nearby. In fact the 3 restaurants are equidistant; it takes exactly 5 minutes to move from one to another. The statisticians have dashed to lunch without their cellphones, and don't know the phone numbers for any of the restaurants, so they can't communicate. Having a shocking faith in randomness, they have devised a technique for joining up. Each group will wait a random time--exponentially distributed with a mean of 5 minutes--and then move to another restaurant, selected randomly (of course) with a coin toss. They either meet, or repeat the process. What is the probability that one group will meet the other after moving only once?
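Rather than working the answer out on paper, here is a quick Monte Carlo check of the puzzle as stated (my sketch, not from the post). Since the waiting times are continuous, ties have probability zero: whichever group's clock rings first moves, and its coin toss lands on the occupied restaurant half the time.

```python
import random

def simulate(trials=200_000, seed=1):
    # Groups start at restaurants 0 and 1 (of 0, 1, 2). Each waits an
    # Exp(mean=5) time, then moves to one of the other two restaurants
    # by coin toss. Count how often they meet after a single move.
    rng = random.Random(seed)
    meets = 0
    for _ in range(trials):
        t_a = rng.expovariate(1 / 5)  # group A's waiting time
        t_b = rng.expovariate(1 / 5)  # group B's waiting time
        pos_a, pos_b = 0, 1
        if t_a < t_b:                 # A's clock rings first
            pos_a = rng.choice([1, 2])
        else:                         # B's clock rings first
            pos_b = rng.choice([0, 2])
        meets += pos_a == pos_b
    return meets / trials

print(simulate())  # hovers around 0.5
```

The simulation agrees with the coin-toss argument: the answer is 1/2.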

Wednesday, August 03, 2005

What you need is a degree in acting, not a Ph.D...

The Pentagon's new goal:
Fewer and fewer students are pursuing science and engineering. While immigrants are taking up the slack in many areas, defense laboratories and industries generally require American citizenship or permanent residency. So a crisis is looming, unless careers in science and engineering suddenly become hugely popular, said Robert J. Barker, an Air Force program manager who approved the grant. And what better way to get a lot of young people interested in science than by producing movies and television shows that depict scientists in flattering ways?
What better way ? Hmm, maybe you could (cough) (cough) give scientists some (cough) grant money ?
Tucked away in the Hollywood hills, an elite group of scientists from across the country and from a grab bag of disciplines - rocket science, nanotechnology, genetics, even veterinary medicine - has gathered this week to plot a solution to what officials call one of the nation's most vexing long-term national security problems.

Their work is being financed by the Air Force and the Army, but the Manhattan Project it ain't: the 15 scientists are being taught how to write and sell screenplays.
Oh I see. So instead of encouraging scientists to write grants, we'll get them to write $%$%$% screenplays. Admittedly, grant writing is an exercise in creative fiction, but to go this far ? And don't you read blogs ?
There aren't many stirring stories of heroic derring-do in which the protagonist saves the world and/or gets laid thanks to the well-timed development of a more efficient word processor file format translator. Sure, I've heard a few, but not many, and I doubt you'd want to hear them.
Oh and did I mention that Numb3rs will be back on the air Sep 9 ?
