Any resemblance between the following views and those of the University, my employer, my terminal, or the view out my window is purely coincidental. Any resemblance between the below and my own views is non-deterministic. The question of the existence of views in the absence of anyone to hold them is left as an exercise for the reader. The question of the existence of the reader is left as an exercise for the second god coefficient. (A discussion of non-orthogonal, non-integral polytheism is beyond the scope of this article.) (Via Sudipto Guha)

Ruminations on computational geometry, algorithms, theoretical computer science and life

## Thursday, June 30, 2005

### While you're slogging away over SODA submissions

The following excerpt might prove useful in your introductions:

## Wednesday, June 29, 2005

### New Theory Task Force

A growing concern in computer science, and in theoryCS in particular, has been the decline in research funding and support from the government (NSF/DARPA/etc.) at a time when undergraduate interest in computer science is also dwindling, and the US appears to be losing its edge in basic science.

Especially in theoryCS, there is a growing sense that we need to articulate a vision of the field that is more about CS as a way of thinking, than solely as a field that develops IT infrastructure. Lance had mentioned TheoryMatters earlier: it's a new clearing house for information on why theoretical computer science matters, and how the current funding crisis affects the work that we do.

In the next phase of this project, SIGACT (the theory arm of the ACM), has set up a committee that will deal with funding, outreach and advocacy issues. Here is the original missive from Sanjeev Arora:

SIGACT has set up a committee that will deal with funding, outreach, and advocacy issues. It will also explore and promote funding opportunities for TCS in NSF and other funding agencies. The members are: Richard Karp (chair), Christos Papadimitriou, Avi Wigderson, Richard Lipton, Bob Sloan, Madhu Sudan, Sanjeev Arora, Michael Mitzenmacher, and 3 ex officio members from the Sigact EC: Hal Gabow, Éva Tardos, Joan Feigenbaum.

One of the things we plan to work on is preparing a list of "research directions" for TCS in the next decade or two. Unlike a similar list from 2000 this one will be kept brief. It will have a few compelling items that will be comprehensible to nonspecialists and congressional aides. These will show that funding TCS is in the national interest.

We welcome your suggestions. Let's all put some thought into this. We should not be afraid to take on big challenges. Obviously, a brief list will not cover all of TCS but it is hoped that a long-term focus by funding agencies on some of the list items will benefit many or even most TCS researchers.

This is your time to speak up. If you have thoughts on this matter (and if you are a theoretician, you should!), you may comment here, or even better, in the comments section of the TheoryMatters page (the password is the first three letters of "theoretical computer science", all in lower case). Think big ! And don't be afraid to be speculative.

Here is my (first) contribution, on (what else!) computational geometry:

For many years now, biologists have been accumulating X-ray crystallographic structures of proteins, and developing a rich understanding of the crucial role that a protein's shape plays in determining its function. Today, the Protein Data Bank holds over 30,000 protein structures, and this number is constantly growing.

Understanding protein shapes, performing structural comparisons of molecules, and manipulating large molecular structures all require a sophisticated understanding of topology, geometry, and shape theory. In fact, one of the most active areas of research in geometry has been the area of "computational topology", which includes a large body of work on the use of topology in modelling shapes, especially proteins.

Being able to manipulate shapes efficiently is a core tool in doing the kinds of large scale structural biology that protein biologists have long dreamed about. Fundamentally, if I can classify the shape of a protein by comparison with others, I have learnt a lot about its function.

One goal of computational geometry should be to continue explorations in shape modelling and manipulation, with particular emphasis on the kinds of tools needed to compare and process large protein structures.

Update: There were some complaints about the make up of the committee. As Sanjeev Arora points out in the comments, Hal Gabow sent out a general invite to all SIGACT members, and not many people responded. It's important to note there are many ways to contribute to this effort.

### FOCS 2005

You know quantum computing is now at the center of theoryCS when the Quantum Pontiff is the first one to point to the FOCS list.

Notes:

- Absolutely no CG papers at FOCS, out of a grand total of two (I am told) submitted.
- Scroll to the bottom of the list, where you will see:

One extra paper: title and authors TBA (shortly)

I smell a merge.

Update: The extra paper is "Error Correction via Linear Programming" by Emmanuel J. Candès, Mark Rudelson, Terence Tao, and Roman Vershynin.

### Don't try this at home...

Scene at the German Consulate:

A big guy with a clear Russian/Slavic accent was arguing with the visa officer. She wanted two photographs: he only had one, and insisted that the German consulate web site mentioned only one (it says "two"). A few minutes into this argument, he reaches into his pocket and whips out a CD, and points to the picture on the front.

"That's me", he says. "I am an opera singer, and this is my CD. I was just in Carnegie Hall for a recital, and I am returning to Kiev. You can even read my biography on the liner notes. I'm singing Schubert in this CD. Schubert is German".

A few minutes later, he walks out with a satisfied look on his face.

### The Tangled Bank #31

Gathering of the scientific blogginess is here, hosted by Science and Sensibility. This time the format is that of a day at a conference. And I have to say, the cultural difference between other scientists (especially biologists) and computer scientists is immediately apparent.

First of all, the "talks" are only 15 minutes ! 15 minutes ! Bah ! It is below my dignity to give a talk for less than 25 minutes (even if this could accommodate more papers). Secondly, there is only one coffee break. This is terrible scheduling, and would send our entire cohort of theoreticians staring glassily off into space by the time afternoon hits. Finally, talks till 9:30 pm ? Sheesh: where's the time to party ?

p.s this is not too far from the format of many biological meetings, btw.

p.p.s Lots of good articles. Go ! Read ! Learn ! Write !

## Tuesday, June 28, 2005

### SODA 2006

The SODA 2006 deadline is fast approaching, and soon we will all be frantically checking the clock to milk those last few seconds out of the deadline. The SODA deadline is July 6, 2005 at 5 pm EDT. This is all very well for people like me who live on the East coast of the USA. But what about the inhabitants of the other countries of the world ? Miscalculate, or fail to understand the intricacies of U.S. time zones, and your precious new results end up in the recycle bin (for this year at least)!

As was commonly said through much of last year, "Help is on the way". Click here for a conversion of the SODA 2006 deadline time into times all over the world. Pick your own, and you're off to the races !
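The conversion itself is easy to script. Here is a quick sketch using Python's standard zoneinfo module; the deadline is the one from the post, and the sample zones are my own illustrative picks:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# SODA 2006 deadline: July 6, 2005, 5 pm US Eastern (EDT in July, UTC-4).
deadline = datetime(2005, 7, 6, 17, 0, tzinfo=ZoneInfo("America/New_York"))

# Convert to a few other zones. Pacific/Kiritimati (UTC+14) is the
# furthest-ahead zone on Earth, so its local clock reads the latest.
for zone in ("Europe/Berlin", "Asia/Kolkata", "Pacific/Kiritimati"):
    local = deadline.astimezone(ZoneInfo(zone))
    print(f"{zone}: {local.strftime('%Y-%m-%d %H:%M')}")
```

The deadline falls at 21:00 UTC, which works out to 11 pm in Berlin, 2:30 am the next day in Kolkata, and 11 am on July 7 in Kiritimati.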

p.s Would be nice to be living in Kiritimati.

## Monday, June 27, 2005

### Quantum graphics ?

Now you see it, now you don't:

A concise and self-contained introduction to quantum computing and its application to computer graphics. In addition to providing a general overview of quantum computing, the course reviews the theoretical limitations of classical computing for graphics and simulation, and how quantum computers can overcome these restrictions.

I'm not going to SIGGRAPH this year, alas, but I'd love to know exactly what topics are covered in the sections titled 'Quantum Computational Geometry' and 'Quantum Rendering Algorithms'. It looks like it should be quite accessible, especially since

The course is self-contained and does not assume any prior knowledge of quantum physics or quantum computing. Familiarity with classical rendering algorithms such as Z-Buffering and ray casting is helpful. A basic understanding of linear algebra and vector spaces is absolutely required.

Phew. One might have thought you'd need to know something about two-qubit interactions on a two-dimensional honeycomb lattice.

## Thursday, June 23, 2005

### PPT doesn't bore people, people bore people.

From [Lowerbounds, Upperbounds], a pointer to a "defense" of Powerpoint by Don Norman, a usability expert. Although the article was written in response to the famous Tufte diatribe against Powerpoint, it is not so much a defense of Powerpoint as an argument that you can't blame the tool for people's inadequacies.

The article makes points that I feel are very important, and yet are ignored by a vast majority of academic speakers (especially theory speakers).

Let's face it: most people give poor talks. If we are lucky, the points are laid out logically, starting with the history, the current situation, the analysis, and the recommendations or conclusions. In other words, the dull stuff is presented first with the interesting part at the end, oftentimes missed because the speaker runs out of time.

There is an almost abysmal lack of creativity in talk structuring. A large part of this is because the average speaker doesn't really spend a lot of time on the talk, and if they do, they are focused on doing a content-dump rather than tweaking the presentation itself.

Academic speakers love to review the entire history of a problem, boring their listeners to tears and robbing themselves of valuable time in which they could be presenting their own views. Why? Because it is thought important to demonstrate one's erudition. Bah. Let that come out in the question period.

This is a slightly more unusual proposition. Personally, I don't mind an exposition of the history, because you have to understand the history to understand why this problem is interesting. I imagine that as with all things, moderation is the key: it's fine to present a relevant path through the history that conveys the message, and defer other related work to questions or a slide at the end.

Listeners cannot absorb too much information at once. Talks should be limited to getting across just a few critical points. The goal is to get the listener interested enough to explore the subject in more depth on their own, perhaps by reading, perhaps by conversation.

This is related to the previous point. A very common tendency for speakers (especially less experienced ones) is to overload a talk with every single gem that appeared in their mind (paper). Some of it is insecurity: "I need to show you that I have lots of thoughts", and some of it is ego: "Stand in awe and admire my cool results". Especially in talks, less is quite often more.

[...] slides should illustrate the major points and help motivate the listener. Tufte is apt to complain that this is simply "entertainment," but I respond that if the audience is not entertained, they are not apt to listen, and what good is a cleverly drafted talk if the audience is not listening.

This is another sentiment that I share, but I can't say it is widely shared. Entertainment (not the song-and-dance variety) is a key component in making a talk "work". It involves nuances like cadence, flow, visual effects, aural cues, and many other subtle elements that, when done well, are invisible, but leave an impression.

Bottom line: it is actually possible to make a talk distinctive and interesting. Does every talk you give warrant the effort ? Probably not. But getting into the habit of thinking hard about a presentation (rather than slapping slides together from a paper) will not only earn you the gratitude of your audience, it might even help your research, by forcing you to clarify your thoughts and make judgements about what's important and what's not.

## Tuesday, June 21, 2005

### Foreign researchers and immigration policies

Chris Mooney points to a new ACLU report titled 'Science Under Siege: The Bush Administration’s Assault on Academic Freedom and Scientific Inquiry'. One of the sections talks about post 9/11 visa policies and how they've affected the academic community. Along with a long list of the new hoops that visitors have to jump through in order to visit the US (and this doesn't even include the new "export controls" that restrict the topics foreign researchers can work on), there are personal stories that are probably depressingly familiar to us:

In 2003, Elena Casacuberta, a postdoctoral student at the Massachusetts Institute of Technology (MIT) returned to Spain for winter break. Elena had renewed her work visa three times since she had been coming and going to the United States starting in 2000, so she did not anticipate any problems. Five months later, Elena was still in Spain, awaiting Visa Mantis security clearance. In the meanwhile, her NIH-funded research on the genetics of fruit flies was put on hold indefinitely.

In May 2004, Reza Chamanara, an Iranian postdoctoral fellow in the department of mathematics at Indiana University, left for England to give a lecture and found himself blocked from returning to the United States. Seven months later, university administrators were still unable to get an answer from the FBI as to why his visa renewal was being held up and whether he would be able to return.

Add to this the example of Chinese students on F-1 visas, who are given single-entry, six-month student visas to enter the US. This basically means that if they ever have to travel outside the US to present a paper at a conference, they have to return to China and re-apply for a visa (unless all travel is compressed into a six-month window). [Update: As of June 20, 2005, they are now being issued 12-month multiple-entry visas. Thanks, Dan Li]

Virginia Postrel points to a Steve Forbes editorial in Forbes magazine critiquing the government on this issue (Forbes is a conservative Republican; this is not a "red-blue" issue). She also quotes an Indian-born researcher colleague who says "For the first time, I feel like a foreigner in this country".

Sadly, even though I spend my time amongst the most diverse group of people one can imagine, I have to agree. For an entertaining and yet disturbing story of life in the US post 9/11, listen to Act I from this episode of This American Life.

### Intriguing

Via Quantum Algorithms, a new paper by Andreas de Vries that, if correct, would imply that NP is contained in BQP. I don't have the quantum computing chops to understand it, but I am curious whether anyone knows anything more about what's going on ?

Update: The Quantum Pontiff swats at it lazily...

Ernie points to an excellent op-ed by Sanjeev Arora and Bernard Chazelle on the need to communicate the "thrill" of doing computer science. Read the whole thing: but here's another excerpt that's worth meditating on.

Unfortunately, efforts to replicate these efforts elsewhere run into trouble because there are no introductory texts [...] that provide an exciting overview of the "computer science story"—let alone something analogous to the classic "Feynman Lectures in Physics." [...] Such texts play a vital role in disseminating ideas to frontline educators. There is also a need to write popular accounts. Few computer scientists bother to do this, whereas world-class physicists (Feynman, Weinberg, Hawking, Greene) have mastered the art of storytelling. The NSF Physics Division even lists "working toward early inspiration of the young" as one of its three main goals [9]. Within computer science there is evidence that D. R. Hofstadter's Pulitzer prize-winning 1979 book "Gödel, Escher, Bach: An Eternal Golden Braid" attracted an entire generation of students to the field, especially to artificial intelligence.

Not to mention the numerous physics blogs that dot the landscape of the blogosphere. As I had mentioned earlier, we need metaphors to communicate the power and beauty of computer science, and these can only emerge by trying. Blogs are one way; courses are another, and a more visible public presence for those who speak for the science (rather than those who speak for the computer industry) is imperative, if only to lead us away from (IMO) pointless worries about the "practicality" of a computer science education.

## Monday, June 20, 2005

### The Sausage is strong in Philadelphia

Two thirds of the global population of CG bloggers are now in the same place. Stay tuned...

## Tuesday, June 14, 2005

### Tangled Bank

Welcome to the 30th edition of the Tangled Bank, a regular round up of the best blog writing on science and medicine. For those of you who are reading the Tangled Bank for the first time, here's how it works: every couple of weeks (and every week starting from this one), science bloggers send nominations for their favorite posts to the next host of the TB, who collates the posts, writes a little bit about each article, and organizes them into categories. The idea is to get a sampling of the varied and excellent science writing on the web, a 'blog-digest', if you will.

It's often thought that it is hard to write accurately and clearly about science, that if you emphasize one, you give up on the other. The posts in this installment should disabuse you of that notion. Since this is after all the GEOMblog, I have created a new twist; posts on mathematically oriented topics tend to be in short supply on the web, and physicists tend to be under-represented in the TB, so I have gone hunting for posts that I think represent a sample of some of the best in mathematical and physics writing. If you like what the authors have to say, add their blogs to your feeds, and visit their sites !

And away we go:

Evolution, ID and the great non-debate

It is an unfortunate consequence of the times we live in that most science discussions tend to get bogged down in ID-related controversies. Thankfully PZ Myers has help swatting at the swarms of FUD that emanate from a certain building in Seattle. And by all accounts, he might have just acquired a new army. Phil Plait @ Bad Astronomy points out that the ambitions of the Discovery Institute extend far beyond biology, and vows to fight the good fight. Ernest Miller makes a plea for all scientists (not just biologists) to speak out against creationist nonsense, and Orac gently tries to deprogram a 14-year-old proud creationist. Josh Rosenau has a four-part debate with William Dembski, to which my first response (BLINK!) was 'Mud wrestling with a pig....'

One might think that mathematicians are safe from this nonsense. But doesn't Godel's theorem imply that there is a limit to what computational and (if you believe quantum computing) physical processes can accomplish ? Does this not imply the existence of an intelligent designer ? Ok, just kidding....

But how does one really test the components of evolution ? After all, we can't wait a million years ! RPM provides an explanation of how biologists study evolution today. The Blinne Blog reminds us that religion and science are not mutually exclusive, and explains how an evangelical scientist views the ID controversy.

Mechanisms of Evolution

But enough controversy. How about some beautiful explanations of actual species evolution ? DarkSyd presents the story of whale evolution, complete with pictures of detailed bone structures (via an assist from PZ) and evolutionary trees. Although today is not Friday (the traditional cat blogging day in the blogosphere), I cannot help pointing to DarkSyd's post on the evolution of cats: did you know that cats and mongooses (mongeese ?) are closely related ?

Survival of the fittest

Bora Zivkovic, in his Science and Politics avatar, takes the recent controversy about a genetic basis for the female orgasm as a starting point for a discussion of how sexual reproduction, sexual pleasure and the orgasm might have evolved. Pharyngula (aka PZ Myers) discusses the biological mechanisms underlying pre-eclampsia and how this represents a kind of evolutionary resource contention between fetus and mother.

Arachibutyrophobic discusses the new and surprising finding that modifying a single gene in Drosophila can cause female flies to court other female flies. I remember this finding quite vividly because I first heard about it when on holiday recently, and my wife the biologist, who knew all about fru, was able to explain the finding in much greater detail than the original news report did.

David Pescovitz discusses new research on the evolution of the highly complex eye of the mantis shrimp. The mantis shrimp is the Bruce Lee of the ocean, known for having "the fastest kick in the animal kingdom".

A rather different 'survival of the fittest' is the (soon-to-be) annual goby massacre known as the Goby Assault Party. Read the article, at the Invasive Species Weblog, if only to be regaled with phrases like 'Goby Dick' and 'Goby Gallows'.

Grad school can feel rather Darwinian at times. Here's advice from The World of BotanicalGirl on how to survive it.

Environmentally speaking...

Tom Kimmerer reports on the dwindling lowland forests of Borneo, one of the most naturally diverse areas in the world, with over 360 new species having been discovered in the past 11 years ! Birdwatchers are challenged in this article from 10,000 Birds on the genus Empidonax, apparently one of the most difficult bird groups to identify correctly. Take the quiz !

Things not biological...

I have it on good authority that certain astronomers would like nothing better than to curl up with a book on dust. By examining different wavelengths of light filtered through cosmic dust, astronomers can infer all kinds of things about far-away star systems. EGAD tells us about the balloon BLAST that was just launched by the National Scientific Balloon Facility (didn't think such an organization existed, did ya ?). DarkSyd, in a determined attempt to become the most cited Tangled Banker, presents an audiovisual treat to explain what looking at wavelengths outside the visible range tells us about the universe.

Like building bridges ? Dr. Bob picks the brain of a construction engineer to bring us the story of building the bridge over the Tacoma Narrows, in three parts. Part II is aptly titled 'Concrete Thinking'.

Getting heartburn from too much web surfing ? Culture of Chemistry explains the structural difference between Prilosec and Nexium.

Pictures, Pictures.

Botany Photo of the Day has an almost-geometric picture of the plant Raoulia australis, and descriptions of its background. Snail's Tales discusses some of the snails commonly found in M. C. Escher paintings.

Department of icky things...

McDonald's fries are highly sanitary ! Well, sort of. Carrie McLaren of Stay Free Daily has the germs (er...dirt... er.. details). And here's an icky thought: want to give up coffee ? Steve Pavlina has options (the mere thought makes me crave an espresso).

And finally, 2+2...

As promised, some mathematically inclined posts. These posts are not necessarily recent, but are good examples of some of the better material out there. WARNING: for some reason, it is much harder to do mathematical writing in a way that conveys intuition without jargon, so some of the links might be a lot more technical than any of the links above.

We'll start off with probably the most well known and most misunderstood theorem, Godel's incompleteness theorem. Dilip D'Souza brings the skill of a writer to a rather technical topic, explaining the basic idea quite lucidly. Once you've digested that, pop over to Cosma Shalizi's Notebooks and read the more detailed explanation, as well as some of the common misunderstandings. Cosma's notebook on information theory explains the mathematics underlying electronic communications.

John Baez writes one of the few (and probably the best) math blogs around. His posts are rather difficult to follow if you don't have at least some mathematical background, but I found this post on the basics of topology quite entertaining. At Preposterous Universe, Sean Carroll explains how our perception of the arrow of time is a consequence of the fact that entropy always increases (which must mean that time was designed intelligently ... just teasing, PZ :)). The Quantum Pontiff explains how the pigeonhole principle has connections to problems that are hard for naive quantum algorithms. David Pescovitz interviews Edward Frenkel on the power of symmetry in mathematics, and its connections to string theory.

Finally, for those of you who like your thousand words in colors, check out the Astronomy Picture of the Day, where yesterday's entry discusses the recent discovery of an Earth-like planet orbiting a distant normal star.

This seems like an opportune time to give a shout out to two of the best science journalists in the hated MSM. If you haven't read Carl Zimmer and Chris Mooney, you are missing out on high quality science writing.

Hope you enjoyed this week's Tangled Bank ! TB #31 will be hosted on June 22 by Science and Sensibility. Please send your submissions to winda002@studet.otago.ac.nz, host@tangledbank.net, or pzmyers@pharyngula.org.

It's often thought that it is hard to write accurately and clearly about science, that if you emphasize one, you give up on the other. The posts in this installment should disabuse you of that notion. Since this is after all the GEOMblog, I have created a new twist; posts on mathematically oriented topics tend to be in short supply on the web, and physicists tend to be under-represented in the TB, so I have gone hunting for posts that I think represent a sample of some of the best in mathematical and physics writing. If you like what the authors have to say, add their blogs to your feeds, and visit their sites !

And away we go:

__Evolution, ID and the great non-debate.__It is an unfortunate consequence of the times we live in that most science discussions tend to get bogged down in ID-related controversies. Thankfully PZ Myers has help swatting at the swarms of FUD that emanate from a certain building in Seattle. And by all accounts, he might have just acquired a new army. Phil Plait @ Bad Astronomy points out that the ambitions of the Discovery Institute extend far beyond biology, and vows to fight the good fight. Ernest Miller makes a plea for all scientists (not just biologists) to speak out against creationist nonsense, and Orac gently tries to deprogram a 14 year old proud creationist. Josh Rosenau has a four part debate with William Dembski, to which my first response (BLINK!) was 'Mud wrestling with a pig....'

One might think that mathematicians are safe from this nonsense. But doesn't Godel's theorem imply that there is a limit to what computational and (if you believe quantum computing) physical processes can accomplish ? Does this not imply the existence of an intelligent designer ? Ok, just kidding....

But how does one really test the components of evolution ? After all, we can't wait a million years ! RPM provides an explanation of how biologists study evolution today. The Blinne Blog reminds us that religion and science are not mutually exclusive, and explains how an evangelical scientist views the ID controversy.

Mechanisms of Evolution

But enough controvery. How about some beautiful explanations of actual species evolution ? DarkSyd presents the story of whale evolution, complete with pictures of detailed bone structures (via an assist from PZ) and evolutionary trees. Although today is not Friday, (the traditional cat blogging day in the blogosphere), I cannot help pointing to DarkSyd's post on the evolution of cats: did you know that cats and mongooses (mongeese ?) are closely related ?

Survival of the fittest

Bora Zivkovic, in his Science and Politics avatar, takes the recent controversy about a genetic basis for the female orgasm as a starting point for a discussion of how sexual reproduction, sexual pleasure and the orgasm might have evolved. Pharyngula (aka PZ Myers) discusses the biological mechanisms underlying pre-eclampsia and how this represents a kind of evolutionary resource contention between fetus and mother.

Arachibutyrophobic discusses the new and surprising finding that modifying a single gene in Drosophila can cause female flies to court other female flies. I remember this finding quite vividly because I first heard about it when on holiday recently, and my wife the biologist, who knew all about fru, was able to explain the finding in much greater detail than the original news report did.

David Pescovitz discusses new research on the evolution of the highly complex eye of the mantis shrimp. The mantis shrimp is the Bruce Lee of the ocean, known for having "the fastest kick in the animal kingdom".

A rather different 'survival of the fittest' is the (soon-to-be) annual goby massacre known as the Goby Assault Party. Read the article, at the Invasive Species Weblog, if only to be regaled with phrases like 'Goby Dick' and 'Goby Gallows'.

Grad school can feel rather Darwinian at times. Here's advice from The World of BotanicalGirl on how to survive it.

Environmentally speaking...

Tom Kimmerer reports on the dwindling lowland forests of Borneo, one of the most naturally diverse areas in the world, with over 360 new species having been discovered in the past 11 years ! Birdwatchers are challenged in this article from 10,000 Birds on the genus Empidonax, apparently one of the most difficult bird groups to identify correctly. Take the quiz !

Things not biological...

I have it on good authority that certain astronomers would like nothing better than to curl up with a book on dust. By examining different wavelengths of light filtered through cosmic dust, astronomers can infer all kinds of things about far-away star systems. EGAD tells us about the balloon BLAST that was just launched by the National Scientific Balloon Facility (didn't think such an organization existed, did ya ?). DarkSyd, in a determined attempt to become the most cited Tangled Banker, presents an audiovisual treat to explain what looking at wavelengths outside the visible range tells us about the universe.

Like building bridges ? Dr. Bob picks the brain of a construction engineer to bring us the story of building the bridge over the Tacoma Narrows, in three parts. Part II is aptly titled 'Concrete Thinking'.

Getting heartburn from too much web surfing ? Culture of Chemistry explains the structural difference between Prilosec and Nexium.

Pictures, Pictures.

Botany Photo of the Day has an almost-geometric picture of the plant Raoulia australis, and descriptions of its background. Snail's Tales discusses some of the snails commonly found in M. C. Escher paintings.

Department of icky things...

McDonald's fries are highly sanitary ! Well, sort of. Carrie McLaren of Stay Free Daily has the germs (er...dirt... er.. details). And here's an icky thought: want to give up coffee ? Steve Pavlina has options (the mere thought makes me crave an espresso).

And finally, 2+2...

As promised, some mathematically inclined posts. These posts are not necessarily recent, but are good examples of some of the better material out there. WARNING: for some reason, it is much harder to do mathematical writing in a way that conveys intuition without jargon, so some of the links might be a lot more technical than any of the links above.

We'll start off with probably the best-known and most misunderstood theorem, Gödel's incompleteness theorem. Dilip D'Souza brings the skill of a writer to a rather technical topic, explaining the basic idea quite lucidly. Once you've digested that, pop over to Cosma Shalizi's Notebooks and read the more detailed explanation, as well as some of the common misunderstandings. Cosma's notebook on information theory explains the mathematics underlying electronic communications.

John Baez writes one of the few (and probably the best) math blogs around. His posts are rather difficult to follow if you don't have at least some mathematical background, but I found this post on the basics of topology quite entertaining. At Preposterous Universe, Sean Carroll explains how our sense of the arrow of time is in a sense a consequence of the fact that entropy always increases (which must mean that time was designed intelligently ... just teasing, PZ :)). The Quantum Pontiff explains how the Pigeonhole principle has connections to problems that are hard for naive quantum algorithms. David Pescovitz interviews Edward Frenkel on the power of symmetry in mathematics, and its connections to string theory.

Finally, for those of you who like your thousand words in colors, check out the Astronomy Picture of the Day, where yesterday's entry discusses the recent discovery of an Earth-like planet orbiting a distant normal star.

This seems like an opportune time to give a shout out to two of the best science journalists in the hated MSM. If you haven't read Carl Zimmer and Chris Mooney, you are missing out on high quality science writing.

Hope you enjoyed this week's Tangled Bank ! TB #31 will be hosted on June 22 by Science and Sensibility. Please send your submissions to winda002@studet.otago.ac.nz, host@tangledbank.net, or pzmyers@pharyngula.org.

### ESA 2005

results are out. 55 papers in the theory track, and 20 more in the applied track. And this year, there'll be an eclipse on special order for the conference, yielding what must be the first call for papers with 'important dates' listed thus:

IMPORTANT DATES

Submission deadline: April 12, 2005

Notification to authors: June 7, 2005

Symposium: October 3-6, 2005

Solar Eclipse: October 3, 2005

## Sunday, June 12, 2005

### Geometry and Theory

John Baez's latest offering is up, and once again has a mindboggling mouthful of mathematical magic.

I liked this section:

...this reminds me of something Feynman wrote: whenever he worked on a problem, he needed the feeling he had some "inside track" - some insight or trick up his sleeve that nobody else had. Most of us will never be as good as Feynman at choosing an "inside track". But I think we all need one to convert what would otherwise be a dutiful and doomed struggle to catch up with the experts into something more hopeful and exciting: a personal quest!

For anyone with a background in geometry, a good "inside track" on almost any math problem is to convert it into a geometry problem.

If you read nothing else in his post, read the last section, containing a beautiful quotation from Alain Connes. This is an excerpt:

The really fundamental point in that respect is that while so many mathematicians have been spending their entire life exploring that world they all agree on its contours and on its connexity: whatever the origin of one's itinerary, one day or another if one walks long enough, one is bound to reach a well known town i.e. for instance to meet elliptic functions, modular forms, zeta functions. "All roads lead to Rome" and the mathematical world is "connected". In other words there is just "one" mathematical world, whose exploration is the task of all mathematicians, and they are all in the same boat somehow.

Which brings me to a topic that Jeff brings up: Is computational geometry still part of the theoryCS landscape ? Jeff's answer is spot-on, if a little tongue-in-cheek: Of course it is !! But he is also right that reality seems to suggest otherwise. Lance and Sariel (and Jeff, again) push the discussion further, and interesting points are made in the comments on all blogs. These points are tactical, in that they discuss the logistics of large meetings, the "too specialized for STOC/FOCS" phenomenon, and other ways of increasing socialization.

But really, if you want to have geometry (or any other sub-area of theory) be viewed as central within the larger landscape, tactical methods are not what will do it. Ultimately, a practising researcher has to ask themselves, "What does knowing more about the tools and techniques in this area give me ?" And most researchers instinctively focus on areas and topics that affect their work in a day-to-day sense. This is not to say that one remains specialized (and balkanized); it's that people move from area to area by natural drift from their current choice of problem, rather than by teleportation.

Which brings me back to the picturesque 'landscape' of Alain Connes. What valuable exports does computational geometry currently have to offer to the travelling theoretician ? There is a rich structure of tools and techniques in computational geometry, and an unparalleled array of methods for working in polynomial time (something that work in approximations, for example, tends to give short shrift to). Any story of the power of randomization has to contain large doses of geometry, for this is where its power has been exploited most successfully.

But few of the techniques familiar to most geometers extend (or are currently extensible) to problems that lie squarely in theory land, but outside the realm of geometry. The best (and only) example I can think of is how Mulmuley uses the Collins decomposition and other tools from algebraic geometry in his P vs NC result. Machine learning of course makes heavy use of VC-dimension methods, but geometers can't really claim ownership of this idea either, though we have used it extensively.

I will admit that the point I make is extreme, and is deliberately so. But I am merely trying to argue that if I work in (say) graph approximations, I am unlikely to need to know methods or results from computational geometry beyond what is directly needed for a problem I am working on. Thus, any interest I will have in geometry will be purely from a general sense of curiosity about the area. In these specialized times, this is really not enough to keep me abreast of the latest results and methods in the field.

I use the word 'current', and the present tense, because I don't believe that this situation will continue indefinitely. Fields evolve and change, and we are still in the early days of exploring the theoryCS landscape. Furthermore, CG is not the only 'isolated city'; other areas face the same problem, and until brave souls take a machete to the thickets and forge roads between these cities, we will continue to bemoan the lack of attention our pet subfield gets in the overall scheme of things.

So here's a challenge: let CG be the 'inside track' that Baez speaks of. Let's see if we can find problems that need a "geometric approach" to break a logjam or prove a new result. Do this even once or twice, and the interest in the field will grow, naturally.

But what about the meantime ? One of my more radical beliefs is a conviction that the 'conference-as-main-pub' model is not working for us any more, primarily for reasons of scale. Further, the growth of areas in theory is making the whole STOC/FOCS idea of a central 'theory conference' less and less a model of reality. I liked the suggestion on Lance's blog that we should have an FCRC-style federated conference for theory areas; after all, this would be very similar to the omnibus conferences that are common in the math world.

This post has become far too long, so I'll stop here :).

## Friday, June 10, 2005

### Expanders

Michael Nielsen has been writing a series of posts on expanders.

This post is one in a series introducing one of the deepest ideas in modern computer science, expander graphs. Expanders are one of those powerful ideas that crop up in many apparently unrelated contexts, and that have a phenomenally wide range of uses. The goal of the posts is to explain what an expander is, and to learn just enough about them that we can start to understand some of their amazing uses.

If you haven't been there to read them, do check them out. He starts off with basic definitions, gives some examples, relates expansion to adjacency matrices and the eigenvalue gap, and gives as an application the use of random walks on expanders to decrease randomness.
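The eigenvalue-gap view is easy to play with numerically. Here's a minimal sketch (using numpy; the graphs and sizes are just for illustration) comparing a cycle, a poor expander, with the complete graph, the best possible one:

```python
import numpy as np

def spectral_gap(A):
    """Gap between the two largest adjacency eigenvalues.

    For a d-regular graph the top eigenvalue is d; a large gap
    d - lambda_2 certifies good expansion (via Cheeger-type bounds).
    """
    ev = np.sort(np.linalg.eigvalsh(A))[::-1]  # eigenvalues, descending
    return ev[0] - ev[1]

n = 8

# Cycle C_8: 2-regular, a poor expander (its gap shrinks to 0 as n grows).
cycle = np.zeros((n, n))
for i in range(n):
    cycle[i, (i + 1) % n] = cycle[(i + 1) % n, i] = 1

# Complete graph K_8: eigenvalues n-1 and -1, so the gap is n.
complete = np.ones((n, n)) - np.eye(n)

print(spectral_gap(cycle))     # 2 - 2*cos(2*pi/8), about 0.586
print(spectral_gap(complete))  # 8.0
```

The catch, and the reason expanders are interesting, is finding graphs of *constant* degree whose gap stays bounded away from zero; the complete graph cheats by having degree n-1.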

### Tangled Bank #30

I will be hosting the 30th edition of the Tangled Bank, a bimonthly roundup of all things scientific, natural, and medical. I'd especially like to see contributions from the computer science side of the blogosphere as well. You can send nominations to me, or to host@tangledbank.net or pzmyers@pharyngula.org, where PZ Myers, in between swatting buzzing creationists and IDiots, will process your submissions.

The current edition is hosted at Organic Matters, where you can take a tour of the Blogosphere Natural History Museum.

### Stop the presses: rockets use physics

Eleven centuries after the arithmetical discoveries of the Persian mathematician al-Khwarizmi, after whom they are named, algorithms are coming of age in the world's most sophisticated stock markets.

The name itself is idiotic: 'algorithmic trading' ?

The article has no details on what algorithms are being used, and a series of pointless quotes about their impact. The whole tone is annoying: as if algorithms are this exotic new beast....

(HT: Mathforge)

## Thursday, June 09, 2005

### Mmm, coffee..

The review pans the book, but the subtitle was too cool to resist:

Brain Brew

How coffee fueled Voltaire's Candide, Newton's theory of gravity and Juan Valdez's modern woes.

### Back from puffin land

Well, it's been a great vacation, but it's now over alas, and I am back from puffin-land (where I actually didn't see any puffins, but that's another story). Many thanks to Graham for taking over the reins in my absence; his ability to write long, substantive posts far exceeds mine, and he clearly needs to start a blog of his own !

One of the things that foreign travel (especially to Europe) does is that when you return, you find yourself asking questions like "Why can't wine and beer be served at any cafe I visit ?", "Why can't I just ask for a cup of coffee and know that I'm getting a nice thick espresso ?", and "What's so bad about universal health-care coverage ?" (ok well, maybe not that last one).

My epiphany was a realization of how good some of the BBC shows (especially the nature programs) really are (and in case you were wondering, these shows were watched in a 30 minute window before going to bed :)). Returning to the US of A, and scouring my PBS guide for similar fare, I realized that the pickings were quite sparse in comparison. I did find this particular series interesting though: a four part series on the basics of probability and statistics, which nicely addresses some of the issues raised by Graham's post on probability. Alas, at least in Philadelphia, it's playing at 1 am, which is not very useful unless you're an insomniac or have TiVo. I fall into both categories, so a review might be forthcoming if I can shake off this post-vacation ennui.

p.s I was on a ferry boat up in Scotland, and saw at least two people working their way through a Sudoku book.

p.p.s On a side note, what is it with the British and quiz shows ? I didn't realize where the Indian fascination with quiz shows came from, and now it all makes sense. There are radio quiz shows where I could barely answer 1% of the questions, and the contestants answer even before the quizmaster has finished his question. I am quite impressed.

p.p.p.s Further signs of the imminent demise of the American empire: a brief blurb for the American spelling bee presents the contestants as brilliant and studious whizkids; in the US, by contrast, the spelling bee is covered on ESPN and there is a thinly disguised disgust at the (mostly Indian-American, 5 of the last 7!!) "nerds" who win.

### A Computer Scientist Reads the Newspaper

A dinner conversation with a colleague recently took a convoluted route from a challenge to "Name 10 famous (living) economists" through "A random walk down wall street" and thence to "A mathematician reads the newspaper".

I read Innumeracy a long time ago and enjoyed it, and realized that I don't remember reading AMRTN. I shall definitely seek out a copy. I was reminded of my frustration with most newspaper reporting of mathematical and scientific issues, especially those to do with risk, probability and chance, since these most often appear in the news sections. Recently, there has been discussion in the UK about introducing an identity card with biometric information. One issue is what form this information would take -- fingerprints, facial scans, iris recognition. A much repeated figure is that the best of these -- iris recognition -- achieves only 96% accuracy.

But is this 96% accuracy at identification (i.e. looking you up in a database based on an iris scan) or verification (checking that you are the same person as the owner of your card) ? Is this 96% over all people on a single trial, or over multiple (say, three) attempts ? Under what conditions are these trials conducted -- ideal or realistic ? Are there 4% of people for whom this procedure will never work, or can each trial be treated independently with a 4% probability of failure, so that with enough repetitions the probability of failure becomes virtually zero ?

Without this information, it's impossible to gauge. One can only guess at the implications, and assume that 96% accuracy is quantifiably better than 50% accuracy [for that matter, is 50% accuracy equivalent to tossing a coin ?]. Even the concept of distinguishing between false positives and false negatives seems too subtle for most newspaper discussions, yet this is exactly the level of information from which policy makers decide what to implement.
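The independence question matters enormously in practice. A toy calculation (the numbers are purely illustrative, not from the study):

```python
# If each verification attempt fails independently with probability 4%,
# repeated attempts drive the failure rate down geometrically.
p_fail = 0.04

for attempts in (1, 2, 3):
    print(attempts, p_fail ** attempts)

# Three independent attempts fail with probability 0.04**3 = 6.4e-05,
# roughly 1 in 15,625.
# But if instead 4% of people can *never* be matched, retries don't help:
# their failure rate stays at 4% no matter how many attempts are allowed.
```

Two readings of the same "96%" headline figure, differing by three orders of magnitude in practical failure rate: exactly the distinction that never makes it into the newspaper.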

I found one article which discussed this study in much greater detail but even after reading this I was still no wiser on exactly how one measures "success rate".

This affects me as a computer scientist, since I often work in the realm of randomized algorithms for problems where exact algorithms provably cannot exist without using dramatically more resources. In some areas, even within the discipline, one encounters a certain resistance to randomization, and the attitude that a deterministic algorithm is always preferable to a randomized one. If people had a more intuitive understanding of chance and probability ("there is a greater chance of a gamma ray striking the processor and causing the wrong answer to be produced than of the algorithm making a mistake") then our lives would be better. Probably. Wanna bet on that...?

### Social Networking and Graphic Depictions of Human Relations

Some of the largest data sets we work with can be represented by graphs: the web graph, the internet graph and telephone call graphs have all driven work on handling, visualizing and analysing massive graphs. In a way these can all be thought of as social graphs -- where nodes are "individuals" of some sort, and edges link individuals that share some connection. When the individuals are humans, these graphs have a particular resonance. As an undergrad I did a final year project on graph drawing algorithms, and used as one of the example graphs a network of the 'social interactions' of my colleagues. Here are a few other interesting graphs I've seen recently:

- Exploring Enron. The collected emails from Enron visualized and searchable / filterable to explore who knew what when.
- Jefferson High School. A graph of relationships at a midwest high school. (Despite the URL, this should be safe to view while at work, unless your cow orkers are offended by graphs.)
- Social networking sites. Friendster, orkut and the rest seem to have faded after their brief period of popularity. That doesn't stop new sites from being set up, the latest asking people to input edges attesting to their past personal relations. I'm linking to a news story about it, since the web filters here disallow me from visiting the site itself...

Another graph we love to explore is the collaboration graph. Thanks to DBLP there's a lot of data conveniently arranged in electronic form. One can download this and explore it with the DBL Browser. But I haven't seen many other interesting uses of this data. It would be great to see a good visualization of this graph, highlighting various cliques and quasi-cliques. We often check our Erdos Number, but who is the center of this graph? That is, who has the minimum distance to every other researcher, or the smallest average distance to others? This is of interest since, although Kevin Bacon is usually taken as the center of the film star graph, Sean Connery has a lower average distance. Is there one giant component, or are there many small components? There's plenty of fun to be had with large graphs like this, if you have the time or the tools.
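The "center" question above is easy to make precise: find the node minimizing average shortest-path distance to everyone else (the inverse of closeness centrality). A minimal BFS sketch, on a made-up coauthorship graph (all names hypothetical, purely for illustration):

```python
from collections import deque

def avg_distance(graph, src):
    """Mean shortest-path distance from src to all other reachable nodes (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node != src]
    return sum(others) / len(others)

# A toy (symmetric) coauthorship graph.
graph = {
    "erdos": {"alice", "bob", "carol", "eve"},
    "alice": {"erdos", "bob"},
    "bob":   {"erdos", "alice", "dave"},
    "carol": {"erdos"},
    "dave":  {"bob"},
    "eve":   {"erdos"},
}

center = min(graph, key=lambda v: avg_distance(graph, v))
print(center)  # "erdos", with average distance 1.2
```

On the real DBLP graph one would restrict attention to the giant component first (average distance is undefined across components), and plain BFS from every node is already O(VE), so the fun scales quickly into a research problem of its own.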

## Wednesday, June 08, 2005

### Flash Elements

Thanks to the wonderful Annals of Improbable Research (a journal I heartily recommend publishing in) and their blog, comes some Tom Lehrer related items: a video and a flash animation of the Elements.

## Tuesday, June 07, 2005

### Doing the Numb3rs

Due to popular demand, my view of the "Numb3rs" (pronounced "Numbthreers", since you ask) television show. Buoyed by the enthusiasm of Suresh and Lance, I started watching Numb3rs, and ended up enjoying it enough that I watched pretty much all of the episodes. I have a fairly low standard for entertainment, it should be said.

Given low initial expectations, it was probably one of the better attempts to show mathematical topics within the context of popular entertainment. It's best not to get too hung up on all the details, and to go along with it to some extent. Still, certain plot points did rankle when there seemed to be significant flaws: if I remember correctly, in one episode Charlie (the hero) determined that a suspect had been running a particular kind of pyramid scheme because hidden in his apartment was exactly $2^19 (that is, $524,288) in US currency.

The scheme was explained as follows: the criminal had broken into a bank's computers, and obtained access to a large number of records. He then proceeded to steal $1 from a single account. Then he stole $1 from each of two more accounts, replaced the $1 in the first account so it would not be noticed, and kept the other. He continued in this doubling manner for twenty iterations. At this point, the effort became too much, and so he extracted his ill-gotten gains and fled.

The thing that irks me about this plot is that it makes no sense whatsoever (as opposed to merely stretching credibility). The purpose of replacing some of the stolen funds is to mask the crime for a while. It's a bit like a Ponzi scheme, where initial investors are paid off with money from subsequent investors to make it appear that the investment is a success and drum up more investors by word of mouth. But there's no reason for that here. If he really did need the money so much, then why was it all there, hidden in his apartment? If the eventual goal was just to steal the half million and disappear, then why mess about with the doubling scheme to begin with? Why draw out the money exactly and convert it to dollars, and not keep any in his wallet or spend any? This really made no sense to me, no matter how I tried to suspend my disbelief.
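For the record, the arithmetic behind the magic number does check out, at least on one reading of the scheme (the round-by-round accounting here is my guess at what the episode intended):

```python
# If the amount kept doubles each round, starting from a single dollar,
# the cumulative haul after 19 doublings is one dollar shy of 2**19.
kept_per_round = [2 ** k for k in range(19)]  # $1, $2, $4, ..., $2**18
total = sum(kept_per_round)

print(total)    # 524287 == 2**19 - 1
print(2 ** 19)  # 524288 -- the "exactly $2^19" found in the apartment
```

So the show's half-million figure is plausible; it's the motive, not the geometric series, that falls apart.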

The other feature of the programme that I found hard to believe was the relationships between the characters: the dynamic between Charlie and his physics prof buddy seemed like nothing I've ever seen in academia. It seemed to be there mostly for expositional purposes and all of those "wait a minute! I've just had an idea" moments. Likewise, the *working* relationship between Charlie and his grad student / future inamorata never struck me as genuine: she didn't have the mix of deference, fear and self-doubt that most grad students should have for their advisors [I speak from observation of others, of course, and not in reference to anyone you know]. Maybe she'll be more convincing next season as a postdoc.

## Sunday, June 05, 2005

### A short SODA paper post

The SODA deadline continues to creep insidiously and subtly closer. As has been commented on before, this year there is no specific invitation for short (two page) papers as there has been in previous years, just a general suggestion to consider submitting papers that are considerably shorter than the ten page limit.

I have mixed feelings about this. From my perspective as an occasional SODA author, it is no great loss: I've tried to write a few short SODA papers in my time and had no great success with it. It's very hard to fit things into so short a space. Mostly these efforts collapsed before being submitted, but in one case a paper got as far as being submitted and did not make it in. Eventually all these ideas were broadened out to full-length papers and appeared elsewhere, given room to breathe instead of being crushed into the short format. The fact that they eventually became full papers suggests the material was never really appropriate for the short form. Given the incredibly low acceptance rates for short papers, I resolved never to try to write a short SODA paper again, so the curtailing of the suggestion to do so in the call for papers is no great loss to me.

But some of my favourite SODA papers, certainly some of the most memorable (less to forget?), have been short ones. Examples like Thorup's paper on universal hashing, Gupta and Zane's paper on Counting Inversions in Lists, and Kalai's paper on efficient pattern matching with don't cares all spring to mind: they are crisply written, strong results on topics close to my heart. Zwick's paper on Jenga also deserves a mention, although it is short-ish rather than short according to the standard SODA definition.

Lastly, I will miss the annual argument about short papers at the SODA business meeting. It was like meeting a favourite old friend in a bar, and hearing him repeat his old anecdotes over again: the same back and forth every year, the same attempt to force a vote but with a complete lack of conclusion, and all accompanied by a beer or soda provided to keep the audience docile. Maybe next year we will have the argument over again just for old times' sake.

So, goodbye, short SODA papers. Although the category no longer exists, I hope that the same good material will continue to be submitted to, and accepted at, the conference. Most importantly, I hope that the authors of such papers don't try to artificially inflate their papers to 10 pages, but can express themselves concisely in three or four pages.

*Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.* It's a nice sentiment, Antoine, but couldn't you make it a bit snappier?

## Saturday, June 04, 2005

### Sums of squares, part 2

Following on from the previous post, here is a second application of the "AMS sketch". There, I described creating a variable z from the stream by summing hash values of the items seen. By squaring this variable, we got an unbiased estimate for the sum of squares of the number of item occurrences.

Now, here's another application of the same data structure. Create z in the same way as before, but now pick some item i and compute z*h(i) (remembering that h(i) maps uniformly onto {-1,+1}). What is the expectation of this value? We get the number of occurrences of i multiplied by h(i)^2 (which is always 1), plus the frequency of every other item j multiplied by h(i)h(j). These two hash values are uncorrelated, so each such product is equally likely to be +1 or -1. Hence, in expectation, this estimate is exactly the number of occurrences of i.

Of course, this estimate is not perfect, else we could go through all values of i and recover the number of occurrences of items exactly. But, by doing a similar analysis as for the sum of squares case, we find we can estimate individual item counts with some additional error that depends on the square root of the sum of squares of the item counts. By taking more repetitions of this procedure, we can reduce the error to a small fraction of this quantity. So, if the item we are estimating occurs sufficiently many times, we get a (relatively) reasonably good estimate. This makes no assumptions about the distribution of item counts, and so works whatever the frequency distribution happens to be.
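A possible rendering of this in Python (a sketch, not the paper's implementation: the function name is mine, the sign hashes are dictionaries purely for illustration where real code would use compact limited-independence hash families, and a median over repetitions stands in for the formal error-reduction argument):

```python
import random
import statistics

def estimate_count(stream, query, reps=21, seed=0):
    """Estimate how often `query` occurs in the stream.

    Each repetition keeps one counter z = sum of sign(item) over the
    stream; z * sign(query) then has expectation equal to the true
    count, with error on the order of the square root of the sum of
    squares of the other items' counts. The median over repetitions
    tames the noise."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        signs, z = {}, 0
        for item in stream:
            if item not in signs:
                signs[item] = rng.choice((-1, 1))  # stand-in for a sign hash
            z += signs[item]
        estimates.append(z * signs.get(query, 1))
    return statistics.median(estimates)
```

For a heavy item this lands close to the truth; for a rare item the additive error swamps the signal, which is exactly the caveat above.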

This approach was taken and improved by Charikar, Chen and Farach-Colton. There, the authors use some more hashing techniques to make a better estimator while using the same space, by spreading out the 'heavy items' which can lead to bad estimates. They also show that if the frequency distribution does match a particular distribution (a skewed, or Zipfian distribution, which is a good model for a wide variety of data sources) then one can prove improved estimation bounds.
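A minimal version of that bucketed estimator might look like the following (a hedged sketch in the spirit of the Charikar, Chen and Farach-Colton data structure, not their code: the class name and parameters are mine, and dictionary-based hashes stand in for the pairwise-independent families the analysis uses):

```python
import random
import statistics

class CountSketch:
    """Rows of counters: each row hashes items into buckets (so heavy
    items rarely collide with each other) and applies a +/-1 sign to
    keep each row's estimate unbiased; the median across rows gives
    the estimate for any item's count."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.rng = random.Random(seed)
        self.rows = [[0] * width for _ in range(depth)]
        self.bucket = [{} for _ in range(depth)]  # per-row: item -> column
        self.sign = [{} for _ in range(depth)]    # per-row: item -> +/-1

    def _loc(self, item, r):
        # Lazily assign this item's bucket and sign in row r.
        if item not in self.bucket[r]:
            self.bucket[r][item] = self.rng.randrange(self.width)
            self.sign[r][item] = self.rng.choice((-1, 1))
        return self.bucket[r][item], self.sign[r][item]

    def update(self, item, count=1):
        for r in range(len(self.rows)):
            b, s = self._loc(item, r)
            self.rows[r][b] += s * count

    def estimate(self, item):
        ests = []
        for r in range(len(self.rows)):
            b, s = self._loc(item, r)
            ests.append(s * self.rows[r][b])
        return statistics.median(ests)
```

Spreading items across buckets is what removes the heavy items from each other's error terms, which is where the improvement over the single-counter estimator comes from.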

This is not the final word on the matter. Some of my recent work has been on trying to give the tightest possible bounds for these problems of frequency estimation, with and without assumptions about the input distribution. In particular, one can improve the bounds for the space dependency on the quality of the estimate by a different sketch technique [warning! link leads to papers by current author. Be warned that this represents extreme self-indulgence]. The guarantee switches from the square root of the sum of squares of counts to being the sum of counts; for many problems this is the form of guarantee that is required. There is also ongoing work from practitioners on getting the best possible constant factors, fastest implementations and so on.

## Friday, June 03, 2005

### Sums of squares in a stream

It's time to talk a little about theorems. Fortunately these are fun theorems. Over the last few years I've worked a lot in the area of data streams. In this model, the input is provided to you in some arbitrary order, and you must compute some function on this input using space and time per item that are sublinear in (preferably polynomial in the logarithm of) the size of the input. Suresh already talked about some of the problems after the "AMS" paper very deservedly won the Gödel Prize this year. Let me give some insight into some solutions.

Consider the (very abstract, but with a surprising number of applications) problem of computing the sum of squares of the input. If the input is given just as the sequence of values, then this problem is easy: keep a counter, and for every new update x, add on x*x to the counter.

But now suppose the stream defines the input in a different way: now we see a sequence of *items*, and we want to track information about the sum of the squares of the number of *occurrences* of each item. So every update is an item i, and we want to compute the sum of the squares of the number of times each item appears.

If the number of different items isn't too large, then again, we don't have a problem: just keep a counter for each item and increment it when we see a new copy of the item in the stream. But in many settings (IP networks being the motivating example) both the size of the data and the domain of the set of items can be very large, much bigger than we would like.
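For concreteness, that exact baseline can be sketched in a few lines of Python (the function name is mine; the point is that the counter dictionary grows with the number of distinct items, which is precisely what we cannot afford for large domains):

```python
from collections import Counter

def exact_sum_of_squares(stream):
    """Exact second frequency moment: the sum over distinct items of
    (number of occurrences)^2. Space grows with the number of distinct
    items, which is the problem when the domain is huge (e.g. IP
    addresses)."""
    counts = Counter(stream)
    return sum(c * c for c in counts.values())

# "a" appears 3 times, "b" twice, "c" once: 9 + 4 + 1 = 14.
print(exact_sum_of_squares(["a", "b", "a", "c", "a", "b"]))  # 14
```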

Think about this problem without knowing that there is a neat solution, and you might conclude that it can't be done. When you see a new item, it seems that you have to know how many times before it has been seen in order to make a good estimate of the sum of squares. This turns out not to be the case.

We can make a good estimator for the sum of squares by using a beautiful randomized technique due to Alon, Matias and Szegedy, one which has since been modified and used for a wide variety of other problems. And we still only need a single counter! (well, sort of...)

Keep a counter z. For every item i that arrives in the stream, compute h(i), a hash function that maps i onto either +1 or -1. Add this value onto z. When we want to estimate the sum of the squares, compute z*z.
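As a minimal sketch of this in Python (the class name is mine, and the sign hash is stored in a dictionary purely for illustration; a real implementation would draw h from a small 4-wise independent hash family instead):

```python
import random

class AMSCounter:
    """One AMS estimator: z accumulates a random +/-1 sign per item,
    and z*z is an unbiased estimate of the sum of squared counts."""

    def __init__(self, rng):
        self.rng = rng
        self.signs = {}  # item -> +1 or -1 (stand-in for the hash h)
        self.z = 0

    def update(self, item):
        if item not in self.signs:
            self.signs[item] = self.rng.choice((-1, 1))
        self.z += self.signs[item]

    def estimate(self):
        return self.z * self.z

# A stream of one item repeated n times: z ends at +n or -n either
# way, so the estimate z*z = n^2 is exact.
c = AMSCounter(random.Random(0))
for _ in range(7):
    c.update("x")
print(c.estimate())  # 49
```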

How does this work? Well, the final value of z after seeing the whole stream is equal to the sum over all i's of frequency of i times h(i). When we square this, we get the sum of the frequencies squared times h(i) squared, plus cross terms with factors of h(i)h(j) for i not equal to j.

Because h(i) is always +1 or -1, the h(i) squared terms are all 1, while if h is "sufficiently random" then the cross terms have *expectation* zero. So in expectation, the estimate has exactly the right value!

To turn this into a good estimator, and to ensure that it can be implemented in small space, one needs to do some more work. Firstly, one can show with a little more working that the variance of this estimator can be bounded in terms of the square of its expectation. Then, uttering the magic words "Markov Chernoff Chebyshev!", we can argue that with enough repetitions of this procedure a very accurate estimate can be built. We can also argue that the hash functions h() do not have to be "fully independent" (since representing a truly independent hash function would require linear space, missing the point entirely), but need only very weak independence, meaning they can be drawn from a family of hash functions with a small representation.
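Put together, the whole recipe might look like the following sketch (Python; the function name and parameter counts are mine, fully random signs stand in for the limited-independence families the analysis actually needs, and for clarity each counter re-reads the stream here, whereas in a real one-pass implementation all the counters are updated in parallel):

```python
import random
import statistics

def ams_f2(stream, means=40, medians=7, seed=1):
    """Estimate the sum of squared counts (F2) by median-of-means over
    independent AMS counters: averaging within a group reduces the
    variance, and the median of the group averages turns that into a
    high-probability guarantee."""
    rng = random.Random(seed)
    group_means = []
    for _ in range(medians):
        group = []
        for _ in range(means):
            signs, z = {}, 0  # one independent AMS counter
            for item in stream:
                if item not in signs:
                    signs[item] = rng.choice((-1, 1))
                z += signs[item]
            group.append(z * z)
        group_means.append(sum(group) / means)
    return statistics.median(group_means)
```

With a few dozen counters per group and a handful of groups this typically lands within a few tens of percent of the truth; the formal analysis makes the trade-off between the number of counters and the accuracy precise.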

This algorithm is not only beautiful for its simplicity and elegance; it is also the archetype for many other streaming algorithms that came after. Although a variety of techniques have been developed for this small-space, one-pass model of computation, the ideas of linear transformations (since this is essentially computing a linear transformation of the input), hash functions with limited independence, and careful probabilistic analysis have been vital to many algorithms.

The approach is also surprisingly practical. The factors needed to make this into a very accurate summary are not too large, and with some more implementation tricks and fast implementations of hash functions, the data structures can be updated at very high rates -- even on the order of millions of updates per second.

## Thursday, June 02, 2005

### Academic Folklore

I'm a big fan of urban myths, and especially the folklore that builds up around particular areas such as academia. Every university campus has its own associated set of myths and legends about its buildings (usually tales about building errors or famous occupants). Research has its own small collection of tall tales, some of which turn out to be true. I was very surprised to discover that the old legend of the student who arrived late for class and copied down two unsolved problems as homework was completely true, and happened to George Dantzig, who passed away last month.

Some myths belong to a bygone era. There is the story of the academic who received some proofs of his document back from the publishers. All his equations had been typeset perfectly, except for one place where there should be an epsilon, and there was instead a blank space with a little smudge. Looking closely at this smudge, he realized that this was actually an epsilon, but printed in impossibly small type. Returning to his original drafts, he saw that he had originally written "epsilon, where epsilon is as small as possible" and the typesetters had taken this too literally...

These days, with the advent of submissions in LaTeX, and perhaps more importantly, the increasing disinterest of journal publishers in doing any subediting, this story looks very antiquated. Some legends can never die though, and new ones are being born all the time. This one actually happened to a friend of mine. No, really -- you can verify this by asking around, and you will probably find that it happened to a friend of yours as well.

My friend was reviewing a paper for a journal. Sadly, it was very badly written, and full of typos and errors. One proof in particular was a complete mess, and in his referee report he wrote "The derivation of this equation does not seem to make sense. Presumably, some additional assumptions are being made which should be stated explicitly."

Several months later, he was sent a revised version of the paper. It did not seem to be much improved. The problematic proof looked to be unchanged: it was still the same mess of algebra. However, there were now some extra lines added below the end of the proof which read as follows: "The derivation of this equation does not seem to make sense. Presumably, some additional assumptions are being made which should be stated explicitly."

I should ask my friend whether the revision acknowledged the helpful suggestions of the referees. Feel free to post your favourite academic folklore in the comments...

