Tuesday, May 31, 2005

Continuity Error

Thanks to boingboing, I read this tongue-in-cheek analysis of continuity within sci-fi. This set me thinking about the phenomenon of continuity errors within papers. How often have people written "full details will appear in the final/journal version of this paper" only for no such follow up to appear? A tech report promising the missing proofs turns out to be identical to the proceedings version. Subtle errors are silently corrected between the conference and journal version.

But more than this, what about all the times when the clearer way to explain an idea, or a simplified proof, or a neat trick to shave off a small constant factor occurs too late to be incorporated into the published version of a proof? You come across a helpful reference too late to mention it (or it is only published after your work is published). The result is that one's "deep understanding" of the problem is never committed to the literature, since none of these little fragments is worth writing up on its own. Even if they get committed to paper, the wealth of knowledge on a subject gets fragmented over a number of papers by a variety of authors.

As a grad student, I struggled against this phenomenon, and tried to produce a thesis that was free of such 'continuity errors'. I carefully worked on the writing to make a write up that collected the results together in a coherent way, and filled in all the gaps and observations that seemed relevant. About a year after finishing it, I realized that I would do it completely differently if I were writing it then. New ideas and new insights arrive constantly, and new ways to communicate ideas and proofs occur with time as one's understanding deepens. Probably one's best hope in this struggle would be to give it enough time and write a textbook or, better still, get someone else to write a textbook. But this is really a futile struggle against entropy.

Returning to the original article, the most practical solution to continuity issues is to enlist a vast army of obsessives to pore over every word on the subject and find bizarre and artificial ways of reconciling the inconsistencies (I briefly considered, and rejected, the possibility of destroying the entire universe and starting over again). Where to find such a supply of foot soldiers in the war against continuity errors?

At this point I received an email from a grad student who had read one of my papers and was asking a very long and detailed but entirely misguided question about a typo in the proof of a fact that was mostly irrelevant to the main part of the argument, and the answer became clear to me.

Errata: thanks also for those who take the trouble to point out my typos and grammatical errors. Distributed checking, maybe that is the answer?

Monday, May 30, 2005

It would be, it would be so nice

Today is a public holiday in the US, and also in the UK. The Canadians had one last week. And yet, here I am in front of the keyboard with a small stack of papers to work on. With conference deadlines and other demands on our time, those of us in CS often seem to ignore public holidays like this and work to our own schedule. Placement of these conference deadlines sometimes seems to encourage this: SODA in my recollection has always been just after July 4th. Not a problem for most non-US domiciled researchers, but meaning the curtailment of celebrations for those participating in the informal and uncelebrated tradition of trying to rile the SODA Program Committee by submitting as many papers as possible (rumour has it that the record is in double figures).

This is not to complain, merely to observe. In fact, I think the freedoms that we have to work how, where and when we choose outweigh the fact that we are often out of phase with the rest of society. Some university professors can arrange their schedule so that they are in the department only two or three days a week. The department understands that so long as they are able to fulfil all their commitments in that time, then there is no reason to object. Research labs are also usually relaxed about people working from home frequently, or keeping non-standard hours.

This is important: research is not something that can be confined to certain times. We cannot just sit down and decide, "I am going to have inspirational ideas for the next hour"; ideas can occur at any time. But this is not to say that we can just sit back and wait for inspiration. It is easy to get distracted and preoccupied by other tasks such as administration. We have to mull over problems, make sure that there is time to think about research, talk with others and see new results. I like Feynman's idea of carrying around a stock of open problems, and every time one sees a new idea or result thinking, "Can I apply this to my problems?".

We also need a break occasionally. So even if we don't join the herd and rest on the public holidays, and find ourselves at our desks on weekends or late into the evening, we should take time out at other times, and get away to indulge our passions for poetry appreciation, sudoku playing, beach volleyball, puffin spotting or whatever. Never let the number of hours worked be your metric for self-evaluation, and try to avoid achieving the trivial work-life balance.

Having pontificated on this subject for too long, I shall get back to work now.

Sunday, May 29, 2005

A Paltry Plethora of Permutation Puzzles

I remember a few years ago whenever I would hear a new puzzle it would often be prefaced by the explanation that this was a puzzle asked at interviews for Microsoft. Examples were the classic "how to test if a linked list has a loop in it in linear time and constant space", and the by now hoary "U2 Puzzle". I always maintain the answer to "what should U2 do in this situation?" is "Sack their tour manager", but sadly no one seems to think that this is reasonable. International supergroups have never previously shown such strong regard for punctuality and I believe that their fans could tolerate the extra three minutes delay. Certainly, some of Microsoft's software releases have been in excess of three minutes later than their original shipping date. But I digress.

Anyway, these days many new puzzles I hear are prefaced by an explanation that they were asked at interviews for Google. I wonder when this shift happened and Google took the crown for being the tech company with a reputation for hiring the best at answering funny little puzzles. I also wonder if they really were asked in interviews, or whether this is just a shorthand for "This is a moderately tricky problem, but you should be able to work out an answer within a reasonable period of time. At the very least you should be able to find some solution quite quickly even if it is not the optimal one". With this in mind, here are three puzzles relating to permutations which I think are of "popular tech company interview level". They might puzzle you for a while, but the solutions are fairly simple.

1. You are given a permutation of 1...n and you want to cyclically shift it by k. That is, the item at position 1 should go to k+1 mod n, at position 2 should go to k+2 mod n and so on. Usual restrictions apply: do this in linear time and with constant space [that is, you have a constant number of variables which can each hold an item or a pointer].

2. You are given a string of length n, consisting of words separated by spaces. Permute it so that the order of the words is reversed, but the words themselves still read left to right. For example, the transform applied to the first sentence of this puzzle would be "spaces. by separated words of consisting n, length of string a given are You"

3. Take a permutation of 0..2n-1 and subtract i from the i'th entry, modulo 2n. For example, 2 3 1 0 becomes 2 2 3 1. Show that for any n this operation can never result in a permutation. [Note that the permutation is of even length; for odd length one can easily find a permutation of length 3 for which this is not the case].
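(Spoiler warning for puzzle 1.) There is more than one way to do it, but one classic approach is the triple-reversal trick, sketched here in Python; this is offered as one possible solution, not necessarily the "intended" one:

```python
def rotate(a, k):
    """Cyclically shift list a by k places, in linear time and
    constant extra space, via three in-place reversals."""
    n = len(a)
    k %= n

    def reverse(lo, hi):  # reverse a[lo:hi] in place
        while lo < hi - 1:
            a[lo], a[hi - 1] = a[hi - 1], a[lo]
            lo, hi = lo + 1, hi - 1

    reverse(0, n)   # reverse the whole array...
    reverse(0, k)   # ...then the first k entries...
    reverse(k, n)   # ...then the rest: the composition is a rotation by k
    return a

rotate([1, 2, 3, 4, 5], 2)  # -> [4, 5, 1, 2, 3]
```

The same trick, applied once to the whole string and then once to each word, disposes of puzzle 2 as well.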
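For puzzle 3, brute force proves nothing, but it makes a reassuring sanity check of the claim for small even lengths (and confirms the odd-length counterexample mentioned in the note):

```python
from itertools import permutations

def shift_is_permutation(p):
    """Subtract i from the i'th entry (mod len(p)) and test whether
    the result is again a permutation."""
    m = len(p)
    q = sorted((p[i] - i) % m for i in range(m))
    return q == list(range(m))

# even lengths: the operation never yields a permutation
for m in (2, 4, 6):
    assert not any(shift_is_permutation(p) for p in permutations(range(m)))

# odd length 3: a counterexample exists, e.g. (0, 2, 1)
assert shift_is_permutation((0, 2, 1))
```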

In terms of origins, one of these was apparently asked in a Google interview. One was a puzzle created by a CS researcher (not sure whom or how), and two of them came up in the context of a research problem to prove a lemma. Yes, that's four: one puzzle came up in two contexts.

Many thanks to Amit and Muthu for telling me some of these problems.

Saturday, May 28, 2005

Sudoku OK, kudos UK

Thanks to Suresh for the introduction and for the opportunity to add a few posts to the Geomblog. Since it is a "holiday" weekend in the US and elsewhere, I thought I would begin with some posts on puzzles and games from a computer science perspective.

As Lance observes, there seems to be a sudden and unexplained interest in Sudoku in my home country. It felt very odd to read about this while living in the US, since it makes the extraordinary interest this seems to have generated (newspapers publishing tens of puzzles every day and competing with each other for who has the best puzzles) seem very abstract and hard to imagine. Of course, since I was reading about this in an online newspaper, it's quite possible that they were merely hyping the phenomenon in their own interest.

The game is pretty simple: one is given a 9 by 9 grid, formed of 9 subgrids of size 3x3 each. Some digits from 1 to 9 are filled in some of the squares, and the goal is to complete the grid under the constraints that each digit must appear exactly once in each row, column, and subgrid.
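To make the constraints concrete, here is a small Python sketch (my own, not part of any official puzzle) that checks whether a completed grid is a valid solution:

```python
def valid_sudoku(grid):
    """Check a completed 9x9 grid: each row, column, and 3x3 subgrid
    must contain the digits 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [set(grid[r + i][c + j] for i in range(3) for j in range(3))
             for r in range(0, 9, 3) for c in range(0, 9, 3)]
    return all(group == digits for group in rows + cols + boxes)
```

The actual game, of course, is filling in the grid so that this check passes.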

I tried some of these puzzles online at http://www.sudokufun.com/sudoku.php and the guardian, and finished them in an hour or so each. As a game, I didn't find it very satisfying: it didn't seem to require too much wit or inventiveness to finish. Once I picked up a few heuristics, it seemed to follow pretty easily from repeated application of the Sherlock Holmes method: eliminating possibilities. Perhaps it takes a few more tries before it becomes addictive. There is certainly something rewarding about the "endgame" where the last few squares can get filled in very quickly indeed.

As a good computer scientist, my first thought was also how to generalize this puzzle. If we call this version Sudoku-3, then one can easily imagine what Sudoku-n will look like, and very quickly (by websearching, of course...) discover that the decision problem is NP-Complete (that is, to decide whether a given instance has any solution). Is this the end of it? Not really. There are many other questions that spring to mind that I do not know the answer to:

  • Counting version: How many different filled-in Sudoku-n grids are there? Let me get you started: For n=1, the answer is 1. For n=2... actually, I don't know, but there are at most (4!)^4 ways to arrange 4 lines of permutations of {1,2,3,4}, so checking all 300,000-ish possibilities won't take too long on a modern PC. Someone who is a whiz with Mathematica could probably do this in a few lines of code. But for n=3, one will need some smarter techniques to do the counting, perhaps building up satisfying subblocks, and using symmetries more cleverly. I couldn't spot any entries in Neil Sloane's encyclopedia of integer sequences for this problem yet.

  • Algorithms. One important feature of the puzzles that are printed is that there is exactly one solution. For these the decision problem is simple: the answer is always "solvable!" However, given this fact, is there now a simple algorithm to find the solution which does not resort to exhaustive checking? Here someone has laboriously gone through an example of how to solve the puzzle by repeatedly applying simple elimination rules. Can you prove that this is always sufficient?

    There's a danger of a tautology here: apparently many of these puzzles are generated by computer and then checked automatically that there is a unique solution. So the difficulty of these puzzles is only as hard as the complexity of the checker! In other words, if the checker runs the elimination algorithm to verify that there is a unique solution, then it will reject some grids that can be solved uniquely if they do not yield to the elimination method. Indeed, some grids are crafted by human designers and these are supposed to be superior (although, I do not know how easy it would be to distinguish machine generated from human). I tried one of each, and although the human generated puzzle took longer and seemed to require some additional rules to finish, it would be hard for me to say where the creativity comes from.

  • Puzzle generation. Staying on this theme, how could you generate a random instance of a puzzle to give to someone to solve? Randomly filling in a few entries then checking whether there is a (unique) solution might work. But now how do you make this a good puzzle? Perhaps the generated instance is too easy (eg, if very many entries are filled in already, or all 9 copies of one digit are filled in). How to tune the algorithm to generate good grids, or ones at differing levels of difficulty? I suspect that this problem very rapidly becomes a subjective matter, and comes down to a variety of heuristics, but perhaps one could define some metrics or desired properties and make it more rigorous.

  • (Philosophical -- Extra Credit). Why is this a craze? What is it about these puzzles that makes them enjoyable? As has been proved by many computer scientists, many popular games and puzzles, from Tetris and Minesweeper to various block-pushing and object-arranging games, are NP-Hard. Does this explain why we enjoy them? Is there something common to these "constraint satisfaction problems" that makes them enjoyable? There's something a little wrong with this hypothesis though: the instances of these games used to prove hardness have a tendency to be very repetitive and boring-looking. For instance, the reduction used to prove Tetris NP-Hard relies on a very regular initial arrangement and many repetitions of the same block in sequence. When I play Tetris, I enjoy much more the application of essentially simple heuristics to try to produce an arrangement that can adapt to whatever happens to come next. When I play Lemmings, I know that the level designers have in mind some sneaky way of using the physics of the game in order to win the game. The reason I didn't enjoy my first games of Sudoku very much was because it seemed too deterministic: I wanted some non-determinism, but not too much!
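Incidentally, the n=2 count really can be settled by the brute force suggested above: enumerate all choices of four permutation rows and keep those whose columns and 2x2 subgrids also work out. The sketch below reports 288 completed Sudoku-2 grids:

```python
from itertools import permutations, product

perms = list(permutations((1, 2, 3, 4)))
count = 0
for rows in product(perms, repeat=4):          # each row is a permutation already
    if any(len(set(col)) != 4 for col in zip(*rows)):
        continue                               # some column repeats a digit
    if all(len({rows[i][j], rows[i][j + 1],
                rows[i + 1][j], rows[i + 1][j + 1]}) == 4
           for i in (0, 2) for j in (0, 2)):   # the four 2x2 subgrids
        count += 1
print(count)  # -> 288
```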

So, I haven't really been hooked by this craze. I think I will stick to my preferred newspaper puzzle, the cryptic crossword, of which more in a later post...

Friday, May 27, 2005

Off to puff at puffins

and other bird-like creatures on the Orkney Islands, and other parts of Scotland. Alas, I will miss the exciting adventures of geometers at the annual sausage convention in Pisa. I assume the Urbana blogging mafia will report on all the goings on at the tilted town, along with revealing the appropriate affine transform that makes the Leaning Tower look straight.

In the interim, I have a guestblogger ! Graham Cormode is at Lucent Bell Labs (no dammit, AT&T Labs is NOT Bell Labs, how many times do I have to repeat it, arggghhhh.....)

Oh yes, where was I ? Graham Cormode is at Lucent Bell Labs, and does all kinds of cool things in massive data sets, as well as being a personal guest of Bill [Clinton|Gates]. Hopefully he will entertain with sonorous soundbites while I am away.

And so I'm off, clutching (or rather, lugging behind me) the Roger Penrose 1100 page opus, 'The Road to Reality'. Looks like fascinating bed time reading: he makes Lie groups seem almost cuddly...

Wednesday, May 25, 2005

Durable metaphors for (theoretical) computer science

Michael Mitzenmacher writes:
I think as a field we need to do a better job clarifying that computer science is about a lot more than programming, and that the skills learned in the major can apply well in other areas.
Indeed. One of the main areas of discussion at the STOC business meeting appeared to be (I did not attend) on ways of disseminating the impact of theoryCS. The new theory advocacy website aims to be a clearinghouse of such information.

However, there is an important other angle to this whole business of demonstrating the value of theoretical computer science (and CS in general, although this post will be theory-centric). Not only do we need demonstrable examples of where theory is used in practice, but we also need a more fundamental story line for the raison d'être of theory in the first place.

To take the example most commonly touted, the mythical 'man on the street' has probably heard of E=mc² and quantum physics and something about relativity, without necessarily understanding anything more about it. This person, if asked, will be able to hazard some guess as to what physicists do, and how it's fundamental (in a vague way) to everything in our modern world (at least to nuclear weapons and space exploration).

But we don't have a similar story line for theoryCS.

The point of doing theoryCS is that one believes (and there are degrees of belief here) that a computation is a tangible entity. What I mean by this is that there is something almost Platonic about a 'hard' problem and an 'easy' problem; the intrinsic complexity of a problem appears to be a feature of that problem itself, independent (up to a point) of which machine we are using.
These statements are given their heft by the Church-Turing thesis and its effective variant, and even the notion of completeness, the idea that certain problems are the exemplars of what is hard to do with a certain limited resource.

I am uneasy going as far as claiming that there are 'natural laws' for computation, but the analogy is not completely outlandish. Theoreticians do believe that at the core of any computation is "a problem", and the first order of business is to identify this core problem and classify it depending on various measures of complexity (space, time, etc).

But this is not an idea that is conveyed in most undergraduate classes in computer science. In fact, this is not even an idea necessarily shared by all areas of computer science, otherwise we'd see a lot more 'problem identification and classification' than we currently do.

The idea though is crucial to providing a deeper answer to the 'whys' of computer science, in a way that can capture the imagination, far more so than a litany of applications. After all, let's face it; people don't remember general relativity and quantum physics for the list of applications enabled by them, but for an ineffable mysteriousness and beauty that surrounds these descriptions of our world.

We need to communicate that same sense of beauty that comprises the hard core of (theoretical) computer science if we want to convince people that the field is more than just programming. This is not as a replacement for the more traditional theory advocacy that we should and must do, but as a different form of advocacy that targets a very different audience, for a different purpose.

Tuesday, May 24, 2005

Do we really need more students in CS ?

Michael Stiber and Daniel Lemire ask the seemingly counterintuitive, but interesting question: "Do we really need to admit more students in computer science?" The points they make are valid:
  • there isn't really that much of a demand for CS degrees, especially with outsourcing trends, and business leaders clearly have a vested interest in hyping a 'scarcity' of students (a glut in supply keeps wages low).
  • Much of the recent drop is a descent from the heights of the late 90s, when anyone who could type wanted a CS degree: this may reflect a natural 'return-to-normalcy' in computer science
  • Do universities really want to be in the business of training students in "IT", or giving them a CS degree ?
It's a good question to ask, and it at least seems that the downward trend in computer science is being counterbalanced by an upward trend in areas like biotech, reflecting the shift in perception of what is 'hot' right now.

However, from a strategic and tactical point of view, this question is a disastrous one to ask. As has been noted in many places in the last few months, we are now facing a significant funding crisis in computer science. Most traditional sources of funding for computer science are drying up, at possibly the worst time for this to happen, when American competitiveness in the global intellectual market is being called into question. If we convince ourselves that a drop in interest in computer science is part of the normal course of things, we will never be able to convince anyone else that they should allocate funding for our work.

As a computer scientist, I am of course biased, but I do believe that the technology revolution is not over yet; rather, we are at the end of the beginning. There is ample room for serious innovation and creative thinking in computer science, and if computer science enrollment (as indicative of a general lack of interest in the field) is dropping, the question to ask is not
Is this a bad thing ?
but rather
What do we do to rectify this trend ?
Daniel has argued in the past that it is our responsibility not to send Ph.Ds out onto a market that cannot sustain them, and therefore slowing down the pace of research is not necessarily a bad thing. In my view, this is a different problem, one that is better addressed by restructured funding at the grant level. Moreover, the steady decline in funding is being accompanied (paradoxically) by an across-the-board surge in paper writing, as academics struggle harder and harder to distinguish themselves from their peers in order to get the funding crumbs still available.

But CS enrollment decline at an undergraduate level is reflective of a deeper problem (as Michael points out, one of these is an increasing gender disparity) in computer science, a perception that the field is not interesting or vibrant any more. This perception directly feeds into political perceptions at funding agencies: if there is nothing interesting happening in a field, and people are not going into it, why fund it ?

I believe that computer science is going through a crisis of definition right now. The internet years caused an explosion in the field, and demand for CS majors was so high, that curricula became fairly fluid, and companies were willing to snag CS grads and train them in whatever skills they needed. But now, as companies are cutting back, they are less willing to take someone with no programming skills and training in real software development, and I have heard of at least some companies that are now leaning on university programs to tailor their bachelor's degrees to be more 'IT friendly'.

There is far more that can be said on this topic, and I had wanted to post a longer note about it. But it is something that departments everywhere must be pondering right now. On the one hand, you can make a degree program produce graduates that are more employable, but then you veer dangerously close to the 'vocational training' edge of the cliff; on the other, you can keep a degree program grounded in rigorous training (essentially what we have now), and continue to lose students to other programs because the CS degree they could get is not 'marketable'.

I was busy this weekend...

Photo courtesy Karen Ho.

Baez and Zeilberger

Two math bloggers who write very entertaining posts (but have no RSS feeds alas) are John Baez and Doron Zeilberger.

Baez writes the well known column 'This Week's Finds in Mathematical Physics', which he cleverly called 'This week's..' instead of 'the week's...' so he didn't have to post something every week :). He discusses very technical material in mathematical physics, and is up to week 216 now (as of today), and his writing provides an excellent model for longer-form exposition of technical content for the non-expert, something I hope theory CS folk can pick up on.

Doron Zeilberger's articles are less about mathematics and more about the process of doing mathematics, but are hilarious. He doesn't post that often, but all his articles are great to read. His most recent column is May 5, 2005.

This is all part of the theory advocacy process. We have to communicate the value of what we do to funding agencies, but we should try to communicate to a general audience as well. At the very least, the latter sharpens our skills for the former, and at best, we increase the awareness of the general population about the work we do.

Monday, May 23, 2005

Gödel Prize

I was browsing the Gödel Prize website and noticed that the rules for eligibility changed this year.

The old rule (governing prizes from 1993-2004):
Any research paper or a series of papers published (not reprinted) in a recognized refereed journal by a single author or a team of authors in the seven years preceding the year of the award is deemed eligible.
The new rule (prizes awarded 2005 and later):
Any research paper or series of research papers by a single author or by a team of authors is deemed eligible if the paper was published in a recognized refereed journal before nomination but the main results were not published (in either preliminary or final form) in a journal or conference proceedings 14 or more years before the year of the award.
So, two key changes: firstly the clock starts at the time of conference publication, not journal publication, because "it often is the most effective means of bringing the results to the attention of the community". Secondly, and of course this is related to the first change, the time before eligibility expires has increased to 14 years.

Which means that there is now time for the amazingly cool paper on embeddings by Linial, London and Rabinovich (published in FOCS 1994, and in Combinatorica in 1995). Maybe it's already been nominated... ?

p.s Given the limited eligibility period, it seems that giving out more than one award each year might account for bursts of inspiration where more than one seminal paper gets written in a year.

Streaming Algorithms

This year's Gödel Prize went to Noga Alon, Yossi Matias and Mario Szegedy for their 1996 paper titled "The space complexity of approximating the frequency moments". Why ? What was so great about this paper (which we shall call the 'AMS' paper) ?

If you study databases for a living, or work with large data sets, you will know that the amount of data that computers are being asked to process is growing astronomically. Google crawls over 8 billion pages on a regular basis, and if you start monitoring data on networks, you are looking at gigabytes of fresh data streaming in every day. If you're Walmart, for example, you might have sales data showing up every minute, and if you're an astronomer observing on a telescope, again you get humungous (ed: 'humungous' is a technical term that translates as 'really a lot of stuff, and I am too lazy to figure out exactly how much') amounts of data on a constant basis.

Most of this data cannot be stored; it is just infeasible, even with storage as cheap as it is, to store everything you get. More often than not, you want to compute a small summary of the data and throw the rest away. Even if you are willing to archive large portions of your data, it is too large to do careful processing over. Most often you will scan the data ONCE, and retain whatever you can maintain usefully. As database folks know well, accessing a disk is a lot cheaper if you do it in sequential order.

Remember though, that memory in your computer is nowhere near as large as your disk. A state of the art PC might have 4GB of RAM, and over 500 GB of disk, and servers have even more disk space, with not necessarily much more RAM. So here's the problem:
If I can't even read my data into memory to process, or I am drinking at this hose that's blasting data at me, how can I compute summaries efficiently with only a small amount of main memory ?
This is the prime motivating question for the area of streaming algorithms, an area of frenetic research in algorithms and databases over the past 10 years. For a survey, read this tech report by S. Muthukrishnan.

The AMS paper looked at the following simple problems on a stream of numbers:
  • Can I count the number of unique items ?
  • Can I estimate the variance of the items ?
In general, if I associate the number f_i with the frequency of the number i, can I compute functions of the form F_k = ∑_i f_i^k, where F_0 thus denotes the number of unique items, F_1 the total number of items, F_2 the sum of squared frequencies (which can be used to compute variance) and so on.
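As a concrete (and deliberately memory-hungry) illustration, computing F_k exactly just means tallying all the frequencies first, which is precisely what a streaming algorithm cannot afford to do:

```python
from collections import Counter

def moment(stream, k):
    """Exact F_k = sum over items of f_i**k; needs memory proportional
    to the number of distinct items -- the expensive, non-streaming way."""
    return sum(f ** k for f in Counter(stream).values())

stream = [1, 2, 2, 3]  # frequencies: f_1 = 1, f_2 = 2, f_3 = 1
# F_0 = 3 (distinct items), F_1 = 4 (stream length), F_2 = 1 + 4 + 1 = 6
```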

You might think it's easy to count F_0 and F_1. You'd be partly right. I can compute F_1 simply by counting the items as they come in; I need only space to store one counter. But if I want to count the number of unique items, I have to maintain one memory location for each new item that comes in, and that could mean needing n memory locations for n items, which violates the rules of this game (memory used has to be MUCH LESS than the size of the data).

In general, for frequency counts other than F_1, I am in a bind. I can't maintain all the frequencies f_i, because that would need me to maintain as much space as there were distinct items (and in a worst case analysis this is proportional to the size of the data). So what can I do ?

This is the space of problems attacked by AMS: their main contributions were:
  • Good news: F_2 can be approximately computed in logarithmic space; in other words, proportional to the space required to store a single element of the stream
  • Bad news: if you are not willing to toss coins (use a randomized algorithm) and be content with an approximate solution, you will need space linear in the input stream.
  • Tight results for different k.
Their paper was the first to use communication complexity to prove lower bounds on stream algorithms. This is too long a topic to get into now, but it is one of the most powerful methods for proving limits on streaming, and most current lower bounds use this idea.
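To give a flavour of the AMS idea for F_2, here is a toy sketch. For simplicity it stores a random +/-1 sign per distinct item in a dictionary, which of course defeats the space bound; the real algorithm generates the signs from 4-wise independent hash functions precisely to avoid that storage:

```python
import random

def ams_f2_estimate(stream, trials=200, seed=0):
    """Toy AMS sketch for F_2: give each item a random +/-1 sign, keep
    the signed running sum Z, and use Z**2 as the estimate (unbiased;
    averaging over trials reduces the variance)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sign = {}
        z = 0
        for x in stream:
            if x not in sign:
                sign[x] = rng.choice((-1, 1))
            z += sign[x]
        estimates.append(z * z)
    return sum(estimates) / len(estimates)
```

For the stream [1, 2, 2, 3], the true value F_2 = 1 + 4 + 1 = 6, and the averaged estimate lands close to it.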

This paper, along with later work by Henzinger, Rajagopalan and Raghavan and Feigenbaum, Kannan, Strauss and Vishwanathan, laid the foundations for the entire field of streaming algorithms. The time was right in the mid-90s for streaming to emerge as a real field with real applications. Interestingly, some of the nascent ideas date back much further (to a paper by Flajolet and Martin in 1983 for computing F_0, and a paper by Morris on computing F_1 in log log n space (!) in 1978, in the CACM-as-a-good-technical-publication era).
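Morris's trick, incidentally, fits in a few lines: keep only an exponent c and increment it with probability 2^-c; then 2^c - 1 is an unbiased estimate of the count, and c itself needs only about log log n bits. A toy version (a single counter is noisy; real uses average several):

```python
import random

def morris_count(n, seed=1):
    """Morris's approximate counter: process n increment events while
    storing only the exponent c; return the estimate 2**c - 1."""
    rng = random.Random(seed)
    c = 0
    for _ in range(n):
        if rng.random() < 2.0 ** -c:
            c += 1
    return 2 ** c - 1
```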

The field has expanded greatly since AMS, but the problems they studied are still of great interest. In fact at this very STOC, a paper by Indyk and Woodruff closes the final open problems left behind by the AMS paper.


STOC 2005

Lance has been liveblogging from the business meeting at STOC 2005. Usually these events are quite dreary; this year though it looks like things got hot and spicy, with much discussion on the direction of theory funding (DOWN) and what we can do about it (TAKE TO THE STREETS?).

A new wiki, TheoryMatters, has surfaced as an intended clearing house for advocacy and information about the relevance of theoretical computer science. There appear to be some initial glitches (you can't sign the guest book or edit pages without a password, and no mention on how to get one), but it will be an interesting experiment...

Update: Lance points out that the main page has a contact email address, and a mailing list that you can subscribe to.

Friday, May 20, 2005

you'd better believe it:...

From Melvin Durai's latest column:
Q: Your house, containing everything you own, catches fire. After saving your loved ones and pets, you have time to safely make a final dash to save any one item. What would it be?

A: That's easy: my Indian passport. If there's anything worse than watching your house on fire, it's spending a day at the Indian embassy. A friend of mine walked in with a full head of hair and walked out with a bald spot. I really felt bad for her.

Chickens and Primes

Have you ever had an annoying mathematical headworm, a puzzle that starts buzzing around in your head and won't go away till you solve it ? Well, if you have, and don't like the feeling, don't read further.

Via think again ! comes this intriguing puzzle. The original involves KFC and puzzled managers and is far more colorful. The pared-down version presented here is:

Given two coprime numbers A and B, show that there exists a number M such that all numbers N > M can be expressed as non-negative integer combinations of A and B (i.e. for all N > M, there exist non-negative integers p, q such that N = pA + qB). Find this M = f(A, B).

Note that the coprimality condition is critical: otherwise only multiples of gcd(A, B) can be expressed at all, and the statement is trivially false.
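If the headworm bites, a brute-force experiment is a natural first step. The sketch below (the function name is mine) deliberately finds M empirically rather than giving away the closed form:

```python
from math import gcd

def largest_unrepresentable(a, b):
    """Find the largest N that cannot be written as p*a + q*b
    with non-negative integers p, q, by brute force.

    Assumes a, b >= 2 and gcd(a, b) == 1; checking up to a*b
    suffices, since every number beyond that is representable.
    """
    assert min(a, b) >= 2 and gcd(a, b) == 1
    limit = a * b
    representable = [False] * (limit + 1)
    for p in range(limit // a + 1):
        for q in range((limit - p * a) // b + 1):
            representable[p * a + q * b] = True
    return max(n for n in range(limit + 1) if not representable[n])
```

Running it on a few coprime pairs, say largest_unrepresentable(3, 5) and largest_unrepresentable(4, 7), suggests a pattern for f(A, B) that is worth conjecturing and then proving.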

Math education and the illusion of certainty

Two articles were the inspiration for this musing. Firstly, there was this op-ed in the LA Times titled 'Definitional Drift: Math goes Postmodern'. The premise:
A baker knows when a loaf of bread is done and a builder knows when a house is finished. Yogi Berra told us "it ain't over till it's over," which implies that at some point it is over. But in mathematics things aren't so simple. Increasingly, mathematicians are confronting problems wherein it is not clear whether it will ever be over.
The second article is a note published by Thurston in the Notices of the AMS in 1990, which was then reprinted in the arXiv. It's on math education, and I found the following excerpt quite salient:
People appreciate and catch on to a mathematical theory much better after they have first grappled for themselves with the questions the theory is designed to answer.
As mathematicians, we know that there will never be an end to unanswered questions. In contrast, students generally perceive mathematics as something which is already cut and dried—they have just not gotten very far in digesting it.
Insofar as these comments relate to math, they do apply (to a lesser degree) to theoryCS as well. Probably the biggest misconception about mathematics is that it is about calculations and formulae. Thurston's article refers to the common student desire to "know the formula to get the right answer", and Becky Hirta and TDM have written countless posts on this topic.

The sense of determinism that lies at the heart of these assumptions (and which is exemplified, by rhetorical contrast, in the LA Times piece) is the idea that mathematics is a dead subject full of cranks that you turn to spit out answers to questions. Want a derivative ? Here's a chain rule for you. How do I solve this quadratic equation ? Apply this formula.

What is the consequence ? You come to math expecting to memorize expressions, and do things A CERTAIN WAY. For various pedagogical reasons, none of which I am competent to discuss, teachers often encourage this approach, and students, while gaining some appreciation of the "scaffolding" of mathematics, miss out on the deeper meanings that make the study of abstract structures a far more beautiful and satisfying endeavour than it appears on the outside.

Math textbooks also don't help the matter much because, as Thurston points out:
The best psychological order for a subject in mathematics is often quite different from the most efficient logical order.
I have found this to be the case myself, when trying to pick up some new math. The intuition behind the definition of an affine connection is far more likely to stick in my head than the actual definition of a covariant derivative, and is crucial to understanding how the definitions come to be in the first place (the whole 'teach a man to fish' argument).

Growing up learning mathematics without acquiring an appreciation for the true underlying dynamics causes the kind of misconceptions perpetuated even on a show like Numb3rs, where the whizkid mathematician just does calculations all the time ! And this is one of the better shows in terms of its depiction of mathematicians. A mindset that views mathematical work as deterministic number crunching is unable to understand or 'grok' the incredible amount of creativity, artistry, and aesthetics that go into mathematical work, and that are in fact important reasons for why we do mathematics at all.

Of course, it is easy to say, and harder to do. Thurston's article makes for excellent reading, but is light on actual implementable suggestions. But the basic point of the article is a good one: that at a basic level, students are not getting a fundamental appreciation of what mathematics is all about. Consider this juxtaposition:

From the LA Times piece:
Increasingly, mathematicians are confronting problems wherein it is not clear whether it will ever be over.
From Thurston's article:
As mathematicians, we know that there will never be an end to unanswered questions.
Source: Mathforge.

Update: more on this topic from Tea Total.

Thursday, May 19, 2005

Numb3rs renewed

I got review fatigue and stopped posting Numb3rs reviews (I haven't even finished watching the episodes saved up), but I do like the series, and now news comes that CBS is renewing it for this fall. What's even more interesting is that Law and Order: Trial by Jury, which was up against it in the same time slot, was cancelled.

Wednesday, May 18, 2005

Computing classics

Via a commenter at the Quantum Pontiff, I came across this course offered by Christos Papadimitriou at Berkeley. Titled 'Reading the Classics', it takes the student through a selection of some of the best works leading up to modern theoretical computer science. Some of the key papers:
  • Euler's Konigsberg bridge paper, arguably one of the first graph theory works.
  • Turing's original paper on computability
  • Shannon's foundational work on the theory of communications. In 2005, the 100th anniversary of Einstein's annus mirabilis, I should point out that if anyone can be said to be the equivalent of Einstein, it must be Shannon. With his work on information theory and boolean logic, he created the theoretical underpinnings of both the networks and the devices that make up the Information Age we live in today.
  • Jack Edmonds' work on matchings, where the notion of polynomial time as a measure of "efficiency" was first proposed (some say it appeared even earlier, in a letter that Gödel wrote to von Neumann)
  • The Cook-Karp-Levin sequence of papers that laid the foundations of NP-hardness.
and many others. Most of the papers are online (and in the case of Euler's paper, in the original Latin!).

Blog-fatigue: A highly unscientific observation

After last November, I had a 'blog-fatigue' moment where for a few months I couldn't bear to read anything political. This might have had something to do with my political inclinations, but I was curious, and started digging around. Based on a completely unscientific study of a completely non-random sample, I came to the conclusion (draw your own !) that the phenomenon was widely felt:

1. The Washington Monthly

2. Daily Kos

3. Instapundit

4. Little Green Footballs

5. Andrew Sullivan

6. Gizmodo (a prominent gadget blog)

The first two are prominent liberal (or at least left-of-center) blogs, and the next two are right-leaning blogs. Andrew Sullivan is generally conservative, but is not as ideologically predictable, and Gizmodo is a completely non-political blog. I wanted to check Boing Boing, but they only have stats going back to January.

The pattern seems fairly compelling, with all the usual caveats added in. Gizmodo illustrates why this is not just a holiday-season dip. Of course, if I had stats from prior years, even this effect could have been verified.

SIAM back issues

Although Lance will not be happy with this news, SIAM has set up electronic access to their back issues (prior to 1997). Their LOCUS service contains SIAM journal issues from 1952 onwards. This comes in addition to the steadily growing archive of ACM journal/conference proceedings, which for STOC now dates back to 1969.

As far as I can tell, the service is accessible with a normal institutional subscription.

Darth Tater...

"He's more chemical than vegetable now". See the movie.

(HT: Adam Buchsbaum)

Tangled Bank #28 is up

The Tangled Bank is a weekly compendium of science (and medical writing) in the blogworld. This week's topic is "alternative remedies" and is being hosted by Chronicles of a Medical Madhouse.

And why is it called a "Tangled Bank" ? PZ Myers chose the name as a reference to a famous passage from Darwin:
It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us. These laws, taken in the largest sense, being Growth with Reproduction; Inheritance which is almost implied by reproduction; Variability from the indirect and direct action of the conditions of life and from use and disuse: a Ratio of Increase so high as to lead to a Struggle for Life, and as a consequence to Natural Selection, entailing Divergence of Character and the Extinction of less-improved forms.
For some of the best science writing and expository work on evolution on the web, visit PZ Myers' site Pharyngula. And although I like to rant about science journalists, I have to give credit where it is overdue: Chris Mooney is by a long shot one of the best science writers around, and has been covering many of the recent controversies in the world of scientific research (the false 'balance' in covering scientific disputes like global warming, the ID controversies, and others).


ISAAC 2005

The 16th International Symposium on Algorithms And Computation will be held in Sanya, China from Dec 19-21, 2005. ISAAC is a good Asian theory conference, along with COCOON. The paper deadline is June 28, 2005.

Monday, May 16, 2005

It's Open Problems time

Well not here, but here:

Symposium on Computational Geometry
June 6 - 8, 2005
National Research Council of Italy (CNR)
Area della Ricerca di Pisa, Italy
Sponsored by ACM SIGACT and SIGGRAPH

Supported by IIT-CNR, University of Perugia and Raindrop Geomagic


The Symposium will include an Open Problem Session on Monday, June 6, 5:20-6:20. We encourage researchers to propose and present succinct, clearly defined open research questions. Authors should carefully prepare a brief writeup of the proposed open problem (or a related small set of open questions), utilizing at most 1 page, including necessary definitions, background, relevance, and references. Authors may refer to The Open Problem Project (TOPP) webpage, where many open problems are already compiled, and selected submissions will be posted.

Open problem submissions should be submitted by email to Joe Mitchell by ** June 1, 2005 ** in order to be included in a printed summary to be distributed at the Open Problem Session. An updated summary of all open problems posed will also be available on the conference web site after the conference.

For details on the program and the registration procedures, please visit the conference web site.

New Stamps

The US Postal Service has released their 2005 Commemorative Stamps, and four scientists are honored:
Pictures of honorees
I am of course shocked, shocked ! that the USPS so blatantly supports the agenda of the secular evolutionists by awarding a stamp to one of their propagandists. Should your God-fearing taxpayer money be going towards such blasphemy ?

(HT: The Huffington Post)

Sunday, May 15, 2005

New Math/CS blog

Via John Langford, a new math/CS blog by Andrej Bauer:
I devote a lot of my time to thinking about the relationship between mathematics and computation. There are two sides of this, which can be expressed by the slogan “Computable mathematics and mathematics of computation”. Computable mathematics is about how to do mathematics with computers, while mathematics of computation is about mathematics that describes properties of computation in a mathematical, abstract way.
Should make for interesting reading.

Friday, May 13, 2005


Dutch universities have banded together to provide free digital access to publications by academics at these universities. Given that the Netherlands is also the land of Elsevier, poster child for commercial journal price-gouging, it seems only appropriate that this initiative is called DARE.

The spoof-meter does start moving towards the red zone when you read lines like this (from the English version of the website):
  • DAREnet gives digital access to academic research output in the Netherlands. It is accessible from home but also from both North and South Pole.
  • The top of the bill, the cream on the cake can be found through Cream of Science.
Maybe it's just one of those things lost in translation.

(HT: The Register, via /.)

Thursday, May 12, 2005

P vs NP is for kids...

In our humble corner of the mathematical landscape, we jump excitedly at each new proof that P = NP, P != NP, or even P = RP. But for the brave souls who patrol the vast expanses of sci.math, much more enjoyment is to be had.

Dave Rusin maintains a page of some of the "many delights available through the internet". Among them, the daring exploits of one Edgar Escultura, mathematics professor in the Philippines, who slayed not one, but two mighty mathematical dragons.

In 1998 (search for Escultura), he disproved Fermat's Last Theorem, demonstrating that "trillions upon trillions" of numbers satisfying the equation exist. A lesser man would have balked at the existence of a contradictory proof; Prof. Escultura is not one of those men. Faced with a cruel and unrelenting system of numbers that mocked him, he invented a new one !

When Wiles announced his proposed solution of the problem in 1993, Escultura pointed out his main error as lack of knowledge of recent mathematical discoveries, particularly, the uncertainty in the behavior of large numbers and in dealing with infinite mathematical systems such as the natural numbers.

But Prof. Escultura is no one-trick pony.
Earlier, Escultura captured the last stronghold of physics with his solution of the gravitational n-body problem posed by Marquiz de Laplace at the turn of the 17th Century. [...] His paper, The Solution of the Gravitational n-Body Problem, is published by the journal, Nonlinear Analysis, Vol. 30, No. 8, pp. 5021 - 5032, Dec. 1997. The paper earned for him international recognition: membership in the American Association for the Advancement of Science; the 25-member Core of Experts which is the leading body of the International Federation of Nonlinear Analysts (IFNA); the Editorial Board of IFNA and the Global Organizing Committee of the Third World Congress of Nonlinear Analysts 2000 (WCNA 2000), 19-26 July 2000, Catania, Italy; and organizer and chair of the international mini-symposium on the topic, The El Niño and its Impact: Drought and Turbulence, during the Congress.
Chair of a conference on El Niño ? member of AAAS ? Give the man a cookie !

p.s Why bring this up now ? For some reason, his "disproof" of Wiles' proof has been doing the rounds on some blogs recently, prompted by an article in the Manila Times.

Wednesday, May 11, 2005

"rabble" ?

I don't normally post on politics and the like, but some things just make my head explode. Lawyers, Guns and Money points to this entry on Powerline (TIME Magazine's Blog of the Year last year for their work exposing the CBS forged memos, and in general a fairly right-wing group blog):
It's great to see someone standing up for colonialism, especially British colonialism. I agree wholeheartedly with this observation, for example:
Had Britain had the courage to face down Gandhi and his rabble a few years longer, the tragedy that was the partition of India might have been avoided.
The subquote comes from an article in New Criterion, by Roger Kimball.

So let me see; the British tried to avoid partition and were stymied by Gandhi and his "rabble". I can't stop gasping long enough to provide a cogent response to this filth, which is probably why I don't post on politics in the first place.

(HT: Crooked Timber)

Monday, May 09, 2005

NP-Completeness Columns

David Johnson used to write the 'NP-Completeness Column' for the Journal of Algorithms, and put it on an extended hiatus in 1992. He's restarting the column at its new home in the ACM Transactions on Algorithms, and in the tradition of must-read-columns everywhere, has brought out a special edition ! super-enhanced ! electronic ! director's cut !! one-time only !!! compilation of all his previous columns. Read them all and find out what you're missing !!!!

p.s Writer's commentary, subtitles in Swahili and extra deleted text not included.

p.p.s Jokes aside, these columns are really worth a read. They are written so as to be accessible to the practising theorist without compromising on level of detail and rigor.

The beast is dead...

Long live the beast !!

Out, finally, is the CFP for SODA 2006. The deadline is July 6, so get cracking ! With no small amount of pleasure I note that the severe beating administered to short papers had its effect:
In contrast to previous years, there will be no distinction between short form abstracts and long form abstracts. There is a 10-page limit on submission length, and authors should feel free to submit abstracts that are significantly shorter than 10 pages.
SODA 2006 will be in Miami, at the Radisson Hotel. I'm told the hotel is a few short muggings away from the beach.

Saturday, May 07, 2005

Mmm, chicken...

Not as impressive as saving one's life with a Taylor series, but still incredibly romantic:
Name one mathematician mentioned in the textbook. What is this person known for?
Answer: Steve Fisk came up with the art gallery theorem while on a bus with chickens on top in Afghanistan.

Numb3rs Non Review:

I am woefully out of date on my Numb3rs reviews, and more importantly, I have nothing interesting to say (although by that argument this whole blog should vanish in a cloud of smoke). At any rate, I actually do plan to catch up on my Numb3rs viewing and reviewing (thank you TiVo). For now, for the vanishingly small fraction of readers who don't also read /., I give you the Hollywood Math and Science Film Consulting corporation, whose expert advisors guide the poor clueless Hollywood innocent through the mysterious, magical and mindbogglingly dull ways of DA SCIENTIST.

It's a good thing they don't live in Kansas:
The hearings in Topeka, scheduled to last several days, are focusing on two proposals. The first recommends that students continue to be taught the theory of evolution because it is key to understanding biology. The other proposes that Kansas alter the definition of science, not limiting it to theories based on natural explanations.

Friday, May 06, 2005

Useful translations

From this page:

"It follows easily that" means:
One can now check that the next statement is true with a certain amount of essentially mechanical, though perhaps laborious, checking. I, the author, could do it, but it would use up a large amount of space and perhaps not accomplish much, since it'd be best for you to go ahead and do the computation to clarify for yourself what's going on here. I promise that no new ideas are involved, though of course you might need to think a little in order to find just the right combination of good ideas to apply.
Now imagine you had to replace the first by the second at each occurrence in a 10-page conference submission ;).

Sunday, May 01, 2005

Rene Thom and Dali

It's been many days since I posted anything of consequence, and I am still trying to prove that /time spent blogging/ + /time spent on research/ is not a constant. Oh well...

I finally got done with a slew of deadlines, and took a trip over to the Philadelphia Museum of Art to see the ongoing Dali exhibit. I'll leave an art critique to the experts; it was refreshing and rather bizarre, all in the same breath. Dali had this knack of absorbing the prevailing trends of the time and then expressing them in his work in a vivid way. He did this with Freudian theory, and with quantum theory (or more precisely with the idea that matter is indivisible).

His last painting was called 'The Swallow's Tail', and is yet another example of his ability to transform ideas into art. In this case, the ideas came from topology, and specifically from Rene Thom's work in catastrophe theory. The term 'swallow's tail' was in fact coined by Thom to describe a certain kind of topological structure.

Dali made other topologically inspired paintings towards the end of his life.
