Sunday, February 18, 2007

STOC 2007 Results out

Via Piotr, the STOC 2007 results are out. They are sneakily hidden on Uri Feige's home page, so if you go to the STOC page, you don't see them.

77 papers made it in: I don't know how many were submitted (Update: 312, quite a healthy number). As is often the case with STOC, the titles are somewhat inscrutable, so it's hard to spot papers that might be interesting.

Some notable entries:
  • Computing crossing number in linear time, by Ken-ichi Kawarabayashi and Bruce Reed:
    For a fixed k, they determine whether a graph can be drawn with crossing number at most k, and if so, construct such a drawing, all in time linear in n. This improves an earlier quadratic time algorithm of Grohe.
  • Combinatorial Complexity in o-Minimal Geometry, by Saugata Basu.
    A continuation of his work on estimating the complexity of various topological quantities, for even more generally defined subsets of R^n.
  • Fourier meets Möbius: fast subset convolution, by Andreas Björklund, Thore Husfeldt, Petteri Kaski, Mikko Koivisto.
    An application of the Möbius transform to subset convolutions, in the spirit of how Fourier transforms speed up ordinary convolutions (a rough sketch of the idea appears after this list).
  • Lower Bounds for Randomized Read/Write Stream Algorithms, by Paul Beame, T. S. Jayram and Atri Rudra.
    This is another in a series of papers that I really must read, given how it reminds me of things I was dabbling in involving graphics cards. The quirk of this model is that the stream algorithm can write onto streams as well as read from them; memory is limited, but not intermediate storage (for writing such streams).
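For the Björklund et al. paper, here's a rough Python sketch of what I understand the ranked zeta/Möbius idea to be (my own reconstruction from the abstract, so blame me and not the authors for any errors). It computes the subset convolution h(S) = sum over T ⊆ S of f(T)g(S\T) for all 2^n subsets in O(2^n n^2) arithmetic operations, instead of the naive O(3^n):

def subset_convolution(f, g, n):
    # f, g: lists of length 2^n indexed by bitmask; returns h with
    # h[S] = sum over subsets T of S of f[T] * g[S \ T]
    N = 1 << n
    popcount = [bin(S).count("1") for S in range(N)]
    # split f and g by subset size ("ranked" versions)
    fr = [[f[S] if popcount[S] == k else 0 for S in range(N)] for k in range(n + 1)]
    gr = [[g[S] if popcount[S] == k else 0 for S in range(N)] for k in range(n + 1)]
    # zeta transform (sum over subsets) of each ranked slice
    for arr in fr + gr:
        for i in range(n):
            for S in range(N):
                if S >> i & 1:
                    arr[S] += arr[S ^ (1 << i)]
    # multiply pointwise, treating the rank as a polynomial variable
    hr = [[0] * N for _ in range(n + 1)]
    for k in range(n + 1):
        for j in range(k + 1):
            for S in range(N):
                hr[k][S] += fr[j][S] * gr[k - j][S]
    # Moebius transform (inverse zeta) of each ranked slice
    for arr in hr:
        for i in range(n):
            for S in range(N):
                if S >> i & 1:
                    arr[S] -= arr[S ^ (1 << i)]
    # the answer for S sits at rank |S|
    return [hr[popcount[S]][S] for S in range(N)]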
Parochial concerns section: by my count, 6 geometry papers, which is fairly decent.

Thursday, February 15, 2007

Fastlane and Holidays

As many of you are painfully aware, the NSF deadline for theory proposals is this Monday, Feb 19. As you may have also realized, Feb 19 is a federal holiday. According to NSF guidelines, in such an event, the deadline moves to the next working day (i.e the 20th) and I've confirmed that this is a correct interpretation for this deadline.

However, does Fastlane itself know this ? What if Fastlane refuses to accept any proposal submitted after Feb 19 (since in principle it has the information on the call and the deadline) ? Can the program manager then override it ?

I wonder if anyone has any experience with this...

Wednesday, February 14, 2007

Valentines for my current state...

New Research Center in Massive Data Algorithmics

Lars Arge, crown prince of massive data set algorithms, informs me that MADALGO, a new research center devoted to massive data algorithmics, is now up and running.
Several Postdoctoral positions at the level of Research Assistant Professor of Computer Science are available. Initially, the positions are for one year, but they can be extended by mutual consent. Applications are welcomed from researchers with clearly demonstrated experience and skills in the design and analysis of algorithms and data structures. Applicants with experience with I/O-efficient, cache-oblivious or streaming algorithms, as well as with implementation of such algorithms (algorithm engineering experience), will be preferred. The responsibilities of the candidates include work on algorithms for massive dataset problems in collaboration with center researchers, along with modest teaching responsibilities.
It's a great opportunity if you're looking for somewhere to do a postdoc, and like mucking around with lots of data. Massive data problems have added a profound new dimension to algorithms research, and this center will really help push research in this area forward. There are also Ph.D student positions available.

Saturday, February 10, 2007

Weird BibTeX problem

This is puzzling me, and I am hoping my readers might help:

I want to add a reference to this entry:
@article{ref,
author = {First Last and First M. Last, III},
title = {Insert Title Here},
journal = {Int. J. Comput. Sci.},
volume = 1,
number = 1,
year = 2010,
pages = {137--154},
}

and it comes out looking like this (in my .bbl file):
\bibitem{ref}
{\sc Last, F., and First M.~Last, I.}
\newblock Insert Title Here
\newblock {\em Int. J. Comput. Sci 1}, 1 (2010), 137--154
As you can see, there are two problems:
  1. The second name is not formatted in the style of the first
  2. The second author has been replaced by their grandparent (!) (the III has been replaced by I).
This is using the acm.bst style file. I tried using plain.bst, and the problem persists: the second author is listed as III First M. Last.

I tried standard tricks like enclosing the III in braces, placing commas in certain places, removing them etc. No luck.
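One thing I haven't tried yet, and which might be the answer (assuming acm.bst follows BibTeX's standard name-parsing rules): BibTeX also accepts a three-part "Last, Suffix, First" name format, so the entry could read

author = {First Last and Last, III, First M.},

The single-comma form is parsed as "Last, First", which would make III the first name (and hence abbreviated to I.), so the two-comma form at least explains the symptom.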

Friday, February 09, 2007

Carnival of Mathematics

First there was the Tangled Bank, and then Philosophia Naturalis. Now we have the prosaically named Carnival of Mathematics, a round up of all the mathy goodness in the blogosphere. Your 'umble blogger has an entry or two, but more importantly, there are lots of interesting posts from blogs I had never heard of before. Time to update those blogrolls !

Wednesday, February 07, 2007

Have all ideas already been thought of ?

Amidst a superb reflection by Jonathan Lethem on ideas, the marketplace, and copyright, consider this section, on ownership of ideas:
Undiscovered public knowledge emboldens us to question the extreme claims to originality made in press releases and publishers' notices: Is an intellectual or creative offering truly novel, or have we just forgotten a worthy precursor? Does solving certain scientific problems really require massive additional funding, or could a computerized search engine, creatively deployed, do the same job more quickly and cheaply? Lastly, does our appetite for creative vitality require the violence and exasperation of another avant-garde, with its wearisome killing-the-father imperatives, or might we be better off ratifying the ecstasy of influence—and deepening our willingness to understand the commonality and timelessness of the methods and motifs available to artists?
The "ecstasy of influence": what a beautiful way to describe the joy of illumination when we read a beautiful theorem, or a profoundly elegant idea, and use it to build something of our own.

Tuesday, February 06, 2007

SoCG results out

45/139 acceptances (Congratulations, David) for a 32.3% acceptance rate. Nope, no one's getting tenure with that number :)

The list will be up shortly. Since I was on the committee, I am loathe to make specific comments about papers that I liked. Suffice it to say that I am looking forward to going to Korea.

Update (7/702): And here it is.

Monday, February 05, 2007

The Geometry of Morality

From Indexed, via BB: What do the diagonals of the seven deadly sins represent ?

Saturday, February 03, 2007

Some good news on funding the NSF

As has been reported in many places, the House has passed a new spending bill that puts back much of the American Competitiveness Initiative funding that was proposed for FY07. This bill was passed in agreement with the Senate, so Senate passage is hopefully forthcoming next week, when it's taken up there.

There are some differences in the level of the increase. According to Peter Harsha at the CRA blog,
Under the agreement, NSF would receive a 6 percent increase, slightly below the 7.8 percent increase called for in the ACI, but $335 million more than FY 2006.
But according to the AAAS funding update,
The National Science Foundation (NSF) would receive the full requested increase of 7.7 percent or $334 million for its core Research & Related Activities (R&RA) account to $4.7 billion. This funding would allow most research directorates to reverse declining funding of recent years with increases of between 6 and 8 percent. Total NSF R&D would climb 7.0 percent to $4.5 billion within a total budget of $5.9 billion, reversing two years of cuts in 2005 and 2006
Of course, the President still has to sign the bill. But since the ACI was his idea, one can be hopeful.

Friday, February 02, 2007

Visiting Barbados

The SoCG PC meeting was held this year at the Bellairs Research Institute in Barbados, in conjunction with a joint McGill-INRIA workshop on limited visibility, organized by Hazel Everett, Sylvain Lazard, and Sue Whitesides. It was my first time in Barbados; people have been coming to such workshops every year for many years now, and there's a nice core group of people who know how the system works and where all the good eating and drinking joints are.

Bellairs is a fairly minimally equipped research facility, smack bang on the beach on the Caribbean (i.e nicer) side of Barbados - shared dorm rooms, common bath areas, common kitchen, and the like. The facilities are much more basic than say, Dagstuhl, but this is more than made up for by the beauty of the surroundings (Did I mention that it was smack bang on the beach?). In fact, if you walk a few steps from the gate of the Institute, you end up on the beachfront of a swanky nearby hotel, which is rumored to have rooms that cost $700/night, with a 14 night stay guaranteed.

The communal aspect of the facilities (everyone cooks, we all make breakfast together, people share in the costs for food, beer, rum, rum, rum, and more rum) makes the workshop a rather friendly place. I was only there a few days, alas, but the mere notion of a schedule where you work in the morning, take the afternoon off, and reconvene after dinner seems so natural that I was wondering whether we could actually do this at SoCG or other conferences.

The workshop starts off with people proposing all kinds of problems loosely organized around the theme of limited visibility, and then people go off into self-organized groups to work on whatever seems interesting. Every now and then, there's a public review session to discuss progress. This model works surprisingly well, and all kinds of nice puzzles pop out of the discussions.

It's a great environment to do relaxed research in; not having the internet for a few days made looking up results a little tricky, but it was manageable. My main regret in leaving early: I didn't get a chance to go to the (in)famous Rocky's Bar.

Wednesday, January 31, 2007

A pentagon problem

Prove or disprove:
There exists no pentagon in the plane all of whose lengths (sides and diagonals) are rational.
Passed on to me by a friend. And no, I don't know the answer.

One fact that is known: no regular pentagon in the plane can have integer coordinates.
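A sketch of one standard argument for that last fact (my reconstruction, so caveat lector): if a regular pentagon had vertices at integer coordinates, the shoelace formula would make its area rational. But the squared side length s^2 would be an integer (it's a squared distance between lattice points), and the area of a regular pentagon with side s is

A = \frac{s^2}{4} \sqrt{25 + 10\sqrt{5}},

which is an irrational multiple of s^2. Contradiction.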

Paper ? what's paper ?

True episode:

I was looking for a paper in the university online journal listing. It's a SIAM paper published prior to 1997, so LOCUS is where it resides. Unfortunately, our online journal listing didn't seem to have LOCUS, and so I clicked on one of the help buttons to talk to a librarian (on live chat). A few minutes later, she points out to me that this particular issue of the journal is physically available in our math library.

Till that point, it had not even occurred to me to check the physical stacks.

p.s I'm just back from the CG PC meeting and the McGill workshop on limited visibility in Barbados. I had to leave early, so discussions continue. Barbados (and later, Bellairs itself) was off the internet till I left on Monday, so I have a collection of posts that will dribble out over time.

Thursday, January 25, 2007

It could be one of my students...

A posting on comp.theory:
I just started a new class recently and am in need of a good class
project.
The course is a graduate course in computational geometry.
I think I will like this course but, overall, I don't find myself too
much interested in teh material
I figure a good project will help motivate me through the course and,
well, a project is required anyway! :)
My main interests are in number theory and cryptography.
Any suggestions for a nice little project that could be done in the
span of several weeks by a beginning graduate student? The idea could
either be theoretical or an implementation in code or perhaps a bit of
both.
I have to ask: how does this person know they will like the course if they are not interested in the material ?

Wednesday, January 24, 2007

Designing homeworks

I'm teaching CG this semester, and it's a lot of fun to go back and read some of the older texts and papers. Preparata and Shamos is actually quite a neat book, chock full of useful tricks and techniques; if only they had better ways of describing their algorithms ...

The real challenge is designing good assignment problems, and I've almost given up on this. With the number of courses available on the web, and the amount of information accessible via Google, I'm hard pressed to write down a question that can't be solved using Google and a few keywords. Even the trick of digging up interesting lemmas from related papers doesn't work if you mention the paper in class. Or maybe I'm underestimating my students' potential desire to solve the problem themselves.

Tuesday, January 23, 2007

And this is a problem how ?

On Harvard's search for a new president:
The growing financial importance of research also could pressure Harvard to tap a scientist, something it hasn't done since 1933.

But Harvard also could go the other way -- picking a nonscientist who could rise above turf battles and reassure the rest of the school that America's oldest and richest university isn't becoming a giant science lab.

I don't get it. Being a giant science lab is a BAD thing ?

Thursday, January 18, 2007

Football and dynamic programming

We don't need no stinkin' longest common subsequence ! We'll use FOOTBALL to explain dynamic programming:

The footballcommentary.com Dynamic Programming Model is intended to provide guidance for certain decisions that arise during a game, such as two-point conversions and going for it on fourth down. This article explains the main ideas of the Model in simplified form. [...]

The Model is built around the idea that in making decisions, we are trying to maximize our team's probability of winning the game, and the opponents are trying to minimize that probability. There are three types of situations, called states, in which the Model explicitly evaluates our probability of winning. The first type of state is when one team or the other has just gained possession. The second type is when a team has just scored a touchdown, but has not yet tried for the extra point (or points). The third type is when a team is about to kick off.

Jobs at McGill

'Tis the season.

McGill University is looking to hire in geometric computing and bioinformatics. I can think of at least three reasons for any new Ph.D to apply:

* It's Canada ! Everyone gets funded by the government ! Need I say more ?
* It's in Montreal: where else can you get the feeling you're in a strange dream where you're in Paris but everyone speaks English ?
* You can "do research" in Barbados whenever you like, and definitely in the winter.

and most importantly,
* they have a job opening for people who do geometry for a living. How civilized is that ?

Wednesday, January 17, 2007

The Hotel Pennsylvania

Us conference goers are an ornery lot. We remember the disastrous under-construction DC hotel for SODA 2001, the bizarrely smelling Miami hotel for FOCS 1997, and the strange San Francisco hotel with the low ceilings and no lounge space for SODA 1998. There's an urban legend that FOCS can never go back to Las Vegas because of some unspecified incidents that happened in 1993; the truth is probably far more boring.

Special opprobrium is reserved for the Hotel Pennsylvania in NYC, where FOCS 1999 was held. This was an old hotel; very, very old. It had the kind of rooms you'd describe as "charming" or "quaint" in publicity material. We all know what that means.

SODA 2009 is slated to be in NYC, assuming that votes are not mysteriously erased from Hal's Powerpoint slides. I am happy to announce that the Hotel Pennsylvania will in all likelihood NOT be one of the candidates for hosting the conference: it is being demolished to make way for a multi-story office complex (story via BB)

AT&T Labs is Hiring

After many years of dormant recruiting policies, and after finally getting rid of some deadwood, AT&T Labs is now hiring, in multiple areas. There's recruiting across the board, and almost all the mentioned areas have algorithmic angles to them. So now that your faculty job applications have all been sent, send your resume to AT&T.

And in case all the mergers and rebrandings confuse you, Stephen Colbert is here to help.

Saturday, January 13, 2007

Computing the size of a convex hull

We know that in the comparison model, computing a planar convex hull takes Ω(n log n) time (or Ω(n log h), if you're picky) in any reasonable algebraic decision tree model. The proof (at least for n log n) is a simple reduction from sorting. There's a more involved proof by Yao that shows that even generating the set of vertices of the convex hull takes at least n log n time. (i.e ordering the vertices isn't the only hard part). Yao's proof is in the quadratic test model; tests on objects are allowed to be signs of quadratic polynomials, and such tests characterize all known convex hull algorithms. [Addendum: Ben-Or generalizes these results to higher-degree algebraic tests as well]

My question is: is it known whether estimating the size of the convex hull is also lower bounded by n log n (or n log h) in this model ? It seems like this should be true: I don't see an obvious way of using a size-estimation oracle to actually compute the hull (or its vertices) in linear time, but I don't know if there's a proof of this. Yao's proof doesn't appear to yield any direct insight into this question.

It's worth pointing out, in the light of recent results by Chan (and by Pătraşcu on point location), that we can solve the 2D convex hull problem in sub-n log n time; we pick our favorite machine model and sort the x-coordinates in sub-n log n time, and then run the Graham scan in linear time. Chan's paper achieves sub-n log n bounds for the 3D convex hull and related problems.
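To make that last step concrete, here is a minimal Python sketch (mine, not from either paper) of the linear-time scan over points already sorted by x-coordinate; this is Andrew's monotone-chain variant of the Graham scan, and each point is pushed and popped at most once:

def hull_of_x_sorted(points):
    # points: list of (x, y) pairs, sorted by x (ties broken by y);
    # returns the convex hull vertices in counterclockwise order
    def cross(o, a, b):
        # > 0 if o -> a -> b makes a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(pts):
        chain = []
        for p in pts:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(points)
    upper = half_hull(list(reversed(points)))
    # drop the last point of each half to avoid duplicating the endpoints
    return lower[:-1] + upper[:-1]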

Tuesday, January 09, 2007

SODA 2007: Day 2: Da Bidness

I am sorry to say that the business meeting was so dull we couldn't even muster up the energy to argue about old controversies, let alone create new ones. Here are some of the highlights:

  • Attendance was 276 (96 of them students), comparable to SODA 2004 in New Orleans.
  • 796 distinct accepted authors, from 29+ countries.
  • Milan Ružić was awarded the best student paper prize for "Making Deterministic Signatures Quickly".
  • The domain gmail.com had a 44% acceptance rate, compared to yahoo.com's 14% acceptance rate. Yahoo's stock price fell 3% on hearing this news. Google's stock price fell 5% on worries that researchers were wasting their time writing papers.
  • Manufactured abstract from a random selection of words in accepted paper abstracts:
We present arbitrary coinciding factors that are hierarchical and use predecessors as well as important jobs. We show there exist revenues that are treasure sales and independent for identical parametric desires.
  • Hal Gabow makes some attempts to shake things up with suggestions about pandering to the discrete math community, prompting this from Ian Munro:
"The notion of a discrete math community is an interesting one and somewhat perverse"
  • He tries to resurrect short papers, prompting this, from an unnamed PC member (but we know who you are:))
"Please don't exhume this dead horse just to kick the rotting bones around. Again"
  • SODA 2008 is in San Francisco; Shang-Hua Teng is chair.
  • Major discussion on locations for SODA 2009. Three candidates emerge (NYC/Puerto Rico/Las Vegas). After an attempt at vote-rigging that would put Diebold to shame, the organizers do a straight vote and NYC wins !!
There will be no SODA Day 3 summary from me; I have to leave early to teach. The joys of professorhood !!

SODA 2007: Day 2

David has been posting detailed impressions of specific papers. Do check them out.

He mentions the constant-degree expander result of Alon, Schwartz and Shapira. I was at that talk this morning, and it's a very neat result. Expander graphs have a long and fascinating history (see the Linial-Wigderson course notes for details), and I won't try to summarize it here. Basically, the goal for a long time has been to construct reasonably sparse, constant degree expanders that also have "simple" proofs of expansion. Many of the constructions to date were either non-constructive (like the random expanders) or had very sophisticated proofs of expansion. Luca Trevisan had two very detailed posts that give an idea of how complicated some of these arguments can be.

The main contribution of the Alon/Schwartz/Shapira paper is the explicit construction of an expander that is both small (constant degree) and which has a "simple" proof of expansion. I should mention that the celebrated zig-zag product of Reingold-Vadhan-Wigderson already does this. However, their proof (and basically all proofs of expansion) relies on the spectral analysis of the graphs, using the relation between expansion and the gap between the first and second eigenvalues of the adjacency matrix of a graph.

This paper uses a graph product called a replacement product, and presents an elementary (i.e combinatorial) proof that the replacement product of two expanders is also an expander. With that in hand, they construct three graphs, and with two replacement products, get a constant-degree expander.

The invited talk today was by Monika Henzinger, of Google and EPFL. This talk was the "applied" talk (SODA usually has one such); Monika talked about algorithmic success stories at Google, discussing PageRank (and HITS), detecting document duplicates, and load balancing queries on servers. Each of these topics deserves a post of its own (the work on detecting duplicates has some particularly elegant ideas), so I don't want to go into detail here.

There's a point worth making here. The success of PageRank and other methods for search is really a success story for algorithmic modelling, rather than algorithms per se.

What do I mean by this ? I mean that the key success of PageRank, to take an example, was the idea that pages could be viewed as nodes, and edges as transitions in a Markov chain, and that relevance (or PageRank) could be modelled as a kind of "return probability". Of course, once you do this, all your theoretical machinery kicks in, and you can prove bounds on convergence, design fast algorithms, etc. But the main step was the modelling step, where you took the raw data (web pages) and viewed them in a certain abstract framework. For those of you who don't remember this, the existing paradigm of search at the time was text-based IR, and Altavista was the main exemplar of this. What Google was proposing was a very different way of viewing documents and the problem of relevance.
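To illustrate how little machinery remains once the modelling is done, here is a toy power-iteration sketch of the random-surfer chain (a simplification for exposition, with the usual damping factor; needless to say, not Google's actual implementation):

def pagerank(adj, damping=0.85, iters=100):
    # adj[i] is the list of pages that page i links to;
    # returns the stationary distribution of the random-surfer chain
    n = len(adj)
    p = [1.0 / n] * n                    # start from the uniform distribution
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n  # teleportation mass
        for i, links in enumerate(adj):
            if links:                    # push mass along outgoing links
                for j in links:
                    nxt[j] += damping * p[i] / len(links)
            else:                        # dangling page: spread uniformly
                for j in range(n):
                    nxt[j] += damping * p[i] / n
        p = nxt
    return p

The point being that these few lines of iteration are the easy part; deciding that this chain is the right abstraction of the web was the hard part.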

This is a common, and yet widely misunderstood, aspect of doing "applied" algorithms. You can define all kinds of problems, and write all kinds of papers proving results about these problems. But the mathematical tools developed for solving these problems will always take a backseat to the essentially scientific question of whether the problems and models fit the data correctly or not.

There are many domains where this is not true; cryptography is one domain where provable guarantees are not just nice, but are the crucial element of a secure system. But the success of algorithms in Web search come not from knowing to simulate a Markov chain efficiently, but from realizing that web documents are essentially nodes in a gigantic graph, and that the problem of ranking pages can be translated to a mathematical abstraction on graphs. As an aside, one of the things that the Kleinberg/Tardos textbook appears to do well is walk students through the process of problem abstraction, via the extended real-world exercises.

Understanding this aspect of data modelling changes the questions somewhat. The issue now is not, "Is this the most efficient algorithm for the problem", but rather, "Is this the right problem for the data" ? The first question will become relevant only when the second one is answered satisfactorily, more akin to a scientific discovery of truth than a mathematical one.

Outtakes:
  • (Thanks to Vijay Kumar) You could, at some point, buy a watch on Amazon.com for the heavily discounted (50% off) price of $499,999. The comments on the product page are hilarious.
  • What's the title of Britney Spears' only SODA paper ? "Stable Marriage is Hard"

Sunday, January 07, 2007

SODA 2007: Day 1

I didn't always know I wanted to do algorithms; in fact, I came to Stanford thinking I was more interested in AI. One of my most embarrassing moments in graduate school was when I went to talk to my advisor-to-be. He told me he worked in the design and analysis of algorithms, and I asked him if he did design or analysis.

(Note to graduate students everywhere; when someone tells you that no question is a stupid question, don't believe it)

But after attending Philippe Flajolet's talk today on "Analytic Combinatorics", and after hearing Luc Devroye's talk yesterday, I'm not so sure that my question was off the mark.

A bit of background. What we refer to as "the analysis of algorithms" is usually associated with Don Knuth and the Art of Computer Programming. It referred to the initial methods being developed to analyze structures like hash tables, search trees and the like. Most computer scientists have taken some kind of discrete math class, and have seen the Graham-Knuth-Patashnik "Concrete Mathematics" textbook, and much of the content (basic combinatorics, recurrence relations, generating functions, etc) was used for early algorithm analysis.

These methods were quite precise. It's not uncommon to look back at papers from the 70s and see detailed constants in front of running times for algorithms; Bob Sedgewick mentioned one such calculation in his introduction to Flajolet's talk. People didn't use O() notation like a sledgehammer, the way we do today.

Over time, we became more comfortable with O() notation; algorithms became more sophisticated, and it became harder to figure out actual constants. It didn't seem to matter as much. After all, when you were coming up with elaborately complicated algorithms that ran in exotic times like O(n^(11/7)), it hardly seemed to matter what the constant was. This was, and has continued to be, the Age of Design.

But all along, with people like Flajolet, Knuth, Sedgewick, and many many others, the work of really analyzing algorithms continued on. ANALCO is an offshoot of this long effort; a way to attract people working on some of the considerably hard problems in this area, while creating some cross-fertilization with the design crowd at SODA. Of course, by no means do I claim that algorithm designers don't analyze; of course we do. But it's fair to say that the sophisticated analysis methods and sophisticated design methods do appear to have diverged.

Which brings us to the talk today. The rough theme for his talk was an overview of how the combinatorial problem of counting structures (trees, steps in a linear probe, etc) can be transformed into a generating function, to which then the methods of real, and more recently, complex analysis can be applied. There's some pretty heavy mathematical machinery being thrown out here: we saw large deviation theory in yesterday's talk, for example, and there are things Flajolet talked about that I have only the barest inkling of.

Doing such analysis is hard; it's not as if we're suddenly going to abandon O() notation. But, as Piotr Indyk pointed out when we discussed this later, computers aren't getting any faster, and data is getting larger and larger, and it's more and more true that the actual constants in front of a running time matter, sometimes even more than the asymptotic bound. If more sophisticated analysis methods allow us to reveal algorithm running times more transparently, this also helps repair some of the "bad press" theoreticians can get with more applied folk.

So the analysis of algorithms takes on its original meaning again; there is a conference as well, now in its 13th year, and there's an upcoming book by Flajolet and Sedgewick that covers much of the mathematics Flajolet refers to in his talk. I looked at it briefly (it's 753 pages and counting!), and I hope that when it does come out, we learn more about how to use methods from analytic combinatorics to improve analysis techniques for even our run-of-the-mill algorithms.


Outtakes:

  • I've quite enjoyed the talks I've attended thus far. I haven't written much about them, but that's mostly due to laziness on my part. I've been quite torn having to navigate the multiple parallel sessions; human cloning, where art thou ?
  • [From a random research paper session] It's funny to see a speaker struggling with their desire to display EVERY SINGLE SLIDE that they made, when faced with a remorseless clock ticking down to zero. Word of advice: no one really cares if you go through all your slides, or even flip thru them frantically while muttering very fast. They do care if you go over time and drag things ON and ON and ON.
  • Contrary to the general confusion being spread around, the wireless DOES work and it IS free.
  • I don't like hotels with two towers; especially when I'm in one and the conference is in the other, and ESPECIALLY when the only connection between the two towers is at the lobby.

Saturday, January 06, 2007

SODA 2007: Day 0

Getting into New Orleans at 1 am because of "mechanical trouble" meant that I haven't been at my best so far. But I've already heard one amazing talk today.

Luc Devroye gave the ANALCO plenary lecture on "Weighted Heights of Random Trees", based on work with his students Erin McLeish and Nicolas Broutin. After having sat through many talks with titles like this, I generally approach them with great caution and with a clear escape route. But...

This was an amazing exposition of a topic that could have become dry and terse, and essentially incomprehensible, within a slide or two. He had jokes (that were funny), a global plan for the material, enough technical material that I went away feeling like I'd learnt something, and intuition galore. And the work itself is very beautiful.

So what was it all about ? The problem is really quite simple to state. Suppose I give you a (random) weighted binary tree, where nodes attach to parents randomly, and edges may have weights chosen randomly. What is the maximum height of such a tree ?

The standard application for such a tool is in analyzing binary search trees. The height of such a tree controls the running time of an algorithm that needs to use it. And there's now a vast literature analyzing both the asymptotics of the height distribution (basically it's sharply concentrated around 2 log n) and the specific constants (the maximum height of a random binary search tree is roughly 4.3 log n, and the minimum is around 0.37 log n).
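(If you want to watch those constants emerge, a quick simulation suffices. Here's a sketch; convergence is slow, so don't expect to see 4.3 at small n:)

import math, random

def random_bst_height(n):
    # insert 0..n-1 in random order into an unbalanced binary search tree,
    # stored as two child dictionaries, and return the height in edges
    keys = list(range(n))
    random.shuffle(keys)
    left, right = {}, {}
    root, height = keys[0], 0
    for k in keys[1:]:
        node, depth = root, 0
        while True:
            child = left if k < node else right
            if node in child:
                node, depth = child[node], depth + 1
            else:
                child[node] = k
                height = max(height, depth + 1)
                break
    return height

# e.g. random_bst_height(10**5) / math.log(10**5) creeps (slowly) upward
# toward the maximum-height constant mentioned above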

The "master goal" that Devroye described in his talk was this: Suppose I have a general way of attaching nodes to parents (that leads to a general distribution on subtree sizes), and a general way of attaching weights to edges (rather than being deterministically 1 for binary search trees). Such a general model captures the analysis of tries (trees on strings that are very important in text searching), geometric search structures like k-d trees, and even restricted preferential attachment models in social network analysis (Think of the edges as hyperlinks, and the height of the tree as the diameter of a web tree).

Is there a generic theorem that can be applied to all of these different situations, so that you can plug in a set of distributions that describes your process, and out pops a bound on the height of your tree ? It turns out that you can (with some technical conditions). The method uses two-dimensional large-deviation theory: can you estimate the probability of a sum of random variables being bounded by some function, while at the same time ensuring that some other sum of random variables (that might depend slightly on the first) is also bounded ?

An example of a 1D large deviation result is of course a Chernoff bound. Devroye showed that a 2D large deviation bound for the height of such trees can be expressed in a similar form using the so-called Cramér exponent, something that will probably not be surprising to experts in large deviation theory. After that, the analysis for any tree process becomes a whole lot easier. You have to analyze the corresponding Cramér function for your distributions, and a bound (with a constant; no big-O nonsense here!) pops out.
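For the record (standard Cramér/Chernoff material, nothing specific to the talk), the 1D statement looks like this: for i.i.d. random variables X_1, ..., X_n and a above the mean,

\Pr\Big[\sum_i X_i \ge na\Big] \le e^{-n I(a)}, \qquad I(a) = \sup_{\lambda > 0} \big(\lambda a - \log E[e^{\lambda X_1}]\big),

where I is the Cramér (rate) function; plugging in Bernoulli variables recovers the familiar Chernoff bound. The 2D version in the talk controls two such sums simultaneously.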

He also talked about a neat extension of this method to analyzing the "skinniness" of k-d tree decompositions, showing that for a kind of "relaxed" k-d tree construction, the skinniest cell can be extremely skinny (having a super-linear aspect ratio). It's the kind of result that I imagine would be very difficult to prove without such a useful general theorem.

Friday, January 05, 2007

Puzzles and Mysteries

"Whatever comes in sufficiently large quantities commands the general admiration." Trurl the Constructor, from Stanislaw Lem's Cyberiad.
I've been reading Malcolm Gladwell's masterful article on the Enron scandal, and he frames it with the device of 'puzzles' vs 'mysteries':

The national-security expert Gregory Treverton has famously made a distinction between puzzles and mysteries. Osama bin Laden’s whereabouts are a puzzle. We can’t find him because we don’t have enough information. The key to the puzzle will probably come from someone close to bin Laden, and until we can find that source bin Laden will remain at large.

The problem of what would happen in Iraq after the toppling of Saddam Hussein was, by contrast, a mystery. It wasn’t a question that had a simple, factual answer. Mysteries require judgments and the assessment of uncertainty, and the hard part is not that we have too little information but that we have too much. The C.I.A. had a position on what a post-invasion Iraq would look like, and so did the Pentagon and the State Department and Colin Powell and Dick Cheney and any number of political scientists and journalists and think-tank fellows. For that matter, so did every cabdriver in Baghdad. [....]

If things go wrong with a puzzle, identifying the culprit is easy: it’s the person who withheld information. Mysteries, though, are a lot murkier: sometimes the information we’ve been given is inadequate, and sometimes we aren’t very smart about making sense of what we’ve been given, and sometimes the question itself cannot be answered. Puzzles come to satisfying conclusions. Mysteries often don’t.
There is a fundamental problem that comes up when you start messing with "data". Our training in algorithms makes us instinctively define a "problem" when working with data, or any kind of applied domain. Many of the problems in clustering, like k-center, k-median, k-means, or what-have-you, are attempts to structure and organize a domain so we can apply precise mathematical tools.

In a sense, we treat these problems like puzzles to be solved. The game is then to find the best solution, the fastest, the most accurate; but the structure of the puzzle has been set. We can change the game (and we often do), but once again, the goal is to crack the puzzle.

But when you get down and dirty with data, you start seeing the problems that Gladwell describes. If your goal is to "understand" the data, then more is not necessarily better, and causes more confusion, and what you need is interpretative skills, rather than number-crunching or even problem solving skills.

This is what makes data mining so hard and exasperating, and yet so important. The need is clearly there, and there are mysteries to mine. But we've been attacking data mining problems as puzzles, and realizing fairly quickly that solving a puzzle doesn't reveal the mystery of the data.

I've often likened data mining research to an ooze; it's thin and spreads horizontally, without too much depth. But I think that's because the puzzles that we solve are of limited range, and not terribly deep. What we seem to need more are interpretative frames rather than algorithmic frames; frames that tell us about invariances in the data, rather than about quirks of representations.

Wednesday, January 03, 2007

I'm moving to academia

(WARNING: personal information ahead. If you prefer to think of the Geomblog as written by robotic monkeys pounding on millions of keyboards, read no further)

One of the reasons the Geomblog has been silent these past few weeks is that I've been busy moving, and falling sick, and unpacking, and unpacking, and unpacking, and...

Now, where was I ?

Oh yes, moving. After many years of cloistered comfort at AT&T, I've decided to take the plunge into the exciting and dangerous waters of academia, at the University of Utah (30, count 'em, 30 minutes from the best powder skiing imaginable).

Why the move ? Many people have asked me this, and the answer is actually simple: because I finally wanted to. AT&T has been a wonderful place for me to work (and they're hiring next year, so get those applications ready), but I realized that the kinds of things I wanted to do (teach, initiate my own research programs, guide students, and participate in the academic conversation in general) were better done at this point in a university setting.

It's not a decision I made easily. It is said that the real value of a workplace is in the colleagues you have, and from that point of view, leaving AT&T has been hard. Leaving for a real job after completing a Ph.D felt like a natural rite of passage, much as leaving India for grad school did. But leaving the labs was a purely elective decision, and as such makes the transition a little more jarring.

And now here I am, in Salt Lake City (technically, I'm in Cincinnati airport waiting for a much delayed flight to New Orleans, but I digress...), preparing for my geometry class, working on a proposal, and doing my research. On the one hand, I have the basic day to day business of research more or less under control, and work and collaborations go on seamlessly. On the other hand, I often feel like a fresh Ph.D at his first job, managing myriad things that seem new and foreign. It's a strange feeling.

But I'm genuinely excited to be teaching, and am looking forward to interacting with students; something that I sorely missed at AT&T, except for the occasional summer. It will be an exciting adventure.

Wednesday, December 13, 2006

Three thoughts on Anatoly Vershik's article...

Via Peter Woit and Luca Trevisan comes a pointer to an article by Anatoly Vershik in the new Notices of the AMS, lamenting the role of money prizes in mathematics. Three thoughts:
  • "the newspapers, especially in Russia, are presently “discussing” a completely different question: Is mathematical education, and mathematics itself, really necessary in contemporary society ". At the risk of sounding patronizing, I find it terribly worrisome that the place that spawns such amazing mathematicians, and has such a legendary training program for scientists, should even indulge in such a discussion. Especially now, with all the handwringing in the US about the lack of mathematical training at school level, it seems a particularly bad time to abdicate what is a clearly a competitive advantage.

  • He talks about not understanding "the American way of life" as regards how money is viewed. There's a juxtaposition of images that I've always been struck by, and that tennis lovers will recognize: At Wimbledon, the winner is crowned with a fanfare, royalty, and a trophy (or plate); the prize money is never really discussed. At the US Open on the other hand, along with the fanfare comes the huge check handed out by some corporate sponsor while the PA blares out the amount. The trophy presentation, although making for good photo-ops, seems almost anticlimactic.

    I am a little skeptical though whether offering prizes like the Clay prize convinces people that mathematics is a lucrative profession. After all, this hasn't happened for the Nobel prizes.

  • On the false-duality: I've heard a variation of this argument many times. It goes basically like this: "Either you're interested in subject X and don't need motivation, or you aren't, in which case no amount of motivation is going to help". This is possibly true for identifying students likely to make the transition to being professionals in subject X. In fact, I've heard an anecdote from the world of music, about a maestro who would tell all his students that they would fail professionally at being musicians. His argument was that only the ones who cared enough to prove him wrong had what it took to survive.

    One has to realize though that the teaching of a subject is not about creating Mini-Mes: only a small fraction of the students we come in contact with will become professional computer scientists/mathematicians/whatever. But a large fraction of these students will vote, many of them will go on to positions of influence either in industry or government, and they will all contribute to a general awareness of the discipline. So it's a mistake to give up on motivating students; even if they never end up proving theorems for a living, a better appreciation for those who do will help all of us.



Sunday, December 10, 2006

Sorting algorithm animations

Pat Morin has a great collection of Java sorting applets. You can line up three at a time and have them sort a random permutation of one through N. It's fun to see insertion sort or bubble sort earnestly plodding along long after merge sort or quick sort has blazed through.



Tuesday, December 05, 2006

Author ordering and game theory.

There are typically two major ways of ordering authors on a paper. In theoryCS, we (mostly) use the lexicographic ordering (by last name); in many other areas, author ordering by relative contribution is common. There are variants: advisors might list themselves last by default even within a lexicographic or contribution-based system, and middle authors may not always be ordered carefully, etc etc.

Since paper authorship conveys many important pieces of information, author ordering is an important problem. It's an even bigger problem if you have hundreds of authors on a paper, some of which may not even know each other (!). This is apparently becoming common in the HEP (high energy physics) literature, and an interesting article by Jeremy Birnholtz studies the problem of authorship and author ordering in this setting. The study is sociological; the author interviews many people at CERN, and derives conclusions and observations from their responses.

As one might imagine, not too many of the problems of 1000-author papers are translatable to our domain. After all, these papers are bigger than our conferences, and I doubt anyone has ever needed a "publication committee" when writing their paper. And yet, the interviews reveal the same kinds of concerns that we see all the time. Is a certain ordering scheme shortchanging certain authors ? Did a certain author do enough to merit authorship ? Who gets to go around giving talks on the work ?

Towards the end of the paper, the author makes an interesting (but unexplored) connection to game theory. The players in this game are the authors, and what they are trying to optimize is perceived individual contributions by the community (the "market"). Intuitively, lexicographic ordering conveys less information about author contributions and thus "spreads" contributions out: however, it's not symmetric, in the sense that if we see a paper with alphabetically ordered authors, it could be a product of a truly relative contribution ordering that yields this ordering, or a lexicographic ordering. In that sense, authors with names earlier in the alphabet are disadvantaged, something that seems counter-intuitive.

As it turns out, there's been some work on the equilibrium behaviour of this system. To cite one example, there's a paper by Engers, Gans, Grant and King (yes, it's alphabetically ordered) that studies the equilibrium behaviour of author ordering systems with a two-author paper in a market. Their setup is this:
  • The two players A and B decide to put in some individual effort.
  • The relative contribution of each (parametrized by the fraction of contribution assigned to A) is determined as a (fixed but hidden) stochastic function of the efforts.
  • The players "bargain" to determine ordering (lexicographic or contribution). The result is a probability of choosing one kind of ordering, after which a coin is tossed to determine the actual ordering
  • The work is "published", and the market assigns a value to the paper as a whole, and a fraction of this value to A, based on public information and other factors.
Now all of these parameters feed back into each other, and that's where the game comes from. What is a stable ordering strategy for this game ? It turns out that lexicographic ordering does yield equilibrium behaviour, and contribution-based ordering does not.

What's even more interesting is that if we look at merely maximizing research output (the external "quality" of the paper), then this is not maximized by lexicographic ordering, because of the overall disincentive to put in more effort if it's not recognized. However, this does not suggest that always using contribution-based ordering is better; the authors have an example where this is not true, and one intuition could be that if there's a discrepancy between the market perception of contribution and individual contributions, then there is a disincentive to deviate too much from the "average" contribution level.

It's all quite interesting. Someone made a comment to me recently (you know who you are :)) about how assigning papers to reviewers made them value research into market-clearing algorithms. I like the idea of applying game theory to the mechanisms of our own research.

(HT: Chris Leonard)

Previous posts on author ordering here, and here.



Friday, November 24, 2006

European Workshop on Computational Geometry.

Like CCCG and the Fall Workshop on Computational Geometry, the EWCG is a workshop where you can present results that will appear in expanded form in a conference or journal.

Submission deadline: Jan 8, 2007.
Workshop dates: Mar 19-21, 2007, Graz, Austria.





Monday, November 20, 2006

On your marks, get set....

Gentlefolk, STAARRT YOUR LATEX !!!!!!!!!
Program Title:

Theoretical Foundations (TF07) Program Solicitation
Date: February 19, 2007
  • Approximately 15 small awards at $60,000/year or less will be made. For example, projects by new faculty may require NSF support for only one student or for summer salary. Most small awards will go to (or preference will be given to) PI's who have not previously been a PI or coPI on an NSF award.
  • Up to 55 awards will be made with an average grant size of $125,000/year for durations up to three years.
  • Up to 5 awards of up to $500,000/year for well-integrated projects of larger scope are anticipated.
The STOC deadline was over ONE WHOLE HOUR ago ! Get cracking !

p.s Thanks, Chandra.

Thursday, November 16, 2006

On writing versus blogging

I've had this blog for going on two and a half years now, and it's interesting to see how writing posts has influenced my "other" writing.

There is the obvious difference in interface. When I add citations to a paper, I almost wish I had a tool that could highlight text and add a clickable link to a reference (the emacs extension RefTeX is great, though: it makes adding citations, references and labels blindingly easy).

I structure sentences differently as well. Links in blogs are added en passant, without interrupting the text. Citations in papers have to be worked into the sentence structure carefully (that is, if you believe the maxim that a citation is not a noun). This causes no end of confusion when I write sentences in a paper; I often have to rephrase the sentence to conform to "normal" citation format. I will add though that the parenthetical style of mentioning citations makes sense with written documents, but with online hyperlinked PDF documents I would actually prefer blog-style linking. But then again, we still have paper proceedings, so there's a long way to go for that...

It's natural that a blog post is more chatty and personal, and a paper is more formal. But writing a blog has encouraged me to be less stuffy and more breezy in parts of a paper that merit it (introductions, discussion, etc). This can only be a good thing; as the famous war cry goes, 'Eschew obfuscation' !

Writing a blog also shakes out some of the ghastlier linguistic tics that infect my writing. It's actually shocking how many there are, and how easily they evade detection.

I wouldn't recommend writing a blog solely as a way of improving (or expanding) your writing skills. But it does have benefits beyond being a soapbox for one's bloviations.



Tuesday, November 14, 2006

Chernoff bounds, error correcting codes, and automata.

Andy Drucker, a first-year grad student at UCSD, has an interesting Math/CS blog. Some of the highlights are a post on showing constructive Chernoff bounds via error-correcting codes, and a love-poem to automata :).



Monday, November 13, 2006

Reversal in the decline of foreign students in the US

This can only be good news:
The number of new foreign students coming to the United States grew this school year, after several years of weakness that followed the terrorist attacks of 2001, according to a survey to be released today by the Institute of International Education. [..]

According to the survey, conducted by the institute and other education groups, the number of new international students at American colleges and universities increased 8 percent this fall over last, to 142,923.

Another sign of a turnaround was a sharp upturn in student visas, said Allan E. Goodman, president of the institute. Dr. Goodman said the State Department issued a record 591,050 student and exchange visas in the 12 months ending in September, a 14 percent increase over the previous year and 6 percent more than in the year leading up to the 2001 attacks.

It's not that admitting more foreign students is a good thing in and of itself; it's more that this is a useful indicator of how competitive the US is in the marketplace of "idea generation"; for decades, the US had a monopoly on the "idea factories", and in recent years, there's been growing competition from the rest of the world (especially from China), capitalizing on the panic and overreaction following 9/11.

Update (11/17): On the other hand, there's this:
The latest IIE Open Doors report finds that the number of international students enrolled in computer and information science programs in the U.S. declined in academic year 2005/2006, as it has each year since 2002/2003. This occurred even as the number of new foreign students in all programs increased between the Fall of 2004 and 2005 and as total enrollment of foreign students stabilized.
It may not be surprising that foreign student enrollment in CIS has dropped. After all, there's a national trend of dropping enrollment in computer science. The question is whether this drop is more than the overall national trends, and how different the undergraduate/graduate enrollment statistics are.



Tuesday, November 07, 2006

Voting

I am a politics junkie, which given my non-citizen status is about as useful a pastime as learning Klingon ("Heghlu'meH QaQ jajvam !!!!"). However, my life is deeply affected by laws passed here, and ensuring that not-completely brain-dead zombies come to power is a GOOD THING ("Hab SoSlI' Quch!"). So thanks to all of you that voted, are voting, or will be voting, and no thanks to Lance's prediction page for wasting so many hours of my life :).

Warning labels, continued...

This could become a nice series. Luca has a "creationist"-style warning for NP-hardness in the comments, and here's another:

WARNING: The next several slides (pages) contain equations. Your soul may be crushed.




Monday, November 06, 2006

Warning label for an NP-hardness proof

A note about the proofs: Transformation arguments can be intricate and highly formal. Furthermore, since the polynomially-equivalent problem might not bear any obvious intuitive relation to the voting problem, the proof might establish computational difficulty without seeming to explain its source. In this sense, the conclusion is more important than the argument.
From "How hard is it to control an election?", by Bartholdi, Tovey, and Trick.




Friday, November 03, 2006

Tamper-proof voting systems

Digg is a collaborative news site, much like Slashdot, where people post articles that seem interesting and topical. Through a voting scheme, articles get "dugg", and rise to the top of the site. In fact, getting "dugg" is now an inadvertent denial-of-service attack much like being "slashdotted" used to be.

The problem is that people attempt to game the system to bump up traffic to their sites, by forming voting coalitions, making fake accounts, etc etc. Needless to say, there can often be real money (in the form of advertising) behind this, so there's a lot of incentive to cheat the system.

This is not a new problem; Google and other search engines have battled search engine optimizers for a long time now. There are companies that claim to improve your location in a Google search result (for a small fee of course). An interesting side effect of all this gamesmanship is that search engines like Google and aggregators like Digg have to keep many details of their ranking process secret. Obviously, some of this is for IP protection, but protecting against vote riggers is also a major concern. A tertiary consequence of this lack of transparency is that such sites are now vulnerable to lawsuits by disgruntled sites complaining about bias in search results; Google has had to fend off lawsuits based on some claim of bias leading to monetary damage.

The latest such problem has hit Digg, where in an attempt to block out users trying to game votes on articles they want to push, the management has managed to freeze out and frustrate some of the most prolific users. A user-driven site like Digg that has many competitors can't really afford to be annoying its most valuable contributors.

So (finally), here's the technical question. Although Arrow's theorem tells us that in general, voting schemes can always be defeated, I don't know if the result is constructive. In other words, even if there is a voting strategy that can break one of the criteria for a reasonable voting scheme, it may not be easy to find such a strategy.

So, in the spirit of RSA, is there a way of designing a voting scheme that can be published (thus addressing issues of transparency), but is computationally intractable to game ? Any cryptographers know if this has been studied ?



Thursday, November 02, 2006

CS beyond algorithms: Would that it were so....

From an NYT article on a new research program in "Web science":

Ben Shneiderman, a professor at the University of Maryland, said Web science was a promising idea. “Computer science is at a turning point, and it has to go beyond algorithms and understand the social dynamics of issues like trust, responsibility, empathy and privacy in this vast networked space,”
In fact I'd be happy if this were so. Alas, I spend more time encountering computer science that hasn't even discovered algorithms yet !



Wednesday, November 01, 2006

The Snowblowing (or leaf raking) problem

Lifehacker asks:
Reader David wants to know the best way to rake leaves. My first thought: "Get someone else to do it." But seriously, assuming you have a square or rectangular yard, what's the most efficient raking pattern? Should you make multiple smallish piles or aim for one big one (the latter being best for running jumps, of course)? Surely there's a mathematical or engineering principle that can be applied here.
As it turns out, this year's WAFR (Workshop on the Algorithmic Foundations of Robotics) has a paper on essentially this problem (with leaves replaced by snow), by Arkin, Bender, Mitchell and Polishchuk:
We introduce the snowblower problem (SBP), a new optimization problem that is closely related to milling problems and to some material-handling problems. The objective in the SBP is to compute a short tour for the snowblower to follow to remove all the snow from a domain (driveway, sidewalk, etc.). When a snowblower passes over each region along the tour, it displaces snow into a nearby region. The constraint is that if the snow is piled too high, then the snowblower cannot clear the pile.
This is not to be confused with the Snowplow problem, a puzzle in elementary differential equations:
One day it started snowing at a heavy and steady rate. A snowplow started out at noon, going 2 miles the first hour and 1 mile the second hour. What time did it start snowing?
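For the record, here's how that puzzle unravels (a sketch under the usual convention that the plow's speed is inversely proportional to the depth of the snow; measure t in hours after noon and say the snow started t_0 hours before noon):

    \frac{dx}{dt} = \frac{c}{t + t_0}, \qquad
    c \ln\frac{1 + t_0}{t_0} = 2, \qquad
    c \ln\frac{2 + t_0}{1 + t_0} = 1 .

Dividing the two distance equations gives (1 + t_0)^3 = t_0 (2 + t_0)^2, which simplifies to t_0^2 + t_0 - 1 = 0, so t_0 = (\sqrt{5} - 1)/2 \approx 0.618 hours: the snow started about 37 minutes before noon.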
p.s Actually, it's true. The slides are more interesting...



Saturday, October 21, 2006

There, but for the grace of god, ...

This week's issue of the NYT Magazine has a long story on a case of scientific fraud from the University of Vermont, where a prominent researcher was sentenced to a year and a day in jail (as well as being banned from ever getting public funding) for falsifying results in dietary studies.

It seems unnecessary (and too easy) to blame the researcher involved; the story is damning enough. What struck me, though, reading through the description of how events transpired, was how banal, how mundane the fraud was, and how utterly common the driving forces were: the usual toxic mix of the desire for fame, the pressure to get money, and the way universities encourage people to bring in grants.
Steven Heymsfield, an obesity researcher at Merck Pharmaceuticals in New Jersey, [...] added that Poehlman’s success owed more to his business sense and charisma than to his aptitude as a scientist.

“In effect, he was a successful entrepreneur and not a brilliant thinker with revolutionary ideas,” Heymsfield wrote me via e-mail. “But deans love people who bring in money and recognition to universities, so there is Eric.”





Friday, October 13, 2006

Computer scientists sit in a cube and program all day...

Science+Professor+Woman=Me has an interesting experience talking to a bunch of freshmen about science (emphasis mine):
...today I gave a guest lecture to a group of freshmen who all said they were not interested in science. It turns out that they had no idea what science really involves. I listed a bunch of scientific questions and asked if these were things they wanted to know. Yes! They did. So we talked about these for a while, and then they thought of more questions, and it was very fun. We also talked about how research is done - how you come up with the questions, how you go about answering, discovering, testing. The students said they hadn't known that these were the kinds of things that scientists did. They imagined that we just worked in our labs making chemicals or looking at data on computer monitors all day. I doubt if any of them were inspired to become scientists, but I felt pretty good about changing their perceptions of science and scientists.
I wonder what would happen if we did this for CS.



Monday, October 09, 2006

Deadlines, manic behaviour and happiness

Much as I rant about the idiosyncratic "conference as primary publication venue" nature of computer science, there's no denying the pleasures of working down to the wire to meet a deadline. The adrenaline rush, the highs (and the post-deadline lows), and the feeling that my mind is working at 200 mph... aaah. It's like a drug habit I can't shake (which is why, in spite of vowing after each deadline never to do it again, I ... do it again. Classic addict behaviour).

It turns out that all I'm really doing is maximizing happiness. Who'da thunk it ?
When people are made to think quickly, they report feeling happier as a result. They also say they are more energetic, more creative, more powerful, and more self-assured. In short, they reported a whole set of experiences associated with being "manic."
And if your paper, written with the sweat of your fevered brow, fueled by zillions of cups of coffee, delivered by divine inspiration from the Book to your mind, gets rejected ? Just think quickly:
...the effect of thought speed was just as powerful as the effect of the content of the thoughts. In other words, the speed of people's cognitive processing was just as important as what they processed in determining their mood. Even thinking sad thoughts at a fast pace made people relatively happy.


Saturday, October 07, 2006

"We're making it less random to make it feel more random."

Maybe we should use the iPod to explain all concepts in theoretical computer science:
Steven Levy really liked Steely Dan, but so too, it seemed, did his iPod. Like a lot of people, he began to wonder about its shuffle - was the random function really random or a result of dirty tricks, blunders... or even telepathy?
Read more about it at the Guardian (HT: The Mathematics Weblog)





Tuesday, October 03, 2006

On Models For Research

In a review of Lee Smolin's new book, 'The Trouble With Physics', Sean Carroll writes:
Faculty positions and grant money are scarce commodities, and universities and funding agencies are naturally risk-averse. Under the current system, a typical researcher might spend five years in graduate school, three to six as a postdoc, and another six or seven as an assistant professor before getting tenure – with an expectation that they will write several competent papers in every one of those years. Nobody should be surprised that, apart from a few singular geniuses, the people who survive this gauntlet are more likely to be those who show technical competence within a dominant paradigm, rather than those who will take risks and pursue their idiosyncratic visions.
It's worth pointing out here that there are many different models for being a successful researcher. And when I say successful, all I mean is that you contribute interesting results to the community and your work is appreciated. Indeed, finding out what model works for you is an important part of developing your identity as a researcher.

We develop our sense of what the 'ideal' researcher looks like from people around us: the advisor, the mentor, the researcher whose papers we pore over. Invariably, some will influence us more than others, and we'll start adopting, almost unconsciously, some of their style and taste for problems and lines of attack. All of this is good, and natural. But it's important to remember that just as you form your own identity as a person by drawing on influences and modifying them, you must do the same as a researcher.

I mention this because I don't know how deeply we think about models of research, and about what style of work makes us happier (problem solver ? theory builder ? voluminous production ? multiple collaborations ? sitting in a room and contemplating ?). Once you find your "comfort zone", you'll be a lot more content with your work, and in all likelihood you'll produce quality work.



Flying while brown, part II

I used to call it 'Travelling while Asian', but let's face it: it's the brown skin that counts.

Here's the latest, from Boing Boing:

A 32-year-old man speaking Tamil and some English about a sporting rivalry was questioned at Sea-Tac Airport and missed his flight Saturday because at least one person thought he was suspicious.

[....]

The man was speaking Tamil, a language largely used in India, Sri Lanka and Singapore, on his cell phone at the departure gate and on the aircraft. An off-duty airline employee heard the conversation and informed the flight crew.

All those years I spent fending off attempts to teach me Tamil by my parents, grandparents, aunts, uncles, cousins, great-aunts and great-uncles and second cousins twice removed are now finally worth it ! I am safe !!

I wonder what will happen if I speak Hindi....


Thursday, September 28, 2006

Four legs good, two legs bad...

Not that it will do me much good when the Komitat comes to round up all foreigners, but:
Thank you for joining the American Civil Liberties Union. Your gift will help us continue the fight for our basic civil liberties. Below is your donation information.


Ready to do more?

Invite your friends to join in standing up for freedom with the ACLU

Contact Information

Name: Suresh Venkat

Address: xxxx -xxxxx

Email Address: sureshv@gmail.com





Wednesday, September 27, 2006

"Who are you ? How did you get in my house ?"

xkcd (funny geeky webcomic) cites Don Knuth on array indices.



A new voting scheme

Ron Rivest has just put out a voting scheme that is remarkable for its simplicity. It addresses the following questions:
  • How do you tell if your vote was counted
  • How might you get proof that you voted
  • How can this be concealed from (a) someone trying to coerce you (b) someone trying to find out your vote or tamper with it (c) from you !
I'm not an expert on voting schemes, so I'll leave the detailed analysis to the experts. I will note that the scheme is NOT cryptographic, and is surprisingly elegant in its basic method. One major drawback: it requires each voter to vote on 3 ballots in a specific way; in the era of hanging chads and butterfly ballots, that requirement alone could cause problems.

Here's the idea: suppose you have to vote for one of two candidates, Alice and Bob. You vote three times, once on each of three ballots. If you prefer Alice, you vote for her on two of the three ballots, and vote for Bob on the remaining ballot. As a receipt, you take back a copy of one of the three ballots you used. The three ballots are then separated and cast individually into the counting box.

The neat idea here is that each candidate gets n + C votes, where n is the total number of voters and C is the number of people who actually voted for that candidate. So it's still easy to determine winners, and because the single receipt ballot could have come from any combination of votes, the true intention of the voter cannot be determined from their receipt. There is some heuristic reasoning about the possibility of tampering and where the main weaknesses lie.
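Here's a toy two-candidate simulation of that counting argument (my own sketch in Python, not Rivest's actual ballot format; it just checks that subtracting n from each raw tally recovers the true counts):

    import random

    def cast(preferred, other):
        # one voter's three ballots: the preferred candidate is marked on exactly two
        # of them, the other candidate on exactly one (possibly a ballot already marked)
        ballots = [{preferred: 0, other: 0} for _ in range(3)]
        for i in random.sample(range(3), 2):
            ballots[i][preferred] = 1
        ballots[random.randrange(3)][other] = 1
        return ballots

    n, true_alice = 1000, 612
    pool = []
    for v in range(n):
        pref, other = ("Alice", "Bob") if v < true_alice else ("Bob", "Alice")
        pool.extend(cast(pref, other))           # ballots are pooled, not kept together

    tally = {c: sum(b[c] for b in pool) for c in ("Alice", "Bob")}
    print(tally["Alice"] - n, tally["Bob"] - n)  # prints 612 388

Each voter hands one candidate two marks and the other candidate one mark, so a candidate's raw tally is n plus the number of voters who actually preferred them.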

An important side effect of the fact that you can't determine a vote from a receipt is that vote selling is not possible: how will you prove to the vote buyer that you voted as they asked ?

(From comp.risks via Adam Buchsbaum)



Monday, September 25, 2006

Mathematics as blood sport

I for one am glad that we have left the challenge era of mathematics far behind us. From Wikipedia's entry on the cubic equation:

In 1530, Niccolo Tartaglia (1500-1557) received two problems in cubic equations from Zuanne da Coi and announced that he could solve them. He was soon challenged by Fiore, which led to a famous contest between the two. Each contestant had to put up a certain amount of money and to propose a number of problems for his rival to solve. Whoever solved more problems within 30 days would get all the money.

Tartaglia received questions in the form x³ + mx = n, for which he had worked out a general method. Fiore received questions in the form x³ + mx² = n, which proved to be too difficult for him to solve, and Tartaglia won the contest.

Later, Tartaglia was persuaded by Gerolamo Cardano (1501-1576) to reveal his secret for solving cubic equations. Tartaglia did so only on the condition that Cardano would never reveal it. A few years later, Cardano learned about Ferro's prior work and broke the promise by publishing Tartaglia's method in his book Ars Magna (1545) with credit given to Tartaglia. This led to another competition between Tartaglia and Cardano, for which the latter did not show up but was represented by his student Lodovico Ferrari (1522-1565). Ferrari did better than Tartaglia in the competition, and Tartaglia lost both his prestige and income.

You have to wonder: when you lay down a mathematical challenge, do you throw a quill at the feet of your rival ?



Thursday, September 21, 2006

Turingistan

Bernard Chazelle has updated what has been called 'The IPod essay'. An excerpt:
Let's try a thought experiment, shall we? You're the unreconstructed Algorithm skeptic. Fresh from splitting your playlist, Alice, naturally, is the advocate. One day, she comes to you with a twinkle in her eye and a question on her mind: “What are the benefits of the central law of mechanics?” After a quick trip to Wikipedia to reactivate your high school physics neurons and dust off the cobwebs around them, you reply that F=ma does a decent job of modeling the motion of an apple as it is about to crash on Newton's head: “What's not to like about that?” “Oh, nothing,” retorts Alice, “except that algorithms can be faithful modelers, too; they're great for conducting simulations and making predictions.” Pouncing for the kill, she adds: “By the way, to be of any use, your vaunted formulas will first need to be converted into algorithms.” Touché.
Much fun. Go read.



Wednesday, September 20, 2006

The magic of √2

Bill Gasarch has an entertaining dialogue on √2 in the latest issue of SIGACT News (which also has an interesting variant of art-gallery problems in the Geometry column). It's a remarkable coincidence, because I was just about to post an entry about a 7th grade classroom problem that revolves around properties of √2. Unlike T, T, F, S, E, this problem actually does have some really nice math behind it.

The problem is as follows:
You live on a street where the houses are numbered 1, 2, 3, etc. You notice that the sum of house numbers prior to yours equals the sum of house numbers after yours. What is your house number, and how many houses are there on the street ? Your answer should be in the 30s.
Some quick algebra on n, the number of houses, and k, your house address, reveals that the numbers satisfying this condition yield a square triangular number. Namely, n and k must satisfy the equation

n(n+1)/2 = k²

It turns out that numbers satisfying this equation can be derived from solutions to Pell's equation:

x² - 2y² = 1

where x, y are integers. Pell's equation actually gives good rational approximations to √2, in the form x/y, where x, y are solutions. What's more, if x/y is a convergent in the continued fraction of √2, then x²y² is a square triangular number (i.e. it can be written both as n(n+1)/2 and as k²).

There's also a general form of the solution, which I first found (courtesy of David Applegate) in the Hardy and Wright book on number theory. The "trick" here is to realize that the appropriate way to solve this is over the field of numbers of the form a + b√2.
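For the curious, here's a little script (my own sketch; the helper names are mine) that walks the convergents p/q of √2 via (p, q) → (p + 2q, p + q) and reads off the corresponding house numbers; the third line it prints is the "answer in the 30s":

    from math import isqrt

    p, q = 1, 1
    for _ in range(6):
        assert p * p - 2 * q * q in (1, -1)    # each convergent solves a Pell-type equation
        k = p * q                              # k² is a square triangular number
        n = (isqrt(1 + 8 * k * k) - 1) // 2    # invert n(n+1)/2 = k²
        assert n * (n + 1) // 2 == k * k
        print(f"house {k} on a street of {n} houses")
        p, q = p + 2 * q, p + q

The house numbers go 1, 6, 35, 204, ... on streets of 1, 8, 49, 288, ... houses, so the intended answer is house 35 on a street of 49 houses.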

It's rather satisfying that a simple extra credit problem for a 7th grade math class can yield such nuggets.



STOC 2007 Deadline

STOC 2007 approaches. As before, you can add the deadline to a Google calendar by clicking here.

Sunday, September 17, 2006

Looming submission deadlines.

Apropos of my post on implementing geometric algorithms, it seems like a good time to mention the upcoming ALENEX deadline.
The aim of the ALENEX workshop is to provide a forum for presentation of original research in the implementation and experimental evaluation of algorithms and data structures.
If you use Google Calendar, click to add the deadline to your calendar.

Another deadline that's coming up even sooner is for the Fall Workshop on Computational Geometry, the annual math-conference-style event for geometers. This year it's at Smith College on Nov 10-11, and is being run by Ileana Streinu. One of the special events this year is a 3D printing demo by Joe O'Rourke. The deadline is Sep 22, and it should already be in your calendar by now, but if not, click here.

p.s if you want to create buttons like above for your event, here's the link.

Thursday, September 14, 2006

Implementing geometric algorithms.

Over at bit-player, Brian Hayes has a tale of woe about implementing a geometric algorithm. Not even a complicated one at that, merely an algorithm to check if two line segments intersect. His story is the usual one: general position is a fiction thought up by sadistic geometers (but of course !), and even the simplest of ideas needs many special cases to take care of degenerate data.

Apart from being surprised that he didn't have problems with precision (how DO you tell if both endpoints of a line segment are identical ?), I sympathize with his plight. As anyone who's ever implemented a geometric algorithm will tell you, it's much harder than it seems, and to an extent, the idealized algorithms geometers design can be hard to implement. CGAL and LEDA have made life a lot easier, but robustness and exact computation are still important (if highly uncool) areas of computational geometry.
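For the record, here's roughly what the textbook orientation-based test looks like once the degenerate cases are folded in (a sketch of the standard approach, not Hayes's code; with integer coordinates it's exact, and with floating point the zero tests below are exactly where the trouble begins):

    def orient(a, b, c):
        # sign of the cross product (b - a) x (c - a): +1 left turn, -1 right turn, 0 collinear
        d = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        return (d > 0) - (d < 0)

    def on_segment(a, b, c):
        # assuming a, b, c are collinear: does c lie within the bounding box of segment ab?
        return min(a[0], b[0]) <= c[0] <= max(a[0], b[0]) and \
               min(a[1], b[1]) <= c[1] <= max(a[1], b[1])

    def segments_intersect(p1, p2, p3, p4):
        d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
        d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
        if d1 != d2 and d3 != d4:                      # the segments straddle each other
            return True
        return (d1 == 0 and on_segment(p3, p4, p1)) or \
               (d2 == 0 and on_segment(p3, p4, p2)) or \
               (d3 == 0 and on_segment(p1, p2, p3)) or \
               (d4 == 0 and on_segment(p1, p2, p4))    # endpoint-on-segment cases

Four degenerate clauses for one of the simplest predicates in the book, which is precisely the point of his post.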


Wednesday, September 13, 2006

Matrix wizardry

While we're on the topic of books,
vec(AXB) = (Bᵀ ⊗ A) vec(X)
This identity is your best friend if you're doing any kind of matrix calculus. To know more about it, and all you might ever want to know about Kronecker products, the vec() operator, and matrix calculus, check out The Matrix Cookbook, a 50+ page reference guide for all things matrix and calculus.



p.s the reason the identity is so handy is that it allows you to get the variable matrix X out from in between A and B.
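A two-minute numerical sanity check (my own snippet, assuming numpy; note the column-major vec, which is the convention used for this identity):

    import numpy as np

    A, X, B = np.random.randn(3, 4), np.random.randn(4, 5), np.random.randn(5, 2)
    vec = lambda M: M.reshape(-1, order='F')   # stack the columns of M

    lhs = vec(A @ X @ B)
    rhs = np.kron(B.T, A) @ vec(X)
    print(np.allclose(lhs, rhs))               # True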


New textbooks in theoryCS...

There appears to be an outpouring of new theoryCS textbooks. I had mentioned the Mitzenmacher/Upfal book on probability and randomized algorithms some time ago. Recently, Jon Kleinberg and Éva Tardos published Algorithm Design, the book with THE MOST elaborate real-world exercises I have seen. Sanjoy Dasgupta, Christos Papadimitriou and Umesh Vazirani have another algorithms text coming out as well.

The not-so-latest addition to this list is a new book on complexity theory by Sanjeev Arora and Boaz Barak. For a long time now, Papadimitriou's book has been the definitive text in this area, but it has begun showing its age, especially with all the new developments in complexity theory. The Arora/Barak book is not yet published, but has been placed online for comments.

The magic of Papadimitriou's book was in making complexity classes spring to life as the denizens of the Zoo that they really are. Complexity theory can be a dry topic if not imbued with a sense of direction and purpose, helping us understand the WHY of how these classes came to be. If my cursory scanning of the Arora/Barak book indicates anything, it is that this new book is a worthy successor.

The two books cover similar material in the foundational sections; where I think the new book takes off is in its coverage of the more "modern" aspects of complexity theory: the profound results on pseudorandomness, hardness and cryptography, interactive proofs and approximations, natural proofs, communication complexity and lower bound methods, and even quantum computing.



Thursday, September 07, 2006

The Indian grad student experience, and more...

I occasionally read the Chronicle of Higher Education: I guess it's de rigueur for my academic colleagues. It's often too stuffy for me, but this issue has two articles that are real gems (and had me nodding my head in agreement paragraph after paragraph).

Unless you've hung around Indian grad students, you have no idea of the kind of organizational skills that we can muster up when it comes to helping fellow Indian students acclimatize. I recall being picked up at SFO by a "senior" grad student and driven to Stanford, and supplied with all kinds of useful information when I got there (including copies of all the variations of the driving tests the California DMV uses!). Apparently, things have become much more sophisticated: check out this description of the Indian student organization at NC State (and yes, I do own a copy of The Inscrutable Americans; the letter Gopal writes to his father at the beginning of the book is priceless).

The other article could easily have been written about my parents, and their general bewilderment at the kind of work I do (I remember the first time I went to work in shorts while my father was visiting, and how scandalized he was). It's hilarious: do check it out.



Wednesday, September 06, 2006

Data hosting on the web

I recently lost my cellphone, and all my contacts with it. I got a new phone (off eBay, no less) and Cingular sent me a SIM card so I could preserve my number. But my contacts were all gone. Now I hear about a new service called ZYB.com that will maintain a web backup of your contacts. The idea is that they collect your contact info (via SMS) and store it, and you can retrieve it (again via SMS) on any phone that allows for text messaging.

It's a cool idea: although there are doohickeys you can get to sync your Palm/Outlook/PC address book with a phone, they are usually vendor-specific. This is more general, and is aligned with the whole "store your data on the web" philosophy that's all the rage nowadays (mail, calendars, online spreadsheets, ...).

Privacy of course is the issue, especially with contact info, a gold mine of real telephone numbers and often email addresses. ZYB of course makes all the right noises about privacy, but ultimately you have to trust them.

Why must that be ? It can't be that hard to store the data encrypted, and decrypt it on the fly only when the user provides a passphrase. The problem with SMS specifically is that the user might have to type the passphrase out in the open, but maybe there's an IPsec equivalent for SMS out there ? And even if there isn't, wouldn't server-side encryption under a user-supplied key obviate the need to trust a private entity ?

Given how ubiquitous crypto has become, I think I'd need to be convinced why obvious schemes like this can't be used before handing over data to an entity that I have to "trust".
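For concreteness, here's roughly what the obvious scheme looks like (a sketch only, using Python's third-party cryptography package; I have no idea what ZYB actually does, and a real design would also have to worry about lost passphrases, key rotation, and the SMS transport itself):

    import base64, os
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

    def key_from_passphrase(passphrase: bytes, salt: bytes) -> bytes:
        # derive a symmetric key from the user's passphrase; the salt is stored
        # alongside the ciphertext and need not be secret
        kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=200_000)
        return base64.urlsafe_b64encode(kdf.derive(passphrase))

    salt = os.urandom(16)
    f = Fernet(key_from_passphrase(b"correct horse battery staple", salt))
    blob = f.encrypt(b"Mom: +1 555 0100; Adam: +1 555 0199")   # all the service ever stores
    print(f.decrypt(blob))                                     # only the passphrase gets it back

The service holds only the salt and the ciphertext; without the passphrase there's nothing for it to sell, leak, or lose.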

p.s another point that came up in discussion was the lifetime of such a company. What's the guarantee they won't go belly up in a year and sell your data, or worse, lose it ?



SODA results trickling in...

The full list should be available soon. Unusually, no information about accepts/submits was provided in the author notification.

Here's our (accepted) paper.

Update: 135 papers accepted, with some caveats. There were 4 merges; for two of them, I believe there will be two separate papers in the (already extra-large) proceedings, but only one talk. The rationale for this escapes me. Effectively, then, there will be 137 papers in the proceedings, which translates to a 36% acceptance rate.

Update II: The list of accepted papers is here.

