The Geomblog: 08/01/2007

Thursday, August 30, 2007

Parallel algorithms: A resurgence ?

My department has a fairly strong systems and hardware group, and some of the buzz I hear about involves the research opportunities in multicore architectures (I'm talking MANY MANY cores, not 2 or 4), and how parallel computation is relevant once again. In fact, I've had one request to include basic material on parallel algorithms in my graduate algorithms class.

Juxtaposed with that is the fact that the Stein-enhanced version of Introduction to Algorithms does NOT contain the chapter on parallel algorithms that the old CLR used to have. In fact, I'll probably use the chapter outline from CLR if I cover parallel algorithms at any level. I wonder what was the thinking behind removing the chapter on parallel methods, and whether this might be reversed in future editions.

Update: (that was quick! - thanks, Cliff): Tom Cormen writes in to explain the decision to remove the chapter on parallel algorithms:

Cliff Stein pointed me to this blog. Suresh asked why we removed the chapter on parallel algorithms when we wrote our second edition.

The chapter on parallel algorithms in the first edition was exclusively on PRAM algorithms. We wrote the second edition mostly during the late 1990s, into early 2001. At the time, PRAM algorithms were yesterday's news, because the PRAM model was too far from what could be built realistically. We considered including algorithms for some other parallel computing model, but there was no consensus model at the time. We felt that the best decision was to just leave out parallel algorithms and use the pages for other new material.

I'm not surprised. It *was* true that by the time the 2nd ed arrived, PRAMs were yesterday's news: in fact, streaming and external memory methods were "hotter" at the time. It'll be interesting to see if multicore actually does spur new interest in parallel algorithms, and not just parallel architectures. In recent months I've participated in discussions about algorithm design for things like the Cell and the implied architecture of nVidia's CUDA, and it's surprising how often standard parallel algorithms methods like pointer jumping and the like are being reinvented.

What makes the new platforms more interesting is that there are features (especially streaming features) that make the mapping to "standard" PRAM models not so obvious. It may not merely be a mattter of shoehorning new systems into parallel models, but of extending and modifying the models themselves.

Tuesday, August 28, 2007

Streaming Summer School Report

(Ed: This post is by Piotr Indyk, who is always willing to be your near neighbor)

Greetings from the Kingdom of Denmark! The country of Vikings, meatballs, and football teams that just refuse to win, has hosted the Summer School on Data Stream Algorithms last week (August 20-23). The school was organized under the banner of MADALGO, a new research center dedicated to MAssive DAta ALGOrithms, set up in Aarhus University. The inauguration ceremony for the center took place on August 24, with several people giving invited lectures.

Muthu (one of the invited lecturers) has covered the inauguration ceremony, so I will skip the detailed description. Suffices to say, it was a pleasure to see that the Danish Research Foundation (or as the locals like to say, Grundforskningsfond) is eager to support an algorithmic research center with a budget of roughly $10M over 5 years, while its US counterpart spends about $7M per year for the entire Theory of Computing program. Did I mention that the population of Denmark is roughly 2% of that of US ?

Anyway, back to the summer school. We had 70+ participants altogether, including 5 lecturers. The school covered the following topics:

The dynamic Sudipto Guha gave two lectures. The first lecture was on algorithms for clustering. Massive amounts of data were clustered, including metric data, graph data, and a few careless participants sitting in the first row. In the second lecture, Sudipto covered the "random stream model", where the elements are assumed to be arriving in a random order, which circumvents the usual worst-case paranoia.

The twin duo of T.S. Jayram and Ravi Kumar covered lower bounds: communication complexity, information complexity, and generally "everything you wanted to know but were afraid to ask". It was the first time I have seen the details of the linear-space lower bound for estimating the L_infty distance, and I am happy to report that I understood everything, or at least that is what I thought at the time. Jayram and Ravi have also occasionally ventured into the land of upper bounds, covering algorithms for the longest increasing sequences and probabilistic data streams.

The scholarly Martin Strauss gave an overview of the algorithms for finding frequent elements, heavy hitters (sometimes on steroids) and their more recent versions used in compressed sensing.

I have covered the basic upper bounds for the L_p norm/frequency moments estimation, as well as the algorithms for geometric data (clustering, MST, matching), notably those based on core-sets. The latter topic was originally supposed to be covered by Sariel Har-Peled; however, the ~~dark forces~~ highly enlighted and insightful geniuses of the INS [Sariel's corrections] have jeopardized his plans. I guess the force was not strong enough with this one...

We also had an open problem session. Some of the problems were copy-pasted from the "Kanpur list", but a few new problems were posed as well. The list will be available shortly on the school website, so sharpen your pencils, prepare your napkins, pour some coffee, and ... give all of this to your students!

The lecture slides are also available on-line. If you spot any typos, let the lecturers know.

Overall, I think the school has been a success, perhaps with the notable exception of the weather: it started to rain promptly after the school has began, and it stopped when the school has ended. One has to admire the timing though.

SOCG 2009 will be held in Aarhus. See you then!

(Ed: But what about the beer report ?)

Friday, August 24, 2007

Infinite trees

If you followed Andy Drucker's series of posts on infinite trees and were craving more, be prepared for more foliage:

Via Ars Mathematica, a pointer to Keith Devlin's column on a bizarre counter-version of Konig's tree lemma: namely, if you have an uncountably large tree with countably large levels, then it is not necessarily true that an uncountable path must exist.

Sunday, August 12, 2007

For the love of fonts....

Continuing my part-time obsession with fonts (Helvetica II: Arial strikes back !), allow me to present this delightful tale about how the right font can save lives:

What I saw, Pietrucha knew, was what we all may see soon enough as we rush along America’s 46,871 miles of Interstate highways. What I saw was Clearview, the typeface that is poised to replace Highway Gothic, the standard that has been used on signs across the country for more than a half-century. Looking at a sign in Clearview after reading one in Highway Gothic is like putting on a new pair of reading glasses: there’s a sudden lightness, a noticeable crispness to the letters.

It's a fascinating tale of 10 years of research and lobbying that went into replacing the fonts used on all the Federal highway signs in the U.S. I was driving along I-15 today and almost got into an accident trying to tell whether the local highway sign fonts had changed. Apart from the inside-baseball of how to design a font, always guaranteed to send a shiver through my spine, the story draws out the tale of how the team of designers got together, designed the font, and managed, after repeated lobbying, to convince the Department of Transportation to replace the original fonts with the new ones.

There was an amusing line in the article about American-style engineering:

The letter shapes of Highway Gothic weren’t ever tested, having never really been designed in the first place. “It’s very American in that way — just smash it together and get it up there,” says Tobias Frere-Jones, a typographer in New York City who came to the attention of the design world in the mid-1990s with his Interstate typeface inspired by the bemusing, awkward charm of Highway Gothic. “It’s brash and blunt, not so concerned with detail. It has a certain unvarnished honesty.”

If you remain unconvinced about the difference, look at the accompanying slideshow.

Thursday, August 09, 2007

SGERs being replaced ?

The new NSF news feeds are great. I get to hear all kinds of interesting tidbits about the NSF, including some recent chatter on encouraging "transformative research". One direct consequence of this chatter is this (emphasis mine):

[NSF Director] Bement yesterday proposed a three-pronged strategy before the task force on transformative research. It was unanimously adopted by the task force on Tuesday and then unanimously adopted by the Board on Wednesday. Bement's plan for will:

1. Infuse support of potentially transformative research throughout NSF and all of its programs;

2. Learn how to facilitate potentially transformative research; and

3. Lead the community through opportunities for potentially tranformative research proposal submissions.

[...]

To lead the community, Bement will embark on a three-year trial, during which NSF will replace small grants for exploratory research with a two-tiered "early-concept" award mechanism. Tier I will call for limited funding grants that are internally reviewed. Tier II will entail larger grants requiring additional levels of review. Further, NSF will create a working group to recommend implementation details; a mechanism to monitor and track impact and lessons-learned; and a method to advertise the new approach to the community.

Monday, August 06, 2007

Things that make you pull your hair out in despair

I was recently at AT&T visiting for a few weeks, and I was lucky enough to catch a talk by Amit Chakrabarti on lower bounds for multi-player pointer jumping. A complexity class that figured prominently in his talk was the class ACC⁰, which consists of constant depth, unbounded fanin, poly sized circuits with AND, OR, NOT and MOD m gates, for all m.

Suppose we make our life simple by fixing m to be a single prime, like 3. It turns out that the corresponding class AC⁰[m] can be contained strictly within NC¹: this arises from results in the 80s by Razborov and Smolensky. Now suppose that instead of picking a prime integer, we choose a number like 6, which is the product of distinct primes. We do not even know whether AC⁰[6] = NP or not !

Wednesday, August 01, 2007

Saving lives with exact algorithms

It's not often you get to say this in a paper:

We aim for exact algorithms [because] ... any loss of optimality could lead to unnecessary patient deaths.

Anyone who's gone through an algorithms class will at some point hear about stable marriage algorithms, and how the method is used to match hospitals and newly minted MDs looking for residencies.

Consider now the far more serious problem of matching kidney transplant candidates to potential donors. Because transplant lists are long, and cadaver donors are few, marketplaces matching healthy donors to recipients have sprung up in the US. For complicated ethical reasons (which are not without controversy), such exchanges are not made for money, and are viewed as gifts.

So what happens if a donor kidney doesn't match the potential recipient ? Ordinarily, nothing. Suppose however that there was another donor-recipient pair with a similar mismatch, and if only the donors were swapped, both transplants could go through ? What about if a 3-way cycle of matchings could be found ? This is called a 'market clearing', and is the subject of a paper by Abraham, Blum and Sandholm to appear in EC.

I'll get into the problem statement in a second. What's far more important is that the results of this paper have already been used to find potential transplant matches where none existed ! The Alliance For Pair Donation has been using this method since last December for finding potential matches, and has already found matches that prior methods would have missed. This is an incredible achievement: working on a problem abstracted from a real life-or-death scenario, and actually taking the results back to the source and making a difference.

Technically, the problem is easily stated. Given a directed graph with weights on edges, and a parameter L, find a maximum weight collection of disjoint cycles, each of length at most L. Vertices are agents with items (in this case, transplant recipients with donors). The weight on an edge represents the utility to the source of obtaining the sink's item (a donor transfer). The L-constraint reflects reality, in that all such transplants would have to be performed simultaneously (to ensure that all donors go through), and it's not feasible to perform more than a few (typically 3) of these transplants simultaneously.

The line I quoted at the beginning of the post comes as part of an explanation as to why they want exact algorithms for the problem (it's NP-hard for L >= 3). The technical contributions include finding a way to run an integer-linear program at scale for graphs with hundreds of thousands of nodes.

(Via NSF Current)

The Geomblog

Pages