Friday, February 22, 2008

Notes from Dagstuhl I: Red-black trees

I've just returned (well almost: I'm in JFK waiting out the storm) from a Dagstuhl workshop on data structures. For those not in the know, this is one of the common retreat spots for computer scientists to get together and chat about problems in a very relaxed atmosphere (close to the true spirit of a workshop).

In that spirit, people present things that are more "uncooked", half-baked, or incomplete, and so I won't talk about too many of the presentations, except for the ones which are available online.

In this post, I'll discuss a new take on an old warhorse: the red-black tree.

A bit of background (for those unfamiliar with the story thus far):
One of the most fundamental data structures in computer science is the binary search tree. It's built over a set of items with an ordering defined on them, and has the property that all nodes in the left subtree of the root are smaller than all nodes in the right subtree (and so on recursively). BSTs are handy for building structures that allow for quick membership queries, or "less-than"-type queries.

One can build a BST with depth log n for a set of n ordered items. This means that operations on this structure will take log n time in general. However, if the items can change on the fly, then it's more difficult to maintain such a structure while making sure that updates themselves are cheap (O(log n)).

Among many solutions proposed to handle dynamic maintainence of BSTs is the red-black tree, proposed in its current form by Leo Guibas and Bob Sedgewick back in 1978. The red-black tree is the definitive dynamic BST data structure, at least in practice: it has worst-case log n bounds for all operations, and shows up in the implementations of basic structures in the STL and in many language libraries. By virtue of its being in CLRS, it also has stupefied the minds of undergraduates for many years now.

The new story:
Conceptually, the re-balancing operations that are used to maintain a red-black tree are not terribly hard. However, there are numerous cases to consider, especially when doing deletions, and this tends to make the actual code used to write these trees fairly complex, and thus potentially error prone. In the first talk at Dagstuhl, Bob Sedgewick talked about a simpler variant of red-black trees, called left-leaning red-black trees, whose main virtue is that they are simpler to implement (many fewer lines of code) while maintaining the nice properties of red-black trees.

Bob's slides have more details: although this new structure hasn't been published anywhere, it will appear in the next edition of his book. The key idea is to first understand how a red-black tree simulates a 2-3-4 tree (a more complex search tree in which nodes can have two, three or four children (and therefore, one, two or three keys). It's possible to construct a direct mapping between a node in a 2-3-4 tree and a subtree in a red-black tree.

Once this is understood, the LLRB tree comes about by restricting the subtrees thus formed to be "left-leaning". Specifically, right-leaning red edges are not allowed, and to prevent the tree from being too skewed, more than three consecutive left-leaning red edges are disallowed as well. Doing this allows us to simplify various cases in the insertion and deletion steps, while not increasing the tree depth by too much (this last statement was argued by analogy with 2-3-4 trees).

It's a simple idea, and simplifies code by a substantial amount without sacrificing asymptotics.

Disqus for The Geomblog