Friday, February 22, 2008

Notes from Dagstuhl I: Red-black trees

I've just returned (well almost: I'm in JFK waiting out the storm) from a Dagstuhl workshop on data structures. For those not in the know, this is one of the common retreat spots for computer scientists to get together and chat about problems in a very relaxed atmosphere (close to the true spirit of a workshop).

In that spirit, people present things that are more "uncooked", half-baked, or incomplete, and so I won't talk about too many of the presentations, except for the ones which are available online.

In this post, I'll discuss a new take on an old warhorse: the red-black tree.

A bit of background (for those unfamiliar with the story thus far):
One of the most fundamental data structures in computer science is the binary search tree (BST). It's built over a set of items with an ordering defined on them, and has the property that, at every node, all keys in the left subtree are smaller than the node's key, which in turn is smaller than all keys in the right subtree. BSTs are handy for building structures that allow for quick membership queries, or "less-than"-type queries.

One can build a BST of depth O(log n) for a set of n ordered items, which means that queries on this structure take O(log n) time. However, if the items can change on the fly, it's more difficult to maintain such a structure while making sure that updates themselves remain cheap (O(log n)).
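
As a concrete illustration (my own sketch, not from the original post; the class and function names are made up for the example), here is what a membership query on such a tree looks like. It walks a single root-to-leaf path, so its cost is proportional to the depth of the tree:

    class Node:
        """A binary search tree node: keys in the left subtree are smaller
        than self.key, keys in the right subtree are larger."""
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right

    def contains(root, key):
        """Membership query: follows one root-to-leaf path, so it takes
        time proportional to the tree depth (O(log n) if balanced)."""
        node = root
        while node is not None:
            if key < node.key:
                node = node.left
            elif key > node.key:
                node = node.right
            else:
                return True
        return False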

Among the many solutions proposed to handle dynamic maintenance of BSTs is the red-black tree, proposed in its current form by Leo Guibas and Bob Sedgewick back in 1978. The red-black tree is the definitive dynamic BST data structure, at least in practice: it has worst-case O(log n) bounds for all operations, and it shows up in the implementations of basic structures in the STL and in many language libraries. By virtue of its being in CLRS, it has also stupefied the minds of undergraduates for many years now.

The new story:
Conceptually, the rebalancing operations used to maintain a red-black tree are not terribly hard. However, there are numerous cases to consider, especially when doing deletions, and this tends to make the actual code for these trees fairly complex, and thus potentially error-prone. In the first talk at Dagstuhl, Bob Sedgewick talked about a simpler variant of red-black trees, called left-leaning red-black trees (LLRB trees), whose main virtue is that they are simpler to implement (many fewer lines of code) while maintaining the nice properties of red-black trees.

Bob's slides have more details: although this new structure hasn't been published anywhere, it will appear in the next edition of his book. The key idea is to first understand how a red-black tree simulates a 2-3-4 tree (a more complex search tree in which nodes can have two, three, or four children, and therefore one, two, or three keys). It's possible to construct a direct mapping between a node in a 2-3-4 tree and a subtree in a red-black tree.

Once this is understood, the LLRB tree comes about by restricting the subtrees thus formed to be "left-leaning". Specifically, right-leaning red edges are not allowed, and to prevent the tree from being too skewed, more than three consecutive left-leaning red edges are disallowed as well. Doing this allows us to simplify various cases in the insertion and deletion steps, while not increasing the tree depth by too much (this last statement was argued by analogy with 2-3-4 trees).
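
To give a flavor of how little code is involved, here is a hedged sketch of LLRB insertion in Python (my own rendering of the idea, following the commonly described 2-3 flavor rather than Bob's actual Java code, so details may differ from the slides): insertion is plain BST insertion followed by three local fix-up checks on the way back up the tree.

    RED, BLACK = True, False

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.color = RED   # a new node hangs off the tree via a red edge

    def is_red(h):
        return h is not None and h.color == RED

    def rotate_left(h):
        # turn a right-leaning red edge into a left-leaning one
        x = h.right
        h.right = x.left
        x.left = h
        x.color = h.color
        h.color = RED
        return x

    def rotate_right(h):
        # turn a left-leaning red edge into a right-leaning one (used before splitting)
        x = h.left
        h.left = x.right
        x.right = h
        x.color = h.color
        h.color = RED
        return x

    def color_flip(h):
        # split a temporary 4-node by pushing a red edge up the tree
        h.color = not h.color
        h.left.color = not h.left.color
        h.right.color = not h.right.color

    def insert(h, key):
        if h is None:
            return Node(key)
        if key < h.key:
            h.left = insert(h.left, key)
        elif key > h.key:
            h.right = insert(h.right, key)
        # the entire rebalancing logic: three local checks
        if is_red(h.right) and not is_red(h.left):
            h = rotate_left(h)
        if is_red(h.left) and is_red(h.left.left):
            h = rotate_right(h)
        if is_red(h.left) and is_red(h.right):
            color_flip(h)
        return h

    def put(root, key):
        root = insert(root, key)
        root.color = BLACK   # the root is always black
        return root

Deletion requires a few more helper routines, but the appeal Bob emphasized is that the rebalancing logic stays this local and compact.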

It's a simple idea, and simplifies code by a substantial amount without sacrificing asymptotics.

Sunday, February 17, 2008

The most apologetic conference rejection I've received:

From the PC Chair:
Please know that I understand how much work it is to do research and write a paper. Rejection is never fun, happens to all of us, and there is likely always an element of misunderstanding to it.

Because CONFERENCE accepts such a small percentage of papers, however, rejection from CONFERENCE may not at all mean your paper will be rejected at other high quality conferences. To that end, I encourage you to take into account the reviewers' recommendations. They have spent many hours with your paper and their remarks (and even their misunderstandings) may help you to clarify your paper or perhaps to do some more work.

I can just imagine the PC sitting around a table, tears streaming down their faces, as they pen this dejected missive to me.

Wednesday, February 13, 2008

Metric spaces, VC-dimension, and metric entropy.

For a problem that I've been working on, it turns out that if a related range space has bounded VC-dimension, the problem can be solved exactly (but with running time exponential in the dimension). The range space is constructed from two parameters, a metric space (X, d) and a radius e: it consists of the domain X together with all balls of radius at most e in X.
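
In symbols (my notation, not from the original post), the range space is

    \[
      \mathcal{R}_e \;=\; \Bigl( X,\ \bigl\{\, B(x,r) : x \in X,\ 0 < r \le e \,\bigr\} \Bigr),
      \qquad
      B(x,r) \;=\; \{\, y \in X : d(x,y) \le r \,\},
    \]

and its VC-dimension is the size of the largest subset S of X that is shattered, i.e., such that every subset of S can be obtained as the intersection of S with some ball in the family.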

So a natural question that I've been unable to answer is:
What properties of a metric space ensure that the induced range space has bounded VC-dimension?
Most of what we do know comes from the PAC-learning community. For instance, the doubling dimension of a metric space is the smallest number d such that any ball of radius e can be covered by at most 2^d balls of radius e/2. In recent years, it has been popular to explore the extent to which
"metric space of bounded doubling dimension == bounded dimensional Euclidean space"
is true. Unfortunately, there are spaces of bounded VC-dimension that do not have bounded doubling dimension.
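
Restating the definition above in symbols (my notation):

    \[
      \dim_{2}(X, d) \;=\; \min \bigl\{\, d' \;:\; \text{every ball } B(x, r) \text{ can be covered by at most } 2^{d'} \text{ balls of radius } r/2 \,\bigr\}.
    \]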

Another related proxy for the "dimension" of a metric space is its metric entropy: determine the minimum number of balls of radius e needed to cover all points of the space. The log of this number is the metric entropy, and among other things it is useful as a measure of the number of points needed to "cover" a space approximately (in a dominating-set sense). It's known that the metric entropy of concept classes is closely related to their VC-dimension (where the underlying metric is the symmetric difference between the classes). I am not aware of any general result that relates the two, though.
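
In symbols (again my notation): for a scale e, the covering number and metric entropy are

    \[
      N(X, d, e) \;=\; \min \Bigl\{\, k \;:\; \exists\, x_1, \dots, x_k \in X \ \text{with}\ X \subseteq \textstyle\bigcup_{i=1}^{k} B(x_i, e) \,\Bigr\},
      \qquad
      H(X, d, e) \;=\; \log N(X, d, e).
    \]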

For more on the dizzying array of numbers used to describe metric space dimension, do read Ken Clarkson's magnificent survey.

[On a side note, I don't quite understand why the term "entropy" is used. It seems to me that if one wanted to use entropy, one would compute the entropy of the resulting set of balls, rather than merely the log of their number. ]

Tuesday, February 12, 2008

SoCG results out

Via compgeom-announce: (papers in order of submission time stamp)
  1. Manor Mendel and Assaf Naor. Markov convexity and local rigidity of distorted metrics
  2. Noga Alon, Robert Berke, Maike Buchin, Kevin Buchin, Peter Csorba, Saswata Shannigrahi, Bettina Speckmann and Philipp Zumstein. Polychromatic Colorings of Plane Graphs
  3. Jinhee Chun, Matias Korman, Martin Nöllenburg and Takeshi Tokuyama. Consistent Digital Rays
  4. Eitan Yaffe and Dan Halperin. Approximating the Pathway Axis and the Persistence Diagram of a Collection of Balls in 3-Space
  5. Naoki Katoh and Shin-ichi Tanigawa. Fast Enumeration Algorithms for Non-crossing Geometric Graphs
  6. Ken Been, Martin Nöllenburg, Sheung-Hung Poon and Alexander Wolff. Optimizing Active Ranges for Consistent Dynamic Map Labeling
  7. Hans Raj Tiwary and Khaled Elbassioni. On the Complexity of Checking Self-duality of Polytopes and its Relations to Vertex Enumeration and Graph Isomorphism
  8. Victor Chepoi, Feodor Dragan, Bertrand Estellon, Michel Habib and Yann Vaxes. Diameters, centers, and approximating trees of delta-hyperbolic geodesic spaces and graphs
  9. Esther Arkin, Joseph Mitchell and Valentin Polishchuk. Maximum Thick Paths in Static and Dynamic Environments
  10. Julien Demouth, Olivier Devillers, Marc Glisse and Xavier Goaoc. Helly-type theorems for approximate covering
  11. Sarit Buzaglo, Ron Holzman and Rom Pinchasi. On $k$-intersecting curves and related problems
  12. Mohammad Ali Abam, Mark de Berg and Joachim Gudmundsson. A Simple and Efficient Kinetic Spanner
  13. Frederic Chazal and Steve Oudot. Towards Persistence-Based Reconstruction in Euclidean Spaces
  14. Ken Clarkson. Tighter Bounds for Random Projections of Manifolds
  15. Krzysztof Onak and Anastasios Sidiropoulos. Circular Partitions with Applications to Visualization and Embeddings
  16. Bernard Chazelle and Wolfgang Mulzer. Markov Incremental Constructions
  17. Kenneth L. Clarkson and C. Seshadhri. Self-Improving Algorithms for Delaunay Triangulations
  18. Frederic Cazals, Aditya Parameswaran and Sylvain Pion. Robust construction of the three-dimensional flow complex
  19. Herbert Edelsbrunner, John Harer and Amit Patel. Reeb Spaces of Piecewise Linear Mappings
  20. Evangelia Pyrga and Saurabh Ray. New Existence Proofs for $\epsilon$-Nets
  21. Lars Arge, Gerth Stølting Brodal and S. Srinivasa Rao. External memory planar point location with logarithmic updates
  22. Eric Berberich, Michael Kerber and Michael Sagraloff. Exact Geometric-Topological Analysis of Algebraic Surfaces
  23. Misha Belkin, Jian Sun and Yusu Wang. Discrete Laplace Operator on Meshed Surfaces
  24. Olivier Devillers, Marc Glisse and Sylvain Lazard. Predicates for 3D visibility
  25. Luca Castelli Aleardi, Eric Fusy and Thomas Lewiner. Schnyder woods for higher genus triangulated surfaces
  26. Erin Chambers, Jeff Erickson and Pratik Worah. Testing Contractibility in Planar Rips Complexes
  27. Rado Fulek, Andreas Holmsen and János Pach. Intersecting convex sets by rays
  28. Noga Alon, Dan Halperin, Oren Nechushtan and Micha Sharir. The Complexity of the Outer Face in Arrangements of Random Segments
  29. Maarten Löffler and Jack Snoeyink. Delaunay triangulations of imprecise points in linear time after preprocessing
  30. Erin Chambers, Éric Colin de Verdière, Jeff Erickson, Sylvain Lazard, Francis Lazarus and Shripad Thite. Walking Your Dog in the Woods in Polynomial Time
  31. Jean-Daniel Boissonnat, Camille Wormser and Mariette Yvinec. Locally Uniform Anisotropic Meshing
  32. Adrian Dumitrescu, Micha Sharir and Csaba Toth. Extremal problems on triangle areas in two and three dimensions
  33. Timothy M. Chan. A (Slightly) Faster Algorithm for Klee's Measure Problem
  34. Timothy M. Chan. Dynamic Coresets
  35. Timothy M. Chan. On Levels in Arrangements of Curves, III: Further Improvements
  36. Minkyoung Cho and David Mount. Embedding and Similarity Search for Point Sets under Translation
  37. Jacob Fox and Janos Pach. Coloring K_k-free intersection graphs of geometric objects in the plane
  38. Timothy G. Abbott, Zachary Abel, David Charlton, Erik D. Demaine, Martin L. Demaine and Scott D. Kominers. Hinged Dissections Exist
  39. Vida Dujmovic, Ken-ichi Kawarabayashi, Bojan Mohar and David R. Wood. Improved upper bounds on the crossing number
  40. Pankaj Agarwal, Lars Arge, Thomas Mølhave and Bardia Sadri. I/O Efficient Algorithms for Computing Contour Lines on a Terrain
  41. Pankaj Agarwal, Bardia Sadri and Hai Yu. Untangling triangulations through local explorations
  42. Ileana Streinu and Louis Theran. Combinatorial Genericity and Minimal Rigidity

Monday, February 04, 2008

2007 ACM Turing Award, and a perspective on model checking.

The 2007 Turing Award was awarded to Edmund Clarke, E. Allen Emerson and Joseph Sifakis for their work on model checking. In our "theory A" cocoon, we tend not to pay attention to much "theory B" work, and model checking is one of the shining stars in theory B, with a rich theory and immense practical value.

I am not competent to discuss model checking at any level, and so I asked someone who is! Ganesh Gopalakrishnan is a colleague of mine at the U. and heads our Formal Verification group. I asked him if he could write a perspective on model checking, and he kindly agreed. What follows is his piece (with URLs added by me). As usual, all credit to him, and any mistakes in transcribing are my fault alone.

A Perspective On Model Checking
Ganesh Gopalakrishnan

One of the earliest attempts at proving the correctness of a program was by Alan Turing in 1949. The activity was raised to a much more prominent level of interest and importance through the works of Dijkstra, Hoare, and others. Yet, the enterprise of "Program Verification" was met with more noticeable failures than successes in the 1970's, due to its inability to make a dent on practice. The introduction of Temporal Logic for modeling concurrent systems (for which Amir Pnueli won the 1996 Turing Award) was the first attempt at shifting the attention away from the impossibly difficult task of 'whole program verification' to a much more tractable focus on verifying the reactive aspects of concurrency.

Looking back, this enterprise did not meet with the expected degree of success, as Temporal Logic reasoning was still based on "deduction" (proof from first principles). From an engineer's perspective, what was needed was a simple way to specify the classes of "legal and illegal waveforms" a system can produce on its signals / variables, and an efficient way of checking that all those waveforms are contained in the set of "legal waveforms" as can be specified through temporal properties. In other words, it was sufficient to show that finite state machines (or finite state control skeletons of programs) served as *models* for a catalog of desired temporal properties.

This was essentially the approach of model checking. It immediately took off as a practically viable method for debugging concurrent systems; the approach guaranteed exhaustive verification for small instances of a problem, which in practice meant that all relevant corner cases were considered within the confines of affordable resource budgets. This was the gist of Allen Emerson's contribution in his PhD dissertation, which he wrote under Ed Clarke's guidance. A parallel effort underway in France, in Joseph Sifakis's group, also arrived at the same pragmatic approach to verification. Model checking was given crucial impetus by Ken McMillan, who, using the technology of Binary Decision Diagrams formulated and popularized by Randy Bryant, developed the highly efficient Symbolic Model Checking approach. The technology has since skyrocketed in its adoption rate.

Today, model checking and its derivatives are part of every verification tool used to debug digital hardware. While "full verification" is not the goal (and is largely a meaningless term), the ability of model checkers to find *high quality bugs* is uncanny. These are bugs that escape simulation and may not even manifest in hardware running at gigahertz rates. Some of the error traces produced by model checkers are only about two dozen steps long; yet, if one imagines the number of reachable nodes in a tree with arity (branching factor) 2000 after two dozen steps, one understands why conventional approaches to testing miss bugs: they have to make the same lucky "1 in 2000" choice two dozen times in a row. A model checker can often get away without modeling these choices explicitly: reduction techniques allow it to detect and avoid exploring many "equivalent" behaviors. In that sense, model checking is one of the most powerful "test acceleration technologies" that mankind has invented.
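
To put a rough number on that argument (my arithmetic, not part of Ganesh's text): a tester that must make the right 1-in-2000 choice at each of 24 steps by blind luck succeeds with probability

    \[
      \left( \tfrac{1}{2000} \right)^{24} \;=\; 2000^{-24} \;\approx\; 10^{-79},
    \]

so even billions of random simulation runs would be expected to miss such a bug, while a model checker that prunes equivalent choices need not pay the full 2000^24 price.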

Today, microprocessor companies employ several kinds of model checkers. In software, Bell Labs (mainly before its collapse), JPL, NASA, and Microsoft (to name a few) have had tremendous success using model checking for verifying device drivers, switch software, and flight control software. Speaking more generally, anything with a finite state structure (e.g., decision processes, reliability models, planning in AI) can be approached through model checking.

Aftermatter:
  • Daniel Jackson (MIT): "With its dramatic successes in automatically detecting design errors (mainly in hardware and protocols), model checking has recently rescued the reputation of formal methods." (source)
  • "Proofs of programs are too boring for the social process of mathematics to work." (DeMillo et al., CACM 1979)
