Thursday, September 23, 2004

Geometric Probability

Probability is a tricky beast, and even more tricky when it comes to analyzing geometric data. The problems arise from how to define accurate measures over geometric spaces; the Buffon needle problem is a classic example of needing to do this kind of calculation carefully. A short (but dense) monograph by Klain and Rota starts with the Buffon needle problem and delves deeper into notions of valuation (a variant of measure) and how to define them correctly, the key problem being the issue of invariance: how to define measures that are invariant under geometric transformations (read this review (PDF)).

To illustrate some examples of the "weirdness" of geometric probability, consider the following three results:
  1. If we pick n points at random from the unit square, the expected size of the convex hull of these points is log n.
  2. If we pick n points at random from a k-gon, the expected size of the convex hull is k log n
  3. If we pick n points at random from a unit disk, the expected size of the convex hull is n1/3 !
All of these results are collected in a nice manuscript by Sariel Har-Peled.

But a result that I found even more surprising is this one, by Golin, Langerman and Steiger.
An arrangement of n lines in R2 induces a vertex set (all intersection points) whose convex hull has expected complexity O(1) !
Their result uses a fact that is well known (Feller vol. 2), but is still remarkably elegant.
Choose points x1, x2, ..., xn at random on the unit interval. Compute their sorted order x(1), x(2), ... x(n), the order statistic. Then the variable x(k) is distributed (roughly) as the sum of k i.i.d exponentially distributed random variables.
Specifically this means that the minimum is distributed as an exponential distribution, and the rest are distributed as the appropriate gamma distributions.

Bill Steiger recently gave a talk at the NYU Geometry seminar. I could not attend, but the abstract of his talk indicates that the above result holds even in R3.

Update: I should clarify that the O(1) result mentioned above was first proved by Devroye and Toussaint; their distribution on lines is different, and their proof is more complex. The above paper presents a simpler proof for an alternate distribution.

Disqus for The Geomblog