Thursday, September 23, 2004

Geometric Probability

Probability is a tricky beast, and even more tricky when it comes to analyzing geometric data. The problems arise from how to define accurate measures over geometric spaces; the Buffon needle problem is a classic example of needing to do this kind of calculation carefully. A short (but dense) monograph by Klain and Rota starts with the Buffon needle problem and delves deeper into notions of valuation (a variant of measure) and how to define them correctly, the key problem being the issue of invariance: how to define measures that are invariant under geometric transformations (read this review (PDF)).

To illustrate some examples of the "weirdness" of geometric probability, consider the following three results:
  1. If we pick n points at random from the unit square, the expected size of the convex hull of these points is log n.
  2. If we pick n points at random from a k-gon, the expected size of the convex hull is k log n
  3. If we pick n points at random from a unit disk, the expected size of the convex hull is n1/3 !
All of these results are collected in a nice manuscript by Sariel Har-Peled.

But a result that I found even more surprising is this one, by Golin, Langerman and Steiger.
An arrangement of n lines in R2 induces a vertex set (all intersection points) whose convex hull has expected complexity O(1) !
Their result uses a fact that is well known (Feller vol. 2), but is still remarkably elegant.
Choose points x1, x2, ..., xn at random on the unit interval. Compute their sorted order x(1), x(2), ... x(n), the order statistic. Then the variable x(k) is distributed (roughly) as the sum of k i.i.d exponentially distributed random variables.
Specifically this means that the minimum is distributed as an exponential distribution, and the rest are distributed as the appropriate gamma distributions.

Bill Steiger recently gave a talk at the NYU Geometry seminar. I could not attend, but the abstract of his talk indicates that the above result holds even in R3.

Update: I should clarify that the O(1) result mentioned above was first proved by Devroye and Toussaint; their distribution on lines is different, and their proof is more complex. The above paper presents a simpler proof for an alternate distribution.


  1. I did not read the paper carefully but offhand, the definition of "uniformly distributed lines" given Golin, Langerman and Steiger seems highly suspicious. Their definition is to take the dual of n uniformly distributed points in the unit square. But since the x coordinate of a point corresponds to the slope of the dual line, this means negatively sloped lines are not represented. This makes me wonder if their figure is accurate. First, it has lines with negative slope. Further, it has lines with slope greater than one!

    Are you sure this result is worth mentioning, let alone blogging about, or deemable as "surprising"? It seems like a piece of junk.

  2. Ok. I looked at the paper more carefully after my morning caffinated rage wore off. While I think the result is nice, the abuse of the term "uniformly chosen at random" raised my expectations and lead me to disappointment when I saw what the model really was. Further, my criticism of the figure was unfounded.

    It would be far cooler if the model had allowed more slopes and intercepts. Off the cuff again, say the lines in L are formed by taking the "inverse dual" of n points sampled from [-1,1]^2 That is map a point P=(x,y) to the line TP = {(u,v): v = 1/x *u + 1/y}. If the O(1) expectation holds there, that would be surprising.


Disqus for The Geomblog