Tuesday, March 03, 2009

On Clustering and Density Estimation

A brief respite from SBR/DBR/DOOR review policies....

I'm running a clustering seminar, and although I thought at first that putting some order on the space of clustering methods was too daunting a task, I'm actually quite happy with the way the course organization has worked out. Obviously I can't cover everything, but I've focused less on the slew of algorithms for any particular clustering formulation and more on the formulations themselves, comparing and contrasting them to get a better sense of "which clustering technique should I be using?" rather than "what's the best algorithm for k-median?"

Various interesting observations have emerged along the way, some of which I might write about as time goes on. Right now, I want to point you to a post by my colleague Hal Daumé. We were talking about EM and its dual role as both a density estimation algorithm and a clustering algorithm, and he has a very interesting observation about the dangers of conflating the density estimation problem and the cluster-finding problem.
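To make the "dual role" concrete, here is a minimal sketch of EM on a one-dimensional mixture of two Gaussians. Everything in it is my own illustration rather than anything from Hal's post: the function names (`em_two_gaussians`, `density`, `cluster`), the synthetic data, the crude initialization, and the fixed iteration count are all assumptions. The point is only that a single fitted model answers two different questions: `density` returns the estimated probability density at a point, while `cluster` returns the component most responsible for it.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Crude initialization (an assumption): means at the data extremes.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


def density(x, weights, means, variances):
    """Density-estimation view: the mixture density at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def cluster(x, weights, means, variances):
    """Clustering view: index of the component with the largest responsibility for x."""
    scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))


# Synthetic, well-separated data (hypothetical, chosen so both views agree).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(6.0, 1.0) for _ in range(200)]

weights, means, variances = em_two_gaussians(data)
print("mixture weights:", weights)
print("component means:", means)
print("density at x=3:", density(3.0, weights, means, variances))
print("cluster of x=0 and x=6:",
      cluster(0.0, weights, means, variances),
      cluster(6.0, weights, means, variances))
```

On data this cleanly separated the two readings line up, which is exactly why they are easy to conflate. Nothing in the fitting procedure itself says that a component of a good density estimate has to correspond to a meaningful cluster.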
