tag:blogger.com,1999:blog-6555947.post442124806399322933..comments2023-06-03T06:21:51.851-06:00Comments on The Geomblog: Choosing the number of clusters I: The Elbow MethodSuresh Venkatasubramanianhttp://www.blogger.com/profile/15898357513326041822noreply@blogger.comBlogger8125tag:blogger.com,1999:blog-6555947.post-85741777380179653922012-04-24T13:19:14.206-06:002012-04-24T13:19:14.206-06:00in elbow method how do you define your cost functi...in elbow method how do you define your cost function? could you please give more information?turksoyhttps://www.blogger.com/profile/07769616999325757599noreply@blogger.comtag:blogger.com,1999:blog-6555947.post-29877260697576775022010-03-08T09:49:02.042-07:002010-03-08T09:49:02.042-07:00I've also read about an interesting bayesian a...I've also read about an interesting bayesian approach in Bishop's book under the variational mixture of gaussians model. A prior distribution over the mixture weights is initially specified with a predetermined number of clusters which is assumed to be large enough. The EM algorithm then works in such a way that some clusters have weights being very small and hence are "eliminated", which seems to be some sort automatic relevance determination.Unknownhttps://www.blogger.com/profile/09287645882082807308noreply@blogger.comtag:blogger.com,1999:blog-6555947.post-74827180034332994212010-03-08T09:46:17.702-07:002010-03-08T09:46:17.702-07:00@cosma: the sieve idea is very interesting, and ye...@cosma: the sieve idea is very interesting, and yes I was a tad careless in calling BIC a nonparametric method<br /><br />@Noel I've learnt everything I know about DP from Hal's seminars :)<br /><br />@both it seems like the sieve method and the DP process are two examples of the same idea: using an infinite mixture model and letting the data decide the answer. I was trying to avoid generative-based clustering, but I think I will add another part to the series on letting the data pick the answerSuresh Venkatasubramanianhttps://www.blogger.com/profile/15898357513326041822noreply@blogger.comtag:blogger.com,1999:blog-6555947.post-832168643083196192010-03-08T07:46:23.509-07:002010-03-08T07:46:23.509-07:00Yes, I mean things like the Indian buffet process....Yes, I mean things like the Indian buffet process. The most relevant work here is the Dirichlet process (DP) which is an extension of the Dirichlet distribution to an infinite dimension. A sample from a DP is thus the mixing distribution for a mixture model with an infinite number of components (with most of the mass concentrated on a few components). With these type of models you don't have to choose k; k is infinite but only a finite number of elements will actually be used given finite data (and generally you get sensible results rather than one component per data point). Hal has published quite a bit in this area.Noelhttps://www.blogger.com/profile/09666551093622614632noreply@blogger.comtag:blogger.com,1999:blog-6555947.post-59784360867967361282010-03-08T07:15:37.153-07:002010-03-08T07:15:37.153-07:00Related to the elbow method, are algorithms that c...Related to the elbow method, are algorithms that compute a permutation of the data such that any prefix is a good clustering under certain measure. The greedy permutation works well for k-center, and Plaxton has a paper showing how to do this for k-means (or is it k-median) clustering...Sariel Har-Peledhttps://www.blogger.com/profile/00967819909489316277noreply@blogger.comtag:blogger.com,1999:blog-6555947.post-75892871971383155952010-03-08T06:24:15.855-07:002010-03-08T06:24:15.855-07:00I like this post and am looking forward to the oth...I like this post and am looking forward to the others. That said, in the proud academic/Internet tradition, I will show my gratitude by quibbling.<br /><br />1. At least in statistics, we often use Gaussian (etc.) mixture models without knowing how many components go into the mixture model. Then people use *IC, or cross-validation. (I prefer cross-validation.) My friends Chris Genovese and Larry Wasserman have an interesting paper on <a href="http://projecteuclid.org/euclid.aos/1015956709" rel="nofollow">using the method of sieves for Gaussian mixtures</a>, though I don't know if anyone's extended it beyond the Gaussian case.<br /><br />2. How is BIC "nonparametric"? It's all about choosing between finitely-parametrized models. (Admittedly, real Bayesians never choose between models.)Cosma Shalizihttp://bactra.org/weblog/noreply@blogger.comtag:blogger.com,1999:blog-6555947.post-48077181851399396362010-03-08T04:15:23.928-07:002010-03-08T04:15:23.928-07:00@Noel: am not familiar with Bayesian nonparametric...@Noel: am not familiar with Bayesian nonparametric methods other than BIC. Do you mean things like LDA, the Indian buffet process and the like ? <br /><br />I guess when I was talking about mixture model methods, I was thinking about things like mixture of Gaussian like models. The main question is if you put a price on the complexity of the model or not.Suresh Venkatasubramanianhttps://www.blogger.com/profile/15898357513326041822noreply@blogger.comtag:blogger.com,1999:blog-6555947.post-56180216325433032932010-03-08T03:58:59.545-07:002010-03-08T03:58:59.545-07:00Are Bayesian non-parametric methods outside the sc...Are Bayesian non-parametric methods outside the scope of what you want to discuss? They provide an elegant way to handle the "how many clusters" problem.<br /><br />Also, I don't really understand your point about mixture-model clustering ("it makes less sense to what to choose k") given you then talk about AIC/BIC etc. which AFAIK assume a mixture-model clustering setup.Noelhttps://www.blogger.com/profile/09666551093622614632noreply@blogger.com