I was anxious about my first talk at a conference. It did help that I was scheduled to be the second speaker in the morning; that is when you are fresh with all the caffeine running through your system. I thought my talk went well, thanks to the many practice sessions my advisor and other people in my lab coerced me into doing! I talked to a few meta-clustering researchers here and they seemed to buy/appreciate our idea of a new spatially-aware metric and consensus procedure that we proposed in our paper.
There were a couple of other interesting talks in the clustering session today. Geng Li introduced a new method called ABACUS to mine arbitrarily shaped clusters in the data. They essentially collapse points till a spine of each cluster emerges. After this since they only have to deal with a smaller set of points, their technique is scalable and can work with large datasets. And yeah, as expected the Swiss roll dataset popped out!
Claudia Plant talked about a new approach to robust correlation clustering called SONAR. Her technique works in 3 steps: collect signals by probing the data with different primitive clusters, extract response patterns, and identify true clusters with techniques like Minimum Description Length. It was a nice talk and I made a mental note to read this paper more carefully.
Next up was the best student paper talk by Pu Wang on Nonparametric Bayesian Co-clustering Ensembles. This is the most recent work in a series of clustering ensembles papers of Wang, Domeniconi and Laskey. This paper tries to address the issues of clustering bias, parameter setting and curse of dimensionality through clustering ensembles and co-clustering. This line of research looks interesting, at least with the types of issues that they are trying to solve.
Later at lunch, I met a few data mining people interested in general meta aspects of clustering like subspace clustering, generating alternate clusterings and computing consensus solutions. Thomas Seidl, Stephan Günnemann and Ines Färber from RWTH Aachen University and Emmanuel Müller, who is currently at Karlsruhe Institute of Technology are talking at a tutorial session tomorrow, on "Discovering Multiple Clustering Solutions". We had a very interesting conversation about the challenges that researchers who ask these meta questions about clustering including my favorite "How do we obtain/generate data that will admit multiple good partitions?". These folks are also organizing the second workshop on Multiclust in conjunction with ECML-PKDD at Athens. The first MultiClust workshop happened at KDD last year. So, let's get down to "Clustering the clusters in the clusterings!".