After NIPS last year, I had posted a note on the evolution of data models and how we think about the "shape" of data. In brief, we have increasingly complex ways of thinking about the (geometric) shape of data that carefully balance the need for expressivity and the ability to determine the parameters of the representation efficiently.
There's a basic fact of data analysis that you come up against once you start playing with data. Either you endow the space with a geometry, or you endow it with a graph structure (and of course, nowadays you might throw in a spherical cow or two). And there are only a few ways to mix the representations (spectral methods being one such approach).
But as far as I know, there are no standard ways to create a layered representation of graphs to model increasing complexity and expressivity in a similarly balanced manner. There are of course an infinite number of ways to parametrize graphs by quantities that capture increasing complexity (treewidth, connectivity, expansion, induced metric structure, and so on). But I don't know of any established and standard ways to model families of graphs with increasing complexity that capture data patterns of interest.
One of the things that I've been tasked with (along with Peter Richtarik and Han Liu) is to draft out a report on the activities at the big data program at Simons this semester. We're looking also at challenges in the theory of big data, and in my mind coming up with good models for graph structures that capture the intricacies of real data is one of them.
No comments:
Post a Comment