I've been terribly remiss in blogging, because I've actually been trying to do some work (!). At any rate, do read David's take on Day 1.
The first session I attended was on embeddings (which was placed parallel to the geometry session: NOT A GOOD IDEA). The first talk was by Edo Liberty, on work with Nir Ailon on "Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes".
The Johnson-Lindenstrauss lemma, which tells us that any set of n points in a d-dimensional l_2 space "approximately" lives in a space of dimension k = O(log n/eps^2) (preserving distances up to a 1 +/- eps factor), has had a huge effect on data analysis, as a tool for reducing the dimensionality of a set of points. Although the construction is very simple (project each point using a matrix populated with random entries), it doesn't scale: the matrix required is dense, and so is hard to maintain for large high-dimensional data sets. Moreover, the transformation itself takes time O(dk) per point, which is expensive if d and k are large.
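To make that cost concrete, here's a minimal sketch (in Python/NumPy) of the classical dense JL projection; the function name jl_dense and the choice of Gaussian entries are my illustration of the generic recipe, not the construction from the paper.

```python
import numpy as np

def jl_dense(points, k, rng=None):
    """Classical dense JL projection of an (n, d) array down to k dims.

    With entries drawn N(0, 1/k), pairwise distances are preserved up to
    a (1 +/- eps) factor w.h.p. once k is on the order of log(n)/eps^2.
    """
    rng = rng or np.random.default_rng(0)
    d = points.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return points @ R  # O(dk) work per point -- the bottleneck noted above
```

The single dense matrix multiply is exactly where the O(dk) per-point cost (and the storage problem) comes from.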
Earlier work by Ailon and Chazelle showed that the JL transform can be implemented in time O(min(d log d, k^3)), which improves on the "trivial" O(dk) bound for certain values of k (as a function of d). The paper presented here improves the embedding speed for a much larger range of k, drawing on a number of tools from Banach spaces and functional analysis, as well as some coding theory to "derandomize" the construction. It's a good example of practice (how do we implement the JL transform efficiently?) driving new and nontrivial theory results. A rough sketch of the earlier Ailon-Chazelle-style approach appears below.
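For contrast with the dense version, here's an illustrative sketch in the Ailon-Chazelle spirit: flip random signs, mix with a Hadamard transform to spread each point's mass across coordinates, then sample a few coordinates. Caveats: this is the generic "fast JL" recipe, not the dual-BCH construction of the Ailon-Liberty paper; the coordinate sampling stands in for their sparse projection step; and the dense hadamard() multiply below is O(d^2), standing in for the O(d log d) fast Walsh-Hadamard transform a real implementation would use.

```python
import numpy as np
from scipy.linalg import hadamard

def jl_fast_sketch(points, k, rng=None):
    """Illustrative fast-JL-style map on an (n, d) array, d a power of 2.

    Randomized sign flips (D) followed by an orthonormal Hadamard mix (H)
    "smooth out" every point, after which uniformly sampling k coordinates
    preserves squared norms in expectation.
    """
    rng = rng or np.random.default_rng(0)
    d = points.shape[1]
    D = rng.choice([-1.0, 1.0], size=d)    # random diagonal of +/-1 signs
    H = hadamard(d) / np.sqrt(d)           # orthonormal Hadamard matrix
    mixed = (points * D) @ H               # O(d^2) here; O(d log d) via FWHT
    idx = rng.choice(d, size=k, replace=False)
    return mixed[:, idx] * np.sqrt(d / k)  # rescale sampled coordinates
```

Since HD is orthogonal it preserves norms exactly, and the sqrt(d/k) rescaling makes the sampled coordinates an unbiased estimate of the squared norm; the whole point of the mixing step is that no single coordinate carries too much of it.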