Tuesday, June 20, 2017

TheoryFest I: Deep Learning

(ed: I'm guessing you never thought those words would appear together in the same phrase)

Ruslan Salakhutdinov gave a tutorial on deep learning today. Now deep learning is a tricky topic for theory (more on that below), but I thought he did a nice job in his two hours of

  • explaining the basics of how a neural net works and how it's trained, without getting too far into engineering weeds, but also being able to explain important ideas like dropout, SGD, batch normalization and momentum (a minimal sketch of these ingredients follows this list). He skillfully avoided the alphabet soup of architectures in a way that didn't really affect one's understanding (I think). He didn't get too much into RNNs, but I think that was a conscious and fair choice. 
  • discussing the unsupervised element of DL - autoencoders, RBMs, DBMs, and GANs. Now I have a little bit of an advantage here because we're running a summer reading group on GANs, but I liked his framing here in terms of supervised and unsupervised, as well as the different kinds of generative criteria (probabilistic/not, tractable/intractable, explicit/implicit) used to classify the different approaches. 
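To make the training ingredients concrete, here's a minimal sketch of what they look like in code. This is my illustration, not anything from the tutorial; the architecture, sizes, and hyperparameters are arbitrary choices, and I'm assuming PyTorch.

```python
import torch
import torch.nn as nn

# A toy classifier wiring together the ingredients from the tutorial:
# dropout, batch normalization, and SGD with momentum.
# All sizes and hyperparameters here are arbitrary illustrative choices.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # batch normalization
    nn.ReLU(),
    nn.Dropout(p=0.5),    # dropout: randomly zero half the activations
    nn.Linear(64, 2),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Fake data standing in for a real dataset.
X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))

model.train()
for epoch in range(10):
    for i in range(0, len(X), 32):   # mini-batches of 32 examples
        xb, yb = X[i:i+32], y[i:i+32]
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()              # backpropagation
        opt.step()                   # SGD + momentum update
```

The point of the sketch is just how few moving parts there are: each of the "important ideas" above is a one-line addition to an otherwise plain training loop.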

In a first at STOC (maybe?), the audience got to see a video of an RL system learning to play Doom. It was pretty neat.

Having said that, I'm not exactly the right audience for this kind of talk, since I'm decently familiar with deep learning. What surprised me, though, was that when I polled people during the breaks, most of the people who attended the tutorial felt the same way. And the common refrain was "We've heard so many faculty candidates talk about deep learning that we know the basics now"!

So I almost wonder if Russ miscalibrated the level of the audience.

There was also some minor grumbling about the lack of clear open problems. I actually don't fault him for that. It might have been useful, though, to expose core questions for which answers don't yet exist, and some of these did come out in the Q&A.

But let me make a more general observation. Deep learning is a tricky topic for theoreticians to negotiate, for a number of reasons.

  • Firstly, I don't think it's even useful to ask the most general form of "what does a neural net DO" questions. Neural nets are very, very general: by the universal approximation theorem, even a network with a single hidden layer can approximate any continuous function on a compact set arbitrarily well (see the sketch after this list). So asking general questions about them is like asking to characterize a Turing machine with no constraints: you can't say much beyond recursive and r.e. I think the right questions are much more specific. 
  • DL right now is very much an engineering discipline, which is to say that the practice of DL is focused on trying out engineering hacks that appear to yield improvements. And these improvements are significant enough that it really doesn't matter why they work. In other words, DL doesn't need theory… at least not right now. 
  • Even if you don't grant the previous two positions, there's another issue. Descriptions of DL systems feel a lot like experimental physics: "hey, we did all of this and it worked this way. Now give us a theory". The difference is that there's no "there" there: there's no fixed Nature that we can design theoretical laws against, only a gazillion-dimensional, highly nonconvex landscape where we don't even try to find a provably high-quality answer. 
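To see the expressivity point in action, here's a minimal sketch (again mine, assuming PyTorch, with arbitrary width and optimizer choices) of a single-hidden-layer net fitting sin(x), the kind of thing the universal approximation theorem guarantees is possible:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# One hidden layer is already a universal approximator for continuous
# functions on a compact set. Here we fit sin(x) on [-pi, pi].
# Width, learning rate, and step count are arbitrary illustrative choices.
net = nn.Sequential(nn.Linear(1, 50), nn.Tanh(), nn.Linear(50, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.linspace(-math.pi, math.pi, 200).unsqueeze(1)
y = torch.sin(x)

for step in range(2000):
    opt.zero_grad()
    loss = F.mse_loss(net(x), y)
    loss.backward()
    opt.step()

print(loss.item())  # should shrink toward zero as the net fits sin(x)
```

Of course, this is exactly the problem: the theorem (and the toy example) tells us nothing about why training finds a good fit, which is why the unconstrained question is the wrong one.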

So I think we're on our own if we want to (or care to) understand the computational power and expressivity of neural networks. It's very interesting, and we're seeing nice results begin to appear, but we should do it because there's interesting theory to be had here, rather than trying to hew too closely to actual DL systems.
