## Wednesday, August 23, 2017

### On free speech, gerrymandering and self-contradictory rule systems

In the light of the wave of racist and neo-Nazi bile being slung around in Charlottesville and beyond, Karl Popper's Paradox of Tolerance has been doing the rounds. Paraphrased, it can be phrased as

In a tolerant society, one must be tolerant of everything except intolerance.
There's an interesting self-referential element there that's reminiscient of Gödel's incompleteness theorems. To wit,

We can either have a system of rules that is consistent, but cannot account for all  phenomena that are consistent with the rules, or have a system that is inconsistent.

In other words, the principle of tolerance in society cannot incorporate its own negation and still survive. One nuance here is that Popper doesn't necessarily advocate for restricting intolerant speech as much as speech that leads to incitement, which is somewhat in line with 1st Amendment protections in the US.

This reminds me of another situation where self-contradictory rule systems cause problems: gerrymandering.

The Supreme Court is soon to hear arguments in a case of partisan gerrymandering from Wisconsin. Roughly speaking, a partisan gerrymander (as opposed to a racial gerrymander) is one in which districts are drawn to favor a certain political party (the "partisan" aspect") as opposed to favoring a certain race.

While racial gerrymanders have been repeatedly struck down in court (the latest being a case from Texas), partisan gerrymanders have a much more mixed record, which is why many are watching this new case with great interest.

One potential argument in favor of allowing partisan gerrymandering is that if a party wins, their victory should allow them the power to redraw districts -- the "elections have consequences" principle. But it seems to me that this is again an example of Popper's paradox of tolerance.

That is to say, if we allow the party that wins an election to do partisan gerrymandering, then we are allowing through the democratic process an action that will serve to directly reduce the ability of the democractic process to represent the people. And for that reason a partisan gerrymander should not be allowed.

I wonder if there are other settings where this principle might help clarify the limits of a permissive rule structure.

## Friday, June 23, 2017

### TheoryFest Day 3: Plenaries

I was the chair of the plenary session on Wednesday, so was too focused on keeping track of time and such to pay full attention to the talks. Having said that, all the speakers we've had so far have done a bang-up job of keeping within their time window without much prompting at all.

So I can only give my very brief thoughts on the talks. For more information, go here.

Atri Rudra was up first with a neat way to generalize joins, inference in probabilistic models and even matrix multiplication all within a generic semi-ring framework, which allowed the authors to provide faster algorithms for join estimation and inference. In fact, these are being used right now to get SOTA join implementations that beat what Oracle et al have to offer. Neat!

Vasilis Syrgkakis asked a very natural question: when players are playing a game and learning, what happens if we treat all players as learning agents, rather than analyzing each player's behavior with respect to an adversary? It turns out that you can show better bounds on convergence to equilibrium as well as approximations to optimal welfare (i.e the price of anarchy). There's more work to do here with more general learning frameworks (beyond bandits, for example).

Chris Umans talked about how the resolution of the cap set conjecture implies bad news for all current attempts to prove that $\omega = 2$ for matrix multiplication. He also provided the "book proof" for the cap set conjecture that came out of the recent flurry of work by Croot, Lev, Pach, Ellenberg, Gijswijt and Tao (and cited Tao's blog post as well as the papers, which I thought was neat).

I hope the slides will be up soon. If not for anything else, for Atri's explanation of graphical models in terms of "why is my child crying", which so many of us can relate to.

### Minority Report and Predictive Policing

Minority report (the movie) is 15 years old. Who knew!

Well I certainly didn't, till I was called by a reporter from CNN who wanted to talk about the legacy of the movie. Here's the link to the story.

It was a depressing conversation. We went over some of the main themes from the movie, and I realized to my horrow how many of them are now part of our reality.

Precogs are now PredPol. Algorithms that claim to know where crime will happen. The companies building predictive policing software will often take umbrage at references to Minority Report because they say they're not targeting people. But all I say is "….yet".

Predictions have errors. The very title of the movie telegraphs the idea of errors in the prediction system. And much of the movie is about a coverup of such a 'minority report'. And yet today we treat our algorithms (precogs) as infallible, and their predictions as truth.

VERY personalized advertising. The main character is targeted by personalized advertising and a good section of the plot involves him trying to get a replacement eyeball so retina scans don't detect him. And then we have this.

Feedback loops. The debate between Agatha (the minority precog) and Anderton about free will leads him to a decision to change his future, which then changes the prediction system. In other words, feedback loops! But feedback loops work both ways. Firstly, predictions are not set in stone: they can be changed by our actions. Secondly, if we don't realize that predictions can be affected by feedback from earlier decisions, our decision-making apparatus can spiral out of control, provably so (consider this a teaser: I'll have more to say in a few days).

What's a little sad for me is because I wasn't sufficiently 'woke' when I first saw the movie, I thought that the coolest part of it was the ingenious visual interfaces on display. We're actually not too far from such systems with VR and AR. But that now seems like such a minor and insignificant part of the future the movie describes.

### TheoryFest Day 3: Streaming symmetric norms.

There's a weird phenomenon in the world of streaming norm estimation: For $\ell_0, \ell_1, \ell_2$ norm estimation, there are polylog (or less)-space streaming approximation algorithms. But once you get to $\ell_p, p \ge 3$, the required space suddenly jumps to polynomial in $n$. What's worse is that if you change norms you need a new algorithm and have to prove all your results all over again.

This paper gives a universal algorithm for estimating a class of norms called "symmetric' (which basically means that the norm is invariant under coordinate permutation and sign flips - you can think of this as being invariant under a  very special class of rotations and reflections if you like). This class includes the $\ell_p$ norms as a special case, so this result generalizes (upto polylog factors) all the prior work.

The result works in a very neat way. The key idea is to define a notion of concentration relative to a Euclidean ball. Specifically,  Fix the unit $\ell_2$ ball in $n$ dimensions, and look at the maximum value of your norm $\ell$ over this call: call it $b_\ell$. Now look at the median value of your norm (with respect to a uniform distribution over the sphere): call it $m(\ell)$. Define the modulus of concentration as
$$mc(\ell) = \frac{b_\ell}{m_\ell}$$
Notice that this is 1 for $\ell_2$. For $\ell_1$, the maximum value is larger: it's $\sqrt{d}$. The median value as it turns out is also $\Theta(\sqrt{d})$, and so the modulus of concentration is constant. Interestingly, for $p > 2$, the modulus of concentration for $\ell_p$ is $d^{1/2(1 - 2/p)}$ which looks an awful lot like the bound of $d^{1-2/p}$ for sketching $\ell_p$.

As it turns out, this is precisely the point. The authors show that the streaming complexity of any norm $\ell$ can be expressed in terms of the square of $mc(\ell)$. There are some technical details - this is not exactly the theorem statement - but you can read the paper for more information.

Update: Symmetric norms show up in a later paper as well. Andoni, Nikolov, Razenshteyn and Waingarten show how to do approximate near neighbors with $\log \log n$ approximation in spaces endowed with a symmetric norm. It doesn't appear that they use the same ideas from this paper though.

## Thursday, June 22, 2017

### TheoryFest Day 3: "I forgot to talk about Kant"

The above is an actual quote from Oded Goldreich in his hilarious speech accepting the Knuth Prize for 2017. This speech was highly anticipated, because most of us have spent years reading and marvelling at Oded's opinions (he addressed the elephant in the room very promptly)

As the title suggests, there was a heavy dose of philosophy, with quotes from Kant, Weber, Frankl, and MacIntyre. He also gave a brief technical exposition of new developments in interactive proofs starting from the "IP for Muggles" paper of Goldwasser, Kalai and Rothblum. This topic is quite dear to me right now, given that my student Samira Daruki just defended a thesis that is at least partly about interactive proofs where the verifier is extremely weak (aka streaming). I'll have more to say about this later.

One can only hope the video of Oded's talk will be available online soon: it's can't miss-TV :).

(I'm keeping these posts short in the hope that I'll actually write them. The danger in longer posts is that they never get written).

### TheoryFest Day 3: Panels and Lunches

(for various reasons, I don't have wifi access at my Airbnb, so my posts are delayed. But it's not like you're hanging on my every word...... are you?)

Day 3 started off with a panel moderated by Anna Karlin, starring Cynthia Dwork, Russell Impagliazzo, Ankur Moitra, Tim Roughgarden, Dan Spielman and Andy Yao. I personally like panels to have a bit of an edge and controversy, and there wasn't that much of it here, but there was some good discussion about various aspects of TCS, as well as some good advice for the young'uns in the audience. Some of the highlights:

• Andy Yao thinks a working quantum computing device is imminent, but one that we could solve Rubik's cube puzzles on might take a while longer.
• Cynthia Dwork made a strong argument for the power of theoryCS to define problems clearly and rigorously even for messy concepts like privacy and fairness.
• Ankur Moitra made a point that I think should be made more often: that modeling and algorithm design are fundamentally different things even if they might be influenced by one another. And it's important not to confuse the two.
• Dan Spielman tried to emphasize that even if it might seem like senior people at a conference know what they're talking about, they don't really know. And that's it's ok to be in a constant state of befuddlement. As Anna Karlin put it I think, "We must learn to dance with chaos".
There was lots more that the panel discussed, but these are what stuck in my mind. One pity was that we ran out of time and couldn't have any audience questions.

As I had mentioned earlier, there were networking lunches for senior (aaaah, I'm a senior now :() and junior people. Never one to let of a chance to pontificate, I had lunch with Sima Hajiaghaei (UVic), Lalla Mouatadid (Toronto), Benjamin Priest (Dartmouth) (ed: you need a website!) and Shravas Rao (NYU) at a nice dumpling place in Chinatown. They won't like me saying this, but I will anyway :) - they all seemed so young and enthusiastic I felt a little old. But we had a great chat, and if you bump into any of them at a theory conference in a future make sure to say hi :).

## Wednesday, June 21, 2017

### TheoryFest Day 2: Directed Spectral methods

There was an interesting talk on developing spectral tools for directed graphs. The magic of spectral methods for undirected graphs comes from the interpretation of the graph as a Markov chain, and then using spectral properties of the Laplacian (which is symmetric and positive semidefinite) to reason about a host of properties like conductance, mixing times, sparsification, and others.

But for directed graphs none of this works as stated. The Laplacian is no longer symmetric (goodbye real eigenvalues) and even if we symmetrize it it may not be positive definite (goodbye nonnegative eignenvalues). One of the key tricks that this paper exploits (and relates to work by Fan Chung on Cheeger inequalities for directed graphs) is the following:
Suppose our directed graph is Eulerian, which means that each vertex has the same incoming and outgoing degree. Then if we construct the associated Laplacian $L$ and symmetrize it as $L' = (L + L^T)/2$, then the resulting symmetric matrix $L'$ is positive definite!
This turns out to be a key ingredient of designing efficient algorithms for estimating spectral quantities on directed graphs, and is a trick worth remembering.

### TheoryFest: Avi Wigderson's plenary talk.

The plenary talk on Tuesday morning was by Avi Wigderson, on the nature of TCS.

Now if you haven't attended an Avi Wigderson plenary before, you should imagine yourself lying in a nice warm tub of water with soothing bath salts massaging your aching limbs. Avi's talks are balm for the poor persecuted theoretician who comes to STOC to remember that there people in the world who don't insist on demanding practical value for every darn theorem you prove.

His talk was not technical, and not even particularly informative for anyone who's been paying attention to TCS the last 40 years. What it was, was a love song to TCS, sung by a bard with +5 charisma who makes you feel like you have purpose and meaning in your life once again.

His slides will be up soon, and you can see them for yourself. But if there's anyone who can make the argument for the universality, depth and reach of TCS at its highest conceptual level, it's Avi. I'll link them when they're available and if you care at all about the field, you should take a look.

### TheoryFest: Attack of the talks

Day 2 at TheoryFest, and people are still attending talks. Ok maybe I shouldn't be that surprised. But it's kind of nice to see anyway. The lounge area is deserted during the sessions and full during the breaks.

The TheoryFest organizing committee (as represented by Ryan Williams) has organized lunch time meetups for senior and junior people (where junior means grad students and postdocs, not assistant profs — sorry, assistant profs). The idea is that the senior people sign up and the junior people can email them to meet up. It's a nice idea, especially given the feeling that I've heard from people in the past that STOC is an unfriendly place for students, especially those not from high-powered theory places.

One interesting aspect of coming to STOC after a number of years is while I know many the older folk, I don't know most of the students, and so I float around relatively anonymous (it doesn't help that I haven't blogged much in recent months - I've been distracted, people!). It's almost like being at a brand new conference that I've never attended before.

## Tuesday, June 20, 2017

### TheoryFest: Short Takes

So far, the tutorials appear to have been well attended, The DL tutorial had a full house in a big room, but the other two tutorials did pretty well to. The plenary talks (the reason I'm here) start today and it will be interesting to see what kind of attendance we see.

The business meeting will reveal the official numbers: indications are that they will be quite good especially considering we're not in the US.

Montreal is a nice town. My AirBnB is right next to Chinatown and we're pretty much in the center of everything. The parts I've walked through have this kind of pacing with cul de sacs, pedestrian-only areas, main roads, and parks that feels like well-paced musical composition.

### TheoryFest I: Deep Learning

(ed: I'm guessing you never thought those words would appear together in the same phrase)

Ruslan Salakhutdinov gave a tutorial on deep learning today. Now deep learning is a tricky topic for theory (more on that below), but I thought he did a nice job in his two hours of

• explaining the basics of how a neural net works and how it's trained, without getting too far into engineering weeds, but also being able to explain important ideas like drop out, SGD, batch normalization and momentum. He skillfully avoided the alphabet soup of architectures in a way that didn't really affect one's understanding (I think). He didn't get too much into RNNs, but I think that was a conscious and fair choice.
• Discussing the unsupervised element of DL - autoencoders, RBMs, DBMs, and GANs. Now I have a little bit of an advantage here because we're running a summer reading group on GANs, but I liked his framing here in terms of supervised and unsupervised, as well as the different kind of generative criteria (probabilistic/not, tractable/intractable, explicit/implicit) used to classify the different approaches.

In a first at STOC (maybe?) the audience got to see a video of an RL system learning to play Doom. It was pretty neat.

Having said that, I'm not exactly the right audience for this kind of talk, since I'm decently familiar with deep learning. What surprised me though was that when I polled people during the breaks. most of the people who attended the tutorial felt the same way. And the common refrain was "We've heard so many faculty canddiates talk about deep learning that we know the basics now"!

So I almost wonder if Russ miscalbrated the level of the audience.

There was also some minor grumbling about the lack of clear open problems. I actually don't fault him for that. I think it might have been useful to expose core ideas for which answers don't exist, and some of these came out in the Q&A.

But let me make a more general observation. Deep learning is a tricky topic for theoreticians to negotiate, for a number of reasons.

• firstly, I don't think it's even useful to ask the most general form of "what does a neural net DO" questions. Neural nets are very very general (in particular a 2-level neural net can approximate any function). So asking general questions about them is like asking to characterize a Turing machine with no constraints. You can't say much beyond recursive and r.e. I think ther right questions are much more specific.
• DL right now is very much an engineering discipline, which is to say that the practice of DL is focused on trying out engineering hacks that appear to get improvements. And these improvements are significant enough that it really doesn't matter why they work. In other words, DL doesn't need theory… at least now.
• Even if you don't grant the previous two positions, there's another issue. Descriptions of DL systems feel a lot like experimental physics: "hey we did all of this and it worked this way. Now give us a theory". With the difference that there's no "there" there: there's no fixed Nature that we can design theoretical laws against. Only a gazillion-dimensional highly nonconvex landscape where we don't even try to find a provably high quality answer.

So I think we're on our own if we want to (or care to) understand the computational power and expressivity of neural networks. It's very interesting, and we're seeing nice results begin to appear, but we should do it because there's interesting theory to be had here, rather than trying to hew to close to actual DL systems.

## Sunday, June 18, 2017

### Off to TheoryFest 2017

I'm at the airport getting ready to board my flight for Montreal to attend TheoryFest 2017. And much to my amazement, I discover that STOC has its own event mobile app. Who knows, maybe this means that by next decade theory conferences will do double blind review? (ed: stop it, now that's crazy talk!)

Snark aside, I'm very excited to see how the new format for STOC works. This is an experiment, and like all experiments it will take some time to see how the idea plays out. But first impressions matter, and if this year's iteration goes well, that will be good news for the overall plan.

I'm of course excited for the plenary invited talks, but I'm also excited to see the tutorials and workshops. Even theory land is not safe from the invasion of the Huns deep learning and I'm sure there will be standing-room only space at Ruslan Salakhutdinov's tutorial on the foundations of deep learning.

My "coverage" will be a mix of tweeting and blogging. Stay tuned!

## Monday, May 01, 2017

### Easy memory aides

Certain memory aides are so ... well.. memorable that they stick in your mind exactly the way they should. Here are three that I've heard of:

• Keeping track of the Baltic states: I think I heard this from +Fernando Pereira - They are alphabetical in order from north to south. So it's Estonia, Latvia and then Lithuania.
• Converting between miles and kilometers: This is a particularly geek-friendly mnemonic - If the value in miles is (close to) a Fibonacci number, the value in kilometers is (close to) the next Fibonacci number. This works because the golden ratio is very close to the miles-to-km conversion factor.
• Keeping track of type-I/type-II errors: I heard this one on twitter the other day, and it goes like this. In the story of the boy who cried wolf, the errors appear in order.
Things I need memory aides for:
• Bernstein vs McDiarmid vs Azuma, or different Chernoff bounds in general.
• Jensen's inequality, so I don't have to constantly draw the picture of the convex function to derive it.
• Submodularity vs supermodularity: I know you add the diagonals and compare them, but which way is which???
Others?