## Wednesday, July 25, 2012

### Data Streaming in Dortmund: Day II

Andrew McGregor and I are tag-blogging the workshop. Read his post on day 1 here.

Day II could roughly be summarized as:

Sitting by the stream recovering from a sparse attack of acronyms.

There were a number of talks on sparse recovery, which includes compressed sensing, and asks the question: How best can we reconstruct a signal from linear probes if we know the signal is sparse.

Eric Price led things off with a discussion of his recent breakthroughs on sparse FFTs. While the regular DFT takes $n \log n$ time, it's reasonable to ask if we can do better if we know the signal has only $k$ nonzero Fourier coefficients. He talked about a sequence of papers that do this "almost optimally", in that they improve the $n\log n$ running time as long as the sparsity parameter $k = o(n)$.

Anna Gilbert provided an interesting counterpoint to this line of work. She argued that for real analog signals the process of sensing and recording them, even if the original signal was extremely sparse, can lead to discrete signals that have $\Theta(n)$ nonzero Fourier coefficients, and this is in some way intrinsic to the sensing process. This was part of an attempt to explain why a long line of sublinear methods (including Eric's work) don't do very well on real signals. This line of attack is called 'off the grid" reconstruction because you're off the (discrete) grid of a digital signal.

In sparse recovery, there are two parameters of interest: the number of measurements you make of the signal, and the time for reconstruction. Obviously, you'd like to get both to be as small as possible, and information-theoretic arguments show that you have to spend at least $k \log(n/k)$ measurements. Martin Strauss focused on speeding up the reconstruction time while maintaining measurement-optimality, in a setting known as the "for-all" formulation, where the adversary is allowed to pick a signal after seeing the probe matrix (there's a related "foreach" model that is subtlely different, and I'm still a little confused about the two).

On a slightly different note (but not that different as it turns out), Atri Rudra talked about a fun problem: Given the "shadows" of a set of 3D points along three orthogonal planes, can you find the minimum number of points that could yield the specific shadows ? If all projections have complexity $n$, it's well known that the right bound is $n^{3/2}$. While this bound was known, it wasn't constructive, and part of Atri's work was providing an actual construction. There are all kinds of interesting connections between this problem, join queries, triangle counting in graphs, and even the scary-hairy 3SUM, but that's a topic for another time.

Other talks in the afternoon: Alex Andoni talked about finding the eigenvalues of a matrix efficiently on streams (specifically finding the "heavy" eigenvalues). I learnt about the Cauchy interlacing theorem from his talk - it's a neat result about how the eigenvalues of submatrices behave.  Ely Porat talked about the problem of preventing evil entities Hollywood from poisoning a BitTorrent stream of packets, and presented ideas involving homomorphic signatures for packets via error correcting codes.

Joshua Brody returned to the basic streaming setup. Most stream algorithms that estimate some quantity introduce two-sided error (the true estimate could be above or below the reported value). He asked whether this was necessary to stay with sublinear bounds: it turns out for some problems, limiting yourself to 1-sided error can worsen the space complexity needed to solve the problem (note that for problems like the Count-min sketch, the estimate is one-sided by design, and so there's no deterioriation)

Coming up next: Andrew's summary of day three, and a report on our heroic tree-swinging adventures.

## Tuesday, July 17, 2012

### Permutation tests, graph non-isomorphism and kernels

I was reading Larry Wasserman's post on the modern two-sample test and a few thoughts came to mind.

1. I've known about permutation tests for a long time, but it only just occurred to me that the permutation test is exactly what you do in the AM protocol for Graph non-isomorphism. The principle is that if the two distributions are identical, your test statistic should not be able to tell them apart, and the way to test this is by randomly changing labels. Replace distributions by graphs and test by Merlin, and it looks the same. Is this a trivially trite observation, or are there other examples of protocols that use standard statistical tests in disguise ? (hmm I see a cstheory question here)

2. The kernel test he mentions, if you look closely, is merely computing the kernel distance between the two distributions. And the energy test he mentions later is doing something akin to earthmover. The kernel distance strikes again !!!

## Wednesday, July 11, 2012

### Guest Post: Women in Theory Workshop II

This is the second part in Samira Daruki's report from the Women in Theory workshop at Princeton. The first part focused on the technical talks at the workshop, and this part focuses on the non-technical events.

As an aside, many of the topics discussed would be quite useful to have at a general conference as well: it would be nice to have a similar panel discussion at a STOC/FOCS/SODA.

The opening talk of the workshop was by Joan Girgus (Princeton). She presented statistics about the percentage of women at the undergrad and graduate level in different fields of science and engineering in the last five decades. She mentioned that nearly fifty years ago, those who wanted to combine a career with raising a family were seen as anomalies. Today, combining family and the career is the norm with its own complexities and difficulties.  However, even now women continue to be underrepresented in science and engineering beginning from undergraduate level till the faculty and research positions. Joan presented several possible reasons for this and also suggested approaches that could be taken by universities to improve the  participation of women in academic and research careers.

The other interesting talk on the first day was by Maria Klawe (Harvey Mudd) who argued (and actually convinced us!) that it is the best time ever to be a woman in theory, and discussed opportunities for the future.

On the second day, there was a "work-life balance" panel led by Tal Rabin. All the speakers and organizers of the workshop were gathered to answer the questions by students. This panel was one of the most useful sessions in the workshop, because  we could hear the real experiences and challenges that pioneering female researchers faced in their career and life.

The panel began by Tal asking speakers to just give one piece of  advice to the audience. Some highlighted points were:
• " Be aware of young male talkers”
• “make conscious choices”
• “Make a right decision and go for it”,
• “Go for what really interests you”
• “The path is difficult, and so you must acquire  the ability to talk about yourself and your work”,
• “do the work you like and be excited about that”,
• “Try to be more self promoting”

The floor was then opened for questions. There were different types of questions. Some of the questions were about  family life for female researchers and the right time to have children.

Some speakers believed that having children is an option, rather than a default, and there should be no pressure to have children. While it might seem that everybody by default expects you to raise a family and have children, you don’t need to listen to people and do what they want.  It was also mentioned that there is no "right time" to have children, and that it was a very personal decision. You should just decide for yourself when it's the right time for you.

Valerie King said that some of the conditions at one's  job  can affect decisions regarding one's family. She pointed out that in Canada there are child-friendly policies for women in academia. But she also mentioned that sometimes you have to sacrifice something because of your family life, and  in those cases you should find some other alternative ways to minimize  the impact, like choosing a less taxing problem to work on or...

There were different views on how family life and having children could affect the career life of young female researchers. Some believed that it wasn't a major issue - you could  take off for a few years and wait till you reach a stable point for going back to your job -  but some argued against this, pointing out that research in fields like CS develops very fast,  and coming back after being away from research for a while can be very difficult and destructive for your research career without a  careful and conscious plan. There were also discussions among speakers about how choosing a supportive partner can help young female researchers to deal with these difficulties more easily, (and that actually finding a proper partner is time-consuming and challenging by itself!).

Another question was about the challenges and pressures of graduate study. One of the highlighted issues was about challenges in working with colleagues in academia. It was mentioned that you should be careful with the people around you, and make sure to have some people around you that talk about your contribution and acknowledge you.

Panelists talked about different experiences they had with male colleagues.  some of whom would make sure to acknowledge your contributions explicitly in their presentations, and some who would use your ideas without any acknowledgement. Clearly if you want to be more successful you should avoid being surrounded by this latter group of people. It was mentioned that one of the techniques in dealing with problems that you might face with male colleagues (if you find yourself unable to solve it by yourself) is to go to your manager or boss and push him to help you in that situation.

Another challenge that was highlighted was finding a research problem to work on during graduate study and also for one's research career after that. Many of the speakers agreed  that was one of the biggest challenges in their research work). Other discussed challenges were about choosing the right advisor and changing research problems or advisors during one's PhD.

It was mentioned that usually the most common mistake new students make in doing research is that they decide on some topic, do a wide search on the current and previous work, and then come to the conclusion that  all the problems had already been solved and that there was  nothing new to do. But in fact in most research topics there are always ways to make the area broader and find new directions: this is the creative aspect of research. This is the main distinction between doing research and "re-search"

There were also some discussions about the different aspects of working as a researcher at research labs or at a university as faculty. Lisa Zhang from Lucent mentioned that research labs have good quality of life and encourage a lot of flexibility.  However, there are issues relating to job security versus tenure and there is a trade-off between these two kinds of research positions.

There was  discussion about collaboration between researchers. Valerie King mentioned that one should not be afraid to  contact  people and ask to work with them. In fact, people like it that others come and work with them on interesting research problems. She related experiences where she got stuck in solving some problem and  found someone more expert in that area to collaborate with. One such collaboration was with two other female researchers resulting in what she called the only “Three Women SODA paper”.

At the end of the panel, Shubhangi Saraf (IAS, Rutgers) talked about her experiences during graduate study. She said that it was  very important for one's research career to do multiple internships,  travel and visit different schools and research labs,  find good people to work with and build  a good network and connections. Shubhangi was one of the participants  that first attended the Women In Theory workshop as a student four years ago and is now, at the third workshop, one of the organizers. She mentioned that this workshop was one of the ways that she was able to meet new people and make connection to do internships.

At the end of the second day there was a banquet in  Palmer House at which we were able to meet other professors from Princeton University and talk with them.

To conclude this post, I think this workshop was successful in its  main goal of bringing together theory women from different  departments and fostering a sense of kinship and camaraderie among  them.  It was really a good feeling to talk about challenges you  faced or times when you got stuck during your research and realize  that other students and researchers have had the same experience! You  feel a lot more powerful, because now when you're stuck with a  problem and don’t know what to do, you know there are some other  people with a similar situations that you can just shoot an email to  and say: “Hey! I'm stuck and need to talk with you! ”.

## Tuesday, July 03, 2012

### Guest Post: Women in Theory Workshop I

My student Samira Daruki recently attended the Women In Theory workshop at Princeton. This is the first part of her (lightly edited) report from the workshop, focusing on the technical talks. For more on the workshop (from a presenter's perspective) see Glencora Borradaile's post.

This past week, I was at the third workshop on “Women in Theory (WIT)” workshop held in Princeton University with about 46 female students and 12 speakers. Most of the students came from different schools in USA:  University of Maryland, College Park, Brown, Oregon State, UMass, MIT, Stony Brook, Berkeley, Princeton, Columbia, Boston U, Washington, Stevens Institute of Technology, , UCSD, Northwestern, Harvard, UW-Madison, UCLA, CUNY, GMU, Harvey Mudd and Uta . There were also some international students from India (IIT), ETH Zurich, Germany, Canada (Toronto), City University Hong Kong, Moscow Engineering Institute and Israil (Technion, Weizmann, Hebrew).

Participants were from a wide range of research fields like Cryptography and Privacy, Graph theory and algorithms, Computational Geometry, Computational Biology, Complexity, Game theory and mechanism design and Social Networks. But the interesting thing was that you could find more students working on cryptography than other fields.

Tal Rabin, Shubhangi Saraf and Lisa Zhang were organizers of the workshop and there were many junior and senior women in the field as speakers: Gagan Aggarwal (Google), Glencora Borradaile (Oregon State), Joan Girgus(Princeton), Nicole Immorlica (Northwestern University), Valerie King (University of Victorial),Maria Klawe (Harvey Mudd), Vijaya Ramachandran (UT Austin), Jennifer Rexford (Princeton),Dana Ron (Tel Aviv University), Ronitt Rubinfeld (MIT and Tel Aviv University), Shubhangi Saraf(IAS), Rebecca Wright (Rutgers and DIMACS).

Beside the non-technical parts of the workshop, there were many interesting technical talks by different researchers on different topics.

Among them, one of the very wonderful and interesting talks was presented by Dana Ron (Tel-Aviv University, Columbia) on sublinear-time algorithms. When we refer to “efficient algorithms” we usually mean “polynomial-time algorithm” and at the best we can try to lower the degree of the polynomial as much as possible, leading to “linear time algorithms“. However when we are dealing with massive data sets, even linear time might be infeasible. In these cases we look  to design even more efficient algorithms, namely “sub-linear time algorithms”. These algorithms do not even go through the whole input data, so we just expect to output approximately-good answers. They are also probabilistic, and so are allowed to have  a failure probability (i.e are Monte Carlo).

One such type of algorithms are property testing algorithms, in which one has to decide with high success probability whether an input (like a graph), has a certain property (for example is bipartite), or is relatively far from having that property (for example in this case a relatively large fraction of its edges should be removed so that the graph become bipartite). In this talk several properties and algorithms were discussed. Other types of sublinear algorithms discussed in her talk were algorithms for estimating various graph parameters, like the number of connected components, the size of a minimum vertex cover, and so on. I think Dana wase able to give a flavor of analysis techniques for sublinear-time algorithms with great clarity,  and it was definitely one of the best talks in this workshop.

Another great talk was given by Ronitt Rubinfeld (Tel Aviv University and MIT), on estimating properties of distributions. In her talk, she surveyed a body of work regarding the complexity of testing and estimating various parameters of distributions over large domains, when given access to only few samples from the distributions. Such properties include testing if two distributions have small statistical distance, testing if the marginal distributions induced by a joint distribution are independent, testing if a distribution is monotone, and approximating the entropy of the distribution. In this kind of problems, the classical techniques such as the Chi-squared test or the use of Chernoff bounds have sample complexities that are at least linear in the size of the support of the underlying discrete probability distributions. However, algorithms whose sample complexity is “sublinear” in the size of the support were shown for all of the above problems.

Nicole Immorlica (Northwestern University) also gave an interesting talk about cooperation in social networks, presented as a game among students. In this talk, she explored the impact of networks on behavior through a study of the emergence of cooperation in the dynamic, anonymous social networks that occur in online communities. In these communities, cooperation (for example in business deal) results in mutual benefits, whereas cheating results in a high short-term gain.

Some of the other interesting talks was presented by Gagan Aggarwal (Google) on mechanism design for online advertising, Glencora Borradaile (Oregon State) on designing algorithms for planar graphs (and it was the only talk on the blackboard without any slides!), Vijaya Ramachandran (U of Texas, Austin) on Parallel and Multicore Computing Theory and Valerie King (University of Victoria) on Dynamic Graph Algorithms for maintaining connectivity. In her talk, Valerie discussed this problem that had been studied for over 30 years, and she reviewed the area and described a new direction for achieving polylogarithmic worst case performance. At the end of her talk she also mentioned Mihai Patrascu as a pathbreaking researcher who was taken too soon from us.

Unfortunately there were no talks on topics like Computational Geometry, but we had a few students working on CG related topics and they presented their research problems in the rump session. My presentation at the rump session was on building core sets for uncertain data

## Monday, July 02, 2012

### Guest Post: Report from the Snowbird Math Research Communities program

My student Amirali Abdullah attended a recent MRC event on discrete and computational geometry at Snowbird, organized by Satyen Devadoss, Vida Dujmovic, Joe O'Rourke, and Yusu Wang. This is his (lightly edited) report from the event. Note that the organizers requested that the problems discussed not be widely disseminated, so no specific technical questions will be discussed here.

Recently, there was an MRC workshop for Discrete and Combinatorial Geometry held right in my hometown, at Snowbird in Utah. Suresh invited me to share my experience of the event.

Working with trained mathematicians illustrated to me just how many tools are out there
that I wasn't familiar with beyond a "oh I heard that word somewhere" head-nod. Ergodic theory, configuration spaces of cohomologies, measure theoretic ideas, algebraic geometry techniques and variational calculus tools and more. Now, there's always a strong need for self-study and picking up techniques independently in a Ph.D but I can't help but feel that most theoretical CS students would benefit from required courses and curricula more tailored to supplementing our math backgrounds.

But more than being exposed to the mathematical depth of knowledge out there, I loved the intoxicating energy of a room filled with curious mathematicians. One group would be eagerly folding origami cubes, another playing with colored bricks and crayons on a coloring problem,
a trifecta of mathematicians lying passed out by a whiteboard decoated with scribbled figures and gazing dreamily into the distance .I finally understand the cliche of mathematicians having the enthusiasm and boundless energy of kindergarteners,playing with ideas and constructs-- no disinterested 'suits' here!

More so, it was good for me to associate real faces and people with many of the papers and fields I had read of.  One grad student speaks of how he had been burned out for several months after his last paper,  another stares blankly at his beer at 10:30 pm after a twelve hour session discussing a tricky problem, another discuss the merits of wine vs scotch, one raves about an advisor who recommend his students go out hiking on a beautiful day instead of staying in their labs (Suresh, hint, hint!!), another of handling the two-body problem in an already restricted job market. And of course, the common theme of how wonderful and at times frustrating math is.

There were many light-hearted moments too, to brighten a few faces. For example after mathematicians A and B had spent all day on a problem only to realize their approach was all wrong-
Mathematician A: "I guess doing math is all about cursing sometimes."
Mathematician B: "\$%#@! F-bomb. #@%#@ F-bomb. Censored here for family audiences".
Or another light-hearted conversation between Organizer A and a student who had invited him to speak at a conference -
"So, am I the entertainment for the high school students
then?"
Student-"Yes, we have your talk on geometry the evening after the sword-swallower performs."

Let me give a shout out to the wonderful facilities provided us by the organizers, especially the amazing food.We were fed three square meals a day, plus tea twice a day and another informal snacks and beer session after nine pm. Most of the meals were supplemented by confectionaries including red velvet cake or pastries, the meals were generally 3-4 courses (including mushroom and cheese garlic pizza, salmon fillet, chicken teriyaki, beef broccoli and more) and there were several rounds of free wine and scotch during the week. I may or may not have been slightly tipsy on a few occasions, and most of us put on at least a couple of pounds in a gluttonous week of math and fine cuisine. Several of us also went on a hike up the nearby trails, or enjoyed the swimming pools. I'm from Utah, of course, so I've been spoiled to always have the outdoors easily available.

There was a lovely talk given by the organizers on the job hunt process and pointers on finding the best fit institution. We've all heard the horror tales of how tight the academic job market is, but it's
disconcerting nonetheless to hear firsthand of several hundred applicants for a single faculty position, or of how many of the workshop participants had sent in over a 100 applications to various universities for their first job. Despite this, the independence of a research university position is still THE holy grail for those of a more mathematical bent - most of those attending  seemed uninterested in the compromises involved in a teaching intensive or industry position, and I can certainly understand that sentiment.

Finally a shout out for my favorite screening of the session- Diana Davis showed us her entry for "Dance your Ph.D thesis", which drew much approval from an audience worn out by the excessive number of dry beamer and powerpoint presentations we've seen. .