Tuesday, January 27, 2015

Streaming @ SODA: Part II

This is the second of two posts by Samira Daruki on the streaming sessions at SODA 2015. For the first post, see here. 

In the third paper from the streaming graph family in SODA15: "Parameterized Streaming: Maximal Matching and Vertex Cover", Chitnis, Cormode, Hajiaghayi and Monemizadeh introduce a new approach to handling graph streams called  parameterized streaming algorithms. Also, in addition to insertion-only model, they consider the dynamic model of streaming graphs in which the input is a sequence of insertion/deletion on the edges.

This dynamic model of streaming graph processing is popular when the graph size is changing, and has recently received much attention due to breakthroughs by Ahn, Guha and McGregor (one, and two).  Over these two papers, they showed the first results for a number of graph problems over dynamic streams. This has provoked much interest into what can be computed over dynamic graph streams, although still there is not much work on solving graph optimization problems in this model. The challenge here is that when an edge is deleted, sometimes it requires a substantial work to repair the solution again, so we need to make sure that the algorithm has enough information to do so, while keeping only a bounded amount of working space. (ed: not surprisingly, some of these ideas are useful for non-streaming dynamic graph algorithms: witness the paper by Kapron, King and Mountjoy on dynamic connectivity in (randomized) worst-case polylog time from SODA a few years ago)

Returning to parametrized streaming, in this paper instead of solving exactly the optimization problem on the graph stream, the goal is to solve the “parametrized” version of the problem, where the parameter $k$ is given and we want to solve the following decision problem:
Is there a solution with size bounded by $k$? 
The motivation behind parametrizing the problem comes from real world applications in which the solution of the graph problems is small comparing to the size of the input (i.e. sublinear in the size of input). In these cases, the interesting challenge is to solve the optimization graph problems in streaming fashion using space bounded by some function of the “solution size” instead of the “input size”.

To solve the parameterized problems, one of the techniques which is used is kernelization, which uses a polynomial time preprocessing to map the input to another equivalent input of smaller size $f(k)$ (called a kernel) with a new parameter value $k’ \le g(k)$, for a computable function $g$.

In this paper, by combining kernelization techniques with randomized sketch structures, the first streaming algorithms for the parameterized versions of the Vertex Cover problem is obtained. The main idea here is to maintain a maximal matching of underlying graph in a streaming fashion. Then run the well-known kernelization algorithm for Vertex Cover on the maintained maximal matching. The data structure to maintain the maximal matching use the $k$-sample recovery sketching algorithm, which is a generalization of linear sketching for $\ell_0$-sampling, as the main tool and apply it to the neighborhood of each vertex (incident edges) in the resulted matching. So as the edges are inserted or deleted, these sketches can be updated without needing knowledge of the full neighborhood of nodes. However, there are some challenges with deletion of edges: as the edges are deleted we need to have an intelligent mechanism to ensure the matching remains maximal using only limited stored information.

Another nice result here is showing a tight lower bound of $\Omega(k^2)$ (by reducing from the INDEX problem in communication complexity) for the space complexity of any (randomized) streaming algorithms for  parameterized Vertex Cover, which holds even in the insertion-only model.

Besides the general models of insert-only and dynamic, another restricted model in dynamic framework is also discussed in which we know for sure that at time $i$, the size of the vertex cover of underlying graph induced by the edges till that point is at most $k$. With this promise, they develop a dynamic parameterized streaming algorithm whose space usage matches the proved lower bound.

It is interesting to think about other NP-hard problems in the framework of parameterized streaming and explore how kernelization can be helpful in this direction or see whether we can find other powerful hammers to overcome the challenges which arises in designing algorithms for hard problems in streaming setting.


Coda:
Along with the three papers discussed above, there was another paper on streaming presented at SODA (by Naumovitz and Saks) which provides a deterministic polylogarithmic-space streaming algorithm for approximating distance to monotonicity for a sequence of $n$ numbers, compared to the corresponding randomized result presented at SODA two years ago.

While I won't discuss this work here so as to keep these posts  just about streaming graph algorithms, I encourage the interested readers to take a look at this paper as well, as the last one in the streaming family of SODA15.

Disqus for The Geomblog