Monday, October 31, 2011

Models for MapReduce

I've been listening to Jeff Phillips' comparison of different models for MapReduce (he's teaching a class on models for massive data). In what follows, I'll add the disclaimer IANACT (I am not a complexity theorist).

There's something that bothers me about the various models floating around that attempt to capture the MapReduce framework (specifically the MUD framework by Feldman et al, the MRC framework by Karloff, Suri and (co-blogger) Vassilvitskii, and the newer Goodrich-Sitchinava-Zhang framework).

For "seasoned" models of computation like the RAM, or PRAM, or even I/O and streaming, there's a sense of identifying some intrinsic feature of the computation that you wish to capture, building a model around it, and then identifying resources that you wish to expend little of. After the dust settled around the different TM/RAM models for effective computation, we now accept (for the most part) the RAM as a good model for sequential computation. The I/O model identifies disk access as the key operation to capture and builds a model of computation where disk access is a limited resource. Streaming identifies "amnesia" as the key limitation on a computation, and builds a model of computation centered around that.

In all these cases, it's possible to grumble that $O(n^5)$ isn't really "efficient" or that galactic algorithms shouldn't count, or that ignoring main memory computations is "cheating" in the I/O model, or even that allowing $n^{1-\epsilon}$ memory storage is useless for streaming. But those discussions are secondary to the model: and in fact history tells us that inevitably models that appear to allow for horribly inefficient computation facilitate algorithm design that is quite efficient !

Which is why I'm puzzled when I look at the different ways in which the MapReduce framework is being modelled. I see a number of choices being made:  number of processors should be $O(n^{2-\epsilon})$, or total memory usage should be $O(n^\epsilon)$ or number of 'rounds' of computation should be polylogarithmic or even constant.

At one level I understand these choices: for example, a memory bound of $n^\epsilon$ appears to lead to constant $(1/\epsilon)$ round algorithms.

But should the model be getting into decisions about what's efficient before even deciding what resources they are trying to capture ?

Ultimately, this for me gets back to a question that I asked at the W8F workshop back in May):
What is the intrinsic computational metaphor that we are trying to capture with these models ?
It's not just parallelism. It's not just distributed computation. It's not even streaming (or more generally sublinear computations). Is it some delicate tradeoff between these three ? Should we think of map-reduce models like the circuit classes that have bounds on both space and depth ? Wouldn't this be an unholy mess ?

I suspect that over time, if the MapReduce computational framework persists, then these issues will get shaken out. But there's a provisional (and exciting!) feel about the models we currently have.

Wednesday, October 12, 2011

O() notation breeds laziness

I will often write "We store the $n^2$ pairs of objects in a data structure" when what I really mean is "We store the $\binom{n}{2}$ pairs ..". Of course there's no "real" difference in asymptotic land. But I've often found that students reading a paper will puzzle over asymptotic handwaving wondering why someone wrote $O(n^2)$ when it should really be over all pairs and whether they're missing some key insight. It's even worse when people throw in gratuitous and specific constants just to make an analysis work ("why do we really need to run for 18.5 n steps???").

While I completely understand the reasons people do this (ranging from the last minute deadline insertion to a misguided attempt at clarity), I think that making these shortcuts often makes a paper harder to read rather than easier. There are places where the asymptotics are all that matter ("Run this randomized strategy $O(\log n)$ times and apply a Chernoff bound") but even there, it doesn't hurt to use $c \log n$ (or a specific constant if you need a specific polynomial bound) instead. And when there actually is a precise expression that conveys meaning (like in the binomial example above), it's far better to use that to signify what's going on.

I think the precept we teach to students in algorithm analysis (don't throw in O() notation till the end) would be well worth paying more heed to ourselves.

Caveat: I'm not advocating this in ALL instances. I'm advocating a greater sensitivity to when and where the use of O() notation is more or less appropriate.

SoCG 2012 CFP

It's out: !! Please also note the new workshop CG:APT (highlighted) that will attempt to expand the scope of the conference and bring in more applied work.I'm looking forward to hearing more about it.

28th Annual Symposium on Computational Geometry (SoCG 2012)
June 17-20, 2012 (tentative)
University of North Carolina, Chapel Hill
In cooperation with ACM SIGACT and SIGGRAPH

The Twenty-Eighth Annual Symposium on Computational Geometry will be held in University of North Carolina, Chapel Hill. We invite submissions of high-quality papers that describe original research on issues related to computational problems arising from geometric considerations. We also invite submissions of original videos and multimedia on these same themes. This year, SoCG will be collocated with a workshop, Computational Geometry: Applications, Practice, and Theory (CG:APT), whose call for submissions (due Feb 27) will be distributed separately.

The topics of the SoCG 2012 Symposium reflect the rich diversity of research interests in computational geometry. They are intended to highlight both the depth and scope of computational geometry, and to invite fruitful interactions with other disciplines. Topics of interest include, but are not limited to:
  • design, analysis, and implementation of geometric algorithms and data structures; lower bounds on the computational complexity of geometric problems;
  • mathematical, numerical, algebraic, and experimental issues arising in geometric algorithms and heuristics; discrete and combinatorial geometry; computational topology;
  • novel applications of computational geometry in computer graphics, geometric modeling, computer-aided design and manufacturing, scientific computing, geographic information systems, database systems, robotics, computational biology, machine learning, sensor networks, biomedical imaging, combinatorial optimization, statistical analysis, discrete differential geometry, theoretical computer science, graph drawing, pure mathematics, and other fields; novel problems in computational geometry arising from these fields. 

Important Dates 

November 22, 2011: Paper titles and abstracts (at most 300 words) due (23:59, Honolulu time)
December 2, 2011: Paper submissions due (23:59, Honolulu time)
February 17, 2012: Notification of acceptance/rejection of papers
February 27, 2012 : Submissions of videos and multimedia due (23:59, Honolulu Time)
March 8, 2012: Notification of acceptance/rejection of video/multimedia
March 22, 2012: Camera-ready versions due for papers and video/multimedia abstracts
April 26, 2012: Final versions of video/multimedia due
June 17-20, 2012 (tentative): Symposium in Chapel Hill, North Carolina

Call for Papers

We invite submissions of high-quality papers describing original research on geometric algorithms and data structures, their mathematical foundations and correctness, their implementation, and their applications.

Final versions of accepted papers will be published by ACM in the symposium proceedings. Proceedings will be distributed to symposium participants and will also be available from ACM for purchase and through the ACM digital library. An author of each accepted paper will be expected to attend the Symposium and give a presentation (approximately 20 minutes) of the paper. Authors of a selection of papers from the conference will be invited to submit extended versions of their papers to a special issue of one or more journals. This year we also plan to confer two Best Paper Awards, and a Best Student Presentation Award. The Student Presentation award will be based on the quality of the presentation of a paper by a student during the conference.

Paper Submission 

All submissions should be made electronically; see the EasyChair SoCG2012 web page for detailed submission instructions (to be available shortly). If electronic submission is not feasible, please contact the program committee chairs, Tamal Dey and Sue Whitesides, well in advance of the submission deadlines.

Submission Guidelines
Papers should be submitted in the form of an extended abstract, which begins with the title and abstract page that contains the title of the paper, each author's name, affiliation, and e-mail address followed by a short abstract. The main body of the extended abstract should begin with a precise statement of the problem considered, a succinct summary of the results obtained (emphasizing the significance, novelty, and potential impact of the research), and a clear comparison with related work. The remainder of the extended abstract should provide sufficient detail to allow the program committee to evaluate the validity, quality, and relevance of the contribution. Clarity of presentation is very important; the whole extended abstract should be written carefully, taking into consideration that it will be read and evaluated by both experts and non-experts, often under tight time constraints.

Submissions should be typeset in single column format, using 11-point or larger font, with at least 1 inch/2.54 cm margins and single line spacing. Excluding the title page and bibliography, the extended abstract must not exceed 10 pages. Submissions deviating from these guidelines risk rejection without consideration of their merits.

All necessary details to verify the results must be provided. If they cannot fit within the 10-page limit, a clearly marked appendix containing omitted details should be included. Appendices are not counted in the 10 page limit, so while they may serve as a reference, they will be read by the program committee members at their discretion. The paper excluding the appendix should clearly describe the results and the approach to achieve them, and give sufficient confidence for their validity. The appendix should then give all the necessary details to verify correctness.

Anticipating the usual high overall quality of submissions, the program committee intends to interpret the scope of the conference broadly, and will seriously consider all papers that are of significant interest to our research community.

Authors must submit the title and an abstract (at most 300 words) of their papers by November 22, 2011. This pre-submission will be used to help make program committee reading assignments. Extended abstracts must be received by December 2, 2011 (23:59, Honolulu time). There will be no extension of these deadlines; late submissions will not be considered. Authors will be notified of acceptance or rejection by February 17, 2012. The final proceedings papers must be formatted in accordance with ACM proceedings guidelines; LaTeX style files will be made available to authors of accepted papers.

Concurrent submission of the same (or essentially the same) abstract to SoCG and to another conference with published proceedings is not allowed. An extended abstract of a paper that is under journal review, or scheduled for publication in a journal after June 2012, may be submitted, when it is clear that the extended abstract differs substantially from the journal version. In such cases, the authors must include the journal version in an appendix that clearly identifies the status of the journal submission.

Program Committee

  • Pankaj Agarwal(Duke University)
  • Dominique Attali (Gipsa-Lab, Grenoble)
  • Gill Barequet (Technion-Israel Institute of Technology)
  • Mark de Berg (Technische Universiteit, Eindhoven)
  • Danny Chen (University of Notre Dame)
  • Tamal Dey (The Ohio State University) (co-chair)
  • Vida Dujmović (Carleton University)
  • David Eppstein (University of California, Irvine)
  • Leonidas Guibas (Stanford University)
  • Sylvain Lazard (INRIA, Nancy Grand Est)
  • Dinesh Manocha (The University of North Carolina at Chapel Hill)
  • Steve Oudot (INRIA Saclay)
  • Konrad Polthier (Freie Universität Berlin)
  • Edgar Ramos (Universidad Nacional de Colombia, Medellín)
  • Jian Sun (Tsinghua University)
  • Takeshi Tokuyama (Tohoku University)
  • Yusu Wang (The Ohio State University)
  • Max Wardetzky (University of Göttingen)
  • Sue Whitesides (University of Victoria, British Columbia) (co-chair)

Call for Video and Multimedia Presentations

Video and multimedia presentations are sought for the 21st Annual Video and Multimedia Review of Computational Geometry, to accompany the 28th Annual Symposium on Computational Geometry. This review showcases the use of visualization in computational geometry for exposition and education, for visual exploration of geometry in research, and as an interface and a debugging tool in software development. Algorithm animations, visual explanations of structural theorems, descriptions of applications of computational geometry, and demonstrations of software systems are all appropriate.

Three to five minutes is ideal for most presentations; eight minutes is the upper limit. Accepted video and multimedia presentations will have an abstract in the published conference proceedings; video/multimedia authors will have an opportunity to present their work at the Symposium during a dedicated video session. Accepted presentations will be available online in various formats in a web proceedings. See for examples of previous years' proceedings.

Submissions of video clips in QuickTime or MPEG-4 compressed formats (e.g., XviD or DivX version 6) are encouraged. We also encourage submissions of Macromedia Flash, Java applets, and limited forms of other multimedia or software. These formats must come with a script that will allow them to be distributed in both interactive and canned Quicktime or MPEG video formats. In case of doubt, please email the Video and Multimedia Program chair.

Each submission should include a one or two-page description of the material shown in the presentation, and where applicable, the techniques used in the implementation. The final two-page descriptions must be formatted according to the guidelines for proceedings. LaTeX style files will be provided to authors of accepted presentations.


Submissions should be deposited online where they are accessible through the web or via FTP. Send email to the Video/Multimedia committee chair, Christian Knauer by 23:59 Honolulu Time, Monday, February 27, 2012, with the following information: the names and institutions of the authors, the email address of the corresponding author, and instructions for downloading the submission. For ease of sharing and viewing, we encourage (but do not require) that each submission be uploaded to YouTube, and that the corresponding URL be included with the submission.

We explicitly encourage video/multimedia submissions that support papers submitted to the Symposium. Submitted papers and associated video/multimedia submissions will be treated entirely separately by the respective committees: acceptance or rejection of one will not influence acceptance or rejection of the other.

Authors will be notified of acceptance or rejection, and given reviewers' comments by March 8, 2012. For each accepted submission, the final version of the 2-page textual description will be due by March 22, 2012 (electronically) for inclusion in the proceedings. Final versions of accepted video/multimedia presentations will be due April 26, 2012 in the best format available.

Important Dates

Feb. 27, 2012: Video and Multimedia submissions due
Mar. 8, 2012: Notification for Video/MM submissions (23:59 Honolulu time)
Mar. 22, 2012: Camera-ready video/MM abstracts due
Apr. 26, 2012: Final versions of video/MM presentations due

Video and Multimedia Presentation Program Committee
  • Helmut Alt -Freie Universität Berlin
  • Sandor Fekete - TU Braunschweig
  • Panos Giannopoulos -  Universität Bayreuth
  • Xavier Goaoc - INRIA Nancy Grand Est
  • Rolf Klein - Universität Bonn
  • Christian Knauer (chair) - Universität Bayreuth
  • Joseph S. B. Mitchell - SUNY Stony Brook
  • Bettina Speckmann - TU Eindhoven
  • Fabian Stehn - Universität Bayreuth
  • Alexander Wolff - Universität Würzburg

SOCG 2012 Local Arrangements

Jack Snoeyink, Chair

Disqus for The Geomblog