## Thursday, September 14, 2006

### Implementing geometric algorithms.

Over at bit-player, Brian Hayes has a tale of woe about implementing a geometric algorithm. Not even a complicated one at that, merely an algorithm to check if two line segments intersect. His story is the usual one: general position is a fiction thought up by sadistic geometers (but of course !), and even the simplest of ideas needs many special cases to take of degenerate data.

Apart from being surprised that he didn't have problems with precision (how DO you tell if both endpoints of a line segment are identical), I sympathize with his plight. As anyone who's ever implemented a geometric algorithm will tell you, it's much harder than it seems, and to an extent, the idealized algorithms geometers design can be hard to implement. CGAL and LEDA have made life a lot easier, but robustness and exact computation are still important (if highly uncool) areas of computational geometry.

Categories:

1. Can't agree more! As a freshman, I took a Pascal programming course and for the final project, we had to implement an nlog(n) polygon triangulation algorithm (the one in Chapter 3 of de Berg et al). I remember it was supposed to work only for reasonably bounded integer coordinates. Still I found it MUCH more tricky than what I originally thought after reading the chapter. Seems that only after doing at least one "hands-on" project can one really appreciate how life gets easier in the fictional "general position " world!

Posted by Mahdi

2. While geometric algorithms of course have their own special challenges, in general implementing algorithms is hard. Even simple algorithms can be hard once you start scaling up your inputs (so you have to worry about space), or you have to cover special cases that are swept under the rug in the analysis (which isn't just an issue for geometric algorithms!).

In my undergraduate Algorithms/Data Structures class, I have 5 "pen and paper" assignments (which sometimes include some small programming bits) and 3 programming assignments. I'm always surprised that programming assignments are uncommon in similar standard classes. I understand the common view that other classes are supposed to teach them programming, and we as theorists feel it is our duty to teach these students mathematical, theoretical thinking. But I think most students gain a much better understanding and appreciation of theory by trying to code some algorithms, and perhaps we're doing them -- and, in the long run, ourselves -- a disservice by not including some programming in the standard undergraduate theory courses.

Posted by Michael Mitzenmacher

3. On the one hand, I agree. Programming assignments help tease out the subtleties in many algorithms. But there are two things I worry about:

1. There is often a fair amount of effort involved in going from an algorithm to efficient code. It is clearly important to understand the flow of "conceptualization - algorithm design/analysis - implementation - modification -" and so on. But to what extent does this detract from the course material, which aims (in most algorithms classes) to teach the formal foundations of algorithmic thinking ? To an extent, the difference between probability and statistics comes to mind, in a local way. Statistics draws on the methods of probability theory, but the tools of the trade are geared towards applications (and mathematical statistics shifts back towards probability again). In my mind, algorithms courses are taught from the "probability" point of view, rather than the "statistics" point of view, and it's not clear to me that this is a bad thing.

2. Secondly, I wonder how many faculty who teach algorithms are even competent to guide students in programming assignments. I would not be surprised if many esteemed theoreticians have never written any code in their life !

Posted by Suresh

4. Perhaps the main reason that makes programming assignments uncommon is the amount of effort needed for grading them, considering the fact that algorithms classes are often big ones. Reading individual source codes, compiling and running the code to see if it works properly is infeasible. On the other hand, automated procedures for grading are in many cases unreliable. Judging can be automated in situations like ACM programming contests, where the only measure of correctness is producing the right answer in (say) less than a minute. For programming projects, even if you force the students to follow a prescribed I/O format and test the algorithm for correctness on a few random instances, you will never know whether what being implemented is the right algorithm, without actually reading the code. If you're expecting them to implement, say, heap sort, how do you want to differentiate that with quick sort, or even bubble sort? (This problem can be significantly ruled out if the assignment only describes an algorithmic problem for which any efficient algorithm is accepted). Things get even more difficult if space efficiency and the choice of data structures are also being considered, cheating is to be detected, etc.

Posted by Mahdi

5. Suresh,

Perhaps I am just being disagreeable, but I think your worries actually give a great deal of support for my argument.

"It is clearly important to understand the flow of "conceptualization - algorithm design/analysis - implementation - modification -" and so on.  "

I am glad we agree on this. If they do not get this is an algorithms class, where do you think they are going to get it?

"To an extent, the difference between probability and statistics comes to mind, in a local way. Statistics draws on the methods of probability theory, but the tools of the trade are geared towards applications (and mathematical statistics shifts back towards probability again). In my mind, algorithms courses are taught from the "probability" point of view, rather than the "statistics" point of view, and it's not clear to me that this is a bad thing. "

But this really supports my argument. If I was teaching an introductory probability class to bunch of future mathematicians, I would teach a very "probability" oriented class. But if I was teaching to a group that included economists, biologists, physicists, chemists, etc., who may never take another probability course, and who are less likely to need formal probability theory in the future, I would include a lot of statistics. It places the ideas of probability in contexts they can use and understand more easily, and would have more potential for actually having an impact on their future.

Harvard, I feel safe in saying, produces far more than its share of theoreticians from its undergraduate pool. But if even 5% of the people taking my introductory algorithms/data structures class become theorists, I'd find that amazing. The other 95% consist of CS majors with other interests, as well as people from applied math, biology, physics, and yes, some mathematicians. But overall I think the large majority of my class gets much more out of my class by doing some  programming along with more formal theory. This has to be good for them, and good for theory.

"Secondly, I wonder how many faculty who teach algorithms are even competent to guide students in programming assignments. I would not be surprised if many esteemed theoreticians have never written any code in their life !"

Which, of course, keeps theory "separated" from the rest of computer science. Reinforcing that image and that culture in introductory theory classes can't be best long-term for theory. And then we wonder why even the NSF doesn't seem to want to fund us, why so much of the rest of CS does not seem to understand or appreciate us, etc.

I'm not calling for an end to proofs and mathematical formalisms in introductory theory classes. Clearly, that's got to be a main point of these classes. But to ignore the last few steps of that "conceptualization - algorithm design/analysis - implementation - modification -" flow you described, which seems to be fairly standard practice, still seems like a disservice to large numbers of students and to theory itself.

Posted by MIchael Mitzenmacher

6. I guess there are two concerns here:

1. not to "dilute" with programming assignments the formal approach to algorithm design while training future algorithms folks

2. Evangelize about the power of algorithms by demonstrating end-to-end solutions brought about by the power of formal methods.

I guess I was thinking about the first, but I agree that the second is essential if we don't want to be marginalized. In fact, the same point was made to me in the context of course design: we should try to be innovative and inclusionary in the design of new courses centered around the applications of algorithms so as to emphasize relevance, rather than applying 'purity tests' that marginalize us.

Posted by Suresh