Tuesday, January 03, 2012

Teaching review I: Programming

Please note: I'm in Boston this week at the Joint Math Meetings (6100 participants and counting!) for a SIAM minisymposium on computational geometry. If you happen to be in the area for the conference, stop by our session on Thursday at 8:30am.

I always assign programming questions in my graduate algorithms class. In the beginning, I used the ACM programming competition server, and also tried the Sphere Online system. This semester, I didn't do either, designing my own homeworks.

Why do I ask students to code ? There are many reasons it's a good idea to get students to program in a graduate algorithms class. There are some additional reasons in my setting: a majority of the students in my class take it as a required course for the MS or Ph.D program. They don't plan on doing research in algorithms, many of them are looking for jobs at tech companies, and many others will at best be using algorithmic thinking in their own research.

Given that, the kinds of programming assignments I tried to give this semester focused on the boundary between theory and practice (or where O() notation ends). In principle, the goal of each assignment on a specific topic was to convert the theory (dynamic programming, prune and search, hashing etc) into practice by experimenting with different design choices and seeing how they affected overall run time.

Designing assignments that couldn't be solved merely by downloading some code was particularly tricky. As Panos Ipeirotis recommended in his study of cheating in the classroom (original post deleted, here's the post-post), the key is to design questions whose answer requires some exploration after you've implemented the algorithm. I was reasonably happy with my hashing assignment, where students were asked to experiment with different hashing strategies (open addressing, chaining, cuckoo hashing), measure the hash table behavior for different parameter choices, and draw their own conclusions. There was a fair degree of consistency in the reported results, and I got the feeling that the students (and I!) learned something lasting about the choices one makes in hashing (for example, simple cuckoo hashing degrades rapidly after a 70% load ratio).

I did have a few disasters, and learned some important lessons.
  • If I expect to run student code on my test cases, I HAVE to provide a test harness into which they'd merely insert a subroutine. Otherwise, merely specifying input and output format is not enough. Students are not as familiar with a simple UNIX command line interface as I expected, and use pre-canned libraries in ways that make uniform testing almost impossible. Note that this is not language-independent, unless I'm willing to provide harnesses for the myriad different languages people use.
  • If you do want to accept code without using the 'subroutine-inside-harness' form, then you need some kind of testing environment where they can submit the code and get answers. This is essentially what ACM/TopCoder/SphereOnline do. 
  • If you're accepting code from students, use Moss or something like it. It's great for detecting suspiciously common code. When I used it and detected duplicates, in all cases the students admitted to working together.
  • It's easier to design assignments where the submitted product is not code, but some analysis of the performance (like in the hashing assignment above). Firstly, this is platform independent, so I don't care if people use Java, C, C++, C#, python, perl, ruby, scheme, racket, haskell, .....Secondly, it avoids the issue of "did you write the code yourself" - not entirely, but partly.
But I'll definitely do this again. Firstly, it's good for the students - from what I hear, having written actual code for dynamic programming makes it much easier for them to write code under pressure during their tech interviews, and I've heard that knowledge of DPs and network flows is a key separator during tech hiring. Secondly, even for theory students, it helps illustrate where asymptotic analysis is effective and where it breaks down. I think this is invaluable to help theoreticians think beyond galactic algorithms.

Of course, no discussion of programming in algorithms classes is complete with a reference to Michael Mitzenmacher's exhortation.


  1. Language can matter in a big way though I normally leave it open. One of the assignments I occasionally give is to determine the breakpoint for switching over from Strassen's to ordinary matrix multiply when doing the recursion. Though this works fine in C where the dimensions are quite small; in Java with its horrible handling of 2-d arrays it can be a disaster.

  2. For tech hiring, knowing network flows is probably overkill, but being comfortable with writing dynamic programs is definitely useful.

    (who has conducted ~180 interviews at Google)

  3. As a former student and by having a lot of discussions with my professors about assignment format, I believe that you tackled all the major issues.

    Although assignment selection needs special care from course to course, your general framework seems to me as very effective.

  4. As a replacement for Topcoder etc., you may want to take a look at the PC2 programming contest software (if you haven't already). It is very easy to set up (despite the daunting loooong guide), you can have your own tests, do automated testing (at least to ensure correct input/output formats). It can handle c++/java etc. It is stable software, and although not meant for this purpose, it can be tweaked to simply accept the code and test it (I think I had the server running for days on my linux box, accepting submissions).


  5. Hi,
    I just stumbled onto this blog while searching for information on the company that created the shape-shifting ruler that connects to make all different shapes. Somebody sent me this link and this thing looks really cool--

    Anyway, nice blog... I'll bookmark it and share it with my mathlete buddies!


Disqus for The Geomblog