Monday, February 08, 2010

Good prototyping software

All the code for my recent paper was written in MATLAB. It was convenient, especially since a lot of the prior work was in MATLAB too. I actually know almost no MATLAB, preferring to do my rapid prototyping in C++ (yes, I'm crazy, I know).

Which brings me to a question that my lurking software hackers might have an answer to. If I have to invest in learning a new language for the empirical side of my work, what should it be? Here are some desiderata:
  • Extensive package base: I don't want to reinvent wheels if I can avoid it. In this respect, the C++ STL is great, as is Boost; Python has PADS as well as many other nifty packages. MATLAB is of course excellent.
  • Portability: I'm sure there's an exotic language out there that does EXACTLY what I need in 0.33 lines of code. But if I write code, I want to be able to put it out there for people to use, and I'd like to use it across multiple platforms. So Lua is not so great, at least in the research community (sorry, Otfried).
  • Good I/O modules: if I write code, I'll often want to send output to a graph, plot some pictures, etc. Some systems (MATLAB, for one) are excellent for graphing data.
  • Performance: I don't want to sacrifice performance too much for ease of coding. I've always been afraid of things like Java for this reason. Of course, I'm told I'm dead wrong about this.
I deliberately haven't listed 'learning curve' as a criterion. If the language merits it, I'm willing to invest time in switching, but obviously the benefits have to pay for the time spent. In terms of background, I'm most familiar with C++, and am a nodding acquaintance of Python and Perl. I occasionally nod at MATLAB in the street when I see it, but usually cross the road to the other side if I see Java approaching. I used to be BFF with OpenGL, but then we broke up over finances (specifically the cost of GPUs).

Thoughts?

20 comments:

  1. I suggest looking into Sage http://www.sagemath.org

    It fits your description because:

    1. It is based on Python, which is intuitive, widely used, and well supported, and which encourages good code style by design.

    2. I/O: Besides Python's basics, there are many modules for various tasks, from reading different file formats to writing data to network ports, websites, images, binary formats, etc.

    3. For numerics and I/O, don't forget NumPy/SciPy (which are part of Sage). That's usually the first thing to check out when coming from MATLAB.

    4. Speed: Python is an interpreted, highly dynamic language, which makes it hard to make fast. But that's not the whole story. Sage uses a two-layer approach, where (if possible) all time-critical code is either written in C and interfaced to Python, or written in Cython. Cython is a Python-like language whose compiler generates C code for a Python extension module, and that C code is then compiled as usual. With this approach, you get the full speed of C for time-critical operations while still coding in a familiar language. http://www.sagemath.org/tour-benchmarks.html

    5. Documentation: Check out the various chapters here: http://www.sagemath.org/doc/

  2. I agree: Python has an amazing collection of mathematical libraries and, since it is interpreted, it is very easy to experiment with changes and with different inputs. It works on any platform and it is very easy to learn, to the point that even I have learned it.

    Because it is a general-purpose language rather than a system specialized for math, there are libraries for the most unlikely things. I believe there are libraries to decompress MP3 music, so when you experiment with your clustering algorithms, you could try clustering songs :)

  3. I like the idea of clustering music ;)

  4. I use Mathematica and Fortran. For an open-source and portable option, Scilab would be my first try. Despite the inclusion of NumPy/SciPy, Python is not at the top of my list because speed matters A LOT in my job. I have some experience with MATLAB, but since Mathematica came first in my computing life, I tend to stick with it.

  5. Python, R, Gnuplot.
    I'm itching to write a short monograph along the lines of "Unleash your inner Freakonomist with Python and Gnuplot" :-)

    However, Python does struggle with memory management. For example, a 1-million-node graph with average degree 50 is a piece of cake to work with in C++ (assuming you have 3GB of memory), but could be a nightmare to play with in Python.
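    A back-of-envelope sketch of the gap, under stated assumptions: a 64-bit CPython, 4-byte C++ ints, and "average degree 50" read as 50 million adjacency entries (25 million undirected edges, each stored at both endpoints). A C++ compressed-sparse-row (CSR) layout costs about 4 bytes per entry, whereas a Python list of lists pays an 8-byte pointer per entry plus an int object per entry when those objects are not shared. The standard-library `array` module stores C ints contiguously and narrows the gap for storage. All function names below are illustrative, not from any particular library.

```python
import random
import sys
from array import array


def estimate_bytes(n_nodes, avg_degree):
    """Rough sizes for CSR in C++ vs. list-of-lists in CPython (assumptions above)."""
    entries = n_nodes * avg_degree                  # sum of degrees = adjacency entries
    csr = 4 * entries + 4 * (n_nodes + 1)           # int32 targets + int32 offsets
    per_entry = 8 + sys.getsizeof(n_nodes)          # 8-byte slot pointer (64-bit) + unshared int object
    py_lists = entries * per_entry + n_nodes * sys.getsizeof([])  # plus one list header per node
    return csr, py_lists


def build_csr(n, edges):
    """Pack an undirected edge list into two C-int arrays (CSR), like a pair of std::vector<int>."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = array("i", [0] * (n + 1))
    for i in range(n):
        offsets[i + 1] = offsets[i] + degree[i]
    targets = array("i", [0] * offsets[n])
    fill = offsets[:n]                              # next free slot for each node (a copy)
    for u, v in edges:
        targets[fill[u]] = v
        fill[u] += 1
        targets[fill[v]] = u
        fill[v] += 1
    return offsets, targets


def neighbors(offsets, targets, u):
    """Neighbors of node u as a slice of the packed targets array."""
    return targets[offsets[u]:offsets[u + 1]]


def list_of_lists_bytes(adj):
    """Upper bound: counts every int object separately (CPython shares small ints)."""
    total = sys.getsizeof(adj)
    for nbrs in adj:
        total += sys.getsizeof(nbrs) + sum(sys.getsizeof(x) for x in nbrs)
    return total


if __name__ == "__main__":
    csr, py_lists = estimate_bytes(1_000_000, 50)
    print(f"C++ CSR ~{csr / 2**20:.0f} MiB, Python lists ~{py_lists / 2**20:.0f} MiB")

    # Small random graph to compare the two layouts in this interpreter.
    rng = random.Random(0)
    n = 2000
    edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(20000)]
    offsets, targets = build_csr(n, edges)
    adj = [list(neighbors(offsets, targets, u)) for u in range(n)]
    packed = sys.getsizeof(offsets) + sys.getsizeof(targets)
    print(f"array CSR: {packed} bytes, list-of-lists: {list_of_lists_bytes(adj)} bytes")
```

    The packed `array` layout keeps storage close to the C++ figure, but any Python-level loop over it is still interpreted (each element read creates a Python int), so it helps memory rather than speed.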

  6. Siva: that's my worry too, that I'd need to do something at scale and would have to reimplement everything in C++.

  7. Suresh: doesn't your most recent comment contradict the title of the post?

    I guess it depends on what you're doing, but as much as I'd rather not, I vote for MATLAB (if it's mostly numerical; data structures are a PITA) or Python. If you're worried about Python speed, you can import psyco.

  8. Why do you care about speed (of execution) or portability for a prototype? Isn't the purpose of a prototype to explore the problem domain (quickly), so that you can figure it out and move on? You write a prototype not really to solve a problem, but rather to understand it.

    Perhaps if you want to solve only one problem, you could do it faster in C++ than in Python or some other more problem-specific language, due to a lack of working knowledge of the appropriate tools. However, I doubt that would be true of, say, five problems. Surely you are smart enough, and working on hard enough problems, that the effort of becoming familiar with nice languages like Python is dwarfed by the effort of actually solving the problems you are trying to solve. Compare yourself with a hypothetical forked copy of yourself that bit the bullet and developed an effective workflow with tools more appropriate than C++: the fork will surely be better off in the long run.

    Sure, there are circumstances in which C++ is the right tool from the start: perhaps your problems are so big that they can easily be handled in C++ but not in an inefficient language, and you can't learn anything from a toy subset of the problem. Or maybe you just flat out aren't writing a prototype (be honest: if you plan on keeping it around, it's not a prototype).

    Rob (who makes his living writing mostly C++ code, but occasionally has the joys of getting paid to write Python)

  9. As someone who has worked with both Java and C++ for several years: you're not dead wrong about Java. For the things you want, Java is an inferior performer. That said, it's not as bad as you make it out to be. But is it worth learning? Probably not, especially since its place in the world is changing.

    As for other languages, I'm not sure why Python would offer such an edge for the stuff you're looking to do. You're already comfortable enough with C++ to prototype stuff easily. What does Python really give you? There are native libraries that do the same things as Python libraries (in fact, many of the latter are really just wrappers around the former). But I speak as someone who has not bothered to learn Python, so really this is just my outlook on it.

    My personal opinion is that if you're going to teach yourself a new language you may as well learn one that gives you a different way of looking at problems. Maybe a functional language? I've heard good things about Scala.

  10. These are all great comments and make me think very carefully about what exactly I want to do with my code. I like the idea of learning a language that gives me a completely different angle to attack a problem. Don't know whether a functional language is the answer though.

  11. I'll echo the support for Python. Performance is its biggest weakness, but for me the lower programming time makes it worth it (which leaves a lot more time for algorithm improvements compared to, say, C++).

    One approach could be to explore the problem with Python and, once you know how to solve it, rewrite it for speed in C/C++ (either completely or piecemeal; Python plays nicely with C).
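    A minimal sketch of the "Python plays nicely with C" point, using the standard-library `ctypes` module to call a compiled C function directly. Here the C math library's `cos` stands in for a hot loop you would have rewritten in C and compiled into your own shared library; it assumes a POSIX system where `libm` can be located. The names `c_cos` and `cos_all` are illustrative.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library (e.g. "libm.so.6" on Linux). Assumes POSIX: if the
# lookup returns None, CDLL(None) falls back to symbols already loaded in this process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature double cos(double) so ctypes converts arguments correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the compiled C function as if it were ordinary Python."""
    return libm.cos(x)


def cos_all(xs):
    """Prototype-style driver: the loop stays in Python, the per-item work is in C."""
    return [c_cos(x) for x in xs]


if __name__ == "__main__":
    for x in (0.0, 1.0, math.pi):
        # The C result should agree with Python's own math.cos.
        print(x, c_cos(x), math.cos(x))
```

    Calling through `ctypes` has per-call overhead, so for a real speed-up the whole loop, not just one call per element, should move into C; the sketch only shows the calling convention.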

    However, since Google and many others are Python enthusiasts, there are several major efforts underway to improve performance. Unladen Swallow could eventually increase performance by an order of magnitude (it's probably a year or two from being practical), and it might end up being built directly into Python 3.

  12. Java (the JVM) will certainly outperform Python on just about any task. Use a decent language on top of the JVM, such as Scala or Clojure, and you have a good combination. There is a stats/ML package for Clojure called Incanter which might be a good place to start.

    I think you really should learn a functional language. It is so much closer to the problem domain, and will give you a new way of thinking about programming.

    Finally, I do all my work in PLT Scheme, but I do have to reinvent some wheels.

  13. OCaml: fast, portable, functional. It can be both compiled and interpreted, and is very fast for prototyping.

  14. Thanks, hsy, for linking to Sage. It does seem interesting, although Mathematica sure has it beat in interface. (I have never seen uglier "pretty printing" in my life.)

  15. You might also want to consider C# and LINQ. The latter makes programming on large amounts of data very flexible and easy, and the code is often closer to the way you think about the algorithm. As an added bonus, you should eventually be able to run things on a cluster in the cloud with little modification.
    (Full Disclosure: I work at Microsoft)

  16. +1 for C#; you can use your code from any .NET language and you can use any libraries written in other .NET languages (includes C++, C#, Python, Ruby, F#, VB, ...).

    The managed runtime is a good compromise: much faster than interpreted languages like Python, but only slightly slower (roughly 20%) than optimized C/C++. Having a managed runtime will also give you a productivity boost over C++.

  17. Perl Data Language is another good option. And CPAN is second to none in providing an "extensive package base".
    Perl is also faster than Python for many things, though in reality it might not matter: in such number-crunching applications, both are mostly glue layers over the C libraries that handle the actual math.

  18. Suresh,

    I would be curious to know your final decision.

  19. I think I'll probably go with python, and maybe use SAGE as and when I need it.

  20. Do you really *need* a language that has a compiler? Why not create your own notation and then "hand" compile it to whatever existing languages work best?


Disqus for The Geomblog