Thursday, July 09, 2009

NSF EDA Workshop: Wrap up

Today morning was wrap up day. The theory group was "tasked" with coming up with a few slides suggested avenues for further cooperation. Dick talked about creating (or re-creating) an environment in academia where researchers are building chips and talking to other experts down the corridor (like theoreticians) rather than the 'planted' model where a theoretician is planted in a corporate design environment for 6 months or so.

I talked briefly about what I called "new" theory: the twin ideas of randomization and approximation that have been so powerful in algorithm design (even for intractable prolems), and how these play into the problems of dealing with massive high dimensional data.

Vijaya gave a synopsis of cache-obliviousness, multicore research, and her recent work in this area. There's a lot of interest in EDA and multicore work, and she made the argument that theoreticians are learning how to design efficient multicore algorithms and can be of great help in adapting EDA methods.

There were also four sub-panels that addressed various aspects of EDA. What to me was interesting that many of the panels brought up "EDA needs in theory" that matched some of the things we talked about. For example, there's a pressing need for methods that run linear (and even sublinear) time, tools that are incremental, in that they can progressively generate better and better solutions given more time, and tools that can parallelize well.

They also talked about providing benchmarks: for example, a 10,000 variable SAT instance and code that solves it. I thought (and said) that this would be an excellent way to get people dabbling in this area.

Randomization was a trickier matter. Although the EDA folks recognize the power of randomization, they are concerned about reproducibility. The DA pipelne is long and involved, and once you've fixed the output of a piece of the pipeline, you'd like to keep it in place and not change from iteration to iteration.

Of course this is something that can be fixed with seed control. For more portability, it's conceivable that you can cache the random coin tosses once you've settled on a reasonable solution to any piece in the pipeline.

Overall, it was quite interesting, although exhausting. The general idea with such workshops is that the findings make their way into the next call for proposals (sometime in October/November), so if you have EDA people you'd like to collaborate it, this might be a good opportunity.

It's time to go home.


  1. I admit some confusion as to why "randomness" is considered such a problem? In the real world, usually randomness is pseudo-randomness (often something simple, perhaps lacking in provable properties but functional nonetheless), and you can essentially make your "randomized algorithm" deterministic by remembering the small seed. So why is randomness going to be a problem?

    (I'm not being flip -- I'm really interested! If they view it as a problem, there's probably some research to be done to make it not a problem in this context.)

    I'm cc'ing the comment at Lipton's blog...

  2. We need a way to link comments across blogs !

    To answer your question, Michael, my understanding is this is part user driven and part problem driven.

    From the system perspective, there's an entire pipeline of components that go into making a design, and they like to fiddle with these pieces repeatedly (the incremental part). Having a randomized element that could change its answer the next time the whole pipeline is compiled is a problem.

    Now this can be changed with a litte tweaking: as you mention, as long as the seed can be preserved, an algorithm can be run in "randomized" or "deterministic mode": in the latter form, it merely uses the cached seed.

    But here's the user side of the story. They want these designs to be portable across different machines/systems, so that different users, running with the same "knobs", get similar answers.

    This is because the user of such a tool is faced with a bewilidering array of knobs, and over time has become accustomed to setting certain knobs a certain way. Changing things around makes them unhappy.

    Now this latter problem is sociological, and is difficult to deal with for us. But I suspect the best approach is to ignore it for now. It's really the companies that are concerned with satisfying their user base, and it's not clear to me why research-level tools need to live under the same constraints. My personal view is that the underlying facilitating technologies for EDA research appear to be very tied to corporate products, and this is somewhat of a problem. To me it's as if all database researchers had to use Oracle, and couldn't really complain about it if it wasn't very useful.

    I will also Cc this over at @rjlipton.

  3. "They also talked about providing benchmarks: for example, a 10,000 variable SAT instance and code that solves it."

    Right.... Then as a theorist you can show how if they had that, they could answer unsolved problems in Ramsey theory very quickly and be heroes of the discrete math community. R(5,5) and R(6,6) would be resolved immediately.

    Let me know who's up to the challenge. I'll give them the SAT instances (if you haven't figured them out already.)

  4. anon: I'm a little confused about your comment. could you clarify ?

  5. Sure. The question "does R(k,k) = n?" can be phrased as a SAT problem with n choose 2 variables and 2(n choose k) clauses. Assign each edge in K_n a variable so that false is red and true is blue. For each k-clique in K_n, create two clauses to constrain the variables so that not all the variables can be equal. If you can satisfy the formula, then you have a two-coloring where there is no monochromatic k-clique and thus know that R(k,k) > n. If it is unsatisfiable, then you know R(k,k) <=n.

    Now, R(5,5) is unknown. But we do know that 42 < R(5,5) < 49. Thus can see that the SAT formula involved in pinning the number down have much less than 10,000 variables. I don't think any SAT solver can answer that at the moment.

    Wow...moderating comments at 37,000 feet is cool! You might be the first blogger in history to do this!

  6. ok. so that's what I thought you meant. Remember that I didn't say that they could solve ALL 10,000 variable instances (as an aside, I actually forget the upper limit of what they can solve). They have solvers that can solve a number of large instances that arise in practice, and one question that they'd like to understand is why some instances are solvable and some aren't.

  7. Fair enough. I'd rather have a SAT solver that solves all instances up to 1000 variables than one that does special cases of 10000-variables instances.

    I guess the theory question would be: can the instances of SAT that they're interested in be "reduced" to problems known to be in P? My guess is yes but it's easier for them to write them as SAT problems.

  8. Indeed. and see Michael Mitzenmacher's comment at Dick Lipton's blog:

  9. I'd rather have a SAT solver that solves all instances up to 1000 variables than one that does special cases of 10000-variables instances.

    I don't think we should discount these solvers. Yes they may not solve your R(k,k) problem, or your factoring problem, but they are actually great at solving a large number of real problems. I think formal verification is one great example: they can often phrase the problem of checking correctness of a program as a SAT instance and these SAT solvers work great and are used in practice.

    Also, I am not sure what you would do if I gave you an oracle that could solve all 1000 variable SAT instances. :)


  10. Also, I am not sure what you would do if I gave you an oracle that could solve all 1000 variable SAT instances. :)

    Oh, I'd probably have more fun with a 1000-variable SAT solver than I would writing an NSF grant proposal with a working title of "Why people who solve formal verification problems are less miserable than theoretical computer scientists think they should be." (Of course, that title would have to change in the final draft.) But to each his own.


Disqus for The Geomblog