## Saturday, May 28, 2005

### Sudoku OK, kudos UK

Thanks to Suresh for the introduction and for the opportunity to add a few posts to the Geomblog. Since it is a "holiday" weekend in the US and elsewhere, I thought I would begin with some posts on puzzles and games from a computer science perspective.

As Lance observes, there seems to be a sudden and unexplained interest in Sudoku in my home country. It felt very odd to read about this while living in the US, since it makes the extraordinary interest this seems to have generated (newspapers publishing tens of puzzles every day and competing with each other for who has the best puzzles) seem very abstract and hard to imagine. Of course, since I was reading about this in an online newspaper, it's quite possible that they were merely hyping the phenomenon in their own interest.

The game is pretty simple: one is given a 9 by 9 grid, formed of 9 subgrids of size 3x3 each. Some digits from 1 to 9 are filled in some of the squares, and the goal is to complete the grid under the constraints that each digit must appear exactly once in each row, column, and subgrid.

I tried some of these puzzles online at http://www.sudokufun.com/sudoku.php and the guardian, and finished them in an hour or so each. As a game, I didn't find it very satisfying: it didn't seem to require too much wit or inventiveness to finish. Once I picked up a few heuristics, it seemed to follow pretty easily from repeated application of the Sherlock Holmes method: eliminating possibilities. Perhaps it takes a few more tries to get addictive. There is certainly something rewarding about the "endgame" where the last few squares can get filled in very quickly indeed.

As a good computer scientist, my first thought was also how to generalize this puzzle. If we call this version Sudoku-3, then one can easily imagine how Sudoku-n will look like, and very quickly (by websearching, of course...) discover that the decision problem is NP-Complete (that is, to decide whether a given instance has any solution). Is this the end of it? Not really. There are many other questions that spring to mind that I do not know the answer to:

• Counting version: How many different filled in Sudoku-n grids are there? Let me get you started: For n=1, the answer is 1. For n=2... actually, I don't know, but there are at most (4!)^4 ways to arrange 4 lines of permutations of {1,2,3,4}, so checking all 300,000-ish possibilities won't take too long on a modern PC. Someone who is a whiz with mathematica could probably do this in a few lines of code. But for 3, one will need some smarter techniques to do the counting, perhaps building up satisfying sublocks, and using symmetries more cleverly. I couldn't spot any entries in Neil Sloane's encylopedia of integer sequences for this problem yet.

• Algorithms. One important feature of the puzzles that are printed is that there is exactly one solution. For these the decision problem is simple: the answer is always "solvable!" However, given this fact, is there now a simple algorithm to find the solution which does not resort to exhaustive checking? here someone has laboriously gone through an example of how to solve the puzzle by repeatedly applying simple elimination rules. Can you prove that this is always sufficient?

There's a danger of a tautology here: apparently many of these puzzles are generated by computer and then checked automatically that there is a unique solution. So the difficulty of these puzzles is only as hard as the complexity of the checker! In other words, if the checker runs the elimination algorithm to verify that there is a unique solution, then it will reject some grids that can be solved uniquely if they do not yield to the elimination method. Indeed, some grids are crafted by human designers and these are supposed to be superior (although, I do not know how easy it would be to distinguish machine generated from human). I tried one of each, and although the human generated puzzle took longer and seemed to require some additional rules to finish, it would be hard for me to say where the creativity comes from.

• Puzzle generation. Staying on this theme, how could you generate a random instance of a puzzle to give to someone to solve? Randomly filling in a few entries then checking whether there is a (unique) solution might work. But now how do you make this a good puzzle? Perhaps the generated instance is too easy (eg, if very many entries are filled in already, or all 9 copies of one digit are filled in). How to tune the algorithm to generate good grids, or ones at differing levels of difficulty? I suspect that this problem very rapidly becomes a subjective matter, and comes down to a variety of heuristics, but perhaps one could define some metrics or desired properties and make it more rigourous.

• (Philosophical -- Extra Credit). Why is this a craze? What is it about these puzzles that makes them enjoyable? As has been proved by many computer scientists, many popular games and puzzles, from Tetris and Minesweeper, to various block pushing and object arranging games, are NP-Hard. Does this explain why we enjoy them? Is there something common to these "constraint satisfaction problems" that makes them enjoyable? There's something a little wrong with this hypothesis though: the instances of these games used to prove hardness have a tendency to be very repetitive, and boring seeming. For instance, the reduction used to prove Tetris NP-Hard relies on a very regular initial arrangement and many repetitions of the same block in sequence. When I play Tetris, I enjoy much more the application of essentially simple heuristics to try to produce an arrangment that can adapt to whatever happens to come next. When I play Lemmings, I know that the level designers have in mind some sneaky way of using the physics of the game in order to win the game. The reason I didn't enjoy my first games of Sudoku very much was because it seemed too deterministic: I wanted some non-determinism, but not too much!

So, I haven't really been hooked by this craze. I think I will stick to my preferred newspaper puzzle, the cryptic crossword, of which more in a later post...

1. There are 288 Sudoku-2 boards, found by enumerating the 12 boards which have 1234 as the first row by hand, and noting that all other boards are obtained from one of these by replacing 1, 2, 3, and 4 by one of the 4! permutations of them.

The number of Sudoku-3 boards is > 3*10^14, and < 5524751496156892842531225600 ~= 5*10^27: lower bound from enumeration (before I killed it), upper bound from the number of 9x9 latin squares (from Neil Sloane's OEIS ).

In fact, the enumeration had found >10^9 boards starting:
123|456|789
456|789|123
789|123|456
---+---+---
2
when I put it out of its misery.

Posted by David Applegate

2. Wow -- so no danger of the brit newspapers of running out of puzzles to run!

cheers
graham

Posted by Graham Cormode

3. Well, thank you - it is fascinating to see how you became hooked on this.

My first reaction on discovering them was to complete the puzzle but before I completed it, I realised that it would be more fun to write a program to solve them, which I did after a little thought to the algorithm over lunch.

Then the Independent issued a 16 x 16 Super Sudoku puzzle, so having foolishly programmed with absolutes, I had to re-write using a table-driven algorithm. So now, when we get the Christmas 25 x 25 or worse, I can modify the program very quickly to add these options, and everything down to the type sizes should automatically change.

The problem I have found is with creation. I haven't managed to determine a creation algorithm so ended up by using my solver in the background to create a random puzzle then removing all but a fixed proportion of the answers, leaving a symmetrical pattern of numbers behind for the solver to solve, more for the "easy" puzzles than for the "difficult" ones.

I haven't worked out yet how to determine the difficulty of a puzzle and in solving them and building them with my method, one could unwittingly create a puzzle with multiple solutions, although again, how to determine that in less than a couple of seconds of processing would be difficult. Some newspapers guarantee that there is only one solution, but they are simple puzzles with only one solution route, solved by my solver in a single pass, whereas those with a choice take two or even three passes.

I did find that the processing was greatly speeded up by adding an algorithm to remove alternative options from two doubles or three triples within each row and/or column and/or cube, but the 16 x 16 still take over one second to complete and even longer to create using a randomised two-level table.

My current version is available under the StedmanTim ID on eBay or from the www.LexingSoftware.com website, but if anyone has any suggestions for improvements in creation, and in particular can provide a creation algorithm, I'd be a friend for life!!

Having solved the problem with a workable solution, I'm slightly losing interest, but this would be revived with suggestions for improvement or a better mathematician's input to the algorithm.

And now, 2,866 lines of VB6 code later, I'll go away and do something more useful, like feeding the ducks.

Thank you, Graham, for letting me respond - I'd be most grateful to hear from anyone else.

Chris.

Posted by Chris Stedman

4. Some background knowledge you might be interested in, Chris:
wikipedia.org/wiki/Sudoku has a nice article on both solving and creation of Sudokus.
The largest sudokus that seem to exist are 49x49, published by a Japanese magazine.
There are many programs on the web that solve and create Sudokus, but sudoku.com is a good place to start.

5. I also wrote a quick 200-line solver in ruby. I tried at first for a closed solution, but in the end I found a recursive brute-force method to be easier. It's fixed for 9x9, but everything but the code I use to print it in ASCII should translate easily to nxn.
I'm in pretty much the same boat as you, chris. I can "create" puzzles by blanking out a pattern of numbers from a complete board, but these aren't "real" puzzles like (i assume) they have in the papers.
This  website offers a somewhat-cryptic algorithm for creating the puzzles the "right way", but I think there must be something more elegant out there somewhere.
Oh, and if anybody wants to see my code, just drop me an email.

Posted by peter

6. http://www.yogi.net/sudoku generates random puzzles

Posted by alex

7. Well, but the fact that sudoku (and latin squares) belong to the NP-complete class does involve any important consequence on the whole class itself ? Does it shade new light on the deep nature of NP and its relationship to P ?

Posted by Anthony

8. I am in love with this game. This daily sudoku site has it all...

Engoy

Posted by Keren Simpson

9. do you want a 64x64 sudoku ?

Posted by azfodel

10. I found a way of calculating the solution to (most) sudoku puzzles, which does not involve "logic" or "trial and error," and which is in fact brain-dead simple; it makes me wonder whether sudoku is really a lot simpler than it seems (perhaps that's the wide appeal: it looks hard, but it's not TOO hard). You can read about PSS, my probabilistic sudoku solver by clicking on the ling below.

Posted by Michael A. Gottlieb

11. I recently created a PHP version of a Sudoku generator. The results can be found on my website. The class uses an elimination strategy. I have created a debug page so you can see what it's doing each step of the way.

Posted by Chris

12. I would be very interested in seeing that php sudoku generator, Im running an excellent one at a dailysudoku website, perhaps I can help you improve upon it.