## Monday, May 5, 2008

### An algorithm for solving Sudoku puzzles

In the previous post, Impossible Sudoku, I said that there exists a general algorithm that allows to solve any Sudoku puzzle. In this post I will present this algorithm and the solutions to the Sudoku puzzles I posted in the previous post.

First of all, in order to talk about solving Sudoku we need to define what a proper Sudoku is - A proper Sudoku puzzle is a puzzle that has unique solution. It is important to notice that in this definition there is no requirement that a proper Sudoku can be solved - the only thing we require is that there exists a way, and its is only one, to put the missing numbers in the puzzle without breaking the nine in row/column and nine in square rules.

In this post I will talk only about proper Sudoku puzzles. There are 6,670,903,752,021,072,936,960 such Sudoku by the way...

Now that we have this definition we can talk about solutions. Lets prove that all proper Sudoku puzzles can be solved:
A Sudoku puzzle it considered to be solvable if there exists a finite series of step that will produce a filled grid. T o prove that for all proper Sudoku puzzles such a series exists we will need to use linear algebra. If you are not familiar with it please, skip this paragraph. Lets define the following vector space:

$V=[(a_{1},a_{2},....,a_{81})|a_{i}\in Z_{9}]$
This is a 81 dimensional vector space. $Z_{9}$ is the field of natural numbers mod9. Now we can talk about a function from the group of all Sudoku puzzles to V. This function is not a nice one - it moves all the 1 in the puzzle to 0 and 9 to 8, so it loses information. However this is not important. What is important is that for proper Sudoku its solution is moved by the function to a unique vector in V. Because V is a vector space with finite dimension it must have a basis. The standard basis of V is clearly (1,0,0,...,0),.....(0,0,0,...,1). Lets selected one proper Sudoku. The solution of our Sudoku can be expressed therefore as a finite linear combination of these vectors. The problem is that each of them can appear more then one in the combination - we don't want this to happen. Therefore we will sum all the appearance of the same vector - for example if (1,0,0,...,0) appears 5 times in the combination we will write (5,0,0,...,0) instead of it. It is very easy to see that the puzzle we chose can also be written as a sum of part of the linear combination we got in the previous step - so lets sum a part of it to get the puzzle. What we now have is the following equation:
$U+v_{1}+...+v_{k}=w$
in this equation U is our original puzzle and w is the solution. $v_{i}$ are the remaining vectors in the linear combination. Each of them represents a number and a cell, so adding one of them is filling a cell. We already saw that k is a finite number, so indeed there exists a finite series of steps that solvs a certain proper Sudoku puzzle.

You are probably wondering why I wrote this prove - isn't it obvious that Sudoku puzzle can be solved? Well, yes. But if it is obvious, why not prove it? After the prove it is no longer obvious - it is just a plain fact.

Now that we have shown that there is a solution lets talk about how to find it. The algorithm consists of three main steps that are repeated until a solution is reached:

Step one - Scanning:
Cross-hatching: The scanning of rows to identify which line in a region may contain a certain numeral by a process of elimination. The process is repeated with the columns. It is important to perform this process systematically, checking all of the digits 1–9. It is usually faster to check for a digit in all of the grid at once than to check square by square.
Column check: After cross-hatching it is usually a good idea to use elimination on columns. It is done in the same manner as with squares.

Step two - Logic:
This step consists of attempts to make logical conclusions. For example in the puzzle above you don't know were 4 is in the center square. But you now that it must be in the right column of this square. You can use this information to conclude that it cannot be in the right column of the top-middle square. In this case this is enough to find 4 in that square.
Step three - Guess:
If you don't know you can always guess. In this step it is important not to make mistakes - otherwise it will not work.
If you want to make a guess you will need to find a good digit to guess. The best option is to find a cell (or row/column) with two candidate numbers. After finding such cell make a guess and then repeat steps one and two.
If you will get an impossibility at some point than your guess was wrong - and because there were only two candidate numbers it is the second number that should be in that cell.
If after repeating steps one and two you need to make another guess repeat step three - if your second guess is incorrect it means that your previous guess requires the second candidate in the cell you choose for the second guess.

Notes:
1. Easy and medium puzzles can be solved using first step only. Hard puzzles require step two, and very hard step three.
2. Don't overuse step three - it makes the game much less fun.
3. For step three you will need to keep track of the numbers you wrote after the guess. The Gnome-Sudoku I mentioned in the previous post gas an inbuilt tracker function for cases like this.
4. It is possible to add another step to the algorithm - marks. It basically consist of recording all the candidates for a certain cell on the puzzle itself. I don't think that it is needed, and it is time consuming.

And now as promised - solution to the puzzles from the previous post:

brundinbar said...

I solved the similar problem, the solver I've created starts with a "logical" solving - it controls rows, columns, squares, etc. After all the possibilities are spent, the second section comes up - backtracking :)

Anatoly said...

Hello Brundibar,
Thanks for the link. I am going to write a separate post about you program, it certainly looks useful.

Mucha said...

Hi, it's some time ago I've programmed the application, I can't remember all the steps in detail. I'd like to refactor the program soon, then I describe the steps more clearly.

Have a nice day,
Brundibar

woerner1 said...

There is never a need to guess at the sudoku solution (step 3). I came across a solution after working on about 10 'diabolical' level puzzles and haven't had to guess since. I've spent the last couple days creating a program to create tables from scratch. I finally finished it, and it takes approximately .5 seconds to fill the grid. I am creating a total sudoku program that will create grids from scratch, or you can fill the squares in from the numbers given (in a newspaper, for example). The individual program level settings will then optionally provide hints for your next available move and guide you through the to the puzzles solution. I will probably be offering this program for free download, and the dll library routines separated for purchase if anyone wants to create their own programs with them. I will also have the complete source code available. I will post more info here about where you can download the program, view the algorithm of why its not necessary to guess, and optionally buy the separate libraries for interested parties.

pbsl said...