less_retarded_wiki/sudoku.md
2022-09-28 21:42:32 +02:00

7.4 KiB

Sudoku

Sudoku is a puzzle that's based on filling a grid with numbers that is hugely popular even among normies such as grandmas and grandpas who find this stuff in magazines for elderly people. The goal is to fill in all squares of a 9x9 grid, prefilled with a few clue digits, with digits 1 to 9 so that no digit repeats in any column, row and 3x3 subgrid. It is like a crosswords puzzle for people who lack general knowledge, but it's also pretty suckless, pure logic-based puzzle whose generation and solving can be relatively easily automatized (unlike generating crosswords which requires some big databases). The puzzle is a pretty fun singleplayer game, posing opportunities for nice mathematical research and analysis as well as a comfy programming exercise. Sudokus are a bit similar to magic squares.

Curiously sudoku has its origins in agricultural designs in which people wanted to lay out fields of different plants in more or less uniform distributions (or something like that, there are some papers about this from 1950s). The puzzle itself became popular in Japan in about 1980s and experienced a boom of popularity in the western world some time after 2000.

The following is an example of a sudoku puzzle with only the initial clues given:

 _________________
|  3 1|  5 7|  6  |
|     |9   8|  4  |
|4_7_8|6_ _2|1_ _5|
|7   5|  6  |4    |
|    6|  8 1|7 2  |
| _ _ |7_ _3|6_5_ |
|5 6  |  9  |    2|
|     |1   5|9   6|
| _ _3|8_2_6| _ _4|

The solution to the above is:

 _________________
|9 3 1|4 5 7|2 6 8|
|6 5 2|9 1 8|3 4 7|
|4_7_8|6_3_2|1_9_5|
|7 1 5|2 6 9|4 8 3|
|3 4 6|5 8 1|7 2 9|
|2_8_9|7_4_3|6_5_1|
|5 6 7|3 9 4|8 1 2|
|8 2 4|1 7 5|9 3 6|
|1_9_3|8_2_6|5_7_4|

We can see neither digit in the solution repeats in any column, row and any of the 9 marked 3x3 subgrids or, in other words, the digits 1 to 9 appear in each column, row and subgrid exactly once. These are basically the whole rules.

We generally want a sudoku puzzle to have initial clues such that there is exactly one possible (unique) solution. For this sudoku has to have at least 17 clues (this was proven by a computer). Why do we want this? Probably because in the puzzle world it is simply nice to have a unique solution so that human solvers can check whether they got it right at the back page of the magazine. This constraint is also mathematically more interesting.

How many possible sudokus are there? Well, this depends on how we view the problem: let's call one sudoku one grid completely filled according to the rules of sudoku. Now if we consider all possible such grids, there are 6670903752021072936960 of them. However some of these grids are "basically the same" because we can e.g. swap all 3s and 5s in any grid and we get basically the same thing as digits are nothing more than symbols here. We can also e.g. flip the grid horizontally and it's basically the same. If we take such things into account, there remain "only" 5472730538 essentially different sudokus.

Sudoku puzzles are sometimes assigned a difficulty rating that is based e.g. on the techniques required for its solving.

Of course there exist variants of sudoku, e.g. with different grid sizes, extended to 3D, different constraints on placing the numbers etc.

Solving Sudoku

There are two topics to address: solving sudoku by people and solving sudoku by computers.

Humans almost exclusively use logical reasoning techniques to solve sudoku, which include:

  • scanning: We take a look at some frequently appearing number in the grid and see which columns and rows they intersect which implies they cannot be placed in those columns and rows, possibly revealing the only possible location to place such number.
  • single remaining candidate: When there is only one number left to fill in any column, row or subgrid, it is always clear which one it is and can be safely placed.
  • candidate sets: An advanced technique in which we create sets of possible candidate numbers for each square on the grid e.g. by writing tiny numbers in the top corners of the squares. We then apply various reasoning to reduce those sets, i.e. remove candidate numbers, until a single candidate remains for a certain square in which case we can fill in that number with certainty. This will further help us reason about candidates in other squares.

For computers the traditional 9x9 sudoku is nowadays pretty easy to solve, however solving an NxN sudoku is an NP complete problem, i.e. there most likely doesn't exist a "fast" algorithm for solving a generalized NxN sudoku, even though the common 9x9 variant can still be solved pretty quickly with today's computers by using some kind of "smart" brute force, for example backtracking (or another state tree search) which recursively tries all possibilities and at any violation of the rules gets one step back to change the previous number. Besides this a computer can of course use all the reasoning techniques that humans use such as creating sets of possible values for each square and reducing those sets until only one possibility stays. The approach of reasoning and brute forcing may also be combined: first apply the former and when stuck fall back to the latter.

Generating Sudoku

{ I haven't personally tested these methods yet, I'm just writing what I've read on some web pages and ideas that come to my mind. ~drummyfish }

Generating sudoku puzzles is non-trivial. There are potentially many different algorithms to do it, here we just foreshadow some common simple approaches.

Firstly we need to have implemented basic code for checking the validity of a grid and also some automatic solver, e.g. based on backtracking.

For generating a sudoku we usually start with a completely filled grid and keep removing numbers to leave only a few ones that become the initial clues. For this we have to know how to generate the solved grids. Dumb brute force (i.e. generating completely random grids and testing their validity) won't work here as the probability of finding a valid grid this way is astronomically low (seems around 10^(-56)). What may work is to randomly fill a few squares so that they don't break the rules and then apply our solver to fill in the rest of the squares. Yet a simpler way may be to have a database of a few hand-made grids, then we pick on of them and apply some transformations that keep the validity of the grid which include swapping any two columns, swapping any two rows, tansposing, flipping the grid, rotating it 90 degrees or swapping any two digits (e.g. swap all 7s with all 9s).

With having a completely filled grid generating a non-unique (more than one solution) sudoku puzzle is trivial -- just take some completely filled grid and remove a few numbers. But as stated, we usually don't want non-unique sudokus.

For a unique solution sudoku we have to check there still exists exactly one solution after removing any numbers from the grid, for which we can again use our solver. Of course we should optimize this process by quitting the check after finding more than one solution, we don't need to know the exact count of the solutions, only whether it differs from one.

The matter of generating sudokus is further complicated by taking into account the difficulty rating of the puzzle.