Saturday, July 6, 2013

Sudoku Grabber and Solver using OpenCV, JavaFX and Scala

In this post I'll show you how to build an Application which solves a Sudoku puzzle from a o photo of a local newspaper.

First of all, here is a video of the application:




Maybe a screen shot suffices for most visitors:

OpenCV for image processing,  JavaFX for visualization and Scala as the driving force

Ok. Fine. Source Code!


Click here for the complete source code. Make sure you have openCV installed however and change the path of the native libraries matching your setup. At the moment you have to start it from your IDE.

Why a sudoku program?


My main motivation to create the program was to explore the possibilities openCV offers a little more in detail, and tackle a non trivial (of course "non triviality" always lies in the eye of the beholder) problem with it. As it turned out there are many blog postings around which describe how one could recognize the digits and their position and as such the whole sudoku puzzle. Nevertheless it was a pleasant experience to implement it on my own. (Disclaimer: Needless to say I would have been totally lost without reading all available online resources, browsing through blogs and books, trying out stuff and redoing it all over ... )

Anyway, when creating such a program, you have two major challenges:

  • Image recognition
  • Solve the sudoku problem itself

I was more interested in the image processing domain since it was apparent to me that the solving algorithm is a very deep snake pit. But more on that later.

How to squeeze out digits out of an image?


It turns out that it is relatively easy to do this with openCV. The main idea is that every sudoku is surrounded by a border and the sudoku is defined by a 9x9 matrix of squares containing the digits. As such, the first task is to identify the borders of the sudoku. It goes without saying that everything else on the input image should be ignored, only the sudoku remains as a region of interest. Furthermore you know in advance what you want to extract out of the photo - you want to know which square at which position contains a number or not, and you want to determine the number.

There are several preprocessing steps involved such that it is easier for the openCV algorithms to detect the borders properly. The idea is that the surrounding borders define the rectangle with the biggest area on the photo, which is a valid assumption if you create such a program. Concerning the recognition of the digits there exist several approaches like using a neural network or even a pre-made library like tesseract for doing this tasks. Tesseract is surely the choice to be made if you want to have a robust ocr engine, there exist also java bindings for it, I didn't try it out though.

Another approach, and maybe the simplest one, is template matching.

Image preprocessing


In order to filter out stuff we don't need for digit recognition, we apply certain preprocessing steps, which are shown in the screenshot below:


This is fairly self explanatory (and the implementations of the called functions are one liners calling some OpenCV library functions).

Finding the contour with maximum area


The result of this preprocessing step will be examined in order to find the shape with 4 sides with the maximal area.


Here you can see the power of Scala collections applied by filtering out only shapes with 4 sides and choosing the one with the biggest area. The result already contains the 4 points which define the sudoku borders.

Now the situation already turns out to be quite a good one: we know the corner points of our sudoku, but it still is placed somehow on the photo, we have to "normalize" it to be able to do template matching. OpenCV comes to the rescue, it has a function called "warpperspective" which does exactly what we want and is one very important aspect of the whole approach.



On the right side you'll see the sudoku which serves as an input for the next stage, detecting the numbers.

Detecting numbers


I made myself my life very easy by just dividing the resulting Mat from the previous warp step to be a 9x9 matrix. All cells are inspected separately and thus the problem is reduced to detect one number in one cell.

The contour with the biggest area will be searched in a subarea of the cell (in order not to have false positives with a sudoku line which quite often interferes here). Another assumption is here that the contour must contain the center of the sudoku square. With those two assumptions we get quite reliably the contours we are interested in.

Like this we get a list of contours which have to be matched against a template library. We have to make sure that the detected contour is normalized in order to properly compare it with our templates.

OpenCV has everything one needs for such an operation. In essence you need to compare two lists of points which each other and calculate the minimal distance between reference lists (associated with a number) and the list in question. The main problem is that you don't know how many points there are in the list you investigate, and as such the trouble starts. Here it is important to know which functionality is already available in the library, and which glue code between the calls to the library still have to be custom made.

I am sure that wheels get reinvented all the time - so make sure to double check before implementing a too low level concept in your code. (I would not be surprised if some steps in this small project are also superfluous)

I designed the program in a way that the detection method can be exchanged, above you see the approach using template matching. A set of reference images is compared to the slices picked out by all operations described above, and at the end we assume the best match is good enough and return the number associated with the template match.

Here is an example for some templates to match against:





You can see that I've choosen templates which aren't perfect either - the reasoning behind this is some kind of "fuzziness" in the matching should be possible. Try yourself what works best for you.

Almost there

Now we have identified the cells which are empty and distinguished them from the cells containing numbers. We are were most sudoku solver start anyway - the definition of the puzzle in some form of an array.

Assuming you have a function which knows how to solve such a puzzle it is easy to show the user the solution of the puzzle. There are some very nice thoughts about a sudoku solving algorithm online, make sure you check it out. I took somewhat of a shortcut and used following approach:

complete Sudoku solving algorithm

This is the first google hit for "scala sudoku solver" with small adaptions to fit into my program.

The output gets translated to JavaFX Labels which are shown when pressed on the "solve" button.

Ah. Great. What about this bumping effect??


You may have noticed in the video that the contour visualization bumps up and down when selected - I thought it would be nice to have such a thing in the program to make it a little bit more attractive. The contours are the shapes which are detected by openCV.

I googled for "bounce JavaFX" and stumbled upon this project, i've extracted the parts I needed and was astonished that there is not more code necessary for such an effect. In short, you just need to define the keyframes of the animation, and the rest is done by JavaFX.

Conclusion


To sum up, it was again a nice experience writing the program with my favorite toolchain. I'm very pleased about how well JavaFX and openCV can work together,  especially if you use JavaFX for visualization and UI and let openCV do the core processing tasks, and let Scala be mediator between those two great libraries.

Anyway, I only scratched (again) the surface of all involved technologies, I'm sure that the integration of openCV with Scala for example could be improved a lot by creating a DSL on top of the available Java bindings. Some implicits would do the job, too - sometimes it is a little tedious to convert MatOfFoo to List[Foo] - applying here a little scala magic would declutter the API a lot.

Nevertheless already in their current state all three technologies can be considered - when used together - as a very powerful platform for creating image processing applications.

Update

Maybe you want to check out Part II of the Sudoku2go blog post series.

1 comment:

  1. cool warping idea, really like adding the solution to the warped image

    ReplyDelete