
Matej Hamaš

Solving Square Piece Jigsaw Puzzle using Computer Vision

Computer Science Tripos – Part II

Robinson College

May 6, 2015


Proforma

Name: Matej Hamaš
College: Robinson College
Project Title: Solving Square Piece Jigsaw Puzzle using Computer Vision
Examination: Computer Science Tripos Part II
Year: 2015
Word Count: approximately 11,800 [1]
Project Originator: Matej Hamaš
Supervisor: Dr Chris Town
Special difficulties encountered: None

Original Aims of the Project

To create a program that is capable of assembling a pictorial (i.e. image-displaying) square-piece jigsaw puzzle. The program has access to two images taken with a smartphone camera: an image of randomly shuffled jigsaw pieces placed on a single-colour background, and a final image of the solved puzzle. The aim is to use computer vision and image processing techniques to find the pieces in the puzzle image and use them to computationally reassemble the puzzle in order to solve the jigsaw.

Work Completed

An algorithm that can find and extract square pieces from a given image of a disassembled square-piece jigsaw puzzle. The image is taken using a mediocre smartphone camera. Using information about the original image, the algorithm can solve the jigsaw puzzle by computationally reassembling it. Its performance has been evaluated in various configurations (some being unplanned extensions) on more than 10,000 sample inputs from both camera-taken and artificially-generated datasets. It is able to solve puzzles with hundreds of pieces with high accuracy, exceeding the expectations of the original proposal.

[1] Using TeXcount, available at http://app.uio.no/ifi/texcount/


Declaration

I, Matej Hamaš of Robinson College, being a candidate for Part II of the Computer Science Tripos, hereby declare that this dissertation and the work described in it are my own work, unaided except as may be specified below, and that the dissertation does not contain material that has already been used to any substantial extent for a comparable purpose.

Signed ..............................

Date .................................


Acknowledgements

I would like to thank my project supervisor Dr Town for his advice and feedback on the project. I would also like to express my gratitude to my parents and sister, who helped me to manually create puzzle images, which involved a lot of printing, cutting, laying out of pieces and picture taking.


Contents

1 Introduction
  1.1 Goal of the Project
  1.2 Motivation
      1.2.1 Jigsaw Puzzle
      1.2.2 Possible Applications
  1.3 Related Work

2 Preparation
  2.1 Starting Point
  2.2 Computer Vision Algorithms
      2.2.1 Gaussian Pyramid Downsampling
      2.2.2 Gabor Wavelets in 2D
      2.2.3 Scale Invariant Feature Transform (SIFT)
  2.3 Software Engineering
      2.3.1 Requirement Analysis
      2.3.2 Project methodology
      2.3.3 Technology and Backups
      2.3.4 Proposal Refinement
      2.3.5 Evaluation Techniques

3 Implementation
  3.1 Top-level Design
  3.2 Analysis Part of The Solver
  3.3 Image Preprocessing Module
      3.3.1 Initial Preprocessing - Sharpening, HSV, Edge Detection
      3.3.2 Background Removal
  3.4 Piece Extraction Module
      3.4.1 Overview
      3.4.2 Corner Detection
      3.4.3 Inner Border Tracing
      3.4.4 Inference of Squares
      3.4.5 Post-processing of Squares
  3.5 Matching Part of The Solver
      3.5.1 Distance Measures
      3.5.2 Construction of Probability Matrix
      3.5.3 Weighted Average of Probability Matrices
      3.5.4 Greedy Assembly
  3.6 Performance Considerations
  3.7 Reassembly without Final Image Clue

4 Evaluation
  4.1 Overview
  4.2 Artificial Generation of Puzzle Images
  4.3 Accuracy Metrics
  4.4 Software Support for Automatic Evaluation
  4.5 Datasets
      4.5.1 Camera-taken Dataset Details
      4.5.2 Artificially-generated Datasets Details
  4.6 Results
      4.6.1 Detection Rate of Analysis Part of The Solver
      4.6.2 End-to-end Accuracy of The Solver
      4.6.3 Performance of The Solver
      4.6.4 Pairwise Correlation
      4.6.5 Human Evaluation of Camera-taken Dataset
      4.6.6 Results on Very Large Puzzles

5 Conclusion
  5.1 Accomplishments
  5.2 Further Work

Bibliography

A Scale Invariant Feature Transform (SIFT) Overview

B Source Images Used for Datasets

C Camera-taken Puzzle Images

D Instructions for Human Evaluators

E Project Proposal
  E.1 Introduction
  E.2 Description of Project
      E.2.1 Restrictions
      E.2.2 Proposed Structure of Solver
      E.2.3 Structure of the Project
      E.2.4 Possible Extensions
  E.3 Evaluation Metrics and Success Criteria
      E.3.1 Evaluation Metrics
      E.3.2 Success Criteria
  E.4 Starting Point
  E.5 Work Plan
  E.6 Resources Required


Chapter 1

Introduction

1.1 Goal of the Project

The project's aim is to implement a program, named The Solver, that performs computational assembly of a pictorial jigsaw puzzle, i.e. one that displays an image printed on its pieces. All puzzle pieces are squares, hence the only source of useful information for the assembly is the colours contained in the pieces. Due to these constraints, two images are required as input. The first of them is an image of the disassembled puzzle (called the puzzle image), consisting of square pieces that have been randomly placed on a uniform background (Figure 1.1a). The second image (called the final image clue) shows the original picture displayed in the jigsaw (Figure 1.1b).

(a) The first input image (puzzle image) (b) The second input image (final image clue)

Figure 1.1: Two input images taken using a smartphone camera.

The puzzle dimensions (number of rows and columns) are known, but the size (resolution) of individual pieces in pixels is not given. Pieces should be correctly rotated, i.e. their sides should be vertical or horizontal with respect to the photograph taken and their orientation should be the same as in the solved puzzle. To allow for human error in laying out puzzle pieces, slight deviations from perfect rotation need to be allowed.


The elementary success criteria of the project require The Solver to:

• Correctly locate the majority of puzzle pieces in the puzzle image and extract them.

• Assemble the pieces. In other words, determine the absolute position of each piece in the solution and return the mapping from extracted pieces to their final absolute locations in the solved puzzle.

1.2 Motivation

1.2.1 Jigsaw Puzzle

The jigsaw is one of the most famous logic puzzles. It dates back to the late 18th century, when John Spilsbury, a London mapmaker, used wood to create the first puzzle, depicting a map of the world. Since then, jigsaw puzzles have spread rapidly and nowadays people of all ages enjoy the activity of assembling a picture from tiny pieces. There is even an annual world championship, held in Belgium. [1]

For jigsaws, the problem statement is very easy to understand, but the assembly itself is often tremendously difficult and challenging. The pictures used to produce jigsaw puzzles are often natural scenes with lots of continuity, uniform texture or repeated patterns (consider sky, grass or winter scenes), and the number of pieces can reach a couple of thousand.

The inherent difficulty of jigsaws raises an interesting question, namely whether technology could be used to tackle this task and assist humans in finding the solution. From a computer vision point of view, the task of assembling a puzzle, given two input images as described earlier, consists of several challenging steps:

1. The pieces themselves must be located and extracted from an input puzzle image (Analysis part).

Since the image has been taken by a camera, it is not perfect. The lighting across its surface may vary. There may be undesired shadows created by other objects in the vicinity, or the image may contain noise introduced during photographing. A camera is human operated, hence it is not completely stationary. It may be slightly out of focus, and the objects displayed in the image undergo a perspective transformation during the 3D-to-2D conversion. Moreover, allowing the usage of a mediocre smartphone camera to take input pictures deteriorates the quality of the inputs. Attempting to deal with these artefacts during piece extraction is an interesting problem that must be tackled in order to succeed in the later assembly stage.

[1] http://www.worldjigsawpuzzle.org/en/home_en.htm


2. Extracted puzzle pieces must be assembled (Matching part).

The aforementioned complications in the first phase naturally lead to non-perfect extraction of the pieces. We may expect some pieces to show traces of the background around their borders or to be extracted only partially. Moreover, a few pieces may be missed out completely, as distortions in the image or low contrast between a piece and the background prevented them from being located and extracted. This makes the second assembly stage more challenging, even in the presence of the final image clue.

1.2.2 Possible Applications

Besides being an interesting problem for its inherent difficulty, there are other possible applications of The Solver. These include the assembly of archaeological artefacts, paintings or fossils in cases where a picture of the original (or a similar artefact) is available, but the original itself has been broken, even in the case where some pieces are missing. Moreover, such a system may be at the core of an assembly assistant that can be used in a didactic environment for developing analytical thinking in children or helping patients with various fine motor or cognitive disorders.

1.3 Related Work

Work on computational solvers for jigsaw puzzles dates back to 1964, when Freeman and Gardner [8] studied ways to computationally solve an apictorial puzzle of 9 pieces. Such a puzzle does not show any image, so the shapes of the pieces are the only cues for the assembly algorithm. Since then, various computer scientists and mathematicians have approached the problem from different angles.

The most relevant to this project is the work of a stream of researchers who concentrated on solving pictorial jigsaw puzzles consisting of square pieces only. This variant makes the puzzle more difficult as it discards useful shape-based cues. For humans, joining pieces based on their shapes is often harder than an approach which relies on discovering colour patterns. However, this does not apply to a computational solver, which can easily try out all possible combinations (in O(n²) time) to find two pieces with complementary shapes that can be joined. On the other hand, pictorial square-piece-only puzzles involve a lot of image processing and computer vision tasks that turn out to be quite challenging for such a solver.

We did not see much progress in the area of pictorial puzzle solvers until recently, when Cho et al. [3] introduced a probabilistic approach for solving this type of jigsaw puzzle in 2010 at MIT. This was followed by the work of Pomeranz et al. [14], who introduced a greedy algorithm for puzzle assembly in 2011. In 2012, Gallagher [9] studied the more general problem where pieces were of unknown orientation and managed to assemble puzzles consisting of 9,600 pieces. A year later, in 2013, Sholomon et al. [16] experimented with genetic algorithms for solving jigsaw puzzles and achieved the assembly of a particular puzzle consisting of over 22,000 pieces.

Although there has been a lot of work done in this area over the last five years, it differs from this project in two major aspects:

1. The researchers worked with sets of perfect square pieces artificially prepared by computer. None of these studies were concerned with detection of pieces that are randomly placed on a background surface and photographed to yield an input image.

2. Hence, as the input square pieces were perfect, researchers were not using final image clues and performed jigsaw assembly using various edge compatibility metrics. The most basic one would be the sum of squared differences of pixel values between the edges of two pieces.

To the best of my knowledge, the only research also dealing with the extraction of pieces was done by Erell and Nagar [7] as their fourth year engineering project at Ben-Gurion University of the Negev, Israel. Using an algorithm introduced by Pomeranz [14] and a robotic arm with an attached camera, they managed to correctly assemble a 36-piece puzzle. [2] Since they had very good conditions, namely a black background, a proper stationary camera and a small number of pieces, they did not use the final image clue.

I made an early decision not to abandon the analysis, i.e. the extraction of pieces from an input image. As a result, the approaches and techniques used in the project differ significantly from those introduced in earlier research. Since the extracted pieces are of significantly lower quality, as described in Section 1.2.1, I allow for the usage of the final image clue during jigsaw assembly.

[2] A prototype demonstration is available at https://www.youtube.com/watch?v=gco7LGHw9Yg.


Chapter 2

Preparation

2.1 Starting Point

Prior to the project, I was completely new to the computer vision field and had never used the OpenCV library before. I had some knowledge of image processing and C++ from CST 1B courses.

To familiarize myself with the field, I made an early decision to read a substantial part of the book A Practical Introduction to Computer Vision with OpenCV [6].

I decided to typeset my dissertation in LaTeX, which I had never used before. Git was chosen as the version control tool, as I had been exposed to it earlier during the CST Part 1B project and my summer internship.

None of the code I have written is based on any earlier code base.


2.2 Computer Vision Algorithms

2.2.1 Gaussian Pyramid Downsampling

The Gaussian pyramid is a downsampling method that decreases the size and resolution of an image (Figure 2.1). [1] Each iteration consists of the following two steps.

1. Blurring an image I with a discrete Gaussian kernel G:

I'[x][y] = (I * G)[x][y] = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} I[m][n] \times G[x-m][y-n]

2. Downsampling I' by discarding even rows and columns.

Figure 2.1: Gaussian pyramid that combines blurring and downsampling to reduce the size and resolution of an image.

This process is suitable for reducing the size of the image to speed up potential further image processing (such as convolutions). Blurring the image redistributes information between pixels, which makes this method preferable to simpler brute-force resizing carried out by discarding some pixels. This will be important for efficient computation of the cosine distance in Section 3.5.1. The Gaussian pyramid is also used internally in the SIFT algorithm (Section 2.2.3).

[1] Adapted from https://compvisionlab.wordpress.com/2013/04/28/image-pyramids-theory/
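In OpenCV, both steps are combined in the pyrDown function, which blurs with a Gaussian kernel and then drops every second row and column. The short C++ sketch below is purely illustrative; the file name and number of levels are placeholder assumptions, not the project's actual values.

#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

// Reduce an image by `levels` octaves of the Gaussian pyramid.
// Each pyrDown call blurs with a Gaussian kernel and then discards
// every second row and column, halving both dimensions.
cv::Mat downsampleByPyramid(const cv::Mat& input, int levels) {
    cv::Mat current = input;
    for (int i = 0; i < levels; ++i) {
        cv::Mat smaller;
        cv::pyrDown(current, smaller);          // blur + 2x downsample
        current = smaller;
    }
    return current;
}

int main() {
    cv::Mat puzzle = cv::imread("puzzle.jpg");  // placeholder input path
    if (puzzle.empty()) return 1;
    cv::Mat reduced = downsampleByPyramid(puzzle, 2);  // 1/4 of each dimension
    cv::imwrite("puzzle_small.jpg", reduced);
    return 0;
}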


2.2.2 Gabor Wavelets in 2D

Two dimensional Gabor wavelets are oriented sinusoidal plane waves of a given frequency, modulated by a 2D Gaussian envelope (Figure 2.2). Their functional form is

g(x, y) = e^{-\left[\frac{(x - x_0)^2}{\alpha^2} + \frac{(y - y_0)^2}{\beta^2}\right]} \, e^{-i\left[u_0 (x - x_0) + v_0 (y - y_0)\right]}

The wavelet is located at position (x_0, y_0); the standard deviations of the Gaussian envelope in the x and y directions are \alpha and \beta respectively, and the frequencies along these axes are u_0 and v_0. The modulation of the wavelet can also be described by its spatial frequency w_0 = \sqrt{u_0^2 + v_0^2} with orientation \theta_0 = \arctan(v_0 / u_0). Note that we are assuming that the Gaussian envelope has the same orientation as the sinusoidal plane wave.

Figure 2.2: Real part of a 2D Gabor wavelet - a planar cosine wave modulated by a Gaussian envelope (image taken from [5]).

In image processing, Gabor wavelets act as band-pass filters that extract features for a given frequency, scale and orientation. Where the frequency and orientation of a feature approximately align with those of a Gabor wavelet, a strong filter response is produced; very little response is produced elsewhere (Figure 2.3).

Moreover, Gabor wavelets are conjointly optimal filters in the space and frequency domains. They achieve the theoretical lower bound on joint uncertainty and hence are useful for maximizing the information extracted from an image [4].


Figure 2.3: An image convolved with the real part of a Gabor wavelet. The filter response is shown on the right. Note that features that have a similar orientation and frequency to the wavelet produce strong responses.
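Purely to illustrate this kind of filtering (the project builds its own wavelet bank in Section 3.5.1), OpenCV's getGaborKernel can generate the real part of such a wavelet, which is then applied with filter2D. The parameter values below are arbitrary examples rather than the ones used by The Solver.

#include <opencv2/imgproc.hpp>

// Convolve a greyscale image with the real part of one Gabor wavelet.
// theta  : orientation of the plane wave (radians)
// lambda : wavelength of the sinusoid (pixels)
// sigma  : standard deviation of the Gaussian envelope
cv::Mat gaborResponse(const cv::Mat& grey, double theta,
                      double lambda, double sigma) {
    cv::Mat kernel = cv::getGaborKernel(cv::Size(31, 31), sigma, theta, lambda,
                                        /*gamma=*/1.0, /*psi=*/0.0, CV_32F);
    cv::Mat response;
    // Strong responses appear where image features roughly match the
    // wavelet's frequency and orientation; elsewhere the response is weak.
    cv::filter2D(grey, response, CV_32F, kernel);
    return response;
}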

2.2.3 Scale Invariant Feature Transform (SIFT)

SIFT is an algorithm proposed by D. Lowe in 2004 [12]. It is used to extract features (points of interest) from an input image in a scale- and orientation-invariant manner. I will use this algorithm to detect features in the puzzle image and the final image clue. These features will form the basis of one possible distance measure used for puzzle assembly (Section 3.5).

The algorithm is quite complex and consists of several pipelined stages. See Appendix A for further details.
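For orientation only, the sketch below shows how SIFT keypoints and descriptors are obtained through OpenCV's feature-detection interface. The class location differs between releases (the 2.4 series available at the time shipped SIFT in the nonfree module, while recent 4.x versions expose cv::SIFT directly), so treat this as an assumption about the modern API rather than the project's code.

#include <opencv2/features2d.hpp>
#include <vector>

// Detect SIFT keypoints and compute their 128-dimensional descriptors.
void detectSiftFeatures(const cv::Mat& grey,
                        std::vector<cv::KeyPoint>& keypoints,
                        cv::Mat& descriptors) {
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    // No mask: features are detected over the whole image.
    sift->detectAndCompute(grey, cv::noArray(), keypoints, descriptors);
}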


2.3 Software Engineering

2.3.1 Requirement Analysis

[Diagram: the Puzzle Image and Final Image Clue feed the Image Preprocessing, Piece Extraction and Jigsaw Assembly modules, producing the Solved Jigsaw together with a description of how to assemble pieces from the puzzle image; the Artificial Input Creation and Puzzle Evaluation modules are optional.]

Figure 2.4: Top-level modular structure of the project. Oblique rectangles represent modules while diamonds represent inputs and outputs. Dashed rectangles are auxiliary modules that help to evaluate The Solver.

The Solver should be capable of solving a jigsaw puzzle after it receives a puzzle image and a final image clue as input (see Figure 1.1 for an example). As mentioned in Section 1.2.1, this problem can be divided into separate tasks, which naturally leads to a modular architecture of The Solver.

It consists of different pipelined modules, as shown in Figure 2.4. The core modules are Image preprocessing, Piece extraction and Jigsaw assembly. Moreover, there are additional modules that are not required for the correct operation of The Solver but help in its automatic evaluation: the Artificial input creation and Evaluation modules.


Requirements on Modules

1. Image preprocessing module (analysis 1)
This module receives a puzzle image as input and outputs a binary image where background and foreground have been separated. The main tasks involved are:

(a) Resize the image to make further image processing operations, such as convolutions, run faster.

(b) Do some image enhancement to sharpen edges in order to ease later piece extraction.

(c) Separate background and foreground pixels (i.e. background removal). Foreground pixels form parts of the puzzle pieces and the rest is background. The output will be a binary image displaying black patches on a white background. We would hope that these patches were all squares, but this is not likely, both because of the implicit perspective transform during image taking and because of imperfect background removal caused e.g. by noise in the image or a lack of perfect edge sharpness. The foreground-background separation later helps with identification and extraction of square pieces.

2. Piece extraction module (analysis 2)
After receiving the binary image, this module is supposed to examine the black patches in order to:

(a) Locate candidate corners of the square pieces.

(b) Find candidate borders of the pieces.

(c) Use these to infer which patches are, or can be completed to, squares. Hence, find an exact description of each square in terms of the coordinates of its corners.

(d) Extract square pieces, as small images, from the puzzle image.

3. Jigsaw assembly module
This module receives extracted pieces from the Piece extraction module as well as the true (final) pieces from the final image clue. The dimensions, i.e. the number of rows and columns in terms of pieces, are known. Hence, final pieces are obtained simply by resizing and splitting the final image clue into the required squares (a sketch of this splitting step is given after this list).

This module should find the best mapping between both sets. The final image clue has been split deterministically, hence we know the coordinates of each final piece in the final solution. The mapping gives us complete information on how to assemble the puzzle using pieces extracted from the puzzle image.

Note that this module may internally use various algorithms. Due to the nature of the task of this module, they will all be related to the computer vision image matching problem.


4. Artificial input creation module
It would be infeasible to evaluate The Solver only by hand. To do some automatic evaluation, I will need to prepare various computer-generated samples. Two images will be supplied to this module: an image displayed on the jigsaw and a background image (or a background colour description).

The first image should be split into pieces that are randomly shuffled and placed on the background. Note that their orientation should remain correct, according to the problem statement. However, I will allow for slight deviations from the correct orientation in order to mimic human error when laying out jigsaw pieces.

The module should also be capable of adding artificial noise to the generated images, as this allows more thorough evaluation of The Solver. Moreover, to enable automatic assessment, the coordinates of the corners of all pieces should be written out to disk during the generation of a puzzle image.

5. Puzzle evaluation module
This module will work only on computer-generated puzzles. For such puzzles, we know the exact coordinates of the square pieces. Hence, it is possible to automatically evaluate The Solver's performance in terms of the number of square pieces correctly located. If we shuffle pieces deterministically during generation of a sample puzzle image, we will also be able to measure the number of pieces correctly assembled.
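The splitting mentioned in requirement 3 amounts to resizing the final image clue to the puzzle's pixel dimensions and cutting it into a grid. Below is a minimal sketch, assuming the piece size (in pixels) has already been inferred by the analysis part; all names are illustrative.

#include <opencv2/imgproc.hpp>
#include <vector>

// Split the final image clue into rows x cols equally sized square pieces.
// pieceSize is the side length (in pixels) of one piece.
std::vector<cv::Mat> splitFinalImageClue(const cv::Mat& clue,
                                         int rows, int cols, int pieceSize) {
    cv::Mat resized;
    cv::resize(clue, resized, cv::Size(cols * pieceSize, rows * pieceSize));

    std::vector<cv::Mat> finalPieces;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            cv::Rect roi(c * pieceSize, r * pieceSize, pieceSize, pieceSize);
            finalPieces.push_back(resized(roi).clone());  // deep copy of the tile
        }
    return finalPieces;  // index r * cols + c gives the piece's final position
}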

The Solver will focus on the accuracy of a solution, not on the speed of execution. Moreover, it is not supposed to handle inputs of deliberately low quality, such as images that have low contrast with respect to the background, have low resolution, or are either poorly or excessively lit or shaded.

2.3.2 Project methodology

I chose a modular architecture for the project to make the code flexible and easy to develop and debug. Using an iterative software development methodology aided the prototyping of the software. This allowed for early debugging, testing and refinement of The Solver.


2.3.3 Technology and Backups

I am using a MacBook Pro [2] for the development. Computer Laboratory facilities would be used in the unlikely case of its failure.

During the requirements analysis, I chose OpenCV [3] as the computer vision library to use, due to its rich built-in features, such as convolution (filter2D) or SIFT, and its highly optimised code. I chose C++ to work with because of the great documentation of the OpenCV interface.

I am using GitHub and OS X Time Machine for backups and version control. I also periodically backed up crucial parts of the project onto an external hard drive.

Camera-taken images used for the evaluation were taken using a Samsung Galaxy S4 Mini with an 8-megapixel camera.

2.3.4 Proposal Refinement

As I explained in the Progress report, I decided not to implement the Final image clue extraction module originally planned in the Project proposal. I decided to prioritize more interesting and challenging tasks that had emerged over the course of the project, such as Gabor wavelet bank filtering.

2.3.5 Evaluation Techniques

I am using both real camera-taken and artificially-generated puzzle images to evaluate The Solver's performance. Computer-generated images are necessary to do large scale, automatic testing of The Solver. Multiple computer-generated sets are created that try to simulate the noise and geometric distortions of real photographs. Camera-taken images undergo human evaluation to capture human opinion of the quality of The Solver's performance.

See more details in Chapter 4.

[2] Intel Core i7, 2.2 GHz, 16 GB RAM.
[3] http://opencv.org/


Chapter 3

Implementation

3.1 Top-level Design

Figure 3.1 shows the top-level structure of The Solver that has already been briefly outlined in Section 2.3.1. I will distinguish between the analysis and matching parts of The Solver.

The analysis part is responsible for the preprocessing of a puzzle image, piece location and extraction. It outputs a bag of pieces, a set of small square images extracted from the puzzle image. This part of the program contains the majority of the computer vision and image processing techniques.

The matching part of the program receives two bags [1] as input: the bag of pieces and the bag of final pieces. The latter is obtained by splitting the final image clue into square pieces of the same size (in pixels) as the extracted pieces. This is easy, as the dimensions of the puzzle are given and the correct size of a piece (in pixels) is inferred during the analysis. The matching part of The Solver uses various image matching algorithms that incorporate computer vision techniques (e.g. SIFT, histograms or Gabor wavelets) and hand-tailored heuristics to solve the jigsaw.

In addition, there are two separate modules used to aid the evaluation of The Solver: the Artificial input creation module and the Evaluation module. These can optionally be included in the pipeline during test runs.

[1] Note that bags are not multisets in this context. Bag is the name of the set containing pieces.


[Diagram: the Puzzle Image enters the analysis part, consisting of the Image Preprocessing Module (F1) and the Piece Extraction Module (F2), which outputs the Bag of Pieces; the Final Image Clue yields the Bag of Final Pieces. Both bags feed the matching part, the Jigsaw Assembly Module (B1), whose output is the Mapping (solved jigsaw), i.e. a description of how to assemble pieces from the puzzle image. The Artificial Input Creation and Puzzle Evaluation modules are optional.]

Figure 3.1: More detailed top-level structure of The Solver. Abbreviations for modules (F1, F2, B1) are used to ease the understanding of relationships between this and further diagrams.

3.2 Analysis Part of The Solver

The analysis part of The Solver can be easily visualized as an execution pipeline (Figure 3.2).

The Image preprocessing module processes a puzzle image to create four binary images. Multiple binary images are created as the algorithm operates on various colour channels (hue, saturation, value and the original RGB image) to create four binary image outputs by removing the background from the puzzle image. This is done to maximize the chances of successful detection of puzzle pieces.

These binary images are then passed to the Piece extraction module. This module detects squares in all four binary images independently and then merges the sets of found squares. The final set is filtered and, at the end, the corresponding square pieces are extracted from the puzzle image. These pieces are referred to as the bag of pieces and are pipelined to the later matching part of The Solver.

[Diagram: Image Preprocessing Module (F1) = Initial Preprocessing Stage (F1-1) followed by Background Removal Stage (binarization) (F1-2), producing four images (the original RGB puzzle image and its hue, saturation and value channels); background removal and piece extraction are performed on each of them independently. Piece Extraction Module (F2) = Corner and Border Detection Stage (F2-1) producing Borders with Corners, Inference of Squares Stage (F2-2) producing Squares, and Post-processing of Squares (F2-3), where the squares from all four paths are merged into one set, filtered by rotation, clustered (distance-based preferred, area-intersection legacy) and physically extracted into the Bag of Pieces.]

Figure 3.2: Execution pipeline of the analysis part of The Solver, containing the Image preprocessing and Piece extraction modules. Diamond shaped boxes represent data that are passed between different modules or stages.


3.3 Image Preprocessing Module

Figure 3.3: Puzzle image is an input to the Image preprocessing module.

3.3.1 Initial Preprocessing - Sharpening, HSV, Edge Detection

Initial preprocessing consists of several steps as shown in Figure 3.4.

[Diagram: Puzzle Image → Unsharp Filtering → Conversion to HSV → Hue, Saturation and Value Channels → Canny Edge Detection → Dilation, producing 4 images on which background removal and piece extraction are performed independently.]

Figure 3.4: Initial preprocessing stage that is located at the beginning of the Image preprocessing module.


1. Unsharp Filtering
Unsharp filtering is carried out independently for every image channel. It begins with the creation of an unsharp mask - the difference of an image and its slightly blurred version. A portion of the mask is then added to the puzzle image. This filtering acts as a high pass filter and enhances edges in the image. High frequency noise is also slightly amplified during the process, but this does not matter as it is filtered out in later stages. See the pseudocode in Algorithm 1 and an example in Figure 3.5.

(a) Original puzzle image (b) Unsharp mask used for edge enhancement

Figure 3.5: Original puzzle image and an unsharp mask computed by subtracting a slightly blurred version of the image from itself.

Algorithm 1 Unsharp filtering to enhance edges in the puzzle image.
1: procedure ApplyUnsharpFilter(Matrix img, int n, double σ)
2:     gaussianKernel ← Gaussian kernel of size n × n with standard deviation σ
3:     blurredImg ← img.convolve(gaussianKernel)
4:     unsharpMask ← img − blurredImg
5:     sharpenedImg ← img + 0.5 × unsharpMask
6:     return sharpenedImg

To achieve the filtering, I am using OpenCV's GaussianBlur together with basic matrix subtraction and scalar multiplication.
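In OpenCV terms, Algorithm 1 reduces to a GaussianBlur call followed by a weighted sum. A minimal sketch of the same computation follows; the default kernel size and σ are illustrative, not the project's tuned values.

#include <opencv2/imgproc.hpp>

// Unsharp filtering: img + 0.5 * (img - blurred(img)), applied per channel.
cv::Mat applyUnsharpFilter(const cv::Mat& img, int n = 5, double sigma = 1.5) {
    cv::Mat blurred;
    cv::GaussianBlur(img, blurred, cv::Size(n, n), sigma);
    // 1.5 * img - 0.5 * blurred  ==  img + 0.5 * (img - blurred)
    cv::Mat sharpened;
    cv::addWeighted(img, 1.5, blurred, -0.5, 0.0, sharpened);
    return sharpened;
}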


2. Conversion to HSV
The sharpened puzzle image is converted into the HSV colour space. [2] This space is more suited for further image processing as it splits image properties such as principal chroma (colour) or value (lightness) into separate channels. The HSV channels of the puzzle image are shown in Figure 3.6.

(a1) Hue channel (a2) Saturation channel (a3) Value channel

Figure 3.6: Hue, saturation and value channels of the puzzle image after unsharp filtering.

3. Canny edge detection
Using OpenCV, I apply the Canny edge detector to every channel (Figure 3.7). This detector internally uses Gaussian and Sobel filtering. It further applies non-maximum suppression to remove pixels that are unlikely to be part of an edge. Finally, it uses gradients computed at each pixel to retain only pixels whose gradients lie between specific thresholds (hysteresis).

(a1) Hue channel (a2) Saturation channel (a3) Value channel

Figure 3.7: Hue, saturation and value channels after application of the Canny edge detector. Note that the background is now solid black, but some pieces are not "closed", i.e. properly surrounded by white edges.

4. Dilation
Finally, morphological dilation is used to make the edges on the contours of pieces more connected. This aims to "close" the piece patches (i.e. to make their white border contiguous) to prevent them from being labelled as background later. This operation is already implemented in OpenCV's dilate function.

The output of the Initial preprocessing stage consists of four images: the three Canny-filtered images featured in Figure 3.7 and the original sharpened puzzle image shown in Figure 3.5a.

[2] OpenCV provides the convenient built-in function cvtColor.
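Taken together, steps 2-4 chain three standard OpenCV calls. A condensed sketch of the stage follows; the Canny hysteresis thresholds and the structuring element size are illustrative placeholders rather than the project's tuned values.

#include <opencv2/imgproc.hpp>
#include <vector>

// Initial preprocessing: HSV split, Canny edge detection and dilation.
// Returns the three processed channels; the sharpened RGB image itself is
// passed on unchanged as the fourth output of the stage.
std::vector<cv::Mat> preprocessChannels(const cv::Mat& sharpened) {
    cv::Mat hsv;
    cv::cvtColor(sharpened, hsv, cv::COLOR_BGR2HSV);

    std::vector<cv::Mat> channels;        // [0]=hue, [1]=saturation, [2]=value
    cv::split(hsv, channels);

    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    for (cv::Mat& ch : channels) {
        cv::Mat edges;
        cv::Canny(ch, edges, 50, 150);    // illustrative hysteresis thresholds
        cv::dilate(edges, ch, kernel);    // close small gaps in piece contours
    }
    return channels;
}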


3.3.2 Background Removal

The goal of this stage is to remove the background from the images, producing four binary images with black patches on a white background. Note that the original plan was to apply this algorithm only to the sharpened puzzle image. During the early testing stages, I found the performance unsatisfactory. Experiments showed that using four input images, as described above, boosts the performance and does not require significant modification of the background removal algorithm. The background is removed independently from each input image.

Mask of Image Border

Let us consider the borders of an image (width 20px), as this region corresponds with high probability to the background. First of all, I create a border mask. It is a binary matrix over the whole image, where only pixels in the outer region near the image border are set (Figure 3.8).

[Diagram: schematic image in which the mask pixels are set to 1 in the outer region of width 20px and to 0 in the inner region (including the pieces).]

Figure 3.8: Border mask (shaded outer region of width 20px) shown on a schematic drawing consisting of 6 pieces. This masked region is assumed to be a representative sample of the background.


Identifying Candidate Background Pixels

[Figure: hue, saturation and value histograms of the masked border region, with the narrowest region whose area exceeds the threshold highlighted in each.]

Figure 3.9: Example of finding regions in the hue, saturation and value histograms which cover more than 95% of the histogram area. The lower and upper limits of the regions are: (hueMIN, hueMAX) = (101, 108), (satMIN, satMAX) = (20, 39) and (valMIN, valMAX) = (144, 198).

Then I compute three 1D histograms (H, S, V) of the masked region, using the built-in calcHist function. We would like to find a representative hue, saturation and value for a background pixel. I do this manually by identifying the narrowest region of a particular histogram that covers an area exceeding a given threshold. [3] Such a region is discovered in every histogram and represented by its MIN and MAX limits. Hence, six limits are set altogether: hueMIN, hueMAX, satMIN, satMAX, valMIN and valMAX (Figure 3.9). An example of setting satMIN and satMAX is shown in Algorithm 2.

Algorithm 2 Finding the narrowest contiguous region in a saturation histogram that covers an area exceeding a given threshold.
1: procedure FindRegionAboveThreshold(int[] histogram, double threshold)
2:     totalArea ← sum over whole histogram array
3:     for width = 1 → size of histogram do
4:         for all (low, high) s.t. high − low = width do
5:             area ← sum over histogram array from low to high
6:             if area / totalArea > threshold then
7:                 satMIN ← low
8:                 satMAX ← high
9:                 return

[3] During prototyping, 95% showed to be a sensible value. For hue, the region may cross the boundary between the minimum (0) and maximum (180) hue. Despite being implemented, this detail is ignored in further descriptions for simplicity.
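A possible C++ transcription of Algorithm 2, operating on one histogram as produced by calcHist (a 256-bin histogram keeps the brute-force scan cheap); variable names are illustrative.

#include <vector>

// Find the narrowest contiguous bin range whose area exceeds
// threshold * totalArea, scanning widths from narrow to wide (Algorithm 2).
// Returns false if no such range exists.
bool findRegionAboveThreshold(const std::vector<float>& histogram,
                              double threshold,
                              int& regionMin, int& regionMax) {
    double totalArea = 0.0;
    for (float bin : histogram) totalArea += bin;
    if (totalArea <= 0.0) return false;

    const int bins = static_cast<int>(histogram.size());
    for (int width = 1; width <= bins; ++width) {
        for (int low = 0; low + width - 1 < bins; ++low) {
            const int high = low + width - 1;
            double area = 0.0;
            for (int i = low; i <= high; ++i) area += histogram[i];
            if (area / totalArea > threshold) {
                regionMin = low;       // narrowest region, since widths are
                regionMax = high;      // tried in increasing order
                return true;
            }
        }
    }
    return false;
}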


Then, every pixel in the image is labelled as a candidate background pixel if its hue, saturation and value fall between the corresponding limits (Algorithm 3).

Algorithm 3 Labelling candidate background pixels.
1: procedure LabelCandidatePixels(Matrix image)
2:     for all pixel ∈ image do
3:         H ← pixel.hue
4:         S ← pixel.saturation
5:         V ← pixel.value
6:         if H ∈ [hueMIN, hueMAX] ∧ S ∈ [satMIN, satMAX] ∧ V ∈ [valMIN, valMAX] then
7:             label pixel as candidate background pixel

Iterative Background Removal - True Background Pixels

Eight points, the four corners of an image and the four midpoints of its sides, are considered to be true background pixels. I iterate over the whole image eight times, twice in every possible direction. [4] Every pixel that is a candidate background pixel and is a neighbour of some true background pixel is labelled as a true background pixel. I call this hysteresis process iterative background removal (Algorithm 4).

Algorithm 4 Labelling true background pixels iteratively.
1: procedure IterativeBackgroundRemoval(Matrix image)
2:     trueBgMatrix ← none of the pixels is trueBackgroundPixel
3:     for all corners or side-mid-points: bgPixel ∈ trueBgMatrix do
4:         bgPixel ← trueBackgroundPixel
5:     for iteration = 1 → 8 do
6:         % NB: each iteration is in a particular direction
7:         for all pixel ∈ image do
8:             bgPixel ← pixel in trueBgMatrix with same coordinates as pixel
9:             if pixel is candidateBackgroundPixel then
10:                if some neighbour of bgPixel is trueBackgroundPixel then
11:                    bgPixel ← trueBackgroundPixel

The original technique I implemented and used for this stage was recursive background removal (recursive spilling). However, this technique was too memory-consuming and was hence abandoned.

[4] 2 (iterations) × 2 (vertical directions) × 2 (horizontal directions) = 8 iterations.


After completing all iterations, true background pixels are coloured white and the rest of the pixels black, producing a binary image (Figure 3.10). I use median filtering and morphological closing, i.e. dilation followed by erosion, to remove standalone, tiny black regions that have been incorrectly labelled as foreground. A kernel size of 7 is used for all operations.
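This clean-up maps onto two OpenCV calls. A short sketch using the kernel size of 7 mentioned above, assuming the binary image is an 8-bit single-channel matrix:

#include <opencv2/imgproc.hpp>

// Remove tiny mislabelled foreground specks from a binary image:
// median filtering followed by morphological closing (dilation, then erosion),
// both with a kernel of size 7.
void cleanBinaryImage(cv::Mat& binary) {
    cv::Mat filtered;
    cv::medianBlur(binary, filtered, 7);
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(7, 7));
    cv::morphologyEx(filtered, binary, cv::MORPH_CLOSE, kernel);
}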

(a1) Original puzzle image (a2) Hue channel

(a3) Saturation channel (a4) Value channel

Figure 3.10: Original puzzle image together with its hue, saturation and value channels after background removal and binarization. These binary images are imperfect, but extracting pieces independently from all of them will lead to a high proportion of all pieces being correctly extracted. Note that this example features the same four images shown in Figures 3.3 and 3.7.


To summarize the Background removal stage, Figure 3.11 lists the aforementioned steps. These are applied independently to the four input images (Figures 3.3, 3.7) and produce four output binary images (Figure 3.10).

[Diagram: Border Mask → Histograms (H, S, V) → Narrowest Region with Area above Threshold → Background Removal (iterative preferred, recursive "spilling" legacy) → Noise Removal (median filtering, morphological closing) → Binary Image.]

Figure 3.11: Background removal pipeline that is at the heart of the Image preprocessing module.


3.4 Piece Extraction Module

3.4.1 Overview

The previous module in the pipeline (the Image preprocessing module) produces a binary image for each of the four parallel paths (Figure 3.2). Foreground pixels that ought to belong to puzzle pieces are black and the background is white. The next step is to analyse this bitmap to find an analytical representation of the black patches and consequently infer which of them are squares. This is done separately on each of the four binary images. The results are then aggregated to yield one final bag of square pieces.

This task naturally encourages an analysis that can discover points and lines on the borders of the black patches. Firstly, I experimented with Hough line transforms in order to find the equations of such lines. Unfortunately, I had to conclude that the number of false positives was too high to allow for efficient piece detection.

Hence, I decided to try another approach (Figure 3.12). This involves:

• detecting corners of patches. Corners are further

– clustered

– filtered

– and their local gradient (into patch) is computed

• tracing their inner borders (curves represented as vectors of points). Corners found in the previous step are gradually associated with inner borders.

Finally, each such structure is inspected to decide whether it could represent a square piece (Figure 3.19).


[Diagram: Binary Image → Corner Detection (FAST preferred, Harris legacy) → Corners → Simple Clustering → Filtering → Filtered Corners → Local Gradient Computation → Inner Border Tracing → Borders with Corners.]

Figure 3.12: Pipeline at the beginning of the Piece extraction module (Corner and Border Detection Stage, F2-1).


3.4.2 Corner Detection

I have experimented with two algorithms for corner detection: FAST and the Harris corner detector. Efficient implementations of both are provided in OpenCV.

The Harris corner detector (cornerHarris in OpenCV [5]) considers intensity gradients around a pixel that is being tested as a corner point. It constructs a scoring function which then determines whether the pixel is a corner.

FAST (FAST in OpenCV [6]) considers 16 points on a Bresenham circle of radius 3 around each pixel that is tested as a candidate corner point. If the difference between the intensity of the centre pixel and at least 12 contiguous pixels on the circle is above a given threshold, the centre point is considered to be a corner.

Early testing showed that the Harris corner detector is too slow and does not significantly outperform FAST. Hence, only FAST is used in the final implementation.

Once corner pixels are identified, they are clustered into sets based on their Euclidean distances. [7] The centre of mass of every cluster is considered to be a candidate corner point.
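A sketch of this detection step: FAST corner detection followed by a simple greedy clustering of nearby detections, using the 10-pixel threshold from the footnote. This illustrates the idea rather than reproducing the project's exact code, and the FAST threshold shown is an arbitrary example.

#include <opencv2/features2d.hpp>
#include <vector>

// Detect FAST corners and merge detections closer than distThreshold pixels,
// returning the centre of mass of each cluster as a candidate corner point.
std::vector<cv::Point2f> detectCandidateCorners(const cv::Mat& image,
                                                int fastThreshold = 20,
                                                float distThreshold = 10.0f) {
    std::vector<cv::KeyPoint> keypoints;
    cv::FAST(image, keypoints, fastThreshold, /*nonmaxSuppression=*/true);

    std::vector<cv::Point2f> sums;   // running sums of cluster member positions
    std::vector<int> counts;
    for (const cv::KeyPoint& kp : keypoints) {
        bool merged = false;
        for (size_t i = 0; i < sums.size(); ++i) {
            cv::Point2f centre = sums[i] * (1.0f / counts[i]);
            float dx = centre.x - kp.pt.x, dy = centre.y - kp.pt.y;
            if (dx * dx + dy * dy < distThreshold * distThreshold) {
                sums[i] += kp.pt;
                counts[i] += 1;
                merged = true;
                break;
            }
        }
        if (!merged) { sums.push_back(kp.pt); counts.push_back(1); }
    }

    std::vector<cv::Point2f> candidates;   // centres of mass of the clusters
    for (size_t i = 0; i < sums.size(); ++i)
        candidates.push_back(sums[i] * (1.0f / counts[i]));
    return candidates;
}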

Computing Gradients into Patch and Corner Filtering

After clustering, there is still a high number of false corners (false positives) that need to be filtered out (see Figure 3.15). We are only interested in corners that touch approximately right-angled patches.

Inspired by FAST, I decided to consider a circle of radius 30 pixels around every candidate corner point. I divide each such circle into 16 equal sectors, as shown in Figure 3.13. Then I calculate the coverage (percentage of black pixels) of each sector.

A corner is considered further only if there are 4 consecutive sectors such that:

• coverage of middle 2 sectors is above 80% (hatched in Figure 3.13)

• coverage of 2 outer sectors is above 60% (dotted in Figure 3.13)

• coverage of all other sectors is below 60%

[5] http://docs.opencv.org/doc/tutorials/features2d/trackingmotion/harris_detector/harris_detector.html; original paper [10]
[6] http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_feature2d/py_fast/py_fast.html; original paper [15]
[7] The distance threshold is 10 pixels.


Figure 3.13: Circle with 16 sectors around a candidate corner point. Black pixels are shown in grey for visibility of the grid. Hatched sectors have coverage greater than 80% and dotted ones greater than 60%. Other sectors have coverage below 60%. Hence this corner is not discarded and is considered further. Note that the pixel grid is not to scale.

If such 4 consecutive sectors are not found, the corner is considered to be a false positive and is discarded (Figure 3.15). Otherwise, I compute the intensity gradient into the patch and store it alongside the corner itself (Figure 3.14). This is useful in the later inference of square pieces.
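The sector test can be sketched directly: count black pixels in each of the 16 sectors of the radius-30 circle around the candidate corner and accept it only if the 80%/60% pattern of four consecutive sectors is present. Boundary handling is kept deliberately simple here; the code is illustrative and assumes an 8-bit binary image with black foreground.

#include <opencv2/core.hpp>
#include <array>
#include <cmath>

// Accept a candidate corner only if 4 consecutive sectors match the pattern:
// two middle sectors > 80% black, two outer sectors > 60% black,
// and all remaining sectors < 60% black.
bool passesSectorTest(const cv::Mat& binary, cv::Point corner, int radius = 30) {
    constexpr int SECTORS = 16;
    std::array<int, SECTORS> black{}, total{};

    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            if (dx * dx + dy * dy > radius * radius) continue;
            int x = corner.x + dx, y = corner.y + dy;
            if (x < 0 || y < 0 || x >= binary.cols || y >= binary.rows) continue;
            double angle = std::atan2((double)dy, (double)dx) + CV_PI;  // [0, 2*pi]
            int sector = std::min(SECTORS - 1,
                                  (int)(angle / (2.0 * CV_PI) * SECTORS));
            ++total[sector];
            if (binary.at<uchar>(y, x) == 0) ++black[sector];  // black = patch pixel
        }

    std::array<double, SECTORS> coverage{};
    for (int s = 0; s < SECTORS; ++s)
        coverage[s] = total[s] > 0 ? (double)black[s] / total[s] : 0.0;

    for (int start = 0; start < SECTORS; ++start) {     // try every rotation
        bool ok = coverage[start] > 0.6 &&
                  coverage[(start + 1) % SECTORS] > 0.8 &&
                  coverage[(start + 2) % SECTORS] > 0.8 &&
                  coverage[(start + 3) % SECTORS] > 0.6;
        for (int s = 4; ok && s < SECTORS; ++s)
            ok = coverage[(start + s) % SECTORS] < 0.6;
        if (ok) return true;
    }
    return false;
}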


Figure 3.14: Corners that are not discarded during filtering have their gradient vectors into the corresponding neighbouring patches computed (heads of vectors depicted as small circles). Note that if a corner is not sharp enough it is discarded at this stage, hence some corners are missing from the picture. The algorithm is robust enough that a few missing corners do not impede reliable extraction of square pieces in later stages.

(a) Before filtering (b) After filtering

Figure 3.15: Example showing corners before and after filtering. Corners after filtering have their corresponding intensity gradient vectors shown.


3.4.3 Inner Border Tracing

The previous corner filtering yields a set of "good" corners and their corresponding intensity gradient vectors into adjacent patches. Afterwards, I find the inner borders of black patches using a variant of the inner boundary tracing algorithm presented by M. Sonka in [17] (p. 191, Algorithm 6.6), which I implemented myself. Each corner from the previous stage is related to at most one inner border. A corner is related to a border if the smallest distance from the corner to the border is lower than a given threshold. [8] Let us call the structure consisting of a border and all related corners BorderWithRelatedCorners.

The high level structure of this stage is shown in Algorithm 5.

Algorithm 5 Main loop of the Inner border tracing stage.
1: procedure FindBordersWithRelatedCorners(Matrix image, Set corners)
2:     Set<BorderWithRelatedCorners> result ← empty set
3:     while corners is not empty do
4:         pick any corner ∈ corners and remove it from the set
5:         if pixel at corner position is WHITE then
6:             initialBorderPoint ← first BLACK point in corner gradient direction
7:         else
8:             initialBorderPoint ← last BLACK point in inverse gradient direction
9:         border ← trace inner border from initialBorderPoint (details later)
10:        relatedCorners ← singleton set containing corner
11:        for all c ∈ corners do
12:            if smallest distance between c and a point on border < 10 pixels then
13:                add c to relatedCorners
14:                remove c from corners
15:        group relatedCorners and border into one structure
16:        add this structure to result
17:    return result

The inner border tracing itself is shown below. I am using 4-connectivity for the tracing, with the 4 directions labelled 0-3 (Figure 3.16b).

1. Choose the initial direction dir (0-3) according to the inward gradient vector at the initial pixel on the border.

2. Search the 4 neighbours (Figure 3.16a) of the current pixel in anticlockwise direction, starting from the neighbour pixel in the direction (dir + 3) mod 4.

Note that this is equivalent to (dir − 1) mod 4. dir can be thought of as an approximation of the inward local normal into the patch, hence I subtract one from it to trace the border in anticlockwise direction.

The first black pixel found is the next pixel on the border. Call dirOfMove the direction from the current pixel to the next one.

[8] 10 pixels used.


(a) 4-connectivity. Dotted neighbours of the centre solid pixel are considered. (b) Directions notation (0-3) for Inner border tracing. (c) Example of a part of a traced path. Pixels on the path are hatched.

Figure 3.16: Inner border tracing directions and example.

3. Update dir to (dirOfMove + 1) mod 4 (so dir still approximates inward normal)

4. Repeat these steps until you revisit the very first two pixels encountered during tracing.
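The following sketch shows how the tracing loop might look in C++ under one particular assignment of the four directions (0 = +x, 1 = −y, 2 = −x, 3 = +y); the original's numbering in Figure 3.16b may differ, so this illustrates the idea rather than reproducing the project's code.

#include <opencv2/core.hpp>
#include <vector>

// Inner border tracing with 4-connectivity (variant of Sonka et al., Alg. 6.6).
// `start` must be a black pixel on the patch border; `dir` approximates the
// inward normal at that pixel (0 = +x, 1 = -y, 2 = -x, 3 = +y).
std::vector<cv::Point> traceInnerBorder(const cv::Mat& binary,
                                        cv::Point start, int dir) {
    const cv::Point step[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    auto isBlack = [&](cv::Point p) {
        return p.x >= 0 && p.y >= 0 && p.x < binary.cols && p.y < binary.rows &&
               binary.at<uchar>(p.y, p.x) == 0;
    };

    std::vector<cv::Point> border{ start };
    cv::Point current = start;
    while (true) {
        // Search the 4 neighbours anticlockwise, starting at (dir + 3) mod 4.
        int dirOfMove = -1;
        for (int i = 0; i < 4; ++i) {
            int d = (dir + 3 + i) % 4;
            if (isBlack(current + step[d])) { dirOfMove = d; break; }
        }
        if (dirOfMove < 0) break;                 // isolated single pixel
        current += step[dirOfMove];
        dir = (dirOfMove + 1) % 4;                // keep dir ~ inward normal

        // Stop once the first two border pixels are revisited in order.
        if (border.size() >= 2 &&
            border.back() == border[0] && current == border[1]) break;
        border.push_back(current);
    }
    return border;
}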

Figure 3.16c shows an example of such a trace, and Figure 3.17 shows two complete BorderWithRelatedCorners structures.

Figure 3.17: Example of two BorderWithRelatedCorners structures, numbered 17 and 18. Note that the borders of both patches are black. They have been discovered by the inner border tracing explained earlier.


3.4.4 Inference of Squares

The program at this stage receives BorderWithRelatedCorners structures. The goal is to use them to infer which pieces are squares, producing a set of squares, each represented by 4 corner points.

Given a BorderWithRelatedCorners structure, the following values are computed for every two corners on the border:

1. the shortest Manhattan distance (L1 distance) along the border

2. the Euclidean distance (L2 distance)

Moreover, the acute angle of each corner triple is computed.

Having all this information, a square can be inferred from nearly every structure using 4, 3 or even only 2 corners, as shown in Figure 3.18.

Figure 3.18: Sample image showing several square inference examples. Inference from 4, 3 or 2 corners is shown: from 4 points (simplest case, square side length determined); from 4 points with 2 further detected points ignored; from 2 points on a side (the remaining 2 points are computed, the half-plane in which they lie being given by the 2 intensity gradient vectors); from 2 points on a diagonal (the remaining 2 points are computed); from 3 points with 1 point computed (the simplest 3-point case); and from 3 points with 1 point computed and 1 point ignored because it cannot be a square corner. For several highlighted pieces, the known detected corners are shown in red circles and the locations of computed corners in dashed circles. In this image, piece borders are not shown in black, and the bold squares around pieces are used solely for highlighting; the actual extracted squares are given by the encircled corners.


1. Inferring Square from 4 Corners

The Euclidean and Manhattan distances between every two consecutive corners must be approximately equal (small tolerances allowed).

2. Inferring Square from 3 Corners

Euclidean and Manhattan distances are used as before. Moreover, the angle of the corner triple must be within 10° tolerance of 90°.

3. Inferring Square from 2 Corners

Assuming we know the length of a square side, the Euclidean distance of two corners tells us whether these two corners can form a side or a diagonal of a potential square. In the “diagonal case”, the remaining two unknown corners can be easily computed. In the “side case”, the inward intensity gradients computed earlier tell us in which half-plane the square lies. I use the well-known cross product trick to infer the relative angular position of two vectors, hence the two unknown points can also be easily found (a code sketch of the “side case” is given after this list).
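As an illustration, here is a minimal sketch (with made-up names, not the project code) of the “side case”: given two corners a and b known to form one side and an inward intensity gradient g of the patch, the sign of the 2D cross product selects the half-plane in which the two missing corners lie.

    #include <opencv2/core.hpp>

    // Complete a square from two corners a, b that form one side. The inward
    // gradient g (pointing into the patch) selects the correct half-plane.
    void completeSquareFromSide(cv::Point2f a, cv::Point2f b, cv::Point2f g,
                                cv::Point2f& c, cv::Point2f& d)
    {
        cv::Point2f side = b - a;                       // known side, |side| = side length
        cv::Point2f normal(-side.y, side.x);            // perpendicular of the same length
        float cross = side.x * g.y - side.y * g.x;      // 2D cross product side x g
        if (cross < 0)                                  // flip into the gradient's half-plane
            normal = cv::Point2f(-normal.x, -normal.y);
        c = b + normal;
        d = a + normal;
    }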

These three heuristics are applied in sequence. First, I look for squares that can be inferred from 4 points, then 3 and finally 2. The true square side length (in pixels) is usually discovered in the first two steps and then used for the inference from 2 corner points. In cases where the true side length is not determined, inference from two points is unsuccessful and no such squares are found. Note that even in such a case, squares can be found on the parallel paths which process different binary images (refer back to Figure 3.2).

Figure 3.19 shows this part of the processing pipeline schematically.

Figure 3.19: Inference of Squares stage (F2-2) at the end of the Piece extraction module. Borders with related corners feed three inference steps: squares inferred from 4 corners (using Euclidean and Manhattan distances), squares inferred and completed from 3 corners (Euclidean and Manhattan distances plus an angle tolerance), and squares inferred and completed from 2 corners (Euclidean distance plus inward intensity gradients), together producing the set of squares.


3.4.5 Post-processing of Squares

Recall that the processing that happens between the Initial preprocessing stage (3.3.1) and here is done separately on four binary images (the original puzzle image and its individual HSV channels). Hence, at this stage, we have four sets of squares that are merged together (Figure 3.20 and earlier 3.2). Note that some squares are likely to be discovered on multiple paths, so this merged set is redundant.

Figure 3.20: Post-processing of squares (F2-3) at the end of the Piece extraction module. The top part of the figure emphasizes that the Initial preprocessing stage (F1-1) produces four binary images (the RGB puzzle image and its hue, saturation and value channels), so four sets of squares are created in the previous stages and merged into one set. The merged set is then filtered by rotation (F2-3), clustered (F2-4; distance-based measure preferred, area-intersection measure kept as legacy) and physically extracted to yield the bag of pieces.


I further process this set in order to

1. Discard incorrectly rotated squares

The requirement analysis says that all square pieces should be correctly rotated, which means that their sides should be either horizontal or vertical with respect to the puzzle image photograph. I allow for a 15° error in this rotation since a person laying out puzzle pieces is not able to achieve perfect orientations of the pieces.

2. Cluster squares to remove redundancy

As mentioned earlier, some square pieces can occur multiple times. Hence, the whole set of squares must be clustered according to some criterion and only one square per cluster must be present in the final set.

I am using OpenCV functionality to partition the squares. The only tricky part is to invent a measure that decides whether two squares belong to the same cluster. I have experimented with two such measures:

(a) Area Intersection Measure

The intersection is computed using a pixelwise boolean AND of both squares. Two squares belong to the same cluster if their intersection covers at least 80% of the areas of both of them.

(b) Distance Based Measure

If the distance between the centres of mass of two squares is below a given threshold, they are considered to belong to the same cluster. The threshold is computed as the arithmetic average of the side lengths of both squares.

Both measures are heuristics. The simpler distance based measure proved favourable as it works well and is much faster. Hence, only this measure is used in the final implementation; a code sketch of it is given below.
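A minimal sketch of this distance-based clustering using cv::partition is shown below; the Square type and names are illustrative, not the project's.

    #include <opencv2/core.hpp>
    #include <array>
    #include <cmath>
    #include <vector>

    struct Square {
        std::array<cv::Point2f, 4> corners;
        cv::Point2f centre() const {
            return (corners[0] + corners[1] + corners[2] + corners[3]) * 0.25f;
        }
        float sideLength() const {
            cv::Point2f d = corners[1] - corners[0];
            return std::sqrt(d.x * d.x + d.y * d.y);
        }
    };

    // labels[i] receives the cluster index of squares[i].
    std::vector<int> clusterSquares(const std::vector<Square>& squares)
    {
        std::vector<int> labels;
        cv::partition(squares, labels, [](const Square& a, const Square& b) {
            // Same cluster if the centre distance is below the average side length.
            cv::Point2f d = a.centre() - b.centre();
            float threshold = 0.5f * (a.sideLength() + b.sideLength());
            return std::sqrt(d.x * d.x + d.y * d.y) < threshold;
        });
        return labels;
    }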

After the clustering, one square is picked per cluster. If a cluster contains two to four squares, I compute the sum of square differences between each square side and the earlier determined true square side. This is done for each square in the cluster and the one with the smallest error is picked for this cluster.

If the cluster contains exactly one square, it is trivially chosen, and in the rare case of more than four squares, their arithmetic average is computed and used as a representative square.

Finally, the OpenCV functions getPerspectiveTransform and warpPerspective are used to extract the pieces from the puzzle image to yield the bag of pieces, as sketched below.
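A minimal sketch of this extraction step, assuming the four detected corners are ordered top-left, top-right, bottom-right, bottom-left (names are illustrative):

    #include <opencv2/imgproc.hpp>
    #include <vector>

    cv::Mat extractPiece(const cv::Mat& puzzleImage,
                         const std::vector<cv::Point2f>& corners, int pieceSize)
    {
        // Map the detected quadrilateral onto an axis-aligned pieceSize x pieceSize square.
        std::vector<cv::Point2f> target = {
            {0.0f, 0.0f},
            {(float)(pieceSize - 1), 0.0f},
            {(float)(pieceSize - 1), (float)(pieceSize - 1)},
            {0.0f, (float)(pieceSize - 1)}
        };
        cv::Mat homography = cv::getPerspectiveTransform(corners, target);
        cv::Mat piece;
        cv::warpPerspective(puzzleImage, piece, homography, cv::Size(pieceSize, pieceSize));
        return piece;
    }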


3.5 Matching Part of The Solver

At this stage, the bag of pieces from the puzzle image has been extracted. The dimensions of the puzzle, i.e. the number of rows (R) and columns (C), are given, so the final image clue can easily be split into square pieces to yield the bag of final pieces. All pieces in both bags are squares and are resized to the same size. Internally in OpenCV, each piece is represented as a matrix with three channels.

Note that bags in this context are names of two sets of pieces (not multisets). In this section, let us denote the bag of pieces P and the bag of final pieces F. I will call elements of P pieces and elements of F final pieces, and use p and f for elements of these two sets respectively.

Note that the cardinalities of these two sets do not have to be equal (|P| ≠ |F| is allowed) as fewer or more pieces than the true number could have been extracted from the puzzle image.

Goal of the Matching Part

The goal is to find an optimal mapping φ : P → F. In this case, optimal means that it gives the best possible assembly of the pieces to yield the final result.

Note that if

• |P| ≤ |F| ⇒ φ is a total function

• |P| > |F| ⇒ φ is a partial function

In both cases, φ is injective as two distinct pieces cannot map to the same final piece.9

The position of each f ∈ F is known, as the final image clue was deterministically split into final pieces. Once φ is determined, any p ∈ P such that φ(p) exists will have its position identical to the position of φ(p). Hence the pieces in P can be assembled into the grid of size (R, C) to solve the puzzle.

If there exists some f ∈ F such that no p ∈ P maps to it, this position in the final result will be unoccupied by any piece and a hole will exist at this position. If there are more pieces than final pieces, some of the pieces will not appear in the final result.

9: possible piece break-up is not considered


Figure 3.21: Overview of the Matching part of The Solver (Jigsaw Assembly Module, B1). The bag of pieces and the bag of final pieces are compared using 1 to 7 comparison methods (1. cosine distance, 2. SIFT matching, 3. Gabor wavelet bank filtering, 4. histogram-related measures: correlation, Chi-square distance, intersection and Bhattacharyya distance), so that up to 7 probability matrices are produced. An optional weighted average computation combines them into a single probability matrix, which the greedy assembly uses to construct the final mapping φ.

Subtasks of the Matching Part

I have decided on the following approach in order to construct φ (Figure 3.21).

1. Invent a distance measure between pieces and final pieces (Section 3.5.1).

This part of the project offers a lot of freedom as there are various distance measures that can be considered. I have experimented with measures based on

• Sum of square differences and cosine distance computed on pixel values

• Histograms (four different histogram comparison methods available)

• Feature points (SIFT and Gabor wavelets used for feature points extraction)

For some measures (such as the correlation of histograms), it is more convenient to work with similarity values instead of distances, to which they are inversely related.


2. Using the similarity values (or distances), construct a probability vector for each p ∈ P, ranging over all f ∈ F (Section 3.5.2).

Using these vectors as rows of a matrix yields a probability10 matrix P of size |P| × |F|. Each row of the matrix corresponds to a probability distribution of a single piece over all final pieces. The element P[i][j] = p_ij denotes the probability that the piece i maps to the final piece j.

One probability matrix for each used distance measure is constructed.

3. (optional) Combine the probability matrices into a single one using a weighted average (Section 3.5.3).

This is included if we choose to use multiple distance measures in the first step.

4. Use the final probability matrix to construct φ in a greedy manner (Section 3.5.4).

Let us now go through these subtasks in order.

3.5.1 Distance Measures

The goal of this task is clear. Given p ∈ P and f ∈ F, a scalar distance Δ between them should be computed. Alternatively, we could also work with a similarity value λ which is inversely related to Δ.

Both p and f are provided as three-channel matrices, P and F respectively. W.l.o.g. assume they are of size n × n pixels (indexed from 1) and that the colour channel is the third dimension.

0. Naive Approach - Sum of Square Differences

At first, I tried to use a simple sum of square differences as a naive distance measure. I converted both matrices to the HSV colour space and computed the distance Δ_{P,F} as follows:

\[ \Delta_{P,F} = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k \in \{H,S,V\}} (P_{ijk} - F_{ijk})^2 \]

Early testing showed that this measure is very poor and does not yield satisfactory results. Hence it was abandoned early in the development and will not be discussed further.

10: also called stochastic


1. Cosine Distance

To speed up the computation of the cosine distance, I apply the Gaussian pyramid downsampling described in Section 2.2.1, using the OpenCV pyrDown function. Four iterations showed to provide the best trade-off between the speed of further processing and the quality of the produced results. The length of a piece side is decreased 2^4 = 16 times and the area 2^8 = 256 times (quadratically with side length).

By appending all rows of the matrix P into one vector, in every colour channel (in HSV space), we can treat this matrix as a vector p of length 3n² (i.e. in ℝ^{3n²}). We do the same with the matrix F, obtaining f. The elements of p and f are pixel values in the H, S, V channels.

Afterwards, we compute the value of the cosine of the angle α between these two vectors as

\[ \cos\alpha = \frac{\mathbf{p} \cdot \mathbf{f}}{|\mathbf{p}| \times |\mathbf{f}|} \]

All the elements (representing HSV values) in both vectors are positive, hence the cosine is always positive (in [0, 1]).

The higher this value, the smaller the angle α between these two vectors and hence the closer they are. In the extreme case when cos α = 1, p and f are the same vector.

The distance between two vectors can then be computed as Δ_{P,F} = 1 − cos α. However, it is easier to directly define a similarity value λ_{P,F} which is in this case equal to cos α:

\[ \lambda_{P,F} = \frac{\mathbf{p} \cdot \mathbf{f}}{|\mathbf{p}| \times |\mathbf{f}|} \]

Note that in OpenCV, this similarity can be computed using the template matching function with the argument CV_TM_CCORR_NORMED; a sketch is given below.
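A minimal sketch of this similarity computation (an illustrative re-implementation, not the project code), assuming both pieces have already been converted to HSV and resized to the same dimensions:

    #include <opencv2/imgproc.hpp>

    double cosineSimilarity(cv::Mat piece, cv::Mat finalPiece)
    {
        // Four levels of Gaussian pyramid downsampling (side shrinks 2^4 = 16 times).
        for (int i = 0; i < 4; ++i) {
            cv::Mat smallerP, smallerF;
            cv::pyrDown(piece, smallerP);
            cv::pyrDown(finalPiece, smallerF);
            piece = smallerP;
            finalPiece = smallerF;
        }
        cv::Mat p, f;
        piece.convertTo(p, CV_32F);
        finalPiece.convertTo(f, CV_32F);
        p = p.reshape(1, 1);                    // flatten to a single row vector
        f = f.reshape(1, 1);
        return p.dot(f) / (cv::norm(p) * cv::norm(f));   // cos(alpha), in [0, 1] for HSV data
    }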

2. Histogram Distance Measures

Using OpenCV, I compute 2D histograms of both P and F. Hue-saturation histograms are chosen to achieve luminance invariance. Early testing showed that using 4 and 16 bins for H and S respectively provides a good trade-off between the speed and the accuracy of the later assembly. Using more bins is not guaranteed to be advantageous as the bins end up being sparsely filled.

After this step, P and F are represented by two histogram matrices H_P and H_F of size 4 × 16. Let I be the set of all 64 bins in both histograms. For i ∈ I, I will use H(i) to refer to the histogram count at the (2D) position i.


Four different measures, offered by OpenCV in the compareHist function, are then used to compute the distance Δ_{P,F} (or the similarity λ_{P,F}) between P and F.

1. Correlation ρ_{P,F}

\[ \rho_{P,F} = \frac{\sum_{i \in I}\bigl(H_P(i) - \bar{H}_P\bigr)\bigl(H_F(i) - \bar{H}_F\bigr)}{\sqrt{\sum_{i \in I}\bigl(H_P(i) - \bar{H}_P\bigr)^2 \sum_{i \in I}\bigl(H_F(i) - \bar{H}_F\bigr)^2}} \]

where \bar{H} is the mean of a histogram H defined as

\[ \bar{H} = \frac{1}{|I|} \sum_{i \in I} H(i) \]

This gives values in [−1, 1]. Higher correlation implies that P and F are more closely related. Hence this is directly the similarity λ_{P,F}:

\[ \lambda_{P,F} = \rho_{P,F} \]

2. Chi-square Distance

\[ \Delta_{P,F} = \sum_{i \in I} \frac{\bigl(H_P(i) - H_F(i)\bigr)^2}{H_P(i)} \]

3. Intersection of Histograms

A larger intersection indicates a higher similarity, hence this method directly expresses the similarity value λ_{P,F}:

\[ \lambda_{P,F} = \sum_{i \in I} \min\bigl(H_P(i), H_F(i)\bigr) \]

4. Bhattacharyya Distance (aka Hellinger distance)

\[ \Delta_{P,F} = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_P \bar{H}_F \, |I|^2}} \sum_{i \in I} \sqrt{H_P(i)\,H_F(i)}} \]

In conclusion, the correlation and intersection comparison methods yield similarity values and the Chi-square and Bhattacharyya methods produce distance values. These are then used in the construction of a probability matrix explained in Section 3.5.2.
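A minimal sketch of this histogram pipeline (assuming OpenCV 3+ constant names and 3-channel BGR input pieces; an illustration rather than the project code):

    #include <opencv2/imgproc.hpp>

    // Build the 4 x 16 hue-saturation histogram of a BGR piece.
    cv::Mat hueSatHistogram(const cv::Mat& bgrPiece)
    {
        cv::Mat hsv;
        cv::cvtColor(bgrPiece, hsv, cv::COLOR_BGR2HSV);
        int channels[] = {0, 1};                   // hue and saturation channels
        int histSize[] = {4, 16};                  // 4 hue bins, 16 saturation bins
        float hueRange[] = {0, 180};
        float satRange[] = {0, 256};
        const float* ranges[] = {hueRange, satRange};
        cv::Mat hist;
        cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
        return hist;
    }

    double bhattacharyyaDistance(const cv::Mat& piece, const cv::Mat& finalPiece)
    {
        cv::Mat hp = hueSatHistogram(piece);
        cv::Mat hf = hueSatHistogram(finalPiece);
        // HISTCMP_CORREL / HISTCMP_INTERSECT yield similarities,
        // HISTCMP_CHISQR / HISTCMP_BHATTACHARYYA yield distances.
        return cv::compareHist(hp, hf, cv::HISTCMP_BHATTACHARYYA);
    }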


3. Gabor Wavelet Bank Filtering

The implementation of the algorithms described in this section was not planned at the beginning and can be considered as an extension to the project.

The basic principles underlying 2D Gabor wavelets are described in Section 2.2.2. I use a set of wavelets (a wavelet filter bank) in order to extract features with different scales, frequencies and orientations from both the piece P and the final piece F. The filter bank contains wavelets of 4 scales, also called stages, and 6 orientations, i.e. 24 in total (Figure 3.22). Each wavelet has two components, real and imaginary, hence the filter bank contains 48 convolution kernels.

OpenCV makes filtering easy via the filter2D function. However, it does not directly provide Gabor wavelet convolution kernels. Hence, I needed to construct these kernels manually from first principles and the functional definition of the wavelet. I adopted an approach inspired by W. Chao [2].

Using the functional form of the Gabor wavelet outlined in 2.2.2, we can achieve this as follows.

Notation and Components of the Gabor Wavelet Function

• u is the stage of the wavelet, u = 0, . . . , 3.

• f_MAX is the maximum central frequency of the wavelet, and the spacing between the central frequencies of wavelets at two adjacent scales is √2. I am using f_MAX = 0.35, as proposed in [2].

• f_u is the central frequency of the wavelet at stage u:

\[ f_u = \frac{f_{MAX}}{(\sqrt{2})^u} \]

2πf_u is the angular central frequency of the wavelet.

• V is the total number of orientations, i.e. V = 6.

• ν is the index of the current orientation, ν = 0, . . . , 5.

• θ_ν is the orientation of the wavelet in radians (i.e. the orientation of the major axis of the elliptical Gaussian envelope):

\[ \theta_\nu = \frac{\nu}{V}\pi \]

• (x_R, y_R) is a point (x, y) rotated by the angle θ_ν:

\[ x_R = x\cos\theta + y\sin\theta, \qquad y_R = -x\sin\theta + y\cos\theta \]

• α, β are parameters determining the width of the Gaussian envelope in the x and y directions. I am using α = β = 1.2, as proposed in [2].


Using this notation, the wavelet at stage u with orientation ν (out of all V possible orientations), centred at the origin, can be written as

\[ g(x, y) = \frac{f_u^2}{\pi\alpha\beta}\, e^{-\left[\frac{f_u^2}{\alpha^2}x_R^2 + \frac{f_u^2}{\beta^2}y_R^2\right]}\, e^{i 2\pi f_u x_R} \]

\[ g(x, y) = \frac{f_u^2}{\pi\alpha\beta}\, e^{-\left[\frac{f_u^2}{\alpha^2}x_R^2 + \frac{f_u^2}{\beta^2}y_R^2\right]}\, \bigl(\cos 2\pi f_u x_R + i \sin 2\pi f_u x_R\bigr) \]

We would like to use this function to construct real convolution filters. Hence, we are interested in the real and imaginary parts of the wavelet:

\[ \Re\bigl(g(x, y)\bigr) = \frac{f_u^2}{\pi\alpha\beta}\, e^{-\left[\frac{f_u^2}{\alpha^2}x_R^2 + \frac{f_u^2}{\beta^2}y_R^2\right]} \cos 2\pi f_u x_R \]

\[ \Im\bigl(g(x, y)\bigr) = \frac{f_u^2}{\pi\alpha\beta}\, e^{-\left[\frac{f_u^2}{\alpha^2}x_R^2 + \frac{f_u^2}{\beta^2}y_R^2\right]} \sin 2\pi f_u x_R \]

For every scale and orientation, I sample both the real and the imaginary part of g(x, y) to obtain two discrete convolution filters. Using this technique, the whole filter bank containing 48 filters is created (Figure 3.22); a code sketch of the kernel sampling is given below. The filters are square real matrices and their size is approximately equal to the size of P and F. I chose such a size after early experiments showed that it yielded the best results.
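The following is a minimal sketch of sampling one such kernel from the formula above (parameter names follow the notation of this section, with f_MAX = 0.35 and α = β = 1.2 as in [2]); it is an illustrative re-derivation under these assumptions, not the project code.

    #include <opencv2/core.hpp>
    #include <cmath>

    // Sample the real (or imaginary) part of the Gabor wavelet at stage u and
    // orientation index nu (out of V orientations).
    cv::Mat gaborKernel(int size, int u, int nu, int V, bool realPart,
                        double fMax = 0.35, double alpha = 1.2, double beta = 1.2)
    {
        double fu = fMax / std::pow(std::sqrt(2.0), u);    // central frequency at stage u
        double theta = nu * CV_PI / V;                     // wavelet orientation
        cv::Mat kernel(size, size, CV_64F);
        int half = size / 2;
        for (int y = -half; y < size - half; ++y) {
            for (int x = -half; x < size - half; ++x) {
                double xr =  x * std::cos(theta) + y * std::sin(theta);
                double yr = -x * std::sin(theta) + y * std::cos(theta);
                double envelope = (fu * fu) / (CV_PI * alpha * beta)
                    * std::exp(-(fu * fu / (alpha * alpha) * xr * xr
                               + fu * fu / (beta * beta) * yr * yr));
                double carrier = realPart ? std::cos(2.0 * CV_PI * fu * xr)
                                          : std::sin(2.0 * CV_PI * fu * xr);
                kernel.at<double>(y + half, x + half) = envelope * carrier;
            }
        }
        return kernel;   // convolve with cv::filter2D(image, response, CV_32F, kernel)
    }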

Figure 3.22: Filter bank containing 48 Gabor wavelet filters (4 scales × 6 orientations × 2 (real and imaginary component)). The figure shows half of the bank.

After constructing the filter bank, I convolve both the piece and the final piece with every filter using the OpenCV filter2D function. This results in 48 filter responses for both images, as shown in Figure 3.23. Logically, these 48 filter responses R can be internally grouped into pairs, where each pair (R_{ℜ(w_i)}, R_{ℑ(w_i)}) contains the filter responses of the real and the imaginary part of the same wavelet w_i.

Figure 3.23: Both the piece P and the final piece F are independently convolved with every filter in the bank to produce 48 filter responses each (24 pairs). Both sets of responses are internally grouped into pairs (R_{ℜ(w_i)}, R_{ℑ(w_i)}), i = 1..24, as the real and imaginary part of each wavelet w_i accounts for one filter.


Let us now concentrate on the piece P and its 48 filter responses (the same is done to the final piece F independently). I take the modulus (Equation 3.1) of each pair of responses (R_{ℜ(w_i)}, R_{ℑ(w_i)}) to reduce the 48 responses to 24. The modulus is taken pixel-wise and the idea is inspired by the Quadrature Demodulator Network presented in the Computer Vision course [5] (p. 41).

\[ R_{w_i} = \sqrt{R_{\Re(w_i)}^2 + R_{\Im(w_i)}^2}, \qquad i = 1, \dots, 24 \tag{3.1} \]

In order to compute the distance between P and F, I need to use these filter responses R_{P,w_i} and R_{F,w_i} to construct two feature vectors p and f. These vectors ought to describe P and F respectively.

The feature vector p can be constructed from the filter responses R_{P,w_i} by computing the means (μ) and standard deviations (σ) of every filter response j. Related μ's and σ's are paired into (μ^(j), σ^(j)) pairs that form the elements of the feature vector p (Algorithm 6). The feature vector f is created in the same manner from the filter responses R_{F,w_i}.

Algorithm 6 Creating a feature vector of length 48 using the mean and standard deviation of the filter responses.

1: procedure ComputeGaborFeatureVector(vector<Matrix> filterResponses)
2:   featureVec ← empty vector of doubles
3:   meanVec ← means of filterResponses
4:   stdevVec ← standard deviations of filterResponses
5:   for j = 1 → 24 do
6:     (μ(j), σ(j)) ← (meanVec[j], stdevVec[j])
7:     featureVec.append((μ(j), σ(j)))
8:   return featureVec
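A minimal OpenCV sketch of Algorithm 6, assuming the 24 modulus responses of one piece are given as single-channel matrices (names are illustrative):

    #include <opencv2/core.hpp>
    #include <vector>

    // Produce the 48-element feature vector of interleaved (mean, stddev) pairs.
    std::vector<double> gaborFeatureVector(const std::vector<cv::Mat>& responses)
    {
        std::vector<double> features;
        for (const cv::Mat& r : responses) {
            cv::Scalar mu, sigma;
            cv::meanStdDev(r, mu, sigma);      // per-response mean and standard deviation
            features.push_back(mu[0]);
            features.push_back(sigma[0]);
        }
        return features;
    }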

The final part is to compute the distance Δ_{P,F} between the feature vectors p and f. I decided to use the metric proposed by Yang and Newsam [18], which is well suited for this vector that contains concatenated μ's and σ's.

The distance is defined as

\[ \Delta_{P,F} = \sum_{j=1}^{24} \delta_{P,F}(j), \]

where j indexes the feature pairs (μ^(j), σ^(j)).

δ_{P,F}(j) is the distance between the two such pairs from p and f that describe feature j:

\[ \delta_{P,F}(j) = \left| \frac{\mu^{(j)}_P - \mu^{(j)}_F}{\omega\bigl(\mu^{(j)}\bigr)} \right| + \left| \frac{\sigma^{(j)}_P - \sigma^{(j)}_F}{\omega\bigl(\sigma^{(j)}\bigr)} \right| \]

Here, μ^(j)_P and σ^(j)_P are the elements of p corresponding to feature j, i.e. the mean and standard deviation computed using the wavelet w_j. ω(μ^(j)) and ω(σ^(j)) are normalisation factors. They are the standard deviations of the respective features computed over the whole dataset (feature vectors for all pieces and final pieces).

Using this approach, I compute the distance Δ_{P,F} that is later used for the construction of the probability matrix (Section 3.5.2).

4. Scale Invariant Feature Transform (SIFT)

The last distance measure that will be discussed is a measure based on SIFT (Section 2.2.3 and Appendix A). Using this measure was considered as a possible extension in the original proposal. Unlike Gabor wavelets, SIFT is well implemented in OpenCV and hence I decided not to reimplement it due to the limited scope of the project.

Given a piece P and a final piece F, the OpenCV detect function is used to find the 100 most dominant keypoints in both images (Figure 3.24). Then, the compute function computes the keypoint descriptor11 for every detected keypoint.

Figure 3.24: The most dominant SIFT keypoints shown in the image (a piece from the Raspberry Pi puzzle image). The size and orientation of every keypoint is indicated by the radius of the corresponding circle mark.

OpenCV offers a convenient interface to the FLANN library. I use this library in order to find the top N matches12 between the keypoint descriptors in both images. Initially, I used brute force matching but I switched to FLANN to achieve better time performance.

Let us call δ_i the distance of the i-th best keypoint match. The distance Δ_{P,F} between the piece and the final piece can then be computed as the average distance over the N best keypoint matches:

\[ \Delta_{P,F} = \frac{1}{N} \sum_{i=1}^{N} \delta_i \]

A visualisation of the matching procedure is shown in Figure 3.25.

In conclusion, this method yields a distance value Δ_{P,F} between P and F.

11: 128-bin orientation histogram
12: N = 30 used


Figure 3.25: Visualisation of the top matches between a piece from the puzzle image (left) and the corresponding piece from the final image clue (right).
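A minimal sketch of this SIFT-and-FLANN distance is given below; it assumes OpenCV 4.4 or later (where cv::SIFT lives in the main features2d module), uses the values quoted in the text (100 keypoints, N = 30), and is illustrative rather than the project code.

    #include <opencv2/features2d.hpp>
    #include <algorithm>
    #include <limits>
    #include <vector>

    double siftDistance(const cv::Mat& piece, const cv::Mat& finalPiece, int N = 30)
    {
        cv::Ptr<cv::SIFT> sift = cv::SIFT::create(100);   // 100 most dominant keypoints
        std::vector<cv::KeyPoint> kp1, kp2;
        cv::Mat desc1, desc2;
        sift->detectAndCompute(piece, cv::noArray(), kp1, desc1);
        sift->detectAndCompute(finalPiece, cv::noArray(), kp2, desc2);
        if (desc1.empty() || desc2.empty())
            return std::numeric_limits<double>::max();    // no keypoints: treat as very far

        cv::FlannBasedMatcher matcher;                    // approximate nearest neighbours
        std::vector<cv::DMatch> matches;
        matcher.match(desc1, desc2, matches);
        std::sort(matches.begin(), matches.end());        // cv::DMatch sorts by distance

        int n = std::min<int>(N, (int)matches.size());
        if (n == 0) return std::numeric_limits<double>::max();
        double sum = 0.0;
        for (int i = 0; i < n; ++i) sum += matches[i].distance;
        return sum / n;                                    // average of the N best matches
    }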

3.5.2 Construction of Probability Matrix

So far, I have discussed several distance measures Δ_{P_i,F_j} and similarity measures λ_{P_i,F_j} to compute the distance or similarity between a piece P_i and a final piece F_j. In this section, let us have a look at how to process and aggregate such values to obtain a probability matrix P.

Given |P| pieces and |F| final pieces, a probability matrix P is a matrix of size |P| × |F|. An element P[i][j] = p_{i,j} at the position (i, j) represents the probability that the mapping function φ maps the piece P_i to the final piece F_j. P_i and F_j are indexed from 0.

Let N_P = |P| − 1 and N_F = |F| − 1.

\[ \mathbf{P} = \begin{pmatrix} p_{0,0} & p_{0,1} & \dots & p_{0,N_F} \\ p_{1,0} & p_{1,1} & \dots & p_{1,N_F} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N_P,0} & p_{N_P,1} & \dots & p_{N_P,N_F} \end{pmatrix} = \begin{pmatrix} \vec{p}_0 \\ \vec{p}_1 \\ \vdots \\ \vec{p}_{N_P} \end{pmatrix} \]

P is a stochastic matrix, which means that the i-th row is the probability distribution vector p_i of the piece P_i over all final pieces with respect to the mapping function φ. Hence, every row must sum to 1:

\[ \sum_{j=0}^{N_F} p_{i,j} = 1, \qquad \forall i = 0, \dots, N_P \]

In order to construct P, we need to construct the individual probability distribution vectors p_i which form the rows of P.


Constructing the Probability Distribution Vector p_i

We would like to construct a probability distribution p_i (for every i) using the distance values Δ_{P_i,F_j} or similarity values λ_{P_i,F_j} that have been computed previously.

Let us deal with the distance values first. For a given i, let Δ_i be the vector of all distance values Δ_{P_i,F_j}:

\[ \vec{\Delta}_i = (\Delta_{P_i,F_0}, \Delta_{P_i,F_1}, \dots, \Delta_{P_i,F_{N_F}}) \]

Let Δ^max_i = max Δ_i.

We first turn this distance vector Δ_i into the corresponding similarity vector λ_i by subtracting every distance value from the maximum distance value plus one. The extra one is added to avoid creating a zero entry:

\[ \vec{\lambda}_i = \bigl( (\Delta^{max}_i + 1 - \Delta_{P_i,F_0}),\; (\Delta^{max}_i + 1 - \Delta_{P_i,F_1}),\; \dots,\; (\Delta^{max}_i + 1 - \Delta_{P_i,F_{N_F}}) \bigr) \]

If we have used a measure that directly produces similarity values (such as using correlation or intersection for histogram comparison), we obtain this λ_i directly. In such a case λ_i = (λ_{P_i,F_0}, λ_{P_i,F_1}, . . . , λ_{P_i,F_{N_F}}).

The final step is to normalise λ_i to make it a probability vector p_i. In other words, the sum of all its entries must be one.

To define p_i, label the entries of λ_i as λ_j, i.e. λ_i = (λ_0, λ_1, . . . , λ_{N_F}). Then

\[ \vec{p}_i = \frac{\vec{\lambda}_i}{\sum_{j=0}^{N_F} \lambda_j} \]

Finally, we form P using the vectors p_i as its rows; a small code sketch of this step follows.
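A minimal sketch of this inversion-and-normalisation step for a single row (illustrative names, plain C++):

    #include <algorithm>
    #include <vector>

    // Turn one row of distances into a probability row (assumes a non-empty row).
    std::vector<double> distancesToProbabilities(const std::vector<double>& distances)
    {
        double maxDistance = *std::max_element(distances.begin(), distances.end());
        std::vector<double> probabilities(distances.size());
        double sum = 0.0;
        for (std::size_t j = 0; j < distances.size(); ++j) {
            probabilities[j] = maxDistance + 1.0 - distances[j];   // strictly positive similarity
            sum += probabilities[j];
        }
        for (double& p : probabilities) p /= sum;                  // row now sums to 1
        return probabilities;
    }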

3.5.3 Weighted Average of Probability Matrices

The Solver offers an option to use multiple distance or similarity measures. Let K be the number of measures used. In such a case, K probability matrices P_k are produced.

However, I need exactly one probability matrix P as an input to the final Greedy assembly stage. In order to obtain it, I combine the P_k values using weighting factors α_k:

\[ \mathbf{P} = \sum_{k=0}^{K-1} \alpha_k \mathbf{P}_k, \qquad \sum_{k=0}^{K-1} \alpha_k = 1 \]


The condition that all weighting factors sum to one ensures that the resulting P is still a stochastic matrix.

A possible heuristic for choosing the weighting factors α_k is to favour distance or similarity measures that

• produce good results

• are weakly correlated

See Section 4.6.4 for further details.

3.5.4 Greedy Assembly

Given a single probability matrix P of size |P| × |F|, the goal is to find a feasible algorithm to construct a mapping φ : P → F. One evident approach would be to construct all possible mappings, evaluate them and choose the best one. However, the number of such mappings grows exponentially13 with |F| and hence this approach is not computationally feasible.

Therefore, I have chosen a greedy strategy to solve this problem (Algorithm 7). Starting with an empty mapping φ, I iterate over P to find a maximal entry p_{i,j} such that the piece i and the final piece j have not yet been used (i.e. P_i ∉ domain(φ) and F_j ∉ range(φ)). Once there are no unused pieces or final pieces14, the algorithm terminates.

Algorithm 7 Greedy construction of the mapping φ.

1: procedure GreedyAssembly(Matrix P)
2:   φ ← empty mapping ∈ (P → F)
3:   while ∃ i, j. P_i ∉ domain(φ) ∧ F_j ∉ range(φ) do
4:     (i′, j′) ← arg max over (i, j) with P_i ∉ domain(φ) ∧ F_j ∉ range(φ) of P[i][j]
5:     φ(P_{i′}) ← F_{j′}
6:   return φ

Using the mapping φ and the known positions of the final pieces F_j in the final image clue, I connect the pieces P_i to produce the final result; a sketch of the greedy construction is given below.

13: Assuming |P| ≤ |F|, it is exactly |F|! / (|F| − |P|)!
14: depending on the cardinality of both sets
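A minimal C++ sketch of this greedy construction, representing the probability matrix as a dense cv::Mat of doubles and φ as a vector of indices (illustrative, not the project code):

    #include <opencv2/core.hpp>
    #include <algorithm>
    #include <vector>

    // Returns phi: phi[i] is the final-piece index assigned to piece i, or -1 if unmapped.
    std::vector<int> greedyAssembly(const cv::Mat& P)
    {
        std::vector<int> phi(P.rows, -1);
        std::vector<bool> usedFinal(P.cols, false);
        int assignments = std::min(P.rows, P.cols);
        for (int step = 0; step < assignments; ++step) {
            double best = -1.0;
            int bestI = -1, bestJ = -1;
            for (int i = 0; i < P.rows; ++i) {
                if (phi[i] != -1) continue;                // piece already mapped
                for (int j = 0; j < P.cols; ++j) {
                    if (usedFinal[j]) continue;            // final piece already taken
                    double p = P.at<double>(i, j);
                    if (p > best) { best = p; bestI = i; bestJ = j; }
                }
            }
            phi[bestI] = bestJ;
            usedFinal[bestJ] = true;
        }
        return phi;
    }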


3.6 Performance Considerations

As mentioned earlier, the run time of The Solver was not the primary factor during its analysis and implementation. However, it became apparent during the implementation that some parts of the code would be performance bottlenecks and could hinder proper evaluation. Hence, I decided to exploit the multiple cores I had available to reduce the run time of The Solver.

I identified the parts of the code that were the main limiting factors of The Solver's speed. I then used C++ threads from the standard library to parallelise these parts of the code; a sketch is given after the list below. This was not planned at the beginning and hence can be considered as an extension.

Parts of the code that have been parallelised include

• part of the analysis part of The Solver. Recall that identical processing (background removal, corner and border detection and later square inference) is independently done on four images (the original RGB image and its H, S, V channels), which is ideal for parallelisation.

• generating the feature vectors for the pieces and final pieces using the Gabor filter bank (Section 3.5.1). This is possible as the same operation, namely convolution with every filter in the bank, is done for every piece and final piece.
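A minimal sketch of the kind of parallelisation used, where the same processing function runs on the four binary images in separate std::threads (the result type and names are placeholders, not the project's):

    #include <opencv2/core.hpp>
    #include <functional>
    #include <thread>
    #include <vector>

    // Run processOne(image, result) on every binary image in its own thread.
    void processInParallel(const std::vector<cv::Mat>& binaryImages,
                           std::vector<std::vector<cv::Rect>>& results,
                           const std::function<void(const cv::Mat&, std::vector<cv::Rect>&)>& processOne)
    {
        results.assign(binaryImages.size(), {});
        std::vector<std::thread> workers;
        for (std::size_t i = 0; i < binaryImages.size(); ++i)
            workers.emplace_back(processOne, std::cref(binaryImages[i]), std::ref(results[i]));
        for (std::thread& worker : workers)
            worker.join();
    }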

3.7 Reassembly without Final Image Clue

As an extension, I attempted to do the reassembly without using the final image clue. I used an edge compatibility metric based on the sum of square differences of pixel values along edges. The compatibility of all possible adjacent edges was computed and fed into a greedy assembler that gradually joins pieces to grow areas of the image that are solved.

However, early testing showed that the performance of this algorithm is very poor. This is due to very severe traces of background colour around the sides of the individual square pieces. These arose naturally as a result of the imperfect extraction of pieces in the analysis part of The Solver. When applied to perfect pieces (artificially generated by simply splitting an image into squares), the algorithm worked correctly. However, as it is inapplicable to the original problem considered in this project, I abandoned the development after a couple of days and will not discuss it further in the dissertation.


Chapter 4

Evaluation

4.1 Overview

As outlined in Section 2.3.5, I used both camera-taken and artificially-generated images to test the accuracy and performance of The Solver. Eleven images1 featuring various scenes were used to create four artificially-generated datasets and one camera-taken dataset.

I carried out both automatic and human evaluation of The Solver under various criteria.

1. Automatic evaluation

This was performed on the four artificially-generated datasets. I wrote code (Section 4.4) to aid the automatic evaluation on large datasets consisting of thousands of generated puzzle images. A CSV2 file is generated for storing and further hand-processing of the generated statistics.

On each of the datasets, I evaluated The Solver from three distinct points of view:

(a) End-to-end accuracy

The accuracy of The Solver was tested end-to-end in seven configurations, corresponding to the seven possible distance measures discussed in Section 3.5.1. In every configuration, one of these distance measures was used in the matching part of The Solver. See Section 4.6.2 for the comparison of results. The direct comparison metric, i.e. the fraction of pieces that are in the correct absolute position, was used in this evaluation.

1: see Appendix B
2: comma-separated values file


Seven configurations are:

• Cosine distance

• SIFT based distance

• Gabor wavelets based distance

• Histogram measures

– Correlation between histograms

– Chi-square distance

– Intersection of histograms

– Bhattacharyya distance

(b) Detection rate of square pieces in the analysis part of The Solver

This evaluation considers the detection rate of the analysis, i.e. the first part of The Solver responsible for image preprocessing and the later detection and extraction of square pieces from the puzzle image.

The detection rate is expressed as the percentage of square pieces that are correctly detected and extracted from the puzzle image.

(c) Run times

The run times of the analysis (detection time) and matching part (assembly time) of The Solver (using every configuration out of the seven possible) were independently measured and compared where appropriate. Note that this was done on top of the original project plan. Although I did not consider the speed of The Solver to be the primary goal of the project, I found it interesting and useful to include the comparison of run times.

2. Human evaluation

The camera-taken dataset was used for human evaluation. Participants were asked to judge the accuracy of the puzzles assembled by The Solver using intuition. See Section 4.6.5 for further details.

As an extension, I used the results from the automatic evaluation to compute pairwise correlations between the seven possible configurations of The Solver (Section 4.6.4). This was done in order to estimate whether a combination of several distance measures could boost The Solver's accuracy.


4.2 Artificial Generation of Puzzle Images

All puzzle images used for the automatic evaluation are artificially-generated. During their generation, I remember the correct positions of the corners of the square pieces in the puzzle image. This gives me complete information about the generated puzzle images of various dimensions and quality.

The data is stored in a text file (Figure 4.1) on the disk during the generation of the datasets and is later used to automatically evaluate The Solver's accuracy and detection rate.

Figure 4.1: Screenshot of a sample text file containing the coordinates of the corners of pieces. Each row contains four (x, y) coordinates describing one square piece.

4.3 Accuracy Metrics

The metrics discussed in this section are used solely for the purposes of the automatic evaluation. All puzzle images used in this evaluation are artificially-generated.

Detection rate of the Analysis part of The Solver

After the analysis part of The Solver completes, I compare the coordinates of the found square pieces to the true coordinates of the pieces present in the puzzle image. The latter are stored on the disk as discussed in Section 4.2. A piece is considered to be correctly located and extracted if all four discovered corners lie within one third of an edge side from the true corners (Figure 4.2). I have chosen this relatively liberal measure as I observed that pieces which are extracted with this accuracy can in most cases still successfully be used in the later matching part of The Solver.

The detection rate is the percentage of all pieces that are correctly detected and extracted. For example, if a puzzle has dimensions 6 × 10 and 30 pieces are correctly extracted, the detection rate would be 30 / (6 × 10) = 0.5 = 50%.


Figure 4.2: Two detected square pieces featured; the tolerated error is ⅓ of a side length. The square on the right is considered to be erroneously detected as the distance between the true and detected bottom right corners is too large.

detection rate = (number of correctly detected pieces) / (total number of pieces)

End-to-end Accuracy

The accuracy of the solver is expressed as the percentage of all pieces that end up in correct absolute positions after the assembly of a jigsaw puzzle. In the literature this metric is also known as direct comparison [3].

accuracy = (number of pieces placed in correct absolute positions) / (total number of pieces)

Other metrics, such as neighbour comparison or largest component computation, which I originally planned to use for the end-to-end evaluation, showed not to be applicable to the final implementation and hence were omitted.

4.4 Software Support for Automatic Evaluation

Parts of the code I wrote are dedicated to allowing automatic evaluation. These include:

• Artificial generation of puzzle images and final image clues using functions such as createCameraLikeDataset. I have written a dedicated module named ArtificialInputCreation implementing the generation of the various datasets.

• Saving the corner coordinates of generated puzzles onto a disk in a plain text file.

• Producing statistics about the detection rate and accuracy (in percentage) of the analysis and matching parts of The Solver respectively. A dedicated createStatistics function has been implemented for this purpose in the PuzzleEvaluation module. Statistics are saved onto the hard disk in a CSV file.


4.5 Datasets

I used eleven source images3 in order to prepare sample puzzle images and final image clues for the evaluation. I aimed for high variability in these images and therefore they feature both natural and indoor scenes, objects, human portraits and paintings.

I produced the following five datasets:

1. Camera-taken dataset

I printed all source images on an A3 format. I cut these into square pieces, randomly shuffled them and placed them onto single-colour paper to construct a puzzle image. I used different colours for the backgrounds to achieve various kinds of foreground-background contrast. Afterwards, I took a picture of this puzzle image using a smartphone camera. The final image clues were also taken using the smartphone.

This dataset is human-evaluated as I do not have the coordinates of the corners of the pieces. Note that two Raspberry Pi samples are used in this dataset (on both an orange and a green background, as the green one should be more difficult for The Solver due to colour similarity). Hence the dataset contains twelve samples4.

Figure 4.3: Camera-taken sample puzzle image. The traces of vertical and horizontal lines in the middle of the image are due to using four green A4 papers as the background when taking the photograph.

2. Generated datasets

All these datasets were generated by a dedicated ArtificialInputCreation module. After resizing, which makes both image dimensions multiples of the piece side length, a source image was split into squares. These were then randomly shuffled and placed onto a background in a non-overlapping manner. Afterwards, random rotations of up to 5° were added to mimic human error when laying out the puzzle pieces. The final image clue is identical to the original source image.

3: see Appendix B
4: see Appendix C


By creating multiple datasets, I tried to account for various deteriorating artefacts, such as noise or geometric distortion, that are an inevitable consequence of real photography.

(a) “Perfect” dataset

No distortions or noise are applied to the generated puzzle images and final image clues. A perfectly uniform background colour is used.

Figure 4.4: Generated “perfect” sample puzzle image.

(b) “Gaussian” dataset

Same as the “perfect” dataset, but Gaussian noise with σ = 20 is added to both the puzzle images and the final image clues.

Figure 4.5: Generated “Gaussian” sample puzzle image. The Gaussian noise is easily visible in the zoomed-in portion of the image shown on the right.

(c) “Camera-like” dataset

I wanted to get the best of both worlds:

• an ability to do large scale evaluation and generation of lots of samples,

• an ability to reliably mimic the artefacts added when taking a smartphone photograph.


Therefore I decided to separately take pictures of both

• the eleven input images,

• uniformly coloured papers that would serve as backgrounds.

Having taken these photographs, I did the rest of the generation (splitting, shuffling etc.) in software. This way, I was able to create lots of samples that mimic camera-taken pictures.

Figure 4.6: Generated “camera-like” sample puzzle image. Note the variation of the lighting across the image. This was created naturally when the smartphone pictures of the original Raspberry Pi image and the green background were taken.

(d) “Distorted” dataset

This dataset was created in the same way as the “camera-like” dataset but perspective distortions were added to the pieces after splitting an input image into squares. Each corner was moved in a random direction by up to 15% of the square side length. I used the OpenCV getPerspectiveTransform and warpPerspective functions.

Figure 4.7: Generated “distorted” sample puzzle image. Mild perspective transforms were applied to individual pieces at random. The picture on the right shows one such piece under zoom.


4.5.1 Camera-taken Dataset Details

As previously mentioned, this dataset contains twelve manually captured samples (Appendix C)5. The Raspberry Pi and Edsac images are reused on two different backgrounds in order to produce two challenging input puzzle images (Raspberry Pi on a green background and Edsac on pink). The Sir David Robinson source image is not used in this particular dataset.

The size of each piece side is approximately 230 pixels, as measured in Photoshop using the ruler tool.

4.5.2 Artificially-generated Datasets Details

Every possible input image out of N = 11 images was split and placed on B backgrounds of different colours. In every dataset, I consistently used D = 6 different puzzle dimensions (numbers of rows and columns). Hence, the total number of samples per dataset can be computed as

number of samples in a dataset = N × B × D.

Dataset         N    B    D    Total number of samples
“Perfect”       11   32   6    2112
“Gaussian”      11   32   6    2112
“Camera-like”   11   10   6    660
“Distorted”     11   10   6    660

Table 4.1: Details of the four artificially-generated datasets.

The six dimensions (rows × columns), with the corresponding total number of pieces in a single puzzle image, are

• 3 × 5 = 15

• 6 × 10 = 60

• 10 × 13 = 130

• 15 × 20 = 300

• 20 × 30 = 600

• 26 × 39 = 1014

The size of the individual square pieces (i.e. their resolution in pixels) varies across datasets and is summarized in Table 4.2. I decided on the drop from 300 to 100 pixels in the “perfect” and “Gaussian” datasets in order to make the run time low enough for a complete evaluation.

5: The number of images is comparable to previous work on computational solvers.


The side lengths drop progressively in the “camera-like” and “distorted” datasets as I did not want to drastically resize the photograph of the input image during puzzle creation.

Dataset         3×5   6×10   10×13   15×20   20×30   26×39
“Perfect”       300   300    300     100     100     100
“Gaussian”      300   300    300     100     100     100
“Camera-like”   500   240    150     100     70      50
“Distorted”     500   240    150     100     70      50

Table 4.2: Approximate side length (in pixels) of the individual pieces in the puzzle image.

Exhaustive evaluation of larger puzzles was not feasible due to long run times. Hence, I only did a few runs on puzzles made of thousands of pieces. See Section 4.6.6 for an example.


4.6 Results

For illustration purposes, Figure 4.8 shows a solved puzzle that corresponds to the puzzle image shown at the very beginning of the dissertation (Figure 1.1a). This puzzle has been used as a working example and reoccurred at multiple places in earlier chapters.

Figure 4.8: Example of a solved puzzle. This one corresponds to a true camera-taken puzzle image. Note the visible boundaries between the assembled square pieces, e.g. between the ‘c’ and the neighbouring symbol in the top central part of the image.

4.6.1 Detection Rate of Analysis Part of The Solver

Figure 4.9 depicts the detection rate of the analysis part of The Solver, i.e. the percentage of all square pieces that are correctly found and extracted. In each subfigure, the puzzle dimensions (six possible) are on the x-axis. The detection rate is on the left y-axis and the square extraction time (the run time of the analysis part of The Solver) is on the right y-axis. Negative error bars for the detection rate are shown; their lengths equal the standard deviations σ of the detection rates. There is a separate figure for every generated dataset used.


Figure 4.9: Detection rates measured for the different artificially-generated datasets: (a) “Perfect”, (b) “Gaussian”, (c) “Camera-like”, (d) “Distorted”. Column graphs feature the detection rates for the corresponding puzzle dimensions (including negative error bars of length σ) and line graphs show the square extraction times. The measured detection rates across the six dimensions (3×5 to 26×39) are approximately 99.8%, 99.7%, 99.7%, 99.6%, 99.7%, 99.3% for the “perfect” dataset; 93.9%, 95.0%, 95.3%, 95.5%, 94.7%, 94.4% for the “Gaussian” dataset; 98.5%, 99.2%, 99.1%, 99.3%, 99.3%, 99.1% for the “camera-like” dataset; and 98.2%, 99.1%, 99.3%, 99.3%, 99.2%, 99.1% for the “distorted” dataset. Note the sudden drop of run time between dimensions 10×13 and 15×20 in 4.9a and 4.9b. This is caused by the decrease of piece resolution from 300 to 100 pixels during the generation of the datasets (Table 4.2).


This part of The Solver performs surprisingly well, achieving an average detection rate above 90% in all tests. In the majority of cases, the achieved detection rate is about 99%. The most challenging dataset was the “Gaussian” dataset, with an average detection rate of 94.8% and considerably large error bars. The easiest dataset for square detection was the “perfect” dataset, with an average detection rate of 99.6%.

Detection Rate on Untrimmed Puzzle Images

I have also tried to run the analysis part of The Solver on untrimmed, camera-taken puzzle images that have traces of floor around the borders, to test the robustness of my algorithms. This is quite challenging as the background removal algorithm estimates background pixels according to the histogram characteristics of border pixels (Section 3.3.2).

Nevertheless, the majority of square pieces have been successfully detected (Figure 4.10). I think that this success is based on using four processing paths (the original RGB image and the H, S, V channels) in the analysis, which leads to successful piece extraction on at least one of the paths.

Figure 4.10: Example of detected squares (shown bold in the image) in a camera-taken puzzle image where traces of the floor are present on its border.


4.6.2 End-to-end Accuracy of The Solver

The following figures illustrate the accuracy of The Solver, i.e. the percentage of the pieces that end up at correct absolute positions after the assembly. Seven series are depicted in every figure, corresponding to the different configurations, i.e. the distance measures used in the matching part of The Solver. In every figure, the six possible puzzle dimensions are on the x-axis and the accuracy is on the y-axis.

Figure 4.11: Accuracy measured for the “Perfect” (a) and “Gaussian” (b) artificially-generated datasets. Each plot shows the accuracy (pieces placed correctly, in %) against the six puzzle dimensions for the seven configurations: features (SIFT), Gabor wavelets, histogram (correlation), histogram (Chi-square), histogram (intersection), cosine distance and histogram (Bhattacharyya).

Figure 4.12: Accuracy measured for the “Camera-like” (a) and “Distorted” (b) artificially-generated datasets, with the same seven configurations.

All figures show downward sloping trends, as expected. This happens because the difficulty of assembling a puzzle grows with the growing number of puzzle pieces. Apart from the “distorted” dataset, the accuracies for puzzles of moderate sizes (up to 100 pieces) are consistently high.

Note the sudden drop in Figures 4.11a and 4.11b between dimensions 10×13 and 15×20. This probably happens because the resolution of the individual pieces decreases from 300×300 to 100×100 pixels (Table 4.2). This indicates that the resolution of the individual pieces in the puzzle image is probably more important than the total number of pieces. The curves in the “camera-like” dataset (Figure 4.12a) support this argument as both the resolution of the pieces (Table 4.2) and the accuracy of The Solver decrease more gradually. Considering all the image processing operations performed on the puzzle image, this result about the importance of the resolution of the puzzle pieces is not a big surprise.

Cosine distance and SIFT-based measures achieve consistently the highest accuracy across all datasets. Out of the histogram-based measures, the Bhattacharyya distance outperforms the other ones. The distance measure based on the Gabor wavelet bank is outperformed by the majority of the other measures, mainly for puzzles with larger dimensions.

Note that the accuracy of all measures on the “distorted” dataset is rather poor. The SIFT-based distance measure achieves the best result. This is logical as SIFT is an orientation and scale invariant algorithm, hence it naturally provides better robustness to perspective distortions than the other measures.


4.6.3 Performance of The Solver

For completeness, Figures 4.13 and 4.14 show the assembly run times of the matching part of The Solver in the seven different configurations.

100

1000

10000

100000

1000000

3 x 5 6 x 10 10 x 13 15 x 20 20 x 30 26 x 39

Ass

embl

y tim

e [m

s] (l

og s

cale

)

Dimensions of test puzzles [rows x columns]

Features (SIFT)

Gabor Wavelets

Histogram (correlation)

Histogram (Chi-square)

Histogram (intersection)

Cosine distance

Histogram (Bhattarachyya)

(a) “Perfect” dataset

1

10

100

1000

10000

100000

1000000

3 x 5 6 x 10 10 x 13 15 x 20 20 x 30 26 x 39

Ass

embl

y tim

e [m

s] (l

og s

cale

)

Dimensions of test puzzles [rows x columns]

Features (SIFT)

Gabor Wavelets

Histogram (correlation)

Histogram (Chi-square)

Histogram (intersection)

Cosine distance

Histogram (Bhattarachyya)

(b) “Gaussian” dataset

Page 74: Solving Square Piece Jigsaw Puzzle using Computer Vision€¦ · Matej Hamaö Solving Square Piece Jigsaw Puzzle using Computer Vision Computer Science Tripos – Part II

CHAPTER 4. EVALUATION 74

[Panel (a) "Camera-like" dataset and panel (b) "Distorted" dataset: assembly time [ms] (log scale) plotted against dimensions of test puzzles [rows × columns] for the same seven distance measures.]

Figure 4.14: Assembly run times of the matching part of The Solver measured for different artificially-generated datasets. Note the log scale used on the y-axis.

Configurations using histogram-based distance measures consistently have the lowest run times, followed by the cosine distance, which uses a Gaussian pyramid to reduce the run time. As expected, the configuration using SIFT is the slowest, due to the high complexity of this algorithm. The Gabor-wavelet configuration also has a rather large run time, which can be explained by the extensive use of convolution operations with different filters from the filter bank.
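To give a sense of how lightweight a single histogram comparison is, the sketch below compares two pieces using OpenCV's histogram functions. It is only an illustration: the colour space, the bin counts and the choice of the Bhattacharyya flag are my own assumptions rather than The Solver's exact parameters.

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    // Illustrative sketch (not The Solver's actual code): compare two BGR pieces
    // by their hue-saturation histograms using the Bhattacharyya distance.
    double histogramDistance(const cv::Mat& pieceA, const cv::Mat& pieceB) {
        cv::Mat hsvA, hsvB;
        cv::cvtColor(pieceA, hsvA, cv::COLOR_BGR2HSV);
        cv::cvtColor(pieceB, hsvB, cv::COLOR_BGR2HSV);

        const int channels[] = {0, 1};                 // hue and saturation channels
        const int histSize[] = {30, 32};               // assumed bin counts
        const float hueRange[] = {0, 180}, satRange[] = {0, 256};
        const float* ranges[] = {hueRange, satRange};

        cv::Mat histA, histB;
        cv::calcHist(&hsvA, 1, channels, cv::Mat(), histA, 2, histSize, ranges);
        cv::calcHist(&hsvB, 1, channels, cv::Mat(), histB, 2, histSize, ranges);
        cv::normalize(histA, histA, 0, 1, cv::NORM_MINMAX);
        cv::normalize(histB, histB, 0, 1, cv::NORM_MINMAX);

        // 0 means identical histograms; larger values mean more dissimilar pieces.
        return cv::compareHist(histA, histB, cv::HISTCMP_BHATTACHARYYA);
    }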


Interestingly, the run times of the Gabor wavelet configuration tend to decrease in Figures 4.14a and 4.14b. This is probably because the resolution of individual pieces decreases for larger puzzles, which makes the convolutions so cheap that it outweighs the increase in the number of puzzle pieces.

The sudden drop, or relatively mild increase, in run times between puzzle dimensions 10 × 13 and 15 × 20 in the "perfect" and "Gaussian" datasets can again be explained by the change of the puzzle piece resolution from 300 × 300 to 100 × 100 pixels (Table 4.2).

4.6.4 Pairwise Correlation

As an extension to the project, I was interested in whether a combination of several distance measures (Section 3.5.3) could yield better results. Trying and testing all possible combinations is not feasible, hence I decided to test only combinations mixing distance measures that

• independently produce results with good accuracy.

• are weakly correlated.

I chose the weak-correlation criterion because weakly correlated measures are likely to perform badly on different sample inputs. Hence, their combination could in theory boost the overall accuracy of The Solver.

The question arose of how to compute this correlation. I decided to use the accuracy results obtained in the earlier automatic evaluation. Using the "camera-like" dataset, I constructed seven accuracy vectors of length 660 (the number of sample inputs). Each vector corresponds to one possible configuration and contains accuracy values between 0 and 1.

When two distance measures yield accuracy vectors u and v of length N, the correlation between them can be computed using the Pearson correlation coefficient:

\[
\mathrm{corr}(u, v) = \frac{\sum_{i=1}^{N} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{N} (u_i - \bar{u})^2 \, \sum_{i=1}^{N} (v_i - \bar{v})^2}}
\]

where

\[
u = [u_1, u_2, \ldots, u_N], \qquad v = [v_1, v_2, \ldots, v_N], \qquad
\bar{u} = \frac{1}{N}\sum_{i=1}^{N} u_i, \qquad \bar{v} = \frac{1}{N}\sum_{i=1}^{N} v_i .
\]
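The following short sketch illustrates this computation on two accuracy vectors. It is a stand-alone example rather than The Solver's evaluation code; the function name and the use of std::vector are my own choices.

    #include <cmath>
    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Pearson correlation coefficient between two accuracy vectors of equal length.
    double pearsonCorrelation(const std::vector<double>& u, const std::vector<double>& v) {
        const std::size_t n = u.size();
        const double meanU = std::accumulate(u.begin(), u.end(), 0.0) / n;
        const double meanV = std::accumulate(v.begin(), v.end(), 0.0) / n;

        double cov = 0.0, varU = 0.0, varV = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double du = u[i] - meanU;
            const double dv = v[i] - meanV;
            cov  += du * dv;     // numerator: sum of co-deviations
            varU += du * du;     // denominator terms: sums of squared deviations
            varV += dv * dv;
        }
        return cov / std::sqrt(varU * varV);
    }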


Table 4.3 shows the pairwise correlation coefficients of the different distance measures. As expected, the correlation of each distance measure with itself is 1 (the diagonal entries). The two lowest correlations in the table are those of the cosine distance with SIFT and with the Bhattacharyya histogram distance.

                            Cosine    SIFT    Gabor      Histogram-based
                            distance          wavelets   corr.   Chi-sqr.  intersect.  Bhatta.
    Cosine distance          1.00     0.53     0.74      0.57     0.65       0.65       0.52
    SIFT                              1.00     0.76      0.84     0.78       0.79       0.82
    Gabor wavelets                             1.00      0.84     0.93       0.92       0.77
    Histogram correlation                                1.00     0.90       0.95       0.98
    Histogram Chi-square                                          1.00       0.97       0.84
    Histogram intersection                                                   1.00       0.90
    Histogram Bhattacharyya                                                             1.00

Table 4.3: Pairwise correlations between different distance measures used in the matching part of The Solver. The "camera-like" dataset was used for the computation of correlations.

Given these results and the individual accuracies of the cosine distance, SIFT and histogram (Bhattacharyya) measures, I decided to perform automatic testing on the "camera-like" dataset of the following three distance measure combinations:

1. cosine distance + SIFT

2. cosine distance + histogram (Bhattacharyya distance)

3. cosine distance + SIFT + histogram (Bhattacharyya distance)

In every case, the weighting of the individual measures is equal (i.e. 1/2 : 1/2 or 1/3 : 1/3 : 1/3). Accuracies of The Solver using these combined measures are shown in Figure 4.15.
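As an illustration of the equal weighting, the sketch below mixes the scores of several measures for one candidate placement. It assumes that every measure has already been normalised to a comparable [0, 1] range; the names and the container layout are illustrative and not taken from The Solver's implementation (Section 3.5.3 describes the actual combination scheme).

    #include <cstddef>
    #include <vector>

    // Weighted combination of several (already normalised) distance scores for one
    // candidate placement; 'weights' is expected to sum to 1, e.g. {1/3, 1/3, 1/3}.
    double combinedDistance(const std::vector<double>& scores,
                            const std::vector<double>& weights) {
        double combined = 0.0;
        for (std::size_t i = 0; i < scores.size() && i < weights.size(); ++i) {
            combined += weights[i] * scores[i];
        }
        return combined;
    }

    // Example: equal weighting of cosine distance, SIFT and Bhattacharyya scores.
    // double d = combinedDistance({dCos, dSift, dBhatta}, {1.0/3, 1.0/3, 1.0/3});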


[Plot: accuracy (pieces placed correctly) [%] against dimensions of test puzzles [rows × columns] for the three combined measures (cosine distance + SIFT, cosine distance + Bhattacharyya, cosine distance + SIFT + Bhattacharyya) and the three original measures (cosine distance, SIFT, histogram Bhattacharyya).]

Figure 4.15: Accuracy of The Solver shown for three combined and three original (dashed lines) measures.

It is apparent from Figure 4.15 that the accuracy of The Solver is higher when combined measures are used. All combinations seem advantageous, mainly for larger puzzles. According to these results, the best option is the combination of all three original measures. The trade-off of combining three measures is an increase in the run time.


4.6.5 Human Evaluation of Camera-taken Dataset

This evaluation has been performed on the camera-taken dataset that contains twelve puzzle images (Section 4.5.1).

Using eight distance measures (the 7 original ones and the combined SIFT + cosine distance measure, see footnote 7), I presented 12 × 8 = 96 solved puzzles to the human participants, together with the 12 original source images for reference. They were asked to rate each result on a scale from 1 to 10 (10 being the best) using their intuition. They could complete the task in their own time and were fully informed and assured about the voluntary nature of the evaluation and the anonymity of their results. The original instruction e-mail is presented in Appendix D.

In total, 26 human evaluators participated, both male and female, from a wide range of backgrounds, including non-technical ones. I aggregated the results and computed the average mark and standard deviation for each distance measure used. Results of the evaluation are shown in Figure 4.16.
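For reference, the aggregation step itself is elementary; a minimal sketch (with illustrative names, not the actual evaluation scripts) is shown below.

    #include <cmath>
    #include <vector>

    // Average mark and sample standard deviation of the marks (1-10) given by all
    // evaluators to one distance measure.
    void aggregateMarks(const std::vector<double>& marks, double& mean, double& stddev) {
        mean = 0.0;
        for (double m : marks) mean += m;
        mean /= marks.size();

        double sumSquares = 0.0;
        for (double m : marks) sumSquares += (m - mean) * (m - mean);
        stddev = std::sqrt(sumSquares / (marks.size() - 1));
    }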

[Bar chart: average human-assigned mark (scale 1-10) per distance measure used in the matching part of The Solver. Values shown: cosine distance 7.94, Gabor wavelets 4.41, SIFT 6.86, histogram (correlation) 2.29, histogram (Chi-square) 1.76, histogram (intersection) 1.87, histogram (Bhattacharyya) 1.93, cosine distance + SIFT 7.16.]

Figure 4.16: Average marks given by human evaluators to different distance measures used in the matching part of The Solver.

As in the automatic evaluation, the cosine distance and SIFT measures score highly. The results of the histogram-based Bhattacharyya distance are poorer, which indicates that this distance is not necessarily the best choice for true camera-taken images.

Footnote 7: This combined measure is not the best one according to the previous results. The reason is that I had to choose it early, prior to the unplanned computation of pairwise correlations and the thorough evaluation of several combined measures.


4.6.6 Results on Very Large Puzzles

To push The Solver to its limits, I attempted to solve puzzles containing thousands of pieces. Due to the large run times (in the order of tens of hours), exhaustive testing of many samples was not possible.

Using the histogram-based correlation distance, chosen for its low run time, I was able to successfully solve a puzzle containing 10,000 pieces (80 × 125). The puzzle was generated in the same way as the samples from the "perfect" dataset, on a blue background and using a resolution of 50 × 50 pixels for individual pieces.

The solved puzzle is shown in Figure 4.17.

Figure 4.17: Example of a solved very large puzzle containing 10,000 pieces. Pieces that are placed in wrong locations are marked with a red cross. 94.7% of the pieces are correctly placed and 99.4% of all square pieces were correctly extracted from the puzzle image.

The run times of the analysis and matching parts of The Solver were 1.2 and 7.2 hours respectively. Hence, the total duration of the run was 8.4 hours.

The detection rate of the analysis part is 99.4% and the accuracy of the matching part is 94.7%. This far exceeds the original aims and success criteria given in the proposal.

I also attempted to solve a puzzle containing 30,000 pieces (150 × 200). The analysis part of The Solver completed successfully. However, The Solver ran out of computing resources in the matching part.


Chapter 5

Conclusion

5.1 Accomplishments

In this project, I successfully designed and implemented an algorithm (The Solver) that can perform a computational reassembly of a square piece jigsaw puzzle, given an image of the disassembled puzzle and an image featuring the final solution. It can detect and extract square pieces from the first input image and map them to the second image to do the reassembly. Input images are expected to be taken with mediocre smartphone cameras. As a result, variations in lighting, undesired shading, low resolution, poor contrast and other noise make the task quite challenging.

The Solver was thoroughly tested: it was automatically evaluated on computer-generated datasets containing more than 10,000 sample input puzzles and evaluated by human participants on a true camera-taken dataset. Results show that The Solver can successfully solve puzzles of up to hundreds of pieces with accuracy over 90%. In some cases, similar accuracy can be achieved on computer-generated puzzles of thousands of pieces, 10,000 being the current record.

I was able to keep to the proposed schedule and implement several extensions, such as:

• Implementing a Gabor wavelet bank from first principles and using it as the basis for a distance measure (Section 3.5.1).

• Using correlation between distance measures to find a combination with better performance (Section 4.6.4).

• Using SIFT as a distance measure (Section 3.5.1).

• Parallelising parts of the code to speed The Solver up (Section 3.6).

• Undertaking evaluation on a larger scale than originally proposed, in terms of the number, size and types of datasets, including very large puzzles of over a thousand pieces.

• Implementing a proof of concept of The Solver that does not use the final image clue (Section 3.7).


In conclusion, I consider the success criteria of the project to be more than met.

5.2 Further Work

Further work involves investigating better ways to combine the distance measures used in the matching part of The Solver, and possibly inventing new measures. An interesting task would also be to do the assembly without using the final image clue, although this is likely to be quite difficult and could require additional restrictions on input image quality. Further work could also involve extending the algorithm to puzzles with arbitrary initial rotation of pieces, with pieces of non-square shapes, or even to apictorial puzzles.


Bibliography

[1] N. Alajlan. Solving square jigsaw puzzles using dynamic programming and the Hungarian procedure. American Journal of Applied Sciences, 6, 2009.

[2] W. Chao. Gabor wavelet transform and its application, 2009-2011. Course work at National Taiwan University (http://disp.ee.ntu.edu.tw/~pujols/).

[3] T. S. Cho, S. Avidan, and W. T. Freeman. A probabilistic image jigsaw puzzle solver. In Proc. CVPR, 2010.

[4] John G. Daugman. Complete discrete 2D Gabor transforms by neural networks for image analysis and compression, 1988.

[5] John G. Daugman. Computer vision lecture notes for Cambridge CST Part II, 2015.

[6] Kenneth Dawson-Howe. A Practical Introduction to Computer Vision with OpenCV. Wiley Publishing, 1st edition, 2014.

[7] N. Eller and R. Nagar. Robotic jigsaw puzzle solver, preliminary design, 2013. Ben-Gurion University of the Negev, Israel.

[8] H. Freeman and L. Gardner. Apictorial jigsaw puzzles: The computer solution of a problem in pattern recognition. IEEE Trans. on Electronic Computers, 1964.

[9] A. C. Gallagher. Jigsaw puzzles with pieces of unknown orientation. In Proc. of CVPR, 2012.

[10] Chris Harris and Mike Stephens. A combined corner and edge detector. In Proc. of Fourth Alvey Vision Conference, pages 147–151, 1988.

[11] L. Liang and Z. Liu. A jigsaw puzzle solving guide on mobile devices.

[12] David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, November 2004.

[13] S. Mondal, Y. Wang, and S. Durocher. Robust solvers for square jigsaw puzzles, 2013. At: IEEE Conference on Computer and Robot Vision.

[14] D. Pomeranz, M. Shemesh, and O. Ben-Shahar. A fully automated greedy square jigsaw puzzle solver. In Proc. of CVPR, 2011.

[15] Edward Rosten and Tom Drummond. Machine learning for high-speed corner detection. In Proceedings of the 9th European Conference on Computer Vision - Volume Part I, ECCV'06, pages 430–443, Berlin, Heidelberg, 2006. Springer-Verlag.

[16] D. Sholomon, O. David, and N. S. Netanyahu. A genetic algorithm-based solver for very large jigsaw puzzles, 2013. At: IEEE Conference on Computer Vision and Pattern Recognition.

[17] Milan Sonka, Vaclav Hlavac, and Roger Boyle. Image Processing, Analysis, and Machine Vision. Thomson-Engineering, 2007.

[18] Yi Yang and S. Newsam. Comparing SIFT descriptors and Gabor texture features for classification of remote sensed imagery. In Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, pages 1852–1855, Oct 2008.


Appendix A

Scale Invariant Feature Transform (SIFT) Overview

SIFT is an algorithm proposed by D. Lowe in 2004 [12]. It is used to extract features (points of interest) from an input image in a scale- and orientation-invariant manner. I use this algorithm to detect features in the puzzle image and the final image clue. These features form the basis of one of the possible distance measures used for puzzle assembly (Section 3.5).

The algorithm is quite complex and consists of several pipelined stages.

Figure A.1: Construction of a scale space using Difference of Gaussians (DoG) operators. This is done independently at different levels of the Gaussian pyramid (image adapted from [12]).

1. Scale-space Extrema Detection
The following processing is done at various stages of a Gaussian pyramid (Section 2.2.1) computed from an input image. At each stage, several Difference of Gaussians (DoG) operators are used to extract features from the image (see footnote 1). The image is convolved with Gaussian kernels with different σ's and the filter responses are then subtracted to yield DoG responses (Figure A.1). The scale space is then searched for extrema (Figure A.2). The search is done over both scale and space in order to yield good keypoint candidates.

Footnote 1: DoG approximates the Laplacian of Gaussian (LoG) operator, which is quite expensive to compute directly.


Figure A.2: Scale-space local extrema detection. For each pixel, its 8 neighbours at the same scale and the 9 corresponding pixels in the scales above and below are considered to mark the pixel as a candidate keypoint (image adapted from [12]).

2. Keypoint Localization
Candidate keypoints need to be refined in order to remove false positives. Using the Hessian matrix, the principal curvature is computed to reject responses along edges. Low-contrast keypoints are also discarded.

3. Orientation Assignment
Intensity gradients in various directions are computed for every remaining candidate keypoint by considering its neighbourhood. This results in an orientation histogram with 36 bins. The patch is then rotated so that the largest gradient points in a well-defined direction (the dominant orientation). If there are multiple dominant gradients, several keypoints are derived, one for each such gradient. This gives the algorithm orientation invariance. Keypoints are also rescaled to satisfy the scale-invariance precondition for the following processing step.

4. Keypoint Descriptor
For each candidate keypoint, its 16 × 16 neighbourhood is divided into sixteen 4 × 4 blocks. An eight-bin orientation histogram is computed per block. These histograms are concatenated to yield a 16 × 8 = 128-bin keypoint descriptor.

Keypoint descriptors are later used to match features between different images.
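For illustration, the following sketch detects and matches SIFT features between a puzzle piece and the final image clue using OpenCV. It assumes an OpenCV build in which SIFT is exposed as cv::SIFT (in some versions it lives in the xfeatures2d module instead); the 0.75 ratio-test threshold follows the value suggested in [12], and the function name is my own.

    #include <vector>
    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>

    // Detect SIFT keypoints/descriptors in two greyscale images and keep the
    // matches that pass Lowe's ratio test.
    std::vector<cv::DMatch> matchSiftFeatures(const cv::Mat& pieceGray, const cv::Mat& clueGray) {
        cv::Ptr<cv::SIFT> sift = cv::SIFT::create();

        std::vector<cv::KeyPoint> kpPiece, kpClue;
        cv::Mat descPiece, descClue;
        sift->detectAndCompute(pieceGray, cv::noArray(), kpPiece, descPiece);
        sift->detectAndCompute(clueGray,  cv::noArray(), kpClue,  descClue);

        // Brute-force matching of the 128-dimensional descriptors (L2 norm),
        // keeping a match only if it is clearly better than the second-best one.
        cv::BFMatcher matcher(cv::NORM_L2);
        std::vector<std::vector<cv::DMatch>> knnMatches;
        matcher.knnMatch(descPiece, descClue, knnMatches, 2);

        std::vector<cv::DMatch> goodMatches;
        for (const auto& m : knnMatches) {
            if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance) {
                goodMatches.push_back(m[0]);
            }
        }
        return goodMatches;
    }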


Appendix B

Source Images Used for Datasets

(a) Antelope Canyon in USA (b) Dandelions (c) Edsac

(d) Fruit (e) Sir D. Robinson (f) Climbing an icefall

(g) Robinson College (h) Impression (i) Raspberry Pi

(j) Hundertwasser house in Vienna (k) Klementinum library in Prague

All images were freely available and downloaded from the internet.


Appendix C

Camera-taken Puzzle Images

Twelve puzzle images in the camera-taken dataset.

Figure C.1: Antelope Canyon in USA (dimensions 8 × 12 = 96 pieces)

Figure C.2: Dandelions (dimensions 8 × 12 = 96 pieces)


Figure C.3: Edsac 1 (dimensions 8 × 11 = 88 pieces)

Figure C.4: Edsac 2 (dimensions 8 × 11 = 88 pieces)


Figure C.5: Fruit (dimensions 7 × 12 = 84 pieces)

Figure C.6: Hundertwasser house in Vienna (dimensions 6 × 12 = 72 pieces)

Figure C.7: Climbing an icefall (dimensions 9 × 12 = 108 pieces)


Figure C.8: Impressionist painting (dimensions 8 × 11 = 88 pieces)

Figure C.9: Klementinum library in Prague (dimensions 7 × 12 = 84 pieces)


Figure C.10: Raspberry Pi 1 (dimensions 8 × 12 = 96 pieces)

Figure C.11: Raspberry Pi 2 (dimensions 8 × 12 = 96 pieces)

Figure C.12: Robinson College (dimensions 8 × 12 = 96 pieces)


Appendix D

Instructions for Human Evaluators


Appendix E

Project Proposal

E.1 Introduction

The jigsaw puzzle is undoubtedly one of the best-loved and most well-known puzzles, attracting the attention of people of all ages. Jigsaws have been entertaining human solvers for centuries. The first puzzle dates back to the late 18th century, when London mapmaker John Spilsbury created it by cutting out the countries from a wooden map of the world. Nowadays there exist numerous different puzzle variants, ranging from simple children's puzzles with several tens of large pieces to challenging puzzles containing thousands of pieces displaying a relatively uniform image such as sky or a landscape panorama. One can even buy 3D puzzles in the form of jigsaw spheres, although 2D variants are more popular.

Assembling a jigsaw is certainly a challenging problem; there is even a world jigsaw championship held annually in Belgium. Thanks to its intrinsic complexity, the puzzle has drawn the attention of researchers from various fields such as computer vision, image processing and combinatorial optimisation, pondering whether it is possible to automate the solution procedure.

Besides the jigsaw being a fascinating problem in its own right, there exist many practical applications of a computational method for finding the solution. They include challenges such as reassembling archaeological artefacts and fossils, recovering shredded historical documents, DNA modelling or even speech descrambling.

The first attempt to invent a computational jigsaw solver dates back to 1964, when Freeman and Gardner [8] explored ways of solving an apictorial puzzle (i.e. the pieces do not display an image and only shape information is available), coming up with a computational solution to a jigsaw of nine pieces. This work, together with the rising popularity and availability of computers, inspired many researchers to look at the numerous variants of the problem over the past fifty years. Many recent works [1,9,13,14,16] concentrated on solving pictorial puzzles (i.e. ones displaying an image), discarding shape information by considering only square pieces (a.k.a. patches) and using colour as the primary source of information for the computational assembly of the pieces.

This project aims to take a similar approach and tackle the computational assembly of coloured pictorial jigsaw puzzles of known dimensions, consisting of pieces of exclusively square shape with correct initial orientation, but not location (a Type I puzzle according to Gallagher [9]). The primary aim is to construct a program (later referred to as Solver) that accepts two photographs as input, one of the disassembled puzzle and one of the final solution (later referred to as the final image clue), and finds the mapping of individual square pieces to their correct positions in the final solution. In other words, given the initial state of the puzzle and the final image clue, the program should be able to find the correct place of every piece.

This task will certainly require splitting into subtasks that will deal with input image processing, piece identification and the later combinatorial assembly of the pieces using various approaches, possibly consulting the final image clue. Such a project organisation also allows for prototyping, since the two input images can be prepared artificially to decrease the complexity in the earlier phases of the project. Artificial preparation of input will probably also help to test the program on larger inputs that could not be manually manufactured within the scope of this project.

E.2 Description of Project

As outlined at the end of the Introduction, the project aims to implement a computational Solver for pictorial 2D jigsaw puzzles, with the possibility of using the final image as a clue for Solver. To limit the scope of the project, some restrictions need to be introduced.

E.2.1 Restrictions

Pieces are of square shape. The image dimensions, in terms of the number of pieces, are known. Moreover, the shape of the assembled puzzle is rectangular and can be split into square pieces exactly, implying no leftovers or partial pieces. Two images are required as input: the disassembled puzzle and the final image. The goal of the project is not to deal with inputs that are deliberately of low quality. That means the images should be taken on a background of uniform colour that is different from the colours prevailing in the picture, particularly at its edges. Moreover, since all pieces are uniform squares, the image displayed on the assembled puzzle must be of high enough resolution and contrast not to prevent the assembly. The core project will not deal with images with excessive self-similarity or colour uniformity, although this can be considered as an extension. Both input images should also be taken at the same time, by the same camera located at approximately the same location, to achieve conditions for photographing that are as similar as possible.

In the initial state of the puzzle, the square pieces must have the correct orientation, they must not occlude one another, and they should be reasonably spaced to allow their extraction from the input image. The position of the camera and the lighting should be as good as possible in the given circumstances in order to avoid low-resolution, noisy and badly focused images. I will not assume the use of state-of-the-art cameras for taking input images, but a minimum requirement for the camera standard will have to be specified and is likely equivalent to a middle-class Android phone camera (see footnote 1).

In principle, no limit should be imposed on the dimensions of the puzzle, nor on the absolute size of the individual pieces, but the input should comply with the earlier description. Due to the practical issues of manually manufacturing the puzzle for testing purposes and the hardware limits of mediocre cameras, this probably will not allow solving either puzzles of huge dimensions or puzzles consisting of tiny pieces.

I assume I will be able to manually manufacture several puzzles consisting of up to a hundred pieces, but I will also artificially generate puzzles with up to thousands of pieces to test the parts of the software responsible for puzzle assembly. I will also consider adding artificial noise to the generated puzzles to mimic the limitations of photography. Regarding manually manufactured pieces, I intend to create at least 5 sets of puzzles consisting of tens of pieces, with the largest set having up to a hundred. The plan is to print out pictures of various scenes (both indoor and outdoor), glue them to cardboard and cut the cardboard into square pieces of size approximately 3 × 3 cm. I will also consider using Computer Laboratory facilities, such as a large-size paper printer and a paper cutter. However, this is not crucial for the success of the project at all, as I find it possible to manufacture pieces without these facilities as well.

Footnote 1: such as the Samsung Galaxy SII available to the author.


E.2.2 Proposed Structure of Solver

Given the description of the goal and the input restrictions, the structure of Solver can be summarized as follows:

1. Image preprocessing module

2. Piece extraction module

3. Final image clue extraction module

4. Puzzle assembly module

5. Artificial input creation module

Image Preprocessing Module

Although the input images will be of as high quality as possible in the given circumstances, it is very likely that some preprocessing will have to be carried out on the image. Such preprocessing aims to enhance the image by removing noise and improving contrast to make it easier to distinguish and extract both the square puzzle pieces and the final image clue.

Piece Extraction Module

This module will be responsible for detecting and extracting square pieces from the input image in order to create a bag of pieces, i.e. the set of all pieces of the puzzle in the correct orientation. Even though pieces are rotated correctly in the initial state of the puzzle, human error during the preparation of the initial state may make slight rotations necessary. The position of the camera will certainly create the need for some other, hopefully only minor, affine transformations.

Final Image Clue Extraction Module

Similarly to the piece extraction module, this module will be responsible for the detection and extraction of the final image clue from the second input image. Furthermore, it will be responsible for splitting the final image clue into square pieces according to the known dimensions of the puzzle, to create a bag of control pieces.

Puzzle Assembly Module

This module will make use of the extracted bag of pieces and bag of control pieces from the previous steps in order to assemble the puzzle. There are multiple strategies for doing this and several will probably be attempted to reach a satisfactory solution. These will likely include a histogram comparison approach and template-matching techniques, such as using the sum of squared differences (SSD) or correlation measures. The use of more advanced feature-matching techniques such as SIFT (Scale Invariant Feature Transform) or SURF (Speeded Up Robust Features), used e.g. in [11], will be considered as a possible extension for boosting performance.

Depending on the results achieved with the aforementioned techniques, other approaches using piece compatibility measures to find neighbouring pieces in the bag of pieces may be tried (e.g. SSD between edges or the Mahalanobis distance [13]), accompanied by a greedy strategy of puzzle assembly [14].


Artificial Input Creation Module

As discussed in Section E.2.1, it is probably not feasible to manually manufacture a large enough input sample to exhaustively test the puzzle assembly module. The artificial input creation module will split a given image into square pieces to create a sample input image complying with the restrictions discussed earlier.

E.2.3 Structure of the Project

The project itself may be split into the following sections, in chronological order:

1. Background reading on the subject and familiarizing with tools such as OpenCV.
2. Implementation of Solver.
3. Gathering test samples and evaluation of Solver.
4. Dissertation write-up.

E.2.4 Possible Extensions

• Implementing some additional technique outlined in previous sections
• Improving Solver to easily deal with thousands of pieces
• Weakening input restrictions to allow pieces in arbitrary ([0, 360°]) initial rotation
• Implementing genetic algorithm-based solver as mentioned in [16]
• Using Solver for creating an Android application that solves jigsaw puzzle by sending input to the server that runs Solver and sends back the correct solution
• Improving Solver to deal with inputs of substantially lower quality or images with high level of self-similarity and colour uniformity

E.3 Evaluation Metrics and Success Criteria

E.3.1 Evaluation Metrics

I intend to use 3 metrics for evaluating Solver's performance, all of which are inspired by the previous work of several authors and which I consider to be reasonable quantitative measures.

1. Direct comparison [3] - percentage of pieces that are at correct absolute positions in the final solution produced by Solver (a minimal sketch of this metric is given after this list)

2. Neighbour comparison [3] - percentage of correct pairwise relationships between adjacent pieces

3. Largest component [9] - percentage of pieces that are part of the largest component, i.e. the largest area of the puzzle formed by pieces that are correctly placed with respect to their neighbours from that component
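For concreteness, the sketch below illustrates metric 1. It assumes that both the produced solution and the ground truth are stored as vectors mapping piece index to grid position, which is an illustrative choice rather than a commitment of the design; the names are hypothetical.

    #include <cstddef>
    #include <vector>

    // Direct comparison metric: percentage of pieces whose position in the produced
    // solution equals their position in the ground-truth arrangement.
    double directComparison(const std::vector<int>& solution,
                            const std::vector<int>& groundTruth) {
        std::size_t correct = 0;
        for (std::size_t i = 0; i < solution.size(); ++i) {
            if (solution[i] == groundTruth[i]) {
                ++correct;
            }
        }
        return 100.0 * correct / solution.size();
    }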

E.3.2 Success Criteria

The project will be considered successful if the following objectives are met for puzzles of up to a hundred pieces:


1. Given input images of reasonable quality, the image preprocessing module can prepare images suitable for further processing by other modules.

2. The piece extraction module can correctly extract the square pieces from an input image, preparing the bag of pieces for further use, and the final image clue extraction module can correctly extract the final image from the second input.

3. Given the bag of pieces, the final image clue and the bag of control pieces from the previous modules, the puzzle assembly module can correctly assemble the puzzle with reasonable accuracy. All three metrics described above will be used, and Solver will be tested on manually manufactured puzzles as described at the end of Section E.2.1. There will be three test cases for every puzzle, starting in a random initial configuration, and each individual test case will be evaluated under every metric. The average of the results will serve as the evaluation figure of Solver's performance under that particular metric. The puzzle assembly module will be deemed successful if it can correctly assemble each puzzle with reasonable accuracy under at least one metric.

E.4 Starting Point

I am completely new to Computer Vision and the course on it is lectured only in Lent term this year. My knowledge of Image Processing is limited to the Part 1b Graphics course, and my problem-specific knowledge of computational jigsaw puzzle solving is limited to the few papers I have seen in the past couple of days.

The project will most likely be implemented in C++ using some features of the Open Computer Vision library (OpenCV). I have some experience with C++ from the Part 1b course and a summer internship, but I have never used OpenCV before. With the exception of this library, I expect my project to be built from scratch and not based on any existing codebase.

Having participated in an Android Summer Camp, I have a very basic knowledge of the Android SDK and quite a good understanding of Java thanks to the Part 1a and 1b Computer Science Tripos courses. I would use these skills if I chose to develop an Android application as an extension to the project, should time allow.

I will be typesetting my dissertation in LaTeX, which I have not used before. Regarding version control systems, I will use GitHub, which I have experience with.

E.5 Work Plan

Week 1 – 2, (20th Oct – 2nd Nov)
Deadline: Project proposal submission on 24th Oct

• Complete and submit project proposal
• Background reading on Computer Vision and more research on work on computational jigsaw solvers
• Reading about and experimenting with OpenCV
• Set up development, backup and version control systems

Week 3 – 4, (3rd Nov – 16th Nov)
• Continue background reading and familiarizing with OpenCV concepts
• Decide what techniques to use for image preprocessing module
• Implement image preprocessing module and smoke test it


Week 5 – 6, (17th Nov – 30th Nov)
• Decide which techniques to use for piece and final image clue extraction modules

• Implement these two modules and do some preliminary testing on a few test samples

Week 7 – 8, (1st Dec – 14th Dec)
• Create more test samples to test implemented modules

• Test the modules

• Decide which approach to use first for implementation of puzzle assembly module and start implementing it

Week 9 – 11, (15th Dec – 4th Jan)
• Finish implementation (probably inefficient) of puzzle assembly module

• Smoke test the implementation and Solver as a whole

Week 12 – 13, (5th Jan – 18th Jan)
• Catch-up time for any outstanding issues

• Write the Progress report

Week 14 – 15, (19th Jan – 1st Feb)
Deadline: Progress report submission on 30th Jan

• Prepare and rehearse the presentation

• Implement the artificial input creation module

Week 16 – 17, (2nd Feb – 15th Feb)
Deadline: Presentation on 5th – 10th Feb

• Prepare reasonable sets of test puzzles for evaluation (as described at the end of section E.2.1)

• Evaluate Solver using both manually manufactured puzzles and inputs from artificial input creation module

Week 18 – 19, (16th Feb – 1st Mar)
• Having evaluation data, make improvements to Solver, possibly using different techniques

• Re-evaluate Solver, obtaining decent statistics

• Process evaluation data to have them ready for the dissertation write up

Week 20 – 21, (2nd Mar – 15th Mar)
• Catch-up time for any outstanding issues


• Plan and draft the introduction and preparation parts of the dissertation

• If no issues emerge and time allows, choose, research and start implementing one of the extensions

Week 22 – 23, (16th Mar – 29th Mar)
• Plan and draft the implementation parts of the dissertation

• Complete the extension if time allows

Week 24 – 25, (30th Mar – 12th Apr)
• Finish the complete draft of the dissertation by finishing evaluation and conclusion sections

• Send the draft to the Supervisor

• Catch up time for unforeseeable complications

Week 26 – 27, (13th Apr – 26th Apr)
• Format and edit the dissertation over Easter break, including all appendices

• Incorporate Supervisor's feedback into the dissertation and send ameliorated version to Supervisor and Director of Studies

Week 28 – 29, (27th Apr – 10th May)
Deadline: Final deadline for submission on 15th May

• Make final edits to the dissertation based on the received feedback

• Complete the final version, including all the compulsory formal sections, such as proforma and declaration of originality

• Print, bind and submit the dissertation on Monday 11th May (Monday before hard deadline)

E.6 Resources Required

I will be using my own laptop for the whole duration of the project (Mac OS X, 2.2 GHz Intel Core i7, 16 GB RAM). If this computer failed, I would work on MCS workstations. I will be using Git and GitHub for version control and will also make weekly backups to a 600 GB external hard drive using Time Machine. After every significant update, I will also back up important documents (such as dissertation drafts) to Google Drive.

Regarding the camera and a possible Android device, should time allow for implementing an Android extension, I will use my Samsung Galaxy SII (GT-I9100) with Android 4.1.2 and an 8 MP camera resolution as stated by the tech specs.

I will use Xcode with the OpenCV library as the main IDE for development.

