
2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 21–24, 2014, REIMS, FRANCE

QR CODE LOCALIZATION USING DEEP NEURAL NETWORKS

Tamás Grósz∗, Péter Bodnár†, László Tóth∗, László G. Nyúl†

∗MTA-SZTE Research Group on Artificial Intelligence, Hungarian Academy of Sciences and University of Szeged

†Department of Image Processing and Computer Graphics, University of Szeged
{groszt, bodnaar, tothl, nyul}@inf.u-szeged.hu

ABSTRACT

The use of computer-readable visual codes has become common in everyday life, both in industrial environments and in private use. The reading process of visual codes consists of two steps, localization and data decoding. This paper introduces a new method for QR code localization using conventional and deep rectifier neural networks. The structure of the neural networks, regularization, and training parameters, such as input vector properties, the amount of overlap between samples, and the effect of different block sizes, are evaluated and discussed. Results are compared to localization algorithms from the literature.

Index Terms— QR code, Object detection, Pattern recognition, Neural networks, Machine learning

1. INTRODUCTION

The QR code is a common type of visual code that is used in various industrial setups and private projects alike. Its structure is well defined, which makes automatic reading by computers and embedded systems possible. QR codes have seen a considerable increase in usage over the last couple of years, more so than other patented code types such as Aztec codes or Maxicodes. This is due to their well-constructed error correction scheme, which allows recovery of codes with up to about 30 % damage.

Image acquisition techniques and computer hardware have also improved significantly, which has made automatic reading of QR codes feasible. State-of-the-art algorithms no longer require human presence or assumptions about code orientation, position, and coverage rate in the image. However, image quality and acquisition techniques vary considerably, and each application has its own requirements for detection speed and accuracy, which makes the task more complex.

The recognition process consists of two steps, localization and decoding. The literature already offers a wide selection of papers proposing algorithms for efficient QR code localization [1–5]; however, each has its own strengths and weaknesses. For example, while these methods are proven to be accurate, morphological operations, convolutions, corner detection, and the calculation of convexity defects can become a bottleneck for processing performance. Applications using various machine learning techniques can overcome this issue and produce efficient solutions with respect to both speed and accuracy.

In the last few years, there has been a renewed interest in applying neural networks, especially deep neural networks, to various tasks. As the name suggests, deep neural networks (DNN) differ from conventional ones in that they consist of several hidden layers. However, to properly train these deep networks, the training method requires modifications, as the conventional backpropagation algorithm encounters difficulties (the "vanishing gradient" and "explaining away" effects). Here, the "vanishing gradient" effect means that the error may vanish as it gets propagated back through the hidden layers [6]. In this way some hidden layers, in particular those close to the input layer, may fail to learn during training. At the same time, in fully connected deep networks, the "explaining away" effect makes inference extremely difficult in practice [7]. Several solutions have been proposed to counter these problems. They modify either the training algorithm, by extending it with a pre-training phase [7, 8], or the architecture of the neural networks [9].

2. THE PROPOSED METHOD

The first step of the localization process is the uniform partitioning of the image into square blocks. Each block is processed by the neural network individually, and a measure is assigned that reflects the probability of the presence of a QR code part in that block. After the evaluation of all image blocks, a matrix is formed from the probability values (Fig. 1). The next step of the process is to find clusters in this matrix that have sufficient size, compactness, and high probability values to form a QR code. The final step is to return the cluster centers that satisfy these conditions and give bounding boxes for the probable QR code areas.
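As an illustration only, the following Python sketch outlines this block-scoring pipeline under simplifying assumptions: the trained network is wrapped in a hypothetical score_block callable returning the probability of a QR code part, clusters are found with plain connected-component labeling (scipy.ndimage) instead of the compactness criteria described above, and the minimum cluster size is a placeholder value.

import numpy as np
from scipy import ndimage

def localize(image, score_block, block_size=60, threshold=0.5, min_blocks=4):
    """Return bounding boxes (x0, y0, x1, y1) of probable QR code areas."""
    h, w = image.shape
    rows, cols = h // block_size, w // block_size
    prob = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = image[r * block_size:(r + 1) * block_size,
                          c * block_size:(c + 1) * block_size]
            prob[r, c] = score_block(block)        # probability matrix (Fig. 1)

    labels, n = ndimage.label(prob >= threshold)    # connected clusters of high values
    boxes = []
    for i in range(1, n + 1):
        rs, cs = np.where(labels == i)
        if rs.size < min_blocks:                    # too small to form a QR code
            continue
        boxes.append((cs.min() * block_size, rs.min() * block_size,
                      (cs.max() + 1) * block_size, (rs.max() + 1) * block_size))
    return boxes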

For each block the image is divided into, a one-dimensional vector is formed by reading pixels in a circular pattern, as suggested in [10].



Fig. 1. Image captured by a phone camera (a) and the visualized feature image (b) according to the output of the neural network.

This pattern provides prominent features while keeping the number of required pixels low, typically 6–10 % of the image pixels. Furthermore, it can indicate the presence of a QR code in any orientation.
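The exact sampling pattern of [10] is not reproduced here; as a rough, hypothetical stand-in, the sketch below reads a fixed number of pixels along concentric circles centred in the block, illustrating how a small, rotation-tolerant subset of the pixels can be turned into a one-dimensional input vector.

import numpy as np

def circular_sample(block, n_points=128, n_rings=4):
    """Sample n_points pixels along concentric circles of a square block."""
    h, w = block.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    per_ring = n_points // n_rings
    values = []
    for ring in range(1, n_rings + 1):
        radius = ring * min(h, w) / (2.0 * (n_rings + 1))
        angles = np.linspace(0.0, 2.0 * np.pi, per_ring, endpoint=False)
        ys = np.clip(np.round(cy + radius * np.sin(angles)).astype(int), 0, h - 1)
        xs = np.clip(np.round(cx + radius * np.cos(angles)).astype(int), 0, w - 1)
        values.append(block[ys, xs])
    return np.concatenate(values)               # 1D feature vector for the network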

2.1. The Neural Network

The vectors are passed to the core of this approach, the neural network. In this paper, both conventional and Deep Rectifier Neural Networks (DRN) have been evaluated. DRNs alter the hidden neurons in the network rather than the training algorithm, by using rectified linear units. These rectified units differ from standard neurons only in their activation function: they apply the rectifier function (max(0, x)) instead of the sigmoid or hyperbolic tangent activation. Owing to their properties, DRNs do not require any pre-training to achieve good results [9].
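A minimal numpy sketch of such a network's forward pass is given below, assuming the layer sizes reported later in the paper (a 128-point input vector, three hidden layers of 1000 rectified units, and a two-way softmax output). The uniform initialization scale and all names are illustrative, not the authors' implementation.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)                    # rectifier: max(0, x)

def init_layer(n_in, n_out, rng, scale=0.01):
    return rng.uniform(-scale, scale, (n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    """layers: list of (W, b); all but the last use the rectifier activation."""
    for W, b in layers[:-1]:
        x = relu(x @ W + b)                      # roughly half the units stay at zero
    W, b = layers[-1]
    z = x @ W + b
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)     # softmax posteriors (QR / not QR)

rng = np.random.default_rng(0)
sizes = [128, 1000, 1000, 1000, 2]               # input, three hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
posteriors = forward(rng.uniform(0, 1, (4, 128)), layers)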

The rectifier function has two important properties, namely its hard saturation at 0 and its linear behaviour for positive input. The first property means that only a subset of neurons will be active in each hidden layer. For example, when we initialize the weights uniformly, around half of the hidden units output zero. In theory, this hard saturation at 0 could harm optimization by blocking gradient backpropagation. Fortunately, experimental results do not support this, showing that the hard non-linearities do no harm as long as the gradient can propagate along some path [9]. Owing to the other property of the rectified units, namely the linear behaviour of the active units, there is no "vanishing gradient" effect [9]. This linear behaviour also means that the computational cost is smaller, as there is no need to compute the exponential function during the activation calculation, and the sparsity can also be exploited. Unfortunately, this linearity also has a disadvantage, the "exploding gradient" effect, which means that the gradients can grow without limit. To prevent this, we applied L1 normalization by scaling the weights so that the L1 norm of each layer's weights remained the same as it was after initialization. What makes this possible is that for a given input the subset of active neurons behaves linearly, so scaling the weights is equivalent to scaling the activations.
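A sketch of this weight-scaling step might look as follows; variable names are ours, and the scaling is applied per layer as described in the text.

import numpy as np

def rescale_l1(weights, target_l1_norms):
    """weights: list of weight matrices; target_l1_norms: the L1 norms
    recorded for each layer right after initialization."""
    rescaled = []
    for W, target in zip(weights, target_l1_norms):
        current = np.abs(W).sum()
        # Scale the layer so its L1 norm returns to the post-initialization value.
        rescaled.append(W * (target / current) if current > 0 else W)
    return rescaled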

Our deep networks consisted of three hidden layers, each with 1000 rectified neurons, as a DRN with this structure yielded the best results on the development sets. The shallow neural net was a sigmoid net with only one hidden layer, containing the same number of hidden neurons (3000) as the deep one.

The output layer of the neural networks consisted of two softmax neurons, one for the positive and one for the negative label, allowing the networks to output not only classification decisions but also posterior probabilities. As the error function we applied the cross-entropy function.
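For concreteness, a minimal cross-entropy computation over the two softmax posteriors might look like this; the names are illustrative and not the authors' API.

import numpy as np

def cross_entropy(posteriors, targets, eps=1e-12):
    """posteriors: (N, 2) softmax outputs; targets: array of 0/1 class labels.
    Returns the average negative log-probability of the correct class."""
    p_correct = posteriors[np.arange(len(targets)), targets]
    return -np.mean(np.log(p_correct + eps))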

In our study, two regularization methods were utilized to prevent overfitting, namely early stopping and weight decay. Early stopping was achieved by stopping the training when there was no improvement in two subsequent iterations on the validation set. As weight decay regularization, the weights were scaled back after each iteration, forcing them to converge to smaller absolute values than they otherwise would.

The neural networks were trained using semi-batch backpropagation, with a batch size of 100. The initial learning rate was set to 0.001 and held fixed while the error on the development set kept decreasing. Afterwards, if the error rate did not decrease in a given iteration, the learning rate was halved.
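A sketch of this training schedule, under the assumption that the per-epoch training pass and the validation error are exposed as callables (train_epoch and validation_error, both hypothetical), could be:

def train(train_epoch, validation_error, lr=0.001, batch_size=100):
    """Semi-batch training with learning-rate halving and early stopping."""
    best_err, bad_iters = float("inf"), 0
    while bad_iters < 2:                      # stop after two iterations without improvement
        train_epoch(lr, batch_size)           # one pass of semi-batch backpropagation
        err = validation_error()
        if err < best_err:
            best_err, bad_iters = err, 0
        else:
            bad_iters += 1
            lr *= 0.5                         # halve the learning rate when no improvement
    return best_err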

We used our custom neural network implementation, which was written to run on a GPU [11]. The training of a DRN on the synthetic dataset took less than 8 minutes using an NVIDIA GTX-770 graphics card.

2.2. Input Data and Block Size

For the input data, various options were available, and a separate training was performed with each type. The first choice was to train a neural network on raw pixel data, read in the discussed pattern for each block. A binary version of the vectors was also evaluated, to compare the efficiency of the neural networks on grayscale and binary images. Furthermore, since the QR code has a well-defined, strict structure, blocks of QR code parts probably have very specific components in the frequency domain. This assumption motivated us to perform experiments with vectors in that domain: we transformed the training vectors to the DCT and DFT domains and evaluated neural networks on both sets. Finally, a neural network was trained on the edge map, since the structure of QR codes also suggests very specific edge layouts that can probably be fed to a neural network and result in efficient training. In that case, the training vectors consisted of pixels of the unified magnitude map of the Sobel-X and Sobel-Y gradients (Fig. 2), read in the circular pattern.
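As a sketch, the unified Sobel magnitude map can be computed as below (using scipy.ndimage); the paper's actual preprocessing may differ in details such as scaling or normalization.

import numpy as np
from scipy import ndimage

def sobel_magnitude(gray_image):
    """Unified magnitude map of the Sobel-X and Sobel-Y gradients."""
    gx = ndimage.sobel(gray_image.astype(float), axis=1)   # Sobel-X
    gy = ndimage.sobel(gray_image.astype(float), axis=0)   # Sobel-Y
    return np.hypot(gx, gy)                                 # per-pixel gradient magnitude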

To choose the block size, two conditions have to be fulfilled. The block size should be small enough that many blocks are needed to build a QR code, so that a threshold for dropping or keeping clusters in the probability matrix can be chosen easily. On the other hand, the block size has to be large enough to provide prominent characteristics in the input vectors of the neural network.


Fig. 2. Example synthetic image (a) and its Sobel edge magnitude map (b). Edge maps appear to be a reliable input for the neural network training, which is confirmed in the results section.

Papers on visual code localization suggest, either empirically [12] or via a geometrical approach [10], that the optimal tile size is about 1/3 of the smaller dimension of the examined visual code for features based on image partitioning. This choice of block size served as our initial value; however, we also evaluated neural networks with input vectors of different block sizes. After deciding on a suitable block size, the amount of overlap also has to be determined. Different offsets for blocks of the optimal size were also evaluated empirically, because papers on this topic do not state categorically whether overlapping improves localization performance.

3. EVALUATION AND RESULTS

The test database consists of 10 000 synthetic and 100 arbitrarily acquired images containing QR codes. The synthetic examples are built with a computer-generated QR code containing all of the lower- and uppercase letters of the alphabet in random order. This QR code was placed on a random negative image with a perspective transformation. After that, Gaussian smoothing and noise were gradually added to the images, with the σ of the Gaussian kernel ranging over [0, 3]. For the noise, a noise image (In) was generated with intensities in the range [-127, 127] following a normal distribution, and added gradually to the original 8-bit image (Io) as I = αIn + (1 − α)Io, with α ranging over [0, 0.5]. Samples with parameters in the discussed ranges are shown in Fig. 3. A total of 1.8 million vectors were extracted from those images, about 400 000 of which are labeled as positive. The real images were taken with a 3.2 Mpx Huawei handheld phone camera. Significant smoothing is present in those images due to the absence of flash and precise focusing capability. These images contain a printed QR code that mostly suffers from minor bending, placed on various textured backgrounds, such as creased cloth, carpet, marble, or the cover page of a book (Fig. 1(a) and 4(d)), in order to create high background variability and make the classification task more complex.
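A sketch of this degradation step is shown below. The standard deviation chosen for the noise image is an assumption made only to keep intensities roughly within [-127, 127]; the exact distribution parameters are not given in the paper.

import numpy as np
from scipy import ndimage

def degrade(image_8bit, sigma, alpha, rng):
    """Gaussian smoothing followed by blending with a noise image:
    I = alpha * I_n + (1 - alpha) * I_o."""
    smoothed = ndimage.gaussian_filter(image_8bit.astype(float), sigma)
    noise = np.clip(rng.normal(0.0, 64.0, image_8bit.shape), -127, 127)  # assumed std
    blended = alpha * noise + (1.0 - alpha) * smoothed
    return np.clip(blended, 0, 255).astype(np.uint8)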

Fig. 3. Samples of the training database with different amounts of smoothing and noise: (a) σ = 0.07, α = 0.39; (b) σ = 0.65, α = 0.77; (c) σ = 1.32, α = 0.05; (d) σ = 1.88, α = 0.95; (e) σ = 2.86, α = 0.65; (f) σ = 2.99, α = 0.69.

The arbitrarily acquired image set contained QR codes of roughly the same size as the artificial set, and its extracted vector set consisted of about 350 000 vectors, about 80 000 of them positive. Images were converted to grayscale before the vector extraction.

3.1. Input Types

The first group of neural network trainings was performed using different input data types and ranges, in order to determine the best available option for the input vectors. This first test set did not contain full images, only the vectors of blocks that have 100 % coverage of a QR code part, and an equal number of random negative blocks, about 500 000 vectors in total. Both conventional and deep rectifier neural networks were trained on the data, as shown in Table 1. For this first test, only two portions of the database were separated, for training and testing purposes, in an 8:2 proportion.


Input data   Input range   Precision   Hit rate   F-measure
ANN
Raw pixels   8-bit         0.9892      0.9953     0.9922
Raw pixels   binary        0.9705      0.9846     0.9774
DCT          8-bit         0.9889      0.9951     0.9920
DCT          binary        0.9693      0.9860     0.9776
DFT          8-bit         0.9904      0.9981     0.9943
DFT          binary        0.9711      0.9808     0.9760
Sobel-XY     8-bit         0.9979      0.9990     0.9984
DRN
Raw pixels   8-bit         0.9947      0.9972     0.9959
Raw pixels   binary        0.9704      0.9862     0.9782
DCT          8-bit         0.9941      0.9967     0.9954
DCT          binary        0.9686      0.9873     0.9778
DFT          8-bit         0.9933      0.9958     0.9945
DFT          binary        0.9621      0.9850     0.9734
Sobel-XY     8-bit         0.9978      0.9991     0.9984

Table 1. Training results for different input data types and ranges.

NN type   Precision   Hit rate   F-measure
ANN       0.5508      0.9895     0.7077
DRN       0.4395      0.9979     0.6103

Table 2. Evaluation of the neural networks on arbitrarily acquired images. Input vectors were 8-bit raw pixels. The inferior DRN precision indicates that these neural nets need more training samples compared to ANNs.

For later tests, the smaller set was further divided into test and validation parts. The results show that deep neural networks outperform regular ones in general, and that 8-bit input data facilitates more accurate training. Applying the DCT or DFT to the vectors does not influence the training significantly, and even though those transformations are easy to compute on GPUs, they do not seem to be worth the computational cost. However, vectors of Sobel magnitude images slightly improve the training efficiency, which could be expected given the QR code edge structure (Fig. 2). The neural networks of this first evaluation used a 60×60 px block size, which is roughly 1/3 of the expected QR code size. The block offset for overlapping blocks was set to 10 px, since the element size of the generated QR code was also that large. Each vector consisted of 128 points sampled from the circular pattern within the block.

After training these neural networks on the synthetic database, the networks trained on the raw 8-bit pixel data were also evaluated on the arbitrarily acquired image set. Precision dropped to about 0.4, but the hit rate is still acceptable. Both shallow and deep rectifier neural networks were evaluated, as shown in Table 2. The precision drop is probably due to the various textures, like tablecloth, wood, and marble, present in the real images.

NN type   Data type   Precision   Hit rate   F-measure
T+ = 0.1
ANN       Real        0.6454      0.9518     0.7692
DRN       Real        0.6699      0.9419     0.7829
ANN       Synthetic   0.5654      0.9901     0.7198
DRN       Synthetic   0.5630      0.9865     0.7169
T+ = 0.5
ANN       Real        0.9175      0.5994     0.7251
DRN       Real        0.8962      0.8414     0.8679
ANN       Synthetic   0.8703      0.8947     0.8823
DRN       Synthetic   0.9059      0.9347     0.9201

Table 3. Evaluation of the neural networks also having vectors of partially covered blocks as input, with different thresholds T+ for the positive input labels.

Subset         Opt. thresh.   max(F1)   AUC
All vectors    0.62           0.9343    0.9957
N+ ≈ N−        0.63           0.9270    0.9949
N+_b ≈ N−_b    0.82           0.8312    0.9608

Table 4. DRN results on different subsets of the input vectors of synthetic images. N+ and N− denote the positive and negative samples, while N+_b and N−_b are the subsets of N+ and N− with partially covered blocks excluded.

3.2. Partially Covered Blocks

In the first case, training was only performed using vectors of background blocks and blocks fully covered with QR code parts. As the next step of this experiment, the input data was extended with partially covered blocks, thresholded at 0.1 and 0.5 coverage ratio for positive labeling. No filtering was applied to the amount of negatively labeled vectors. Both ANNs and DRNs were trained on vectors of both synthetic and real images, separately. The database was divided in an 8:1:1 ratio into training, testing, and validation parts, respectively. The results (Table 3) show that allowing the partially covered samples into the training significantly drops the precision of the NNs. Furthermore, a threshold of 0.1 for the positive labels keeps the hit rate at a satisfactory level, while 0.5 decreases it considerably (Fig. 4). Still, the lower hit rate is not a problem, since enough positively predicted blocks remain in the probability matrix to indicate a QR code candidate even in those cases (Fig. 4(e)).
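As a small illustrative helper (not from the paper), the coverage-based labeling rule for a block can be written as follows, assuming a binary ground-truth mask is available for each block.

import numpy as np

def label_block(mask_block, t_plus=0.1):
    """mask_block: binary ground-truth mask of the block (1 = QR code part).
    Returns 1 (positive) if the covered fraction reaches the threshold T+."""
    coverage = float(np.count_nonzero(mask_block)) / mask_block.size
    return 1 if coverage >= t_plus else 0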

It can be concluded that NNs trained on full images and on a reduced subset of vectors, where positively and negatively labeled samples are present in roughly equal amounts, do not differ significantly, while excluding the vectors that come from partially covered blocks drops the performance noticeably. Table 4 summarizes the results measured on the discussed training sets.

Page 5: QR CODE LOCALIZATION USING DEEP NEURAL NETWORKSinf.u-szeged.hu/~groszt/pubs/mlsp2014.pdf · QR code has a well-defined, strict structure, it means blocks of QR code parts probably

Fig. 4. Original and feature images built from the output of the neural networks: (a) synthetic image, (b) its ANN feature image, (c) its DRN feature image; (d) real image, (e) its ANN feature image, (f) its DRN feature image. Even though ANNs have weaker classification power and miss several blocks, they can indicate the presence of QR code candidates strongly enough.

3.3. Parameters and Run Times

As noted earlier, we used our custom GPU implementation of neural networks, and the training of a DRN on the synthetic dataset took less than 8 minutes using an NVIDIA GTX-770 graphics card. The computational power of this setup allowed processing about 450 000 vectors per second, which means real-time processing of 800 × 600 px images at roughly 5 FPS using a 60 px block size and a 10 px offset.

The QR codes on which the evaluation was performed had a 10 px element size. Block sizes of the input vectors from 30 px to 90 px were evaluated. Block sizes smaller than 30 px would lead the training to learn solid black and white blocks as positive, thus heavily raising the false positive rate. Block sizes larger than 90 px would also decrease performance, since a large block size drastically reduces the number of positive samples available for training while introducing a vast amount of partially covered blocks that are harder to train on. The numbers in Fig. 5 also show that the proposed method is robust to the expected QR code size, since the block size range with good performance is large relative to the code dimensions.

Fig. 6 shows results for DRNs using different rates of overlapping, up to a 60 px offset (no overlap). It is notable that the F-score differs only slightly for offsets from 10 to 60 px, while the amount of input vectors is drastically smaller at the larger offsets.

After all these experiments, a DRN was trained with the best experimental training setup, in order to compare it with other localization methods from the literature. According to Table 5, NN training proves to be a viable option for efficient, real-time QR code localization. As the first approach for comparison, an algorithm based on mathematical morphology was selected [2], since it is also a general-purpose toolset of image processing, just as NNs are for machine learning tasks.

Fig. 5. DRN results with respect to input block size: (a) max(F), (b) AUC.

Fig. 6. DRN results with respect to input block offset, using a 60 px block size: (a) max(F), (b) AUC.

This reference method is reliable, but due to the morphological operations it requires the most computational capacity and gives the slowest processing speed, from 900 to 1300 ms per image. Another work on QR code localization uses a specific, image-based classifier training: it proposes training a cascade of classifiers using Haar-like features [3].

4. CONCLUDING REMARKS

The use of neural networks is a robust, reliable approach for fast QR code localization. We have examined the performance of shallow and deep rectifier neural networks for this task on various input types. We have demonstrated the efficiency of NNs on a large number of input images, both synthetic and real.

It is observed that DRNs perform better than ANNs in general. The best choice for the input data is the edge magnitude map.


Algorithm       Precision   Recall   F-measure
Proposed        0.9360      0.9327   0.9343
REF-HAAR [3]    0.1535      0.9436   0.2640
REF-MORPH [2]   0.6989      0.9930   0.8042

Table 5. Performance comparison of the best NN to other algorithms.

Block size does not affect DRN accuracy significantly and has a wide working range, between 10 and 40 percent of the expected QR code dimensions. Even with dense block overlapping, real-time image processing is possible.

Our future work includes experimenting with other patterns and features for the neural network training.

Acknowledgement
This publication is supported by the European Union and co-funded by the European Social Fund. Project title: Telemedicine-oriented research activities in the fields of mathematics, informatics and medical sciences. Project number: TAMOP-4.2.2.A-11/1/KONV-2012-0073.

5. REFERENCES

[1] Chung-Hua Chu, De-Nian Yang, Ya-Lan Pan, and Ming-Syan Chen, "Stabilization and extraction of 2D barcodes for camera phones," Multimedia Systems, pp. 113–133.

[2] Eisaku Ohbuchi, Hiroshi Hanaizumi, and Lim Ah Hock, "Barcode readers using the camera device in mobile phones," in Cyberworlds, 2004 International Conference on, 2004, pp. 260–265.

[3] Luiz F. F. Belussi and Nina S. T. Hirata, "Fast QR code detection in arbitrarily acquired images," in Graphics, Patterns and Images (Sibgrapi), 2011 24th SIBGRAPI Conference on, 2011, pp. 281–288.

[4] Gabor Soros and Christian Florkemeier, "Blur-resistant joint 1D and 2D barcode localization for smartphones," in Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, New York, NY, USA, 2013, MUM '13, pp. 11:1–11:8, ACM.

[5] Istvan Szentandrasi, Adam Herout, and Marketa Dubska, "Fast detection and recognition of QR codes in high-resolution images," in Proceedings of the 28th Spring Conference on Computer Graphics, New York, NY, USA, 2013, SCCG '12, pp. 129–136, ACM.

[6] Xavier Glorot and Yoshua Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. AISTATS, 2010, pp. 249–256.

[7] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[8] Frank Seide, Gang Li, Xie Chen, and Dong Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. ASRU, 2011, pp. 24–29.

[9] Xavier Glorot, Antoine Bordes, and Yoshua Bengio, "Deep sparse rectifier networks," in Proc. AISTATS, 2011, pp. 315–323.

[10] Peter Bodnar and Laszlo G. Nyul, "A novel method for barcode localization in image domain," in Image Analysis and Recognition, 2013, vol. 7950 of Lecture Notes in Computer Science, pp. 189–196, Springer Berlin Heidelberg.

[11] Laszlo Toth and Tamas Grosz, "A comparison of deep neural network training methods for large vocabulary speech recognition," in Proceedings of TSD, 2013, pp. 36–43.

[12] Peter Bodnar and Laszlo G. Nyul, "Improving barcode detection with combination of simple detectors," in The 8th International Conference on Signal Image Technology (SITIS 2012), 2012, pp. 300–306.

