2012 19th IEEE International Conference on Image Processing (ICIP 2012), Orlando, FL, USA, September 30 – October 3, 2012

IMAGE SUPER-RESOLUTION BY EXTREME LEARNING MACHINE

Le An, Bir Bhanu

Center for Research in Intelligent Systems, University of California, [email protected], [email protected]

ABSTRACT

Image super-resolution is the process of generating high-resolution images from low-resolution inputs. In this paper, an efficient image super-resolution approach based on the recent development of the extreme learning machine (ELM) is proposed. We aim at reconstructing the high-frequency components containing details and fine structures that are missing from the low-resolution images. In the training step, high-frequency components from the original high-resolution images, as the target values, and image features from the low-resolution images are fed to ELM to learn a model. Given a low-resolution image, the high-frequency components are generated via the learned model and added to the initially interpolated low-resolution image. Experiments show that, with simple image features, our algorithm performs better in terms of accuracy and efficiency at different magnification factors compared to the state-of-the-art methods.

Index Terms— Image, super-resolution, feature, learning

1. INTRODUCTION

Often, due to various limitations such as moderate imaging sensors, environmental conditions, or limited transmission channel capacity, the images that we acquire are of low resolution (LR). The generation of LR images is commonly modeled as a blurring process followed by downsampling. Visually, LR images are blurred, with loss of details in structures that usually reside in the high-frequency (HF) components of the original high-resolution (HR) image. Image super-resolution (SR) consists of the approaches that try to solve the inverse problem of recovering the HR image from the LR images. For applications where the quality of the images greatly affects subsequent processing, SR as a preprocessing step is quite desirable. For example, face images from surveillance cameras often have poor resolution, which makes it difficult for face recognition algorithms to achieve high accuracy. By using SR techniques to feed HR images to the recognition algorithm, the recognition accuracy can be improved significantly [1].

There has been extensive work on SR methods. Traditional SR algorithms require multiple LR images of the same scene to generate an HR image by integrating the information from the different images [2, 3]. However, registration at sub-pixel accuracy is indispensable in order to perform SR

successfully. Another type of SR algorithm requires a single LR image as input [4, 5, 6]. These reconstruction-based methods often use heuristics or specific interpolation functions. The performance of these techniques degrades especially when the magnification factor becomes large.

In recent years, learning-based approaches for image SR, in which patterns of the images from the training set are explored, have received a lot of attention. Freeman et al. [7] proposed an example-based method that predicts the HR image from the LR image using a Markov Random Field (MRF) computed by belief propagation. Yang et al. [8] solved the SR problem from the perspective of compressive sensing, which ensures that under mild conditions the sparse representation of an HR image can be recovered from the downsampled signal.

In this paper, we tackle the SR problem using a learning-based approach. Our SR algorithm is based on the extreme learning machine (ELM) [9]. The focus is to recover the HF components of the HR image efficiently and accurately. In the training step, features are extracted from the initially interpolated LR images (e.g., using bicubic interpolation), and a model that maps the interpolated images to the HF components of the HR images is learned. Given a test LR image, we first interpolate the image. Then the HF components are estimated using the model learned during training. By combining the interpolated image and the HF components, a final HR image with sufficient detail is generated.

The remainder of this paper is organized as follows. In Section 2, ELM is briefly introduced. The proposed SR algorithm is described in Section 3. Section 4 shows the experimental results and their comparison to the state-of-the-art methods. Finally, Section 5 concludes the paper.

2. EXTREME LEARNING MACHINE

ELM was initially developed for single-hidden-layer feedforward neural networks (SLFNs) [10]. One of the major merits of ELM is that the hidden layer need not be tuned. The output function of ELM is given by

f_L(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\beta    (1)

where \beta = [\beta_1, \beta_2, \ldots, \beta_L]^T is a vector consisting of the output weights between the hidden layer and the output node, and h(x) = [h_1(x), h_2(x), \ldots, h_L(x)] is the output of the hidden

978-1-4673-2533-2/12/$26.00 ©2012 IEEE        ICIP 2012


layer given input x. The function h(x) maps the original input data space to the L-dimensional feature space.

According to [9], ELM aims not only at reaching the minimum training error but also the smallest norm of the output weights, which yields better generalization performance. Thus, ELM minimizes the following quantities:

minimize \{ \|H\beta - T\|^2,\ \|\beta\| \}    (2)

where T contains the training target values and H is the hidden-layer output matrix

H = \begin{bmatrix} h_1(x_1) & \cdots & h_L(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_N) & \cdots & h_L(x_N) \end{bmatrix}    (3)

In the implementation, the minimal-norm least-squares method is used instead of a standard optimization method [10]. One advantage of ELM is that the hassle of parameter tuning is avoided in the training process. The generalization performance of ELM is not sensitive to the number of hidden nodes, as tested in [9]. In addition, ELM has a very fast learning speed. These merits make ELM user-friendly and efficient.
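The training and prediction steps above can be sketched in a few lines (a minimal sketch, not the authors' implementation: `elm_train` and `elm_predict` are hypothetical names, the sigmoid activation matches the setting used later in Section 4, and NumPy's pseudoinverse supplies the minimal-norm least-squares solution):

```python
import numpy as np

def elm_train(X, T, n_hidden=20, seed=0):
    """Fit a single-hidden-layer ELM: hidden weights are random and never
    tuned; only the output weights beta are solved for."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # hidden-layer output matrix (Eq. 3)
    beta = np.linalg.pinv(H) @ T                             # minimal-norm solution of H beta = T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate f_L(x) = h(x) beta from Eq. (1)."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because the hidden layer is fixed, training reduces to a single linear solve, which is where ELM's speed comes from.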

3. TECHNICAL APPROACH

The proposed algorithm consists of two steps: training and testing. Figure 1 gives an overview of the proposed method. Similar to [8], we apply our method on the luminance channel only, since humans are more sensitive to luminance changes. For the chrominance channels, bicubic interpolation is applied.

In the training process, a number of HR images are used. The HR image I_HR is first blurred and downsampled by a factor of k. The downsampled image is then interpolated by a basic interpolation method (bicubic interpolation in this paper) with a magnification factor of k. This step yields an initially upscaled image I_0 with the same size as I_HR. The HF components I_HF are obtained by

I_HF = I_HR - I_0    (4)
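The degradation and target construction described above might be sketched as follows (an illustrative sketch: SciPy's `gaussian_filter` and cubic-spline `zoom` stand in for the paper's 5 × 5 Gaussian blur and bicubic interpolation, and `make_training_pair` is a hypothetical name):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def make_training_pair(i_hr, k=2, sigma=1.0):
    """Blur and downsample I_HR by factor k, re-interpolate to get I_0,
    and form the high-frequency target I_HF = I_HR - I_0 of Eq. (4)."""
    blurred = gaussian_filter(i_hr, sigma=sigma)  # blurring step
    i_lr = blurred[::k, ::k]                      # downsampling by factor k
    i0 = zoom(i_lr, k, order=3)                   # initial upscaling back to HR size
    return i0, i_hr - i0                          # (I_0, I_HF)
```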

Simple features are extracted from I_0, and the feature vectors fed to ELM for training consist of two components: pixel intensity values from local image patches, and 1st- and 2nd-order derivative magnitudes.

At each pixel location (i, j) of I_0, a local patch P_{i,j} of size m × m centered at (i, j) is extracted. This image patch is then reshaped into a row vector p_(i,j) of size m². Thus, the information about local pixel intensity values is encoded.
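A patch-to-vector step like the one described might look as follows (a sketch; replicate padding at the image border is an assumption, since the paper does not specify border handling):

```python
import numpy as np

def patch_vector(i0, i, j, m=3):
    """Return the m x m patch of i0 centered at (i, j), reshaped into a
    row vector of length m**2."""
    r = m // 2
    padded = np.pad(i0, r, mode='edge')   # replicate borders (assumption)
    patch = padded[i:i + m, j:j + m]      # centered at (i, j) in original coords
    return patch.reshape(-1)
```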

In order to account for directional changes in pixel intensity values, we calculate the 1st-order derivatives in the horizontal and vertical directions. In addition, 2nd-order derivatives are calculated to capture the rate of change of the 1st-order derivatives. For each pixel, 5 derivative values are obtained: (\partial I_0/\partial x, \partial I_0/\partial y, \partial^2 I_0/\partial x^2, \partial^2 I_0/\partial y^2, \partial^2 I_0/\partial x \partial y).

Fig. 1. The system diagram of the proposed super-resolution algorithm. The LR image is obtained by blurring and downsampling the HR image. I_0 is the initially interpolated image. In the training process, feature vectors from I_0 (X) and the target values (Y) from HF (the high-frequency components obtained by subtracting I_0 from the HR image) are sent to ELM to generate a model. In the testing process, HF is predicted by ELM using the trained model, and the output is the combination of the predicted HF image and the initially interpolated image.

To calculate the 1st- and 2nd-order derivatives we adopt the method in [11], which is more accurate than the common routine of taking differences between adjacent pixels. Mathematically, the derivative here is formulated via an optimization of the rotation-invariance of the gradient operator. In the discrete implementation, to design a filter of length L, the error functions for the 1st- and 2nd-order derivatives are given by

E(\vec{p}, \vec{d}_1) = \frac{|j\omega F_s \vec{p} - F_a \vec{d}_1|^2}{|F_s \vec{p}|^2}    (5)

E(\vec{d}_2) = |j^2 \omega^2 F_s \vec{p} - F_a \vec{d}_2|^2    (6)

where \vec{p} is a parameter vector of length (L+1)/2 containing the independent prefilter samples, \vec{d}_1 contains the independent derivative kernel samples of length (L-1)/2, and \vec{d}_2 contains one half of the full filter taps. F_s and F_a are matrices containing the real and imaginary components of the discrete Fourier basis. For details, please refer to [11]. In our case we use the 5-tap filter. The computed 1st- and 2nd-order derivative values are concatenated to form a row vector d_(i,j).

Combining all the features together, we now have the feature vector at (i, j) as v_(i,j) = [p_(i,j), d_(i,j)]. The length of this feature vector is m² + 5. For the corresponding target value, we take the pixel value i_(i,j) from I_HF. For each training image, the pixels are traversed in a raster-scan manner.


The instances [v_(i,j), i_(i,j)] from all of the HR training images are stacked together as input to ELM. After training, a model is generated that describes the mapping from the initially interpolated image to the HF image.

To super-resolve an LR image I_LR, the same initial interpolation is applied, generating a base image I_0. At each pixel position (i, j) in I_0, we extract the same features as in the training step. With the input feature vectors and the trained model, the predicted value i_(i,j) is obtained. After going through every pixel in I_0, we then have the HF components I_HF. The final output I_HR is constructed by combining I_0 and I_HF, as shown in Figure 1.
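The test-time procedure can be condensed into a short sketch (the `feature_fn` and `predict_hf` callables are hypothetical stand-ins for the feature extractor and the trained ELM model described above; SciPy's cubic-spline `zoom` stands in for bicubic interpolation):

```python
import numpy as np
from scipy.ndimage import zoom

def super_resolve(i_lr, k, feature_fn, predict_hf):
    """Initial interpolation, per-pixel HF prediction in raster order,
    then I_HR = I_0 + I_HF as in Figure 1."""
    i0 = zoom(i_lr, k, order=3)                  # base image I_0
    feats = feature_fn(i0)                       # (H*W, d) feature matrix
    i_hf = predict_hf(feats).reshape(i0.shape)   # predicted HF components
    return i0 + i_hf
```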

4. EXPERIMENTS

In the experiments, we use 20 hidden neurons and the sigmoid function as the activation function in ELM (the results are not sensitive to the number of hidden neurons, as tested in our experiments). The input attributes in the feature vectors are normalized to [−1, 1]. Eight 512 × 512 HR images from the USC SIPI image database (http://sipi.usc.edu/database/) are used for training. We use a 5 × 5 Gaussian blur with a standard deviation of 1 to preprocess the HR images before downsampling. For testing, we use 25 images from the morgueFile online archive (http://morguefile.com; the same collection of images was used in [5]), different from the training images. The HR images are of size 512 × 512. Experiments are conducted with magnification factors of 2 and 4; the corresponding sizes of the LR images are 256 × 256 and 128 × 128. The size of the local image patch is 3 × 3.

We compare our method to three state-of-the-art methods: iterative curvature-based interpolation (ICBI) [5], the kernel regression based method (KR) [6], and the sparse representation based method (SP) [8]. The implementations of these methods are from the authors' websites. The default parameters and settings of these methods are used in our experiments. Figure 2 shows sample results of the four methods (the electronic version is recommended for better comparison).

As can be seen from the results, ICBI and KR are not able to recover the HF components, and the generated images suffer from blurriness at both 2x and 4x magnification. KR tends to over-smooth the images, especially in texture-rich regions. SP and our method produce more details in the super-resolved images. However, compared to SP, the results of our method are visually superior. At 2x, our method produces HR images that have vivid details and are very close to the ground truth. Even at 4x, where the resolution of the inputs is very low, our method is still able to achieve good performance.

To measure the performance quantitatively, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [12] are calculated, as shown in Table 1. At 2x, the numerical scores of our method are better than those of all the other methods. When the magnification factor becomes 4, SP and our method both offer competitive results compared to ICBI and KR. Note that the above performance of our method is achieved by using very simple image features, and the calculation involves very

Method       ICBI [5]   KR [6]    SP [8]    Proposed
PSNR (2x)    35.03      34.64     35.95     36.74
PSNR (4x)    33.77      32.55     33.94     34.02
SSIM (2x)    0.9227     0.9125    0.9628    0.9750
SSIM (4x)    0.8520     0.7749    0.8804    0.8680

Table 1. Average PSNR and SSIM scores for different super-resolution methods (2x and 4x).

Method       ICBI [5]   KR [6]    SP [8]    Proposed
Time (2x)    3.23       21.10     727.34    3.71
Time (4x)    3.52       18.97     720.82    3.74

Table 2. Average time (in seconds) to super-resolve an image for different super-resolution methods (2x and 4x).

small feature vectors of length 14 (3 × 3 pixel intensity values + 5 derivative magnitudes). The proposed algorithm is less complicated than the other methods.
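For reference, the PSNR values in Table 1 follow the standard definition (a sketch assuming an 8-bit peak value of 255, which the paper does not state explicitly):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```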

We also computed the average time to super-resolve an image with different magnification factors. The programs were executed on a desktop with an Intel Core2 2.4 GHz CPU and 3 GB of RAM. Here we do not compare the training time, since this is a one-time offline process, although in the training process ELM converges very fast (within 20 seconds). From Table 2 we can see that the running times of all the methods are not sensitive to the magnification factor. Our method, without code optimization, is very fast at different magnification factors. The running time of ICBI is close to that of our method, but the output quality is less satisfactory. KR processes images at a moderate rate without competitive performance. Although SP achieves performance similar to our method at 4x, it takes much more time to generate the output. Due to the efficiency of our method, many real-time applications become possible.

5. CONCLUSIONS

In this paper, an efficient algorithm for image super-resolution based on the extreme learning machine (ELM) is proposed. During the training process, simple features including pixel intensity values in a local image patch and the 1st- and 2nd-order derivative magnitudes are extracted. The target value is the high-frequency components obtained by subtracting the initially interpolated low-resolution image from the high-resolution one. ELM then learns a model that maps the interpolated image to the high-frequency components. Given a low-resolution image, the same features are extracted from the interpolated image. By applying the trained model, ELM is able to predict the high-frequency components. The final output is the combination of the interpolated image and the high-frequency components. Compared to the state-of-the-art methods, our method achieves high performance in both subjective and quantitative evaluations, and is less complicated. Furthermore, the computation of our method is very efficient. Involving a more comprehensive dataset and more sophisticated image features would be promising for better


Fig. 2. From left to right: results by ICBI [5], results by KR [6], results by SP [8], results by the proposed method, and the original images. From top to bottom: super-resolution at 2x and 4x.

performance and these aspects will be investigated in ourfuture work.

Acknowledgment. This work was supported in part by NSF grant 0905671.

6. REFERENCES

[1] S. Biswas, G. Aggarwal, and P.J. Flynn, "Pose-robust recognition of low-resolution face images," in Proc. CVPR, 2011.

[2] R. Tsai and T. Huang, "Multi-frame image restoration and registration," Advances in Computer Vision and Image Processing, 1984.

[3] S. Farsiu, M.D. Robinson, M. Elad, and P. Milanfar, "Fast and robust multiframe super resolution," IEEE TIP, 2004.

[4] J. Sun, Z. Xu, and H.-Y. Shum, "Image super-resolution using gradient profile prior," in Proc. CVPR, 2008.

[5] A. Giachetti and N. Asuni, "Real time artifact-free image upscaling," IEEE TIP, 2011.

[6] H. Takeda, S. Farsiu, and P. Milanfar, "Kernel regression for image processing and reconstruction," IEEE TIP, 2007.

[7] W.T. Freeman, E.C. Pasztor, and O.T. Carmichael, "Learning low-level vision," IJCV, 2000.

[8] J. Yang, J. Wright, T.S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE TIP, 2010.

[9] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE SMCB, 2011.

[10] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in Proc. IJCNN, 2004.

[11] H. Farid and E.P. Simoncelli, "Differentiation of discrete multidimensional signals," IEEE TIP, 2004.

[12] Z. Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE TIP, 2004.


