
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 8, AUGUST 2010

Robust Web Image/Video Super-Resolution

Zhiwei Xiong, Xiaoyan Sun, Member, IEEE, and Feng Wu, Senior Member, IEEE

Abstract—This paper proposes a robust single-image super-resolution method for enlarging low quality web image/video degraded by downsampling and compression. To simultaneously improve the resolution and perceptual quality of such web image/video, we bring forward a practical solution which combines adaptive regularization and learning-based super-resolution. The contribution of this work is twofold. First, we propose to analyze the image energy change characteristics during the iterative regularization process, i.e., the energy change ratio between primitive (e.g., edges, ridges and corners) and nonprimitive fields. Based on the revealed convergence property of the energy change ratio, appropriate regularization strength can then be determined to well balance compression artifacts removal and primitive components preservation. Second, we verify that this adaptive regularization can steadily and greatly improve the pair matching accuracy in learning-based super-resolution. Consequently, their combination effectively eliminates the quantization noise and meanwhile faithfully compensates the missing high-frequency details, yielding robust super-resolution performance in the compression scenario. Experimental results demonstrate that our solution produces visually pleasing enlargements for various web images/videos.

Index Terms—Adaptive regularization, compression artifacts removal, energy change ratio, learning-based super-resolution (SR), primitive/nonprimitive field, web image/video.

I. INTRODUCTION

WITH the Internet flourishing and the rapid progress in hand-held photographic devices, image and video are becoming more and more popular on the web, due to their rich content and easy perception. Consequently, image search engines and online video websites have experienced an explosion of visits during the past few years. However, limited by the network bandwidth and server storage, most web image/video exists in a low quality version degraded from the source. The most common degradations are downsampling and compression. Downsampling exploits the correlation in the spatial domain while compression further exploits the correlation in the frequency and temporal (for video) domains. Quality degradation greatly lowers the required bandwidth and storage, making the access to web image/video practical and convenient. But these benefits are obtained at the expense of impairing the perceptual experience of users, as degradation inevitably leads to information loss, which behaves as various artifacts in the resulting image/video, e.g., blurring, blocking and ringing.

Manuscript received June 07, 2009; revised February 17, 2010; first published March 15, 2010; current version published July 16, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Kenneth K. M. Lam.

Z. Xiong is with University of Science and Technology of China, Hefei, 230027, China. This work was done while he was with Microsoft Research Asia, Beijing, 100081, China (e-mail: [email protected]).

X. Sun and F. Wu are with Microsoft Research Asia, Beijing, 100081, China (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2010.2045707

There is a large demand for improving the perceptual quality of web image/video, among which the resolution enhancement, also known as super-resolution (SR), is an especially important issue and attracts a lot of attention. SR refers to the techniques achieving high-resolution (HR) enlargements of pixel-based low-resolution (LR) image/video. Basically, there are two kinds of SR, according to the number of LR images utilized: multi-image SR, which requires several LR images of the same scene to be aligned in subpixel accuracy, and single-image SR, which generates an HR image from a unique source.

SR has many applications in the real world. Take image search engines for example: once a query is entered, a large number of images need to be returned simultaneously, and the results are first displayed in their LR forms (often called "thumbnails"). Users then need to click on the thumbnail to get its original HR version. Nevertheless, sometimes it is frustrating that the source image is removed or the server is temporarily unavailable. Single-image SR, at this moment, can save users from the bother of linking to every source if only an enlarged preview is desired.

Previous work on single-image SR can be roughly divided into four categories: interpolation-based [1]–[4], reconstruction-based [5], [6], classification-based [7] and learning-based [8]–[17]. Despite great diversity in implementation, these methods have a common premise that the LR image is only degraded by downsampling. This is not always true in the web environment, where compression is widely adopted. For image search engines, compression helps reduce the thumbnail size by up to 50% without obvious perceptual quality loss when presented in the LR form. But if SR (any of the above) is then directly performed, compression artifacts will be magnified and the perceptual quality of the resulting HR images will be poor.

On the other hand, multi-image SR has been used to enlarge video for a long time [5], [18]–[22], and corresponding techniques for compressed video SR have also been reported in the literature [23]–[25]. These methods generally assume a prior distribution of the quantization noise and then integrate this knowledge into a Bayesian SR framework, or use the quantization bounds to determine convex sets which constrain the SR problem. In practical applications, however, the compression artifacts caused by the quantization noise are largely dependent on the video content and difficult to model with an explicit distribution. Moreover, since the performance of Bayesian SR heavily depends on the accuracy of frame registration and motion estimation, these methods are not capable of reconstructing high frequency details of dynamic videos that contain fast and complex object motions.

In this paper, we present a practical solution which combines adaptive regularization with learning-based SR to simultaneously improve the resolution and perceptual quality of compressed image/video. A straightforward implementation of this idea has been reported in our previous work [26], where the regularization strength is determined by the JPEG compression quality parameter (QP), followed by learning-based pair matching to further enhance the high-frequency details in the interpolated image. This simple yet effective combination gives perceptually high quality SR results for compressed thumbnail images. To further improve the robustness of such an approach, we propose a more solid criterion for the adaptive regularization control in this work, based on the convergence property of the image energy change ratio between primitive and nonprimitive fields during iterative regularization. By appropriately locating the turning point where regularization loses its efficacy in distinguishing primitive components from compression artifacts, the pair matching accuracy in learning-based SR can be steadily and greatly improved. In this way, the quantization noise is effectively eliminated while the missing high-frequency details are faithfully compensated. Moreover, the proposed single-image SR method can be directly applied to compressed video SR, by introducing certain interframe interactions on the regularization strength and simple spatial-temporal consistency optimization, as reported in our latest work [37]. Different from conventional methods, our solution does not require any specific assumption on the quantization noise or the object motion, which greatly extends its scope of application in the web environment.

Fig. 1. Flowchart of our compressed image super-resolution scheme.

The rest of this paper is organized as follows. Section II formulates the single-image SR problem in the compression scenario and briefly introduces the regularization and learning-based SR techniques used in our scheme. The adaptive regularization control is elaborated on in Section III. Section IV extends the proposed method to compressed video SR. Experimental results are presented in Section V, and Section VI concludes the paper.

II. COMPRESSED IMAGE SUPER-RESOLUTION

A. Problem Formulation

An overview of our single-image SR scheme in the compression scenario is shown in Fig. 1. Suppose $X$ is an original HR image. It is first downsampled with a low-pass filter $G$ (mostly isotropic Gaussian) to form an LR measurement

$$Y = (X * G)\downarrow_s \tag{1}$$

where $\downarrow_s$ is a decimation operator with scaling factor $s$. $Y$ is then compressed, resulting in a degraded LR measurement

$$\tilde{Y} = Y + E \tag{2}$$

where $E$ represents the quantization error introduced by compression in the spatial domain.

The compressed LR measurement $\tilde{Y}$ is the actual input of our SR system. This system consists of three modules: PDE regularization, bicubic interpolation and learning-based pair matching. Regularization is first performed on $\tilde{Y}$ to get an artifacts-relieved LR image

$$\hat{Y} = F^{N}(\tilde{Y}) \tag{3}$$

where $F$ denotes the PDE regularization functional and the superscript $N$ represents the total iteration number of regularization, which determines the regularization strength. $\hat{Y}$ is then upsampled with scaling factor $s$ to get an intermediate HR result

$$\tilde{X} = (\hat{Y}\uparrow_s) * B \tag{4}$$

where $B$ stands for the bicubic interpolation filter. The final HR image is obtained after learning-based pair matching from $\tilde{X}$ and a prepared database. The maximum a posteriori probability (MAP) estimate of the HR image can be expressed as

(5)

B. Learning-Based Pair Matching

Single-image SR aims to obtain an HR reconstruction from an LR measurement. For learning-based approaches, a set of examples organized in a database is utilized in the online reconstruction process. These examples usually exist as co-occurring patch pairs extracted from training images at two different resolution levels. The basic idea of using examples in SR is that natural images are special signals occupying only a vanishingly small fraction of the high dimensional image space. Therefore, high-frequency details that do not exist in the LR measurement can be "stolen" from the database through pair matching: given an LR patch from the input measurement, similar LR examples are sought in the database, and their corresponding HR counterparts can then be used for the reconstruction, as they provide high-frequency details that fit the input measurement.
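The pair matching idea described above can be illustrated with a brute-force nearest-neighbor lookup. The following is only a minimal sketch and not the primitive-based method of [12]: the function name `match_patch`, the squared Euclidean distance, and the flat array layout of the database are assumptions made for this example.

```python
import numpy as np

def match_patch(lr_patch, lr_examples, hr_examples):
    """Return (index, HR example) of the LR example closest to lr_patch.

    lr_examples: array-like of shape (num_examples, ...), one LR example per row.
    hr_examples: sequence of the co-occurring HR examples, in the same order.
    The squared Euclidean distance is an assumed similarity measure.
    """
    query = np.asarray(lr_patch, dtype=np.float64).ravel()
    db = np.asarray(lr_examples, dtype=np.float64).reshape(len(lr_examples), -1)
    dists = np.sum((db - query) ** 2, axis=1)   # distance to every LR example
    idx = int(np.argmin(dists))                 # best-matching LR example
    return idx, hr_examples[idx]                # its HR counterpart supplies the details

# Hypothetical toy database with three 2x2 LR examples (flattened).
lr_db = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [5, 5, 5, 5]], dtype=float)
hr_db = ["hr0", "hr1", "hr2"]
idx, hr = match_patch([[0.9, 1.1], [1.0, 1.2]], lr_db, hr_db)
```

A linear scan is used here only for clarity; a tree-based approximate search would be the natural replacement for a large database.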

In our scheme, the primitive-based hallucination method proposed in [12] is adopted for pair matching, for which image primitives (edges, ridges and corners) are represented by examples and pair matching is only applied to the primitive components in images. The superiority of this method is twofold. First, human observations are especially sensitive to image primitives when going from LR to HR. Second, the primitive manifold is of intrinsic lower dimensionality compared with raw image patch manifolds as used in [10], [13], [14] and can be more effectively represented by examples. (Please refer to [12] for detailed procedures of primitive example generation.)

Generally speaking, learning-based pair matching exploits the correspondence between image signals at two different resolution levels, whereas another kind of degradation, compression, is seldom considered in previous works. One may suggest directly involving compression in preparing the examples. Unfortunately, this implementation will heavily lower the pair matching accuracy, as the quantization noise, unlike the high frequency components lost in downsampling, is difficult to represent effectively by examples. The underlying reason is that compression corrupts the primitive pattern (or other feature patterns) contained in examples, and thus the correspondence between them. As an alternative, we propose to keep the database away from compression while introducing regularization on the compressed LR measurement.

C. PDE Regularization

Among various available regularization techniques, anisotropic PDEs [27]–[30] are considered to be one of the best, due to their ability to smooth data while preserving visually salient features in images. A brief restatement of PDE regularization is given below. Suppose $I$ is a 2D scalar image. The PDE regularization can be formulated as the juxtaposition of two oriented 1D heat flows along the gradient direction $\eta$ and its orthogonal, named the isophote direction $\xi$ (as $\xi$ is everywhere tangent to the isophote lines in the image), with corresponding weights $c_\eta$ and $c_\xi$

$$\frac{\partial I}{\partial t} = c_\eta I_{\eta\eta} + c_\xi I_{\xi\xi}, \qquad \eta = \frac{\nabla I}{\|\nabla I\|}, \quad \xi = \eta^{\perp} \tag{6}$$

where $\|\nabla I\| = \sqrt{I_x^2 + I_y^2}$ denotes the image gradient magnitude. The choice of $c_\eta$ and $c_\xi$ is not determinate; only certain properties need to be satisfied. In this paper, we use the following weights as suggested in [30]

$$c_\eta = \frac{1}{1 + \|\nabla I\|^2}, \qquad c_\xi = \frac{1}{\sqrt{1 + \|\nabla I\|^2}} \tag{7}$$

This is one possible choice inspired from the hyper-surface formulation of the scalar case [31]. The PDE in (6) can be equivalently written as

$$\frac{\partial I}{\partial t} = \operatorname{trace}(\mathbf{T}\mathbf{H}) \tag{8}$$

where $\mathbf{H}$ is the Hessian matrix of $I$ and $\mathbf{T}$ is an anisotropic 2 × 2 tensor defined as

$$\mathbf{T} = c_\eta\, \eta\eta^{T} + c_\xi\, \xi\xi^{T} \tag{9}$$

$\mathbf{T}$ gives the exact smoothing geometry performed by the PDE, which can be viewed as a thin ellipsoid with the major axis perpendicular to the gradient direction, as shown in Fig. 2.

In a numerical scheme, the input measurement is regularized iteratively. In the $n$th iteration, there is

$$I^{(n)} = I^{(n-1)} + \lambda \frac{\partial I^{(n-1)}}{\partial t}, \qquad n = 1, \dots, N \tag{10}$$

where $\lambda$ is a positive constant controlling the updating step, $N$ is the total iteration number, and the image intensity change velocity $\partial I / \partial t$ can be calculated from (8) based on the spatial discretization of the gradients and the Hessians.
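One explicit iteration of such a scheme can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: central finite differences with replicated borders are assumed for the discretization, the weights 1/(1 + |∇I|²) along the gradient and 1/sqrt(1 + |∇I|²) along the isophote are assumed as one admissible choice, and the names `pde_step` and `regularize` as well as the default step `lam=0.2` are hypothetical.

```python
import numpy as np

def pde_step(image, lam=0.2):
    """One explicit update I <- I + lam * dI/dt of the anisotropic PDE.

    Returns the updated image and the intensity change velocity dI/dt.
    Discretization, weights and step size are assumptions of this sketch.
    """
    I = np.asarray(image, dtype=np.float64)
    P = np.pad(I, 1, mode="edge")                      # replicate borders
    Ix = (P[1:-1, 2:] - P[1:-1, :-2]) / 2.0            # first derivatives
    Iy = (P[2:, 1:-1] - P[:-2, 1:-1]) / 2.0
    Ixx = P[1:-1, 2:] - 2.0 * I + P[1:-1, :-2]         # Hessian entries
    Iyy = P[2:, 1:-1] - 2.0 * I + P[:-2, 1:-1]
    Ixy = (P[2:, 2:] - P[2:, :-2] - P[:-2, 2:] + P[:-2, :-2]) / 4.0

    g2 = Ix ** 2 + Iy ** 2                             # squared gradient magnitude
    c_eta = 1.0 / (1.0 + g2)                           # weight along the gradient
    c_xi = 1.0 / np.sqrt(1.0 + g2)                     # weight along the isophote

    flat = g2 < 1e-12                                  # direction undefined here
    denom = np.where(flat, 1.0, g2)
    I_etaeta = (Ix * Ix * Ixx + 2 * Ix * Iy * Ixy + Iy * Iy * Iyy) / denom
    I_xixi = (Iy * Iy * Ixx - 2 * Ix * Iy * Ixy + Ix * Ix * Iyy) / denom

    # Where the gradient vanishes both weights equal 1, i.e. isotropic diffusion.
    velocity = np.where(flat, Ixx + Iyy, c_eta * I_etaeta + c_xi * I_xixi)
    return I + lam * velocity, velocity

def regularize(image, iterations, lam=0.2):
    """Apply pde_step a fixed number of times."""
    I = np.asarray(image, dtype=np.float64)
    for _ in range(iterations):
        I, _ = pde_step(I, lam)
    return I
```

The velocity is returned alongside the image because the per-iteration intensity change is the quantity from which an energy change can later be measured.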

Fig. 2. PDE smoothing geometry. It can be viewed as a juxtaposition of two oriented 1D heat flows along the gradient direction $\eta$ and the isophote direction $\xi$, with corresponding weights $c_\eta$ and $c_\xi$.

As pointed out in [30], regularization PDEs generally do not converge toward a very interesting solution. Most of the time, the image obtained at $t \to \infty$ is constant, corresponding to an image without any variations. Therefore, the total iteration number $N$ of regularization is often manually determined according to the image degradation level. In the scenario of compressed image SR, one question arising here is how to adaptively control this regularization strength to maximize the posterior in (5), given a prefixed pair matching database and PDE regularization functional. In other words, can we find the turning point where compression artifacts are effectively eliminated while primitive components are still well preserved?

III. ADAPTIVE REGULARIZATION CONTROL

A. Energy Change During Regularization

To obtain appropriate regularization strength that well balances artifacts removal and primitive preservation, we propose to investigate the image energy change characteristics during the iterative regularization process. For this purpose, an image is first divided into primitive field and nonprimitive field. This partition can be determined by the orientation energy edge detection [32]. Suppose $B$ is a bitmap storing detected edge pixel locations in the image

$$B(x, y) = \begin{cases} 1, & (x, y) \text{ is an edge pixel} \\ 0, & \text{else.} \end{cases} \tag{11}$$

The primitive field (PF) is defined as

$$PF(x, y) = \begin{cases} 1, & \exists\, (x', y') \in \mathcal{N}_K(x, y) : B(x', y') = 1 \\ 0, & \text{else} \end{cases} \tag{12}$$

where $\mathcal{N}_K(x, y)$ refers to a $K$th order neighborhood of $(x, y)$

(13)

Correspondingly, the nonprimitive field (NPF) is defined as

$$NPF(x, y) = 1 - PF(x, y). \tag{14}$$
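The partition can be sketched as a dilation of the edge bitmap. The snippet below is only illustrative: it assumes that the edge bitmap is already available from some edge detector, and it assumes a square (2K+1) × (2K+1) window for the Kth order neighborhood, which may differ from the neighborhood actually defined in (13). The function name `primitive_field` is hypothetical.

```python
import numpy as np

def primitive_field(edge_map, K=1):
    """Split an image domain into primitive (PF) and nonprimitive (NPF) masks.

    edge_map: 2D boolean-like array, True at detected edge pixels.
    K: neighborhood order; a square window is an assumption of this sketch.
    A pixel belongs to PF if any edge pixel lies within its neighborhood.
    """
    edges = np.asarray(edge_map, dtype=bool)
    h, w = edges.shape
    padded = np.pad(edges, K, mode="constant", constant_values=False)
    pf = np.zeros_like(edges)
    # OR together all shifted copies of the edge map (binary dilation).
    for dy in range(2 * K + 1):
        for dx in range(2 * K + 1):
            pf |= padded[dy:dy + h, dx:dx + w]
    return pf, ~pf   # NPF is the complement of PF

# Toy example: a single edge pixel in the middle of a 5x5 image.
edges = np.zeros((5, 5), dtype=bool)
edges[2, 2] = True
pf, npf = primitive_field(edges, K=1)
```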

Fig. 3 illustrates the PF and NPF partition for a specific neighborhood order $K$. After the $n$th iteration of regularization, the image energy change in PF and NPF can be calculated as

(15)

where the image intensity change is the one given in (10). We denote the energy change ratio between PF and NPF as

(16)

Fig. 3. Partition of primitive and nonprimitive fields. Black edge pixels and their gray neighbors constitute the primitive field while the remaining white pixels constitute the nonprimitive field. One marked pixel is in the primitive field and another is in the nonprimitive field.

Fig. 4. (a) Image energy change ratio between PF and NPF during iterative regularization, and (b) test images.

Fig. 4(a) gives several practical results of the energy change ratio curve on the test images shown in Fig. 4(b). The test images are first 1/3 downsampled from the original and then compressed by JPEG with QP set to 60. It can be seen that all those curves converge after a few iterations. To obtain a general distribution pattern of the convergence speed, we use several video sequences (each containing 1500 frames) randomly downloaded from YouTube [35] for test, instead of gathering a large number of images. As shown in Fig. 5, most frames require fewer than 15 iterations, which validates the convergence property of the energy change ratio curve in the web environment.

Fig. 5. Distribution of total regularization iteration number for web videos.

The convergence property of the energy change ratio curve can be understood intuitively from the edge-preserving nature of PDE regularization. In the earlier stage, the energy change in NPF is more intensive than that in PF, so the ratio increases with the iteration number. When regularization is performed to a certain stage, PDE loses its efficacy in distinguishing salient features from images, and the ratio then remains at a stable level. Further, when is small, there is a large probability that compression artifacts such as ringing and blocking appear in NPF. So PDE regularization removes compression artifacts first.

According to the energy change characteristics during PDE regularization, we can now determine appropriate regularization strength to maximize the posterior in (5), by locating the turning point where the energy change ratio curve tends to converge. At this time, artifacts removal and primitive preservation in a compressed image are best balanced. In practice, we stop the regularization at the $N$th iteration when

(17)

where the control parameter is a constant and the reference slope represents the fastest increasing speed of the energy change ratio.
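The stopping rule can be sketched in a few lines. Since the exact form of (17) is not reproduced here, the sketch assumes that the slope of the ratio is measured as the difference between consecutive iterations and that regularization stops once this slope drops below a constant fraction `alpha` of the largest slope seen so far. The names `stopping_iteration`, `alpha` and `v_max` are hypothetical.

```python
def stopping_iteration(ratios, alpha=0.1):
    """Return the 1-based iteration at which regularization would stop.

    ratios: energy change ratios after iterations 1, 2, 3, ...
    alpha:  assumed control constant; a smaller alpha stops later.
    Returns None if the (assumed) criterion is never met.
    """
    v_max = None                                   # fastest increase seen so far
    for n in range(2, len(ratios) + 1):
        slope = ratios[n - 1] - ratios[n - 2]      # increase at iteration n
        if v_max is None or slope > v_max:
            v_max = slope
        if slope < alpha * v_max:                  # curve has flattened out
            return n
    return None

# Hypothetical ratio curve that rises quickly and then levels off.
curve = [0.10, 0.30, 0.45, 0.52, 0.53, 0.531]
stop_default = stopping_iteration(curve, alpha=0.1)
stop_strict = stopping_iteration(curve, alpha=0.01)
```

Under these assumptions, lowering `alpha` postpones the stop and therefore strengthens the regularization, which matches the qualitative behavior described in the text.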

B. Pair Matching Accuracy

To demonstrate the necessity and effectiveness of adaptive regularization in compressed image SR with learning-based methods, we then investigate the pair matching accuracy under three different circumstances, i.e., without compression, with compression but no regularization and with both compression and adaptive regularization. We denote them as , and .

To compare the pair matching accuracy, we use a Receiver Operating Characteristic (ROC) curve to demonstrate the tradeoff between match error and hit rate. We define the match error as

(18)

where denotes the real missing HR primitive patch and isthe HR example found through pair matching from the existentLR primitive patch . For a given match error , the hit rate isthe percentage of test data whose match error is less than .
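The hit rate computation can be sketched as follows. The match error formula of (18) is not reproduced here, so `match_error` below assumes a normalized Euclidean distance between the true HR patch and the retrieved HR example; only `hit_rate` follows directly from the definition in the text. All function names are hypothetical.

```python
import math

def match_error(true_hr, found_hr):
    """Assumed match error: normalized Euclidean distance between two patches
    given as flat sequences of pixel values."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(true_hr, found_hr)))
    den = math.sqrt(sum(a * a for a in true_hr))
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den

def hit_rate(errors, e):
    """Fraction of test patches whose match error is less than e."""
    return sum(1 for err in errors if err < e) / len(errors)

def roc_curve(errors, thresholds):
    """(match error, hit rate) points of the curve for the given thresholds."""
    return [(e, hit_rate(errors, e)) for e in thresholds]

# Hypothetical match errors of four test patches.
errors = [0.1, 0.2, 0.3, 0.4]
points = roc_curve(errors, [0.05, 0.25, 0.5])
```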

Fig. 6 presents three ROC curves based on the pair matching results of 50,000 primitive patches over 100,000 pairs of trained examples. The test data is sampled from images in Fig. 4(b), whereas the training images are shown in Fig. 11. As can be observed from the curves without compression and with compression but no regularization, when compression is involved, the pair matching accuracy degrades heavily. On the other hand, with adaptive regularization, the hit rate is higher than that without regularization at any match error and quickly approaches that of the uncompressed case, which indicates that the proposed adaptive regularization steadily and greatly improves the pair matching accuracy in compressed image SR.

Fig. 6. ROC curves of pair matching accuracy. 50,000 primitive patches are tested over 100,000 trained examples.

Fig. 7. PSNR curves against the iteration number of regularization, before and after pair matching. Test images are taken from Fig. 4(b). Left: Lena, and right: Peppers.

To further verify the adaptivity criterion in our scheme, in Fig. 7 we plot two PSNR curves against the iteration number of regularization, with and without pair matching. It can be observed that, though PDE regularization removes artifacts (and thus improves the perceptual quality), it gradually lowers the PSNR, as we do not enforce any fidelity constraints. However, after pair matching, the PSNR curve pattern changes. The peak value appears near the iteration number at which the corresponding energy change ratio curve converges, as can be seen by comparison with Fig. 4(a).

C. Revisit of Training Set

In the above discussion, we assume a prefixed pair matching database, mainly to verify the adaptivity criterion in the regularization step. Nevertheless, for learning-based approaches, the choice of training set is still an important issue. Therefore, we also analyze how the choice of training set influences the performance of our method.

Since the primitive pair matching mainly exploits the correspondence between LR and HR primitive components, the training images are generally required to have rich and diverse primitive patterns. However, we would like to point out that, as the primitive pattern is a low-level vision feature, the SR performance only depends on the number of distinct primitive patterns in the trained database, instead of the structural similarity between training images and test images. Moreover, once the number of primitive patterns increases to a certain value, the SR performance will remain stable.

Fig. 8. PSNR improvement between directly interpolated images and those with our proposed SR under different database sizes. Test images are taken from Fig. 4(b).

To observe the effect of the database size, we conduct another experiment. During the experiment, we gradually increase the size of the database from 0 (no pair matching) to 100,000 pairs of distinct primitive patterns (extracted from a training set of 16 Kodak images [36] shown in Fig. 11), and then measure the average PSNR improvement between directly interpolated compressed images and those with our proposed SR. The result is shown in Fig. 8, where the PSNR improvement tends to saturate when the database exceeds a certain size. Note that the minimum database size required for a stable SR performance without compression can be larger than that indicated in Fig. 8, because compression, as well as regularization, reduces the number of distinct primitive patterns in the input images.

D. Discussion and Summarization

Compressed image SR is a practical problem in the web application of single-image SR, but has rarely been investigated before. In this subsection, we would like to supplement some intuition on why learning-based pair matching and PDE regularization are combined to solve this problem.

It is a natural idea to conceive a two-step strategy to address this problem, first relieving compression artifacts and then performing common SR. However, choosing an effective combination is nontrivial. Since there is actually no way to eliminate quantization noise without impairing high-frequency image details, it requires the subsequent SR to have a strong ability to recover the weakened information. As mentioned in the introduction, there are several categories of single-image SR methods, among which learning-based pair matching shows its superiority when the integration of subclass priors, i.e., trained examples, is more powerful than a generic smoothness prior as used in other approaches. Besides the impressive results obtained in domain-specific applications (e.g., face, text [8], [9]), primitive-based hallucination [12] further shows appealing performance for generic image SR, which has been validated in the following works [15]–[17]. Therefore, we choose the primitive pair matching as our SR method.



Fig. 9. Framework of our solution for compressed video super-resolution. The input frame and the eliminated noise image are presented in their interpolated version in the example for better visualization. Note that only the luminance component of the noise image is shown.

As learning-based SR is usually performed at patch level and is relatively noise-sensitive, we prefer a global algorithm for compression artifacts removal to avoid local inconsistency (i.e., adjacent regions should be stably smoothed). This is the first reason we use PDE regularization, due to its global smoothing property. On the other hand, a new challenge emerges from the combined method. That is, as stated throughout this section, how to adaptively control the regularization strength to best exploit the capability of these two somewhat contradictory techniques (one tends to smooth and the other tends to enhance). This is the second reason we choose PDE regularization, due to its progressive smoothing property.

To summarize this section, we give a more informative description of our algorithm for compressed image SR as follows:

Input: compressed low resolution image

Output: enhanced high resolution image

Begin

1. Upsample the input image through bicubic interpolation.

2. Find a PF/NPF partition through the orientation energy edge detection [32].

3. Perform iterative PDE regularization on the input image:

a) After each iteration, upsample the regularized image through bicubic interpolation;

b) Calculate the image energy change between consecutive upsampled images based on the PF/NPF partition;

c) Calculate the energy change ratio between PF and NPF, and also record the maximum slope of the ratio;

d) If the stopping criterion (17) is met, stop regularization, and keep the total iteration number.

4. Extract LR primitive patches from the upsampled regularized image and find corresponding HR primitive patches from a prepared database through pair matching speeded up by the approximate nearest neighbor (ANN) tree searching [33].

5. Add the HR primitive patches back to the upsampled regularized image to form the final HR image, where the compatibility of neighboring HR primitive patches is enforced by averaging the pixel values in overlapped regions.

End
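The last step of the algorithm, adding overlapping HR primitive patches back while averaging the overlapped pixels, can be sketched as follows. This is a minimal illustration: the function name `add_hr_patches`, the (row, column) position convention, and the assumption that patches lie fully inside the image are all choices made for this example.

```python
import numpy as np

def add_hr_patches(base, patches, positions):
    """Add HR detail patches to an interpolated image, averaging overlaps.

    base:      2D interpolated image.
    patches:   list of 2D detail patches (high-frequency content to add).
    positions: list of (row, col) top-left corners, one per patch;
               patches are assumed to lie fully inside the image.
    """
    base = np.asarray(base, dtype=np.float64)
    acc = np.zeros_like(base)      # sum of patch values per pixel
    cnt = np.zeros_like(base)      # number of patches covering each pixel
    for patch, (y, x) in zip(patches, positions):
        p = np.asarray(patch, dtype=np.float64)
        h, w = p.shape
        acc[y:y + h, x:x + w] += p
        cnt[y:y + h, x:x + w] += 1
    # Average where at least one patch contributes; add nothing elsewhere.
    detail = np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
    return base + detail

# Toy example: two 1x2 patches overlapping in the middle pixel.
result = add_hr_patches(np.zeros((1, 3)),
                        [[[2.0, 2.0]], [[4.0, 4.0]]],
                        [(0, 0), (0, 1)])
```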

IV. COMPRESSED VIDEO SUPER-RESOLUTION

A. Framework

Since the single-image SR method introduced above does not require frame registration or motion estimation, it can be directly applied to compressed video SR in a frame-by-frame style. By integrating certain interframe interactions on the regularization strength and simple spatio-temporal coherency constraints, our scheme is competent for the SR task of web videos with dynamic content and different degradation levels.

The framework of our solution is shown in Fig. 9. Similar to that of image SR, it consists of three steps. First, each frame from an LR video is divided into PF and NPF and iterative PDE regularization is performed on it, during which the energy change velocities in both PF and NPF are recorded. When the ratio of these two velocities converges (judged by a control parameter, which is also influenced by that of the previous frame), regularization stops and the accumulated noise image is subtracted from the frame, resulting in an artifacts-relieved frame. Then, this frame is upsampled to the desired resolution through bicubic interpolation. Last, the primitive components in the interpolated frame are enhanced with learning-based pair matching. Meanwhile, the temporal consistency is enforced by referring to the previous interpolated frame and its pair matching indices. Adding the primitive enhancing image back to the interpolated frame, the final HR frame is generated. A practical example is given in Fig. 9 to visualize this framework.

Fig. 10. Energy change characteristics in consecutive frames. Frames with heavier degradation exhibit slower convergence speed of the energy change ratio. Test frames are shown in Fig. 15.

B. Interframe Interaction

The regularization strength control elaborated in Section III can adapt to different degradation levels due to quantization within a single image/frame. However, in a video sequence with fast motion or scene switch, compression artifacts in consecutive frames could greatly vary due to inaccurate interframe prediction, even when the quantization levels are set to be the same. The adaptive regularization should also take these circumstances into consideration.

For two frames with similar content, the one with heavier degradation requires higher regularization strength, and this adaptivity is mainly reflected by the parameter . One can easily find in Fig. 10 that frames with heavier degradation have smaller (refer to Fig. 15 for the test frames), which means the convergence speed of the energy change ratio curve is inversely proportional to the degradation level in consecutive frames.

To further improve the adaptivity of regularization, we record of the th frame and for the th frame, is calculated as

(19)

where is the initial quantity measured from the current frame. If , it suggests the degradation in the current frame is more severe than that in the previous frame. Then is further diminished to increase the regularization strength of the current frame (according to (17), reducing will increase ), and vice versa. In this way, the regularization strength can also adapt to the variable degradation between consecutive frames caused by fast motion or scene switch, making the quality improvement on the whole video sequence more stable.

C. Spatio-Temporal Coherency Optimization

After the adaptive regularization, compression artifacts in each frame are effectively reduced while primitive components are still well preserved. Primitives in the interpolated frame are then enhanced with learning-based pair matching. The main problem when applying this step to video is how to make the enhanced primitives temporally consistent to avoid flicker, especially for sequences with slight motion. To solve this problem, we propose to optimize the spatio-temporal coherency with a simple yet effective constraint.

Fig. 11. Training images (1536 × 1024 pixels). 100,000 pairs of primitive patches are extracted from these images.

We first define two terms for the convenience of narration. Let denote an LR primitive patch extracted from the location of the th interpolated frame, and represent the enhancing HR primitive patch corresponding to . In the temporal domain, we record the pair matching indices in the th frame. Then, for the th frame, each is compared with in the same position of the previous interpolated frame (in case exists). If is judged the same as , i.e., the sum of absolute difference (SAD) is smaller than a given threshold, the pair matching index of is directly assigned to ; or else a new pair matching for is conducted in the database.

In the spatial domain, since some pair matching results are derived from the previous frame and others are generated from the current frame, the compatibility of learned enhancing patches should be optimized. Specifically, for each that cannot use the index from the previous frame and a new pair matching is required, we take the first pair matching results as candidates for , and the optimum one is found by

(20)

where is the previous selected enhancing patch in the current frame in raster-scan order, and function measures the SAD in the overlapped region of two patches. In summary, the enhancing patch corresponding to can be denoted as

(21)

where is the selected enhancing patch in the location of the th frame, and is a small threshold. Finally, the primitive enhancing image is generated by assembling all enhancing patches, where pixel values in the overlapped regions are averaged.
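The two coherency rules described in this subsection can be sketched together. The snippet is only illustrative and makes several assumptions: patches are 2D arrays, the "previous selected enhancing patch in raster-scan order" is taken to be the left neighbor, the overlapped region is a strip of `overlap` columns, and all names (`sad`, `select_enhancing_index`, `threshold`) are hypothetical.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return float(np.sum(np.abs(np.asarray(a, dtype=np.float64)
                               - np.asarray(b, dtype=np.float64))))

def select_enhancing_index(lr_patch, prev_lr_patch, prev_index,
                           candidates, hr_examples, left_hr_patch,
                           overlap, threshold):
    """Choose the database index of the enhancing HR patch for one location.

    Temporal rule: reuse the previous frame's index if the co-located LR
    patches are nearly identical (SAD below threshold).
    Spatial rule: otherwise pick, among the candidate indices, the HR example
    whose left strip best agrees with the right strip of the left neighbor.
    """
    if prev_lr_patch is not None and sad(lr_patch, prev_lr_patch) < threshold:
        return prev_index
    if left_hr_patch is None:                 # no neighbor yet: best match wins
        return candidates[0]
    left_strip = np.asarray(left_hr_patch)[:, -overlap:]
    return min(candidates,
               key=lambda i: sad(np.asarray(hr_examples[i])[:, :overlap],
                                 left_strip))

# Toy example with two HR examples and a left neighbor made of ones.
hr_db = [np.zeros((2, 2)), np.ones((2, 2))]
cur = np.full((2, 2), 5.0)
reused = select_enhancing_index(cur, np.full((2, 2), 5.0), 0,
                                [0, 1], hr_db, np.ones((2, 2)), 1, 1.0)
fresh = select_enhancing_index(cur, np.zeros((2, 2)), 0,
                               [0, 1], hr_db, np.ones((2, 2)), 1, 1.0)
```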



Fig. 12. Super-resolution results of offline images “Butterfly” and “Lena”. (a) Bicubic interpolation, (b) learning-based pair matching, (c) PDE regularization,(d) our approach, (e) luminance component of eliminated noise image after adaptive regularization, and (f) primitive enhancing image after pair matching.

V. EXPERIMENTAL RESULTS

A. Image Results

We test our SR scheme on both offline images degraded by designated downsampling and compression from the sources, and compressed thumbnail images on the web. For the offline test images, we use a Gaussian filter for downsampling with decimation factor , and JPEG for compression with . For the test images on the web, the downsampling process is totally unknown while the compression format is still JPEG, but with unknown QPs. The PDE updating step , the neighborhood order of primitive field , the regularization strength control parameter , and the upsampling factor . A 16M record database consisting of 100,000 pairs of 9 × 9 sized primitive patches is used in pair matching. These examples are trained from 16 representative natural images shown in Fig. 11. For color images, regularization is performed on both the luminance and chrominance components, to both of which compression is applied. Learning-based pair matching, however, is only performed on the luminance component, as human observers are more sensitive to the luminance change in images when going from LR to HR.
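The color handling described above can be illustrated with a short Python sketch. It assumes the JFIF (BT.601 full-range) RGB to YCbCr conversion commonly used with JPEG; the three callables `regularize`, `upsample`, and `enhance_luma` are hypothetical placeholders for the adaptive regularization, the interpolation, and the learning-based primitive enhancement, and the order in which they are applied is an assumption of this sketch.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF (BT.601 full-range) conversion of one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def split_planes(rgb_image):
    """Turn rows of (r, g, b) tuples into three 2-D planes: Y, Cb, Cr."""
    converted = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in rgb_image]
    return tuple([[pixel[c] for pixel in row] for row in converted]
                 for c in range(3))


def super_resolve_color(rgb_image, regularize, upsample, enhance_luma):
    """Regularize and upsample all three planes, then enhance luminance only."""
    planes = split_planes(rgb_image)
    # Compression affects luminance and chrominance, so all planes are regularized.
    y, cb, cr = [upsample(regularize(plane)) for plane in planes]
    # Pair matching (primitive enhancement) is applied to the luminance plane only.
    return enhance_luma(y), cb, cr
```

Restricting the learning step to one plane keeps the database one-dimensional in color and cuts the matching cost to roughly a third of a per-channel search.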

Fig. 13. Super-resolution results of offline images “Lily” and “Peppers”. (a) Directional interpolation, (b) backprojection, (c) learning-based pair matching, (d) directional interpolation after adaptive regularization, (e) backprojection after adaptive regularization, and (f) learning-based pair matching after adaptive regularization.

Fig. 12 gives the SR results of two offline images obtained through several methods, including bicubic interpolation, learning-based pair matching, PDE regularization and our proposed approach. Compared with bicubic interpolation, regularization effectively reduces the compression artifacts while pair matching compensates the high-frequency details well, as demonstrated in the eliminated noise image and the primitive enhancing image. However, neither regularization alone nor pair matching alone generates satisfactory SR results. Taking the advantages of the two techniques, our combinative approach restores visually pleasing HR images from the compressed LR measurements. Note that this combination is nontrivial: adaptive regularization control serves as an essential coupling that guarantees the pair matching accuracy in the learning process.

Fig. 14. Super-resolution results of thumbnail web images. From left to right: bicubic interpolation, learning-based pair matching, PDE regularization and our approach.

TABLE I. PSNR (dB) CORRESPONDING TO FIG. 13

To further verify that the close coupling between regularization and SR, especially learning-based pair matching, is truly required, we also combine our adaptive regularization with non-learning-based SR techniques, e.g., directional interpolation [4] and backprojection [5]. Experimental results are presented in Fig. 13 and Table I. First, the perceptual quality of all SR results is improved after adaptive regularization. For backprojection and pair matching, the PSNR is also improved. (Interpolation alone does not see a PSNR gain, as PDE regularization lowers the PSNR.) Therefore, in the compressed image SR scenario, the integration of adaptive regularization is necessary. Second, learning-based pair matching, among the tested SR techniques, achieves the best perceptual and objective quality, which in turn indicates the effectiveness of our proposed combination.
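For reference, the objective measure used in this comparison, PSNR in dB, can be computed as in the following Python sketch. It assumes 8-bit images stored as nested lists with a peak value of 255; the function name and the choice to evaluate a single plane are assumptions of this sketch.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared_error = 0.0
    pixels = 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            squared_error += (a - b) ** 2
            pixels += 1
    if pixels == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / pixels
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Because PSNR penalizes every pixel deviation equally, a smoothing step such as regularization can lower it even when compression artifacts are visibly reduced, which is consistent with the interpolation case noted above.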

In Fig. 14 we present some SR results of thumbnail web images from Bing Image Search [34]. One can easily observe a distinct perceptual quality improvement with our method over bicubic interpolation, PDE regularization and learning-based pair matching.

The computational complexity of our solution is not high. Although the pair matching step is relatively time-consuming, it can be greatly sped up by the ANN tree searching algorithm [33]. On the other hand, the database size we used is small compared with that generally required in learning-based SR without compression, and it can be even smaller for real-time applications according to Fig. 8. The run time of our algorithm is tested on a Pentium IV 3.0 GHz PC, and it is able to upscale a thumbnail image sized 160 × 160 in less than 1 second on average. Therefore, this technique can serve as a useful online enlarge-preview tool for image search engines.
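To make the cost of pair matching concrete, the following Python sketch shows the exhaustive baseline that a tree-based approximate search such as ANN [33] would replace. It is not the ANN library's API: the function names are hypothetical, patches are assumed to be flattened into vectors, and the squared Euclidean distance is an assumed similarity measure.

```python
import heapq


def squared_distance(a, b):
    """Squared Euclidean distance between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_matches(query, lr_patches, k):
    """Indices of the k database LR patches closest to the query, best first.

    This scans every record, so the cost grows linearly with the database
    size; an approximate tree search trades a little accuracy for speed.
    """
    return heapq.nsmallest(
        k,
        range(len(lr_patches)),
        key=lambda i: squared_distance(query, lr_patches[i]),
    )
```

With 100,000 records of 9 × 9 patches, each query in this baseline touches about 8.1 million values, which is why both a faster search and a smaller database matter for real-time use.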

B. Video Results

Our solution for compressed video SR is tested on a variety of web videos downloaded from YouTube [35]. They are generally in a 320 × 240 resolution but with different degradation levels. We perform SR with a uniform upsampling factor on them, still using the above database. In the pair matching stage, the candidate number of the enhancing patch , and the SAD threshold .

Fig. 15. Super-resolution result of a web video. Top: bicubic interpolation, middle: primitive enhancing images, and bottom: our approach.

Fig. 16. Super-resolution result of an image with rich texture regions. Left: bicubic interpolation, and right: our approach.

Fig. 15 shows three frames extracted from a super-resolved web cartoon video. This result demonstrates the effectiveness of our solution in three aspects. First, the total iteration number of regularization, as enclosed in the caption, is appropriately dependent on the degradation level of each frame. Second, the primitive enhancing images preserve both temporal and spatial consistency due to coherency optimization. Last, the combination of adaptive regularization and learning-based pair matching steadily improves the perceptual quality of directly interpolated videos, even when severe compression artifacts and fast motions are present. (Please see the electronic version for better visualization.)

C. Applicability Discussion

In general, our method is able to restore HR images/videos from compressed LR measurements with different content and degradation levels. However, it still has certain limitations in application. For images with rich texture regions, neither PDE regularization nor learning-based primitive enhancing works well. In this case, our method may not give significant perceptual quality improvement. Fig. 16 shows an example.

In addition, the upsampling factor in implementation need not be exactly the same as that in the database. The performance will not be impacted much if these two factors are close to each other. Only the case when the upsampling factor in implementation is much smaller than that in the database should be avoided (too many high-frequency details may be added and the resulting image may look noisy). We suggest that a database with an upsampling factor of 3 can deal with most cases.

VI. CONCLUSION

In this paper, we present a robust single-image SR method in the compression scenario, which is capable of simultaneously increasing the resolution and perceptual quality of web image/video with different content and degradation levels. Our method combines adaptive PDE regularization with learning-based pair matching to eliminate the compression artifacts and meanwhile best preserve and enhance the high-frequency details. This method can be naturally extended to video with certain interframe interaction and simple spatio-temporal coherency optimization. Experimental results, including both offline and online tests, validate the effectiveness of our method. Due to its robust performance and low complexity, our solution provides a practical enlarge-preview tool for thumbnail web images, especially those provided by image search engines; it may also be applied to video resizing for online video websites, provided that more powerful computational resources (e.g., GPU) are available.



REFERENCES

[1] R. G. Keys, “Cubic convolution interpolation for digital image processing,” IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 12, pp. 1153–1160, Dec. 1981.

[2] J. Allebach and P. W. Wong, “Edge-directed interpolation,” in Proc. IEEE Int. Conf. Image Processing, 1996, vol. 3, pp. 707–710.

[3] X. Li and M. T. Orchard, “New edge-directed interpolation,” IEEE Trans. Image Process., vol. 10, no. 10, pp. 1521–1527, Oct. 2001.

[4] Z. Xiong, X. Sun, and F. Wu, “Fast directional image interpolator with difference projection,” in Proc. IEEE Int. Conf. Multimedia & Expo, 2009, pp. 81–84.

[5] M. Irani and S. Peleg, “Motion analysis for image enhancement: Resolution, occlusion and transparency,” J. Vis. Commun. Image Represent., vol. 4, pp. 324–335, Dec. 1993.

[6] B. S. Morse and D. Schwartzwald, “Image magnification using level-set reconstruction,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001, pp. 333–340.

[7] C. B. Atkins, C. A. Bouman, and J. P. Allebach, “Optimal image scaling using pixel classification,” in Proc. IEEE Int. Conf. Image Processing, 2001, pp. 864–867.

[8] S. Baker and T. Kanade, “Limits on super-resolution and how to break them,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 9, pp. 1167–1183, Sep. 2002.

[9] C. Liu, H. Y. Shum, and C. S. Zhang, “A two-step approach to hallucinating faces: Global parametric model and local non-parametric model,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001, pp. 192–198.

[10] W. T. Freeman and E. C. Pasztor, “Learning low-level vision,” in Proc. IEEE Int. Conf. Computer Vision, 1999, pp. 1182–1189.

[11] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example-based super-resolution,” IEEE Comput. Graph. Appl., vol. 22, no. 2, pp. 56–65, Mar.–Apr. 2002.

[12] J. Sun, N. Zheng, H. Tao, and H. Shum, “Image hallucination with primal sketch priors,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003, pp. 729–736.

[13] H. Chang, D. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2004, pp. 275–282.

[14] J. Yang, J. Wright, Y. Ma, and T. Huang, “Image super-resolution as sparse representation of raw image patches,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008, pp. 1–8.

[15] W. Fan and D. Yeung, “Image hallucination using neighbor embedding over visual primitive manifolds,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–7.

[16] L. Ma, Y. Zhang, Y. Lu, F. Wu, and D. Zhao, “Three-tiered network model for image hallucination,” in Proc. IEEE Int. Conf. Image Processing, 2008, pp. 357–360.

[17] Z. Xiong, X. Sun, and F. Wu, “Image hallucination with feature enhancement,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, 2009, pp. 2074–2081.

[18] R. R. Schultz and R. L. Stevenson, “Extraction of high-resolution frames from video sequences,” IEEE Trans. Image Process., vol. 5, no. 6, pp. 996–1011, Jun. 1996.

[19] A. J. Patti, M. I. Sezan, and A. M. Tekalp, “Super-resolution video reconstruction with arbitrary sampling lattices and nonzero aperture time,” IEEE Trans. Image Process., vol. 6, no. 8, pp. 1064–1076, Aug. 1997.

[20] N. R. Shah and A. Zakhor, “Resolution enhancement of color video sequences,” IEEE Trans. Image Process., vol. 8, no. 6, pp. 879–885, Jun. 1999.

[21] P. E. Eren, M. I. Sezan, and A. M. Tekalp, “Robust, object based high resolution image reconstruction from low resolution video,” IEEE Trans. Image Process., vol. 6, no. 10, pp. 1446–1451, Oct. 1997.

[22] B. C. Tom and A. K. Katsaggelos, “Resolution enhancement of monochrome and color video using motion compensation,” IEEE Trans. Image Process., vol. 10, no. 2, pp. 278–287, Feb. 2001.

[23] Y. Altunbasak, A. J. Patti, and R. M. Mersereau, “Super-resolution still and video reconstruction from MPEG coded video,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 4, pp. 217–226, Apr. 2002.

[24] B. K. Gunturk, Y. Altunbasak, and R. M. Mersereau, “Super-resolution reconstruction of compressed video using transform-domain statistics,” IEEE Trans. Image Process., vol. 13, no. 1, pp. 33–43, Jan. 2004.

[25] C. A. Segall, A. K. Katsaggelos, R. Molina, and J. Mateos, “Bayesian resolution enhancement of compressed video,” IEEE Trans. Image Process., vol. 13, no. 7, pp. 898–911, Jul. 2004.

[26] Z. Xiong, X. Sun, and F. Wu, “Super-resolution for low quality thumbnail images,” in Proc. IEEE Int. Conf. Multimedia & Expo, 2008, pp. 181–184.

[27] G. Aubert and P. Kornprobst, “Mathematical problems in image processing: Partial differential equations and the calculus of variations,” in Applied Mathematical Sciences. New York: Springer-Verlag, Jan. 2002.

[28] G. Sapiro, Geometric Partial Differential Equations and Image Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2001.

[29] J. Weickert, Anisotropic Diffusion in Image Processing. Stuttgart, Germany: Teubner-Verlag, 1998.

[30] D. Tschumperle and R. Deriche, “Vector-valued image regularization with PDEs: A common framework for different applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 4, pp. 506–517, Apr. 2005.

[31] G. Aubert and P. Kornprobst, “Mathematical problems in image processing: Partial differential equations and the calculus of variations,” in Applied Mathematical Sciences. New York: Springer-Verlag, Jan. 2002.

[32] P. Perona and J. Malik, “Detecting and localizing edges composed of steps, peaks and roofs,” in Proc. IEEE Int. Conf. Computer Vision, 1990, pp. 52–57.

[33] D. Mount and S. Arya, ANN: Library for Approximate Nearest Neighbor Searching [Online]. Available: http://www.cs.umd.edu/mount/ANN/

[34] [Online]. Available: http://www.bing.com/images?FORM=Z9LH3

[35] [Online]. Available: http://www.youtube.com/

[36] [Online]. Available: http://www.kodak.com/digitalImaging/samples/imageIntro.shtml

[37] Z. Xiong, X. Sun, and F. Wu, “Web cartoon video hallucination,” in Proc. IEEE Int. Conf. Image Processing, 2009, pp. 3941–3944.

Zhiwei Xiong received the B.S. degree in electronic engineering from University of Science and Technology of China (USTC), Hefei, China, in 2006. He is now pursuing the Ph.D. degree in electronic engineering at USTC.

He has been a research intern at Microsoft Research Asia since 2007, where his research concentrates on image/video compression, image/video processing and computer vision. He is particularly interested in inverse problems such as image super-resolution, regularization, and restoration.

Xiaoyan Sun (M’04) received the B.S., M.S., and Ph.D. degrees in computer science from Harbin Institute of Technology, Harbin, China, in 1997, 1999, and 2003, respectively.

She joined Microsoft Research Asia, Beijing, China, as an Associate Researcher in 2003 and has been a Researcher since 2006. She has authored or co-authored over 30 conference and journal papers and submitted several proposals and contributed techniques to MPEG-4 and H.264. Her research interests include video/image compression, video streaming, and multimedia processing.

Feng Wu (M’99–SM’06) received the B.S. degree in electrical engineering from the University of Xi’an Electrical Science and Technology, Xi’an, China, in 1992, and the M.S. and Ph.D. degrees in computer science from Harbin Institute of Technology, Harbin, China, in 1996 and 1999, respectively.

He joined Microsoft Research Asia, Beijing, China, as an Associate Researcher in 1999 and was promoted to Lead Researcher in 2006. He has played a major role in the Internet Media Group to develop scalable video coding and streaming technologies. He has authored or co-authored over 100 papers in video compression and contributed some technologies to MPEG-4 and H.264. His research interests include video and audio compression, multimedia transmission, and video segmentation.

