IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 4, APRIL 2004

Image Quality Assessment: From Error Visibility to Structural Similarity

Zhou Wang, Member, IEEE, Alan C. Bovik, Fellow, IEEE, Hamid R. Sheikh, Student Member, IEEE, and Eero P. Simoncelli, Senior Member, IEEE

Abstract—Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.¹

Keywords—Error sensitivity, human visual system (HVS), image coding, image quality assessment, JPEG, JPEG2000, perceptual quality, structural information, structural similarity (SSIM).

I. Introduction

Digital images are subject to a wide variety of distortions during acquisition, processing, compression, storage, transmission and reproduction, any of which may result in a degradation of visual quality. For applications in which images are ultimately to be viewed by human beings, the only “correct” method of quantifying visual image quality is through subjective evaluation. In practice, however, subjective evaluation is usually too inconvenient, time-consuming and expensive. The goal of research in objective image quality assessment is to develop quantitative measures that can automatically predict perceived image quality.

An objective image quality metric can play a variety of roles in image processing applications. First, it can be used to dynamically monitor and adjust image quality. For example, a network digital video server can examine the quality of video being transmitted in order to control and allocate streaming resources. Second, it can be used to optimize algorithms and parameter settings of image processing systems. For instance, in a visual communication system, a quality metric can assist in the optimal design of prefiltering and bit assignment algorithms at the encoder and of optimal reconstruction, error concealment and postfiltering algorithms at the decoder. Third, it can be used to benchmark image processing systems and algorithms.

The work of Z. Wang and E. P. Simoncelli was supported by the Howard Hughes Medical Institute. The work of A. C. Bovik and H. R. Sheikh was supported by the National Science Foundation and the Texas Advanced Research Program. Z. Wang and E. P. Simoncelli are with the Howard Hughes Medical Institute, the Center for Neural Science and the Courant Institute for Mathematical Sciences, New York University, New York, NY 10012 USA (email: [email protected]; [email protected]). A. C. Bovik and H. R. Sheikh are with the Laboratory for Image and Video Engineering (LIVE), Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (email: [email protected]; [email protected]).

¹ A MatLab implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.

Objective image quality metrics can be classified according to the availability of an original (distortion-free) image, with which the distorted image is to be compared. Most existing approaches are known as full-reference, meaning that a complete reference image is assumed to be known. In many practical applications, however, the reference image is not available, and a no-reference or “blind” quality assessment approach is desirable. In a third type of method, the reference image is only partially available, in the form of a set of extracted features made available as side information to help evaluate the quality of the distorted image. This is referred to as reduced-reference quality assessment. This paper focuses on full-reference image quality assessment.

The simplest and most widely used full-reference quality metric is the mean squared error (MSE), computed by averaging the squared intensity differences of distorted and reference image pixels, along with the related quantity of peak signal-to-noise ratio (PSNR). These are appealing because they are simple to calculate, have clear physical meanings, and are mathematically convenient in the context of optimization. But they are not very well matched to perceived visual quality (e.g., [1]–[9]). In the last three decades, a great deal of effort has gone into the development of quality assessment methods that take advantage of known characteristics of the human visual system (HVS). The majority of the proposed perceptual quality assessment models have followed a strategy of modifying the MSE measure so that errors are penalized in accordance with their visibility. Section II summarizes this type of error-sensitivity approach and discusses its difficulties and limitations. In Section III, we describe a new paradigm for quality assessment, based on the hypothesis that the HVS is highly adapted for extracting structural information. As a specific example, we develop a measure of structural similarity that compares local patterns of pixel intensities that have been normalized for luminance and contrast. In Section IV, we compare the test results of different quality assessment models against a large set of subjective ratings gathered for a database of 344 images compressed with JPEG and JPEG2000.


[Fig. 1 block diagram: the reference and distorted signals pass through Pre-processing, CSF Filtering, Channel Decomposition, Error Normalization and Error Pooling to produce a Quality/Distortion Measure.]

Fig. 1. A prototypical quality assessment system based on error sensitivity. Note that the CSF feature can be implemented either as a separate stage (as shown) or within “Error Normalization”.

II. Image Quality Assessment Based on Error Sensitivity

An image signal whose quality is being evaluated can be thought of as a sum of an undistorted reference signal and an error signal. A widely adopted assumption is that the loss of perceptual quality is directly related to the visibility of the error signal. The simplest implementation of this concept is the MSE, which objectively quantifies the strength of the error signal. But two distorted images with the same MSE may have very different types of errors, some of which are much more visible than others. Most perceptual image quality assessment approaches proposed in the literature attempt to weight different aspects of the error signal according to their visibility, as determined by psychophysical measurements in humans or physiological measurements in animals. This approach was pioneered by Mannos and Sakrison [10], and has been extended by many other researchers over the years. Reviews on image and video quality assessment algorithms can be found in [4], [11]–[13].

A. Framework

Fig. 1 illustrates a generic image quality assessment framework based on error sensitivity. Most perceptual quality assessment models can be described with a similar diagram, although they differ in detail. The stages of the diagram are as follows:

Pre-processing. This stage typically performs a variety of basic operations to eliminate known distortions from the images being compared. First, the distorted and reference signals are properly scaled and aligned. Second, the signal might be transformed into a color space (e.g., [14]) that is more appropriate for the HVS. Third, quality assessment metrics may need to convert the digital pixel values stored in the computer memory into luminance values of pixels on the display device through pointwise nonlinear transformations. Fourth, a low-pass filter simulating the point spread function of the eye optics may be applied. Finally, the reference and the distorted images may be modified using a nonlinear point operation to simulate light adaptation.

CSF Filtering. The contrast sensitivity function (CSF) describes the sensitivity of the HVS to different spatial and temporal frequencies that are present in the visual stimulus. Some image quality metrics include a stage that weights the signal according to this function (typically implemented using a linear filter that approximates the frequency response of the CSF). However, many recent metrics choose to implement CSF as a base-sensitivity normalization factor after channel decomposition.

Channel Decomposition. The images are typically separated into subbands (commonly called “channels” in the psychophysics literature) that are selective for spatial and temporal frequency as well as orientation. While some quality assessment methods implement sophisticated channel decompositions that are believed to be closely related to the neural responses in the primary visual cortex [2], [15]–[19], many metrics use simpler transforms such as the discrete cosine transform (DCT) [20], [21] or separable wavelet transforms [22]–[24]. Channel decompositions tuned to various temporal frequencies have also been reported for video quality assessment [5], [25].

Error Normalization. The error (difference) between the decomposed reference and distorted signals in each channel is calculated and normalized according to a certain masking model, which takes into account the fact that the presence of one image component will decrease the visibility of another image component that is proximate in spatial or temporal location, spatial frequency, or orientation. The normalization mechanism weights the error signal in a channel by a space-varying visibility threshold [26]. The visibility threshold at each point is calculated based on the energy of the reference and/or distorted coefficients in a neighborhood (which may include coefficients from within a spatial neighborhood of the same channel as well as other channels) and the base-sensitivity for that channel. The normalization process is intended to convert the error into units of just noticeable difference (JND). Some methods also consider the effect of contrast response saturation (e.g., [2]).

Error Pooling. The final stage of all quality metrics must combine the normalized error signals over the spatial extent of the image, and across the different channels, into a single value. For most quality assessment methods, pooling takes the form of a Minkowski norm:

$$E(\{e_{l,k}\}) = \Big( \sum_{l} \sum_{k} |e_{l,k}|^{\beta} \Big)^{1/\beta} \qquad (1)$$

where e_{l,k} is the normalized error of the k-th coefficient in the l-th channel, and β is a constant exponent typically chosen to lie between 1 and 4. Minkowski pooling may be performed over space (index k) and then over frequency (index l), or vice-versa, with some non-linearity between them, or possibly with different exponents β. A spatial map indicating the relative importance of different regions may also be used to provide spatially variant weighting [25], [27], [28].
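As a concrete sketch of the Minkowski pooling of Eq. (1) — a minimal illustration with made-up normalized errors, not the implementation of any particular published metric:

```python
import numpy as np

def minkowski_pool(errors, beta=2.0):
    """Pool normalized errors e[l, k] over channels l and coefficients k
    into a single distortion value, as in Eq. (1)."""
    e = np.asarray(errors, dtype=float)
    return (np.abs(e) ** beta).sum() ** (1.0 / beta)

# Hypothetical normalized errors for 2 channels x 4 coefficients.
e = np.array([[0.1, 0.2, 0.0, 0.3],
              [0.2, 0.1, 0.1, 0.0]])
print(minkowski_pool(e, beta=2.0))  # beta = 2: Euclidean pooling
print(minkowski_pool(e, beta=4.0))  # larger beta emphasizes the largest errors
```

Pooling first over space and then over frequency with different exponents, as described above, amounts to applying such a norm twice with an optional non-linearity in between.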

B. Limitations

The underlying principle of the error-sensitivity approach is that perceptual quality is best estimated by quantifying the visibility of errors. This is essentially accomplished by simulating the functional properties of early stages of the HVS, as characterized by both psychophysical and physiological experiments. Although this bottom-up approach to the problem has found nearly universal acceptance, it is important to recognize its limitations. In particular, the HVS is a complex and highly non-linear system, but most models of early vision are based on linear or quasi-linear operators that have been characterized using restricted and simplistic stimuli. Thus, error-sensitivity approaches must rely on a number of strong assumptions and generalizations. These have been noted by many previous authors, and we provide only a brief summary here.

The Quality Definition Problem. The most fundamental problem with the traditional approach is the definition of image quality. In particular, it is not clear that error visibility should be equated with loss of quality, as some distortions may be clearly visible but not so objectionable. An obvious example would be multiplication of the image intensities by a global scale factor. The study in [29] also suggested that the correlation between image fidelity and image quality is only moderate.

The Suprathreshold Problem. The psychophysical experiments that underlie many error sensitivity models are specifically designed to estimate the threshold at which a stimulus is just barely visible. These measured threshold values are then used to define visual error sensitivity measures, such as the CSF and various masking effects. However, very few psychophysical studies indicate whether such near-threshold models can be generalized to characterize perceptual distortions significantly larger than threshold levels, as is the case in a majority of image processing situations. In the suprathreshold range, can the relative visual distortions between different channels be normalized using the visibility thresholds? Recent efforts have been made to incorporate suprathreshold psychophysics for analyzing image distortions (e.g., [30]–[34]).

The Natural Image Complexity Problem. Most psychophysical experiments are conducted using relatively simple patterns, such as spots, bars, or sinusoidal gratings. For example, the CSF is typically obtained from threshold experiments using global sinusoidal images. The masking phenomena are usually characterized using a superposition of two (or perhaps a few) different patterns. But all such patterns are much simpler than real world images, which can be thought of as a superposition of a much larger number of simple patterns. Can the models for the interactions between a few simple patterns generalize to evaluate interactions between tens or hundreds of patterns? Is this limited number of simple-stimulus experiments sufficient to build a model that can predict the visual quality of complex-structured natural images? Although the answers to these questions are currently not known, the recently established Modelfest dataset [35] includes both simple and complex patterns, and should facilitate future studies.

The Decorrelation Problem. When one chooses to use a Minkowski metric for spatially pooling errors, one is implicitly assuming that errors at different locations are statistically independent. This would be true if the processing prior to the pooling eliminated dependencies in the input signals. Empirically, however, this is not the case for linear channel decomposition methods such as the wavelet transform. It has been shown that a strong dependency exists between intra- and inter-channel wavelet coefficients of natural images [36], [37]. In fact, state-of-the-art wavelet image compression techniques achieve their success by exploiting this strong dependency [38]–[41]. Psychophysically, various visual masking models have been used to account for the interactions between coefficients [2], [42]. Statistically, it has been shown that a well-designed nonlinear gain control model, in which parameters are optimized to reduce dependencies rather than for fitting data from masking experiments, can greatly reduce the dependencies of the transform coefficients [43], [44]. In [45], [46], it is shown that optimal design of transformation and masking models can reduce both statistical and perceptual dependencies. It remains to be seen how much these models can improve the performance of the current quality assessment algorithms.

The Cognitive Interaction Problem. It is widely known that cognitive understanding and interactive visual processing (e.g., eye movements) influence the perceived quality of images. For example, a human observer will give different quality scores to the same image if s/he is provided with different instructions [4], [30]. Prior information regarding the image content, or attention and fixation, may also affect the evaluation of the image quality [4], [47]. But most image quality metrics do not consider these effects, as they are difficult to quantify and not well understood.

III. Structural Similarity Based Image Quality Assessment

Natural image signals are highly structured: Their pixels exhibit strong dependencies, especially when they are spatially proximate, and these dependencies carry important information about the structure of the objects in the visual scene. The Minkowski error metric is based on pointwise signal differences, which are independent of the underlying signal structure. Although most quality measures based on error sensitivity decompose image signals using linear transformations, these do not remove the strong dependencies, as discussed in the previous section. The motivation of our new approach is to find a more direct way to compare the structures of the reference and the distorted signals.

A. New Philosophy

In [6] and [9], a new framework for the design of image quality measures was proposed, based on the assumption that the human visual system is highly adapted to extract structural information from the viewing field. It follows that a measure of structural information change can provide a good approximation to perceived image distortion.

Fig. 2. Comparison of “Boat” images with different types of distortions, all with MSE = 210. (a) Original image (8 bits/pixel; cropped from 512×512 to 256×256 for visibility); (b) Contrast-stretched image, MSSIM = 0.9168; (c) Mean-shifted image, MSSIM = 0.9900; (d) JPEG compressed image, MSSIM = 0.6949; (e) Blurred image, MSSIM = 0.7052; (f) Salt-pepper impulsive noise contaminated image, MSSIM = 0.7748.

This new philosophy can be best understood through comparison with the error sensitivity philosophy. First, the error sensitivity approach estimates perceived errors to quantify image degradations, while the new philosophy considers image degradations as perceived changes in structural information. A motivating example is shown in Fig. 2, where the original “Boat” image is altered with different distortions, each adjusted to yield nearly identical MSE relative to the original image. Despite this, the images can be seen to have drastically different perceptual quality. With the error sensitivity philosophy, it is difficult to explain why the contrast-stretched image has very high quality in consideration of the fact that its visual difference from the reference image is easily discerned. But it is easily understood with the new philosophy since nearly all the structural information of the reference image is preserved, in the sense that the original information can be nearly fully recovered via a simple pointwise inverse linear luminance transform (except perhaps for the very bright and dark regions where saturation occurs). On the other hand, some structural information from the original image is permanently lost in the JPEG compressed and the blurred images, and therefore they should be given lower quality scores than the contrast-stretched and mean-shifted images.

Second, the error-sensitivity paradigm is a bottom-up approach, simulating the function of relevant early-stage components in the HVS. The new paradigm is a top-down approach, mimicking the hypothesized functionality of the overall HVS. This, on the one hand, avoids the suprathreshold problem mentioned in the previous section because it does not rely on threshold psychophysics to quantify the perceived distortions. On the other hand, the cognitive interaction problem is also reduced to a certain extent because probing the structures of the objects being observed is thought of as the purpose of the entire process of visual observation, including high level and interactive processes.

Third, the problems of natural image complexity and decorrelation are also avoided to some extent because the new philosophy does not attempt to predict image quality by accumulating the errors associated with psychophysically understood simple patterns. Instead, the new philosophy proposes to evaluate the structural changes between two complex-structured signals directly.


B. The Structural SIMilarity (SSIM) Index

We construct a specific example of a structural similarity quality measure from the perspective of image formation. A previous instantiation of this approach was made in [6]–[8] and promising results on simple tests were achieved. In this paper, we generalize this algorithm, and provide a more extensive set of validation results.

The luminance of the surface of an object being observed is the product of the illumination and the reflectance, but the structures of the objects in the scene are independent of the illumination. Consequently, to explore the structural information in an image, we wish to separate the influence of the illumination. We define the structural information in an image as those attributes that represent the structure of objects in the scene, independent of the average luminance and contrast. Since luminance and contrast can vary across a scene, we use the local luminance and contrast for our definition.

The system diagram of the proposed quality assessment system is shown in Fig. 3. Suppose x and y are two non-negative image signals, which have been aligned with each other (e.g., spatial patches extracted from each image). If we consider one of the signals to have perfect quality, then the similarity measure can serve as a quantitative measurement of the quality of the second signal. The system separates the task of similarity measurement into three comparisons: luminance, contrast and structure. First, the luminance of each signal is compared. Assuming discrete signals, this is estimated as the mean intensity:

$$\mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i \qquad (2)$$

The luminance comparison function l(x, y) is then a function of μ_x and μ_y.

Second, we remove the mean intensity from the signal. In discrete form, the resulting signal x − μ_x corresponds to the projection of vector x onto the hyperplane defined by

$$\sum_{i=1}^{N} x_i = 0 \qquad (3)$$

We use the standard deviation (the square root of variance) as an estimate of the signal contrast. An unbiased estimate in discrete form is given by

$$\sigma_x = \left( \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)^2 \right)^{1/2} \qquad (4)$$

The contrast comparison c(x, y) is then the comparison of σ_x and σ_y.

Third, the signal is normalized (divided) by its own standard deviation, so that the two signals being compared have unit standard deviation. The structure comparison s(x, y) is conducted on these normalized signals (x − μ_x)/σ_x and (y − μ_y)/σ_y.
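The three quantities defined so far — the mean of Eq. (2), the unbiased standard deviation of Eq. (4), and the normalized signal used for structure comparison — can be sketched as follows (the patch values and the name `local_stats` are ours, for illustration only):

```python
import numpy as np

def local_stats(x):
    """Mean intensity (Eq. 2), unbiased standard deviation (Eq. 4), and the
    mean-removed, variance-normalized signal used for structure comparison."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = x.sum() / n                                   # Eq. (2)
    sigma = (((x - mu) ** 2).sum() / (n - 1)) ** 0.5   # Eq. (4)
    return mu, sigma, (x - mu) / sigma

# A hypothetical 5-pixel patch.
x = np.array([52.0, 55.0, 61.0, 59.0, 70.0])
mu, sigma, x_norm = local_stats(x)
# x_norm sums to zero (it lies in the hyperplane of Eq. (3)) and has
# unit unbiased standard deviation.
```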

Finally, the three components are combined to yield an overall similarity measure:

$$S(x, y) = f\big(l(x, y),\, c(x, y),\, s(x, y)\big) \qquad (5)$$

An important point is that the three components are relatively independent. For example, the change of luminance and/or contrast will not affect the structures of images.

In order to complete the definition of the similarity measure in Eq. (5), we need to define the three functions l(x, y), c(x, y), s(x, y), as well as the combination function f(·). We also would like the similarity measure to satisfy the following conditions:

1. Symmetry: S(x, y) = S(y, x);
2. Boundedness: S(x, y) ≤ 1;
3. Unique maximum: S(x, y) = 1 if and only if x = y (in discrete representations, x_i = y_i for all i = 1, 2, …, N).

For luminance comparison, we define

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \qquad (6)$$

where the constant C_1 is included to avoid instability when μ_x² + μ_y² is very close to zero. Specifically, we choose

$$C_1 = (K_1 L)^2 \qquad (7)$$

where L is the dynamic range of the pixel values (255 for 8-bit grayscale images), and K_1 ≪ 1 is a small constant. Similar considerations also apply to contrast comparison and structure comparison described later. Eq. (6) is easily seen to obey the three properties listed above.

Equation (6) is also qualitatively consistent with Weber’s law, which has been widely used to model light adaptation (also called luminance masking) in the HVS. According to Weber’s law, the magnitude of a just-noticeable luminance change ΔI is approximately proportional to the background luminance I for a wide range of luminance values. In other words, the HVS is sensitive to the relative luminance change, and not the absolute luminance change. Letting R represent the size of luminance change relative to background luminance, we rewrite the luminance of the distorted signal as μ_y = (1 + R)μ_x. Substituting this into Eq. (6) gives

$$l(x, y) = \frac{2(1 + R)}{1 + (1 + R)^2 + C_1/\mu_x^2} \qquad (8)$$

If we assume C_1 is small enough (relative to μ_x²) to be ignored, then l(x, y) is a function only of R, qualitatively consistent with Weber’s law.
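A quick numerical check of this claim (the background luminances and the deliberately tiny C_1 below are illustrative choices of ours):

```python
def luminance_comparison(mu_x, mu_y, c1):
    """Luminance comparison of Eq. (6)."""
    return (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)

R = 0.2  # 20% relative luminance change
for mu_x in (50.0, 100.0, 200.0):    # very different background luminances
    mu_y = (1 + R) * mu_x
    print(luminance_comparison(mu_x, mu_y, c1=1e-6))
# With C1 negligible, all three printed values coincide with
# 2 * (1 + R) / (1 + (1 + R)**2): l depends only on the relative
# change R, as Eq. (8) states.
```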

The contrast comparison function takes a similar form:

$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \qquad (9)$$

where C_2 = (K_2 L)², and K_2 ≪ 1. This definition again satisfies the three properties listed above. An important feature of this function is that with the same amount of


[Fig. 3 diagram: signals x and y each undergo luminance measurement (with mean subtraction) and contrast measurement (with normalization); the results feed the luminance, contrast and structure comparisons, which are combined into the similarity measure.]

Fig. 3. Diagram of the structural similarity (SSIM) measurement system.

contrast change Δσ = σ_y − σ_x, this measure is less sensitive to the case of high base contrast σ_x than low base contrast. This is consistent with the contrast masking feature of the HVS.

Structure comparison is conducted after luminance subtraction and variance normalization. Specifically, we associate the two unit vectors (x − μ_x)/σ_x and (y − μ_y)/σ_y, each lying in the hyperplane defined by Eq. (3), with the structure of the two images. The correlation (inner product) between these is a simple and effective measure to quantify the structural similarity. Notice that the correlation between (x − μ_x)/σ_x and (y − μ_y)/σ_y is equivalent to the correlation coefficient between x and y. Thus, we define the structure comparison function as follows:

s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} .  (10)

As in the luminance and contrast measures, we have introduced a small constant in both the denominator and the numerator. In discrete form, σxy can be estimated as:

\sigma_{xy} = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y) .  (11)

Geometrically, the correlation coefficient corresponds to the cosine of the angle between the vectors x − µx and y − µy. Note also that s(x, y) can take on negative values.
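This geometric identity is easy to verify numerically; a small sketch (not from the paper) compares the cosine of the angle between the mean-removed vectors against NumPy's correlation coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = 0.5 * x + 0.1 * rng.standard_normal(64)

# Cosine of the angle between the mean-removed vectors x - mu_x and y - mu_y ...
xc, yc = x - x.mean(), y - y.mean()
cos_angle = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

# ... equals the correlation coefficient between x and y:
assert abs(cos_angle - np.corrcoef(x, y)[0, 1]) < 1e-10
```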

Finally, we combine the three comparisons of Eqs. (6), (9) and (10) and name the resulting similarity measure the Structural SIMilarity (SSIM) index between signals x and y:

SSIM(x, y) = [l(x, y)]^\alpha \cdot [c(x, y)]^\beta \cdot [s(x, y)]^\gamma ,  (12)

where α > 0, β > 0 and γ > 0 are parameters used to adjust the relative importance of the three components. It is easy to verify that this definition satisfies the three conditions given above. In order to simplify the expression, we set α = β = γ = 1 and C3 = C2/2 in this paper. This

results in a specific form of the SSIM index:

SSIM(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} .  (13)
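Eq. (13) can be computed directly for a pair of patches; a minimal sketch (uniform window, not the Gaussian weighting introduced later in this section):

```python
import numpy as np

def ssim_patch(x, y, L=255, K1=0.01, K2=0.03):
    """SSIM index of Eq. (13) for two equally-sized image patches."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    # Unbiased estimates, matching the 1/(N-1) normalization of Eq. (11):
    cov = np.cov(x.ravel(), y.ravel())           # 2x2 covariance matrix
    var_x, var_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))

rng = np.random.default_rng(0)
x = rng.uniform(0, 255, (11, 11))
print(ssim_patch(x, x))          # identical patches score 1
```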

The “universal quality index” (UQI) defined in [6], [7] corresponds to the special case C1 = C2 = 0, which produces unstable results when either (µx^2 + µy^2) or (σx^2 + σy^2) is very close to zero.

The relationship between the SSIM index and more traditional quality metrics may be illustrated geometrically in a vector space of image components. These image components can be either pixel intensities or other extracted features, such as transformed linear coefficients. Fig. 4 shows equal-distortion contours drawn around three different example reference vectors, each of which represents the local content of one reference image. For the purpose of illustration, we show only a two-dimensional space, but in general the dimensionality should match the number of image components being compared. Each contour represents a set of images with equal distortion relative to the enclosed reference image. Fig. 4(a) shows the result for a simple Minkowski metric. Each contour has the same size and shape (a circle here, since we assume an exponent of 2); that is, perceptual distance corresponds to Euclidean distance. Fig. 4(b) shows a Minkowski metric in which different image components are weighted differently. This could be, for example, weighting according to the CSF, as is common in many models. Here the contours are ellipses, but still all of the same size. They are shown aligned with the axes, but in general could be tilted to any fixed orientation.

Many recent models incorporate contrast-masking behaviors, which have the effect of rescaling the equal-distortion contours according to the signal magnitude, as shown in Fig. 4(c). This may be viewed as a type of adaptive distortion metric: it depends not just on the difference between the signals, but also on the signals themselves. Fig. 4(d) shows a combination of contrast masking (magnitude weighting) followed by component weighting.


[Figure: six panels (a)–(f), each showing equal-distortion contours in the (i, j) plane about the origin O.]
Fig. 4. Three example equal-distortion contours for different quality metrics. (a) Minkowski error measurement systems; (b) component-weighted Minkowski error measurement systems; (c) magnitude-weighted Minkowski error measurement systems; (d) magnitude- and component-weighted Minkowski error measurement systems; (e) the proposed system (a combination of Eqs. (9) and (10)) with more emphasis on s(x,y); (f) the proposed system (a combination of Eqs. (9) and (10)) with more emphasis on c(x,y). Each image is represented as a vector whose entries are image components. Note: this is an illustration in 2-D space. In practice, the number of dimensions should equal the number of image components used for comparison (e.g., the number of pixels or transform coefficients).

Our proposed method, on the other hand, separately computes a comparison of two independent quantities: the vector lengths and their angles. Thus, the contours will be aligned with the axes of a polar coordinate system. Figs. 4(e) and 4(f) show two examples of this, computed with different exponents. Again, this may be viewed as an adaptive distortion metric, but unlike previous models, both the size and the shape of the contours are adapted to the underlying signal. Some recent models that use divisive normalization to describe masking effects also exhibit signal-dependent contour orientations (e.g., [45], [46], [48]), although the precise alignment with the axes of a polar coordinate system seen in Figs. 4(e) and 4(f) is not observed in these methods.

C. Image Quality Assessment using SSIM index

For image quality assessment, it is useful to apply the SSIM index locally rather than globally. First, image statistical features are usually highly spatially non-stationary. Second, image distortions, which may or may not depend on the local image statistics, may also be space-variant. Third, at typical viewing distances, only a local area of the image can be perceived with high resolution by the human observer at one time instant (because of the foveation feature of the HVS [49], [50]). Finally, localized quality measurement can provide a spatially varying quality map of the image, which delivers more information about the quality degradation of the image and may be useful in some applications.

In [6], [7], the local statistics µx, σx and σxy are computed within a local 8 × 8 square window, which moves pixel-by-pixel over the entire image. At each step, the local statistics and SSIM index are calculated within the local window. One problem with this method is that the resulting SSIM index map often exhibits undesirable “blocking” artifacts. In this paper, we instead use an 11 × 11 circular-symmetric Gaussian weighting function w = {w_i | i = 1, 2, · · · , N}, with a standard deviation of 1.5 samples, normalized to unit sum (\sum_{i=1}^{N} w_i = 1). The estimates of the local statistics µx, σx and σxy are then modified accordingly as

\mu_x = \sum_{i=1}^{N} w_i x_i .  (14)

\sigma_x = \left( \sum_{i=1}^{N} w_i (x_i - \mu_x)^2 \right)^{1/2} .  (15)

\sigma_{xy} = \sum_{i=1}^{N} w_i (x_i - \mu_x)(y_i - \mu_y) .  (16)
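The paper does not prescribe an implementation, but these weighted statistics can be computed for every window position at once by convolving with the normalized Gaussian kernel. A sketch assuming SciPy (the reflective boundary handling is our choice, not specified above):

```python
import numpy as np
from scipy.ndimage import convolve

def gaussian_window(size=11, sigma=1.5):
    """11x11 circular-symmetric Gaussian weights, normalized to unit sum."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def ssim_map(X, Y, L=255, K1=0.01, K2=0.03):
    """SSIM index map using the weighted statistics of Eqs. (14)-(16)."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    w = gaussian_window()
    mu_x, mu_y = convolve(X, w), convolve(Y, w)      # Eq. (14)
    var_x = convolve(X * X, w) - mu_x ** 2           # square of Eq. (15)
    var_y = convolve(Y * Y, w) - mu_y ** 2
    cov_xy = convolve(X * Y, w) - mu_x * mu_y        # Eq. (16)
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```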


With such a windowing approach, the quality maps exhibit a locally isotropic property. Throughout this paper, the SSIM measure uses the following parameter settings: K1 = 0.01 and K2 = 0.03. These values are somewhat arbitrary, but we find that in our current experiments, the performance of the SSIM index algorithm is fairly insensitive to variations in these values.

In practice, one usually requires a single overall quality measure for the entire image. We use a mean SSIM (MSSIM) index to evaluate the overall image quality:

MSSIM(X, Y) = \frac{1}{M} \sum_{j=1}^{M} SSIM(x_j, y_j) ,  (17)

where X and Y are the reference and the distorted images, respectively; xj and yj are the image contents at the j-th local window; and M is the number of local windows in the image. Depending on the application, it is also possible to compute a weighted average of the samples in the SSIM index map. For example, region-of-interest image processing systems may give different weights to different segmented regions of the image. As another example, it has been observed that different image textures attract human fixations to varying degrees (e.g., [51], [52]). A smoothly varying foveated weighting model (e.g., [50]) could be employed to define the weights. In this paper, however, we use uniform weighting. A MatLab implementation of the SSIM index algorithm is available online at [53].
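Both pooling variants amount to a one-line average over the SSIM map; a sketch (the function name and weight-map interface are illustrative, not from [53]):

```python
import numpy as np

def mssim(ssim_map, weights=None):
    """Eq. (17) with uniform weighting, or a weighted average of the SSIM map."""
    ssim_map = np.asarray(ssim_map, dtype=float)
    if weights is None:                    # uniform weighting, as used in this paper
        return float(ssim_map.mean())
    w = np.asarray(weights, dtype=float)   # e.g., region-of-interest or foveated weights
    return float((w * ssim_map).sum() / w.sum())
```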

IV. Experimental Results

Many image quality assessment algorithms have been shown to behave consistently when applied to distorted images created from the same original image using the same type of distortion (e.g., JPEG compression). However, the effectiveness of these models degrades significantly when applied to a set of images originating from different reference images and/or including a variety of different distortion types. Thus, cross-image and cross-distortion tests are critical in evaluating the effectiveness of an image quality metric. It is impossible to show a thorough set of such examples, but the images in Fig. 2 provide an encouraging starting point for testing the cross-distortion capability of the quality assessment algorithms. The MSE and MSSIM measurement results are given in the figure caption. Obviously, MSE performs very poorly in this case. The MSSIM values exhibit much better consistency with the qualitative visual appearance.

A. Best-case/worst-case Validation

We have also developed a more efficient methodology for examining the relationship between our objective measure and perceived quality. Starting from a distorted image, we ascend/descend the gradient of MSSIM while constraining the MSE to remain equal to that of the initial distorted image. Specifically, we iterate the following two linear-algebraic steps:

(1) Y → Y ± λ P(X, Y) ∇_Y MSSIM(X, Y)

[Figure: noise is added to the original image (a) to form the initial image (b); gradient ascent and gradient descent from (b) produce (c) and (d).]
Fig. 5. Best- and worst-case MSSIM images with identical MSE, computed by gradient ascent/descent iterative search on the MSSIM measure under the constraint MSE = 2500. (a) Original image (100×100, 8 bits/pixel, cropped from the “Boat” image); (b) initial image, contaminated with Gaussian white noise (MSSIM = 0.3021); (c) maximum-MSSIM image (MSSIM = 0.9337); (d) minimum-MSSIM image (MSSIM = −0.5411).

(2) Y → X + σ E(X,Y)

where σ is the square root of the constrained MSE, λ controls the step size, E(X, Y) is a unit vector defined by

E(X, Y) = \frac{Y - X}{\|Y - X\|} ,

and P (X,Y) is a projection operator:

P(X, Y) = I - E(X, Y) E^T(X, Y) ,

with I the identity operator. MSSIM is differentiable, so this procedure converges to a local maximum/minimum of the objective measure. Visual inspection of these best- and worst-case images, along with the initial distorted image, provides a visual indication of the types of distortion deemed least/most important by the objective measure.
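The mechanics of the two steps can be sketched as follows. Computing the actual MSSIM gradient is beyond this illustration, so a random vector stands in for ∇Y MSSIM(X, Y); the point is the projection P(X, Y) and the renormalization back onto the constant-MSE sphere:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(256)        # reference signal
Y = X + rng.standard_normal(256)    # initial distorted signal
err_norm = np.linalg.norm(Y - X)    # holding ||Y - X|| fixed holds the MSE fixed

def iterate(Y, grad, lam=0.1):
    E = (Y - X) / np.linalg.norm(Y - X)   # unit error vector E(X, Y)
    g = grad - E * (E @ grad)             # P(X, Y) grad = (I - E E^T) grad
    Y = Y + lam * g                       # step (1): move along the projected gradient
    E = (Y - X) / np.linalg.norm(Y - X)
    return X + err_norm * E               # step (2): project back onto the MSE sphere

grad = rng.standard_normal(256)           # stand-in for the MSSIM gradient
Y_next = iterate(Y, grad)
assert abs(np.linalg.norm(Y_next - X) - err_norm) < 1e-9  # error energy preserved
```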


Therefore, it is an expedient and direct method for revealing the perceptual implications of the quality measure. An example is shown in Fig. 5, where the initial image is contaminated with Gaussian white noise. It can be seen that the local structures of the original image are very well preserved in the maximal-MSSIM image. On the other hand, the image structures are changed dramatically in the worst-case MSSIM image, in some cases reversing contrast.

B. Test on JPEG and JPEG2000 Image Database

We compare the cross-distortion and cross-image performance of different quality assessment models on an image database composed of JPEG and JPEG2000 compressed images. Twenty-nine high-resolution 24 bits/pixel RGB color images (typically 768×512 or similar size) were compressed at a range of quality levels using either JPEG or JPEG2000, producing a total of 175 JPEG images and 169 JPEG2000 images. The bit rates were in the ranges of 0.150 to 3.336 and 0.028 to 3.150 bits/pixel, respectively, and were chosen non-uniformly such that the resulting distribution of subjective quality scores was approximately uniform over the entire range. Subjects viewed the images from comfortable seating distances (this distance was only moderately controlled, to allow the data to reflect natural viewing conditions) and were asked to provide their perception of quality on a continuous linear scale divided into five equal regions marked with the adjectives “Bad”, “Poor”, “Fair”, “Good” and “Excellent”. Each JPEG and JPEG2000 compressed image was viewed by 13 to 20 subjects and 25 subjects, respectively. The subjects were mostly male college students.

Raw scores for each subject were normalized by the mean and variance of the scores for that subject (i.e., raw values were converted to Z-scores [54]), and then the entire data set was rescaled to fill the range from 1 to 100. Mean opinion scores (MOSs) were then computed for each image after removing outliers (most subjects had no outliers). The average standard deviations (per image) of the subjective scores for JPEG, JPEG2000 and all images were 6.00, 7.33 and 6.65, respectively. The image database, together with the subjective score and standard deviation for each image, has been made available on the Internet at [55].
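The normalization described above amounts to a per-subject Z-score followed by a global rescale; a sketch (the function names and example ratings are ours, not from the study):

```python
import numpy as np

def zscore(raw):
    """Normalize one subject's raw scores by that subject's mean and standard deviation."""
    raw = np.asarray(raw, dtype=float)
    return (raw - raw.mean()) / raw.std(ddof=1)

def rescale(z, lo=1.0, hi=100.0):
    """Linearly rescale the pooled Z-scores to fill the range [lo, hi]."""
    z = np.asarray(z, dtype=float)
    return lo + (hi - lo) * (z - z.min()) / (z.max() - z.min())

# Example: pool two subjects with different rating habits, then rescale:
subj_a = [20, 40, 60, 80]    # conservative rater
subj_b = [50, 70, 85, 95]    # generous rater
pooled = np.concatenate([zscore(subj_a), zscore(subj_b)])
scores = rescale(pooled)
print(scores.min(), scores.max())
```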

The luminance component of each JPEG and JPEG2000 compressed image is averaged over a local 2 × 2 window and downsampled by a factor of 2 before the MSSIM value is calculated. Our experiments with the current dataset show that using the other color components does not significantly change the performance of the model, though this should not be considered generally true for color image quality assessment. Unlike many other perceptual image quality assessment approaches, no specific training procedure is employed before applying the proposed algorithm to the database, because the proposed method is intended for general-purpose image quality assessment (as opposed to image compression alone).

Figs. 6 and 7 show some example images from the database at different quality levels, together with their SSIM index maps and absolute error maps. Note that at low bit rates, the coarse quantization in the JPEG and JPEG2000 algorithms often results in smooth representations of fine-detail regions in the image (e.g., the tiles in Fig. 6(d) and the trees in Fig. 7(d)). Compared with other types of regions, these regions may not be worse in terms of pointwise difference measures such as the absolute error. However, since the structural information of the image details is nearly completely lost, they exhibit poorer visual quality. Comparing Fig. 6(g) with Fig. 6(j), and Fig. 7(g) with Fig. 7(j), we observe that the SSIM index is better at capturing such poor-quality regions. Also notice that for images with many strong edge structures, such as Fig. 7(c), it is difficult to reduce the pointwise errors in the compressed image even at relatively high bit rates, as exemplified by Fig. 7(l). However, the compressed image supplies acceptable perceived quality, as shown in Fig. 7(f). In fact, although the visual quality of Fig. 7(f) is better than that of Fig. 7(e), its absolute error map, Fig. 7(l), appears worse than Fig. 7(k), as confirmed by their PSNR values. The SSIM index maps, Figs. 7(h) and 7(i), deliver better consistency with perceived quality.

The quality assessment models used for comparison include PSNR, the well-known Sarnoff model [56]2, UQI [7] and MSSIM. The scatter plot of MOS versus model prediction for each model is shown in Fig. 8. If PSNR is considered a benchmark method for evaluating the effectiveness of the other image quality metrics, the Sarnoff model performs quite well in this test. This is in contrast with previously published test results (e.g., [57], [58]), where the performance of most models (including the Sarnoff model) was reported to be statistically equivalent to root mean squared error [57] and PSNR [58]. The UQI method performs much better than MSE in the simple cross-distortion test of [7], [8], but does not deliver satisfactory results in Fig. 8. We think the major reason is that in nearly flat regions, the denominator of the contrast comparison formula is close to zero, which makes the algorithm unstable. By inserting the small constants C1 and C2, MSSIM completely avoids this problem, and the scatter plot demonstrates that it supplies remarkably good prediction of the subjective scores.

In order to provide quantitative measures of the performance of the objective quality assessment models, we follow the performance evaluation procedures employed in the video quality experts group (VQEG) Phase I FR-TV test [58], where four evaluation metrics were used. First, logistic functions are used in a fitting procedure to provide a non-linear mapping between the objective and subjective scores. The fitted curves are shown in Fig. 8. In [58], Metric 1 is the correlation coefficient between objective/subjective scores after variance-weighted regression analysis. Metric 2 is the correlation coefficient between objective/subjective scores after non-linear regression analysis. These two metrics, combined, provide an evaluation of prediction accuracy. The third metric is the Spearman rank-order correlation coefficient between the objective/subjective scores; it is considered a measure of prediction monotonicity. Finally, Metric 4 is the outlier ratio (the percentage of predictions outside ±2 standard deviations) after the non-linear mapping, which is a measure of prediction consistency. More details on these metrics can be found in [58]. In addition to these, we also calculated the mean absolute prediction error (MAE) and root mean square prediction error (RMS) after non-linear regression, and the weighted mean absolute prediction error (WMAE) and weighted root mean square prediction error (WRMS) after variance-weighted regression. The evaluation results for all the models being compared are given in Table I. For every one of these criteria, MSSIM performs better than all of the other models being compared.

2 The JNDmetrix software is available online from the Sarnoff Corporation at http://www.sarnoff.com/products_services/video_vision/jndmetrix/.

[Figure: twelve image panels (a)–(l).]
Fig. 6. Sample JPEG images compressed to different quality levels (original size: 768×512; cropped to 256×192 for visibility). (a), (b) and (c) are the original “Buildings”, “Ocean” and “Monarch” images, respectively. (d) Compressed to 0.2673 bits/pixel, PSNR = 21.98 dB, MSSIM = 0.7118; (e) compressed to 0.2980 bits/pixel, PSNR = 30.87 dB, MSSIM = 0.8886; (f) compressed to 0.7755 bits/pixel, PSNR = 36.78 dB, MSSIM = 0.9898. (g), (h) and (i) show SSIM maps of the compressed images, where brightness indicates the magnitude of the local SSIM index (squared for visibility). (j), (k) and (l) show absolute error maps of the compressed images (contrast-inverted for easier comparison to the SSIM maps).
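As a self-contained illustration of this evaluation pipeline on synthetic data (the exact logistic form used by VQEG differs; a generic 4-parameter logistic is assumed here), assuming SciPy:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic(x, a, b, c, d):
    """A generic 4-parameter logistic for mapping objective scores to MOS."""
    return a / (1.0 + np.exp(-b * (x - c))) + d

# Synthetic stand-ins for objective scores and subjective MOS:
rng = np.random.default_rng(2)
obj = np.sort(rng.uniform(0.4, 1.0, 80))
mos = 100.0 / (1.0 + np.exp(-12.0 * (obj - 0.7))) + rng.normal(0.0, 3.0, 80)

params, _ = curve_fit(logistic, obj, mos, p0=[100.0, 10.0, 0.7, 0.0], maxfev=10000)
pred = logistic(obj, *params)

cc = pearsonr(pred, mos)[0]         # CC after non-linear regression (Metric 2)
srocc = spearmanr(obj, mos)[0]      # Spearman rank-order correlation (Metric 3)
mae = np.mean(np.abs(pred - mos))   # mean absolute prediction error (MAE)
```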

V. Discussion

In this paper, we have summarized the traditional approach to image quality assessment based on error sensitivity and have enumerated its limitations. We have proposed the use of structural similarity as an alternative


[Figure: twelve image panels (a)–(l).]
Fig. 7. Sample JPEG2000 images compressed to different quality levels (original size: 768×512; cropped to 256×192 for visibility). (a), (b) and (c) are the original “Stream”, “Caps” and “Bikes” images, respectively. (d) Compressed to 0.1896 bits/pixel, PSNR = 23.46 dB, MSSIM = 0.7339; (e) compressed to 0.1982 bits/pixel, PSNR = 34.56 dB, MSSIM = 0.9409; (f) compressed to 1.1454 bits/pixel, PSNR = 33.47 dB, MSSIM = 0.9747. (g), (h) and (i) show SSIM maps of the compressed images, where brightness indicates the magnitude of the local SSIM index (squared for visibility). (j), (k) and (l) show absolute error maps of the compressed images (contrast-inverted for easier comparison to the SSIM maps).

motivating principle for the design of image quality measures. To demonstrate our structural similarity concept, we developed an SSIM index and showed that it compares favorably with other methods in accounting for our experimental measurements of the subjective quality of 344 JPEG and JPEG2000 compressed images.

Although the proposed SSIM index method is motivated by substantially different design principles, we see it as complementary to the traditional approach. Careful analysis shows that both the SSIM index and several recently developed divisive-normalization-based masking models exhibit input-dependent behavior in measuring signal distortions [45], [46], [48]. It seems possible that the two approaches may eventually converge to similar solutions.

There are a number of issues worth investigating with regard to the specific SSIM index of Eq. (12). First, the optimization of the SSIM index for various image processing algorithms needs to be studied. For example, it may be employed for rate-distortion optimization in the design of image compression algorithms. This is not an easy task, since Eq. (12) is mathematically more cumbersome than MSE. Second, the application scope of the SSIM


[Figure: four scatter plots of MOS (0–100) versus PSNR, Sarnoff, UQI and MSSIM scores, each showing JPEG samples, JPEG2000 samples, and a logistic fit.]
Fig. 8. Scatter plots of subjective mean opinion score (MOS) versus model prediction. Each sample point represents one test image. (a) PSNR; (b) Sarnoff model [56]; (c) UQI [7] (equivalent to MSSIM with square window and K1 = K2 = 0); (d) MSSIM (Gaussian window, K1 = 0.01, K2 = 0.03).

TABLE I
Performance comparison of image quality assessment models. CC: correlation coefficient; MAE: mean absolute error; RMS: root mean squared error; OR: outlier ratio; WMAE: weighted mean absolute error; WRMS: weighted root mean squared error; SROCC: Spearman rank-order correlation coefficient.

          Non-linear Regression       Variance-weighted Regression    Rank-order
Model     CC     MAE    RMS    OR     CC     WMAE   WRMS   OR         SROCC
PSNR      0.905  6.53   8.45   0.157  0.903  6.18   8.26   0.140      0.901
Sarnoff   0.956  4.66   5.81   0.064  0.956  4.42   5.62   0.061      0.947
UQI       0.866  7.76   9.90   0.189  0.861  7.64   9.79   0.195      0.863
MSSIM     0.967  3.95   5.06   0.041  0.967  3.79   4.87   0.041      0.963

index may not be restricted to image processing. In fact, because it is a symmetric measure, it can be thought of as a similarity measure for comparing any two signals. The signals can be either discrete or continuous, and can live in a space of arbitrary dimensionality.

We consider the proposed SSIM indexing approach a particular implementation of the philosophy of structural similarity, from an image formation point of view. Under the same philosophy, other approaches may emerge that could be significantly different from the proposed SSIM indexing algorithm. Creative investigation of the concepts of structural information and structural distortion is likely to drive the success of these innovations.

VI. Acknowledgement

The authors would like to thank Dr. Jesus Malo and Dr. L. Lu for insightful comments, Dr. Jeffrey Lubin and Dr. Douglas Dixon for providing the Sarnoff JNDmetrix software, Dr. Philip Corriveau and Dr. John Libert for supplying the MatLab routines used in the VQEG Phase I FR-TV test for the regression analysis of subjective/objective data comparison, and Visual Delights, Inc. for allowing the authors to use their images for subjective experiments.


References

[1] B. Girod, “What’s wrong with mean-squared error,” in DigitalImages and Human Vision (A. B. Watson, ed.), pp. 207–220,the MIT press, 1993.

[2] P. C. Teo and D. J. Heeger, “Perceptual image distortion,” inProc. SPIE, vol. 2179, pp. 127–141, 1994.

[3] A. M. Eskicioglu and P. S. Fisher, “Image quality measuresand their performance,” IEEE Trans. Communications, vol. 43,pp. 2959–2965, Dec. 1995.

[4] M. P. Eckert and A. P. Bradley, “Perceptual quality metricsapplied to still image compression,” Signal Processing, vol. 70,pp. 177–200, Nov. 1998.

[5] S. Winkler, “A perceptual distortion metric for digital colorvideo,” Proc. SPIE, vol. 3644, pp. 175–184, 1999.

[6] Z. Wang, Rate scalable foveated image and video communica-tions. PhD thesis, Dept. of ECE, The University of Texas atAustin, Dec. 2001.

[7] Z. Wang and A. C. Bovik, “A universal image quality index,”IEEE Signal Processing Letters, vol. 9, pp. 81–84, Mar. 2002.

[8] Z. Wang, “Demo images and free software for ‘a universal im-age quality index’,” http://anchovy.ece.utexas.edu/~zwang/research/quality_index/demo.html.

[9] Z. Wang, A. C. Bovik, and L. Lu, “Why is image quality as-sessment so difficult,” in Proc. IEEE Int. Conf. Acoust., Speech,and Signal Processing, vol. 4, (Orlando), pp. 3313–3316, May2002.

[10] J. L. Mannos and D. J. Sakrison, “The effects of a visual fidelitycriterion on the encoding of images,” IEEE Trans. InformationTheory, vol. 4, pp. 525–536, 1974.

[11] T. N. Pappas and R. J. Safranek, “Perceptual criteria for im-age quality evaluation,” in Handbook of Image and Video Proc.(A. Bovik, ed.), Academic Press, 2000.

[12] Z. Wang, H. R. Sheikh, and A. C. Bovik, “Objective video qual-ity assessment,” in The Handbook of Video Databases: Designand Applications (B. Furht and O. Marques, eds.), CRC Press,2003.

[13] S. Winkler, “Issues in vision modeling for perceptual video qual-ity assessment,” Signal Processing, vol. 78, pp. 231–252, 1999.

[14] A. B. Poirson and B. A. Wandell, “Appearance of colored pat-terns: pattern-color separability,” Journal of Optical Society ofAmerica A: Optics and Image Science, vol. 10, no. 12, pp. 2458–2470, 1993.

[15] A. B. Watson, “The cortex transform: rapid computation of sim-ulated neural images,” Computer Vision, Graphics, and ImageProcessing, vol. 39, pp. 311–327, 1987.

[16] S. Daly, “The visible difference predictor: An algorithm forthe assessment of image fidelity,” in Digital images and hu-man vision (A. B. Watson, ed.), pp. 179–206, Cambridge, Mas-sachusetts: The MIT Press, 1993.

[17] J. Lubin, “The use of psychophysical data and models in theanalysis of display system performance,” in Digital images andhuman vision (A. B. Watson, ed.), pp. 163–178, Cambridge,Massachusetts: The MIT Press, 1993.

[18] D. J. Heeger and P. C. Teo, “A model of perceptual image fi-delity,” in Proc. IEEE Int. Conf. Image Proc., pp. 343–345,1995.

[19] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J.Heeger, “Shiftable multi-scale transforms,” IEEE Trans. Infor-mation Theory, vol. 38, pp. 587–607, 1992.

[20] A. B. Watson, “DCT quantization matrices visually optimizedfor individual images,” in Proc. SPIE, vol. 1913, pp. 202–216,1993.

[21] A. B. Watson, J. Hu, and J. F. III. McGowan, “DVQ: A dig-ital video quality metric based on human vision,” Journal ofElectronic Imaging, vol. 10, no. 1, pp. 20–29, 2001.

[22] A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor,“Visibility of wavelet quantization noise,” IEEE Trans. ImageProcessing, vol. 6, pp. 1164–1175, Aug. 1997.

[23] A. P. Bradley, “A wavelet visible difference predictor,” IEEETrans. Image Processing, vol. 5, pp. 717–730, May 1999.

[24] Y. K. Lai and C.-C. J. Kuo, “A Haar wavelet approach to com-pressed image quality measurement,” Journal of Visual Com-munication and Image Representation, vol. 11, pp. 17–40, Mar.2000.

[25] C. J. van den Branden Lambrecht and O. Verscheure, “Percep-tual quality measure using a spatio-temporal model of the humanvisual system,” in Proc. SPIE, vol. 2668, pp. 450–461, 1996.

[26] A. B. Watson and J. A. Solomon, “Model of visual contrastgain control and pattern masking,” Journal of Optical Societyof America, vol. 14, no. 9, pp. 2379–2391, 1997.

[27] W. Xu and G. Hauske, “Picture quality evaluation based on errorsegmentation,” Proc. SPIE, vol. 2308, pp. 1454–1465, 1994.

[28] W. Osberger, N. Bergmann, and A. Maeder, “An automatic im-age quality assessment technique incorporating high level per-ceptual factors,” in Proc. IEEE Int. Conf. Image Proc., pp. 414–418, 1998.

[29] D. A. Silverstein and J. E. Farrell, “The relationship between im-age fidelity and image quality,” in Proc. IEEE Int. Conf. ImageProc., pp. 881–884, 1996.

[30] D. R. Fuhrmann, J. A. Baro, and J. R. Cox Jr., “Experimen-tal evaluation of psychophysical distortion metrics for JPEG-encoded images,” Journal of Electronic Imaging, vol. 4, pp. 397–406, Oct. 1995.

[31] A. B. Watson and L. Kreslake, “Measurement of visual impair-ment scales for digital video,” in Human Vision, Visual Process-ing, and Digital Display, Proc. SPIE, vol. 4299, 2001.

[32] J. G. Ramos and S. S. Hemami, “Suprathreshold wavelet coef-ficient quantization in complex stimuli: psychophysical evalua-tion and analysis,” Journal of the Optical Society of America A,vol. 18, pp. 2385–2397, 2001.

[33] D. M. Chandler and S. S. Hemami, “Additivity models forsuprathreshold distortion in quantized wavelet-coded images,”in Human Vision and Electronic Imaging VII, Proc. SPIE,vol. 4662, Jan. 2002.

[34] J. Xing, “An image processing model of contrast perception anddiscrimination of the human visual system,” in SID Conference,(Boston), May 2002.

[35] A. B. Watson, “Visual detection of spatial contrast patterns:Evaluation of five simple models,” Optics Express, vol. 6, pp. 12–33, Jan. 2000.

[36] E. P. Simoncelli, “Statistical models for images: Compression,restoration and synthesis,” in Proc 31st Asilomar Conf on Sig-nals, Systems and Computers, (Pacific Grove, CA), pp. 673–678,IEEE Computer Society, November 1997.

[37] J. Liu and P. Moulin, “Information-theoretic analysis of inter-scale and intrascale dependencies between image wavelet coeffi-cients,” IEEE Trans. Image Processing, vol. 10, pp. 1647–1658,Nov. 2001.

[38] J. M. Shapiro, “Embedded image coding using zerotrees ofwavelets coefficients,” IEEE Trans. Signal Processing, vol. 41,pp. 3445–3462, Dec. 1993.

[39] A. Said and W. A. Pearlman, “A new, fast, and efficient im-age codec based on set partitioning in hierarchical trees,” IEEETrans. Circuits and Systems for Video Tech., vol. 6, pp. 243–250, June 1996.

[40] R. W. Buccigrossi and E. P. Simoncelli, “Image compression viajoint statistical characterization in the wavelet domain,” IEEETrans. Image Processing, vol. 8, pp. 1688–1701, December 1999.

[41] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Com-pression Fundamentals, Standards, and Practice. Kluwer Aca-demic Publishers, 2001.

[42] J. M. Foley and G. M. Boynton, “A new model of human lu-minance pattern vision mechanisms: Analysis of the effects ofpattern orientation, spatial phase, and temporal frequency,”in Computational Vision Based on Neurobiology, Proc. SPIE(T. A. Lawton, ed.), vol. 2054, 1994.

[43] O. Schwartz and E. P. Simoncelli, “Natural signal statistics andsensory gain control,” Nature: Neuroscience, vol. 4, pp. 819–825,Aug. 2001.

[44] M. J. Wainwright, O. Schwartz, and E. P. Simoncelli, “Naturalimage statistics and divisive normalization: Modeling nonlinear-ity and adaptation in cortical neurons,” in Probabilistic Modelsof the Brain: Perception and Neural Function (R. Rao, B. Ol-shausen, and M. Lewicki, eds.), MIT Press, 2002.

[45] J. Malo, R. Navarro, I. Epifanio, F. Ferri, and J. M. Artigas,“Non-linear invertible representation for joint statistical and per-ceptual feature decorrelation,” Lecture Notes on Computer Sci-ence, vol. 1876, pp. 658–667, 2000.

[46] I. Epifanio, J. Gutierrez, and J. Malo, “Linear transform for si-multaneous diagonalization of covariance and perceptual metricmatrix in image coding,” Pattern Recognition, vol. 36, pp. 1799–1811, Aug. 2003.

[47] W. F. Good, G. S. Maitz, and D. Gur, “Joint photographicexperts group (JPEG) compatible data compression of mammo-

14 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 4, APRIL 2004

grams,” Journal of Digital Imaging, vol. 7, no. 3, pp. 123–132,1994.

[48] A. Pons, J. Malo, J. M. Artigas, and P. Capilla, “Image qualitymetric based on multidimensional contrast perception models,”Displays, vol. 20, pp. 93–110, 1999.

[49] W. S. Geisler and M. S. Banks, “Visual performance,” in Hand-book of Optics (M. Bass, ed.), McGraw-Hill, 1995.

[50] Z. Wang and A. C. Bovik, “Embedded foveation image coding,”IEEE Trans. Image Processing, vol. 10, pp. 1397–1410, Oct.2001.

[51] C. M. Privitera and L. W. Stark, “Algorithms for defining visual regions-of-interest: Comparison with eye fixations,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 970–982, Sept. 2000.

[52] U. Rajashekar, L. K. Cormack, and A. C. Bovik, “Image features that draw fixations,” in Proc. IEEE Int. Conf. Image Proc., vol. 3, pp. 313–316, Sept. 2003.

[53] Z. Wang, “The SSIM index for image quality assessment,” http://www.cns.nyu.edu/~lcv/ssim/.

[54] A. M. van Dijk, J. B. Martens, and A. B. Watson, “Quality assessment of coded images using numerical category scaling,” in Proc. SPIE, vol. 2451, 1995.

[55] H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, “Image and video quality assessment research at LIVE,” http://live.ece.utexas.edu/research/quality/.

[56] J. Lubin, “A visual discrimination model for imaging system design and evaluation,” in Visual Models for Target Detection and Recognition (E. Peli, ed.), pp. 245–283, Singapore: World Scientific Publishers, 1995.

[57] J.-B. Martens and L. Meesters, “Image dissimilarity,” Signal Processing, vol. 70, pp. 155–176, Nov. 1998.

[58] VQEG, “Final report from the video quality experts group on the validation of objective models of video quality assessment,” Mar. 2000. http://www.vqeg.org/.

Zhou Wang (S’97-A’01-M’02) received the B.S. degree from Huazhong University of Science and Technology, Wuhan, China, in 1993, the M.S. degree from South China University of Technology, Guangzhou, China, in 1995, and the Ph.D. degree from The University of Texas at Austin in 2001.

He is currently a Research Associate at Howard Hughes Medical Institute and the Laboratory for Computational Vision at New York University. Previously, he was a Research Engineer at AutoQuant Imaging, Inc., Watervliet, NY. From 1998 to 2001, he was a Research Assistant at the Laboratory for Image and Video Engineering at The University of Texas at Austin. In the summers of 2000 and 2001, he was with Multimedia Technologies, IBM T. J. Watson Research Center, Yorktown Heights, NY. He worked as a Research Assistant in periods during 1996 to 1998 at the Department of Computer Science, City University of Hong Kong, China. His current research interests include digital image and video coding, processing and quality assessment, and computational vision.

Alan Conrad Bovik (S’81-M’81-SM’89-F’96) is currently the Cullen Trust for Higher Education Endowed Professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin, where he is the Director of the Laboratory for Image and Video Engineering (LIVE) in the Center for Perceptual Systems. During the Spring of 1992, he held a visiting position in the Division of Applied Sciences, Harvard University, Cambridge, Massachusetts. His current research interests include digital video, image processing, and computational aspects of biological visual perception. He has published nearly 400 technical articles in these areas and holds two U.S. patents. He is also the editor/author of the Handbook of Image and Video Processing (New York: Academic, 2000). He is a registered Professional Engineer in the State of Texas and is a frequent consultant to legal, industrial and academic institutions.

Dr. Bovik was named Distinguished Lecturer of the IEEE Signal Processing Society in 2000, received the IEEE Signal Processing Society Meritorious Service Award in 1998, the IEEE Third Millennium Medal in 2000, the University of Texas Engineering Foundation Halliburton Award in 1991, and is a two-time Honorable Mention winner of the international Pattern Recognition Society Award for Outstanding Contribution (1988 and 1993). He was named a Dean’s Fellow in the College of Engineering in the Year 2001. He is a Fellow of the IEEE and has been involved in numerous professional society activities, including: Board of Governors, IEEE Signal Processing Society, 1996-1998; Editor-in-Chief, IEEE Transactions on Image Processing, 1996-2002; Editorial Board, The Proceedings of the IEEE, 1998-present; and Founding General Chairman, First IEEE International Conference on Image Processing, held in Austin, Texas, in November, 1994.

Hamid Rahim Sheikh (S’00) received his B.Sc. degree in Electrical Engineering from the University of Engineering and Technology, Lahore, Pakistan, and his M.S. degree in Engineering from the University of Texas at Austin in May 2001, where he is currently pursuing a Ph.D. degree.

His research interests include using natural scene statistical models and human visual system models for image and video quality assessment.

Eero P. Simoncelli (S’92-M’93-SM’04) received the B.A. degree in Physics in 1984 from Harvard University, Cambridge, MA, a certificate of advanced study in mathematics in 1986 from Cambridge University, Cambridge, England, and the M.S. and Ph.D. degrees in 1988 and 1993, both in Electrical Engineering, from the Massachusetts Institute of Technology, Cambridge.

He was an assistant professor in the Computer and Information Science department at the University of Pennsylvania from 1993 until 1996. He moved to New York University in September of 1996, where he is currently an Associate Professor in Neural Science and Mathematics. In August 2000, he became an Associate Investigator of the Howard Hughes Medical Institute, under their new program in Computational Biology. His research interests span a wide range of topics in the representation and analysis of visual images, in both machine and biological vision systems.
