DCT-based objective quality assessment metric of 2D/3D image


Xingang Liu & Chao Sun & Laurence T. Yang

© Springer Science+Business Media New York 2013

Abstract With the rapid growth of networked multimedia applications in recent years, users demand a much higher multimedia quality of experience (QoE) than before. Image quality is one of the most representative requirements, so image quality assessment, from two-dimensional (2D) to three-dimensional (3D) images, has attracted much attention. In this paper, an efficient objective image quality assessment metric for block-based discrete cosine transform (DCT) coding is proposed. The metric incorporates properties of the human visual system (HVS) to improve its validity and reliability in evaluating the quality of stereoscopic images. It does so by calculating local pixel-based distortions in the frequency domain and combining them with simplified models of local visibility properties in the frequency domain: the region of interest (ROI) mechanism (visual sensitivity), the contrast sensitivity function (CSF) and the contrast masking effect. The performance of the proposed metric is compared with that of current state-of-the-art objective image quality assessment metrics. The experimental results demonstrate that the proposed metric is highly consistent with the subjective test scores. Moreover, the performance of the metric is also confirmed on the popular IRCCyN/IVC database. The proposed metric is therefore promising in terms of practical efficiency and reliability for real-life multimedia applications.

Keywords Quality assessment . DCT coding . Stereoscopic image . Human Visual System (HVS) . Region of Interest (ROI)

Multimed Tools Appl
DOI 10.1007/s11042-013-1698-z

X. Liu (*) : C. Sun
School of Electronic Engineering, University of Electronic Science and Technology of China, Xiyuan Road 2006#, High Tech. Zone Chengdu, Sichuan 611731, People’s Republic of China
e-mail: [email protected]

X. Liu
e-mail: [email protected]

C. Sun
e-mail: [email protected]

L. T. Yang
St. Francis Xavier University, Canada, Antigonish, NS B2G 2W5, Canada
e-mail: [email protected]

1 Introduction

Over the past decades, advances in multimedia technologies have driven growing interest in three-dimensional (3D) technologies as the next generation of multimedia broadcasting systems. Correspondingly, 3D technologies have become a research hotspot and attracted the attention of many researchers. As an effective means of communication, 3D images have been used extensively in homes, workplaces and public spaces. Compared with traditional 2D images (i.e. black-and-white and color television), 3D imaging, which is expected to succeed high-definition TV (HDTV), creates depth perception through binocular parallax and dramatically enhances the user experience.

In recent years, different 3D image systems tied to specific types of display have become available or are under investigation, such as two-view stereoscopic image, multi-viewpoint image, color plus depth, multi-viewpoint plus depth and layered depth image [26]. The simplest and most pervasive 3D image format in current applications is the stereoscopic image. In general, it consists of two views (left and right) captured by two closely located cameras whose separation approximately simulates the distance between the two eyes. The slight difference between the two images generates disparity. By making use of the two different perspectives seen by the two eyes, human beings perceive depth and realism in a 3D image [2, 22]. Currently, the stereoscopic image is the most commercialized 3D image format, since 3D movie theaters and glasses-type 3DTVs are based on it. As more users obtain information (i.e. image, video and text) via wireless communication, signal transmission has become an important mode of multimedia exchange in our lives. Compared with a conventional 2D image, a 3D image requires more data to be transmitted [19, 25, 28]. It is well known that the delivery chain of a complete image transmission system includes five stages: content production, encoding, transmission, decoding at the receiver side and display. Any of these stages may degrade the 3D image quality, and an error generated at one stage may propagate through the following stages of the delivery chain [34]. Although some stereoscopic coding methods have been proposed to reduce information loss, these technologies remain imperfect. Therefore, objective 3D image quality assessment is an extremely important problem to cope with in the design and optimization of 3D image processing systems [12]. An objective quality assessment metric, which evaluates a 3D image after it has passed through a transmission or processing system, can help maintain the quality of a media service by monitoring the quality of image signals [23]. Poor 3D image quality causes side effects such as vomiting, headache, eyestrain or other discomfort in users who view such images for a long period of time. At present, however, there are no efficient objective image quality assessment metrics for evaluating the quality of 3D images. Hence, it is essential and urgent to establish efficient metrics that are consistent with the HVS.

It is accepted that there are two primary ways to evaluate 3D image quality: subjective quality assessment and objective quality assessment. The former is a psychology-based method that uses structured experimental designs and human participants to measure the quality of a 3D image [33]. It is considered the most accurate way to evaluate 3D image quality, and as more viewers take part, their average opinion becomes more representative. However, subjective quality assessment is time-consuming and expensive, and it is difficult to automate or to run in real time. Therefore, much attention has been concentrated on the latter. Objective quality assessment measures physical aspects of the 3D image while also taking psychological issues into account. Its goal is to design algorithms that can automatically evaluate the quality of a 3D image in a perceptually consistent manner [12]. Compared with subjective assessment, it is much more useful and valuable in multimedia applications. Objective quality assessment has been widely researched in a number of universities and research institutes, and international standardization has been actively pursued by ITU-R Working Party 6Q, ITU-R Study Group 9 and the Video Quality Experts Group (VQEG) [23]. Objective quality assessment metrics for 2D images have been studied extensively for many years. However, objective quality assessment metrics for 3D images, which have only recently drawn public attention, have not yet been studied as extensively or deeply.

Since the stereoscopic image is the most widely used 3D image format, it is the focus of our study. In this paper, we propose an efficient objective quality assessment metric for stereoscopic images based on DCT coding. The approach combines existing ideas with the following contributions: (1) finding the best-matching block for each reference block with a popular block matching method; (2) using the disparity map to efficiently find the matching block in the other channel; (3) a refined pixel-based calculation with a simplified model of the contrast sensitivity function in the frequency domain; (4) incorporating the HVS property of contrast masking to measure the distorted stereoscopic image; (5) a region of interest (ROI) mechanism (visual sensitivity) used as a weighting function to optimize the performance of the proposed metric. In the proposed metric, DCT coefficients are used to measure the quality of a stereoscopic image through a modified PSNR with HVS-based properties.

The paper is organized as follows. Section 2 discusses related work on objective image quality assessment. The proposed metric is described in Section 3. Subjective quality assessment tests and the test sequences used are presented in Section 4. Section 5 presents the simulation results and a performance analysis comparing the proposed metric with several existing state-of-the-art algorithms. Conclusions and future work are given in Section 6.

2 Related works

Objective quality assessment obtains evaluation results directly by applying predefined quality assessment metrics, which makes it possible to evaluate image quality automatically or in real time.

It is accepted that 3D image quality assessment metrics need to incorporate multi-dimensional perceptual factors related to depth, 3D image impairments and visual fatigue [6, 12, 15, 22]. Some of the visual discomfort introduced during image acquisition and display can be resolved by adjusting the binocular disparity or the physical viewing conditions. For instance, visual comfort can be improved by recommending a disparity threshold for comfortable viewing. As for issues introduced by image encoding, compression and transmission, several studies on evaluating the quality of 3D images have been published in recent years. One representative category of published objective metrics applies traditional 2D metrics to each view separately and then takes the mean over the left and right channels as the quality of the stereoscopic image [20, 33]. The representative 2D metric is the Peak Signal-to-Noise Ratio (PSNR), which is calculated from the summed differences of luminance values between the original and distorted images. In addition, Structural Similarity (SSIM) [30] and the Multi-scale Structural Similarity Index (MSSIM) [31], proposed by Wang et al., have been widely accepted. Both algorithms measure structural information instead of relying on error sensitivity, on the grounds that the main function of the HVS is to extract structural information from the viewing field, and they have become key methods for image quality assessment across many multimedia application fields. However, this category tends to assume that impairments affect the left and right views equally, and it ignores two factors. Firstly, it ignores the sensation of depth or disparity, although the depth map (disparity map) is a factor unique to 3D image signals compared with traditional 2D image signals. Secondly, it does not take into account interactions between the left and right views, such as the masking effect and visual comfort. Therefore, this category cannot efficiently measure the quality of stereoscopic images. The other category builds on the former by also considering depth or disparity information when evaluating the degree of distortion of a stereoscopic image signal [3, 9]. In [9], K. Ha et al. proposed a no-reference stereoscopic video quality perception model (SV-QPM) that takes into account temporal variance, intra-frame disparity variation, inter-frame disparity variation and the disparity distribution of frame boundary areas. This method has two shortcomings. Firstly, when the disparity in the content is excessively large, the algorithm is inaccurate. Secondly, the algorithm considers only the temporal variation of disparity magnitudes, not the direction of disparity change, which has been identified as an important feature affecting perceptual quality. In [3], two individual algorithms were combined to form the final metric for evaluating stereoscopic images. One was a structural similarity algorithm used to measure the distortions caused by blur, noise and contrast change. The other was a multi-scale algorithm that measured the distortions of the disparity map and stereo-similarity map. The method improved on existing algorithms for evaluating the quality of stereoscopic images. However, the final metric lacks matching metrics for measuring the quality of stereoscopic images, and too few subjective experiments were carried out to show that the method generalizes. In [14], depth range (binocular disparity) was used as part of the proposed objective quality assessment metric, since it directly affects the perceived depth in a 3D display together with the acquisition parameters (i.e. baseline, focal length, shooting distance and resolution) and viewing conditions (i.e. viewing distance, resolution and display size). In [15], a weighting map was produced by combining the depth map and motion information. The depth map was taken to indicate depth perception in 3D viewing, while the motion information was taken to represent the motion cue as a stimulus to the human visual system for 3D video. Combining both factors, the algorithm was built on the Peak Signal-to-Noise Ratio (PSNR) and SSIM models. In [10], a reduced-reference quality metric for 3D depth maps was presented that uses edge information extracted from the depth map within a PSNR model under different packet loss rates. However, this method considers only the depth map and takes neither the HVS nor the image information into account. In [35], stereoscopic images were assessed from the perspectives of image quality and stereoscopic sense with the PSNR model. This method also overlooks the properties of human visual perception of 3D image signals. The above-mentioned quality assessment metrics for stereoscopic images have considered multiple factors, such as those involved in the human visual system (HVS), the sensation of depth and other aspects of visual comfort that are peculiar to 3D images. As these algorithms show, the PSNR model has been widely used as the underlying model in stereoscopic image quality assessment.

The aforementioned objective algorithms for stereoscopic images, whichever category they belong to, operate in the spatial and temporal domains. Other research measures the quality of stereoscopic images in the frequency domain. Several earlier studies evaluated the quality of 2D images based on discrete cosine transform (DCT) coding [4, 8, 16]. Building on similar theory, [1] described a no-reference perceptual quality assessment for JPEG-coded stereoscopic images based on segmented local features of artifacts and disparity. The disparity was also estimated with the segmentation algorithm from local feature information. Its performance is superior to that of conventional 2D algorithms in measuring stereoscopic image quality. However, psychological effects of the HVS, such as masking and visual sensitivity, are not considered. In [12], a Full-Reference (FR) quality metric for stereoscopic video was presented that uses block matching and computes the MSE with 3D-DCT coding. The method achieves a certain improvement in measuring the quality of stereoscopic images. Nevertheless, like [1], it does not take the region of interest mechanism into consideration, so it still leaves room for improvement.

With users' visual demands continuing to rise, more efficient stereoscopic image quality assessment metrics are urgently needed. Considering the multiple factors that affect stereoscopic images and the requirements of users, our work describes the important properties of human perception in the frequency domain. Our proposed algorithm avoids the aforementioned drawbacks of previous frequency-domain work and outperforms the method of [12], as confirmed on both popular databases.

3 Description of the proposed metric

The scheme of the proposed perceptual quality assessment metric for stereoscopic images is presented in Fig. 1. In our approach, the stereo pair (left and right views) is processed at the block level. The sensation of depth created by the slightly different perspectives of the two views is manifested as disparity. Since JPEG coding is a block-based discrete cosine transform (DCT) coding technology, an 8×8 block is usually chosen as the basic computing unit in coded images. A block size of 8×8 is also chosen in our algorithm as the basic unit for measuring the quality of a stereoscopic image. More specifically, for the reference block R0 in the left reference channel, the corresponding distorted block D0 is chosen first. Within the left reference image, the most similar reference block R1 is selected by block matching. The block R2 in the right reference channel is obtained from the disparity information. Because the disparity observed between the left and right views is generally inversely proportional to the distance to the object, this information lets us find the point in one image of the stereo pair that corresponds to a given point in the other image in terms of the associated feature. The similar block R3 is found in the same way as R1. The same process is then carried out in the distorted channels to obtain D1, D2 and D3. The blocks obtained from both the reference and distorted image signals are processed through DCT coding. The DCT coefficients are then corrected to account for the contrast sensitivity function, contrast masking and the region of interest mechanism by means of simplified models. The mean square errors MSEleft and MSEright are calculated separately from each set of corrected DCT coefficients in the left and right channels. The final objective quality assessment model is then established using the weighting factors wleft and wright. The whole process is carried out on the luminance channel only, in order to reduce the computational load.

Fig. 1 Flow chart of the proposed metric. [Flow chart: the reference and distorted signals (Refleft, Disleft, Refright, Disright), together with the disparity maps, yield blocks R0 to R3 and D0 to D3; after the DCT and the CSF, region of interest and masking corrections, MSEleft and MSEright give PSNRleft and PSNRright, which are weighted by wleft and wright into PSNRm.]

3.1 Block selecting mode and DCT coding

In our algorithm, block matching is applied to search for the block most similar to the reference block (R0, D0) within a search region. The mean squared error (MSE), a popular and effective criterion, is used to find the best-matching block, as shown in Eq. (1).

$$\mathrm{MSE}_{\mathrm{matching}} = \frac{1}{M \times N} \sum_{j=0}^{M-1} \sum_{i=0}^{N-1} \left( R_{i,j} - S_{i,j} \right)^2 \tag{1}$$

Here, Ri,j and Si,j are the pixel values of the reference and searched blocks, respectively, and M×N is the block size, which is 8×8 in our algorithm. Using block matching, the best-matching block is found in the same channel as the reference block; the matching scheme is sketched in Fig. 2. In Fig. 2, R0 is the original block in the left view and its upper-left pixel position is labeled (i, j), where i is the position index in the horizontal direction and j denotes the position index in the vertical direction. R1 is the block found for R0 within the search region by block matching. R2, the block in the right view corresponding to the reference block R0, is located using the disparity information and labeled (i+d, j). The best-matching block R3 is likewise found in the right view. In our experiments, fixing the search region around the reference block to 20×20 gave excellent results.
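The block search of Eq. (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `block_mse` and `best_match` are hypothetical, the 20×20 search region is interpreted as a pixel window centred on the reference block, and the reference position itself is excluded from the candidates. The text does not specify these details.

```python
def block_mse(img, top, left, ref, size=8):
    """Eq. (1): mean squared error between `ref` and the block of `img` at (top, left)."""
    total = 0.0
    for r in range(size):
        for c in range(size):
            diff = ref[r][c] - img[top + r][left + c]
            total += diff * diff
    return total / (size * size)


def best_match(img, top, left, size=8, search=20):
    """Return (mse, row, col) of the block most similar to the one at (top, left).

    ASSUMPTION: candidates lie inside a `search` x `search` pixel window centred
    on the reference block, and the reference position itself is skipped.
    """
    height, width = len(img), len(img[0])
    ref = [row[left:left + size] for row in img[top:top + size]]
    half = (search - size) // 2  # maximum shift of the block inside the window
    best = None
    for r in range(max(0, top - half), min(height - size, top + half) + 1):
        for c in range(max(0, left - half), min(width - size, left + half) + 1):
            if (r, c) == (top, left):
                continue
            score = block_mse(img, r, c, ref, size)
            if best is None or score < best[0]:
                best = (score, r, c)
    return best
```

Under the same assumptions, the block R2 in the right view would be read at the position of R0 shifted horizontally by the disparity d, and `best_match` would then be called on the right view to obtain R3.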

Fig. 2 Description of block selecting mode. [Diagram: block R0 at (i, j) and its match R1 inside the region of search in the left view; block R2 at (i+d, j) and its match R3 in the right view.]

The discrete cosine transform (DCT) is used extensively in image processing (i.e. still and moving images) for multimedia applications because of its energy compaction property: after the DCT, the energy of many natural signals is concentrated in the low frequencies. Besides this highly sparse representation, it is accepted that DCT coding is capable of removing correlation [7]. Hence, the DCT coefficients can efficiently represent image properties in the frequency domain with less data. Based on the DCT coefficients of each block and its matching block, the proposed metric calculates the maximal distortion in combination with the properties of the HVS.
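The energy compaction property can be checked with a small sketch. The code below implements a direct (unoptimized) orthonormal 2D DCT-II in plain Python; the helper names `dct2` and `low_frequency_energy_share` and the test ramp are illustrative assumptions, not part of the proposed metric.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for r in range(n):
                for c in range(n):
                    acc += (block[r][c]
                            * math.cos((2 * r + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * c + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * acc
    return coeffs


def low_frequency_energy_share(coeffs):
    """Fraction of the squared-coefficient energy in the top-left quadrant."""
    n = len(coeffs)
    total = sum(x * x for row in coeffs for x in row)
    low = sum(coeffs[u][v] ** 2 for u in range(n // 2) for v in range(n // 2))
    return low / total


# A smooth 8x8 luminance ramp (hypothetical data): most of its energy
# ends up in the DC and low-frequency coefficients.
ramp = [[100 + 2 * r + 3 * c for c in range(8)] for r in range(8)]
share = low_frequency_energy_share(dct2(ramp))
```

For smooth content such as this ramp, `share` is close to 1, which is the behaviour the region of interest weighting in Section 3.4 relies on.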

3.2 Considering the effect of contrast sensitivity function (CSF)

The response of human beings to visual stimuli depends much more on luminance contrast than on absolute luminance [36]. This effect is captured well by the contrast sensitivity function (CSF) in the frequency domain, which models how sensitive human beings are to the various frequencies of visual stimuli. If the frequency of a visual stimulus is too high, human beings can no longer recognize the stimulus pattern [8]. For a given ambient luminance, Peterson et al. determined the magnitude of the basis function required for human detection. Repeating this procedure with different coefficients yielded a contrast sensitivity function over the range of DCT basis functions. This contrast sensitivity function (CSF) suggests a quantization matrix for use in DCT coding and corresponds closely to human visual perception [27]. Likewise, Mannos et al. proposed an equation that describes the behavior of the CSF [17]. The characteristic curve of the CSF is shown in Fig. 3.

Fig. 3 The characteristic curve of contrast sensitivity function (CSF). [Plot: CSF value (0 to 1) against frequency (0 to 90 cycles/degree) for the horizontal/vertical and diagonal directions.]

In Fig. 3, the horizontal axis represents the spatial frequency of the visual stimuli in cycles/degree, and the vertical axis denotes the CSF value, which ranges from 0 to 1. The red solid curve represents the CSF in the horizontal or vertical direction. It peaks at a value of 1 at approximately 8.0 cycles/degree, the frequency with the greatest impact on how the HVS perceives the quality of a visual stimulus. Frequencies above 60 cycles/degree have a lower impact than frequencies below 60 cycles/degree. The green dotted curve is the CSF in the diagonal direction, which reaches its maximal value of 1 at approximately 12.0 cycles/degree and is meaningless for frequencies above 90 cycles/degree. The CSF thus behaves like a band-pass filter, with poor sensitivity when the frequency is either too low or too high. In our proposed algorithm, a correcting factor C8×8 is therefore introduced to account for the effect of the CSF. Its coefficients are obtained by down-sampling the normalized quantization table of JPEG [8]. The modified MSE is described as follows.

$$\mathrm{MSE}(x, y) = \frac{1}{M \times N \times 2} \sum_{d=1}^{2} \lambda_d \, \mathrm{MSE}\left(A^{d}_{x,y}, B^{d}_{x,y}\right) \times C_{8 \times 8} \tag{2}$$

Here, λd (d = 1, 2) is a weighting factor that reflects the assumption that the reference block and its matching block influence the measured quality of the stereoscopic image differently. In this equation, λ1 is set to 1 and λ2 ranges from 0 to 1. The matching block is included to agree with the way the HVS perceives the quality of an image signal and thereby to improve the evaluation results. x and y are the block indexes in the horizontal and vertical directions of the image, respectively.
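One way to read Eq. (2) is sketched below. Several points are assumptions rather than statements from the text: the CSF weights are derived from the standard JPEG luminance quantization table by dividing its smallest entry by each entry, the weights are applied per coefficient to the DCT differences, and λ2 = 0.5 is a hypothetical value chosen inside the stated range [0, 1]. The exact down-sampling and normalization used for C8×8 are not given in the text.

```python
# Standard JPEG luminance quantization table (Annex K of the JPEG standard).
JPEG_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# ASSUMPTION: normalise so that the finest quantization step gets weight 1.
Q_MIN = min(min(row) for row in JPEG_LUMA_Q)
CSF_WEIGHTS = [[Q_MIN / q for q in row] for row in JPEG_LUMA_Q]


def weighted_sq_error(ref_coeffs, dis_coeffs, weights=CSF_WEIGHTS):
    """Sum of squared, CSF-weighted DCT coefficient differences for one block pair."""
    return sum(
        ((ref_coeffs[u][v] - dis_coeffs[u][v]) * weights[u][v]) ** 2
        for u in range(8) for v in range(8)
    )


def block_error(pairs, lambdas=(1.0, 0.5)):
    """Eq. (2)-style combination of the reference pair (d=1) and matched pair (d=2).

    `pairs` is [(A1, B1), (A2, B2)], the DCT coefficients of the reference and
    distorted blocks. lambda_1 = 1 follows the text; lambda_2 = 0.5 is HYPOTHETICAL.
    """
    m = n = 8
    total = sum(lam * weighted_sq_error(a, b) for lam, (a, b) in zip(lambdas, pairs))
    return total / (m * n * 2)
```

Applying the weight to the coefficient difference before squaring means low-frequency errors, where the quantization steps are small, count more than high-frequency errors.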

3.3 Considering the factor of contrast masking

The visibility of image distortion depends on the specific content of the image, so the measurement model should take human visual characteristics fully into account. It is generally believed that the main visual models of the HVS include opponent color decomposition, contrast sensitivity, the multichannel mechanism, masking and model adaptation [32]. In this paper, contrast masking is considered in addition to the contrast sensitivity function. It reflects the interactions among visual stimuli: a stimulus becomes hard to detect in the presence of another stimulus. In image quality assessment metrics, the original image is usually treated as the background information. Taking into account the degree to which the interfering signals are masked by the background signal improves the suitability and reliability of the final objective quality assessment metric for certain applications. Thus, to model contrast masking, M8×8 is obtained with the algorithm of [21]. The adjusted MSE is defined as

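$$\mathrm{MSE}(x, y) = \frac{1}{M \times N \times 2} \sum_{d=1}^{2} \lambda_d \, \mathrm{MSE}\left(A^{d}_{x,y}, B^{d}_{x,y}\right) \times C_{8 \times 8} \times M_{8 \times 8} \tag{3}$$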

3.4 Weighting factors based on the human visual system (Region of Interest)

According to the properties of human visual perception, different frequency components contribute differently to image content. Many studies accept that the human visual system is more sensitive to low-frequency and direct current (DC) components than to high-frequency ones [8]. These components determine the main luminance properties of an image, and frequencies beyond a certain range may not affect the image content at all. Fig. 4(b) and (c) show the frequency distribution in 2D and 3D views. Most of the frequency information is concentrated in the upper-left corner of Fig. 4(b), which is confirmed at the lower coordinates in Fig. 4(c). Therefore, considering the high sensitivity of the HVS to distortions in the low-frequency range, an improved MSE based on the region of interest mechanism is introduced. In this algorithm, the block is first divided into four sub-blocks, named LL, LH, HL and HH, as shown in Fig. 4(d). The weighting factors of these four sub-blocks are then determined by the effect of visual sensitivity.

Consequently, the MSE equation is modified as:

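$$\mathrm{MSE}\left(A^{d}_{x,y}, B^{d}_{x,y}\right) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \alpha_k \left( R_{i,j} - D_{i,j} \right)^2 \tag{4}$$

$$\alpha_k = \begin{cases}
\alpha_{LL}, & 0 \le i < \frac{M}{2} \ \&\&\ 0 \le j < \frac{N}{2} \\
\alpha_{LH}, & \frac{M}{2} \le i < M \ \&\&\ 0 \le j < \frac{N}{2} \\
\alpha_{HL}, & 0 \le i < \frac{M}{2} \ \&\&\ \frac{N}{2} \le j < N \\
\alpha_{HH}, & \frac{M}{2} \le i < M \ \&\&\ \frac{N}{2} \le j < N
\end{cases} \tag{5}$$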

Here, Ri,j and Di,j are the DCT coefficients at index (i, j) of the M × N reference and distorted image blocks, respectively. αk are the weighting coefficients, subject to the constraint $\sum_{k=1}^{4} \alpha_k = 1$, that quantify the importance of each sub-block for monocular vision in the frequency domain, as shown in Fig. 4 and Eq. (5).
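Eqs. (4) and (5) translate into a short sketch. The weights are the values the paper reports in Section 5 (0.58, 0.16, 0.16 and 0.1); the names `ALPHA`, `sub_block` and `roi_weighted_error` are hypothetical and serve only as an illustration.

```python
# Sub-block weights as reported in Section 5; they sum to 1.
ALPHA = {"LL": 0.58, "LH": 0.16, "HL": 0.16, "HH": 0.10}


def sub_block(i, j, m=8, n=8):
    """Eq. (5): map coefficient index (i, j) to its sub-block label."""
    first = "L" if j < n // 2 else "H"   # first letter is set by j
    second = "L" if i < m // 2 else "H"  # second letter is set by i
    return first + second


def roi_weighted_error(ref_coeffs, dis_coeffs, alpha=ALPHA):
    """Eq. (4): alpha-weighted sum of squared DCT coefficient differences."""
    m, n = len(ref_coeffs), len(ref_coeffs[0])
    return sum(
        alpha[sub_block(i, j, m, n)] * (ref_coeffs[i][j] - dis_coeffs[i][j]) ** 2
        for i in range(m) for j in range(n)
    )
```

With these weights, an error of a given magnitude in the LL quadrant contributes 5.8 times as much as the same error in the HH quadrant.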

Fig. 4 Interpretation of frequencies distribution and illustration of block segmenting: (a) the original image; (b) frequencies distribution in 2D view; (c) frequencies distribution in 3D view; (d) sub-blocks distribution (LL, HL, LH, HH)

The MSE for the whole image can be calculated as:

$$\mathrm{MSE}_m = \frac{1}{B_h \times B_v} \sum_{x=0}^{B_h-1} \sum_{y=0}^{B_v-1} \mathrm{MSE}(x, y) \tag{6}$$

In this equation, Bh and Bv are the numbers of blocks in the horizontal and vertical directions of the image, respectively.

The final version of PSNR is calculated as:

$$\mathrm{PSNR}_{\mathrm{proposed}} = 10 \log \left( \frac{255^2}{\mathrm{MSE}_m} \right) \tag{7}$$

Both visual psychology and physiology indicate that a stereoscopic masking effect exists in human perception. That is, when the qualities of the left and right views differ, the two views do not contribute equally to the perceived quality of the stereoscopic image. The final equation is therefore

$$\mathrm{PSNR}_m = w_{left} \times \mathrm{PSNR}_{left} + w_{right} \times \mathrm{PSNR}_{right}, \qquad w_{left} + w_{right} = 1 \tag{8}$$

In Eq. (8), wleft and wright represent the quality weightings of the left and right views in the stereoscopic image signal, respectively.
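Eqs. (6) to (8) can be chained as in the sketch below. It assumes a base-10 logarithm in Eq. (7), the usual PSNR convention in dB, and uses w_left = 0.5 as a hypothetical default because the text does not give the weight values. The function names are illustrative.

```python
import math


def image_mse(block_errors):
    """Eq. (6): average the per-block errors MSE(x, y) over the Bh x Bv grid."""
    values = [e for row in block_errors for e in row]
    return sum(values) / len(values)


def psnr(mse, peak=255.0):
    """Eq. (7), ASSUMING a base-10 logarithm (result in dB)."""
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(peak * peak / mse)


def stereo_psnr(psnr_left, psnr_right, w_left=0.5):
    """Eq. (8): weighted combination with w_left + w_right = 1.

    w_left = 0.5 is a HYPOTHETICAL default, not a value from the text.
    """
    if not 0.0 <= w_left <= 1.0:
        raise ValueError("w_left must lie in [0, 1]")
    return w_left * psnr_left + (1.0 - w_left) * psnr_right
```

For example, `stereo_psnr(psnr(image_mse(left_errors)), psnr(image_mse(right_errors)))` would combine two grids of per-block errors, where `left_errors` and `right_errors` stand for the MSE(x, y) values of the left and right channels.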

4 Subjective quality assessment tests and test sequences

The main goal of image quality assessment is for the objective algorithm to be as consistent as possible with subjective assessment scores. Subjective image quality assessment is the most precise measurement of perceptual quality, since it is produced directly by the human visual system (HVS). In such experiments, human observers evaluate the image quality in a controlled test environment. Therefore, subjective image quality assessment experiments are carried out first to provide comparison data for the design of objective quality assessment algorithms.

In our experiments, the LIVE JPEG database [18], which consists of 80 JPEG images with their mean opinion scores (MOS), is adopted to validate the various objective quality assessment metrics. It was produced with single-stimulus continuous quality evaluation (SSCQE) at the University of Texas at Austin (UT). Six selected reference image pairs used in the database are shown in Fig. 5. In addition to the above-mentioned subjective quality assessment test, the IRCCyN/IVC database [5] is also used to evaluate the performance of our proposed algorithm. Unlike with database [18], the proposed algorithm here considers only the image information in order to evaluate the quality of 2D images; that is, disparity or depth information does not exist in a 2D image. The subjective quality assessment was completed at a viewing distance of six times the screen height using the double stimulus impairment scale (DSIS) method with 15 observers.

Fig. 5 Image pairs of stereoscopic image used in the experiments: (a) left views; (b) right views

Fig. 6 Logistic fitting figures of MOS vs. objective quality assessment metrics (Database [5]). [Scatter plots of MOS against PSNR, SSIM, MSSIM, PQM, UQI, VIFP, PHVS-3D, PHVS-M-3D, the proposed PSNR-HVS and the proposed PSNR-HVS-M, each with a linear approximation.]

Fig. 7 Logistic fitting figures of MOS vs. objective quality assessment metrics (Database [18]). [Scatter plots of MOS against PSNR, SSIM, MSSIM, PQM, UQI, VIFP, PHVS-3D, PHVS-M-3D, the proposed PSNR-HVS and the proposed PSNR-HVS-M, each with a linear approximation.]

5 Simulation results and performance analysis

We evaluated the performance of the proposed metric on the basis of both databases. The results of the proposed metric are compared with several state-of-the-art 2D and 3D image quality assessment metrics: PSNR, perceptual quality assessment (PHVS3D) [12], structural similarity index (SSIM) [30], multi-scale structural similarity index (MSSIM) [31], universal quality index (UQI) [29], perceptual quality metric (PQM) [13], and visual information fidelity (VIFP) [24]. The 2D image quality metrics were run on the left and right views separately, and the two results were then averaged. All of these algorithms were applied to the test image sequences so that the performance of the proposed metric could be measured accurately, and their correlations with the subjective scores are shown in Figs. 6 and 7. In the proposed scheme, the weighting coefficients αk in Eq. (5) are determined by training to optimize the correlation between the objective and subjective scores. In our experiments, the parameters are fixed as follows: αLL = 0.58, αLH = αHL = 0.16, and αHH = 0.1.

As shown in Figs. 6 and 7, popular 2D image quality assessment metrics correlate poorly with the mean opinion scores (MOS) when evaluating the quality of stereoscopic images. This is because they are directly extended from the 2D case and do not take binocular visual characteristics or depth perception into consideration. Their overall performance is generally worse than that of the proposed metric, even though they may be effective for some specific distortion types. For instance, PSNR may not give good quality scores for most distortion types, but it does work well for images distorted by additive noise. Although it is well accepted that SSIM performs well in measuring certain image distortions (e.g. Gaussian blur), we find that its application to general image signals has some limitations. For instance, the basic coding unit of general image coding standards is the 8×8 sub-block, so blocking artifacts always arise at the block boundaries because of the lossy quantization operation. Besides, some well-known properties of human visual perception (e.g. the masking effect and the contrast sensitivity function) are not fully considered, so SSIM cannot give sufficiently good quality evaluations.

PQM and PHVS3D, as 3D image quality assessment metrics, perform much better than the popular 2D image quality metrics, but both are still less stable than the proposed metric. The performance comparisons are also presented in Tables 1 and 2. Table 1 lists the Pearson linear correlation coefficients (PLCC) of the compared quality assessment metrics.
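The left/right averaging applied to the 2D metrics can be sketched as follows. This is a minimal illustration only: PSNR stands in for any 2D metric, the helper names `psnr` and `stereo_score` are hypothetical, and the tiny 2×2 "views" are made-up values rather than data from either database.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_error = 0.0
    count = 0
    for ref_row, dis_row in zip(reference, distorted):
        for r, d in zip(ref_row, dis_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def stereo_score(metric, ref_left, dis_left, ref_right, dis_right):
    """Apply a 2D metric to each view separately and average the two scores."""
    left = metric(ref_left, dis_left)
    right = metric(ref_right, dis_right)
    return 0.5 * (left + right)

# Hypothetical 2x2 views, for illustration only.
ref_l = [[100, 100], [100, 100]]
dis_l = [[110, 100], [100, 100]]  # one pixel off by 10 -> MSE = 25
ref_r = [[50, 50], [50, 50]]
dis_r = [[50, 50], [50, 30]]      # one pixel off by 20 -> MSE = 100

score = stereo_score(psnr, ref_l, dis_l, ref_r, dis_r)
print(f"averaged PSNR: {score:.2f} dB")
```

Because the two views are scored independently, such an average carries no disparity or depth information, which is consistent with the weak stereoscopic results reported for the 2D metrics.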

Table 1 Pearson linear correlation between MOS and the objective assessment metrics

Metrics         PSNR     SSIM     MSSIM    UQI      PQM
Database [5]    0.8012   0.7751   0.9142   0.8502   0.7685
Database [18]   0.1183   0.4770   0.5112   0.2547   0.5621

Metrics         VIFP     PHVS3D   PHVS-M3D   Proposed PSNR-HVS   Proposed PSNR-HVS-M
Database [5]    0.9349   0.9278   0.9343     0.9313              0.9554
Database [18]   0.4668   0.3827   0.5468     0.5461              0.5630

Bold values show the results of our experiment


A correlation coefficient close to 1 indicates good correlation with human visual perception. The experimental data show that the proposed metric has high accuracy and consistency, and is suitable for predicting the quality of stereoscopic images.
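For readers who want to reproduce a coefficient of this kind, the Pearson linear correlation between objective scores and MOS can be sketched as below. The score lists are hypothetical values chosen for illustration, not data from either database, and the sketch computes the raw coefficient without any fitting step that may precede it in the evaluation procedure.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical objective scores and MOS values, for illustration only.
objective = [10.0, 12.0, 14.0, 16.0, 18.0]
mos = [1.5, 2.4, 2.9, 4.1, 4.6]

plcc = pearson(objective, mos)
print(f"PLCC = {plcc:.4f}")
```

A value near 1 means the objective scores rise almost linearly with MOS; a value near 0 means the metric carries little linear information about the subjective scores.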

To provide a quantitative measure of the performance of the proposed objective image quality metric, we follow the standard performance evaluation procedures [11]. In addition to the Pearson linear correlation coefficient, the root mean square prediction error (RMSE) is also given in our experimental results. The RMSE values indicate the prediction accuracy of the objective quality assessment algorithms, as shown in Table 2. According to the results in Tables 1 and 2, the proposed PSNR-HVS-M metric achieves the highest correlation and the lowest RMSE on both databases, outperforming the other 2D/3D image quality assessment metrics. PQM [13] was designed specifically for measuring the quality of 3D video signals in the color-plus-depth format. In general cases, it achieves satisfactory correlation with the subjective quality scores. When the image quality drops greatly, however, PQM cannot correctly reflect the corresponding objective quality score, because it does not consider important distortion factors and, more importantly, does not take the disparity or depth map into consideration. PHVS3D [12] performs worse than the proposed metric; it could be effectively improved by taking properties of human visual perception (e.g. the region of interest mechanism) into account, and our experimental results support this hypothesis.
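The RMSE figure of merit can be illustrated with a short sketch. It assumes that objective scores are first mapped to the MOS scale by a least-squares straight line (the figure legends mention a linear approximation, but the exact mapping used in the experiments is not restated here, so this is an assumption), and it uses hypothetical scores rather than data from either database.

```python
import math

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(predicted, observed):
    """Root mean square error between predicted and observed scores."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))

# Hypothetical objective scores and MOS values, for illustration only.
objective = [10.0, 12.0, 14.0, 16.0, 18.0]
mos = [1.5, 2.4, 2.9, 4.1, 4.6]

slope, intercept = fit_line(objective, mos)
predicted = [slope * s + intercept for s in objective]
error = rmse(predicted, mos)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, RMSE={error:.4f}")
```

RMSE is expressed in the units of the subjective scale, so values are comparable only within one database; this is why the two rows of Table 2 differ by roughly an order of magnitude.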

In our study, in order to measure the influence of the contrast masking effect, two models have been embedded in the general formulation, denoted as the proposed PSNR-HVS and the proposed PSNR-HVS-M. Their prediction performance is illustrated again in Fig. 8. It is concluded that contrast masking has an important effect on measuring the quality of 2D/stereoscopic images. Overall, the proposed metric achieves better performance than existing 2D and some 3D image quality assessment methods for JPEG-type distortion.

Fig. 8 Illustration of the proposed metrics in both databases (IRCCyN/IVC Database [5]: Proposed PSNR-HVS correlation 0.9313, Proposed PSNR-HVS-M correlation 0.9554; LIVE JPEG Database [18]: Proposed PSNR-HVS correlation 0.5461, Proposed PSNR-HVS-M correlation 0.5630)

Table 2 Performance comparison of the objective quality metrics (root mean-squared-error (RMSE))

Metrics         PSNR     SSIM     MSSIM    UQI      PQM
Database [5]    0.6138   0.7000   0.5426   0.7139   0.6158
Database [18]   6.6500   5.8390   5.7260   6.4040   5.4740

Metrics         VIFP     PHVS3D   PHVS-M3D   Proposed PSNR-HVS   Proposed PSNR-HVS-M
Database [5]    0.4497   0.4728   0.4515     0.4616              0.3742
Database [18]   5.8200   6.1180   5.3380     5.2710              5.1840

Bold values show the results of our experiment
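The size of the contrast masking effect can be read directly from Tables 1 and 2. The short calculation below is only a worked arithmetic example on the published PLCC and RMSE values of the two proposed variants; the dictionary layout and the helper name `masking_gain` are illustrative choices.

```python
# PLCC (Table 1) and RMSE (Table 2) as (PSNR-HVS, PSNR-HVS-M) pairs.
results = {
    "Database [5]": {"plcc": (0.9313, 0.9554), "rmse": (0.4616, 0.3742)},
    "Database [18]": {"plcc": (0.5461, 0.5630), "rmse": (5.2710, 5.1840)},
}

def masking_gain(entry):
    """Return (absolute PLCC gain, relative RMSE reduction in percent)."""
    plcc_hvs, plcc_hvs_m = entry["plcc"]
    rmse_hvs, rmse_hvs_m = entry["rmse"]
    plcc_gain = plcc_hvs_m - plcc_hvs
    rmse_reduction_pct = 100.0 * (rmse_hvs - rmse_hvs_m) / rmse_hvs
    return plcc_gain, rmse_reduction_pct

gains = {name: masking_gain(entry) for name, entry in results.items()}
for name, (plcc_gain, rmse_cut) in gains.items():
    print(f"{name}: PLCC +{plcc_gain:.4f}, RMSE -{rmse_cut:.1f}%")
```

On these numbers, adding contrast masking raises the PLCC by about 0.024 and cuts the RMSE by roughly 19% on Database [5], while on Database [18] the gain is about 0.017 in PLCC and under 2% in RMSE.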

6 Conclusions and future work

In this paper, we propose a novel objective quality assessment metric for stereoscopic images based on DCT coding. The algorithm takes into consideration the region of interest (visual sensitivity in the frequency domain), the contrast sensitivity function (CSF) and the contrast masking effect. Besides, disparity, as a vital feature of 3D images, has a significant impact on evaluating stereoscopic image quality. From the statistical analysis, it is concluded that the proposed quality metric aligns much better with the subjective image quality assessment scores than other state-of-the-art objective quality assessment metrics.

The main contributions of this research are as follows. First, the objective quality assessment metric can be used to assess the quality of stereoscopic images automatically and in real time in network or wireless communication environments. Second, some important properties of human visual perception in the frequency domain are taken into account. Among them, the region of interest (visual sensitivity in the frequency domain) enhances the universality of the proposed metric. More importantly, the proposed quality assessment metric could extend the scope of research on quality assessment algorithms for stereoscopic images.

Future studies may focus on disparity or depth information. It is recognized that stereoscopic image quality assessment metrics need to incorporate multidimensional perceptual factors related to depth, 3D image impairments, visual fatigue and visual comfort. The depth-related factors are therefore important for measuring the quality of stereoscopic images, and more work will be focused on them. Furthermore, because this paper proposes an algorithm that evaluates impairments in the frequency domain of stereoscopic images, additional research will be needed to apply this method to stereoscopic video and to actual communication systems. Our research team is currently extending the universality of the metric for that purpose.

Acknowledgments The work was supported by the Fundamental Research Funds for the Central Universities under grant ZYGX2012J028, and by the China Postdoctoral Science Foundation funded project under grant 2013M530396.

References

1. Akhter R, Sazzad ZMP, Horita Y, Baltes J (2010) No-reference stereoscopic image quality assessment. Proc SPIE 7524, Stereoscopic Displays and Applications XXI, San Jose, California

2. Arican Z, Yea S, Sullivan A, Vetro A (2009) Intermediate view generation for perceived depth adjustment of stereo video. SPIE Applications of Digital Image Processing XXXII, San Diego, 7443(1), CA, USA. doi:10.1117/12.829381

3. Boev A, Gotchev A, Egiazarian K, Aksay A, Akar GB (2006) Towards compound stereo-video quality metric: a specific encoder-based framework. Proceeding of IEEE Southwest Symposium on Image Analysis and Interpretation, Denver, Colorado, USA, pp 218–222

4. Brandão T, Queluz MP (2008) No-reference image quality assessment based on DCT domain statistics. J Signal Process 88(4):822–833. doi:10.1016/j.sigpro.2007.09.017

5. Callet PL, Autrusseau F (2005) Subjective quality assessment IRCCyN/IVC database, available: http://www.irccyn.ec-nantes.fr/ivcdb/

6. Chen W, Fournier J, Barkowsky M, Callet P-Le (2010) New requirements of subjective video quality assessment methodologies for 3DTV. Proceeding of International Workshop Video Processing Quality Metrics, Scottsdale, USA


7. Dabov K, Foi R, Katkovnik V, Egiazarian K (2008) Image restoration by sparse 3D transform-domain collaborative filtering. Proc. SPIE Electronic Imaging, no. 6812–07, San Jose, USA

8. Egiazarian K, Astola J, Ponomarenko N, Lukin V, Battisti F, Carli M (2006) A new full-reference quality metrics based on HVS. International Workshop on Video Processing and Quality Metrics, Scottsdale, USA

9. Ha K, Kim M (2011) A perceptual quality assessment metric using temporal complexity and disparity information for stereoscopic video. Proceeding of ICIP, Brussels, Belgium, pp 2525–2528

10. Hewage CTER, Martini MG (2010) Reduced-reference quality metric for 3D depth map transmission. IEEE Int Conf Image Processing, Hong Kong, China

11. ITU-R BT. 500-11 (2002), Methodology for the subjective assessment of the quality of television pictures, International Telecommunication Union (ITU) Radio Communication Sector, Geneva, Switzerland

12. Jin L, Boev A, Gotchev A, Egiazarian K (2011) 3D-DCT based perceptual quality assessment of stereo video. 18th IEEE International Conference on Image Processing (ICIP2011), Brussels, pp 2521–2524

13. Joveluro P, Malekmohamadi H, Fernando WAC, Kondoz AM (2010) Perceptual video quality metric for 3D video quality assessment. 3DTV-Conference: the true vision - capture, transmission and display of 3D video (3DTV-CON), pp 1–4

14. Kim D, Min D, Oh J, Jeon S, Sohn K (2009) Depth map quality metric for three-dimensional video. Stereoscopic Displays and Applications XX, vol 7237, San Jose, CA. doi:10.1117/12.806898

15. Kim DHH, Ryu S, Sohn KHW (2012) Depth perception and motion cue based 3D video quality assessment. IEEE Int. Symposium on Broadband on Multimedia Systems and Broadcasting (BMSM), Seoul, South Korea, pp 1–4

16. Ma L, Li S, Ngan KN (2012) Reduced-reference image quality assessment in reorganized DCT domain. Signal Process Image Commun, in press. doi:10.1016/j.image.2012.08.001

17. Mannos JL, Sakrison DJ (1974) The effects of a visual fidelity criterion on the encoding of images. IEEE Trans Inf Theory 20(4):525–536

18. Moorthy AK, Su CC, Mittal A, Bovik AC (2012) Subjective evaluation of stereoscopic image quality. Signal Process Image Commun 28(8):870–883. doi:10.1016/j.image.2012.08.004

19. Park PK, Oh KJ, Ho YS (2008) Efficient view-temporal prediction structures for multi-view video coding. Electron Lett 44(2):102–103. doi:10.1049/el:20082082

20. Pinson MH, Wolf S (2004) A new standardized method for objectively measuring video quality. IEEE Trans Broadcast 50(3):312–322. doi:10.1109/TBC.2004.834028

21. Ponomarenko N, Silvestri F, Egiazarian K, Carli M, Astola J, Lukin V (2007) On between-coefficient contrast masking of DCT basis functions, Int. Workshop on Video Processing and Quality Metrics, USA

22. Sazzad ZMP, Yamanaka S, Kawayoke Y, Horita Y (2009) Stereoscopic image quality prediction. Proc IEEE QoMEX, San Diego, USA, pp 180–185

23. Seo J, Liu X, Kim D, Sohn K (2012) An objective video quality metric for compressed stereoscopic video. Circ Syst Signal Proc (Springer J) 31(3):1089–1107. doi:10.1007/s00034-011-9369-7

24. Sheikh HR, Bovik AC (2006) Image information and visual quality. IEEE Trans Image Process 15(2):430–444. doi:10.1109/TIP.2005.859378

25. Smolic A, Mueller K, Stefanoski N, Ostermann J et al (2007) Coding algorithms for 3DTV – a survey. IEEE Trans Circ Syst Video Technol 17(11):1606–1621. doi:10.1109/TCSVT.2007.909972

26. Smolic A, Muller K, Merkle P, Käuff P, Wiegand T (2009) An overview of available and emerging 3D video formats and depth enhanced stereo as efficient generic solution. Proc Picture Coding Symposium (PCS), Chicago, USA, pp 1–4

27. Solomon JA, Watson AB, Ahumada AJ (1994) Visibility of DCT basis functions: effects of contrast masking. Proceeding of Data Compression Conference, Snowbird, Utah, pp 361–370

28. Tsung PK, Ding LF, Chen WY, Chuang TD, Chen YH, Hsuao PH, Chien SY, Chen LG (2010) Video encoder design for high-definition 3D video communication systems. IEEE Commun Mag 48(4):76–86. doi:10.1109/MCOM.2010.5439080

29. Wang Z, Bovik A (2002) A universal image quality index. IEEE Trans Sig Process Lett 9:81–84. doi:10.1109/97.995823

30. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. doi:10.1109/TIP.2003.819861

31. Wang Z, Simoncelli EP, Bovik AC (2003) Multiscale structural similarity for image quality assessment. IEEE Asilomar Conf Signals Syst Comput 2:1398–1402. doi:10.1109/ACSSC.2003.1292216


32. Winkler S (2000) Visual model and quality metrics for image processing applications. Ph.D. thesis

33. Wu HR, Rao KR (2006) Digital video image quality and perceptive coding. CRC Press, Taylor & Francis Group, Boca Raton, London

34. Xing L, You J, Ebrahimi T, Perkis A (2010) An objective metric for assessing quality of experience on stereoscopic images. Proceeding of MMSP, Saint-Malo, France, pp 373–378

35. Yang J, Hou C, Zhou Y, Zhang Z, Guo J (2009) Objective quality assessment method of stereo images, 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D video, pp 1–4

36. Zhu Z, Wang Y (2009) Perceptual distortion metric for stereo video quality evaluation. WSEAS Trans Signal Process 5(7):241–250

Xingang Liu is an associate professor in the School of Electronic Engineering (EE) at the University of Electronic Science and Technology of China (UESTC). He received the B.S. degree from the School of EE at UESTC in 2000, and the M.S. and Ph.D. degrees from Yeungnam University, South Korea, in 2005 and 2010, respectively. From 2000 to 2003, he worked as a faculty member in the School of EE at UESTC. Dr. Liu was a BK21 research fellow in the School of Electrical and Electronic Engineering at Yonsei University, South Korea, from 2010 to 2011. Currently, he is also an Adjunct Professor at Dongguk University, South Korea. His research interests are multimedia signal communication related topics, such as heterogeneous/homogeneous video transcoding, video quality measurement (QoE), video signal error concealment, mode decision algorithms, 2D/3D video codecs and so on. He is a member of IEEE, KICS, and KSII.

Chao Sun is a master's student in the School of Electronic Engineering at the University of Electronic Science and Technology of China (UESTC). Her research interests include video compression, 2D/3D video quality assessment, and so on.


Laurence T. Yang is with the Department of Computer Science of St. Francis Xavier University, Canada. His current interests include parallel and distributed computing, and embedded and ubiquitous/pervasive computing. His research has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canada Foundation for Innovation (CFI). He has published more than 300 papers in various refereed journals, conference proceedings and book chapters in these areas, including around 100 international journal papers such as IEEE and ACM Transactions. He has been actively involved in conferences and workshops as a program/general/steering conference chair, and in numerous conferences and workshops as a program committee member. He served as the vice-chair of the IEEE Technical Committee of Supercomputing Applications (2001–2004), the chair of the IEEE Technical Committee of Scalable Computing (2008–2011), and the chair of the IEEE Task Force on Ubiquitous Computing and Intelligence (2009–now). In addition, he is the editor-in-chief of several international journals and serves as an editor for many others. He has been an author/co-author or an editor/co-editor of more than 25 books from well-known publishers. He has won several Best Paper Awards; the Distinguished Achievement Award, 2005 and 2011; and the Canada Foundation for Innovation Award, 2003. He has been invited to give around 20 keynote talks at various international conferences and symposia.
