
Engineering Applications of Artificial Intelligence 82 (2019) 263–271

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence

journal homepage: www.elsevier.com/locate/engappai

DD-CycleGAN: Unpaired image dehazing via Double-Discriminator Cycle-Consistent Generative Adversarial Network✩

Jingming Zhao a, Juan Zhang a,∗, Zhi Li a, Jenq-Neng Hwang b, Yongbin Gao a, Zhijun Fang a, Xiaoyan Jiang a, Bo Huang a

a School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, 201610, China
b Department of Electrical Engineering, University of Washington, WA, USA

ARTICLE INFO

Keywords: Haze removal; Generative adversarial network

ABSTRACT

Despite recent progress in image dehazing, the task remains tremendously challenging. To improve the performance of haze removal, we propose a scheme based on a Double-Discriminator Cycle-Consistent Generative Adversarial Network (DD-CycleGAN), which leverages CycleGAN to translate a hazy image into the corresponding haze-free image. Unlike other methods, it does not need pairs of hazy images and their corresponding haze-free images for training. Extensive experiments demonstrate that the proposed method achieves significant improvements over existing methods, both quantitatively and qualitatively. Our method also achieves good qualitative results when applied to real scenes.

1. Introduction

High-quality images are critically desired in the fields of traffic and security monitoring. However, images captured by cameras in outdoor environments often suffer from floating particles in the atmosphere (e.g., smoke, dust, haze, and liquid droplets). Haze has two main effects on images: contamination of the image with an additive component and attenuation of the light (Berman and Avidan, 2016). Specifically, hazy images captured in outdoor environments have poor picture quality and make it difficult to distinguish object features in the image. Therefore, image haze removal has become an important research field in computer vision.

In order to remove the effect of haze on images, previous dehazing methods usually follow a similar pipeline: (1) modeling the medium transmission, (2) refining the coarse transmission model, (3) estimating the global atmospheric light, and (4) reconstructing the latent image according to the predicted model parameters (Li et al., 2017). Generally, these traditional dehazing methods are divided into two types: image enhancement algorithms and model-based dehazing algorithms.

In recent years, deep learning based methods have attracted much attention in various fields, such as image classification (Zhang et al., 2018; Frid-Adar et al., 2018) and image inpainting (Yu et al., 2018), and in particular single image haze removal.

✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2019.04.003.
∗ Corresponding author.

E-mail addresses: [email protected] (J. Zhao), [email protected] (J. Zhang), [email protected] (Z. Li), [email protected] (J.-N. Hwang), [email protected] (Y. Gao), [email protected] (Z. Fang), [email protected] (X. Jiang), [email protected] (B. Huang).

However, methods based on traditional deep learning require paired datasets, i.e., a hazy image and its corresponding haze-free image are needed. Such paired hazy datasets are currently scarce, which impedes the study of deep learning in the field of dehazing. Our method requires neither paired samples of hazy and ground-truth images nor any parameters of the atmospheric scattering model during the training and testing phases.

In 2014, Goodfellow et al. (2014) proposed a new framework, called generative adversarial networks (GANs), which builds generative models through an adversarial estimation process; it has achieved great results in representation learning (Salimans et al., 2016; Mathieu et al., 2016) and image generation (Mathieu et al., 2015; Li et al., 2017c), and it effectively alleviates the problem of small datasets. Some state-of-the-art GAN based solutions (Zhang et al., 2017) have been proposed for single image dehazing, but they require the hazy input image and its ground truth in a paired manner. Recent methods adopt a similar idea for other image generation applications, such as future prediction (Mathieu et al., 2015) and image inpainting (Pathak et al., 2016), as well as other domains like 3D data (Wu et al., 2016) and videos (Vondrick et al., 2016).


https://doi.org/10.1016/j.engappai.2019.04.003
Received 17 October 2018; Received in revised form 11 March 2019; Accepted 1 April 2019
Available online 8 May 2019
0952-1976/© 2019 Elsevier Ltd. All rights reserved.


The algorithm uses unlabeled data to train the generative model, which learns the characteristics and distribution of the real data with the help of a simultaneously trained discriminator model and a small amount of labeled data. The network uses random noise as input to generate a (fake) sample image and then sends it to a discriminative model, which identifies true and false samples as accurately as possible, so that both models are continually optimized until they reach an equilibrium, that is, the discriminative model cannot judge whether the current sample comes from the generator or from the real data. In this way, a one-to-one paired mapping between the training data is still needed. The idea of an adversarial loss is the key point of GANs' success, as it forces the generated images to be more realistic and indistinguishable from real samples.
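As a hedged illustration of this adversarial game, the following is a minimal PyTorch-style sketch of one training step (the paper's own experiments use TensorFlow 1.3, so this is not the authors' code; `G`, `D` and the optimizers are placeholders, and `D` is assumed to output probabilities):

```python
import torch
import torch.nn.functional as nnF

def vanilla_gan_step(G, D, opt_G, opt_D, real, noise):
    # Discriminator step: push D(real) toward 1 and D(G(noise)) toward 0.
    opt_D.zero_grad()
    fake = G(noise).detach()                 # do not backprop into G here
    d_loss = nnF.binary_cross_entropy(D(real), torch.ones_like(D(real))) \
           + nnF.binary_cross_entropy(D(fake), torch.zeros_like(D(fake)))
    d_loss.backward()
    opt_D.step()

    # Generator step: push D(G(noise)) toward 1, i.e. fool the discriminator.
    opt_G.zero_grad()
    scores = D(G(noise))
    g_loss = nnF.binary_cross_entropy(scores, torch.ones_like(scores))
    g_loss.backward()
    opt_G.step()
```

At equilibrium the discriminator can no longer separate real from generated samples, which is exactly the state described above.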

In 2017, Zhu et al. (2017) proposed the Cycle-Consistent Adversarial Network (CycleGAN), which no longer requires paired images and thus provides a solution to the problem of scarce datasets. By effectively aggregating cycle-consistency and perceptual losses, the Cycle-Dehaze network (Engin et al., 2018) architecture has also been proposed as an end-to-end image dehazing scheme.

Inspired by Cycle-Consistent Adversarial Networks, this paper proposes a new type of network, called the Double-Discriminator Cycle-Consistent Adversarial Network (DD-CycleGAN), for dehazing road scenes. In this method, only a small amount of unpaired data is needed, which greatly reduces the difficulty of data collection.

In summary, the major contributions of this work are as follows:

(1) In traditional GANs, it is hard to balance the generator and the discriminator, and training easily falls into mode collapse. We therefore use weight clipping instead of cross entropy, which solves the problems of training instability and mode collapse; training also converges faster than in standard GANs.

(2) To make the discriminators better approximate the optimal discriminator and to ease the difficulty of balancing the generator and the discriminator, we increase the number of discriminators so that they provide more stable and reliable feedback to the generator. On this basis, we propose the Double-Discriminator CycleGAN (DD-CycleGAN) for haze removal.

(3) To the best of our knowledge, this is the first attempt to use a Double-Discriminator CycleGAN for dehazing; it can be readily and effectively adapted to traffic scenes, broadening the application scenarios of haze removal algorithms.

The remainder of this paper is organized as follows: Section 2 reviews related work on dehazing and several existing deep learning models. Section 3 presents our proposed DD-CycleGAN and the corresponding modifications to CycleGAN. Section 4 shows our simulation results and comparisons with those of other competing solutions, followed by the conclusion in Section 5.

2. Related works

The existing haze removal methods can be broadly classified into the following three categories: image enhancement algorithms (Xu et al., 2006; Rahman et al., 2004; Tan, 2008; Kim et al., 1998; Stark, 2000), model-based haze removal algorithms (Swami and Das, 2018; Tarel and Hautiere, 2009; Fattal, 2008; He et al., 2011; Nayar and Narasimhan, 1999; Narasimhan and Nayar, 2002, 2003, 2000) and deep learning approaches (Li et al., 2017c; Ren et al., 2016; Ling et al., 2016; Cai et al., 2016; Ren et al., 2018).

2.1. Traditional haze removal methods

Existing traditional haze removal methods can be broadly classified into the following two categories: image enhancement algorithms and model-based haze removal algorithms. Image enhancement algorithms improve the contrast of an image to achieve a dehazing effect by improving its visual quality. Model-based dehazing algorithms take advantage of an atmospheric model to analyze the cause of haze and restore the haze-free image by repairing the lost information of the image.

(1) Image enhancement algorithms. From a purely image processing point of view, enhancing the contrast of a hazy image and highlighting the characteristics or effective information of the picture can improve its visual quality. However, this approach ignores the real mechanism of image degradation, so the quality of images with complex scenes cannot be improved and some image information may even be lost. Common image enhancement algorithms include histogram equalization (Xu et al., 2006), multi-scale Retinex (Rahman et al., 2004), homomorphic filtering, and contrast enhancement (Tan, 2008; Kim et al., 1998). More specifically, Stark (2000) proposes histogram equalization of sub-blocks of an image. By replacing the mean gray level of a sub-block with the gray level after histogram equalization, this method, which processes pixels according to the neighborhood of each pixel, facilitates highlighting image characteristics. The multi-scale Retinex algorithm (Rahman et al., 2004) separates the irradiance and reflectance components in the hazy image, eliminating the effect of the irradiance component. However, this algorithm needs to calculate the illuminance component, which is mathematically an underdetermined problem and can only be approximated. Tan (2008) proposes a method that maximizes local contrast for haze removal, based on the observation that a haze-free image possesses better contrast than its corresponding hazy image (Swami and Das, 2018).

(2) Model-based haze removal algorithms. By establishing an atmospheric scattering model to study the physical principles of image degradation, and by obtaining the scattering effect of floating particles on light in the atmosphere and their influence on the image, a more realistic image with richer information can be recovered; this approach has a better haze removal effect in complex scenes (Tarel and Hautiere, 2009; Fattal, 2008; He et al., 2011; Nayar and Narasimhan, 1999; Narasimhan and Nayar, 2002, 2003, 2000). Nayar et al. (Nayar and Narasimhan, 1999; Narasimhan and Nayar, 2002, 2003, 2000) divide the influence of atmospheric reflection on light into the atmospheric attenuation of the scene light and the superposition of ambient light, and mitigate their adverse impacts separately, resulting in haze-free images with less information loss. Fattal (2008) estimates the albedo and transmission of a scene using Independent Component Analysis (ICA); however, unsatisfactory performance in dense haze and extensive computation time limit the application of this method (Swami and Das, 2018). He et al. (2011) propose a novel dark channel prior algorithm, which takes advantage of the observation that images captured in outdoor environments contain many dark pixels. This method fails in scenes where a major portion of the image is covered by light or similar objects, such as sky, and its computation is extensive.

2.2. Deep learning

In recent years, with the rapid development of deep learning, problems of haze removal have also been addressed using deep learning approaches (Li et al., 2017c; Ren et al., 2016; Ling et al., 2016; Cai et al., 2016; Ren et al., 2018). Cai et al. (2016) present an end-to-end system for image haze removal called DehazeNet, which uses deep learning to learn the characteristics of haze and avoids the difficulties of manual feature design. DehazeNet learns a mapping from a hazy image to the scene transmission map. Ren et al. (2016) present a network based on a multi-scale convolutional neural network (MSCNN). This approach requires manual tuning of gamma-correction parameters for the hazy input image, which is complicated. AOD-Net (Li et al., 2017c) estimates a new variable based on a transformation of the atmospheric scattering model.

2.3. Image-to-image translation via CycleGANs

The idea of image-to-image translation goes back to Hertzmann et al.'s Image Analogies (Hertzmann et al., 2001), where a non-parametric texture model is employed on a single input–output image pair (Efros and Leung, 1999). Recently, many approaches use a dataset of input–output examples to learn a translation function using CNNs (Long et al., 2015). Many problems in image processing, computer graphics, and computer vision can be formulated as an image-to-image translation task, for example, label to scene, aerial to map, day to night, edges to photo, and generating photographs from sketches or from attribute and semantic layouts (Sangkloy et al., 2017; Karacan et al., 2016). In this paper, we also formulate haze removal as an image-to-image translation task, but unlike prior works, we learn the mapping with unpaired training examples.

In certain cases, labeled ground truth is hard to obtain. Recently, some methods based on unpaired images have been developed to overcome this problem (Zhu et al., 2017; Dong et al., 2017; Yi et al., 2017a). Owing to the lack of labeled data, Dong et al. (2017) design an unsupervised framework, which succeeds in gender transformation and face swapping. Zhu et al. (2017) proposed a general unsupervised framework, called the Cycle-Consistent GAN (CycleGAN), inspired by pix2pix (Isola et al., 2016), which aims to minimize the reconstruction error between two sets of training data. Although CycleGAN handles the same tasks as pix2pix in an unsupervised way, it does not perform as well as pix2pix. Compared to traditional GANs, the input of the network is no longer noise but a picture, the method no longer depends on paired datasets, and a cycle consistency term (Chen et al., 2017; Godard et al., 2017) is introduced into the training loss.

Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as the haze removal addressed in this paper, by reducing the need for paired data. It is particularly powerful because it requires only unpaired examples from two image domains X and Y. The Cycle-Dehaze network (Engin et al., 2018) is one of the early attempts to use CycleGAN for an end-to-end image dehazing scheme. Cycle-Dehaze effectively aggregates cycle-consistency and perceptual losses, so it achieves the dehazing task better. To obtain high-resolution dehazed images, Cycle-Dehaze employs a simple upsampling method based on the Laplacian pyramid, and was successfully applied to the NTIRE 2018 single image dehazing challenge datasets, i.e., I-HAZE (Ancuti et al., 2018a) and O-HAZE (Ancuti et al., 2018b).

Inspired by this idea, we further introduce a more effective loss in our work to push the two generators to be consistent with each other, along the same line of research as Yi et al. (2017b), who use a similar objective to ours for dual learning in machine translation (He et al., 2016). To overcome the commonly observed mode collapse problem in adversarial training and to achieve more stable results, the Wasserstein GAN (WGAN) (Arjovsky et al., 2017) with the weight clipping technique is introduced to solve the problems of training instability and mode collapse in GANs. More specifically, weight clipping limits the absolute change of the weight updates in the discriminators with a predefined threshold, ensuring that a discriminator does not give very different values for two slightly different input samples.

Since CycleGANs have been successfully used to translate images from one style to another, thanks to their low requirements on the scale and labeling of the data, we believe they are also suitable for the situation where few hazy image datasets are available. By treating image restoration as image-to-image translation, we expect that CycleGANs can restore hazy scenes when trained with the original hazy images and some unpaired haze-free images. More specifically, a hazy image serves as the input, and the haze-free image is restored by the trained generator (Chen et al., 2017).

2.4. GANs with multiple discriminators

Durugkar et al. (2016) extend GANs with multiple discriminators, which are expected to be more stable in providing feedback to the generator. For one generator, multiple discriminators of the same structure, with random initialization, are utilized as teachers for the generator.

Specifically, multiple discriminators can better approximate the optimal discriminator. For example, when one of the discriminators converges to a state far superior to the generator, the other discriminator can still provide constructive gradients for updating the generator, instead of letting the generator's learning stall.

3. Proposed method

As shown in Fig. 1, our proposed Double-Discriminator CycleGAN (DD-CycleGAN) translates a hazy image into the corresponding haze-free image. In this section, an overview of the general architecture is given first, then the formulation is presented, and finally the learning procedure is described in detail.

3.1. Double-Discriminator CycleGAN (DD-CycleGAN)

The proposed DD-CycleGAN has two discriminators of the same structure against each generator, which alleviates the mode collapse problem of traditional GANs. By extending to multiple discriminators of the same structure, DD-CycleGAN can better approximate the optimal discriminator (Durugkar et al., 2016) and is more stable in providing reliable feedback for the generator (Xu et al., 2017). A bijective mapping is created between the two generators, so that images in the X domain can be put into correspondence with images in the Y domain (Zhu et al., 2017).

Fig. 1 shows the flow chart of the proposed Double-Discriminator Cycle-Consistent Adversarial Network (DD-CycleGAN). As illustrated in Fig. 1(a), our model includes two mappings: G maps X to Y with two discriminators, DY1 and DY2, on domain Y, and F maps Y to X with two discriminators, DX1 and DX2, on domain X. An image x in the X domain is mapped to the Y domain by generator G to produce image ŷ, that is, G: X → Y, with G(x) = ŷ. Similarly, taking ŷ as the input, generator F maps back to the X domain and generates x̂, that is, F: Y → X, with F(ŷ) = x̂. Note that DX1 and DX2 encourage F to translate Y into outputs indistinguishable from domain X, and we expect that when we translate an image into the other domain and back again, we should return to the original image where we began. Fig. 1(b) shows the forward training process of the DD-CycleGAN. An image x is taken as the input and sent to generator G to generate image ŷ. The discriminators DY1 and DY2 judge whether ŷ is a generated or a real image. The image x̂ is then generated by generator F, which encourages the final output to be indistinguishable from the input image, i.e., x̂ = F(G(x)) ≈ x. Fig. 1(c) shows the reverse training process. An image y is taken as the input of generator F to generate image x̂. The two discriminators DX1 and DX2 judge whether x̂ is a generated or a real image. The image ŷ is then generated by generator G, which also encourages the final output to be indistinguishable from the input image, i.e., ŷ = G(F(y)) ≈ y.
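To make this data flow concrete, the following is a minimal sketch of one forward and one reverse cycle (Python, in the PyTorch style of the later sketches; `G`, `F` and the four discriminators are placeholders for the networks described in Section 3.3):

```python
def cycle_pass(G, F, D_Y1, D_Y2, D_X1, D_X2, x, y):
    """One forward and one reverse pass through the DD-CycleGAN graph."""
    # Forward cycle: X -> Y -> X
    y_hat = G(x)                                 # translated (dehazed) image
    x_rec = F(y_hat)                             # reconstruction, should give x_rec ≈ x
    forward_scores = (D_Y1(y_hat), D_Y2(y_hat))  # two critics judge y_hat

    # Reverse cycle: Y -> X -> Y
    x_hat = F(y)                                 # translated (re-hazed) image
    y_rec = G(x_hat)                             # reconstruction, should give y_rec ≈ y
    reverse_scores = (D_X1(x_hat), D_X2(x_hat))  # two critics judge x_hat

    return y_hat, x_rec, forward_scores, x_hat, y_rec, reverse_scores
```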

3.2. Formulation

In a traditional GAN, two different images in the source domain X may both be mapped to the same image in the target domain Y. Pathak et al. (2016) show that adding a traditional loss function to the network can increase its effectiveness. In this paper, we follow Zhu et al. and adopt the L1 norm (Badrinarayanan et al., 2017) to measure the cycle consistency loss, so that images mapped to the Y domain can still be mapped back to the X domain through the other generator, making the final output both look like a real picture and resemble the input image. Furthermore, our method uses a loss function that better reflects the state of network training: as can be seen in Fig. 2, the L1 norm decreases continuously over the generator's iterations, and the dehazing effect improves accordingly (Arjovsky et al., 2017) because of the correlation between the error and the output picture.


Fig. 1. (a) Our proposed DD-CycleGAN contains two generators, G: X → Y and F: Y → X, and four discriminators, DX1, DX2, DY1 and DY2. (b) The forward training process takes image x as input and outputs image x̂. (c) The backward training process takes image y as input and outputs image ŷ.

Fig. 2. The L1 norm loss decreases steadily with the increasing number of iterations.

Fig. 3. Weight clipping converges faster and more stably than cross-entropy as the number of iterations increases.

The L1 norm loss function is:

L(G, F) = \mathbb{E}_{x \sim P_{data}(x)}\big[\|F(G(x)) - x\|_1\big] + \mathbb{E}_{y \sim P_{data}(y)}\big[\|G(F(y)) - y\|_1\big] \tag{1}

where G and F denote the two mapping functions, which we argue should be cycle-consistent: for an image x in the source domain X, the translation cycle should bring x back to the original image, i.e., x → G(x) → F(G(x)) ≈ x, and similarly for an input image y, y → F(y) → G(F(y)) ≈ y.
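Eq. (1) can be written directly in code; the following is a sketch under the assumption that `G` and `F` are the two generators and `x`, `y` are image batches from the two domains (PyTorch-flavored, whereas the paper's implementation uses TensorFlow 1.3):

```python
import torch.nn.functional as nnF

def cycle_consistency_loss(G, F, x, y):
    # Eq. (1): L(G, F) = E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1
    forward_term = nnF.l1_loss(F(G(x)), x)    # x -> G(x) -> F(G(x)) ≈ x
    backward_term = nnF.l1_loss(G(F(y)), y)   # y -> F(y) -> G(F(y)) ≈ y
    return forward_term + backward_term
```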

In this paper, the weight clipping technique is also used, instead of the cross-entropy loss of traditional GANs. As can be seen in Fig. 3, weight clipping converges faster than cross-entropy and its training process is more stable.
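A minimal sketch of the clipping step follows, assuming WGAN-style critics; the clipping threshold `c` is not specified in the paper, so the default of 0.01 from Arjovsky et al. (2017) is used here:

```python
import torch

def clip_discriminator_weights(discriminator, c=0.01):
    # After each discriminator update, force every weight into [-c, c]
    # so the critic stays approximately Lipschitz-bounded.
    with torch.no_grad():
        for p in discriminator.parameters():
            p.clamp_(-c, c)
```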

Weight clipping solves the problem of learning instability and increases the convergence rate of traditional GAN training. In addition, each generator corresponds to two discriminator models of the same structure, which enhances the stability of training and makes the haze removal effect more ideal. For the generator G and its corresponding discriminator networks DY1 and DY2, the loss function is defined as follows:

L_{GAN}(G, D_{Y1}, X, Y) = \mathbb{E}_{y \sim P_{data}(y)}\big[D_{Y1}(y)\big] - \mathbb{E}_{x \sim P_{data}(x)}\big[D_{Y1}(G(x))\big] \tag{2}

where the mapping function G tries to make the generated image G(x) look like images from domain Y, while the discriminator DY1 aims to distinguish whether an image is a real sample y or a generated image G(x). G is expected to minimize this function against DY1. We introduce a similar loss function for the discriminator DY2:

L_{GAN}(G, D_{Y2}, X, Y) = \mathbb{E}_{y \sim P_{data}(y)}\big[D_{Y2}(y)\big] - \mathbb{E}_{x \sim P_{data}(x)}\big[D_{Y2}(G(x))\big] \tag{3}

For generator G, we express the objective loss as:

L_{GAN}(G, D_{Y1}, D_{Y2}, X, Y) = \lambda_1 L_{GAN}(G, D_{Y1}, X, Y) + (1 - \lambda_1) L_{GAN}(G, D_{Y2}, X, Y) \tag{4}
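Eqs. (2)–(4) can be sketched as follows with WGAN-style critic scores (the function and variable names are illustrative only, not the authors' code):

```python
def adversarial_loss(gen, critic1, critic2, src, tgt, lam=0.5):
    # Eqs. (2)/(3): critic score gap between real targets and generated ones
    fake = gen(src)
    loss1 = critic1(tgt).mean() - critic1(fake).mean()   # Eq. (2)
    loss2 = critic2(tgt).mean() - critic2(fake).mean()   # Eq. (3)
    # Eq. (4): convex combination of the two critics' losses
    return lam * loss1 + (1.0 - lam) * loss2
```

Called as `adversarial_loss(G, d_y1, d_y2, x, y, lam=lambda1)`, this gives the objective for generator G; swapping in `F`, `d_x1`, `d_x2` and exchanging the domains gives Eqs. (5)–(7) below.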

Similarly, the loss functions are defined as follows for the mapping function F and its corresponding discriminator networks DX1 and DX2:

L_{GAN}(F, D_{X1}, Y, X) = \mathbb{E}_{x \sim P_{data}(x)}\big[D_{X1}(x)\big] - \mathbb{E}_{y \sim P_{data}(y)}\big[D_{X1}(F(y))\big] \tag{5}

L_{GAN}(F, D_{X2}, Y, X) = \mathbb{E}_{x \sim P_{data}(x)}\big[D_{X2}(x)\big] - \mathbb{E}_{y \sim P_{data}(y)}\big[D_{X2}(F(y))\big] \tag{6}

L_{GAN}(F, D_{X1}, D_{X2}, Y, X) = \lambda_2 L_{GAN}(F, D_{X1}, Y, X) + (1 - \lambda_2) L_{GAN}(F, D_{X2}, Y, X) \tag{7}

The final objective loss function is as follows:

L(G, F, D_{X1}, D_{X2}, D_{Y1}, D_{Y2}) = L_{GAN}(G, D_{Y1}, D_{Y2}, X, Y) + L_{GAN}(F, D_{X1}, D_{X2}, Y, X) + \lambda_3 L(G, F) \tag{8}

where λ1, λ2 and λ3 are the weights of L_GAN(G, D_Y1, X, Y), L_GAN(F, D_X1, Y, X) and L(G, F), respectively. We show the effect of these weights in Section 4.5.

The original CycleGAN fixes the weight of the adversarial loss to 1.0, and the cycle-consistency terms of the two generators share the same weight parameter. In addition, we find that the susceptibility to mode collapse differs across datasets, so we introduce λ3 as a hyperparameter.

Our final objective is

G^*, F^* = \arg\min_{G, F} \max_{D_{X1}, D_{X2}, D_{Y1}, D_{Y2}} L(G, F, D_{X1}, D_{X2}, D_{Y1}, D_{Y2}) \tag{9}

The generators G and F aim to minimize the objective loss function, while the discriminators aim to maximize it.
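Eq. (9) is optimized by alternating updates. The sketch below ties the previous snippets together (again an illustration under the same assumptions, not the authors' code; the λ values follow the settings reported in Section 4.1):

```python
def train_step(G, F, critics, opt_gen, opt_critic, x, y,
               lam1=0.5, lam2=0.5, lam3=10.0):
    d_y1, d_y2, d_x1, d_x2 = critics

    # 1) Critic step: gradient *ascent* on the adversarial losses.
    opt_critic.zero_grad()
    d_loss = adversarial_loss(G, d_y1, d_y2, x, y, lam1) \
           + adversarial_loss(F, d_x1, d_x2, y, x, lam2)
    (-d_loss).backward()                  # ascend by descending the negative
    opt_critic.step()
    for d in critics:
        clip_discriminator_weights(d)     # keep the critics bounded

    # 2) Generator step: descend adversarial + cycle-consistency losses.
    opt_gen.zero_grad()
    g_loss = adversarial_loss(G, d_y1, d_y2, x, y, lam1) \
           + adversarial_loss(F, d_x1, d_x2, y, x, lam2) \
           + lam3 * cycle_consistency_loss(G, F, x, y)
    g_loss.backward()
    opt_gen.step()
```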


Fig. 4. Architecture of the generator, including three parts: encoder, converter and decoder; different color blocks represent different operations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Encoder layers of the discriminator, which continuously downsample the image; the final output is the probability used to determine whether the image is a true sample or comes from the generator.

3.3. Network architecture

The CNN structures of the generator model and the discriminator model used in this paper are as follows:

(1) Generator Architecture
The generator network contains three parts, an encoder, a converter and a decoder, which are composed of stride-2 convolutional layers, 6 residual blocks, and 3 fractionally-strided convolutional layers, respectively. The network is illustrated in Fig. 4.

The encoder performs a series of downsampling operations on the input image; the converter then transforms the feature vectors of the picture from the source domain into feature vectors of the target domain; finally, a series of upsampling operations performed in the decoder recovers low-level features from the feature vectors.
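The following sketch mirrors this encoder/converter/decoder layout in PyTorch (the channel widths, kernel sizes and normalization layers are assumptions borrowed from the standard CycleGAN generator, since the paper only specifies stride-2 convolutions, 6 residual blocks and fractionally-strided convolutions):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)              # residual connection

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(        # downsampling with stride-2 convolutions
            nn.Conv2d(3, 64, 7, stride=1, padding=3), nn.ReLU(True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(True))
        self.converter = nn.Sequential(*[ResBlock(256) for _ in range(6)])
        self.decoder = nn.Sequential(        # fractionally-strided (transposed) convolutions
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(True),
            nn.Conv2d(64, 3, 7, stride=1, padding=3), nn.Tanh())

    def forward(self, x):
        return self.decoder(self.converter(self.encoder(x)))
```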

(2) Discriminator Architecture
For the discriminator networks we use a PatchGAN (Isola et al., 2016), which aims to identify whether each patch, rather than the whole image, is generated by the generator. As shown in Isola et al. (2016), visually similar results can be obtained with 256 × 256 receptive fields using an ImageGAN and with 70 × 70 receptive fields using a PatchGAN. Therefore, in this paper we mainly experiment with the 70 × 70 PatchGAN. The average of the response values of all patches over the entire image is then taken as the final judgment of the image. The network is illustrated in Fig. 5.
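A sketch of such a 70 × 70 patch critic is given below (the layer widths follow the common PatchGAN configuration and are assumptions; no sigmoid is applied, since the weight-clipped discriminators act as WGAN critics rather than classifiers):

```python
import torch.nn as nn

class PatchCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(256, 512, 4, stride=1, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(512, 1, 4, stride=1, padding=1))   # one score per patch

    def forward(self, x):
        # Average the patch responses over the whole image.
        return self.net(x).mean(dim=(1, 2, 3))
```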

4. Experimental results and discussions

4.1. Training details

The experimental platform is implemented with Python 3.5 and TensorFlow 1.3. The hardware configuration is a server with a 1.80 GHz 64-core CPU, a GeForce GTX 1080 Ti GPU, and 128 GB of memory.

In our experiments, we use the method presented by Shrivastava et al. (2017), updating the discriminators using historical images rather than only the images generated by the latest generator networks: we store the 50 previously generated images in an image buffer. Our implementation is based on CycleGAN, so unless stated otherwise the parameters are identical to those of the original CycleGAN, and we verified several parameters to ensure the best results. In all experiments presented, we set λ1 = 0.5, λ2 = 0.5 (Durugkar et al., 2016) and λ3 = 10 (Zhu et al., 2017), and use the Adam solver (Kingma and Ba, 2014) with a batch size of 1. We also intercepted the first 1000 steps of training to observe the effect of the learning rate on the training process; the results are shown in Fig. 6. The initial learning rate is 0.0002; it is kept constant for the first 100 epochs and then decays linearly to 0 over the next 100 epochs.
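A sketch of such a history buffer follows (the class name and the 50/50 sampling rule are assumptions in the spirit of Shrivastava et al. (2017) and the CycleGAN reference implementation):

```python
import random

class ImageHistoryBuffer:
    """Stores up to `capacity` previously generated images for the critics."""

    def __init__(self, capacity=50):
        self.capacity = capacity
        self.images = []

    def query(self, image):
        if len(self.images) < self.capacity:
            self.images.append(image)
            return image
        if random.random() < 0.5:
            # Swap the new image in and hand an old one to the discriminator.
            idx = random.randrange(self.capacity)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image
```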


Fig. 6. The choice of learning rate. We test learning rates of 0.0001, 0.0002 and 0.0003.

Table 1
Ablation study on O-HAZE.

       CycleGAN + Weight clipping   CycleGAN + Double Discriminator   Ours
SSIM   0.7593                       0.7642                            0.7957
PSNR   23.432                       24.655                            25.859


4.2. Dataset

We provide results of our experiments on the Realistic Single Image Dehazing (RESIDE) dataset (Li et al., 2017b), a large-scale benchmark of synthetic and real-world hazy images; we use its SOTS and HSTS test subsets. We also use the O-HAZE dataset, which contains 45 pairs of real hazy and corresponding haze-free outdoor images; its haze was generated with a professional haze machine that imitates real hazy conditions with high fidelity.

4.3. The effect of the discriminators in DD-CycleGAN

To illustrate the influence of different factors in the proposed DD-CycleGAN, two widely used metrics, PSNR and SSIM (Wang, 2004), are adopted to evaluate haze removal quality in terms of signal and structural similarity. Specifically, we evaluate the difference between the haze-free output and the corresponding ground truth with these two metrics.
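Both metrics are available in scikit-image; the following usage sketch assumes `dehazed` and `ground_truth` are uint8 RGB arrays and scikit-image ≥ 0.19 (for the `channel_axis` argument):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(dehazed, ground_truth):
    psnr = peak_signal_noise_ratio(ground_truth, dehazed, data_range=255)
    ssim = structural_similarity(ground_truth, dehazed,
                                 data_range=255, channel_axis=-1)
    return psnr, ssim
```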

As can be seen in Fig. 7, we run experiments based on different network variants. In Fig. 7(c), we use weight clipping rather than the cross entropy of the vanilla CycleGAN, which already gives a good effect. In Fig. 7(d), we add the double discriminators to the vanilla CycleGAN, and the haze removal effect is further improved. In Fig. 7(e), we combine the above methods; as can be seen in Table 1, our proposed DD-CycleGAN achieves the highest performance, which verifies the feasibility of our method. This ablation study demonstrates the improvements obtained by the different modules of the proposed method.

Table 2
Dehazing results of DD-CycleGAN without and with ResBlocks.

       DD-CycleGAN without ResBlock   DD-CycleGAN with 6 ResBlocks
SSIM   0.7345                         0.7957
PSNR   18.769                         25.859

Table 3
Experimental results for λ1 and λ2.

       λ1 = 0.3, λ2 = 0.3   λ1 = 0.5, λ2 = 0.5
SSIM   0.7721               0.7957
PSNR   23.985               25.859

4.4. Different architectures of the generator

In our method, we add 6 ResBlocks to the network. To prove the effectiveness of the residual blocks, we run DD-CycleGAN with 6 ResBlocks and without ResBlocks, and compare the experimental results.

As can be seen from Fig. 8, Fig. 8(c) shows DD-CycleGAN without ResBlocks: some of the haze is still not removed in the results. Fig. 8(d) shows DD-CycleGAN with ResBlocks: the results show a good dehazing effect and achieve higher PSNR and SSIM. As shown in Table 2, DD-CycleGAN with ResBlocks achieves the highest performance.

4.5. Effect of the weights in the loss function

Taking into account the advantages of multiple discriminators, we set the weights in the loss function following Durugkar et al. (2016): λ1 and λ2 are both set to 0.5, which balances the performance of the two discriminators when necessary. As shown in Table 3, when λ1 = 0.5 and λ2 = 0.5, the PSNR and SSIM are the highest, which further demonstrates the rationality of this parameter setting.

4.6. Effect of optimizers in DD-CycleGAN

We experimented with three optimizers, SGDM, RMSProp and Adam; the results are as follows.

As can be seen in Fig. 9 and Table 4, DD-CycleGAN with Adam achieves the highest SSIM and PSNR and removes haze better.

4.7. Quantitative evaluation

We experiment on two different datasets. As can be seen in Tables 5 and 6, regardless of which dataset is chosen, our method performs best, as indicated by the highest PSNR and SSIM.

4.8. Image analysis

Fig. 10 shows a comparison of the dehazing results of five methods. The results of MSCNN are not as good as the others, and the results of DCP, DehazeNet and AOD-Net are not bright enough. Our proposed method retains a good dehazing capability and greatly enhances the contrast of the image.

4.9. Test on realistic images

DD-CycleGAN is also applied to real scenes, where it achieves good qualitative results, which demonstrates the feasibility of the method (see Fig. 11).


Fig. 7. Qualitative dehazing performance in the ablation studies. From left to right: input hazy image, ground truth, CycleGAN + weight clipping, CycleGAN + double discriminators, and DD-CycleGAN (ours).

Fig. 8. Qualitative dehazing performance with and without ResBlocks. From left to right: input, ground truth, DD-CycleGAN without ResBlock, and DD-CycleGAN with ResBlock.

Fig. 9. The effect of the optimizer in our method. From left to right: input, ground truth, DD-CycleGAN with SGDM, DD-CycleGAN with RMSProp, and DD-CycleGAN with Adam.


Table 4
The effect of optimizers.

       DD-CycleGAN with SGDM   DD-CycleGAN with RMSProp   DD-CycleGAN with Adam
SSIM   0.7846                  0.7812                     0.7957
PSNR   24.163                  24.723                     25.859

Table 5
Dehazing results on RESIDE.

Dataset   DCP             MSCNN           DehazeNet       AOD-Net         Ours
          PSNR    SSIM    PSNR    SSIM    PSNR    SSIM    PSNR    SSIM    PSNR      SSIM
HSTS      16.62   0.8179  17.57   0.8102  21.14   0.8472  19.06   0.8504  25.0134   0.8853
SOTS      14.84   0.7609  18.64   0.8168  24.48   0.9153  20.55   0.8973  28.4285   0.9344

Fig. 10. Dehazing results on RESIDE. From left to right: DCP, MSCNN, DehazeNet, AOD-Net and Ours.

Fig. 11. Dehazing results on real-world hazy images. From left to right: DCP, MSCNN, DehazeNet, AOD-Net and Ours.

Table 6
Dehazing results on O-HAZE.

       DCP      MSCNN    DehazeNet   Ours
SSIM   0.735    0.765    0.666       0.7957
PSNR   16.586   19.068   16.207      25.859

5. Conclusion

In this paper, we leverage CycleGAN and improve upon it for image dehazing, yielding the Double-Discriminator Cycle-Consistent Adversarial Network (DD-CycleGAN). CycleGAN learns a one-to-one mapping, which ensures that every source hazy image is mapped to a corresponding target haze-free image. The two-discriminator architecture better approximates the optimal discriminator and achieves better results on images with different haze concentrations and scene depths; in particular, its predictions in regions with a higher concentration of haze are better than those of current mainstream dehazing methods, which demonstrates the feasibility of the method.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant 61702322, Grant 61772328, and Grant 61801288.

References

Ancuti, Codruta O., et al., 2018a. I-HAZE: a dehazing benchmark with real hazy and haze-free indoor images. arXiv preprint arXiv:1804.05091.
Ancuti, Codruta O., et al., 2018b. O-HAZE: a dehazing benchmark with real hazy and haze-free outdoor images. arXiv preprint arXiv:1804.05101.
Arjovsky, M., Chintala, S., Bottou, L., 2017. Wasserstein GAN. arXiv preprint arXiv:1701.07875.
Badrinarayanan, Vijay, Kendall, Alex, Cipolla, Roberto, 2017. SegNet: a deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39 (12), 2481–2495.
Berman, D., Avidan, S., 2016. Non-local image dehazing. In: CVPR, pp. 1674–1682.
Cai, B., Xu, X., Jia, K., Qing, C., Tao, D., 2016. DehazeNet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 25 (11), 5187–5198.
Chen, Xingyu, et al., 2017. Towards qualitative advancement of underwater machine vision with generative adversarial networks. arXiv preprint arXiv:1712.00736.
Dong, H., Neekhara, P., Wu, C., Guo, Y., 2017. Unsupervised image-to-image translation with generative adversarial networks. arXiv preprint arXiv:1701.02676.
Durugkar, Ishan, Gemp, Ian, Mahadevan, Sridhar, 2016. Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673.
Efros, Alexei A., Leung, Thomas K., 1999. Texture synthesis by non-parametric sampling. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, IEEE, pp. 1033–1038.
Engin, Deniz, Genç, Anıl, Ekenel, Hazım Kemal, 2018. Cycle-Dehaze: Enhanced CycleGAN for single image dehazing. arXiv preprint arXiv:1805.05308.
Fattal, Raanan, 2008. Single image dehazing. ACM Trans. Graph. 27 (3).
Frid-Adar, M., Diamant, I., Klang, E., et al., 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing.
Godard, Clément, Mac Aodha, Oisin, Brostow, Gabriel J., 2017. Unsupervised monocular depth estimation with left–right consistency. In: CVPR.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Bengio, Y., et al., 2014. Generative adversarial nets. In: NIPS, pp. 2672–2680.
He, Kaiming, Sun, Jian, Tang, Xiaoou, 2011. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33 (12), 2341–2353.
He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T., Ma, W.-Y., 2016. Dual learning for machine translation. In: NIPS, pp. 820–828.
Hertzmann, Aaron, et al., 2001. Image analogies. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. ACM, pp. 327–340.
Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A., 2016. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004.
Karacan, L., Akata, Z., Erdem, A., Erdem, E., 2016. Learning to generate images of outdoor scenes from attributes and semantic layouts. arXiv preprint arXiv:1612.00215.
Kim, Tae Keun, Paik, Joon Ki, Kang, Bong Soon, 1998. Contrast enhancement system using spatially adaptive histogram equalization with temporal filtering. IEEE Trans. Consum. Electron. 44 (1), 82–87.
Kingma, Diederik P., Ba, Jimmy, 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Li, C., Guo, J., Porikli, F., Guo, C., Fu, H., Li, X., 2017. DR-Net: Transmission steered single image dehazing network with weakly supervised refinement. arXiv preprint arXiv:1712.00621.
Li, B., Peng, X., Wang, Z., Xu, J., Feng, D., 2017c. AOD-Net: All-in-one dehazing network. In: IEEE International Conference on Computer Vision.
Li, B., Ren, W., Fu, D., et al., 2017b. Benchmarking single image dehazing and beyond. IEEE Trans. Image Process. PP (99), 492–505.
Ling, Z., Fan, G., Wang, Y., Lu, X., 2016. Learning deep transmission network for single image dehazing. In: 2016 IEEE International Conference on Image Processing (ICIP). IEEE, pp. 2296–2300.
Long, Jonathan, Shelhamer, Evan, Darrell, Trevor, 2015. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
Mathieu, Michael, Couprie, Camille, LeCun, Yann, 2015. Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440.
Mathieu, M.F., Zhao, J., Ramesh, A., Sprechmann, P., LeCun, Y., 2016. Disentangling factors of variation in deep representation using adversarial training. In: NIPS, pp. 5040–5048.
Narasimhan, Srinivasa G., Nayar, Shree K., 2000. Chromatic framework for vision in bad weather. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, IEEE, pp. 598–605.
Narasimhan, Srinivasa G., Nayar, Shree K., 2002. Vision and the atmosphere. Int. J. Comput. Vis. 48 (3), 233–254.
Narasimhan, Srinivasa G., Nayar, Shree K., 2003. Contrast restoration of weather degraded images. IEEE Trans. Pattern Anal. Mach. Intell. 25 (6), 713–724.
Nayar, Shree K., Narasimhan, Srinivasa G., 1999. Vision in bad weather. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, IEEE, pp. 820–827.
Pathak, Deepak, et al., 2016. Context encoders: Feature learning by inpainting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544.
Rahman, Z.U., Jobson, D.J., Woodell, G.A., 2004. Retinex processing for automatic image enhancement. J. Electron. Imaging 13 (1), 100–111.
Ren, W., Liu, S., Zhang, H., Pan, J., Cao, X., Yang, M.H., 2016. Single image dehazing via multi-scale convolutional neural networks. In: European Conference on Computer Vision. Springer, Cham, pp. 154–169.
Ren, W., Ma, L., Zhang, J., et al., 2018. Gated fusion network for single image dehazing.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., 2016. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498.
Sangkloy, P., Lu, J., Fang, C., Yu, F., Hays, J., 2017. Scribbler: Controlling deep image synthesis with sketch and color. In: CVPR.
Shrivastava, Ashish, et al., 2017. Learning from simulated and unsupervised images through adversarial training. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Stark, J. Alex, 2000. Adaptive image contrast enhancement using generalizations of histogram equalization. IEEE Trans. Image Process. 9 (5), 889–896.
Swami, Kunal, Das, Saikat Kumar, 2018. CANDY: Conditional adversarial networks based fully end-to-end system for single image haze removal. arXiv preprint arXiv:1801.02892.
Tan, Robby T., 2008. Visibility in bad weather from a single image. In: CVPR, pp. 1–8.
Tarel, Jean-Philippe, Hautiere, Nicolas, 2009. Fast visibility restoration from a single color or gray level image. In: 2009 IEEE 12th International Conference on Computer Vision. IEEE, pp. 2201–2208.
Vondrick, C., Pirsiavash, H., Torralba, A., 2016. Generating videos with scene dynamics. In: NIPS, pp. 613–621.
Wang, Z., 2004. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 13, 604–606.
Wu, J., Zhang, C., Xue, T., Freeman, B., Tenenbaum, J., 2016. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: NIPS, pp. 82–90.
Xu, T.-y., Peng, D.-m., Wang, W.-x., 2006. Improved histogram equalization algorithm. Ordnance Industry Automation, pp. 58–59.
Xu, Runze, et al., 2017. Face transfer with generative adversarial network. arXiv preprint arXiv:1710.06090.
Yi, Z., Zhang, H., Gong, P.-T., 2017a. DualGAN: Unsupervised dual learning for image-to-image translation. arXiv preprint arXiv:1704.02510.
Yi, Zili, et al., 2017b. DualGAN: Unsupervised dual learning for image-to-image translation. arXiv preprint arXiv:1704.02510.
Yu, J., Lin, Z., Yang, J., et al., 2018. Generative image inpainting with contextual attention.
Zhang, Y.D., Muhammad, K., Tang, C., 2018. Twelve-layer deep convolutional neural network with stochastic pooling for tea category classification on GPU platform. Multimedia Tools Appl.
Zhang, H., Sindagi, V., Patel, V.M., 2017. Joint transmission map estimation and dehazing using deep networks. arXiv preprint arXiv:1708.00581.
Zhu, J.Y., Park, T., Isola, P., Efros, A.A., 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593.

