
Benchmarking Differentially Private Residual Networks for Medical Imagery

Sahib Singh *1 2 3, Harshvardhan D. Sikka *1 3 4, Sasikanth Kotti 1 5, Andrew Trask 1 6

Abstract

In this paper we measure the effectiveness of ε-Differential Privacy (DP) when applied to medical imaging. We compare two robust differential privacy mechanisms, Local-DP and DP-SGD, and benchmark their performance when analyzing medical imagery records. We analyze the trade-off between the model's accuracy and the level of privacy it guarantees, and also take a closer look at how useful these theoretical privacy guarantees actually prove to be in a real-world medical setting.

Introduction

Hospitals and other medical institutions often have vast amounts of medical data which can provide significant value when utilized to advance research. However, this data is often sensitive in nature and, largely due to privacy concerns, is not readily available for use in a research setting. We seek privacy-preserving mechanisms that let medical facilities make better use of their data while maintaining the degree of privacy required.

To facilitate secure model training we compare two approaches to Differential Privacy relevant to image data: Local-DP and DP-SGD. We benchmark their performance on the following publicly available medical datasets:

1) Chest X-Ray Images Pneumonia Detection Dataset (Kermany et al., 2018): The Chest X-Ray dataset consists of approximately 5,800 images sourced from chest radiography, which medical specialists use to confirm pneumonia and other medical concerns, though radiographs are rarely the sole basis of a diagnosis.

*Equal contribution. 1OpenMined 2Ford R&A 3Manifold Computing Group 4Harvard University 5Indian Institute of Technology Jodhpur 6University of Oxford. Correspondence to: Sahib Singh <[email protected]>, Harshvardhan D. Sikka <[email protected]>, Andrew Trask <[email protected]>.

Different radiographic images taken at separate time intervals, such as before and during an illness, are often useful to physicians during diagnosis; in general, these images form an important part of an often multi-stage diagnostic process. The percentage of deaths attributed to pneumonia and influenza is 8.2% in the United States, exceeding the 7.2% threshold for epidemic classification (CDC, 2020). Recently, deaths due to pneumonia have sharply increased with the worldwide spread of COVID-19 and the SARS-CoV-2 virus. The rapid construction and evaluation of relevant models to track, diagnose, or support the treatment and mitigation of COVID-19 is critical given global circumstances.

2) APTOS 2019 Blindness Detection Dataset (Hospital, 2019): The dataset comprises approximately 3,600 training images of the retina taken using fundus photography under a variety of imaging conditions. These retinal images can help identify diabetic retinopathy automatically. Diabetic retinopathy (DR) is a diabetes complication affecting the eyes, caused by damage to the blood vessels of the light-sensitive tissue at the back of the eye (the retina). A clinician has rated each image for severity of diabetic retinopathy on a scale of 0 to 4, where 0 is No DR and 4 is Proliferative DR (Figure 1).

Given the urgent need for these developments and the inherently sensitive nature of medical data, training and evaluating models while obfuscating critical personal information in the data corpus is vital. This analysis of DP mechanisms seeks to help medical professionals better understand the tradeoff between accuracy and data privacy, and to serve as a useful reference for evaluating how sensitive information can be preserved while the data remains useful for research purposes. Furthermore, the analysis should help practitioners evaluate which kind of DP mechanism is more relevant for their use case.

We start by reviewing related research in the DP space, followed by an introduction to the DP mechanisms as they pertain to our findings. We then describe the experiments conducted. Finally, we discuss the significance of these findings and explore future directions.

arXiv:2005.13099v5 [cs.LG] 5 Sep 2020


Figure 1. Examples from the APTOS Blindness Detection Dataset, progressing in severity of Diabetic Retinopathy: (a) No DR, (b) Mild DR, (c) Moderate DR, (d) Severe DR, (e) Proliferative DR.

Related WorkIt’s worth noting that there are several techniques to achieveDifferential Privacy for images. The DP techniques used inthis experiment include Local-DP (LDP) which adds pertur-bations directly to the image, and DP-Stochastic GradientDescent (DP-SGD) which achieves the privacy guaranteesby adding noise to the gradient during the model trainingprocess.

Lately, there has also been research on applying Differential Privacy during the test-time inference stage in deep learning, including mechanisms such as Cloak (Mireshghallah et al., 2020) and ARDEN (Wang et al., 2018). There are also well understood cryptographic protocols, such as Homomorphic Encryption (HE) and Secure Multi-Party Computation (SMC), which can encrypt data during the training and inference phases. However, the computational costs associated with these cryptographic techniques can be prohibitively expensive. A comprehensive survey of such privacy-preserving techniques can be found in Mirshghallah et al. (2020).

We would additionally like to acknowledge similar work done earlier in a more theoretical setting (Fan, 2019). Our paper builds upon that work and applies it to the healthcare domain, medical research in particular. Other works on general privacy-preserving techniques for health records include (Kim et al., 2018; Kaissis et al., 2020; Adam et al., 2007; Qayyum et al., 2020; Dankar & El Emam, 2013; Hossein et al., 2019; Faravelon & Verdier, 2010). Research on implementation techniques for secure deep learning includes (Ryffel et al.; Dahl et al., 2018; Mireshghallah et al.).

Preliminaries

In this section we discuss the fundamental privacy-preserving concepts used throughout the paper.

Differential Privacy (DP). (Dwork et al., 2006a;b)

The central idea of differential privacy is the introduction of randomized noise to ensure privacy through plausible deniability. Based on this idea, for ε ≥ 0, an algorithm A satisfies Differential Privacy if and only if, for any pair of datasets D and D′ that differ in at most one element, the following holds:

P[A(D) = t] ≤ e^ε · P[A(D′) = t]   ∀t

where P[A(D) = t] denotes the probability that A outputs t. This setting approximates the effect of individual opt-outs by minimizing the effect of including any single individual's data.

Local Differential Privacy (LDP). (Kasiviswanathan et al., 2011)

One major limitation of Differential Privacy is that data owners must trust a central authority, i.e. the database maintainer, to ensure their privacy. To obtain stronger privacy guarantees we therefore use the concept of Local Differential Privacy (Bebensee, 2019). We say that an algorithm π satisfies ε-Local Differential Privacy, where ε > 0, if and only if for any pair of inputs v and v′:

∀y ∈ Range(π): P[π(v) = y] ≤ e^ε · P[π(v′) = y]

For ε-LDP the privacy loss is captured by ε. Having ε = 0 ensures perfect privacy, since e^0 = 1; on the other hand, ε = ∞ provides no privacy guarantee. The choice of ε is important because privacy risk grows in proportion to e^ε: for example, ε = ln 2 ≈ 0.69 allows the output distributions on neighboring inputs to differ by at most a factor of 2.

Differentially Private Stochastic Gradient Descent (DP-SGD). (Abadi et al., 2016)

Stochastic Gradient Descent (SGD) is an iterative method for optimizing differentiable objective functions. It updates weights and biases by calculating the gradient of a loss function on small batches of data. DP-SGD is a modification of the stochastic gradient descent algorithm which provides provable privacy guarantees. It differs from SGD in that it bounds the sensitivity of each gradient and is paired with a moments accountant algorithm to track the cumulative privacy loss across weight updates.

To make SGD differentially private (i.e. DP-SGD), two modifications are made to the original SGD algorithm. First, the sensitivity of each gradient must be bounded, which is done by clipping each gradient in the l2 norm. Second, random noise is added to the clipped gradients; their sum, multiplied by the learning rate, is then used to update the model parameters.
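To make these two steps concrete, the following is a minimal PyTorch sketch of a single DP-SGD update. It is not the authors' implementation: the Gaussian noise scale follows Abadi et al. (2016), and names such as clip_norm and noise_multiplier are illustrative assumptions.

import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y,
                lr=0.01, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD update: clip each per-sample gradient in the l2 norm,
    sum, add noise, and apply the averaged step to the parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # 1) Bound the sensitivity of each per-sample gradient by clipping
    #    its overall l2 norm to at most clip_norm.
    for x, y in zip(batch_x, batch_y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (float(total_norm) + 1e-6))
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # 2) Add random noise to the summed, clipped gradients, then take a
    #    gradient step scaled by the learning rate and batch size.
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p.add_(s + noise, alpha=-lr / len(batch_x))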


Figure 2. Comparison of a sample image from the dataset before and after Local-DP based obfuscation: (a) Original Image, (b) β = 1, (c) β = 2, (d) β = 4.


Laplace Distribution. (Dwork et al., 2014)

The Laplace distribution, also known as the double-exponential distribution, is a symmetric version of the exponential distribution. The distribution centered at 0 (i.e. µ = 0) with scale β has the probability density function

Lap(x | β) = (1 / (2β)) · exp(−|x| / β)

and its variance is σ² = 2β².

Laplace Mechanism. (Dwork et al., 2006b; 2014)

The Laplace Mechanism independently perturbs each coordinate of the output with Laplace noise (drawn from a Laplace distribution with mean zero) scaled to the sensitivity of the function.

Given ε ≥ 0 and a target function f, the Laplace Mechanism is the randomizing algorithm

A_f(D) = f(D) + x

where x is a random variable drawn from a Laplace distribution with scale ∆f / ε, corresponding to the perturbation. Here ∆f is the global sensitivity of f, defined as the maximum of ‖f(D) − f(D′)‖₁ over all dataset pairs (D, D′) that differ in only one element.
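As a concrete illustration, here is a minimal sketch (not code from the paper) of the Laplace Mechanism applied to an identity query on a normalized image, the setting used in the experiments below where f is the identity and ∆f = 1; the image shape and choice of ε are assumptions.

import numpy as np

def laplace_mechanism(value, epsilon, sensitivity=1.0):
    # Perturb each coordinate with Laplace noise of scale β = Δf / ε.
    beta = sensitivity / epsilon
    return value + np.random.laplace(loc=0.0, scale=beta, size=value.shape)

# Example: a 256x256 image with pixel values normalized to [0, 1].
image = np.random.rand(256, 256)
noisy = laplace_mechanism(image, epsilon=0.5)  # β = Δf / ε = 2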

Experimental Results

The experiments discussed in this section used an 18-layer Residual Network (ResNet) previously trained to convergence on the ImageNet task. ResNets share many ideas with the popular VGG architecture (Simonyan & Zisserman, 2014; Szegedy et al., 2017), with significantly fewer filters and overall decreased complexity. They make use of identity connections between sets of layers as a solution to the common problem of gradient signals vanishing during backpropagation in very deep networks (He et al., 2016). To train the model on these images, some pre-processing steps were undertaken: input images passed to the deep neural network were scaled to 256 × 256 pixels and normalized to 1. Therefore, the function f is the identity function and the sensitivity ∆f is 1. The experiments were all carried out using Python 3.8.2 and PyTorch 1.4.0.
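That pre-processing can be expressed as a standard torchvision transform; the following is a plausible sketch of the pipeline, not the authors' exact code.

import torchvision.transforms as T

# Scale inputs to 256x256 and map pixel values into [0, 1], so that an
# identity query on an image has sensitivity Δf = 1.
preprocess = T.Compose([
    T.Resize((256, 256)),
    T.ToTensor(),  # converts to a float tensor with values in [0, 1]
])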

Figure 3. Accuracy vs Epochs for Pneumonia Detection with LDP


For Local-DP, three additional versions of each dataset were generated by adding different perturbations to the images. These alternate, differentially private datasets were generated by drawing random samples from the Laplace Mechanism described in (Dwork et al., 2006b) with µ = 0 and varying scale β. These perturbations were added directly to the input images to create noisy representations for subsequent training, and the resulting datasets were used in different experiments to train the ResNet-18 model to convergence. We trained separate instances of the pre-trained ResNet-18 model on these three image datasets for 50 epochs each. The best models from these runs were saved and analyzed, and the tradeoff in accuracy across perturbation scales (β = 1, β = 2, β = 4) was examined. A sketch of this dataset generation step appears below.
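This is a hedged sketch of that generation step, reusing the Laplace noise from the Preliminaries; train_images is a hypothetical stand-in for the real dataset, not a name from the paper.

import numpy as np

# Hypothetical stand-in for the normalized training images, shape (N, 256, 256).
train_images = np.random.rand(10, 256, 256)

# One noisy copy of the training set per perturbation scale studied.
noisy_datasets = {
    beta: train_images + np.random.laplace(0.0, beta, train_images.shape)
    for beta in (1, 2, 4)
}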

In the case of DP-SGD, we make the two modifications to the original SGD algorithm described earlier: we clip each gradient in the l2 norm, add random noise to it, and then multiply by the learning rate before updating the model parameters. As in the LDP experiments, the best models from these runs were saved and analyzed, and the tradeoff in accuracy across noise scales (β = 1, β = 2, β = 4) was examined.


Figure 4. Model Accuracy (%) — DP mechanism vs Dataset

Pneumonia Detection

Setup:

The experimental setup consisted of training the selected model on an image classification task on the Chest X-Ray dataset (Kermany et al., 2018). The LDP experiments followed the process outlined earlier, resulting in three noisy datasets; examples of perturbed images can be seen in Figure 2. The DP-SGD experiments involved the modifications outlined earlier, including gradient clipping and multiplication by the learning rate prior to model parameter updates. The ResNet-18 model has around 11M parameters, was pre-trained on ImageNet, and was trained using the SGD optimizer with a learning rate of 0.01 and a batch size of 128. The network was trained for 50 epochs; a sketch of this configuration follows.
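A sketch of that configuration under stated assumptions: the dummy dataset object and the two-class output head are illustrative, not taken from the paper.

import torch
import torchvision

# ImageNet pre-trained ResNet-18 (~11M parameters); the final layer is
# replaced for a two-class pneumonia task (an assumption).
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

# Dummy TensorDataset standing in for the Chest X-Ray images.
train_dataset = torch.utils.data.TensorDataset(
    torch.randn(256, 3, 256, 256), torch.randint(0, 2, (256,)))
loader = torch.utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True)

for epoch in range(50):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()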

Results (Figure 4): Best model accuracies on both the train and test sets from the 50-epoch runs were saved. On the original Chest X-Ray dataset, the model achieved 99.76% accuracy on the training data and 90.22% on the held-out test set. We observe that training accuracy drops significantly, for both LDP and DP-SGD, as we increase β from 1 to 4. On the other hand, the test accuracy does not reflect this trend. Results for a longer run of 120 epochs are also included in Figure 3 as learning curves.

Blindness Detection

Setup: This experimental setup consisted of training the model on the APTOS dataset described earlier. The simulations were structured in the same way for both LDP and DP-SGD, using the same ResNet-18 model previously trained on ImageNet. Preprocessing, postprocessing, and evaluation steps were identical to the Pneumonia Detection experiments.

Results (Figure 4): As before, the best model accuracies on both the train and test sets from the 50-epoch runs were saved. On the original APTOS dataset, the model achieved 98.39% accuracy on the training data and 80.21% on the held-out test set. Similar to the behaviour on the Pneumonia Detection task, training accuracy drops significantly for both LDP and DP-SGD as we increase β from 1 to 4, while the test accuracy does not reflect this trend.

Conclusion

In this paper, we provided an empirical evaluation of a fine-tuned ResNet model trained on medically relevant classification tasks. We observe that the DP noise mechanisms lead to varied results at different perturbation levels, highlighting the inherent trade-offs in these decisions. As expected, model accuracy declines as we increase the level of privacy guarantees (β).

We noticed in the experiments that while the training accuracy dropped significantly as we increased β, the test accuracy did not change much. We also observed that in some cases the training accuracy is lower than the test accuracy (Figure 4). One possible explanation is that Local-DP adds noise to the training data, making the latent features harder to learn; when the model is later run on the test data, it performs better because the latent features are now relatively easy to capture, the model having already learned representations in a noisy setting.

We observe that while Local-DP maintains the theoretical guarantee of Differential Privacy, it does not always provide the visual privacy we expect. This could be a problem during the inference stage, where passing an image with all sensitive features intact could cause security breaches. This suggests an interesting future direction: experimenting with more Local-DP techniques, particularly those which take sensitive features into account. Another promising research direction is benchmarking differential privacy mechanisms across various data modalities, including audio, video, text, tabular data, and various forms of cyber data.


References

COVIDView, Week 13. Technical report, Centers for Disease Control and Prevention (CDC), 2020.

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318, 2016.

Adam, N., White, T., Shafiq, B., Vaidya, J., and He, X. Privacy preserving integration of health care data. In AMIA Annual Symposium Proceedings, volume 2007, pp. 1. American Medical Informatics Association, 2007.

Bebensee, B. Local differential privacy: a tutorial. arXiv preprint arXiv:1907.11908, 2019.

Dahl, M., Mancuso, J., Dupis, Y., Decoste, B., Giraud, M., Livingstone, I., Patriquin, J., and Uhma, G. Private machine learning in TensorFlow using secure computation. arXiv preprint arXiv:1810.08130, 2018.

Dankar, F. K. and El Emam, K. Practicing differential privacy in health care: A review. Transactions on Data Privacy, 6(1):35–67, 2013.

Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503. Springer, 2006a.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265–284. Springer, 2006b.

Dwork, C., Roth, A., et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.

Fan, L. Differential privacy for image publication. 2019.

Faravelon, A. and Verdier, C. Towards a framework for privacy preserving medical data mining based on standard medical classifications. In International Conference on Electronic Healthcare, pp. 204–211. Springer, 2010.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

Hospital, A. E. APTOS 2019 diabetic retinopathy dataset. 2019. URL https://www.kaggle.com/c/aptos2019-blindness-detection/data.

Hossein, K. M., Esmaeili, M. E., Dargahi, T., and Khonsari, A. Blockchain-based privacy-preserving healthcare architecture. In 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), pp. 1–4, 2019.

Kaissis, G. A., Makowski, M. R., Rueckert, D., and Braren, R. F. Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence, pp. 1–7, 2020.

Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., and Smith, A. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.

Kermany, D. S., Goldbaum, M., Cai, W., Valentim, C. C., Liang, H., Baxter, S. L., McKeown, A., Yang, G., Wu, X., Yan, F., et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell, 172(5):1122–1131, 2018.

Kim, J. W., Jang, B., and Yoo, H. Privacy-preserving aggregation of personal health data streams. PLoS ONE, 13(11):e0207639, 2018.

Mireshghallah, F., Taram, M., Jalali, A., Elthakeb, A. T., Tullsen, D., and Esmaeilzadeh, H. Interpretable privacy for deep learning inference.

Mireshghallah, F., Taram, M., Jalali, A., Elthakeb, A. T., Tullsen, D., and Esmaeilzadeh, H. A principled approach to learning stochastic representations for privacy in deep neural inference. arXiv preprint arXiv:2003.12154, 2020.

Mirshghallah, F., Taram, M., Vepakomma, P., Singh, A., Raskar, R., and Esmaeilzadeh, H. Privacy in deep learning: A survey. arXiv preprint arXiv:2004.12254, 2020.

Qayyum, A., Qadir, J., Bilal, M., and Al-Fuqaha, A. Secure and robust machine learning for healthcare: A survey. arXiv preprint arXiv:2001.08103, 2020.

Ryffel, T., Trask, A., Dahl, M., Wagner, B., Mancuso, J., Rueckert, D., and Passerat-Palmbach, J. A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017, 2018.

Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

Wang, J., Zhang, J., Bao, W., Zhu, X., Cao, B., and Yu, P. S. Not just privacy: Improving performance of private deep learning in mobile cloud. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2407–2416, 2018.

