A Bayesian Neural Net to Segment Images with Uncertainty Estimates and Good Calibration

Rohit Jena and Suyash P. Awate

Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India

[email protected]

Abstract. We propose a novel Bayesian decision theoretic deep-neural-network (DNN) framework for image segmentation, enabling us to define a principled measure of uncertainty associated with label probabilities. Our framework estimates uncertainty analytically at test time, unlike the state of the art that relies on approximate and expensive algorithms using sampling or multiple passes. Moreover, our framework leads to a novel Bayesian interpretation of the softmax layer. We propose a novel method to improve DNN calibration. Results on three large datasets show that our framework improves segmentation quality and calibration, and provides more realistic uncertainty estimates, over existing methods.

Keywords: Image segmentation · Deep neural network · Bayesian decision theory · Generative model · Bayesian utility · Uncertainty · Calibration

1 Introduction

Deep neural networks (DNNs) have been successful at image segmentation in radiology and digital pathology [2,3,9,10,14–17]. Typical DNN methods propose new architectures, e.g., dual-path convolutional [10] or skip connected [3,16,17], or new loss functions, e.g., based on weighted cross entropy [14] or the Dice similarity coefficient (DSC) [15,17], to improve performance.

For clinical decision support relying on automated image segmentation, e.g., radiotherapy and neurosurgery, exposing the uncertainty in segmentation [4,13] can lead to better informed decisions or better outcomes. It can also improve reliability in scientific studies. In DNN-based segmentation, the per-voxel label probabilities can be unreliable because of, e.g., poor quality of the data due to low contrast or high noise, high variability in the object structure, etc.

S. P. Awate—Supported by: Wadhwani Research Centre for Bioengineering (WRCB) IIT Bombay, Department of Biotechnology (DBT) Govt. of India BT/INF/22/SP23026/2017; Nvidia GPU Grant Program; Whiterabbit.ai Inc.; Aira Matrix.

© Springer Nature Switzerland AG 2019. A. C. S. Chung et al. (Eds.): IPMI 2019, LNCS 11492, pp. 3–15, 2019. https://doi.org/10.1007/978-3-030-20351-1_1


In such cases, a clear unique "answer", i.e., label probabilities, at a voxel, fails to exist, but rather multiple answers are almost equally likely. We propose a novel Bayesian decision theoretic DNN framework that leads to sound interpretations of the DNN architecture and outputs, and, in turn, enables us to define and efficiently infer the variability/uncertainty over label probabilities per voxel.

A DNN classifier is well calibrated [8] if, for the subset of the data for which the DNN assigns the probability of being in (some) class C to be within the interval (p − δ, p + δ) (termed "confidence"), the empirical fraction of the data actually being in class C (termed "accuracy") equals p. In practice, typical DNN classifiers are poorly calibrated [8], producing class probabilities that are significantly overestimated (near 1) or underestimated (near 0), even when the data around that voxel is very ambiguous. In the literature on DNN frameworks for image segmentation, miscalibration has been largely ignored. In our Bayesian framework, we propose a novel method to improve calibration for segmentation.
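To make the binned confidence-versus-accuracy comparison concrete, here is a minimal numpy sketch of an expected-calibration-error style measure over equal-width confidence bins; the bin count and the toy inputs are illustrative assumptions, not the exact protocol of [8].

    import numpy as np

    def expected_calibration_error(confidences, correct, num_bins=10):
        # Binned |accuracy - confidence| gap, weighted by how many voxels fall in each bin.
        # confidences: predicted probability of the chosen class per voxel, in [0, 1].
        # correct: 1 if the chosen class matches the expert label, else 0.
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, num_bins + 1)
        ece, n = 0.0, len(confidences)
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if not np.any(in_bin):
                continue
            acc = correct[in_bin].mean()        # empirical "accuracy" in this confidence bin
            conf = confidences[in_bin].mean()   # mean "confidence" in this bin
            ece += (in_bin.sum() / n) * abs(acc - conf)
        return ece

    # toy, made-up predictions: high confidence but only half correct -> large gap
    print(expected_calibration_error([0.95, 0.9, 0.92, 0.88], [1, 0, 0, 1]))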

This paper makes several contributions. We propose a novel DNN framework for image segmentation rooted in generative modeling and Bayesian decision theory, which leads to a Bayesian interpretation of the DNN's architecture and outputs, including a novel interpretation of the softmax layer. Furthermore, our Bayesian DNN framework enables a sound definition and efficient inference of the uncertainty in per-voxel label probabilities. We propose a novel method to improve DNN calibration, while improving segmentation quality. Evaluation on 3 large datasets, qualitatively and with a variety of quantitative measures, shows that our framework outperforms the state of the art in improving segmentation quality and calibration, and providing realistic uncertainty estimates.

2 Related Work

Typical works in DNN-based image segmentation propose modifications to the DNN architecture or loss functions heuristically or without a statistical interpretation. Such approaches make it difficult to define a sound measure of uncertainty of the resulting DNN outputs. In contrast, our novel Bayesian decision theoretic framework allows a clear statistical interpretation of label probabilities and the associated uncertainties, at each voxel. Moreover, our framework (i) provides a Bayesian interpretation of the popular softmax layer in DNNs and (ii) includes DSC-based optimization through the principle of Bayesian utility.

In non-DNN based segmentation methods, to estimate uncertainty, while [4] uses traditional MCMC to sample nonparametric curves, [13] uses a Gaussian-process approximation for label distributions. [18] performs uncertainty estimation in superresolution by formulating it as a patchwise regression problem and using variational dropout. Recently, [7] proposes perfect MCMC in probabilistic graphical models to estimate uncertainty, while [1] models expected per-voxel segmentation error in ensembles. On the other hand, this paper focuses on DNN frameworks for image segmentation.

A recent DNN-based method [11] provides uncertainty estimates using a sampling-based approach, but their approach primarily focuses on continuous-valued regression tasks where they assume a Gaussian probability density function (PDF) on the DNN outputs, which can be a poor approximation for discrete labels or label-probability vectors in segmentation tasks. For classification tasks, they append a softmax layer to the DNN and sample the resulting outputs. However, unlike their regression framework, their classification framework (i) entails estimating twice the number of parameters in the last layer and (ii) does not penalize small variances explicitly, thereby risking DNN overfitting or miscalibration. In contrast, we rely on a generative model that treats the DNN outputs as the parameters of a multivariate PDF on the label-probability vectors. The mean and covariance of this PDF give us label probabilities and their uncertainty analytically, without needing any Monte Carlo approximations at test time. Another DNN approach estimates (epistemic) uncertainty based on dropout [5], which suffers from poor calibration; this can be improved by parameter tuning after a continuous relaxation of the discrete Bernoulli-distribution dropout scheme [6]. However, dropout-based approaches focus on epistemic uncertainty (that can be reduced using larger training sets), instead of aleatoric uncertainty (stemming from noise and artifacts in the test datum), which we focus on. Also, dropout-based methods require MCMC approximation with multiple forward passes through the DNN at test time, unlike our approach that infers uncertainty analytically after a single forward pass. Furthermore, unlike the theoretical frameworks in [5,6,11], our Bayesian formulation mathematically derives the softmax layer as part of the DNN architecture.

A simple way of calibrating a DNN classifier is temperature scaling [8]. While the literature on DNN-based image segmentation largely ignores the issue of miscalibration, we propose a novel scheme to improve calibration in our Bayesian framework by introducing an additional utility function.

3 Methods

Our formulation for DNN training and inference for image segmentation (i) relies on utility maximization in Bayesian decision theory, (ii) gives a mathematically sound interpretation to DNN outputs to estimate uncertainty in per-voxel label probabilities, and (iii) introduces a utility-based scheme to improve calibration.

3.1 Our Bayesian DNN Framework: Modeling and Formulation

The random field X models an acquired intensity image comprising V voxels. The random field Z models an expert-labeled discrete segmentation indicating the presence of one of K objects at each voxel. At voxel v, we represent the presence of the k-th object using a K-length vector $Z_v$ having its k-th component $Z_{vk}$ as 1 and all other components as 0. Any single expert-rated segmentation Z is typically imperfect because of intra-rater and inter-rater variability stemming from imaging artifacts and human errors in the labeling. Thus, our framework theoretically allows for multiple expert segmentations associated with each acquired image X.


Fig. 1. Our Bayesian decision theoretic DNN framework to estimate uncertainty with segmentation. DNN components φ(·) and ψ(·) are generic. Green boxes ≡ our novel theoretical analysis to estimate per-voxel uncertainty. (Color figure online)

The random field Y models the true discrete segmentation that is unknown. The joint distribution P(X, Y; Ω) models statistical dependencies between acquired images X and their true segmentations Y, where Ω is the set of real-valued DNN parameters. We formulate a novel Bayesian DNN framework that, for a given x, outputs distributions on the label-probability images and, in turn, the true segmentations Y. The training set $\{(x^n, z^n)\}_{n=1}^{N}$ has N image pairs $(x^n, z^n)$, one pair for every x and each of its expert segmentations z (Fig. 1).

Learning to Maximize Bayesian Utility. For DNN training, we propose the Bayesian decision strategy of maximizing utility. For an acquired image x, its true segmentation y should be close to its expert segmentation(s) z. We measure the utility of an estimated segmentation y by the multi-label DSC $\mathrm{DSC}_K(y, z)$ between y and z. If the training set provides multiple expert segmentations z for a single x, then our formulation would effectively add such utilities over all available z. The multi-label DSC is $\mathrm{DSC}_K(Y, Z) := \sum_{k=1}^{K} \mathrm{DSC}(Y_k, Z_k)$, where $Y_k$ and $Z_k$ are binary label images indicating the k-th object. For gradient-based training, we use the differentiable DSC as $\mathrm{DSC}(Y_k, Z_k) := (2 \sum_{v=1}^{V} Y_{vk} Z_{vk} + \epsilon) / (\sum_{v=1}^{V} Y_{vk} + \sum_{v=1}^{V} Z_{vk} + \epsilon)$ [15], with a fixed small $\epsilon \in \mathbb{R}_{\geq 0}$ for numerical stability; we set $\epsilon = 1$. Because our DNN output indicates a distribution $P(Y|x; \Omega)$ of true segmentations, we define an expected utility with respect to z as $E_{P(Y|x;\Omega)}[\mathrm{DSC}_K(Y, z)]$. We train the DNN by optimizing parameters Ω to maximize the empirical expected utility over the training set as

$$\arg\max_{\Omega} \sum_{n=1}^{N} E_{P(Y|x^n;\Omega)}\big[\mathrm{DSC}_K(Y, z^n)\big]. \qquad (1)$$
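As a concrete illustration of the differentiable multi-label DSC above (with ε = 1), here is a minimal numpy sketch; the (V, K) array layout for soft predictions and one-hot expert labels is an illustrative convention, not the paper's implementation.

    import numpy as np

    def soft_dsc(y_k, z_k, eps=1.0):
        # DSC(Y_k, Z_k) = (2 * sum_v Y_vk Z_vk + eps) / (sum_v Y_vk + sum_v Z_vk + eps), eps = 1 as in the text
        return (2.0 * np.sum(y_k * z_k) + eps) / (np.sum(y_k) + np.sum(z_k) + eps)

    def multi_label_dsc(y, z, eps=1.0):
        # DSC_K(Y, Z) = sum_k DSC(Y_k, Z_k); y and z are (V, K) arrays (an assumed layout)
        return sum(soft_dsc(y[:, k], z[:, k], eps) for k in range(y.shape[1]))

    # toy example with V = 4 voxels and K = 2 labels
    z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)      # expert one-hot labels
    y = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])   # soft predictions
    print(multi_label_dsc(y, z))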

Bayesian DNN Model, Formulation, and Architecture. We formulate a novel DNN framework that produces, at each voxel v, a PDF on label-probability vectors $\Theta_v \in \mathbb{R}^K$, where each component $\Theta_{vk} \in (0, 1)$ and $\sum_{k=1}^{K} \Theta_{vk} = 1$. This PDF indicates the variability in the probability-vector outputs $\Theta_v$ at each voxel v, as a function of the acquired image X and the DNN parameters Ω. A small variability indicates a lower uncertainty on the label probabilities.

We propose to model the PDF on $\Theta_v$, at each voxel v, by a Dirichlet PDF parametrized by the positive concentration vector $\alpha_v \in \mathbb{R}^K_{>0}$. The per-voxel Dirichlet PDF is $P(\Theta_v|\alpha_v) := \prod_{k=1}^{K} (\Theta_{vk})^{\alpha_{vk}-1} / B(\alpha_v)$, where the Beta function $B(\alpha_v) := \prod_{k=1}^{K} \Gamma(\alpha_{vk}) / \Gamma(\sum_{k=1}^{K} \alpha_{vk})$ with Γ(·) as the gamma function. Subsequently, we propose the measure of uncertainty in the per-voxel label-probability vector $\Theta_v$ as the square root of the trace of the covariance matrix, which evaluates to the square root of $(\alpha_{v0}^2 - \sum_{k=1}^{K} \alpha_{vk}^2) / (\alpha_{v0}^2 (1 + \alpha_{v0}))$, where $\alpha_{v0} := \sum_{k=1}^{K} \alpha_{vk}$.
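The closed-form uncertainty above can be checked numerically: the following numpy sketch computes the square root of the trace of the Dirichlet covariance both from the closed form and by summing the per-component variances; the concentration values are made up.

    import numpy as np

    def dirichlet_uncertainty(alpha_v):
        # sqrt of the trace of the Dirichlet covariance:
        # sqrt((alpha_v0^2 - sum_k alpha_vk^2) / (alpha_v0^2 * (1 + alpha_v0))), alpha_v0 = sum_k alpha_vk
        alpha_v = np.asarray(alpha_v, dtype=float)
        a0 = alpha_v.sum()
        return np.sqrt((a0 ** 2 - np.sum(alpha_v ** 2)) / (a0 ** 2 * (1.0 + a0)))

    def uncertainty_from_variances(alpha_v):
        # same quantity obtained by summing the per-component Dirichlet variances
        # Var(Theta_vk) = alpha_vk * (alpha_v0 - alpha_vk) / (alpha_v0^2 * (alpha_v0 + 1))
        alpha_v = np.asarray(alpha_v, dtype=float)
        a0 = alpha_v.sum()
        var = alpha_v * (a0 - alpha_v) / (a0 ** 2 * (a0 + 1.0))
        return np.sqrt(var.sum())

    alpha = [2.0, 1.0, 0.5]   # made-up concentration vector for one voxel, K = 3
    print(dirichlet_uncertainty(alpha), uncertainty_from_variances(alpha))  # identical values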

We model our DNN framework to output the values $\alpha_{vk}$ as a combination of two parts in sequence. The first part is modeled by a set of real-valued parameters ω, underlying the deep architecture, and represents a nonlinear transformation φ(X, ω) that performs feature extraction on the acquired intensity image X. The subsequent part takes the features φ(x, ω) and produces K images, where the k-th image is the same size as the acquired image x and is modeled by the transformation ψ(·, ω_k) using real-valued parameters ω_k. Let the DNN parameters $\Omega := \{\omega, \{\omega_k\}_{k=1}^{K}\}$. Let $[\psi(\phi(x, \omega), \omega_k)]_v$ denote the v-th voxel in the k-th transformed image. At voxel v, we model the Dirichlet PDF $\mathrm{Dir}(\Theta_v|\alpha_v)$ with parameters $\alpha_{vk} := \exp([\psi(\phi(x, \omega), \omega_k)]_v)$, ensuring $\alpha_{vk} > 0$. This gives

$$P(\Theta|\alpha) := \prod_{v=1}^{V} P(\Theta_v|\alpha_v) := \prod_{v=1}^{V} \mathrm{Dir}(\Theta_v|\alpha_v) = \prod_{v=1}^{V} \frac{1}{B(\alpha_v)} \prod_{k=1}^{K} (\Theta_{vk})^{\alpha_{vk}-1}. \qquad (2)$$

The PDF $P(\Theta_v|\alpha_v)$ generates label-probability vectors $\theta_v$ that, in turn, generate discrete labels $y_v$, at voxel v. We model $P(Y_v|\Theta_v)$ as a categorical distribution $\mathrm{Cat}(Y_v|\Theta_v)$, parametrized by the probability vector $\Theta_v$, on the one-hot vector $Y_v$ that indicates the true (latent) segmentation at voxel v. Thus,

$$P(Y|\Theta) := \prod_{v=1}^{V} P(Y_v|\Theta_v) := \prod_{v=1}^{V} \mathrm{Cat}(Y_v|\Theta_v) = \prod_{v=1}^{V} \prod_{k=1}^{K} (\Theta_{vk})^{Y_{vk}}. \qquad (3)$$

We have a generative model for the true segmentation Y starting from the acquired image X, i.e., (i) map X to α, (ii) then draw $\theta \sim P(\Theta|\alpha(X, \Omega))$, and (iii) then draw $y \sim P(Y|\theta)$. We propose to model P(Y|X) by treating Θ as a hidden random variable and marginalizing it out. Thus, we simplify $P(Y|X) = P(Y|\alpha(X, \Omega)) = \int_{\theta} P(Y|\theta) P(\theta|\alpha(X, \Omega)) \, d\theta$. P(Θ|α) factors into per-voxel Dirichlet PDFs that are conjugate "priors" to the categorical distribution factors in P(Y|Θ). Thus, we can model per-voxel "posterior" factors

$$P(Y_v|\alpha_v) = \int_{\theta_v} P(Y_v|\theta_v) P(\theta_v|\alpha_v) \, d\theta_v = \frac{\int_{\theta_v} \prod_{k=1}^{K} (\theta_{vk})^{\alpha_{vk} + Y_{vk} - 1} \, d\theta_v}{B(\alpha_v)} = \frac{B(\alpha_v + Y_v)}{B(\alpha_v)}. \qquad (4)$$

This makes $P(Y_v = [1, 0, \cdots, 0] \,|\, \alpha_v) = \alpha_{v1} / \sum_{k=1}^{K} \alpha_{vk}$, which equals the softmax value $\exp([\psi(\phi(x, \omega), \omega_1)]_v) / \sum_{k=1}^{K} \exp([\psi(\phi(x, \omega), \omega_k)]_v)$. For other one-hot vectors $Y_v$, similar expressions hold, which are the outputs of the softmax function applied to the DNN layer giving $[\psi(\phi(x, \omega), \omega_k)]_v$ at voxel v. Thus, we derive

$$P(Y|X; \Omega) = \prod_{v=1}^{V} P(Y_v|\alpha_v(X, \Omega)) = \prod_{v=1}^{V} \mathrm{Cat}(Y_v; \beta_v(X, \Omega)), \qquad (5)$$

where $\beta_v$ is a (softmax) label-probability K-vector with $\beta_{vk} := \alpha_{vk} / \sum_{k'=1}^{K} \alpha_{vk'}$.
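The identity underlying Eq. (5) is easy to verify numerically: with $\alpha_{vk} = \exp([\psi(\phi(x,\omega),\omega_k)]_v)$, the marginal label probability $\alpha_{vk} / \sum_k \alpha_{vk}$ coincides with the softmax of the same layer outputs. A small numpy check, with made-up logits standing in for the DNN layer:

    import numpy as np

    logits = np.array([1.3, -0.2, 0.5])        # made-up [psi(phi(x, w), w_k)]_v for K = 3 labels at one voxel
    alpha = np.exp(logits)                      # Dirichlet concentrations alpha_vk = exp(logit_k)
    beta = alpha / alpha.sum()                  # P(Y_v = e_k | alpha_v) = alpha_vk / alpha_v0

    softmax = np.exp(logits) / np.exp(logits).sum()   # ordinary softmax over the same layer outputs
    print(np.allclose(beta, softmax))                 # True: the softmax layer falls out of the marginalization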

3.2 Our Bayesian DNN Training

DNN training optimizes parameters Ω to maximize empirical expected utility:

$$\max_{\Omega} \sum_{n=1}^{N} E_{P(Y|x^n;\Omega)}\big[\mathrm{DSC}_K(Y, z^n)\big] \;\approx\; \max_{\Omega} \sum_{n=1}^{N} \sum_{s=1}^{S} \mathrm{DSC}_K\big(y^{ns}(x^n, \Omega), z^n\big), \qquad (6)$$

where we evaluate the intractable expectation by Monte-Carlo integration, sampling discrete segmentations $y^{ns}$ drawn (easily) from $\prod_{v=1}^{V} \mathrm{Cat}(Y_v|\beta_v(x^n, \Omega))$.

To use gradient-based optimization for the DNN parameters Ω, we need an appropriate representation of the sampled segmentations $y^{ns}$ involving the acquired image $x^n$ and parameters Ω. We can sample exactly from a categorical distribution with parameters $\{\beta_{vk} \in \mathbb{R}_{>0}\}_{k=1}^{K}$ by (i) drawing $\{g_{vk}\}_{k=1}^{K}$ independently from a Gumbel PDF with location 0 and scale 1, and then (ii) taking $\arg\max_k(\log \beta_{vk} + g_{vk})$ [12]. Further, we can approximate the non-differentiable $\arg\max_k(\cdot)$ function by the softmax function to give a K-length representation of the categorical variable whose k-th component equals $\exp((\log \beta_{vk} + g_{vk})/\tau) / \sum_{k'=1}^{K} \exp((\log \beta_{vk'} + g_{vk'})/\tau)$, where $\tau \in \mathbb{R}_{>0}$ is a free parameter that balances the fidelity of the approximation with the ease of differentiability. The aforementioned softmax fraction equals $\exp(([\psi(\phi(x^n, \omega), \omega_k)]_v + g_{vk})/\tau) / \sum_{k'=1}^{K} \exp(([\psi(\phi(x^n, \omega), \omega_{k'})]_v + g_{vk'})/\tau)$. So, our training formulation is

$$\arg\max_{\Omega} \sum_{n=1}^{N} \sum_{s=1}^{S} \mathrm{DSC}_K\big(y^{ns}(x^n, \Omega), z^n\big), \quad \text{where } \forall v, k:$$

$$y^{ns}_{vk} := \frac{\exp\big(([\psi(\phi(x^n, \omega), \omega_k)]_v + g^{ns}_{vk})/\tau\big)}{\sum_{k'=1}^{K} \exp\big(([\psi(\phi(x^n, \omega), \omega_{k'})]_v + g^{ns}_{vk'})/\tau\big)}; \qquad g^{ns}_{vk} \sim \mathrm{Gum}(0, 1). \qquad (7)$$

We tune τ using annealing. We start training to optimize Ω with a larger τ that makes the objective function smoother. Subsequently, we decrease τ, initialize Ω with the solution obtained in the previous step, and again optimize for Ω. A few annealing iterations suffice in practice. In our experiments, we start with τ = 10 and linearly decrease it to τ = 0.1 during training.
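A minimal numpy sketch of the Gumbel-softmax relaxation in Eq. (7) at a single voxel, together with a linear τ schedule from 10 to 0.1 as described above; the logit values, step counts, and random seed are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)   # fixed seed for reproducibility of this toy example

    def gumbel_softmax_sample(logits, tau):
        # relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau)
        g = rng.gumbel(loc=0.0, scale=1.0, size=logits.shape)
        z = (logits + g) / tau
        z = z - z.max()              # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def tau_schedule(step, total_steps, tau_start=10.0, tau_end=0.1):
        # linear annealing of the temperature from tau_start to tau_end
        frac = step / max(total_steps - 1, 1)
        return tau_start + frac * (tau_end - tau_start)

    logits = np.array([2.0, 0.5, -1.0])    # made-up [psi(phi(x, w), w_k)]_v at one voxel, K = 3
    for step in (0, 50, 99):               # illustrative 100-step schedule
        tau = tau_schedule(step, 100)
        print(tau, gumbel_softmax_sample(logits, tau))   # approaches a one-hot vector as tau shrinks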

Training to Improve Calibration. To make our DNN well calibrated, during optimization of the parameters Ω, we introduce another utility function focusing on the intermediate-layer DNN outputs $[\psi(\phi(x, \omega), \omega_k)]_v$, to prevent these values from having large magnitudes. Larger magnitudes of these intermediate-layer outputs typically lead to larger discrepancies between magnitudes of the Dirichlet parameters $\alpha_{vk}$, which, in turn, lead to generated probability vectors $\theta_v$ that severely overestimate or underestimate class probabilities at voxel v, causing miscalibration. We propose to measure the utility of the DNN parameters/model by (i) not only being able to lead to high DSCs with the expert segmentations, but also (ii) being well calibrated. Thus, in addition to $\mathrm{DSC}_K(y(x, \Omega), z)$, we introduce an additional utility term on the DNN model that penalizes large intermediate-layer magnitudes, i.e., we subtract $\lambda \sum_{v=1}^{V} \sum_{k=1}^{K} ([\psi(\phi(x, \omega), \omega_k)]_v)^2$ from the objective to improve calibration, where the weighting factor $\lambda \in \mathbb{R}_{>0}$ is a free parameter that we tune by cross validation. Values of λ ranging from $5 \times 10^{-4}$ to $5 \times 10^{-3}$ were sufficient to provide good calibration results on the validation set without hampering the other scores. Section 4 shows that the augmented utility in the objective function improves calibration without reducing DSC between the predicted and expert segmentations.
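A minimal numpy sketch of how the calibration penalty could enter the per-image training objective: the Monte-Carlo expected DSC utility (Eqs. (6)–(7)) minus λ times the sum of squared intermediate-layer outputs. The array shapes, sample count, τ, and λ values are illustrative, and the logits stand in for a real DNN forward pass.

    import numpy as np

    rng = np.random.default_rng(0)

    def multi_label_dsc(y, z, eps=1.0):
        # differentiable multi-label DSC over (V, K) arrays, eps = 1 as in the text
        num = 2.0 * (y * z).sum(axis=0) + eps
        den = y.sum(axis=0) + z.sum(axis=0) + eps
        return (num / den).sum()

    def relaxed_samples(logits, tau, num_samples):
        # Gumbel-softmax samples of shape (S, V, K) from the per-voxel categoricals of Eq. (7)
        g = rng.gumbel(size=(num_samples,) + logits.shape)
        z = (logits[None] + g) / tau
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def objective(logits, z_expert, tau=1.0, lam=5e-4, num_samples=8):
        # Monte-Carlo expected DSC utility minus the calibration penalty lam * sum(logits^2);
        # tau, lam, and num_samples are illustrative values
        y_samples = relaxed_samples(logits, tau, num_samples)
        expected_dsc = np.mean([multi_label_dsc(y, z_expert) for y in y_samples])
        penalty = lam * np.sum(logits ** 2)
        return expected_dsc - penalty          # maximized with respect to the DNN parameters

    # toy image with V = 4 voxels, K = 2 labels
    logits = np.array([[3.0, -3.0], [2.5, -2.0], [-2.0, 2.0], [-1.5, 1.5]])
    z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
    print(objective(logits, z))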

3.3 Our Bayesian DNN Inference

After training our DNN model to optimize Ω, we apply the DNN on a new acquired image $x_0$ to estimate (i) its probabilistic segmentation and (ii) the uncertainty associated with the probabilistic segmentation. Given $x_0$, we compute the Dirichlet parameters $\alpha_{vk} := \exp([\psi(\phi(x_0, \omega), \omega_k)]_v)$ that model the distribution on probability vectors $\Theta_v$ as $\mathrm{Dir}(\Theta_v; \alpha_v)$. This Dirichlet distribution leads, at each voxel v, to (i) the label-probability estimate that is the Dirichlet PDF's mean vector, i.e., the vector whose k-th component equals $\alpha_{vk} / \sum_{k'=1}^{K} \alpha_{vk'}$, which is the same as the softmax output $\beta_{vk}$, and (ii) the uncertainty estimate that is the square root of the trace of the covariance matrix of the Dirichlet PDF on $\Theta_v$.
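A small numpy sketch of the single-pass test-time inference described above: from the intermediate-layer outputs for a new image x0, compute α, the mean label probabilities (the softmax output β), and the per-voxel uncertainty map. The logits array is a stand-in for an actual DNN forward pass.

    import numpy as np

    def infer(logits):
        # logits: (V, K) array of [psi(phi(x0, w), w_k)]_v for a new image x0 (stand-in for a DNN forward pass).
        # Returns per-voxel label probabilities (the softmax output beta) and the
        # sqrt-of-trace-of-covariance uncertainty of the per-voxel Dirichlet PDFs.
        alpha = np.exp(logits)                        # Dirichlet concentrations, one row per voxel
        a0 = alpha.sum(axis=1)                        # alpha_v0
        beta = alpha / a0[:, None]                    # Dirichlet mean = softmax of the logits
        trace = (a0 ** 2 - (alpha ** 2).sum(axis=1)) / (a0 ** 2 * (1.0 + a0))
        return beta, np.sqrt(trace)

    logits = np.array([[4.0, -4.0],    # confident voxel -> low uncertainty
                       [0.1, -0.1]])   # ambiguous voxel -> high uncertainty
    beta, unc = infer(logits)
    print(beta)
    print(unc)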

4 Results and Discussion

We compare our framework with a recent DNN-based uncertainty estimation method [11] as the baseline. We evaluate our framework, quantitatively and qualitatively, on 3 large publicly available datasets: (i) brain tumor segmentation (BraTS 2017) in 3D magnetic resonance images (MRI) (braintumorsegmentation.org) with K = 2 classes (background versus whole tumor) on both high-grade and low-grade glioma subjects, (ii) organ segmentation in 2D chest radiographs (db.jsrt.or.jp/eng.php) with K = 6 classes, and (iii) cell membrane segmentation in 2D transmission electron microscopy images (brainiac2.mit.edu/isbi_challenge) with K = 2 classes. We partition the available data, i.e., images and their segmentations, as follows: (i) 55% into a training set, (ii) 10% into a validation set, to tune the DNN free parameters, and (iii) 35% into a test set.


Fig. 2. Results: brain MRI tumor segmentation. (a1)–(a3) Test image. (b1)–(b3) Expert segmentation. (c1)–(c3) Our segmentation. (d1)–(d3) Our uncertainty. (e1)–(e3) Baseline segmentation [11]. (f1)–(f3) Baseline uncertainty [11]. (g) Quantitative measures with mean, standard deviation (std), median, 25th percentile (25per), and 75th percentile (75per) across the test set; a green entry indicates significant improvement (≥10%) over the other method, a red entry indicates the opposite, and a white entry indicates both methods perform equally well. (Color figure online)

We use a spectrum of 14 quantitative measures to evaluate performance: (i) 11 to evaluate the quality of discrete segmentations obtained by assigning a voxel to the segment with the maximum probability, i.e., DSC (on soft segmentations), Jaccard, Hausdorff distance between predicted and expert-labeled segment boundaries, negative log likelihood (NLL), NLL-weighted (NLLw) that introduces weights to re-balance varying fractions of voxels in classes, precision, recall, F1-score (on binary segmentations), false positive rate (FPR), false negative rate (FNR), and Cohen's kappa coefficient; and (ii) 3 to evaluate calibration performance, i.e., expected calibration error (ECE), maximum calibration error (MCE), and average calibration error (ACE) [8], which we compute as one scalar over the entire test set, to ensure sufficient sample sizes in each bin of the confidence-accuracy plot. We evaluate uncertainty estimates qualitatively. We choose the same DNN architecture for modeling ψ(·) and φ(·) for both methods. Both methods use the Adam optimizer with learning rate 10⁻³, a small batch size of 4, and instance normalization, with 2000 iterations for the chest and cell datasets and 8000 iterations for the brain dataset.


Fig. 3. Results: cell membrane microscopy segmentation. (a)–(g) Analogous to the descriptions in Fig. 2(a)–(g). Images show zoomed-in regions.

Brain MRI Tumor Segmentation. We use the Wnet architecture for ψ(·) and φ(·), training on a 3D block of 19 axial slices as in [19]. The results (Fig. 2) show that our method, with its principled Bayesian design producing analytical estimates of uncertainty and a novel utility improving calibration, outperforms the baseline in estimating not only uncertainty, but also segmentations. The baseline, overall, significantly overestimates the uncertainty and incorrectly/undesirably indicates high uncertainty in regions far from the tumor (Fig. 2(f1)–(f3)), whereas our method correctly exhibits uncertainty (i) near object boundaries and (ii) in ambiguous regions (Fig. 2(d2)) where the expert segmentation (Fig. 2(b2)) seems to mismatch with the image data (Fig. 2(a2)) in the region that likely exhibits partial voluming of cerebrospinal fluid and tumor. The baseline also misses big parts of some tumors (Fig. 2(e1)–(e3)). Our method clearly improves segmentation quality over the baseline, quantitatively (Fig. 2(g)). Improved uncertainty estimates of our method are closely related to significantly improved quantitative calibration measures (Fig. 6).


Fig. 4. Results: chest radiographs organ segmentation. (a)–(f) Analogous to the descriptions in Fig. 2(a)–(f).

Cell Membrane Microscopy Segmentation. We use U-net to model ψ(·) and φ(·). Our uncertainty estimates (Fig. 3(d1)–(d3)) are far more realistic in regions where the cell membrane location is clearly ambiguous, whereas the baseline underestimates the uncertainty (Fig. 3(f1)–(f3)). Our segmentations (Fig. 3(c1)–(c3)) have fewer false positives and false negatives. Quantitatively, our method performs significantly better for many more segmentation-quality measures (Fig. 3(g)) and all calibration measures (Fig. 6).

Chest Radiographs Organ Segmentation. We use a modified Wnet for 2D images to model ψ(·) and φ(·). Unlike our method that gives high uncertainty around organ boundaries, the baseline gives high uncertainty inside organs, e.g., inside lungs showing the rib structure. When our method makes segmentation errors (Fig. 4(c3); left lung), our uncertainty is also high in that region; however, when the baseline makes segmentation errors (Fig. 4(e1), (e2); heart), the uncertainty dangerously stays low. The baseline performs worse quantitatively on overall segmentation quality (Fig. 5) and calibration (Fig. 6).

Calibration. Our method's confidence and accuracy values have far less discrepancy (Fig. 6(a2)–(c2)), unlike the baseline. Our lower calibration errors (Fig. 6(d)) agree with our visually better uncertainty maps shown before.

Fig. 5. Results: chest radiographs segmentation. (g1)–(g3) Analogous to the descriptions in Fig. 2(g), for the lungs, heart, and clavicles, respectively.

Conclusion. We proposed a novel Bayesian DNN framework to segment images, enabling us to define a principled measure of uncertainty associated with the label probabilities. We estimate uncertainty analytically at test time, without needing approximate or expensive algorithms. Our framework naturally derives the need for a softmax layer. We propose a novel method to improve calibration of the segmentation. Results show that our framework improves segmentation quality and calibration, and provides more realistic uncertainty estimates.


Fig. 6. Results: calibration performance. Confidence-accuracy plots for the baseline and our method, respectively, for the (a1)–(a2) brain, (b1)–(b2) chest, and (c1)–(c2) cell datasets. (d) Calibration quality measures for both methods on the three datasets.

References

1. Awate, S.P., Whitaker, R.: Multiatlas segmentation as nonparametric regression. IEEE Trans. Med. Imag. 33(9), 1803–1817 (2014)
2. Brosch, T., Yoo, Y., Tang, L.Y.W., Li, D.K.B., Traboulsee, A., Tam, R.: Deep convolutional encoder networks for multiple sclerosis lesion segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 3–11. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_1
3. Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., Pal, C.: The importance of skip connections in biomedical image segmentation. In: Carneiro, G., et al. (eds.) LABELS/DLMIA 2016. LNCS, vol. 10008, pp. 179–187. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46976-8_19
4. Fan, A.C., Fisher, J.W., Wells, W.M., Levitt, J.J., Willsky, A.S.: MCMC curve sampling for image segmentation. In: Ayache, N., Ourselin, S., Maeder, A. (eds.) MICCAI 2007. LNCS, vol. 4792, pp. 477–485. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75759-7_58
5. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Advances in Neural Information Processing Systems (2016)
6. Gal, Y., Hron, J., Kendall, A.: Concrete dropout. In: Advances in Neural Information Processing Systems, pp. 3584–3593 (2017)
7. Garg, S., Awate, S.P.: Perfect MCMC sampling in Bayesian MRFs for uncertainty estimation in segmentation. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-Lopez, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 673–681. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00928-1_76
8. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017)
9. Chen, H., Qi, X., Yu, L., Heng, P.A.: DCAN: deep contour-aware networks for accurate gland segmentation. In: IEEE Computer Vision and Pattern Recognition, pp. 2487–2496 (2016)
10. Havaei, M., et al.: Brain tumor segmentation with deep neural networks. Med. Imag. Anal. 35, 18–31 (2017)
11. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Advances in Neural Information Processing Systems, pp. 5580–5590 (2017)
12. Kingma, D., Welling, M.: Auto-encoding variational Bayes. arXiv:1312.6114 (2013)
13. Le, M., Unkelbach, J., Ayache, N., Delingette, H.: Sampling image segmentations for uncertainty quantification. Med. Imag. Anal. 34, 42–51 (2016)
14. Lin, T., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In: International Conference on Computer Vision (2017)
15. Milletari, F., Navab, N., Ahmadi, S.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: International Conference on 3D Vision, pp. 565–571 (2016)
16. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
17. Shah, M.P., Merchant, S.N., Awate, S.P.: MS-net: mixed-supervision fully-convolutional networks for full-resolution segmentation. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-Lopez, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 379–387. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_44
18. Tanno, R., et al.: Bayesian image quality transfer with CNNs: exploring uncertainty in dMRI super-resolution. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10433, pp. 611–619. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66182-7_70
19. Wang, G., Li, W., Ourselin, S., Vercauteren, T.: Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. In: Crimi, A., Bakas, S., Kuijf, H., Menze, B., Reyes, M. (eds.) BrainLes 2017. LNCS, vol. 10670, pp. 178–190. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75238-9_16

