
Scalable Plug-and-Play ADMM with Convergence Guarantees

Yu Sun, Washington University in St. Louis, [email protected]

Zihui Wu, Washington University in St. Louis, [email protected]

Brendt Wohlberg, Los Alamos National Laboratory, [email protected]

Ulugbek S. Kamilov, Washington University in St. Louis, [email protected]

Preprint. Under review. arXiv:2006.03224v1 [cs.LG] 5 Jun 2020

Abstract

Plug-and-play priors (PnP) is a broadly applicable methodology for solving inverse problems by exploiting statistical priors specified as denoisers. Recent work has reported the state-of-the-art performance of PnP algorithms using pre-trained deep neural nets as denoisers in a number of imaging applications. However, current PnP algorithms are impractical in large-scale settings due to their heavy computational and memory requirements. This work addresses this issue by proposing an incremental variant of the widely used PnP-ADMM algorithm, making it scalable to large-scale datasets. We theoretically analyze the convergence of the algorithm under a set of explicit assumptions, extending recent theoretical results in the area. Additionally, we show the effectiveness of our algorithm with nonsmooth data-fidelity terms and deep neural net priors, its fast convergence compared to existing PnP algorithms, and its scalability in terms of speed and memory.

1 Introduction

Plug-and-play priors (PnP) is a simple yet flexible methodology for imposing statistical priors without explicitly forming an objective function [1, 2]. PnP algorithms alternate between imposing data consistency by minimizing a data-fidelity term and imposing a statistical prior by applying an additive white Gaussian noise (AWGN) denoiser. PnP draws its inspiration from the proximal algorithms extensively used in nonsmooth composite optimization [3], such as the proximal-gradient method (PGM) [4–7] and the alternating direction method of multipliers (ADMM) [8–11]. The popularity of deep learning has led to a wide adoption of PnP for exploiting learned priors specified through pre-trained deep neural nets, leading to its state-of-the-art performance in a variety of applications [12–16]. Its empirical success has spurred follow-up work that provided theoretical justifications for PnP in various settings [17–23]. Despite this progress, current PnP algorithms are not practical for addressing large-scale problems due to their computation time and memory requirements. To the best of our knowledge, the only prior work on developing PnP algorithms that are suitable for large-scale problems is the stochastic gradient descent variant of PnP (PnP-SGD), whose fixed-point convergence was recently analyzed for smooth data-fidelity terms [20].

In this work, we present a new incremental PnP-ADMM (IPA) algorithm for solving large-scale inverse problems. As an extension of the widely used PnP-ADMM [1, 2], IPA can integrate statistical information from a data-fidelity term and a pre-trained deep neural net. However, unlike PnP-ADMM, IPA can effectively scale to datasets that are too large for traditional batch processing by using a single element or a small subset of the dataset at a time. The memory and per-iteration complexity of IPA is independent of the number of measurements, thus allowing it to deal with very large datasets. Additionally, unlike PnP-SGD [20], IPA can effectively address problems with nonsmooth data-fidelity terms, and generally has faster convergence. We present a detailed convergence analysis of IPA under a set of explicit assumptions on the data-fidelity term and the denoiser. Our analysis extends the recent fixed-point analysis of PnP-ADMM in [23] to partial randomized processing of data. To the best of our knowledge, the proposed scalable PnP algorithm and corresponding convergence analysis are absent from the current literature in this area. Our numerical validation demonstrates the practical effectiveness of IPA for integrating nonsmooth data-fidelity terms and deep neural net priors, its fast convergence compared to PnP-SGD, and its scalability in terms of both speed and memory. In summary, we establish IPA as a flexible, scalable, and theoretically sound PnP algorithm applicable to a wide variety of large-scale problems.

All proofs and some technical details have been omitted due to space restrictions and are included in the supplement, which also provides additional background and additional simulations.

2 Background

Consider the problem of estimating an unknown vector x ∈ R^n from a set of noisy measurements y ∈ R^m. It is common to formulate the solution to this estimation as an optimization problem

min_{x ∈ R^n} f(x)   with   f(x) := g(x) + h(x),   (1)

where g is a data-fidelity term that quantifies consistency with the observed data y and h is a regularizer that encodes prior knowledge on x. As an example, consider the nonsmooth ℓ1-norm data-fidelity term g(x) = ‖y − Ax‖_1, which assumes a linear observation model y = Ax + e, and the TV regularizer h(x) = τ‖Dx‖_1, where D is the gradient operator and τ > 0 is the regularization parameter. Common applications of (1) include sparse vector recovery in compressive sensing [24, 25], image restoration using total variation (TV) [26], and low-rank matrix completion [27].

Proximal algorithms are often used for solving problems of the form (1) when g or h are nonsmooth [3]. For example, ADMM is one such standard algorithm that can be summarized as

z^k = prox_{γg}(x^{k−1} + s^{k−1})   (2a)
x^k = prox_{γh}(z^k − s^{k−1})   (2b)
s^k = s^{k−1} + x^k − z^k,   (2c)

where γ > 0 is the penalty parameter [11]. ADMM relies on the proximal operator

prox_{τh}(z) := argmin_{x ∈ R^n} { (1/2)‖x − z‖_2^2 + τh(x) },   τ > 0, z ∈ R^n,   (3)

which is well-defined for any proper, closed, and convex function h [3]. The proximal operator can be interpreted as a maximum a posteriori probability (MAP) estimator for the AWGN denoising problem

z = x + n   where   x ∼ p_x,  n ∼ N(0, τI),   (4)

by setting h(x) = −log(p_x(x)). This perspective inspired the development of PnP [1, 2], where the proximal operator is simply replaced by a more general denoiser D : R^n → R^n such as BM3D [28] or DnCNN [29]. For example, the widely used PnP-ADMM can be summarized as

z^k = prox_{γg}(x^{k−1} + s^{k−1})   (5a)
x^k = D_σ(z^k − s^{k−1})   (5b)
s^k = s^{k−1} + x^k − z^k,   (5c)

where, in analogy to τ > 0 in (3), we introduce the parameter σ > 0 controlling the relative strength of the denoiser. Remarkably, this heuristic of using denoisers not associated with any h within an iterative algorithm exhibited great empirical success [12–15] and spurred a great deal of theoretical work on PnP algorithms [17–23].
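As an illustration (not code from the paper), the following NumPy sketch implements iterations (5a)-(5c) for the quadratic data-fidelity term g(x) = (1/2)‖y − Ax‖_2^2, with a generic `denoise` callable standing in for D_σ; the function name and default parameters are assumptions made for this example.

```python
import numpy as np

def pnp_admm(y, A, denoise, gamma=1.0, num_iter=100):
    """Minimal PnP-ADMM (5a)-(5c) with g(x) = (1/2)||y - A x||_2^2.

    `denoise` is any callable playing the role of D_sigma.
    """
    n = A.shape[1]
    x = np.zeros(n)
    s = np.zeros(n)
    # Closed-form proximal operator of gamma*g for the quadratic data-fidelity term.
    H = np.linalg.inv(np.eye(n) + gamma * A.T @ A)
    Aty = A.T @ y
    for _ in range(num_iter):
        z = H @ (x + s + gamma * Aty)   # (5a): proximal step on the data-fidelity term
        x = denoise(z - s)              # (5b): denoiser replaces prox_{gamma*h}
        s = s + x - z                   # (5c): residual (scaled dual) update
    return x
```

Passing `denoise = lambda v: v` recovers plain ADMM with h = 0, while in practice `denoise` would wrap a forward pass through a pre-trained network.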

An elegant fixed-point convergence analysis of PnP-ADMM was recently presented in [23]. By substituting v^k = z^k − s^{k−1} into PnP-ADMM, the algorithm is expressed in terms of an operator

P := (1/2) I + (1/2) (2G − I)(2D_σ − I)   with   G := prox_{γg},   (6)


where I denotes the identity operator. The convergence of PnP-ADMM is then established through its equivalence to the fixed-point convergence of the sequence v^k = P(v^{k−1}). The equivalence of PnP-ADMM to the iterations of the operator (6) originates from the well-known relationship between ADMM and the Douglas-Rachford splitting [3, 8].

Scalable optimization algorithms have become increasingly important in the context of large-scale problems arising in machine learning and data science [30]. Stochastic and online optimization techniques have been investigated for traditional ADMM [31–35], where prox_{γg} is approximated using a subset of observations (with or without subsequent linearization). Our work contributes to this area by investigating the scalability of PnP-ADMM, which does not minimize any explicit objective function. Since PnP-ADMM can integrate powerful deep neural net denoisers, there is a pressing need to understand its theoretical properties and its ability to address large-scale imaging applications.

Before introducing our algorithm, it is worth briefly mentioning an emerging paradigm of using deep neural nets for solving ill-posed imaging inverse problems (see the reviews [36–39]). This work is most related to techniques that explicitly decouple the measurement model from the learned prior. For example, learned denoisers have been adopted for a class of algorithms in compressive sensing known as approximate message passing (AMP) [40–43]. The key difference of PnP from AMP is that it does not assume random measurement operators. Regularization by denoising (RED), and the closely related deep mean-shift priors, rely on the denoiser to specify an explicit regularizer that has a simple gradient [44–47]. PnP does not assume the existence of such an objective, instead interpreting solutions as equilibrium points balancing the data-fit and the prior [19]. Finally, a recent line of work has investigated recovery and convergence guarantees for priors specified by generative adversarial networks (GANs) [48–52]. PnP does not seek to project its iterates onto the range of a GAN; instead, it directly uses the output of a simple AWGN denoiser to improve the estimation quality. This greatly simplifies the training and application of learned priors within the PnP methodology. Our work contributes to this broad area by providing new conceptual, theoretical, and empirical insights into incremental ADMM optimization under statistical priors specified as deep neural net denoisers.

3 Incremental PnP-ADMM

Batch PnP algorithms operate on the whole observation vector y ∈ R^m. We are interested in partial randomized processing of observations by considering the decomposition of R^m into b ≥ 1 blocks

R^m = R^{m_1} × R^{m_2} × ··· × R^{m_b}   with   m = m_1 + m_2 + ··· + m_b.

We thus consider data-fidelity terms of the form

g(x) = (1/b) Σ_{i=1}^{b} g_i(x),   x ∈ R^n,   (7)

where each g_i is evaluated only on the subset y_i ∈ R^{m_i} of the full data y.

PnP-ADMM is often impractical when b is very large due to the complexity of computing prox_{γg}. IPA extends stochastic variants of traditional ADMM [31–35] by integrating denoisers D_σ that are not associated with any regularizer h. Its per-iteration complexity is independent of the number of data blocks b, since it processes only a single component function g_i at every iteration. IPA can also be implemented as a minibatch algorithm, processing several blocks in parallel at every iteration, thus improving its efficiency on multi-processor hardware architectures (see details in Supplement A).

Algorithm 1: Incremental Plug-and-Play ADMM (IPA)
1: input: initial values x^0, s^0 ∈ R^n, parameters γ, σ > 0
2: for k = 1, 2, 3, . . . do
3:   choose an index i_k ∈ {1, . . . , b}
4:   z^k ← G_{i_k}(x^{k−1} + s^{k−1}) where G_{i_k}(z) := prox_{γ g_{i_k}}(z)   ▷ impose data consistency
5:   x^k ← D_σ(z^k − s^{k−1})   ▷ impose prior knowledge
6:   s^k ← s^{k−1} + x^k − z^k
7: end for
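A minimal NumPy sketch of Algorithm 1 for ℓ2-norm blocks g_i(x) = (1/2)‖y_i − A_i x‖_2^2 is given below; the block lists, the `denoise` callable, and the parameter defaults are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def ipa(y_blocks, A_blocks, denoise, gamma=1.0, num_iter=1000, rng=None):
    """Sketch of Algorithm 1 (IPA) with g_i(x) = (1/2)||y_i - A_i x||_2^2."""
    rng = np.random.default_rng() if rng is None else rng
    b = len(A_blocks)
    n = A_blocks[0].shape[1]
    x = np.zeros(n)
    s = np.zeros(n)
    for _ in range(num_iter):
        i = rng.integers(b)                      # line 3: i.i.d. uniform block selection
        Ai, yi = A_blocks[i], y_blocks[i]
        # line 4: partial proximal step on a single block, closed form as in (8)
        z = np.linalg.solve(np.eye(n) + gamma * Ai.T @ Ai,
                            (x + s) + gamma * Ai.T @ yi)
        x = denoise(z - s)                       # line 5: denoiser step
        s = s + x - z                            # line 6: residual update
    return x
```

In practice one would cache a factorization of I + γA_i^T A_i for each block (or apply a few conjugate-gradient steps) instead of re-solving the linear system at every iteration.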


In principle, IPA can be implemented using different block selection rules. The strategy adopted for our theoretical analysis focuses on the usual strategy of selecting indices i_k as independent and identically distributed (i.i.d.) random variables distributed uniformly over {1, . . . , b}. An alternative would be to proceed in epochs of b consecutive iterations, where at the start of each epoch the set {1, . . . , b} is reshuffled, and i_k is selected from this ordered set [53]. In some applications, it might also be beneficial to select indices i_k in an online, data-adaptive fashion by taking into account the statistical relationships among observations [54, 55].

Unlike PnP-SGD, IPA does not require smoothness of the functions g_i. Instead of computing the partial gradient ∇g_i, as is done in PnP-SGD, IPA evaluates the partial proximal operator G_i. Thus, the maximal benefit of IPA is expected for problems in which G_i is efficient to evaluate. This is the case for a large number of functions commonly used in computational imaging, compressive sensing, and machine learning (see the extensive discussion on proximal operators in [56]).

Let us discuss two widely used scenarios. The proximal operator of the ℓ2-norm data-fidelity term g_i(x) = (1/2)‖y_i − A_i x‖_2^2 has a closed-form solution

G_i(z) = prox_{γ g_i}(z) = (I + γ A_i^T A_i)^{−1} (z + γ A_i^T y_i),   γ > 0, z ∈ R^n.   (8)

Prior work has extensively discussed efficient strategies for evaluating (8) for a variety of linear operators, including convolutions, partial Fourier transforms, and subsampling masks [9, 57–59]. As a second example, consider the ℓ1-norm data-fidelity term g_i(x) = ‖y_i − A_i x‖_1, which is nonsmooth. The corresponding proximal operator has a closed-form solution for any orthogonal operator A_i and can also be efficiently computed in many other settings [56]. We numerically evaluate the effectiveness of IPA on both ℓ1- and ℓ2-norm data-fidelity terms and deep neural net priors in Section 5.
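To make the two cases concrete, the sketch below implements the closed form (8) directly and, for the ℓ1-norm term, the closed form that holds when A_i is orthogonal (obtained by the change of variables u = A_i x followed by soft-thresholding); the helper names are ours, and the ℓ1 formula applies only in that orthogonal special case.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_l2_block(z, A_i, y_i, gamma):
    """Closed-form prox of gamma * (1/2)||y_i - A_i x||_2^2, i.e., Eq. (8)."""
    n = A_i.shape[1]
    return np.linalg.solve(np.eye(n) + gamma * A_i.T @ A_i,
                           z + gamma * A_i.T @ y_i)

def prox_l1_block_orthogonal(z, A_i, y_i, gamma):
    """Closed-form prox of gamma * ||y_i - A_i x||_1 for an orthogonal A_i."""
    return A_i.T @ (y_i - soft(y_i - A_i @ z, gamma))
```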

4 Theoretical Analysis

We now present a theoretical analysis of IPA. We first present an intuitive interpretation of its solutions, and then present our convergence analysis under a set of explicit assumptions.

4.1 Fixed Point Interpretation

IPA cannot be interpreted using the standard tools from convex optimization, since its solution is generally not a minimizer of an objective function. Nonetheless, we develop an intuitive operator-based interpretation. The full technical exposition of the discussion here can be found in Supplement D.

Consider the following set-valued operator

T := γ ∂g + (D_σ^{−1} − I),   γ > 0,   (9)

where ∂g is the subdifferential of the data-fidelity term and D_σ^{−1}(x) := {z ∈ R^n : x = D_σ(z)} is the inverse operator of the denoiser D_σ. Note that this inverse operator exists even when D_σ is not one-to-one [8, 60]. By characterizing the fixed points of PnP algorithms, it can be shown that their solutions can be interpreted as vectors in the zero set of T

0 ∈ T(x*) = γ ∂g(x*) + (D_σ^{−1}(x*) − x*)   ⇔   x* ∈ zer(T) := {x ∈ R^n : 0 ∈ T(x)}.

Consider the following two sets

zer(∂g) := {x ∈ R^n : 0 ∈ ∂g(x)}   and   fix(D_σ) := {x ∈ R^n : x = D_σ(x)},

where zer(∂g) is the set of all critical points of the data-fidelity term and fix(D_σ) is the set of all fixed points of the denoiser. Intuitively, the fixed points of D_σ correspond to all vectors that are not denoised, and can therefore be interpreted as vectors that are noise-free according to the denoiser.

If x* ∈ zer(∂g) ∩ fix(D_σ), then x* ∈ zer(T), which implies that x* is one of the solutions. Hence, any vector that minimizes a convex data-fidelity term g and is noiseless according to D_σ is in the solution set. On the other hand, when zer(∂g) ∩ fix(D_σ) = ∅, then x* ∈ zer(T) corresponds to an equilibrium point between the two sets.

This interpretation of PnP highlights one important aspect that is often overlooked in the literature, namely that, unlike in the traditional formulation (1), the regularization in PnP depends on both the denoiser parameter σ > 0 and the penalty parameter γ > 0, with both influencing the solution through different mechanisms. Hence, the best performance is obtained by jointly tuning both parameters for a given experimental setting. In the special case of D_σ = prox_{γh} with γ = σ^2, we have

fix(D_σ) = {x ∈ R^n : 0 ∈ ∂h(x)}   and   zer(T) = {x ∈ R^n : 0 ∈ ∂g(x) + ∂h(x)},

which corresponds to the optimization formulation (1), whose solutions are independent of γ.

4.2 Convergence Analysis

Our analysis requires three assumptions that jointly serve as sufficient conditions.

Assumption 1. Each g_i is proper, closed, convex, and Lipschitz continuous with constant L_i > 0. We define the largest Lipschitz constant as L = max{L_1, . . . , L_b}.

This assumption is commonly adopted in nonsmooth optimization and is equivalent to the existence of a global upper bound on the subgradients [32, 61, 62]. It is satisfied by a large number of functions, such as the ℓ1-norm. The ℓ2-norm also satisfies Assumption 1 when it is evaluated over a bounded subset of R^n. We next state our assumption on the denoiser D_σ.

Assumption 2. The residual R_σ := I − D_σ of the denoiser D_σ is firmly nonexpansive.

We review firm nonexpansiveness and other related concepts in the supplement. Firmly nonexpansive operators are a subset of nonexpansive operators (those that are Lipschitz continuous with constant one). A simple strategy to obtain a firmly nonexpansive operator is to create a (1/2)-averaged operator from a nonexpansive operator [3]. The residual R_σ is firmly nonexpansive if and only if D_σ is firmly nonexpansive, which implies that the proximal operator automatically satisfies Assumption 2 [3].

The rationale for stating Assumption 2 for R_σ is based on our interest in residual deep neural nets. The success of residual learning in the context of image restoration is well known [29]. Prior work has also shown that Lipschitz-constrained residual networks yield excellent performance without sacrificing stable convergence [23, 46]. Additionally, there has recently been an explosion of techniques for training Lipschitz-constrained and firmly nonexpansive deep neural nets [23, 63–65].

Assumption 3. The operator T in (9) is such that zer(T) ≠ ∅. There also exists R < ∞ such that ‖x^k − x*‖_2 ≤ R for all x* ∈ zer(T).

The first part of the assumption simply ensures the existence of a solution. The existence of the bound R often holds in practice, as many denoisers have bounded range spaces. In particular, this is true for a number of image denoisers whose outputs live within the bounded subset [0, 255]^n ⊂ R^n.

We will state our convergence results in terms of the operator S : R^n → R^n defined as

S := D_σ − G(2D_σ − I).   (10)

Both IPA and traditional PnP-ADMM can be interpreted as algorithms for computing an element of zer(S), which is equivalent to finding an element of zer(T) (see details in Supplement D).
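In the experiments of Section 5, convergence is monitored through the normalized residual ‖S(v^k)‖_2^2/‖v^k‖_2^2. A small sketch of this metric, assuming `prox_g` and `denoise` are callables implementing G = prox_{γg} and D_σ:

```python
import numpy as np

def fixed_point_residual(v, prox_g, denoise):
    """Evaluate S(v) = D_sigma(v) - G(2*D_sigma(v) - v) from Eq. (10)."""
    d = denoise(v)
    return d - prox_g(2.0 * d - v)

def normalized_residual(v, prox_g, denoise):
    """Normalized distance ||S(v)||_2^2 / ||v||_2^2 used to monitor convergence."""
    s = fixed_point_residual(v, prox_g, denoise)
    return float(np.dot(s, s) / np.dot(v, v))
```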

We are now ready to state our main result on IPA.

Theorem 1. Run IPA for t ≥ 1 iterations with random i.i.d. block selection under Assumptions 1-3 using a fixed penalty parameter γ > 0. Then, the sequence v^k = z^k − s^{k−1} satisfies

E[ (1/t) Σ_{k=1}^{t} ‖S(v^k)‖_2^2 ] ≤ (R + 2γL)^2 / t + max{γ, γ^2} C,   (11)

where C := 4LR + 12L^2 is a positive constant.

In order to contextualize this result, we also review the convergence of the traditional PnP-ADMM.

Theorem 2. Run PnP-ADMM for t ≥ 1 iterations under Assumptions 1-3 using a fixed penalty parameter γ > 0. Then, the sequence v^k = z^k − s^{k−1} satisfies

(1/t) Σ_{k=1}^{t} ‖S(v^k)‖_2^2 ≤ (R + 2γL)^2 / t.   (12)


[Figure 1: panels for γ0, γ0/2, γ0/4; vertical axes ‖S(v^k)‖_2^2/‖v^k‖_2^2 and SNR (dB), horizontal axis iteration; final distances 1.07 × 10^−5, 6.79 × 10^−6, 3.59 × 10^−6; final SNRs 33.68 dB and 33.88 dB.]

Figure 1: Illustration of the influence of the penalty parameter γ > 0 on the convergence of IPA for a DnCNN prior. The average normalized distance to zer(S) and the SNR (dB) are plotted against the iteration number, with the shaded areas representing the range of values attained over 12 test images. The accuracy of IPA improves for smaller values of γ. However, the SNR performance is nearly identical, indicating that in practice IPA can achieve excellent results for a range of fixed γ values.

Both proofs are provided in the supplement. The proof of Theorem 2 is a minor modification of the analysis in [23], obtained by relaxing the strong convexity assumption in [23] to Assumption 1 and replacing the assumption that R_σ is a contraction in [23] with Assumption 2. Theorem 2 establishes that the iterates of PnP-ADMM satisfy ‖S(v^t)‖ → 0 as t → ∞. Since S is firmly nonexpansive and D_σ is nonexpansive, the Krasnosel'skii-Mann theorem (see Section 5.2 in [66]) directly implies that v^t → zer(S) and x^t = D_σ(v^t) → zer(T).

Theorem 1 establishes that IPA approximates the solution obtained by the full PnP-ADMM up to an error term that depends on the penalty parameter γ. One can precisely control the accuracy of IPA by setting γ to a desired level. In practice, γ can be treated as a hyperparameter and tuned to maximize performance for a suitable image-quality metric, such as SNR or SSIM. Our numerical results in Section 5 corroborate that excellent SNR performance of IPA can be achieved without taking ‖S(v^t)‖_2 to zero, which simplifies the practical applicability of IPA.

Finally, note that our analysis can also be performed under the assumptions adopted in [23], namely that the g_i are strongly convex and R_σ is a contraction. Such an analysis leads to the following statement

E[‖x^t − x*‖_2] ≤ η^t (2R + 4γL) + (4γL)/(1 − η),   0 < η < 1,   (13)

which establishes linear convergence to zer(T) up to an error term. A proof of (13) is provided in the supplement. As corroborated by our simulations in Section 5, the actual convergence of IPA holds even more broadly than suggested by both sets of sufficient conditions. This motivates further analysis of IPA under more relaxed assumptions, which we leave to future work.

5 Numerical Validation

Recent work has shown the excellent performance of PnP algorithms for smooth data-fidelity terms using advanced denoising priors. Our goal in this section is to extend these studies with simulations validating the effectiveness of IPA for nonsmooth data-fidelity terms and deep neural net priors, as well as demonstrating its scalability to large-scale inverse problems. We consider two applications of the form y = Ax + e, where e ∈ R^m denotes the noise and A ∈ R^{m×n} denotes either a random Gaussian matrix in compressive sensing (CS) or the transfer function in intensity diffraction tomography [67].

Our deep neural net prior is based on the DnCNN architecture [29], with its batch normalization layers removed for controlling the Lipschitz constant of the network via spectral normalization [68] (see details in Supplement G.1). We train a nonexpansive residual network R_σ by predicting the noise residual from its noisy input. This means that R_σ satisfies the necessary condition for firm nonexpansiveness of D_σ. The training data is generated by adding AWGN to the images from the BSD400 dataset [69]. The reconstruction quality is quantified using the signal-to-noise ratio (SNR) in dB. We pre-train several deep neural net models as denoisers for σ ∈ [1, 10], using σ intervals of 0.5, and use the denoiser achieving the best SNR in each experiment.
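A minimal PyTorch sketch of such a denoiser is shown below; the depth, width, and single-channel assumption are illustrative and not the paper's exact configuration. Spectral normalization bounds each convolution's Lipschitz constant, so the residual network R_σ is nonexpansive, which, as noted above, is only the necessary condition for Assumption 2.

```python
import torch
import torch.nn as nn

def snconv(in_ch, out_ch):
    """3x3 convolution with spectral normalization (per-layer Lipschitz constant <= 1)."""
    return nn.utils.spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1))

class ResidualDenoiser(nn.Module):
    """DnCNN-style residual network without batch normalization.

    The network R_sigma predicts the noise; the denoiser is D_sigma(x) = x - R_sigma(x).
    """
    def __init__(self, channels=1, features=64, depth=7):
        super().__init__()
        layers = [snconv(channels, features), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [snconv(features, features), nn.ReLU(inplace=True)]
        layers.append(snconv(features, channels))
        self.residual = nn.Sequential(*layers)

    def forward(self, x):
        return x - self.residual(x)   # D_sigma(x) = x - R_sigma(x)
```

Training would then minimize, for example, the mean-squared error between `self.residual(x_noisy)` and the added noise, with one model per noise level σ.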


[Figure 2: three panels for (n = 512², b = 300), (n = 512², b = 600), and (n = 1024², b = 600), each plotting SNR (dB) against time (sec) for IPA (60), PnP-ADMM, PnP-SGD (60), and PnP-FISTA.]

Figure 2: Illustration of the scalability of IPA and several widely used PnP algorithms on problems of different sizes. The parameters n and b denote the image size and the number of acquired intensity images, respectively. The average SNR is plotted against time in seconds. Both IPA and PnP-SGD use random minibatches of 60 measurements at every iteration, while PnP-ADMM and PnP-FISTA use all the measurements. The figure highlights the fast empirical convergence of IPA compared to PnP-SGD, as well as its ability to address larger problems compared to PnP-ADMM and PnP-FISTA.

Table 1: Final average SNR (dB) and runtime obtained by several PnP algorithms on all test images. Each entry reports SNR in dB with the runtime in parentheses.

| Algorithm | σ | γ | n = 512², b = 300 | n = 512², b = 600 | n = 1024², b = 600 |
|---|---|---|---|---|---|
| PnP-FISTA | 1 | 5×10⁻⁴ | 22.60 (19.4 min) | 22.79 (42.6 min) | 23.56 (8.1 hr) |
| PnP-SGD (60) | 1 | 5×10⁻⁴ | 22.31 (7.1 min) | 22.74 (5.2 min) | 23.42 (44.3 min) |
| PnP-ADMM | 2.5 | 1 | 24.23 (7.4 min) | 24.40 (14.7 min) | 25.50 (1.4 hr) |
| IPA (60) | 2.5 | 1 | 23.65 (1.7 min) | 23.88 (2 min) | 24.95 (11 min) |

5.1 Integration of Non-smooth Data-Fidelity Terms and Pre-Trained Deep Priors

We first validate the effectiveness of Theorem 1 for nonsmooth data-fidelity terms. The matrix A is generated with i.i.d. zero-mean Gaussian random elements of variance 1/m, and e as a sparse Bernoulli-Gaussian vector with a sparsity ratio of 0.1. This means that, in expectation, ten percent of the elements of y are contaminated by AWGN. The sparse nature of the noise motivates the usage of the ℓ1-norm g(x) = ‖y − Ax‖_1, since it can effectively mitigate outliers. The nonsmoothness of the ℓ1-norm prevents the usage of gradient-based algorithms such as PnP-SGD. On the other hand, the application of IPA is facilitated by efficient strategies for computing the proximal operator [26, 70].
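A NumPy sketch of this measurement setup is given below; the helper name and its defaults are our own, with the sparsity ratio of 0.1 taken from the text and the Gaussian noise standard deviation of 5 stated in the next paragraph.

```python
import numpy as np

def make_cs_problem(x_true, m, sparsity=0.1, noise_std=5.0, rng=None):
    """Compressive-sensing setup of Section 5.1: y = A x + e with sparse noise.

    A has i.i.d. zero-mean Gaussian entries of variance 1/m, and e is
    Bernoulli-Gaussian: each entry is nonzero with probability `sparsity`
    and then drawn from N(0, noise_std**2).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x_true.size
    A = rng.normal(scale=np.sqrt(1.0 / m), size=(m, n))
    mask = rng.random(m) < sparsity
    e = mask * rng.normal(scale=noise_std, size=m)
    return A, A @ x_true + e
```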

We set the measurement ratio to be approximately m/n = 0.7 with AWGN of standard deviation 5. Twelve standard images from Set 12 are used in testing, each resized to 64 × 64 pixels for rapid parameter tuning and testing. We quantify the convergence accuracy using the normalized distance ‖S(v^k)‖_2^2/‖v^k‖_2^2, which is expected to approach zero as IPA converges to a fixed point.

Theorem 1 characterizes the convergence of IPA in terms of ‖S(v^k)‖_2 up to a constant error term that depends on γ. This is illustrated in Fig. 1 for three values of the penalty parameter γ ∈ {γ0, γ0/2, γ0/4} with γ0 = 0.02. The average normalized distance ‖S(v^k)‖_2^2/‖v^k‖_2^2 and the SNR are plotted against the iteration number and labeled with their respective final values. The shaded areas represent the range of values attained across all test images. IPA is implemented to use a random half of the elements in y in every iteration to impose data consistency. Fig. 1 shows the improved convergence of IPA to zer(S) for smaller values of γ, which is consistent with our theoretical analysis. Specifically, the final accuracy improves by approximately 3× (from 1.07 × 10^−5 to 3.59 × 10^−6) when γ is reduced from γ0 to γ0/4. On the other hand, the SNR values are nearly identical for all three experiments, indicating that in practice different γ values lead to fixed points of similar quality. This indicates that IPA can achieve high-quality results without taking ‖S(v^k)‖_2 to zero.


Table 2: Per-iteration memory usage specification for reconstructing 1024 × 1024 images.

| Variable | PnP-ADMM size | PnP-ADMM memory | IPA (Ours) size | IPA (Ours) memory |
|---|---|---|---|---|
| {Ai}, real part | 1024 × 1024 × 600 | 9.38 GB | 1024 × 1024 × 60 | 0.94 GB |
| {Ai}, imaginary part | 1024 × 1024 × 600 | 9.38 GB | 1024 × 1024 × 60 | 0.94 GB |
| {yi} | 1024 × 1024 × 600 | 18.75 GB | 1024 × 1024 × 60 | 1.88 GB |
| others combined | — | 0.13 GB | — | 0.13 GB |
| Total | | 37.63 GB | | 3.88 GB |

5.2 Scalability in Large-scale Optical Tomography

We now discuss the scalability of IPA on intensity diffraction tomography, which is a data-intensive computational imaging modality [67]. The goal is to recover the spatial distribution of the complex permittivity contrast of an object given a set of its intensity-only measurements. In this problem, A consists of a set of b complex matrices [A_1, . . . , A_b]^T, where each A_i is a convolution corresponding to the i-th measurement y_i. We adopt the ℓ2-norm loss g(x) = ‖y − Ax‖_2^2 as the data-fidelity term to empirically compare the performance of IPA and PnP-SGD on the same problem.

In the simulation, we follow the experimental setup in [67] under AWGN corresponding to an input SNR of 20 dB. We select six images from the CAT2000 dataset [71] as our test examples, each cropped to n pixels. We assume real permittivity functions, but still consider a complex-valued measurement operator A that accounts for both absorption and phase [67]. Due to the large size of the data, we process the measurements in epochs using minibatches of size 60 (see also Supplement G.2).

Fig. 2 illustrates the evolution of the average SNR against runtime for several PnP algorithms, namely PnP-ADMM, PnP-FISTA, PnP-SGD, and IPA, for images of size n ∈ {512 × 512, 1024 × 1024} and a total number of intensity measurements b ∈ {300, 600}. The final SNR values as well as the total runtimes are summarized in Table 1; PnP-ADMM attains the overall best SNR, while IPA has the shortest runtime in every setting. In every iteration, PnP-ADMM and PnP-FISTA use all the measurements, while IPA and PnP-SGD use only a small subset of 60 measurements. IPA thus retains its effectiveness for large values of b, while batch algorithms become significantly slower. Moreover, the scalability of IPA over PnP-ADMM becomes more notable as the image size increases. For example, Table 1 highlights the convergence of IPA to 24.95 dB within 11 minutes, while PnP-ADMM takes 1.4 hours to reach a similar SNR value. Note the rapid progress of PnP-ADMM in the first few iterations, followed by slow but steady progress until its convergence to the values reported in Table 1. This behavior of ADMM is well known and has been widely reported in the literature (see Section 3.2.2, "Convergence in Practice," in [11]). We also observe faster convergence of IPA compared to both PnP-SGD and PnP-FISTA, further highlighting the potential of IPA to address large-scale problems where partial proximal operators are easy to evaluate.

Another key feature of IPA is its memory efficiency due to its incremental processing of data. The memory considerations in optical tomography include the size of all the variables related to the desired image x, the measured data {y_i}, and the variables related to the forward model {A_i}. Table 2 records the total memory (GB) used by IPA and PnP-ADMM for reconstructing a 1024 × 1024 pixel permittivity image. PnP-ADMM requires 37.63 GB of memory due to its batch processing of the whole dataset, while IPA uses only 3.88 GB, nearly one-tenth of the former, by adopting incremental processing of data (see also Supplement G.2 for additional examples). In short, our numerical evaluations highlight both the fast and stable convergence and the flexible memory usage of IPA in the context of large-scale optical tomographic imaging.

6 Conclusion

This work provides several new insights into the widely used PnP methodology in the context of large-scale imaging problems. First, we have proposed IPA as a new incremental PnP algorithm. IPA extends PnP-ADMM to randomized partial processing of measurements and extends traditional optimization-based ADMM by integrating pre-trained deep neural nets. Second, we have theoretically analyzed IPA under a set of realistic assumptions, showing that IPA can approximate PnP-ADMM to a desired precision by controlling the penalty parameter. Third, our simulations highlight the effectiveness of IPA for nonsmooth data-fidelity terms and deep neural net priors, as well as its scalability to large-scale imaging. We observed faster convergence of IPA compared to several baseline PnP methods, including PnP-ADMM and PnP-SGD, when partial proximal operators can be efficiently evaluated. IPA can thus be an effective alternative to existing algorithms for addressing large-scale imaging problems. For future work, we would like to explore strategies to further relax our assumptions and to explore distributed variants of IPA to enhance its performance in parallel settings.

Broader Impact

This work is expected to impact the area of computational imaging, with potential applications to computational microscopy, computerized tomography, medical imaging, and image restoration. There is a growing need in computational imaging to deal with noisy and incomplete measurements by integrating multiple information sources, including physical information describing the imaging instrument and learned information characterizing the statistical distribution of the desired image. The ability to solve large-scale computational imaging problems has the potential to enable new technological advances in 3D (space), 4D (space + time), or 5D (space, time, and spectrum) applications. These advances might lead to new imaging tools for diagnosing health conditions, understanding biological processes, or inferring properties of complex materials.

Traditionally, imaging relied on linear models and fixed transforms (filtered back projection, the wavelet transform) that are relatively straightforward to understand. Learning-based methods, including our algorithm, have the potential to enable new technological capabilities; yet, they also come with the downside of being much more complex. Their usage might thus lead to unexpected outcomes and surprising results when used by non-experts. While we aim to use our method to enable positive contributions to humanity, one can also imagine unethical usage of large-scale imaging technology. This work focuses on imaging using large-scale algorithms with learned priors, but it might be adopted within broader data science, which might lead to broader impacts that we have not anticipated.

Acknowledgments and Disclosure of Funding

Research presented in this article was supported by NSF award CCF-1813910 and by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20200061DR.

References

[1] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, "Plug-and-play priors for model based reconstruction," in Proc. IEEE Global Conf. Signal Process. and Inf. Process. (GlobalSIP), 2013.

[2] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, "Plug-and-play priors for bright field electron tomography and sparse interpolation," IEEE Trans. Comp. Imag., vol. 2, no. 4, pp. 408–423, December 2016.

[3] N. Parikh and S. Boyd, "Proximal algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 123–231, 2014.

[4] M. A. T. Figueiredo and R. D. Nowak, "An EM algorithm for wavelet-based image restoration," IEEE Trans. Image Process., vol. 12, no. 8, pp. 906–916, August 2003.

[5] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Commun. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, November 2004.

[6] J. Bect, L. Blanc-Feraud, G. Aubert, and A. Chambolle, "A ℓ1-unified variational framework for image restoration," in Proc. ECCV, Springer, New York, 2004, vol. 3024, pp. 1–13.

[7] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.

[8] J. Eckstein and D. P. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Mathematical Programming, vol. 55, pp. 293–318, 1992.

[9] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo, "Fast image recovery using variable splitting and constrained optimization," IEEE Trans. Image Process., vol. 19, no. 9, pp. 2345–2356, September 2010.

[10] M. K. Ng, P. Weiss, and X. Yuan, "Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods," SIAM J. Sci. Comput., vol. 32, no. 5, pp. 2710–2736, August 2010.

[11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.

[12] K. Zhang, W. Zuo, S. Gu, and L. Zhang, "Learning deep CNN denoiser prior for image restoration," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017.

[13] W. Dong, P. Wang, W. Yin, G. Shi, F. Wu, and X. Lu, "Denoising prior driven deep neural network for image restoration," IEEE Trans. Patt. Anal. and Machine Intell., vol. 41, no. 10, pp. 2305–2318, Oct. 2019.

[14] K. Zhang, W. Zuo, and L. Zhang, "Deep plug-and-play super-resolution for arbitrary blur kernels," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, June 2019, pp. 1671–1681.

[15] R. Ahmad, C. A. Bouman, G. T. Buzzard, S. Chan, S. Liu, E. T. Reehorst, and P. Schniter, "Plug-and-play methods for magnetic resonance imaging: Using denoisers for image recovery," IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 105–116, 2020.

[16] K. Wei, A. Aviles-Rivero, J. Liang, Y. Fu, C.-B. Schönlieb, and H. Huang, "Tuning-free plug-and-play proximal algorithm for inverse imaging problems," in Proc. 37th Int. Conf. Machine Learning (ICML), 2020, arXiv:2002.09611.

[17] S. H. Chan, X. Wang, and O. A. Elgendy, "Plug-and-play ADMM for image restoration: Fixed-point convergence and applications," IEEE Trans. Comp. Imag., vol. 3, no. 1, pp. 84–98, March 2017.

[18] T. Meinhardt, M. Moeller, C. Hazirbas, and D. Cremers, "Learning proximal operators: Using denoising networks for regularizing inverse imaging problems," in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 1799–1808.

[19] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, "Plug-and-play unplugged: Optimization free reconstruction using consensus equilibrium," SIAM J. Imaging Sci., vol. 11, no. 3, pp. 2001–2020, 2018.

[20] Y. Sun, B. Wohlberg, and U. S. Kamilov, "An online plug-and-play algorithm for regularized image reconstruction," IEEE Trans. Comput. Imaging, vol. 5, no. 3, pp. 395–408, Sept. 2019.

[21] T. Tirer and R. Giryes, "Image restoration by iterative denoising and backward projections," IEEE Trans. Image Process., vol. 28, no. 3, pp. 1220–1234, 2019.

[22] A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, "A convergent image fusion algorithm using scene-adapted Gaussian-mixture-based denoising," IEEE Trans. Image Process., vol. 28, no. 1, pp. 451–463, Jan. 2019.

[23] E. K. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin, "Plug-and-play methods provably converge with properly trained denoisers," in Proc. 36th Int. Conf. Machine Learning (ICML), Long Beach, CA, USA, June 2019, pp. 5546–5557.

[24] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, February 2006.

[25] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.

[26] A. Beck and M. Teboulle, "Fast gradient-based algorithm for constrained total variation image denoising and deblurring problems," IEEE Trans. Image Process., vol. 18, no. 11, pp. 2419–2434, November 2009.

[27] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," SIAM Rev., vol. 52, no. 3, pp. 471–501, 2010.

[28] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, August 2007.

[29] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, July 2017.

[30] L. Bottou, F. E. Curtis, and J. Nocedal, "Optimization methods for large-scale machine learning," SIAM Rev., vol. 60, no. 2, pp. 223–311, 2018.

[31] H. Wang and A. Banerjee, "Online alternating direction method," in Proc. 29th Int. Conf. Machine Learning (ICML), Edinburgh, Scotland, UK, June 26-July 1, 2012, pp. 1699–1706.

[32] H. Ouyang, N. He, L. Q. Tran, and A. Gray, “Stochastic alternating direction method of multipliers,” in Proc. 30th Int. Conf. Machine Learning (ICML), Atlanta, GA, USA, June 16-21, 2013, pp. 80–88.

[33] T. Suzuki, “Dual averaging and proximal gradient descent for online alternating direction multiplier method,” in Proc. 30th Int. Conf. Machine Learning (ICML), Atlanta, GA, USA, June 2013, pp. 392–400.

[34] W. Zhong and J. Kwok, “Fast stochastic alternating direction method of multipliers,” in Proc. 31st Int. Conf. Machine Learning (ICML), Beijing, China, June 22-24, 2014, pp. 46–54.

[35] F. Huang, S. Chen, and H. Huang, “Faster stochastic alternating direction method of multipliers for nonconvex optimization,” in Proc. 36th Int. Conf. Machine Learning (ICML), Long Beach, CA, USA, June 10-15, 2019, pp. 2839–2848.

[36] M. T. McCann, K. H. Jin, and M. Unser, “Convolutional neural networks for inverse problems in imaging: A review,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 85–95, 2017.

[37] A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos, “Using deep neural networks for inverse problems in imaging: Beyond analytical methods,” IEEE Signal Process. Mag., vol. 35, no. 1, pp. 20–36, Jan. 2018.

[38] F. Knoll, K. Hammernik, C. Zhang, S. Moeller, T. Pock, D. K. Sodickson, and M. Akcakaya, “Deep-learning methods for parallel magnetic resonance imaging reconstruction: A survey of the current approaches, trends, and issues,” IEEE Signal Process. Mag., vol. 37, no. 1, pp. 128–140, Jan. 2020.

[39] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning techniques for inverse problems in imaging,” 2020, arXiv:2005.06001.

[40] J. Tan, Y. Ma, and D. Baron, “Compressive imaging via approximate message passing with image denoising,” IEEE Trans. Signal Process., vol. 63, no. 8, pp. 2085–2092, Apr. 2015.

[41] C. A. Metzler, A. Maleki, and R. Baraniuk, “BM3D-PRGAMP: Compressive phase retrieval based on BM3D denoising,” in Proc. IEEE Int. Conf. Image Proc., 2016.

[42] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 5117–5144, September 2016.

[43] A. Fletcher, S. Rangan, S. Sarkar, and P. Schniter, “Plug-in estimation in high-dimensional linear inverse problems: A rigorous analysis,” in Proc. Advances in Neural Information Processing Systems 32, Montréal, Canada, Dec 3-8, 2018, pp. 7451–7460.

[44] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM J. Imaging Sci., vol. 10, no. 4, pp. 1804–1844, 2017.

[45] S. A. Bigdeli, M. Jin, P. Favaro, and M. Zwicker, “Deep mean-shift priors for image restoration,” in Proc. Advances in Neural Information Processing Systems 31, Long Beach, CA, USA, Dec 4-9, 2017, pp. 763–772.

[46] Y. Sun, J. Liu, and U. S. Kamilov, “Block coordinate regularization by denoising,” in Proc. Advances in Neural Information Processing Systems 33, Vancouver, BC, Canada, December 8-14, 2019, pp. 382–392.

[47] G. Mataev, M. Elad, and P. Milanfar, “DeepRED: Deep image prior powered by RED,” in Proc. IEEE Int. Conf. Comp. Vis. Workshops (ICCVW), Seoul, South Korea, Oct 27-Nov 2, 2019, pp. 1–10.

[48] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, “Compressed sensing using generative priors,” in Proc. 34th Int. Conf. Machine Learning (ICML), Sydney, Australia, Aug. 2017, pp. 537–546.

[49] V. Shah and C. Hegde, “Solving linear inverse problems using GAN priors: An algorithm with provable guarantees,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process., Calgary, AB, Canada, Apr. 2018, pp. 4609–4613.

[50] R. Hyder, V. Shah, C. Hegde, and M. S. Asif, “Alternating phase projected gradient descent with generative priors for solving compressive phase retrieval,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process., Brighton, UK, May 2019, pp. 7705–7709.

[51] A. Raj, Y. Li, and Y. Bresler, “GAN-based projector for faster recovery in compressed sensing with convergence guarantees,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), Seoul, South Korea, Oct 27-Nov 2, 2019, pp. 5601–5610.

[52] F. Latorre, A. Eftekhari, and V. Cevher, “Fast and provable ADMM for learning with generative priors,” in Advances in Neural Information Processing Systems 33, Vancouver, BC, Canada, December 8-14, 2019, pp. 12027–12039.

[53] D. P. Bertsekas, “Incremental proximal methods for large scale convex optimization,” Math. Program. Ser. B, vol. 129, pp. 163–195, 2011.

[54] L. Tian, Z. Liu, L. Yeh, M. Chen, J. Zhong, and L. Waller, “Computational illumination for high-speed in vitro Fourier ptychographic microscopy,” Optica, vol. 2, no. 10, pp. 904–911, 2015.

[55] M. R. Kellman, E. Bostan, N. A. Repina, and L. Waller, “Physics-based learned design: Optimized coded-illumination for quantitative phase imaging,” IEEE Trans. Comput. Imag., vol. 5, no. 3, pp. 344–353, 2020.

[56] A. Beck, First-Order Methods in Optimization, chapter The Proximal Operator, pp. 129–177, MOS-SIAM Series on Optimization. SIAM, 2017.

[57] B. Wohlberg, “Efficient algorithms for convolutional sparse representations,” IEEE Trans. Image Process., vol. 25, no. 1, pp. 301–315, January 2016.

[58] S. Ramani and J. A. Fessler, “A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction,” IEEE Trans. Med. Imaging, vol. 31, no. 3, pp. 677–688, March 2012.

[59] M. S. C. Almeida and M. A. T. Figueiredo, “Deconvolving images with unknown boundaries using the alternating direction method of multipliers,” IEEE Trans. Image Process., vol. 22, no. 8, pp. 3074–3086, August 2013.

[60] E. K. Ryu and S. Boyd, “A primer on monotone operator methods,” Appl. Comput. Math., vol. 15, no. 1, pp. 3–43, 2016.

[61] S. Boyd and L. Vandenberghe, “Subgradients,” April 2008, Class notes for Convex Optimization II. http://see.stanford.edu/materials/lsocoee364b/01-subgradients_notes.pdf.

[62] Y.-L. Yu, “Better approximation and faster algorithm using the proximal average,” in Proc. Advances in Neural Information Processing Systems 26, 2013.

[63] M. Terris, A. Repetti, J.-C. Pesquet, and Y. Wiaux, “Building firmly nonexpansive convolutional neural networks,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process., Barcelona, Spain, May 2020, pp. 8658–8662.

[64] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in International Conference on Learning Representations (ICLR), 2018.

[65] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, “Efficient and accurate estimation of Lipschitz constants for deep neural networks,” in Proc. Advances in Neural Information Processing Systems 33, Vancouver, BC, Canada, Dec. 2019, pp. 11427–11438.

[66] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2nd edition, 2017.

[67] R. Ling, W. Tahir, H. Lin, H. Lee, and L. Tian, “High-throughput intensity diffraction tomography with a computational microscope,” Biomed. Opt. Express, vol. 9, no. 5, pp. 2130–2141, May 2018.

[68] H. Sedghi, V. Gupta, and P. M. Long, “The singular values of convolutional layers,” in International Conference on Learning Representations (ICLR), 2019.

[69] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), Vancouver, Canada, July 7-14, 2001, pp. 416–423.

[70] A. Chambolle, “An algorithm for total variation minimization and applications,” Journal of Mathematical Imaging and Vision, vol. 20, no. 1, pp. 89–97, 2004.

[71] A. Borji and L. Itti, “CAT2000: A large scale fixation dataset for boosting saliency research,” Comput. Vis. Patt. Recogn. (CVPR) 2015 Workshop on "Future of Datasets", 2015.

[72] H. H. Bauschke, R. Goebel, Y. Lucet, and X. Wang, “The proximal average: Basic theory,” SIAM J. Optim., vol. 19, no. 2, pp. 766–785, 2008.

[73] R. T. Rockafellar and R. J-B Wets, Variational Analysis, Springer, 1998.

[74] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge Univ. Press, 2004.

[75] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, 2004.

[76] R. T. Rockafellar, Convex Analysis, chapter Conjugate Saddle-Functions and Minimax Theorems, pp. 388–398, Princeton Univ. Press, Princeton, NJ, 1970.

[77] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), 2015.

Supplementary Material for “Scalable Plug-and-Play ADMM with Convergence Guarantees”

We adopt the monotone operator theory [60, 66] for a unified analysis of IPA. In Supplement A, we discuss a minibatch variant of IPA that enables parallel evaluation of several component proximal operators. In Supplement B, we present the convergence analysis of IPA. In Supplement C, we analyze the convergence of the algorithm for strongly convex data-fidelity terms and contractive denoisers. In Supplement D, we discuss the interpretation of IPA's fixed points from the perspective of monotone operator theory. For completeness, in Supplement E, we discuss the convergence results for traditional PnP-ADMM [23]. In Supplement F, we provide the background material used in our analysis. In Supplement G, we provide additional technical details omitted from the main paper due to space, such as the details of our deep neural net architecture and the results of additional simulations.

For the sake of simplicity, this supplement uses ‖·‖ to denote the standard ℓ2-norm in R^n. We will also use D(·) instead of Dσ(·) to denote the denoiser, thus dropping the explicit notation for σ.

A Minibatch Implementation of IPA

Algorithm 2: Incremental Plug-and-Play ADMM (IPA) (Minibatch Version)
1: input: initial values x^0, s^0 ∈ R^n, parameters γ, σ > 0, minibatch size p ≥ 1.
2: for k = 1, 2, 3, . . . do
3:   Choose indices i_1, . . . , i_p from the set {1, . . . , b}.
4:   z^k ← Ĝ(x^{k−1} + s^{k−1}) where Ĝ(z) := (1/p) Σ_{j=1}^p prox_{γ g_{i_j}}(z)   ▷ impose data-consistency
5:   x^k ← Dσ(z^k − s^{k−1})   ▷ impose prior knowledge
6:   s^k ← s^{k−1} + x^k − z^k
7: end for

Algorithm 2 presents the minibatch version of IPA that averages several proximal operators evaluated over different data blocks. When the minibatch size p = 1, Algorithm 2 reverts to Algorithm 1. The main benefit of minibatch IPA is its suitability for parallel computation of Ĝ, which can take advantage of multi-processor architectures. The convergence analysis in Theorem 1 can be easily extended to minibatch IPA with a straightforward extension of Lemma 1 in Supplement B.2 to several indices, and by following the steps of the main proof in Supplement B.1.
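To make the update concrete, here is a minimal NumPy sketch of one run of minibatch IPA (Algorithm 2). The callables `prox_list` (one proximal operator per data block g_i) and `denoiser` (the AWGN denoiser Dσ), as well as the function name and defaults, are illustrative placeholders rather than the paper's implementation.

```python
import numpy as np

def ipa_minibatch(x0, s0, prox_list, denoiser, num_iter=100, batch_size=1, seed=0):
    """Minimal sketch of minibatch IPA (Algorithm 2); returns the final estimate x."""
    rng = np.random.default_rng(seed)
    b = len(prox_list)
    x, s = x0.copy(), s0.copy()
    for _ in range(num_iter):
        idx = rng.integers(0, b, size=batch_size)             # indices i_1, ..., i_p (with replacement, for simplicity)
        u = x + s
        z = np.mean([prox_list[i](u) for i in idx], axis=0)   # data-consistency: averaged proximal step
        x = denoiser(z - s)                                   # prior: AWGN denoiser D_sigma
        s = s + x - z                                         # residual (dual) update
    return x
```

Because the p proximal evaluations inside the average are independent of one another, they can be dispatched to separate workers, which is the parallelism discussed above.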

Minibatch IPA is related to the proximal average approximation of G = prox_{γg} [62, 72]

    Ḡ(x) = (1/b) Σ_{i=1}^b prox_{γ g_i}(x),   x ∈ R^n.

When Assumption 1 is satisfied, the approximation error is bounded for any x ∈ R^n as [62]

    ‖G(x) − Ḡ(x)‖ ≤ 2γL.

Minibatch IPA thus simply uses a minibatch approximation Ĝ of the proximal average Ḡ. One implication of this is that even when the minibatch is exactly equal to the full measurement vector, minibatch IPA is not exact due to the approximation error introduced by the proximal average. However, the resulting approximation error can be made as small as desired by controlling the penalty parameter γ > 0.

B Convergence Analysis of IPA

In this section, we present one of the main results of this paper, namely the convergence analysis of IPA. Fixed-point convergence of averaged operators is well known under the name of the Krasnosel'skii-Mann theorem (see Section 5.2 in [66]) and was recently applied to the analysis of PnP-SGD [20]. Additionally, PnP-ADMM was analyzed for strongly convex data-fidelity terms g and contractive residual denoisers Rσ [23]. Our analysis here extends these results to IPA by providing an explicit upper bound on the convergence of IPA. In Section B.1, we present the main steps of the proof, while in Section B.2 we prove two technical lemmas useful for our analysis.

B.1 Proof of Theorem 1

Supplement D.3 establishes that S defined in (10) is firmly nonexpansive. Consider any v* ∈ zer(S) and any v ∈ R^n; then we have

    ‖v − v* − Sv‖² = ‖v − v*‖² − 2(Sv − Sv*)^T(v − v*) + ‖Sv‖² ≤ ‖v − v*‖² − ‖Sv‖²,   (14)

where we used the firm nonexpansiveness of S and Sv* = 0. The direct consequence of (14) is that

    ‖v − v* − Sv‖ ≤ ‖v − v*‖.

We now consider the following two equivalent representations of IPA for some iteration k ≥ 1

    z^k = G_{i_k}(x^{k−1} + s^{k−1})          x^{k−1} = D(v^{k−1})
    x^k = D(z^k − s^{k−1})                    z^k = G_{i_k}(2x^{k−1} − v^{k−1})        (15)
    s^k = s^{k−1} + x^k − z^k                 v^k = v^{k−1} + z^k − x^{k−1}

where i_k is a random variable uniformly distributed over {1, . . . , b}, G_i = prox_{γ g_i} is the proximal operator with respect to g_i, and D is the denoiser. To see the equivalence between the left and the right sides of (15), simply introduce the variable v^k = z^k − s^{k−1} into the right side of (15). It is straightforward to verify that the right side of (15) can also be rewritten as

    v^k = v^{k−1} − S_{i_k}(v^{k−1})   with   S_{i_k} := D − G_{i_k}(2D − I).   (16)

Then, for any v* ∈ zer(S), we have that

    ‖v^k − v*‖² = ‖(v^{k−1} − v* − Sv^{k−1}) + (Sv^{k−1} − S_{i_k}v^{k−1})‖²
                = ‖v^{k−1} − v* − Sv^{k−1}‖² + 2(Sv^{k−1} − S_{i_k}v^{k−1})^T(v^{k−1} − v* − Sv^{k−1}) + ‖Sv^{k−1} − S_{i_k}v^{k−1}‖²
                ≤ ‖v^{k−1} − v*‖² − ‖Sv^{k−1}‖² + 2‖Sv^{k−1} − S_{i_k}v^{k−1}‖ ‖v^{k−1} − v*‖ + ‖Sv^{k−1} − S_{i_k}v^{k−1}‖²
                ≤ ‖v^{k−1} − v*‖² − ‖Sv^{k−1}‖² + 2(R + 2γL)‖Sv^{k−1} − S_{i_k}v^{k−1}‖ + ‖Sv^{k−1} − S_{i_k}v^{k−1}‖²,

where in the first inequality we used the Cauchy-Schwarz inequality and (14), and in the second inequality we used Lemma 2 in Supplement B.2. By taking the conditional expectation on both sides, invoking Lemma 1 in Supplement B.2, and rearranging the terms, we get

    ‖Sv^{k−1}‖² ≤ ‖v^{k−1} − v*‖² − E[‖v^k − v*‖² | v^{k−1}] + 4γLR + 12γ²L².

Hence, by averaging over t ≥ 1 iterations and taking the total expectation, we obtain

    E[(1/t) Σ_{k=1}^t ‖Sv^{k−1}‖²] ≤ (R + 2γL)²/t + 4γLR + 12γ²L².

The final result is obtained by noting that

    4γLR + 12γ²L² ≤ max{γ, γ²}(4LR + 12L²).

B.2 Lemmas Useful for the Proof of Theorem 1

This section presents two technical lemmas used in our analysis in Section B.1.

Lemma 1. Assume that Assumptions 1-3 hold and let i_k be a uniform random variable over {1, . . . , b}. Then, we have that

    E[‖S_{i_k}v − Sv‖²] ≤ 4γ²L²,   v ∈ R^n.

Proof. Let z_i = G_i(x) and z = G(x) for any 1 ≤ i ≤ b and x ∈ R^n. From the optimality conditions for each proximal operator,

    G_i x = prox_{γ g_i}(x) = x − γ g_i(z_i),   g_i(z_i) ∈ ∂g_i(z_i),

and

    Gx = prox_{γ g}(x) = x − γ g(z)   such that   g(z) = (1/b) Σ_{i=1}^b g_i(z) ∈ ∂g(z),

where we used Proposition 7 in Supplement F.2. By using the bound on all the subgradients (due to Assumption 1 and Proposition 8 in Supplement F.2), we obtain

    ‖G_i x − Gx‖ = ‖prox_{γ g_i}(x) − prox_{γ g}(x)‖ = γ‖g_i(z_i) − g(z)‖ ≤ 2γL,

where L > 0 is the Lipschitz constant of all the g_i and of g. This inequality directly implies that

    ‖Sv − S_i v‖ = ‖G(2Dv − v) − G_i(2Dv − v)‖ ≤ 2γL.

Since this inequality holds for every i, it also holds in expectation.

Lemma 2. Assume that Assumptions 1-3 hold and let the sequence {v^k} be generated via the iteration (16). Then, for any k ≥ 1, we have that

    ‖v^k − v*‖ ≤ (R + 2γL)   for all v* ∈ zer(S).

Proof. The optimality of the proximal operator in (16) implies that there exists g_{i_k}(z^k) ∈ ∂g_{i_k}(z^k) such that

    z^k = G_{i_k}(2x^{k−1} − v^{k−1})   ⇔   2x^{k−1} − v^{k−1} − z^k = γ g_{i_k}(z^k).

By applying v^k = v^{k−1} − S_{i_k}(v^{k−1}) = v^{k−1} + z^k − x^{k−1} to the equality above, we obtain

    x^{k−1} − v^k = γ g_{i_k}(z^k)   ⇔   v^k = x^{k−1} − γ g_{i_k}(z^k).

Additionally, for any v* ∈ zer(S) and x* = D(v*), we have that

    S(v*) = D(v*) − G(2D(v*) − v*) = x* − G(2x* − v*) = 0
    ⇒ x* − v* = γ g(x*)   for some g(x*) ∈ ∂g(x*).

Thus, by using Assumption 3 and the bounds on all the subgradients (due to Assumption 1 and Proposition 8 in Supplement F.2), we obtain

    ‖v^k − v*‖ = ‖x^{k−1} − γ g_{i_k}(z^k) − x* + γ g(x*)‖
               ≤ ‖x^{k−1} − x*‖ + 2γL
               ≤ (R + 2γL).

C Analysis of IPA for Strongly Convex Functions

In this section, we analyze IPA under a different set of assumptions, namely those adopted in [23].

Assumption 4. Each g_i is proper, closed, strongly convex with constant M_i > 0, and Lipschitz continuous with constant L_i > 0. We define the smallest strong convexity constant as M = min{M_1, . . . , M_b} and the largest Lipschitz constant as L = max{L_1, . . . , L_b}.

This assumption further restricts Assumption 1 in the main paper to strongly convex functions.

Assumption 5. The residual Rσ := I − Dσ of the denoiser Dσ is a contraction; that is, it satisfies

    ‖Rx − Ry‖ ≤ ε‖x − y‖   for all x, y ∈ R^n and some constant 0 < ε < 1.

This assumption replaces Assumption 2 in the main paper by assuming that the residual of the denoiser is a contraction. Note that this can be practically imposed on deep neural net denoisers via spectral normalization [64]. We can then state the following.

Theorem 3. Run IPA for t ≥ 1 iterations with random i.i.d. block selection under Assumptions 3-5 using a fixed penalty parameter γ > 0. Then, the iterates of IPA satisfy

    E[‖x^k − x*‖] ≤ η^k(2R + 4γL) + 4γL/(1 − η),   0 < η < 1.

Proof. It was shown in Theorem 2 of [23] that under Assumptions 4 and 5, we have that

    ‖(I − S)x − (I − S)y‖ ≤ η‖x − y‖   with   η := (1 + ε + εγM + 2ε²γM) / (1 + γM + 2εγM),

for all x, y ∈ R^n, where S is given in (10). Hence, when

    ε / (γM(1 + ε − 2ε²)) < 1,

the operator (I − S) is a contraction.

Using the reasoning in Supplement B, the sequence v^k = z^k − s^{k−1} can be written as

    v^k = v^{k−1} − S_{i_k}(v^{k−1})   with   S_{i_k} := D − G_{i_k}(2D − I).   (17)

Then, for any v* ∈ zer(S), we have that

    ‖v^k − v*‖² = ‖(I − S_{i_k})v^{k−1} − (I − S)v* + (I − S)v^{k−1} − (I − S)v^{k−1}‖²
                = ‖(I − S)v^{k−1} − (I − S)v*‖² + 2((I − S)v^{k−1} − (I − S)v*)^T((I − S_{i_k})v^{k−1} − (I − S)v^{k−1}) + ‖(I − S_{i_k})v^{k−1} − (I − S)v^{k−1}‖²
                ≤ η²‖v^{k−1} − v*‖² + 2η‖v^{k−1} − v*‖ ‖S_{i_k}v^{k−1} − Sv^{k−1}‖ + ‖S_{i_k}v^{k−1} − Sv^{k−1}‖²,

where we used the Cauchy-Schwarz inequality and the fact that (I − S) is η-contractive. By taking the conditional expectation on both sides, invoking Lemma 1 in Supplement B.2, and completing the square, we get

    E[‖v^k − v*‖² | v^{k−1}] ≤ (η‖v^{k−1} − v*‖ + 2γL)².

Then, by applying the Jensen inequality and taking the total expectation, we get

    E[‖v^k − v*‖] ≤ η E[‖v^{k−1} − v*‖] + 2γL.

By iterating this result and invoking Lemma 2 from Supplement B.2, we obtain

    E[‖v^k − v*‖] ≤ η^k(R + 2γL) + (2γL)/(1 − η).

Finally, by using the nonexpansiveness of (1/(1 + ε))D (see Lemma 9 in [23]) and the fact that x* = D(v*), we obtain

    E[‖x^k − x*‖] ≤ (1 + ε)[η^k(R + 2γL) + (2γL)/(1 − η)] ≤ η^k(2R + 4γL) + (4γL)/(1 − η).

This concludes the proof.

D Fixed Point Interpretation

Fixed points of PnP algorithms have been extensively discussed in the recent literature [18, 19, 23]. Our goal in this section is to revisit this topic in a way that leads to a more intuitive equilibrium interpretation of PnP. Our formulation is inspired by the classical interpretation of ADMM as an algorithm for computing a zero of a sum of two monotone operators [8].

D.1 Equilibrium Points of PnP Algorithms

It is known that a fixed point (x*, z*, s*) of PnP-ADMM (and of all PnP algorithms [18]) satisfies

    x* = G(x* + s*)    (18a)
    x* = D(x* − s*),   (18b)

with x* = z*, where G = prox_{γg}. Consider the inverse of D at x ∈ R^n, which is a set-valued operator D^{−1}(x) := {z ∈ R^n : x = Dσ(z)}. Note that the inverse operator exists even when D is not a bijection (see Section 2 of [60]). Then, from the definition of D^{−1} and the optimality conditions of the proximal operator, we can equivalently rewrite (18) as follows

    s* ∈ γ∂g(x*)
    −s* ∈ D^{−1}(x*) − x*.

This directly leads to the following equivalent representation of PnP fixed points

    0 ∈ T(x*) := γ∂g(x*) + (D^{−1}(x*) − x*).   (19)

Hence, a vector x* computed by PnP can be interpreted as an equilibrium point between two terms, with γ > 0 explicitly influencing the balance.

D.2 Equivalence of Zeros of T and S

Define v* := z* − s* for a given fixed point (x*, z*, s*) of PnP-ADMM and consider the operator

    S = D − G(2D − I)   with   G = prox_{γg},

which was defined in (10) of the main paper. Note that from (18), we also have x* = D(v*) and v* = x* − s* (due to z* = x*). We then have the following equivalence

    0 ∈ T(x*) = γ∂g(x*) + (D^{−1}(x*) − x*)
    ⇔ { x* = G(x* + s*),  x* = D(x* − s*) }
    ⇔ { x* = G(2x* − v*),  x* = D(v*) }
    ⇔ S(v*) = D(v*) − G(2D(v*) − v*) = 0,

where we used the optimality conditions of the proximal operator G. Hence, the condition that v* = z* − s* ∈ zer(S) is equivalent to x* = D(v*) ∈ zer(T).
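In practice, the operator S above doubles as a convenient fixed-point residual: ‖S(v^k)‖ measures how far an iterate is from zer(S), and the normalized quantity ‖S(v^k)‖²/‖v^k‖² is what is plotted in the convergence figures. The sketch below is purely illustrative; `denoiser` and `prox_g` are placeholder callables standing in for D and G = prox_{γg}.

```python
import numpy as np

def fixed_point_residual(v, denoiser, prox_g):
    """Return S(v) = D(v) - G(2 D(v) - v) and its normalized squared norm."""
    x = denoiser(v)                    # x = D(v)
    Sv = x - prox_g(2.0 * x - v)       # S(v)
    return Sv, float(np.sum(Sv**2) / np.sum(v**2))
```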

D.3 Firm Nonexpansiveness of S

We finally would like to show that under Assumptions 1-3, the operator S is firmly nonexpansive. Assumption 2 and Proposition 6 in Supplement F.2 imply that D and G are firmly nonexpansive. Then, Proposition 4 in Supplement F.1 implies that (2D − I) and (2G − I) are nonexpansive. Thus, the composition (2G − I)(2D − I) is also nonexpansive and

    (I − S) = (1/2)I + (1/2)(2G − I)(2D − I)   (20)

is (1/2)-averaged. Then, Proposition 4 in Supplement F.1 implies that S is firmly nonexpansive.

E Convergence Analysis of PnP-ADMM

The following analysis is adapted from [23]. For completeness, we summarize the key results useful for our own analysis by restating them under the assumptions in the main paper.

E.1 Equivalence between PnP-ADMM and PnP-DRS

An elegant analysis of PnP-ADMM emerges from its interpretation as the Douglas-Rachford splitting (DRS) algorithm [23]. This equivalence is well known and has been extensively studied in the context of convex optimization [8]. Here, we restate the relationship for completeness.

Consider the following DRS (left) and ADMM (right) sequences

    x^{k−1} = D(v^{k−1})                    z^k = G(x^{k−1} + s^{k−1})
    z^k = G(2x^{k−1} − v^{k−1})             x^k = D(z^k − s^{k−1})             (21)
    v^k = v^{k−1} + z^k − x^{k−1}           s^k = s^{k−1} + x^k − z^k,

where G := prox_{γg} is the proximal operator and D is the denoiser. To see the equivalence between them, simply introduce the variable change v^k = z^k − s^{k−1} into DRS. Note also that the DRS sequence can be equivalently written as

    v^k = v^{k−1} − S(v^{k−1})   with   S := D − G(2D − I).

To see this, simply rearrange the terms in DRS as follows

    v^k = v^{k−1} + G(2x^{k−1} − v^{k−1}) − x^{k−1}
        = v^{k−1} − [D(v^{k−1}) − G(2D(v^{k−1}) − v^{k−1})].
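The rearrangement above is easy to verify numerically. The following self-contained check (our own illustration, not from the paper) uses a quadratic data-fidelity term, whose proximal operator has a closed form, and soft-thresholding as a stand-in denoiser, and confirms that one three-step DRS update coincides with the compact update v − S(v).

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(16)
gamma, tau = 0.5, 0.1

G = lambda u: (u + gamma * y) / (1.0 + gamma)                 # prox of gamma*g with g(x) = 0.5*||x - y||^2
D = lambda u: np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)   # soft-thresholding "denoiser"

v = rng.standard_normal(16)
x = D(v)                      # three-step DRS update
z = G(2 * x - v)
v_next = v + z - x

S = lambda u: D(u) - G(2 * D(u) - u)   # compact form of the same update
assert np.allclose(v_next, v - S(v))
```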

E.2 Convergence Analysis of PnP-DRS and PnP-ADMM

It was established in Supplement D.3 that S defined in (10) of the main paper is firmly nonexpansive.

Consider a single iteration of DRS, v⁺ = v − Sv. Then, for any v* ∈ zer(S), we have

    ‖v⁺ − v*‖² = ‖v − v* − Sv‖²
               = ‖v − v*‖² − 2(Sv − Sv*)^T(v − v*) + ‖Sv‖²
               ≤ ‖v − v*‖² − ‖Sv‖²,

where we used Sv* = 0 and the firm nonexpansiveness of S. By rearranging the terms, we obtain the following upper bound at iteration k ≥ 1

    ‖Sv^{k−1}‖² ≤ ‖v^{k−1} − v*‖² − ‖v^k − v*‖².   (22)

By averaging the inequality (22) over t ≥ 1 iterations, we obtain

    (1/t) Σ_{k=1}^t ‖Sv^{k−1}‖² ≤ ‖v^0 − v*‖²/t ≤ (R + 2γL)²/t,

where we used the bound ‖v^0 − v*‖ ≤ (R + 2γL), which can be easily obtained by following the steps in Lemma 2 in Supplement B.2.

This result directly implies that ‖Sv^t‖ → 0 as t → ∞. Additionally, the Krasnosel'skii-Mann theorem (see Section 5.2 in [66]) implies that v^t → zer(S). Then, from the continuity of D, we have that x^t = D(v^t) → zer(T) (see also Supplement D.2). This completes the proof.

F Background Material

This section summarizes well-known results from the optimization literature that can be found in different forms in standard textbooks [66, 73–75].

F.1 Properties of Monotone Operators

Definition 1. An operator T is Lipschitz continuous with constant λ > 0 if

    ‖Tx − Ty‖ ≤ λ‖x − y‖,   x, y ∈ R^n.

When λ = 1, we say that T is nonexpansive. When λ < 1, we say that T is a contraction.

Definition 2. T is monotone if

    (Tx − Ty)^T(x − y) ≥ 0,   x, y ∈ R^n.

We say that it is strongly monotone or coercive with parameter μ > 0 if

    (Tx − Ty)^T(x − y) ≥ μ‖x − y‖²,   x, y ∈ R^n.

Definition 3. T is cocoercive with constant β > 0 if

    (Tx − Ty)^T(x − y) ≥ β‖Tx − Ty‖²,   x, y ∈ R^n.

When β = 1, we say that T is firmly nonexpansive.

The following results are derived from the definitions above.

Proposition 1. Consider R = I − T where T : R^n → R^n. Then

    T is nonexpansive ⇔ R is (1/2)-cocoercive.

Proof. First suppose that R is (1/2)-cocoercive. Let h := x − y for any x, y ∈ R^n. We then have

    (1/2)‖Rx − Ry‖² ≤ (Rx − Ry)^T h = ‖h‖² − (Tx − Ty)^T h.

We also have that

    (1/2)‖Rx − Ry‖² = (1/2)‖h‖² − (Tx − Ty)^T h + (1/2)‖Tx − Ty‖².

By combining these two and simplifying the expression, we obtain

    ‖Tx − Ty‖ ≤ ‖h‖.

The converse can be proved by following this logic in reverse.

Proposition 2. Consider R = I − T where T : R^n → R^n. Then

    T is Lipschitz continuous with constant λ < 1 ⇒ R is (1 − λ)-strongly monotone.

Proof. By using the Cauchy-Schwarz inequality, we have for all x, y ∈ R^n

    (Rx − Ry)^T(x − y) = ‖x − y‖² − (Tx − Ty)^T(x − y)
                       ≥ ‖x − y‖² − ‖Tx − Ty‖‖x − y‖
                       ≥ ‖x − y‖² − λ‖x − y‖² = (1 − λ)‖x − y‖².

Definition 4. For a constant α ∈ (0, 1), we say that T is α-averaged if there exists a nonexpansive operator N such that T = (1 − α)I + αN.

The following characterization is often convenient.

Proposition 3. For a nonexpansive operator T, a constant α ∈ (0, 1), and the operator R := I − T, the following are equivalent:

(a) T is α-averaged;

(b) (1 − 1/α)I + (1/α)T is nonexpansive;

(c) ‖Tx − Ty‖² ≤ ‖x − y‖² − ((1 − α)/α)‖Rx − Ry‖²,   x, y ∈ R^n.

Proof. See Proposition 4.35 in [66].

Proposition 4. Consider T : R^n → R^n and β > 0. Then, the following are equivalent:

(a) T is β-cocoercive;

(b) βT is firmly nonexpansive;

(c) I − βT is firmly nonexpansive;

(d) βT is (1/2)-averaged;

(e) I − 2βT is nonexpansive.

Proof. For any x, y ∈ R^n, let h := x − y. The equivalence between (a) and (b) is readily observed by defining P := βT and noting that

    (Px − Py)^T h = β(Tx − Ty)^T h   and   ‖Px − Py‖² = β²‖Tx − Ty‖².   (23)

Define R := I − P and suppose (b) is true; then

    (Rx − Ry)^T h = ‖h‖² − (Px − Py)^T h
                  = ‖Rx − Ry‖² + (Px − Py)^T h − ‖Px − Py‖²
                  ≥ ‖Rx − Ry‖².

By repeating the same argument for P = I − R, we establish the full equivalence between (b) and (c).

The equivalence of (b) and (d) can be seen by noting that

    2‖Px − Py‖² ≤ 2(Px − Py)^T h
    ⇔ ‖Px − Py‖² ≤ 2(Px − Py)^T h − ‖Px − Py‖²
                 = ‖h‖² − (‖h‖² − 2(Px − Py)^T h + ‖Px − Py‖²)
                 = ‖h‖² − ‖Rx − Ry‖².

To show the equivalence with (e), first suppose that N := I − 2P is nonexpansive; then P = (1/2)(I + (−N)) is (1/2)-averaged, which means that it is firmly nonexpansive. On the other hand, if P is firmly nonexpansive, then it is (1/2)-averaged, which means that from Proposition 3(b) we have that (1 − 2)I + 2P = 2P − I = −N is nonexpansive. This directly means that N is nonexpansive.

F.2 Convex Functions, Subdifferentials, and Proximal Operators

Proposition 5. Let f be a proper, closed, and convex function. Then, ∂f is a monotone operator

    (g − h)^T(x − y) ≥ 0,   x, y ∈ R^n,  g ∈ ∂f(x),  h ∈ ∂f(y).

If f is strongly convex with constant μ > 0, then ∂f is strongly monotone with the same constant

    (g − h)^T(x − y) ≥ μ‖x − y‖²,   x, y ∈ R^n,  g ∈ ∂f(x),  h ∈ ∂f(y).

Proof. Consider a strongly convex function f with constant μ ≥ 0. Then, we have that

    f(y) ≥ f(x) + g^T(y − x) + (μ/2)‖y − x‖²
    f(x) ≥ f(y) + h^T(x − y) + (μ/2)‖x − y‖²
    ⇒ (g − h)^T(x − y) ≥ μ‖x − y‖².

The statement for a merely convex f is obtained by setting μ = 0 in the inequalities above.

It is well-known that the proximal operator is firmly nonexpansive.

Proposition 6. The proximal operator prox_{γf} of a proper, closed, and convex f is firmly nonexpansive.

Proof. Denote x_1 = Gz_1 = prox_{γf}(z_1) and x_2 = Gz_2 = prox_{γf}(z_2). Then

    (z_1 − x_1) ∈ γ∂f(x_1)  and  (z_2 − x_2) ∈ γ∂f(x_2)
    ⇒ (z_1 − x_1 − z_2 + x_2)^T(x_1 − x_2) ≥ 0
    ⇒ (Gz_1 − Gz_2)^T(z_1 − z_2) ≥ ‖Gz_1 − Gz_2‖².

The following proposition is sometimes referred to as the Moreau-Rockafellar theorem. It establishes that, for functions defined over all of R^n, we have ∂f = ∂f_1 + · · · + ∂f_m.

Proposition 7. Consider f = f_1 + · · · + f_m, where f_1, . . . , f_m are proper, closed, and convex functions on R^n. Then

    ∂f_1(x) + · · · + ∂f_m(x) ⊂ ∂f(x),   x ∈ R^n.

Moreover, suppose that the convex sets ri(dom f_i) have a point in common; then we also have

    ∂f(x) ⊂ ∂f_1(x) + · · · + ∂f_m(x),   x ∈ R^n.

Proof. See Theorem 23.8 in [76].

[Figure 3 diagram: a residual-learning network of 3×3 Conv + ReLU blocks implementing Rσ, with the denoiser given by I − Rσ and the loss L(x̂, x) minimized via back-propagation.]

Figure 3: Illustration of the architecture of the DnCNN used in all experiments. Vectors x̂ and x denote the denoised image and the ground truth, respectively. The neural net is trained to remove the AWGN from its noisy input image. We also constrain the Lipschitz constant of Rσ to be smaller than 1 by using the spectral normalization technique in [68]. This provides a necessary condition for the satisfaction of Assumption 2.

Proposition 8. Let f be a convex function. Then

    f is Lipschitz continuous with constant L > 0 ⇔ ‖g(x)‖ ≤ L for all g(x) ∈ ∂f(x), x ∈ R^n.

Proof. First assume that ‖g(x)‖ ≤ L for all subgradients. Then, from the definition of the subgradient,

    f(x) ≥ f(y) + g(y)^T(x − y)
    f(y) ≥ f(x) + g(x)^T(y − x)
    ⇔ g(y)^T(x − y) ≤ f(x) − f(y) ≤ g(x)^T(x − y).

Then, from the Cauchy-Schwarz inequality, we obtain

    −L‖x − y‖ ≤ −‖g(y)‖‖x − y‖ ≤ f(x) − f(y) ≤ ‖g(x)‖‖x − y‖ ≤ L‖x − y‖.

Now assume that f is L-Lipschitz continuous. Then, we have for any x, y ∈ R^n

    g(x)^T(y − x) ≤ f(y) − f(x) ≤ L‖y − x‖.

Consider v = y − x ≠ 0; then we have that

    g(x)^T(v/‖v‖) ≤ L.

Since this must be true for any v ≠ 0, we directly obtain ‖g(x)‖ ≤ L.

G Additional Technical Details

In this section, we present several technical details that were omitted from the main paper for space. Section G.1 discusses the architecture and training of the DnCNN prior. Section G.2 presents extra details and validations that complement the experiments in Section 5 of the main paper with additional insights for IPA.

G.1 Architecture and Training of the DnCNN Prior

Fig. 3 visualizes the architectural details of the DnCNN prior used in our experiments. In total, the network contains 7 layers, of which the first 6 consist of a convolutional layer followed by a rectified linear unit (ReLU), while the last layer is just a convolution. A skip connection from the input to the output is implemented to enforce residual learning. The outputs of the first 6 layers have 64 feature maps, while that of the last layer is a single-channel image. We set all convolutional kernels to be 3 × 3 with stride 1, which means that intermediate images have the same spatial size as the input image. We generated 11101 training examples by adding AWGN to 400 images from the BSD400 dataset [69] and extracting patches of 128 × 128 pixels with stride 64. We trained DnCNN to minimize the mean squared error by using the Adam optimizer [77].

We use the spectral normalization technique in [68] to control the global Lipschitz constant (LC) of DnCNN. During training, we constrain the residual network Rσ to have an LC smaller than 1. Since firm nonexpansiveness implies nonexpansiveness, this provides a necessary condition for Rσ to satisfy Assumption 2.
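For concreteness, the following PyTorch sketch shows one way to assemble a network of this shape. It is illustrative rather than the released implementation: the class name and defaults are ours, and torch.nn.utils.spectral_norm (the power-iteration scheme of [64]) is used here as a convenient stand-in for the convolution-specific normalization of [68].

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

class DnCNN(nn.Module):
    """7-layer residual denoiser: 6 x (3x3 Conv + ReLU) with 64 features, then a final 3x3 Conv."""

    def __init__(self, channels=1, features=64, num_layers=7):
        super().__init__()
        layers = [spectral_norm(nn.Conv2d(channels, features, 3, padding=1)), nn.ReLU(inplace=True)]
        for _ in range(num_layers - 2):
            layers += [spectral_norm(nn.Conv2d(features, features, 3, padding=1)), nn.ReLU(inplace=True)]
        layers.append(spectral_norm(nn.Conv2d(features, channels, 3, padding=1)))
        self.residual = nn.Sequential(*layers)     # R_sigma: predicts the noise

    def forward(self, noisy):
        return noisy - self.residual(noisy)        # skip connection: denoiser D = I - R_sigma
```

Training then amounts to minimizing the MSE between the network output and the clean patch with Adam, as described above.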

Table 3: Per-iteration memory usage specification for reconstructing 512×512 images

                        IPA (60)                     PnP-ADMM (300)                 PnP-ADMM (600)
Variables          size              memory     size               memory     size               memory
{A_i} (real)       512 × 512 × 60    0.23 GB    512 × 512 × 300    1.17 GB    512 × 512 × 600    2.34 GB
{A_i} (imaginary)  512 × 512 × 60    0.23 GB    512 × 512 × 300    1.17 GB    512 × 512 × 600    2.34 GB
{y_i}              512 × 512 × 60    0.47 GB    512 × 512 × 300    2.34 GB    512 × 512 × 600    4.69 GB
others combined    —                 0.03 GB    —                  0.03 GB    —                  0.03 GB
Total                                0.97 GB                       4.72 GB                       9.41 GB

[Figure 4 plots: the normalized residual ‖S(v^k)‖²/‖v^k‖² (log scale) and the SNR (dB) versus iteration (0–800) for the three penalty parameters γ0/400, γ0/20, and γ0; the three runs reach final SNRs between 19.62 and 19.67 dB and final residuals between 7.16×10⁻⁵ and 3.80×10⁻⁴.]

Figure 4: Illustration of the convergence of IPA for a DnCNN prior under drastically different γ values. The average normalized distance to zer(S) and the SNR (dB) are plotted against the iteration number, with the shaded areas representing the range of values attained over 12 test images. In practice, the convergence speed improves with larger values of γ; however, IPA can still achieve the same level of SNR for a wide range of γ values.

G.2 Extra Details and Validations for Optical Tomography

All experiments are run on a machine equipped with an Intel Core i7 processor with 6 cores at 3.2 GHz and 32 GB of DDR memory. We trained all neural nets using NVIDIA RTX 2080 GPUs. We define the SNR (dB) used in the experiments as

    SNR(x̂, x) ≜ max_{a,b∈R} { 20 log₁₀( ‖x‖_{ℓ2} / ‖x − a x̂ + b‖_{ℓ2} ) },

where x̂ represents the estimate and x denotes the ground truth.
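The maximization over (a, b) is a linear least-squares fit of the ground truth by an affine transform of the estimate, so it can be computed in closed form; the following NumPy sketch (our own illustration, not the paper's code) shows one way to do it.

```python
import numpy as np

def snr_db(x_hat, x):
    """SNR(x_hat, x) = max_{a,b} 20*log10(||x|| / ||x - a*x_hat + b||)."""
    xh, xg = x_hat.ravel(), x.ravel()
    A = np.stack([xh, np.ones_like(xh)], axis=1)     # columns: estimate and constant offset
    coef, *_ = np.linalg.lstsq(A, xg, rcond=None)    # best affine fit a*x_hat + c to x
    return 20.0 * np.log10(np.linalg.norm(xg) / np.linalg.norm(xg - A @ coef))
```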

In the intensity diffraction tomography experiments, we implement an epoch-based selection rule due to the large size of the data. We randomly divide the measurements (along with the corresponding forward operators) into non-overlapping chunks of size 60 and save these chunks on the hard drive. At every iteration, IPA loads only a single random chunk into memory, while the full-batch PnP-ADMM loads all chunks sequentially and processes the full set of measurements. This leads to a lower per-iteration cost and lower memory usage for IPA than for PnP-ADMM. Table 3 shows extra examples of the memory usage for reconstructing 512 × 512 pixel permittivity images. These results follow the same trend observed in Table 2 of the main paper.
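A minimal sketch of this chunked loading is shown below; the on-disk layout (one `chunk_XXX.npz` file per block with arrays `"A"` and `"y"`) is hypothetical and only illustrates the idea of keeping a single chunk of measurements resident in memory per iteration.

```python
import numpy as np

def load_random_chunk(chunk_dir, num_chunks, rng):
    """Load one randomly selected measurement chunk (forward operators + measurements)."""
    idx = rng.integers(0, num_chunks)
    data = np.load(f"{chunk_dir}/chunk_{idx:03d}.npz")   # hypothetical file naming
    return data["A"], data["y"]
```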

We also conducted some extra validations that provide additional insights into IPA. In these simulations, we use images of size 254 × 254 pixels from Set 12 as test examples. We assume real permittivity functions with a total number of measurements b = 60.

Fig. 4 illustrates the convergence of IPA for different values of the penalty parameter. We consider three different values γ ∈ {γ0, γ0/20, γ0/400} with γ0 = 20. The average normalized distance ‖S(v^k)‖²/‖v^k‖² and the SNR are plotted against the iteration number and labeled with their respective final values. The shaded areas represent the range of values attained across all test images. IPA randomly selects 5 measurements in every iteration to impose data-consistency. Fig. 4 complements the results in Fig. 1 by showing the fast convergence speed obtained in practice with larger values of γ. On the other hand, the plot further demonstrates that IPA is stable in terms of SNR for a wide range of γ values.

Prior work has discussed the influence of the denoising prior on the final result. Our last simulation compares the final reconstructed images of IPA by using TV, BM3D, and DnCNN. Since TV is

[Figure 5 image grid: House and Parrot reconstructions with two zoomed regions (a and b) per image. Columns: PnP-ADMM (5) with DnCNN, IPA (5) with TV, IPA (5) with BM3D, IPA (5) with DnCNN, and PnP-ADMM (full) with DnCNN; the corresponding SNRs are 19.22, 21.79, 21.97, 22.45, and 22.94 dB for House, and 17.13, 18.60, 18.27, 18.72, and 19.18 dB for Parrot.]

Figure 5: Visual examples of the reconstructed House (top) and Parrot (bottom) images obtained by IPA and PnP-ADMM. The first and last columns correspond to PnP-ADMM under DnCNN with 5 fixed measurements and with the full 60 measurements, respectively. The second, third, and fourth columns correspond to IPA with a small minibatch of size 5 under TV, BM3D, and DnCNN, respectively. Each image is labeled by its SNR (dB) with respect to the original image, and the visual differences are highlighted by the boxes underneath. Note that IPA recovers the details lost by the batch algorithm at the same computational cost and achieves the same high-quality results as the full-batch algorithm.

Table 4: Optimized SNR (dB) obtained by IPA under different priors for images from Set 12

              PnP-ADMM     IPA (Ours)                            PnP-ADMM
              (Fixed 5)    (Random 5 from full 60)               (Full 60)
Denoisers     DnCNN        TV         BM3D       DnCNN           DnCNN
Cameraman     15.95        17.45      17.38      18.16           18.34
House         19.22        21.79      21.97      22.45           22.94
Pepper        17.06        18.68      19.55      20.60           21.11
Starfish      18.20        19.29      20.29      21.64           22.22
Monarch       17.70        19.81      18.66      20.85           21.60
Aircraft      17.15        18.67      18.83      19.28           19.54
Parrot        17.13        18.60      18.27      18.72           19.18
Lenna         15.41        16.48      16.32      16.94           17.13
Barbara       13.63        16.00      17.53      16.58           16.85
Boat          17.98        19.35      20.21      20.95           21.34
Pirate        17.93        19.36      19.45      19.88           20.10
Couple        15.40        17.31      17.53      18.24           18.57
Average       16.90        18.57      18.83      19.52           19.91

a proximal operator, it serves as a baseline. Table 4 compares the SNR values obtained by the different image priors. We include the results of PnP-ADMM using 5 fixed measurements and the full batch as references. Visual examples of House and Parrot are shown in Fig. 5. First, the table numerically illustrates the significant improvement of IPA over PnP-ADMM under the same computational budget. Second, leveraging learned priors in IPA leads to better reconstructions than the other priors; for instance, DnCNN outperforms BM3D by about 0.7 dB and TV by nearly 1 dB in average SNR. Last, the agreement between IPA and the full-batch PnP-ADMM highlights the nearly optimal performance of our algorithm at a significantly lower computational cost and memory usage.
