Deep Learning-based Solvability of Underdetermined Inverse Problems in Medicine
Jin Keun Seo, with Chang Min Hyun & Seong Hyeon Baek
Yonsei Univ., Korea
Jan 2020 @ IPAM (Lunar New Year 2020)
This talk is based on joint work with my PhD students.
In this talk, many of my personal opinions (not rigorous) are included to give an exaggerated emphasis on deep learning.
[Diagram: Solve Ay = b — forward matrix A, medical image y, measured data b; learn a useful output.]
Ill-posed inverse problems: Hadamard's well-posedness (excluding existence)
Ay = b is well-posed if the following two conditions hold:
1) for each b, Ay = b has a unique solution;
2) the solution is stable under perturbations of b.
โข Whether or not a problem is well-posed may be dependent on how the solution is expressed.
โข Many problems are ill-posed because we are overly ambitious or lacking in expressiveness.
My personal opinion: Conventional CT and MRI data collection is designed so that the corresponding forward matrix A is well-expressed & reasonably complete.
Forward matrix A: # of equations (data) ≥ # of unknowns (pixels of image).
• MRI measures approximately an image's Fourier transform. Nyquist sampling is required for analytic reconstruction:
  # pixels in image ≤ # samples in k-space.
• CT measures approximately an image's Radon transform. According to Nyquist sampling & the Fourier slice theorem:
  # pixels in image ≤ # projection angles.
Tomography with Nyquist Sampling
Fully sampled data
The classical principle that makes these problems well-posed:
# of equations (number of samples) ≥ # of unknowns (number of pixels in the image).
Inverse Fourier Transform
Filtered Backprojection
Why do we pay attention to underdetermined problems (fewer equations than unknowns) in CT & MRI ?
# of equations < # of unknowns (# of pixels).
This is because of the great need to reduce radiation dose in CT & data acquisition time in MRI.
Is it possible to solve it?
Solving Ay = b is to find the reconstruction map f : b ↦ y.
• b_full denotes the "fully sampled" data (e.g., the sinogram in CT and k-space data in MRI).
• b = S_sub b_full, where S_sub denotes a subsampling operator.
• A_full is the discrete Fourier transform in MRI & the Radon transform in CT.
Subsampling operator
Undersampled MRI problem
Subsampling (30%) Full sampling
b_full vs b_sub. A⁺: pseudo-inverse of A.
Is it possible?
Without imposing prior knowledge on the solution, this problem has infinitely many solutions.
Need to choose one out of infinitely many images in N_b(A) := { y : Ay = b }.
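To make the "infinitely many solutions" point concrete, here is a small NumPy sketch (toy sizes, all variable names my own, not from the talk): it builds an underdetermined system and checks that A⁺b is one particular element of N_b(A) — the minimum-norm one.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 200                      # fewer equations than unknowns
A = rng.standard_normal((m, n))     # forward matrix
y_true = rng.standard_normal(n)     # the "true image"
b = A @ y_true                      # measured data

y_mn = np.linalg.pinv(A) @ b        # minimum-norm solution A⁺ b

print(np.allclose(A @ y_mn, b))                          # a valid solution of Ay = b
print(np.linalg.norm(y_mn) <= np.linalg.norm(y_true))    # and the smallest-norm one
```

Any other solution differs from y_mn by an element of the null space of A, which is why prior knowledge is needed to pick the right one.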
How to solve Ay = b when # of equations < # of unknowns, i.e., when the solution is determined only up to N_b(A)?
Is it possible to find the reconstruction map f : b ↦ y?
Pseudo-Inverse
Solving ๐จ๐ ๐ depends on an appropriate use of a priori information about medical CT or MRI images as solutions.
We need to consider a constrained problem:
Ay = b subject to y ∈ M (solution manifold).
Example: sparse view CT
unknown
Sparse View CT
Example 1
Learn f s.t. f(Ay) = y, ∀ y ∈ image manifold.
Similar noise patterns regardless of images
Well-expressed full problem: A_full y = b_full; subsampled data: b_sub = S_sub b_full.
Local CT (Example 2)
Well-expressed full problem: A_full y = b_full.
Dental CBCT: need to develop a reconstruction method that addresses the problems caused by "offset detector, FOV truncation, low X-ray dose".
DENTRI, HDXWILL
Local CT
Avoid methods having many iteration steps!
Underdetermined MRI (Example 3)
Violating Nyquist Sampling Rule
Well-expressed full problem: A_full y = b_full; the subsampled problem is ill-posed.
Learn f s.t. f(Ay) = y, ∀ y ∈ image manifold.
Hand-made Sparse Sensing
• Use a sparse representation of y.
• Regularized data fitting method: ĥ = argmin_h ||AWh − b||₂² + λ ||h||₁, y = Wĥ.
• Single data fidelity.
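A minimal sketch of the hand-made route, with a generic random sensing matrix standing in for the CT/MRI forward model and W = I so that h itself is sparse (sizes, λ, and iteration counts are illustrative, not the talk's): ISTA (iterative soft thresholding) for argmin_h ||Ah − b||₂² + λ||h||₁.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, n_iter):
    """Minimize ||A h - b||_2^2 + lam * ||h||_1 by iterative soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2            # spectral norm squared; 1/(2L) is a safe step
    h = np.zeros(A.shape[1])
    for _ in range(n_iter):
        h = soft_threshold(h - A.T @ (A @ h - b) / L, lam / (2 * L))
    return h

rng = np.random.default_rng(1)
m, n, k = 50, 100, 4                         # 50 equations, 100 unknowns, 4-sparse truth
A = rng.standard_normal((m, n)) / np.sqrt(m)
h_true = np.zeros(n)
h_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
b = A @ h_true

h_hat = ista(A, b, lam=0.01, n_iter=3000)
err = np.linalg.norm(h_hat - h_true) / np.linalg.norm(h_true)
print(round(err, 3))                         # small relative error: sparsity rescues Ay = b
```

A single data-fidelity term plus the hand-crafted sparsity prior is enough here; no training data is involved.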
Methods to impose Prior Knowledge on the solution
Machine-made Deep Regression
• Use training data { y⁽ⁱ⁾ : i = 1, ⋯, N } to get the prior knowledge.
• Deep learning: f̂ = argmin_{f ∈ neural nets} Σ_{i=1}^N ||f(x⁽ⁱ⁾) − y⁽ⁱ⁾||², where x⁽ⁱ⁾ = A⁺A y⁽ⁱ⁾.
• Group data fidelity.
This is a highly nonlinear problem! The degree of nonlinearity depends on the sampling of data b and solution manifold.
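The machine-made counterpart, as a toy sketch: a tiny NumPy "network" trained by gradient descent on the group data fidelity Σᵢ ||f(x⁽ⁱ⁾) − y⁽ⁱ⁾||², with x⁽ⁱ⁾ = A⁺A y⁽ⁱ⁾ as input. The architecture, sizes, and latent-variable manifold are arbitrary illustrations, not the talk's actual model.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, N = 10, 30, 200                     # equations, unknowns, training pairs
A = rng.standard_normal((m, n))
P = np.linalg.pinv(A) @ A                 # x = A⁺A y  (minimum-norm input)

# toy "solution manifold": images driven by 3 latent variables
G = rng.standard_normal((n, 3))
Y = (G @ rng.standard_normal((3, N))).T   # true images y_i, shape (N, n)
X = Y @ P.T                               # network inputs x_i = A⁺A y_i

# one-hidden-layer network f, trained on the group data fidelity
W1 = 0.1 * rng.standard_normal((n, 64))
W2 = 0.1 * rng.standard_normal((64, n))
f = lambda Z: np.maximum(Z @ W1, 0) @ W2

loss0 = np.mean((f(X) - Y) ** 2)
lr = 1e-2
for _ in range(500):
    H = np.maximum(X @ W1, 0)             # hidden activations (ReLU)
    E = H @ W2 - Y                        # residuals f(x_i) - y_i
    gW2 = H.T @ E / N                     # backprop through the two layers
    gW1 = X.T @ ((E @ W2.T) * (H > 0)) / N
    W1 -= lr * gW1
    W2 -= lr * gW2

print(np.mean((f(X) - Y) ** 2) < loss0)   # group data fidelity decreased
```

The fidelity is over the whole training group at once, which is how the prior is learned rather than hand-crafted.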
Methods to solve the ill-posed problem Ay = b: Comparison
• Hand-made sparse sensing: single data fidelity, with a hand-crafted image prior (sparsity of h, y = Wh).
versus
• Machine-made DL approach: group data fidelity, with the image prior learned from training data { y⁽ⁱ⁾ : i = 1, ⋯, N }.
Test problem: sparse-view CT model with a specially chosen image manifold M_image.
In this sparse-view CT model, CS methods are known to work well.
Comparison: Hand-Made vs Machine-Made
[Figure: Feature 1, Feature 2 — performance evaluation]
Comparison: Man-Made vs Machine-Made
Since this solution manifold is only 7-dimensional, Ay = b is solvable with only 7 equations.
For this sparse-view CT problem, we use a special solution manifold M_image (assumed to be unknown).
Dimension of M_image = 7.
Comparison: Hand-Made vs Machine-Made
• Deep learning preserves feature 1.
• CS and linear approaches eliminate feature 1.
Man-Made vs Machine-Made
f : x ↦ y
PCA | Total Variation | Deep Learning
Linear methods remove key features; deep learning keeps key features.
PCA produces a terrible outcome due to the use of an insufficient orthogonal basis.
Man-Made vs Machine-Made: Linear methods (PCA, wavelet decomposition) may be unable to deal with a highly curved solution manifold.
Consider the vector space spanned by images { y⁽¹⁾, y⁽²⁾, ⋯, y⁽ᴺ⁾ }, where y⁽ᵏ⁾ is the 2πk/N rotation of image y⁽¹⁾.
The middle image between y⁽¹⁾ & y⁽²⁾ cannot be expressed properly by the space spanned by { y⁽¹⁾, y⁽²⁾, ⋯, y⁽ᴺ⁾ }.
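The rotation argument can be checked numerically. Below is a hypothetical 1-D analogue (my own construction, not the talk's example): the "manifold" is the set of circular shifts of a bump signal, and the pixel-space midpoint of two shifted copies is far from every point of that manifold.

```python
import numpy as np

n = 64
i = np.arange(n)
x0 = np.exp(-0.5 * ((i - n // 2) / 2.0) ** 2)   # a sharp bump "image"
rotate = lambda k: np.roll(x0, k)               # the rotation (circular-shift) manifold

xa, xb = rotate(0), rotate(16)
pixel_mid = (xa + xb) / 2                       # linear, pixel-space midpoint
manifold_mid = rotate(8)                        # true midpoint on the manifold

# the pixel-space midpoint is far from EVERY point of the manifold ...
d_pixel = min(np.linalg.norm(pixel_mid - rotate(k)) for k in range(n))
# ... while the manifold midpoint lies on it exactly
d_manifold = min(np.linalg.norm(manifold_mid - rotate(k)) for k in range(n))

print(d_manifold == 0.0, d_pixel > 0.5 * np.linalg.norm(x0))
```

Linear combinations of manifold points leave the manifold, which is exactly why a curved solution manifold defeats linear-span methods.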
Can CS methods preserve the feature 1 while removing artifacts?
May be NOT.
TV approach: TV(y) = ||∇y||₁.
It may not preserve some detailed structures that may contain crucial medical information.
Remove everything within this interval without exception.
TV approach: ŷ = argmin_y ||Ay − b||₂² + λ ||∇y||₁.
The performance depends on the regularization parameter, and several iterations are needed to find a sparse expression.
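A sketch of the TV approach under a generic random forward matrix (a stand-in for the CT geometry; λ, the sizes, and the smoothing ε are illustrative), minimizing ||Ay − b||₂² + λ||∇y||₁ by gradient descent on a smoothed TV term:

```python
import numpy as np

def tv_reconstruct(A, b, lam, step=1e-2, n_iter=2000, eps=1e-6):
    """Gradient descent on ||A y - b||_2^2 + lam * sum|D y| (smoothed TV)."""
    y = np.zeros(A.shape[1])
    for _ in range(n_iter):
        Dy = np.diff(y)
        g = Dy / np.sqrt(Dy ** 2 + eps)                  # smoothed d|Dy|/dDy
        grad_tv = np.concatenate([[-g[0]], g[:-1] - g[1:], [g[-1]]])
        y -= step * (2 * A.T @ (A @ y - b) + lam * grad_tv)
    return y

rng = np.random.default_rng(2)
m, n = 30, 60                                            # underdetermined system
A = rng.standard_normal((m, n)) / np.sqrt(m)
y_true = np.repeat([0.0, 1.0, -0.5], 20)                 # piecewise-constant "image"
b = A @ y_true

y_hat = tv_reconstruct(A, b, lam=0.05)
obj = lambda y: np.sum((A @ y - b) ** 2) + 0.05 * np.sum(np.abs(np.diff(y)))
print(obj(y_hat) < obj(np.zeros(n)))                     # objective decreased
```

Piecewise-constant images are exactly where the TV prior shines; fine textures below the effective threshold are flattened, which is the "removes everything in this interval" behavior noted above.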
Man-Made vs Machine-Made
• The DL approach can selectively preserve feature 1.
• Use training data to learn both f and the image manifold such that f(Ay) = y on the image manifold.
Deep Learning Approach
What is learnable, and what is NOT.
The necessary condition for learning f is that N(A) ∩ (M_image − M_image) = {0}, where the solution manifold M_image is unknown.
One of DLโs most important advantages is to provide non-iterative reconstruction methods for highly non-linear problems.
Use training data { y⁽ⁱ⁾ : i = 1, ⋯, N } to get prior knowledge.
Training Data 1
Training Data 2
Training Data 3
Impact of Training Data: It is critical to choose suitable training datasets to reflect the appropriate image priors, in order to preserve detailed features of the images.
No small anomaly
small anomaly inside rectangle
small anomaly inside disk
Joint work with Hyungsuk Park
Observation: The reconstruction map f : x = A⁺b ↦ y is learnable if A satisfies the M-RIP (manifold restricted isometry property) condition.
The necessary condition for learnability is N(A) ∩ (M_image − M_image) = {0}.
• M_image denotes a solution manifold that is assumed to be a good regression of the MR head image data distribution { y⁽ⁱ⁾ : i = 1, ⋯, N }.
• x = A⁺b is the minimum-norm solution, which will be used to find the true solution y.
• M-RIP condition: N(A) ∩ (M_image − M_image) = {0} (uniqueness & stability).
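The M-RIP condition can be probed numerically on a toy linear manifold (a stand-in for M_image; all sizes hypothetical): even with far fewer equations than unknowns, A separates points of a low-dimensional manifold, so N(A) ∩ (M_image − M_image) = {0} holds empirically.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, d = 20, 100, 3                 # equations, unknowns, manifold dimension
A = rng.standard_normal((m, n)) / np.sqrt(m)
G = rng.standard_normal((n, d))      # toy linear generator: y = G t

T = rng.standard_normal((d, 500))
U = G @ T                            # 500 points on the manifold
diffs = U[:, :250] - U[:, 250:]      # differences u - u'
ratios = np.linalg.norm(A @ diffs, axis=0) / np.linalg.norm(diffs, axis=0)

# M-RIP-style lower bound: ||A(u - u')|| >= c ||u - u'|| with c > 0
print(ratios.min() > 0.0)
```

With d much smaller than m, no manifold difference falls into the null space of A, which is what makes the restriction of A to the manifold invertible.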
The reconstruction map f is a highly nonlinear problem!
The map f : x = A⁺b ↦ y can be viewed as an image restoration function that fills in the missing data in x; therefore, f depends on the image structure in x.
The nonlinearity of f is affected by the sampling and by the degree of bending of the manifold M_image.
Observation: f : x = A⁺b ↦ y is nonlinear if dim(span{ A⁺A G(t) : t ∈ K }) > # of equations.
• A is a (# equations) × (# unknowns) matrix.
• M_image = { y = G(t) : t ∈ K }, where G is a generator & K is a compact subset of Rᵈ.
Proof sketch:
• x = A⁺Ay, so f(x) = y means f(A⁺A G(t)) = G(t) for all t ∈ K.
• If f : x = A⁺b ↦ y is linear, then F := ∇f is a constant matrix & F A⁺A G(t) = G(t) for all t ∈ K.
• Hence all A⁺A G(t) ∈ Eigen₁(F A⁺A), the eigenspace of F A⁺A corresponding to the eigenvalue 1.
• This is not possible if dim(span{ A⁺A G(t) : t ∈ K }) > # of equations.
Message: The degree of nonlinearity depends on the sampling of data b & the degree of bending of the solution manifold M_image.
ẑ = argmin_z ||AWz − b||₂² + λ ||z||₁.
Assume that there exists W such that y = Wz with z being sparse.
Both deep learning and compressed sensing work very well for this kind of problem.
The ℓ₁-regularized data fitting technique works by eliminating this simple noise structure.
Example 1: Sparse View CT
Deep learning works well here because of the unique continuation of analytic functions along the vertical direction.
Example 2: Local CT
Inverse Fourier Transform
According to the Poisson summation formula, the inverse discrete Fourier transform of S_sub b_full (uniformly subsampled data with factor 4) produces the following four-folded image.
S_sub b_full: uniform subsampling with factor 4.
b_full: full sampling.
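The four-folding predicted by the Poisson summation formula is easy to verify in 1-D (a toy stand-in for the 2-D k-space; names are my own): zero-filling a factor-4 uniformly subsampled spectrum and inverting yields the average of four shifted copies of the signal.

```python
import numpy as np

N = 256
rng = np.random.default_rng(6)
x = rng.standard_normal(N)              # a 1-D stand-in for the image
b_full = np.fft.fft(x)                  # fully sampled "k-space" data

# uniform subsampling with factor 4: keep every 4th line, zero-fill the rest
mask = np.zeros(N)
mask[::4] = 1.0
y = np.fft.ifft(b_full * mask).real     # naive zero-filled reconstruction

# Poisson summation: y is the average of 4 shifted copies of x (four-folded image)
folded = (x + np.roll(x, N // 4) + np.roll(x, N // 2) + np.roll(x, 3 * N // 4)) / 4
print(np.allclose(y, folded))
```

Since the four copies overlap exactly, two images differing by such folds produce identical subsampled data, which is the null-space obstruction discussed next.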
Example 3: Underdetermined MRI
If we use uniform subsampling S_sub with factor 4, it is difficult to learn f s.t. f(Ay) = y, ∀ y ∈ image manifold.
This fails to satisfy the M-RIP condition. DL is NOT magic.
Null space & the Poisson summation formula
Fourier Transform
Adding one line
However, the result changes dramatically by adding only one line in k-space.
One extra line of b_full is added to the subsampled k-space data.
Because of this rough signature, it becomes capable of learning f s.t. f(Ay) = y, ∀ y ∈ image manifold.
Why does the learning effect dramatically change by adding only one line in k-space?
Inverse Fourier transform of the single line in k-space
• Let Φ₁ be the sensitivity matrix corresponding to uniform sampling with factor 4.
• Let Φ₂ be the sensitivity matrix corresponding to the row just above the center.
Φ₁: images are indistinguishable. Φ₂: images are distinguishable.
Patch images vs full image: learning ability
The dimension of the patch manifold M_image^α does not increase proportionally to α. Hence, the learning ability of f_α : x_α ↦ y_α gradually improves as α increases.
M_image^α = { y_α : y_α is an α × α image patch extracted from y ∈ M_image }.
My personal opinion
Let us consider the learning-ability issue:
As α increases, the number of unknowns increases more rapidly than the number of equations.
Experimental results demonstrate that the learning ability of f_α : x_α ↦ y_α gradually improves as α increases.
Assume that M_image is the set of all human head MR images. Then all the images in M_image possess a similar anatomical structure consisting of skull, gray matter, white matter, cerebellum, among others.
In addition, every skull and tissue in the image has distinct features that can be represented nonlinearly by a relatively small number of latent variables, and so does the entire image.
Notably, the skull and tissues of the image are spatially interconnected, so even if a part of the image is missing, the missing part can be recovered with the help of the surrounding image information.
These are reasons for expecting dim M_image^α to grow only slowly as α increases.
Challenging issue: Low-dimensional representation of MR and CT images (high-dimensional voxel data).
Given the data distribution { y⁽ⁱ⁾ : i = 1, ⋯, N } of medical images (e.g. dental CBCT data), can we find a low-dimensional latent generator (decoder) Ψ : t ↦ y and an encoder Φ : y ↦ t such that Ψ∘Φ(y) ≈ y for all y ∈ M_image?
GAN (Generative Adversarial Network)
VAE (Variational Autoencoder)
[Diagrams: generator/decoder Ψ(t); example with 5 latent variables]
Generator/decoder Ψ(t); latent variable t = (t₁, ⋯, t_d).
One of challenging issues for solving an ill-posed problem is to find a low-dimensional representation.
Disentangled expression with extracting the underlying explanatory axis
Electrical Impedance Tomography is known to be a highly ill-posed problem.
[EIT model: Ay = b, where A is the sensitivity matrix, y the conductivity image, and b the measured data; A is 208 × 16384.]
Example
Data acquisition
However, it can be well-posed if we give up excessive ambition or find a way to make a low dimensional expression.
σ̂ = argmin_σ ||Aσ − V||₂² + Reg(σ).
Despite myriad profound theories of EIT over the past 40 years, some problems still remain for clinical use.
Reconstructed σ̂ vs true σ
Hand-made regularization techniques (L₁, L₂, TV regularization) may not be effective for EIT imaging.
208 = # of equations (data) ≪ 16384 = # of unknowns (pixels of image).
This can be well-posed if we can find a low-dimensional representation of solutions.
A deep learning framework may provide a nonlinear regression on training data, which acts as learning complex prior knowledge about the output.
• Interpolation between two points t₁ and t₂ in the latent space: between the two given images, a VAE can generate the interpolated image.
• Tangent vectors on the manifold M.
A low-dimensional latent representation produces a manifold.
What about a low-dimensional representation of high-dimensional images such as MR and CT images?
So far, my team has tried several kinds of GANs and VAE, but has not succeeded.
For high dimensional data, AEs suffer from image blurring and loss of small details.
GANs have shown remarkable success in generating various realistic images. However, there are limitations in synthesizing high-resolution medical data.
The GAN approach makes it difficult to deal with high-dimensional data because the generated image can be easily distinguished from the training data, which can lead to collapse or instability during the training process.
Generative Adversarial Network
GANs have a remarkable ability to generate these images.
AEs learn a bidirectional mapping (encoder and decoder), while GANs learn only a unidirectional mapping (decoding) for high-dimensional medical images.
An AE can control these latent variables.
GANs have difficulties in encoding high dimensional images.
However, for high-dimensional data, AEs suffer from image blurring and loss of small details.
My personal opinion
Challenging Issue: Generalization
[Plot: training error vs test error]
Memorization: learn the training materials well.
Generalization: recognize and generalize features, so that problems not appearing in the training set are also answered correctly.
Training: Σᵢ ||f(x⁽ⁱ⁾) − y⁽ⁱ⁾||² ≈ 0. Hope: the test error is also ≈ 0.
"Adversarial attacks against medical deep learning systems" by Samuel G. Finlayson et al. (2018).
The percentage represents the predicted probability of pneumothorax.
Recently, several experiments on adversarial classification (e.g., false-positive cancer outputs) have shown that deep neural networks (obtained via gradient-descent-based error minimization) are vulnerable to various noise-like perturbations, resulting in incorrect outputs that can be critical in medical environments.
Example of Memorization without Generalization
f(x) = W_L σ(W_{L−1} ⋯ σ(W_1 x + b_1) ⋯ + b_{L−1}) + b_L.
However, deep learning may produce f(one-pixel-attacked image of a 6) = 1.
One-pixel attack.
MNIST example of Memorization without Generalization
Library of 16 features for the digit 6
We will focus on this 13th signal and analyze what it means.
14th MNIST data
Classified as 1
Classified as 6
Classified as 6
Confuses 6 with 0
Adversarial attacks against MNIST handwritten classification
These adversarial examples show that a well-trained function f : x ↦ y works only in the immediate vicinity of the manifold, while producing incorrect results if the input deviates even slightly from the training-data manifold.
In practice, the measured data are exposed to various noise sources (e.g., machine-dependent noise); therefore, the developed algorithm must be stable against the resulting perturbations.
Hence, normalization of the input data is essential for improving the robustness and generalizability of a deep learning network against adversarial attacks.
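For intuition on how small such a perturbation can be, here is a bare-bones FGSM-style attack sketch on a hypothetical linear classifier (the weights are random stand-ins for a trained model, not from the cited paper): a per-pixel change of size eps suffices to flip the prediction.

```python
import numpy as np

rng = np.random.default_rng(5)
w = rng.standard_normal(100)              # weights of a hypothetical linear classifier
x = rng.standard_normal(100)              # a clean input "image"
label = 1.0 if w @ x > 0 else -1.0        # the model's own clean prediction

# FGSM-style step: perturb every pixel by eps in the loss-increasing direction
eps = 2 * abs(w @ x) / np.sum(np.abs(w))  # just large enough to cross the boundary
x_adv = x - eps * label * np.sign(w)      # sign of the input gradient of the loss

print(np.isclose(np.max(np.abs(x_adv - x)), eps))  # per-pixel change is only eps
print((w @ x_adv > 0) != (w @ x > 0))              # yet the prediction flips
```

The gradient sign concentrates a tiny per-pixel budget into a large change of the decision score, which is why imperceptible noise can defeat a model that works perfectly on clean data.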
Challenging issue: Normalization of input data
Example: the Dirichlet problem, in polar coordinates (r, θ) with x = (r cos θ, r sin θ), and harmonic functions rⁿ sin(nθ), n = 1, 2, 3, ⋯.
Without the constraint M = W^{1,2}(Ω), the Dirichlet problem Au = g has infinitely many solutions in { u : u solves the PDE in Ω }.
With the constraint M = W^{1,2}(Ω), Au = g has the unique solution u ∈ M.
In terms of M-RIP, note that ||Au − Au'||₂ ≥ c ||u − u'||_{H¹} for all u, u' ∈ M = { u ∈ W^{1,2}(Ω) : u solves the PDE in Ω }.
Final remark: Historically, mathematicians have tried to find well-posed models by imposing appropriate constraints on the solution spaces. For the simple Dirichlet problem, it took decades to find the appropriate space W^{1,2}(Ω). It may likewise take decades to solve the challenging problems in DL.
The Dirichlet problem may not be well-posed without the constraint W^{1,2}(Ω).
I hope that we will discuss various challenging issues during this meeting.
Thank you!