Stationary Multi-source AI-powered Real-time
Tomography (SMART) for Dynamic Cardiac Imaging
Weiwen Wu1#, Yaohui Tang2#, Tianling Lv3, Chuang Niu1, Cheng Wang2, Yiyan Guo2,
Yunheng Chang3, Ge Wang1*, Yan Xi3*
1Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies,
Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
2Med-X Research Institute, School of Biomedical Engineering, Shanghai Jiao Tong University,
1954 Hua Shan Road, Shanghai, 200030, China.
3Jiangsu First-Imaging Medical Equipment Co., Ltd., Jiangsu, 226100, China.
# indicates the co-first authors and * represents the co-corresponding authors
Abstract: A first stationary multi-source computed tomography (CT) system is prototyped for
preclinical imaging to achieve real-time temporal resolution for dynamic cardiac imaging. This
unique design features 29 source-detector pairs fixed on a circular track, with each detector
collecting x-ray signals only from its opposite x-ray source. The new system architecture
potentially leads to a major improvement in temporal resolution. To demonstrate the feasibility
of this Stationary Multi-source AI-based Real-time Tomography (SMART) system, we develop
a novel reconstruction scheme integrating both sparsified image prior (SIP) and deep image
prior (DIP), which is referred to as the SIP-DIP network. Then, the SIP-DIP network for cardiac
imaging is evaluated on preclinical cardiac datasets of live rats. The reconstructed image
volumes demonstrate the feasibility of the SMART system and the SIP-DIP network, as well as
their merits over other reconstruction methods.
Key Words: Computed tomography (CT), deep learning, multi-source, image reconstruction,
real-time, cardiac imaging, preclinical imaging.
I. Introduction
As a common non-invasive medical imaging tool, computed tomography (CT) is used to perform
diagnostic tasks in clinical and preclinical settings. However, a modern CT system is equipped
with only one or two source–detector assemblies with a sub-optimal temporal resolution [1].
Over the past decades, major improvements in temporal resolution have been achieved
through increasingly faster rotation speeds [2], dual tube-detector pairs [3], and advanced
reconstruction techniques [4]. Commonly, a CT scanner with a single x-ray source scans at a
speed of up to 3 Hz. Ultimately, the centrifugal force limits the scanning speed. Although the
rotating CT gantry
dominates in hospitals and clinics, it fails to provide an ideal imaging performance for many
patients [5]. Importantly, cardiovascular diseases (CVDs) are the leading cause of death
globally, taking almost 17.9 million lives each year [6]. CVDs include a group of disorders of
the heart and associated blood vessels, such as coronary heart disease, cerebrovascular
disease, rheumatic heart disease, and other conditions. Four out of five CVD deaths are due to
heart attacks and strokes. Dynamic cardiac studies continue to challenge medical imaging
technologies and have been a primary driving force for CT development. Since CT temporal
resolution is not sufficiently high, electrocardiogram (ECG) gating is widely employed to account
for the cyclical cardiac motion, improving effective temporal resolution and minimizing image artifacts.
Unfortunately, this approach has major limitations that become most evident in patients with
irregular and/or fast heart rates. Furthermore, radiation exposure is relatively high with ECG-
gated cardiac CT, given the requirement for continuous overlapped scanning and retrospective
data grouping.
Extensive efforts have been made to address these challenges. A system with multiple tube-
detector chains is a feasible solution, and various system designs have been proposed to reach
this goal. The rationale is that increasing the number of source-detector chains on a given
gantry proportionally reduces the data acquisition time and improves the temporal resolution [8].
The first real example is the multi-source CT prototype known as the dynamic spatial
reconstructor (DSR) [9], which still demanded a mechanical scan and was not stationary.
Subsequently, several multi-source CT schemes were designed. Liu et al. demonstrated the
improved image quality in a simulated five-source cone-beam micro-CT using a Feldkamp-type
reconstruction algorithm [10]. Zhao et al. conceptualized a triple-source helical/saddle cone-
beam CT system and developed an exact volumetric reconstruction algorithm. Cao et al.
proposed a multi-source interior CT architecture that employs three stationary x-ray source
arrays and three detectors operated in the interior tomography mode [11]. For most multi-
source x-ray imaging system designs, a general challenge is how to collect high-quality
data and perform interior CT reconstruction [12]. In interior tomography, x-ray beams are
restricted to pass through a local region of interest (ROI), so the measurements of the ROI are
compromised by both the surrounding tissues and data noise from photon statistics and
Compton scattering [13, 14]. Nevertheless, interior tomography enables utilization of smaller x-
ray detectors, allows more source-detector chains in a given gantry space, and provides high
temporal resolution thanks to the parallelism offered by multiple imaging chains. Clearly, the
multi-source interior CT architecture has the potential to achieve ultrahigh temporal resolution
for all-phase cardiac CT imaging. However, most of these multi-source cardiac CT imaging
systems have not been prototyped so far, due to the sparsity of the practically feasible dataset,
the complexity of the system engineering, and the cost incurred in such a highly non-trivial undertaking.
Over recent years, the use of artificial intelligence (AI) [15], specifically deep learning, has
become instrumental for processing, reconstruction, analysis and interpretation of medical
images [16]. In the field of dynamic cardiac imaging, the adoption of deep learning techniques
is now mainstream for removing image artifacts in cases of limited and compromised
measurements [17]. Bello et al. took image sequences of the heart acquired using cardiac MRI
to create time-resolved 3D segmentations using a fully convolutional network aided by
anatomical shape priors [18]. To provide high-quality images in phase-contrast magnetic
resonance imaging, Vishnevskiy et al. proposed an efficient model-based deep neural
reconstruction network to avoid the hyperparameter tuning and expensive computational
overhead of compressed sensing reconstruction methods for clinical aortic flow analysis
[19]. To improve the overall quality of 4D CBCT images, two CNN models, named N-Net
and CycN-Net, were proposed in [20]. Certainly, any improvement in the analysis of cardiac
dynamics may lead to better diagnosis and monitoring of CVDs.
The state-of-the-art cardiac CT scanner with a wide-area detector covers an entire heart within
a single cardiac cycle. For example, the Revolution™ CT scanner (GE HealthCare) achieves a
temporal resolution of 140 ms [7]. As the posterior left ventricular wall moves at a maximum
velocity of 52.5 mm/s, a scan time of 19.1 ms or less is ideal to avoid motion artifacts. If the
mean velocity is considered instead, the scan time should still be no more than 41.8 ms. Hence,
improving temporal resolution for high-quality cardiac CT remains a major challenge. Even with
deep learning applied to current CT systems, the temporal resolution remains clearly
sub-optimal.
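As a plausibility check on the quoted numbers, note that a 19.1 ms scan at 52.5 mm/s corresponds to roughly 1 mm of wall displacement per acquisition; the 1 mm motion budget below is our assumption for illustration, not a value stated above:

```python
# Sanity check of the quoted scan times, under the ASSUMED criterion that
# the heart wall should move no more than about 1 mm during one acquisition.
PEAK_WALL_VELOCITY = 52.5   # mm/s, maximum posterior LV wall velocity (from the text)
MOTION_BUDGET = 1.0         # mm, assumed tolerable displacement per scan

t_peak_ms = MOTION_BUDGET / PEAK_WALL_VELOCITY * 1000.0
print(f"required scan time at peak wall velocity: {t_peak_ms:.1f} ms")  # ~19.0 ms
```

The same budget at the mean wall velocity implied by the 41.8 ms figure (about 24 mm/s) is consistent with the text's second requirement.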
The rest of the paper is organized as follows. In the next section, we introduce our experimental
multi-source CT system prototype, the first Stationary Multi-source AI-based Real-time
Tomography (SMART) system for dynamic cardiac imaging. Then, we describe our
unsupervised deep reconstruction method that integrates both sparsified image prior (SIP) and
deep image prior (DIP), referred to as the SIP-DIP network. In the third section, we report our
representative results, showing the feasibility and merits of our SMART system. It produces
decent image quality from ultra-few projections and is clearly superior to the results using
competing methods. In the last section, we discuss related issues and conclude the paper.
II. Methods
A. SMART System Prototype
Figure 1. Multi-source imaging system “SMART” prototyped at First-Imaging. (a) A
photograph of the real system and (b) the imaging geometry.
The SMART system consists of 29 x-ray source and detector pairs, all of which are fixed
on a circular track. In each pair, a 5 kW monoblock x-ray source and an IGZO flat-panel detector
with a 153.6 × 153.6 mm² imaging area are used. The source-isocenter distance (SID) and
detector-isocenter distance (DID) are set to 2,000 mm and 1,000 mm, respectively. Each
detector cell covers an area of 0.2 × 0.2 mm². The x-ray beam generated by each source is
collimated through the gap between neighboring detectors. An animal to be imaged is placed
inside the imaging ring with a zooming factor of 1.87. For more details, please refer to the recent
patent [21].
During CT data collection, these imaging pairs are turned on to capture cone-beam
projections simultaneously. A sequence of x-ray pulses is fired at 10 frames per second (fps).
Since the x-ray sources are symmetrically distributed, a rotation range of 12.4 degrees is
sufficient for high-density sampling in the data domain, which can be used for evaluation of the
imaging fidelity. In our rat experiments, the x-ray tube voltage is set to 70 kV, the tube current
to 30 mA, and the exposure pulse width to 20 ms. Since there is no anti-scatter grid mounted on
the detector surface, projection calibration and scattering correction are applied by our imaging
software First4D.
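The 12.4-degree figure follows directly from the ring geometry, since the 29 sources are evenly spaced; a small sketch (the pairing of 12 pulses per source with the 348-view full scan described later is our inference):

```python
# Angular-coverage arithmetic for the SMART geometry: with 29 evenly spaced
# sources, each source only needs to sweep the angular gap to its neighbor.
N_SOURCES = 29
gap_deg = 360.0 / N_SOURCES            # angular gap between adjacent sources
pulses_per_source = 12                 # as in the 348-view (29 x 12) full scan
n_views = N_SOURCES * pulses_per_source
print(round(gap_deg, 1), n_views)      # 12.4 348
```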
B. Compressed Sensing Inspired Reconstruction
Solving the CT image reconstruction problem is to recover an underlying image from projection
data. Let 𝑨 ∈ ℝ^{m×N} (m ≪ N) be a discrete-to-discrete linear transform representing the CT
system model from image pixels to detector readings; 𝒚 ∈ ℝ^m is the measured dataset, 𝒆 ∈ ℝ^m is
the data noise in 𝒚, and 𝒙 ∈ ℝ^N is the image to be reconstructed; most relevantly, m ≪ N
signifies that the inverse problem is highly under-determined. Furthermore, 𝑳 represents a
sparsifying transform to enforce prior knowledge on the image content. Conventionally, a
feasible solution can be obtained by optimizing the ℓ1-norm surrogate as follows:
𝒙* = argmin_𝒙 ‖𝑳𝒙‖₁, subject to 𝒚 = 𝑨𝒙 + 𝒆. (1)
In most cases of CT image reconstruction, the optimization problem in Eq. (1) is solved using an
iterative algorithm. Eq. (1) can be converted into the following minimization problem:
𝒙* = argmin_𝒙 (1/2)‖𝒚 − 𝑨𝒙‖₂² + λ‖𝑳𝒙‖₁, (2)
where λ > 0 balances the data-fidelity term (1/2)‖𝒚 − 𝑨𝒙‖₂² and the image-sparsity term ‖𝑳𝒙‖₁. The
goal of Eq. (2) is to find an optimized solution by minimizing the objective function. In this context,
various regularization priors have been considered over the past years, including total variation [22],
low-rank [23], low-dimensional manifold [24], sparse coding [25], and especially tensor-based
dictionary learning [26], which proved both effective and efficient in our previous studies.
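As a minimal illustration of how Eq. (2) is typically minimized, the sketch below runs iterative soft-thresholding (ISTA) on a toy under-determined problem with 𝑳 = 𝑰; this stand-in prior is far simpler than a tensor-dictionary regularizer, and all problem sizes here are made up for the demo:

```python
import numpy as np

def ista(A, y, lam, step, iters=300):
    """Iterative soft-thresholding for Eq. (2) with L = I:
       x* = argmin_x 0.5*||y - A x||_2^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)                                     # gradient of fidelity term
        z = x - step * g                                          # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # prox of lam*||.||_1
    return x

# under-determined toy problem (m << N) with a sparse ground truth
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 60)) / np.sqrt(20)
x_true = np.zeros(60); x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
y = A @ x_true
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L the Lipschitz constant ||A||^2
x_hat = ista(A, y, lam=0.02, step=step)
```

With the step size set to the reciprocal of the Lipschitz constant, each ISTA iteration is guaranteed not to increase the objective in Eq. (2).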
A tensor is a multidimensional array. An Nth-order tensor is defined as 𝓧 ∈ ℛ^{I₁×I₂×...×I_N},
whose elements are x_{i₁i₂...i_N}, with 1 ≤ i_n ≤ I_n and n = 1, 2, ..., N. In particular, if N equals 1
or 2, the tensor degenerates to a vector or a matrix, respectively. A tensor can be multiplied by a
vector or a matrix. The mode-n product of a tensor 𝓧 with a matrix 𝑯 ∈ ℛ^{J×I_n} is defined as
𝓧 ×_n 𝑯 ∈ ℛ^{I₁×I₂×...×I_{n−1}×J×I_{n+1}×...×I_N}, whose elements are calculated as
Σ_{i_n=1}^{I_n} x_{i₁i₂...i_N} h_{j i_n}. In this work, we only consider the case where 𝓧 is a
3rd-order tensor.
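The mode-n product defined above can be written in a few lines of numpy; the dimensions here are arbitrary demo values:

```python
import numpy as np

def mode_n_product(X, H, n):
    """Mode-n product of a tensor X with a matrix H (J x I_n): contract H
    against the n-th axis of X, as in the definition above."""
    Xn = np.moveaxis(X, n, 0)                   # bring axis n to the front: (I_n, ...)
    Yn = np.tensordot(H, Xn, axes=([1], [0]))   # sum over i_n: (J, ...)
    return np.moveaxis(Yn, 0, n)                # put the new axis back in place

X = np.arange(24.0).reshape(2, 3, 4)   # I1=2, I2=3, I3=4
H = np.ones((5, 3))                    # J=5, acting on mode 1 (I2=3)
Y = mode_n_product(X, H, 1)
print(Y.shape)  # (2, 5, 4)
```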
Suppose that there is a set of 3rd-order tensors 𝓧^(t) ∈ ℛ^{I₁×I₂×I₃}, t = 1, 2, ..., T.
Tensor-based dictionary learning can be implemented by solving the following optimization
problem:
argmin_{𝑫,𝜶_t} Σ_{t=1}^{T} ‖𝓧^(t) − 𝑫 ×₄ 𝜶_t‖_F², s.t. ‖𝜶_t‖₀ ≤ L₁, (3)
where 𝑫 = {𝑫^(k)} ∈ ℛ^{I₁×I₂×I₃×K} is a tensor dictionary, K and L₁ represent the number of atoms
in the dictionary and the sparsity level respectively, and ‖·‖_F and ‖·‖₀ denote the Frobenius norm and
L0-norm respectively.
The K-CPD algorithm can be employed to train a tensor dictionary. The minimization
problem in Eq. (3) can be solved using the alternating direction minimization method (ADMM).
The first step is to update the sparse coefficient matrix using the multilinear orthogonal matching
pursuit (MOMP) technique for a fixed tensor dictionary. The second step is to update the tensor
dictionary given the sparse coefficient matrix. By alternately updating the sparse
coefficients and the tensor dictionary, both are gradually optimized.
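The MOMP step operates on tensor blocks; its matrix analogue, plain orthogonal matching pursuit, conveys the idea and can be sketched as follows (the dictionary and signal here are synthetic):

```python
import numpy as np

def omp(D, x, L1):
    """Plain orthogonal matching pursuit -- the matrix analogue of the MOMP
    sparse-coding step: greedily select at most L1 atoms of D, then
    least-squares fit the coefficients on the selected support."""
    residual, support = x.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(L1):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with residual
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(0)
D = rng.normal(size=(30, 20))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = 3.0 * D[:, 2]                        # signal = one atom, scaled
alpha = omp(D, x, L1=1)
print(np.nonzero(alpha)[0], alpha[2])    # atom 2 selected, weight ~3
```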
The tensor dictionary reconstruction model in cone-beam geometry can be formulated
as
argmin_{𝓧,𝜶_s,𝒎_s} (1/2)‖𝓨 − 𝑨𝓧‖₂² + λ(Σ_s ‖ℤ_s(𝓧) − 𝑫_m ×₄ 𝒎_s − 𝑫 ×₄ 𝜶_s‖_F² + Σ_s κ_s‖𝜶_s‖₀), (4)
where 𝓧 ∈ ℛ^{I₁×I₂×I₃} and 𝓨 ∈ ℛ^{J₁×J₂} are the reconstructed image tensor and the projection
tensor respectively, I₁, I₂ and I₃ are the dimensions of the reconstructed image volume, J₁ and J₂
are the numbers of detector cells and projection views respectively, 𝒎_s represents the mean
vector of each channel, the operator ℤ_s extracts the sth tensor block (N × N × M) from 𝓧, and
𝜶_s ∈ ℛ^K is the sparse representation coefficient of the sth tensor block. 𝑫 = {𝑫^(k)} ∈ ℛ^{N×N×S×K}
is the trained tensor dictionary, and 𝑫_m = {𝑫_m^(k)} ∈ ℛ^{N×N×S×S} represents the mean
removal process.
To solve the problem in Eq. (4), we introduce an auxiliary variable 𝓩 and convert Eq. (4) as follows:
argmin_{𝓧,𝓩,𝓦,𝜶_s,𝒎_s} (1/2)‖𝓨 − 𝑨𝓧‖₂² + (η/2)‖𝓧 − 𝓩 − 𝓦‖₂²
+ λ(Σ_s ‖ℤ_s(𝓩) − 𝑫_m ×₄ 𝒎_s − 𝑫 ×₄ 𝜶_s‖_F² + Σ_s κ_s‖𝜶_s‖₀), (5)
where η > 0 is a balancing factor. The problem in Eq. (5) can be solved by dividing it into the
following sub-problems:
argmin_𝓧 (1/2)‖𝓨 − 𝑨𝓧‖₂² + (η/2)‖𝓧 − 𝓩^(k) − 𝓦^(k)‖₂², (6)
argmin_{𝓩,𝜶_s} (1/2)‖𝓧^(k+1) − 𝓩 − 𝓦^(k)‖₂² + λ(Σ_s ‖ℤ_s(𝓩) − 𝑫_m ×₄ 𝒎_s^(k) − 𝑫 ×₄ 𝜶_s‖_F² + Σ_s κ_s‖𝜶_s‖₀), (7)
argmin_{𝒎_s} ‖ℤ_s(𝓩^(k+1)) − 𝑫_m ×₄ 𝒎_s − 𝑫 ×₄ 𝜶_s^(k+1)‖_F², s = 1, ..., S, (8)
argmin_𝓦 (1/2)‖𝓧^(k+1) − 𝓩^(k+1) − 𝓦‖₂². (9)
Based on Eq. (6), we compute 𝓧 iteratively:
𝓧^(k+1) = 𝓧^(k) − (𝑨ᵀ𝑨 + η𝑰)⁻¹(𝑨ᵀ(𝑨𝓧^(k) − 𝓨) + η(𝓧^(k) − 𝓩^(k) − 𝓦^(k))). (10)
Eq. (7) is a typical tensor dictionary learning problem and can be easily solved. The
solutions to Eqs. (8) and (9) can also be obtained directly.
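A useful observation about Eq. (10): since the sub-problem in Eq. (6) is quadratic with Hessian 𝑨ᵀ𝑨 + η𝑰, the update is an exact Newton step and lands on the minimizer in a single application. A small numpy check on a flattened toy problem (all sizes arbitrary):

```python
import numpy as np

# Eq. (6) is quadratic, so the update in Eq. (10) is an exact Newton step:
# one application zeroes the gradient regardless of the starting point.
rng = np.random.default_rng(1)
m, n, eta = 15, 40, 0.5
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
z, w = rng.normal(size=n), rng.normal(size=n)   # current Z^(k), W^(k), flattened

x_k = rng.normal(size=n)                        # arbitrary current estimate
grad = A.T @ (A @ x_k - y) + eta * (x_k - z - w)
x_next = x_k - np.linalg.solve(A.T @ A + eta * np.eye(n), grad)

# gradient of the objective in Eq. (6) vanishes at x_next
g = A.T @ (A @ x_next - y) + eta * (x_next - z - w)
print(np.linalg.norm(g))  # ~0
```

In practice 𝑨ᵀ𝑨 + η𝑰 is far too large to form or invert explicitly, so the linear system is solved approximately, e.g. by a few conjugate-gradient iterations; the check above only verifies the algebra.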
C. Deep Network Prior
For our intended dynamic cardiac preclinical CT imaging, it is not feasible to obtain the
ground truth for supervised deep reconstruction, precluding direct adoption of published deep
reconstruction networks such as FBPConvNet, RED-CNN, etc. On the other hand, deep
convolutional networks enjoy excellent performance in learning realistic image priors from many
example images. In fact, the structure of a properly designed convolutional network is sufficient
to capture a great deal of low-level information as a deep image prior. Specifically, deep image
prior (DIP) theory shows that a randomly-initialized neural network can serve as a novel image
prior with excellent results in the field of inverse problems [27]. In other words, not all image
priors must be learned from data; a great deal of image statistics can be captured by the
structure of a deep convolutional network itself. This makes it possible to capitalize on
data-driven image statistics to solve imaging problems without any ground truth.
Assume a deep decoding network defined by a parametric function 𝓧 = 𝒇_𝜽(𝐰) that maps a code
vector 𝐰 to an image 𝓧. Such a recovery network can be used to model a complex mapping
over images. The idea of DIP is that a significant amount of information about the
distribution of permissible images is reflected in the network structure itself. Without training on
a big dataset, 𝒇_𝜽 has no ability to understand specific concepts or features of a specified
object; however, it has great power to capture low-level statistics of relevant images.
Similar to a conventional prior regularizing an inverse problem, we formulate the energy
minimization problem:
𝓧* = argmin_𝓧 E(𝓧, 𝓧₀) + r(𝓧), (11)
where 𝐸(𝓧, 𝓧0) represents a task-dependent data term, 𝓧0 is a degraded image, and 𝑟(𝓧)
is a regularizer. E(𝓧, 𝓧₀) is usually chosen as the L2-norm or L1-norm distance. The regularizer
r(𝓧) is often not tied to a specific application, because it captures general knowledge about
images. Total variation (TV) is a simple example that encourages uniform regions in an image. In
our reconstruction scheme, DIP replaces an explicit analytic regularizer 𝑟(𝓧) with an implicit
prior captured by the deep neural network as follows:
𝜽* = argmin_𝜽 E(𝒇_𝜽(𝐰); 𝓧₀), 𝓧* = 𝒇_{𝜽*}(𝐰). (12)
That is, the (local) minimizer 𝜽* is obtained using an optimizer such as a gradient descent
algorithm, starting from a random initialization of the parameters 𝜽.
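A toy numpy sketch of this procedure, with a tiny fully-connected decoder standing in for the convolutional networks used in practice (the architecture and hyper-parameters below are illustrative only):

```python
import numpy as np

def fit_dip(x0, code_dim=8, hidden=64, steps=400, lr=0.01, seed=0):
    """Toy illustration of Eq. (12): a small randomly-initialized decoder
    f_theta(w) with a FIXED random code w is fitted to a degraded signal x0
    by gradient descent on theta; with early stopping, the network output
    serves as the restored signal."""
    rng = np.random.default_rng(seed)
    n = x0.size
    w = rng.normal(size=code_dim)                        # fixed code vector
    W1 = rng.normal(scale=0.1, size=(hidden, code_dim)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(n, hidden));        b2 = np.zeros(n)
    losses = []
    for _ in range(steps):
        h = np.tanh(W1 @ w + b1)                         # hidden activations
        x = W2 @ h + b2                                  # f_theta(w)
        r = x - x0                                       # dE/dx for E = 0.5*||x - x0||^2
        losses.append(0.5 * float(r @ r))
        gpre = (W2.T @ r) * (1.0 - h ** 2)               # backprop through tanh
        W2 -= lr * np.outer(r, h); b2 -= lr * r
        W1 -= lr * np.outer(gpre, w); b1 -= lr * gpre
    return x, losses

t = np.linspace(0.0, 3.0, 64)
x0 = np.sin(2.0 * t) + 0.1 * np.random.default_rng(1).normal(size=64)
x_rec, losses = fit_dip(x0)
print(losses[0], losses[-1])  # the fit improves as theta adapts to x0
```

The denoising behavior of DIP comes from the network fitting structured content faster than noise, which is why early stopping matters; this sketch only demonstrates the fitting mechanism of Eq. (12).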
D. SIP-DIP Network
Given the above-described key algorithmic ingredients, we are now ready to describe our
overall reconstruction scheme integrating both sparsified image prior (SIP) and deep image
prior (DIP), which is referred to as the SIP-DIP network. To fully understand our proposed
reconstruction methodology, let us assume a continuously moving region within an object.
All sources are initially positioned on the circular imaging ring to simultaneously radiate an
object at the current time frame, but a dataset of 29 projections appears too sparse to obtain
high-quality images.
To reconstruct a high-quality image from such an under-sampled dataset, it is helpful
to explore the synergy among different time frames. One way is to incorporate a prior image
to impose a constraint in the image space. The quality of the prior image has a great
impact on the final reconstruction. To obtain a high-quality prior image, we can collect
sufficient data from many x-ray source positions. However, because the object varies
aperiodically, such a complete dataset cannot be obtained at any single time point. As a
first-order approximation, the precision rotation table is rotated to acquire data at different
cardiac phases, with inconsistent anatomical configurations due to cardiac motion. In this
case, the resultant projection dataset from different time frames can be considered
complete (sufficiently many viewing angles) but inconsistent (varying cardiac motion
states). Then, we pre-process the projection data and perform tensor dictionary-based
reconstruction. That is, we rearrange the projections in chronological order for
spatiotemporal sparsity-promoting image reconstruction.
The overall workflow of our reconstruction approach is illustrated in Figure 2. The first
stage of our proposed SIP-DIP network performs compressed sensing-based
reconstruction using the complete but inconsistent projection dataset, where the structures
and intensities of different time frames are taken into account. In this stage, we only need to
reconstruct initial images, which are treated as the prior image. In the second stage,
we reconstruct a high-quality image using the following model:
argmin_{𝓧,𝜶_{s1},𝒎_{s1},𝜶_{s2},𝒎_{s2}} (1/2)‖𝓨 − 𝑨𝓧‖₂²
+ λ₁(Σ_{s1} ‖ℤ_{s1}(𝓧) − 𝑫_m ×₄ 𝒎_{s1} − 𝑫 ×₄ 𝜶_{s1}‖_F² + Σ_{s1} κ_{s1}‖𝜶_{s1}‖₀)
+ λ₂(Σ_{s2} ‖ℤ_{s2}(𝓧 − 𝓧_D) − 𝑫_m ×₄ 𝒎_{s2} − 𝑫 ×₄ 𝜶_{s2}‖_F² + Σ_{s2} κ_{s2}‖𝜶_{s2}‖₀). (13)
To obtain the solution of Eq. (13), a strategy similar to that for Eq. (5) is employed. Here,
we introduce two auxiliary variables 𝓩₁ and 𝓩₂ to replace 𝓧 and 𝓧 − 𝓧_D respectively,
where 𝓧_D denotes the prior image obtained in the CS-based reconstruction stage. Hence,
the counterpart of Eq. (6) becomes
argmin_𝓧 (1/2)‖𝓨 − 𝑨𝓧‖₂² + (η₁/2)‖𝓧 − 𝓩₁^(k) − 𝓦₁^(k)‖₂² + (η₂/2)‖𝓧 − 𝓩₂^(k) − 𝓦₂^(k)‖₂², (14)
where η₁ > 0 and η₂ > 0 need to be chosen empirically. Similar to what we described above,
𝓦₁ and 𝓦₂ are error feedback variables to be updated next. Finally, in the deep network
estimation stage, we incorporate the aforementioned deep image prior to further improve
image quality according to Eq. (12), where the code is generated from a noise image and
the target image is the one reconstructed via prior-constrained reconstruction. The
advantage of the deep network is that it helps remove residual image artifacts without
relying on any ground truth. In this study, we particularly designed the network architecture
to encode and decode images of interest, which are image volumes of dead animals of the
same type as those used in our in vivo studies.
Figure 2. SIP-DIP reconstruction approach for our proposed SMART system. (a) The whole SIP-DIP
workflow and (b) the adapted network to extract a deep image prior.
III. Pre-clinical Imaging Results
A. Experimental Design
(1) Setup, Data, and Codes
To validate the feasibility of our SMART system and SIP-DIP network for high temporal
resolution tomography, we performed initial pre-clinical experiments with encouraging results.
Several preclinical datasets were collected from live rats. Experimental animal studies were
performed under the Animal Research: Reporting of in vivo Experiments (ARRIVE) guidelines.
Five adult male animals of 250-300 grams were purchased from Jie Si Jie Laboratory Animal
Co., Ltd. (Shanghai, China). The animal experimental protocol was approved by the Institutional
Animal Care and Use Committee (IACUC) of Shanghai Jiao Tong University, Shanghai, China.
Since the original scans are in cone-beam geometry, we need to reconstruct multiple time
frames to observe the dynamic process. The source-to-detector distances are between
2,016 mm and 2,087 mm. The source-to-isocenter distances are between 1,079 mm and
1,167 mm. The flat-panel detector contains 768×768 cells, each of which covers an area of 0.2 × 0.2
mm². There are 29 source-detector pairs simultaneously activated in every scan. As a result,
29 cone-beam projections are distributed over a full-scan range.
To highlight the advantages of our reconstruction approach over traditional algorithms, the
total variation (TV) [28] and FBP methods were selected for comparison. In our fidelity evaluation,
a dead rat was scanned and reconstructed using the 348-view FDK and 29-view SIP-DIP
network methods respectively.
In this study, all the source codes for deep learning reconstruction were programmed in Matlab
2021 and Python with the PyTorch library on an NVIDIA RTX 3080 GPU. All programs were
run on a PC (24 CPUs @ 3.70 GHz, 32.0 GB RAM) under Windows 10.
(2) Training, Validation, and Testing
The Adam method was employed to optimize all of the networks [29]. To address the
mismatch between the sizes of the feature maps and that of the input, we padded zeros around
the boundaries before convolution. The batch size was set to 1. The number of epochs was set
to 40 in all cases. The learning rate was set to 2.5×10⁻⁴ and decayed by a factor of 0.8 every 5
epochs.
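For concreteness, this schedule can be written as follows (our reading of the decay rule described above):

```python
# Step learning-rate schedule as we read it: start at 2.5e-4 and multiply
# by 0.8 every 5 epochs.
def lr_at(epoch, base=2.5e-4, gamma=0.8, step=5):
    return base * gamma ** (epoch // step)

print(lr_at(0), lr_at(5), lr_at(10))  # 2.5e-4, then 2.0e-4, then 1.6e-4 (up to float rounding)
```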
B. Performance Evaluation
Figure 3 shows three reconstructed images from a live rat without any lesion, so we only
evaluate image quality in terms of anatomical features. Compared with the FDK results, TV
improved image quality with clear features regularized by the sparsity prior. However, the TV-
based method oversmoothed image details and edges, some tissue features were missing, and
severe blocky artifacts were evident. In contrast, our SIP-DIP network improved image quality
by incorporating both tensor dictionary learning and deep prior learning simultaneously.
Specifically, the image feature indicated by the blue arrow was well preserved in our SIP-DIP
results, while it is difficult to see in the TV-based reconstruction. Generally speaking, the
proposed SIP-DIP network recovered image features significantly better than the TV-based
counterparts, as confirmed by the details indicated by the blue arrows. In reference to the image
structures indicated by the red ellipse in Figure 3, the results reconstructed using all competing
methods were corrupted and distorted. Moreover, the image quality associated with the
competing method was severely compromised by limited-angle and sparse-view artifacts.
These cases show consistently that the imaging performance of our proposed SIP-DIP network
is better than compressed sensing-based reconstruction for this multi-source CT imaging task.
Figure 3. CT images of the live rat at time frame 1# reconstructed from 29 projections. The transverse images in (a)-
(c) were reconstructed using the FDK, TV and SIP-DIP algorithms, respectively. The images in (a) and (b) have strong
image artifacts. Typical coronal and sagittal slices were also reconstructed using the FDK, TV and SIP-DIP algorithms,
as shown in (d)-(f) and (g)-(i) respectively. Small features and image edges can only be clearly seen in
our SIP-DIP reconstructions. The display window is [0 0.065] in terms of the linear attenuation coefficient value.
To further demonstrate the advantages of SIP-DIP network, the reconstruction results from
another time frame are given in Figure 4. It can be seen in Figure 4 that our SIP-DIP results
provide more features with clearer image edges than the competing results. Specifically, the
image structure indicated by the blue arrow was well preserved in our SIP-DIP images, while it
is challenging to find these in the TV-based reconstructions. The proposed SIP-DIP method
recovers image features significantly better in these cases, as confirmed by the details indicated
by the blue arrows. In reference to the image structures indicated by the red ellipse in Figure 4,
the results reconstructed using the competing methods were corrupted, and the image
structures indicated by the arrows were significantly compromised by limited-angle and sparse-
view artifacts. In contrast, our proposed SIP-DIP network produces clearer image edges and
more natural structures. These cases show consistently that the imaging performance of our
proposed SIP-DIP network is superior for multi-source CT imaging.
Figure 4. CT images of the live rat at time frame 2# reconstructed from 29 projections. The transverse images in (a)-
(c) were reconstructed using the FDK, TV and SIP-DIP algorithms, respectively. The images in (a) and (b) have
unacceptable image artifacts, and the artifacts in (d) corrupt image structures. Typical coronal and sagittal slices were
also reconstructed using the FDK, TV and SIP-DIP algorithms, as shown in (d)-(f) and (g)-(i) respectively. The
display window is [0 0.065]. The images in (a)-(b), (d)-(e) and (g)-(h), reconstructed from sparse data using the
competing methods, demonstrate remarkable image quality variations. The images (a), (d) and (g) are unacceptable
due to strong artifacts, poor texture, and the inability to assess small and/or large structures. The images (b), (e) and
(h) have few artifacts but are too smooth, with unacceptable texture for clinical usability. The images (c), (f) and (i),
reconstructed with our proposed network, have optimal image quality in terms of texture (yellow circle), no major
artifacts, and acceptable visualization of small structures (red circle).
C. Dynamic Cardiac Imaging
To showcase a dynamic cardiac imaging capability of the SMART system, the reconstruction
results from 5 time frames are presented in Figure 5. It can be seen in Figure 5 that our SIP-
DIP network provides dynamic cardiac features defined with clear edges. Specifically, the
cardiac features indicated by the yellow oval are well visualized in our SIP-DIP images, while
they are blurry in the TV-based reconstruction and even completely disappeared in the FDK
results. The proposed SIP-DIP method improves cardiac features significantly. In reference to
the image structures indicated by the green ellipse in Figure 5, the results reconstructed using
the competing methods were inferior. These promising cases show consistently that our
proposed SIP-DIP network satisfies the requirements of dynamic cardiac imaging, while the
compressed sensing-based reconstruction method failed to do so.
Figure 5. Sequential CT images of the live rat at 5 different time frames to visualize the dynamic changes from only
29 projections. The images in (a-e), (f-j) and (k-o) were reconstructed using the FDK, TV, SIP-DIP methods,
respectively. The display window is [0 0.065] in terms of the linear attenuation coefficient.
D. Fidelity Study
Although the SIP-DIP network offers the best reconstruction performance visually, it is
important to quantify the imaging performance. Toward this goal, a dead rat was scanned and
reconstructed, as shown in Figure 6. Here, the reference results were reconstructed using FDK
from full-scan projections, i.e., 348 views (29×12). In contrast, our SIP-DIP network performed
the image reconstruction from only 29 views. Clearly, it can be inferred from Figure 6 that our
proposed SIP-DIP network can reconstruct excellent images without inducing artifacts, which
we believe represents the fastest dynamic preclinical cardiac imaging study among all similar
results reported so far.
Figure 6. CT images of the dead rat to validate the accuracy and reliability of our SIP-DIP reconstruction. The images
in (a) and (c), (b) and (d) were reconstructed using FDK from 348 views and SIP-DIP from only 29 views,
respectively. The display window is [0 0.065] in terms of the linear attenuation coefficient.
E. SIP-DIP Parameters
Our SIP-DIP network belongs to the category of hybrid reconstruction methods, since it
combines deep learning, compressed sensing and algebraic iteration, which means that there
are regularization parameters to be chosen in a task-specific fashion. In our SIP-DIP network,
η, η₁ and η₂ are the coupling factors balancing the associated components. k and L₁
represent the number of dictionary atoms and the sparsity level in the compressed sensing
reconstruction step, while k₁ and L₂ represent the corresponding quantities in the prior-
constrained reconstruction step. In this study, L₁ was set to be the same as L₂. The specific
parameter values are summarized in Table I.
Table I. SIP-DIP network parameters for dynamic cardiac imaging on the SMART system.
Parameters 𝜂 𝜂1 𝜂2 𝑘 𝐿1 𝑘1 𝐿2
Rat Study 0.01 0.2 0.2 5 0.0001 5 0.0005
IV. Discussions & Conclusion
Key features of our approach are multiple. First, our SMART system enables state-of-the-art
temporal resolution through the parallel acquisition of 29 cone-beam projections. Second, our
SIP-DIP network produces decent image quality from only 29 views, setting a record in the area
of sparse-data tomographic reconstruction. Third, the unprecedented spatiotemporal
tomographic imaging performance opens a door to many research opportunities, not only in
dynamic cardiac imaging but also in contrast-enhanced cancer studies. Finally, our SMART
system can also be viewed as a precursor to a clinical prototype.
Our SMART system has been shown to significantly improve temporal resolution, meeting the
real-time imaging requirement for cardiac imaging. Given the parallel-imaging hardware
architecture, major attention should be paid to scatter correction, geometric calibration, and
noise reduction. As far as the image reconstruction is concerned, compared with classic priors
(such as total variation), the advantages of SIP-DIP network are in the following aspects: (1)
incorporating an advanced image prior, i.e., tensor dictionary-based sparsified reconstruction,
to regularize the solution space by combining different time frames; (2) approaching an
instantaneous image reconstruction mainly based on the current data frame and effectively
regularized by the image prior; and (3) achieving superior image quality by leveraging the deep
image prior.
Future studies on cardiac imaging of larger animals will be performed to benchmark the SMART
system architecture against current CT imaging systems. An important observation in our
study is that realistic image texture and conspicuity of subtle low-contrast lesions are retained
in the SIP-DIP images [30]. If these advantages remain in large patient studies, it could lead to
an improved clinical CT imaging performance from sparsely sampled data.
In conclusion, we have for the first time reported the feasibility of a multi-source CT imaging
system for cardiac imaging applications using compressed sensing and deep learning in a hybrid
reconstruction scheme. It has been established in our preclinical imaging experiments that our
proposed SMART system and SIP-DIP network reconstruct images at ultrahigh temporal
resolution. Our SMART system has consistently produced nearly real-time reconstructions
of the beating heart in a rat model with contrast material injected. We believe that such
a SMART imaging technology has significant potential for dynamic biomedical imaging
applications in general.
References
1. Buzug, T.M., Computed tomography, in Springer handbook of medical technology. 2011,
Springer. p. 311-342.
2. Primak, A.N., et al., Relationship between noise, dose, and pitch in cardiac multi-detector row
CT. Radiographics, 2006. 26(6): p. 1785-1794.
3. Russo, V., et al., 128-slice CT angiography of the aorta without ECG-gating: efficacy of faster
gantry rotation time and iterative reconstruction in terms of image quality and radiation dose.
European Radiology, 2016. 26(2): p. 359-369.
4. Nien, H. and J.A. Fessler, Relaxed linearized algorithms for faster X-ray CT image reconstruction.
IEEE Transactions on Medical Imaging, 2015. 35(4): p. 1090-1098.
5. Samson, K., A Mobile Stroke CT Unit Cuts tPA Administration Times By One-Third: Would It Work
in the US? Neurology Today, 2012. 12(24): p. 1-16.
6. Cardiovascular diseases. 2021; Available from: https://www.who.int/health-topics/cardiovascular-diseases/.
7. Pontone, G., et al., Dynamic stress computed tomography perfusion with a whole-heart
coverage scanner in addition to coronary computed tomography angiography and fractional
flow reserve computed tomography derived. JACC: Cardiovascular Imaging, 2019. 12(12): p.
2460-2471.
8. Kalender, W.A., CT: the unexpected evolution of an imaging modality. European Radiology
Supplements, 2005. 15(4): p. d21-d24.
9. Robb, R.A., et al., The dynamic spatial reconstructor. Journal of Medical Systems, 1980. 4(2): p.
253-288.
10. Liu, Y., et al., Half-scan cone-beam CT fluoroscopy with multiple x-ray sources. Medical
Physics, 2001. 28(7): p. 1466-1471.
11. Gong, H., et al., X-ray scatter correction for multi-source interior computed tomography.
Medical Physics, 2017. 44(1): p. 71-83.
12. Wang, G. and H. Yu, The meaning of interior tomography. Physics in Medicine & Biology, 2013.
58(16): p. R161.
13. Yu, H. and G. Wang, Compressed sensing based interior tomography. Physics in Medicine &
Biology, 2009. 54(9): p. 2791.
14. Sharma, K.S., et al., Scout-view assisted interior micro-CT. Physics in Medicine & Biology, 2013.
58(12): p. 4297.
15. Russell, S. and P. Norvig, Artificial intelligence: a modern approach. 2002.
16. Litjens, G., et al., A survey on deep learning in medical image analysis. Medical Image Analysis,
2017. 42: p. 60-88.
17. Hernandez, K.A.L., et al., Deep learning in spatiotemporal cardiac imaging: A review of
methodologies and clinical usability. Computers in Biology and Medicine, 2020: p. 104200.
18. Bello, G.A., et al., Deep-learning cardiac motion analysis for human survival prediction. Nature
Machine Intelligence, 2019. 1(2): p. 95-104.
19. Vishnevskiy, V., J. Walheim, and S. Kozerke, Deep variational network for rapid 4D flow MRI
reconstruction. Nature Machine Intelligence, 2020. 2(4): p. 228-235.
20. Zhi, S., et al., CycN-Net: A Convolutional Neural Network Specialized for 4D CBCT Images
Refinement. IEEE Transactions on Medical Imaging, 2021.
21. Xi, Y., CT imaging system and imaging method. 2021: China.
22. Wang, Y., et al., A new alternating minimization algorithm for total variation image
reconstruction. SIAM Journal on Imaging Sciences, 2008. 1(3): p. 248-272.
23. Trémoulhéac, B., et al., Dynamic MR image reconstruction-separation from undersampled
(k, t)-space via low-rank plus sparse prior. IEEE Transactions on Medical Imaging, 2014.
33(8): p. 1689-1701.
24. Cong, W., et al., CT image reconstruction on a low dimensional manifold. Inverse Problems &
Imaging, 2019. 13(3).
25. Lee, H., et al., Efficient sparse coding algorithms, in Advances in Neural Information Processing
Systems. 2007.
26. Wu, W., et al., Low-dose spectral CT reconstruction using image gradient l0-norm and tensor
dictionary. Applied Mathematical Modelling, 2018. 63: p. 538-557.
27. Ulyanov, D., A. Vedaldi, and V. Lempitsky, Deep image prior, in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition. 2018.
28. Sidky, E., et al., Do CNNs solve the CT inverse problem? IEEE Transactions on Biomedical
Engineering, 2020.
29. Kingma, D.P. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
30. Singh, R., et al., Image quality and lesion detection on deep learning reconstruction and
iterative reconstruction of submillisievert chest and abdominal CT. American Journal of
Roentgenology, 2020. 214(3): p. 566-573.