Stationary Multi-source AI-powered Real-time
Tomography (SMART) for Dynamic Cardiac Imaging
Weiwen Wu1#, Yaohui Tang2#, Tianling Lv3, Chuang Niu1, Cheng Wang2, Yiyan Guo2,
Yunheng Chang3, Ge Wang1*, Yan Xi3*
1Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies,
Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
2Med-X Research Institute, School of Biomedical Engineering, Shanghai Jiao Tong University,
1954 Hua Shan Road, Shanghai, 200030, China.
3Jiangsu First-Imaging Medical Equipment Co., Ltd., Jiangsu, 226100, China.
# indicates the co-first authors and * represents the co-corresponding authors
Abstract: A first stationary multi-source computed tomography (CT) system is prototyped for
preclinical imaging to achieve real-time temporal resolution for dynamic cardiac imaging. This
unique design features 29 source-detector pairs fixed on a circular track, with each detector
collecting x-ray signals only from its opposite x-ray source. The new system architecture
potentially leads to a major improvement in temporal resolution. To demonstrate the feasibility
of this Stationary Multi-source AI-based Real-time Tomography (SMART) system, we develop
a novel reconstruction scheme integrating both sparsified image prior (SIP) and deep image
prior (DIP), which is referred to as the SIP-DIP network. Then, the SIP-DIP network for cardiac
imaging is evaluated on preclinical cardiac datasets of live rats. The reconstructed image
volumes demonstrate the feasibility of the SMART system and the SIP-DIP network, as well as
their merits over other reconstruction methods.
Key Words: Computed tomography (CT), deep learning, multi-source, image reconstruction,
real-time, cardiac imaging, preclinical imaging.
I. Introduction
As a common non-invasive medical imaging tool, computed tomography (CT) is used to perform
diagnostic tasks in clinical and preclinical settings. However, a modern CT system is equipped
with only one or two source–detector assemblies with a sub-optimal temporal resolution [1].
Over the past decades, major improvements in temporal resolution have been achieved
through increasingly faster rotation speeds [2], dual tube-detector pairs [3], and advanced
reconstruction techniques [4]. Commonly, a CT scanner with a single x-ray source scans at a
speed of up to 3 Hz. Ultimately, the centrifugal force limits the scanning speed. Although the
rotating CT gantry
dominates in hospitals and clinics, it fails to provide an ideal imaging performance for many
patients [5]. Importantly, cardiovascular diseases (CVDs) are the leading cause of death
globally, taking almost 17.9 million lives each year [6]. CVDs include a group of disorders of
the heart and associated blood vessels, such as coronary heart disease, cerebrovascular
disease, rheumatic heart disease, and other conditions. Four out of five CVD deaths are due to
heart attacks and strokes. Dynamic cardiac studies continue to challenge medical imaging
technologies and have been a primary driving force for CT development. Since CT temporal
resolution is not sufficiently high, electrocardiogram (ECG) gating is widely employed to account
for the cyclical cardiac motion, improving effective temporal resolution and minimizing image artifacts.
Unfortunately, this approach has major limitations that become most evident in patients with
irregular and/or fast heart rates. Furthermore, radiation exposure is relatively high with ECG-
gated cardiac CT, given the requirement for continuous overlapped scanning and retrospective
data grouping.
Extensive efforts have been made to address these challenges. A system with multiple tube-
detector chains is a feasible solution, and various system designs have been proposed to reach
this goal. The rationale is that increasing the number of source-detector chains on a given
gantry proportionally reduces the data acquisition time and improves the temporal resolution [8].
The first real example is the multi-source CT prototype known as the dynamic spatial
reconstructor (DSR) [9], which still demanded a mechanical scan and was not stationary.
Subsequently, several multi-source CT schemes were designed. Liu et al. demonstrated the
improved image quality in a simulated five-source cone-beam micro-CT using a Feldkamp-type
reconstruction algorithm [10]. Zhao et al. conceptualized a triple-source helical/saddle cone-
beam CT system and developed an exact volumetric reconstruction algorithm. Cao et al.
proposed a multi-source interior CT architecture that employs three stationary x-ray source
arrays and three detectors operated in the interior tomography mode [11]. For most multi-
source x-ray imaging system designs, a general challenge is how to collect high-quality
data and perform interior CT reconstruction [12]. In interior tomography, x-ray beams are
restricted to pass through a local region of interest (ROI), so the measurements of the ROI are
compromised by both the surrounding tissues and data noise from photon statistics and
Compton scattering [13, 14]. Nevertheless, interior tomography enables utilization of smaller x-
ray detectors, allows more source-detector chains in a given gantry space, and provides high
temporal resolution thanks to the parallelism offered by multiple imaging chains. Clearly, the
multi-source interior CT architecture has the potential to achieve ultrahigh temporal resolution
for all-phase cardiac CT imaging. However, most of these multi-source cardiac CT imaging
systems have not been prototyped so far, due to the sparsity of the practically feasible dataset,
the complexity of the system engineering, and the cost incurred in such a highly non-trivial undertaking.
Over recent years, the use of artificial intelligence (AI) [15], specifically deep learning, has
become instrumental for processing, reconstruction, analysis and interpretation of medical
images [16]. In the field of dynamic cardiac imaging, the adoption of deep learning techniques
is now mainstream for removing image artifacts in cases of limited and compromised
measurements [17]. Bello et al. took image sequences of the heart acquired using cardiac MRI
to create time-resolved 3D segmentations using a fully convolutional network aided by
anatomical shape priors [18]. To provide high-quality images in phase-contrast magnetic
resonance imaging, Vishnevskiy et al. proposed an efficient model-based deep neural
reconstruction network to avoid the hyperparameter tuning and expensive computational
overhead of compressed sensing reconstruction methods for clinical aortic flow analysis
[19]. To improve the overall quality of 4D CBCT images, two CNN models, named N-Net
and CycN-Net, were proposed in [20]. Certainly, any improvement in the analysis of cardiac
dynamics may lead to better diagnosis and monitoring of CVDs.
The state-of-the-art cardiac CT scanner with a wide-area detector covers an entire heart within
a single cardiac cycle. For example, the Revolution™ CT scanner (GE HealthCare) achieves a
temporal resolution of 140 ms [7]. As the posterior left ventricular wall moves at a maximum
velocity of 52.5 mm/s, a scan time of 19.1 ms or less is ideal to avoid motion artifacts. If the
mean velocity is considered instead, the scan time should still be no more than 41.8 ms. Hence,
improving temporal resolution for high-quality cardiac CT remains a major challenge. Even with
deep learning applied to current CT systems, the temporal resolution remains clearly
sub-optimal.
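As a plausibility check on the quoted numbers, note that a 19.1 ms scan at 52.5 mm/s corresponds to roughly 1 mm of wall displacement per acquisition; the 1 mm motion budget below is our assumption for illustration, not a value stated above:

```python
# Sanity check of the quoted scan times, under the ASSUMED criterion that
# the heart wall should move no more than about 1 mm during one acquisition.
PEAK_WALL_VELOCITY = 52.5   # mm/s, maximum posterior LV wall velocity (from the text)
MOTION_BUDGET = 1.0         # mm, assumed tolerable displacement per scan

t_peak_ms = MOTION_BUDGET / PEAK_WALL_VELOCITY * 1000.0
print(f"required scan time at peak wall velocity: {t_peak_ms:.1f} ms")  # ~19.0 ms
```

The same budget at the mean wall velocity implied by the 41.8 ms figure (about 24 mm/s) is consistent with the text's second requirement.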
The rest of the paper is organized as follows. In the next section, we introduce our experimental
multi-source CT system prototype, the first Stationary Multi-source AI-based Real-time
Tomography (SMART) system for dynamic cardiac imaging. Then, we describe our
unsupervised deep reconstruction method that integrates both sparsified image prior (SIP) and
deep image prior (DIP), referred to as the SIP-DIP network. In the third section, we report our
representative results, showing the feasibility and merits of our SMART system. It produces
decent image quality from ultra-few projections and is clearly superior to the results using
competing methods. In the last section, we discuss related issues and conclude the paper.
II. Methods
A. SMART System Prototype
Figure 1. Multi-source imaging system “SMART” prototyped at First-Imaging. (a) A
photograph of the real system and (b) the imaging geometry.
The SMART system consists of 29 x-ray source and detector pairs, all of which are fixed
on a circular track. In each pair, a 5 kW monoblock x-ray source and an IGZO flat-panel detector
with a 153.6 × 153.6 mm² imaging area are used. The source-isocenter distance (SID) and
detector-isocenter distance (DID) are set to 2,000 mm and 1,000 mm, respectively. Each
detector cell covers an area of 0.2 × 0.2 mm². The x-ray beam generated by each source is
collimated through the gap between neighboring detectors. An animal to be imaged is placed
inside the imaging ring with a zooming factor of 1.87. For more details, please refer to the recent
patent [21].
During CT data collection, these imaging pairs are turned on to capture cone-beam
projections simultaneously. A sequence of x-ray pulses is fired at 10 frames per second (fps).
Since the x-ray sources are symmetrically distributed, a rotation range of 12.4 degrees is
sufficient for high-density sampling in the data domain, which can be used for evaluation of the
imaging fidelity. In our rat experiments, the x-ray tube voltage is set to 70 kV, the tube current
to 30 mA, and the exposure pulse width to 20 ms. Since there is no anti-scatter grid mounted on
the detector surface, projection calibration and scattering correction are applied by our imaging
software First4D.
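The 12.4-degree figure follows directly from the ring geometry, since the 29 sources are evenly spaced; a small sketch (the pairing of 12 pulses per source with the 348-view full scan described later is our inference):

```python
# Angular-coverage arithmetic for the SMART geometry: with 29 evenly spaced
# sources, each source only needs to sweep the angular gap to its neighbor.
N_SOURCES = 29
gap_deg = 360.0 / N_SOURCES            # angular gap between adjacent sources
pulses_per_source = 12                 # as in the 348-view (29 x 12) full scan
n_views = N_SOURCES * pulses_per_source
print(round(gap_deg, 1), n_views)      # 12.4 348
```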
B. Compressed Sensing Inspired Reconstruction
Solving the CT image reconstruction problem is to recover an underlying image from projection
data. Let 𝑨 ∈ ℝ^{m×N} (m ≪ N) be a discrete-to-discrete linear transform representing the CT
system model from image pixels to detector readings; 𝒚 ∈ ℝ^m is the measured dataset, 𝒆 ∈ ℝ^m is
the data noise in 𝒚, and 𝒙 ∈ ℝ^N is the image to be reconstructed; most relevantly, m ≪ N
signifies that the inverse problem is highly under-determined. Furthermore, 𝑳 represents a
sparsifying transform to enforce prior knowledge on the image content. Conventionally, a
feasible solution can be obtained by optimizing the ℓ1-norm surrogate as follows:
𝒙* = argmin_𝒙 ‖𝑳𝒙‖₁, subject to 𝒚 = 𝑨𝒙 + 𝒆. (1)
In most cases of CT image reconstruction, the optimization problem in Eq. (1) is solved using an
iterative algorithm. Eq. (1) can be converted into the following minimization problem:
𝒙* = argmin_𝒙 (1/2)‖𝒚 − 𝑨𝒙‖₂² + λ‖𝑳𝒙‖₁, (2)
where λ > 0 balances the data-fidelity term (1/2)‖𝒚 − 𝑨𝒙‖₂² and the image-sparsity term ‖𝑳𝒙‖₁. The
goal of Eq. (2) is to find an optimized solution by minimizing the objective function. In this context,
various regularization priors have been considered over the past years, including total variation [22],
low-rank [23], low-dimensional manifold [24], sparse coding [25], and especially tensor-based
dictionary learning [26], which proved both effective and efficient in our previous studies.
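As a minimal illustration of how Eq. (2) is typically minimized, the sketch below runs iterative soft-thresholding (ISTA) on a toy under-determined problem with 𝑳 = 𝑰; this stand-in prior is far simpler than a tensor-dictionary regularizer, and all problem sizes here are made up for the demo:

```python
import numpy as np

def ista(A, y, lam, step, iters=300):
    """Iterative soft-thresholding for Eq. (2) with L = I:
       x* = argmin_x 0.5*||y - A x||_2^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)                                     # gradient of fidelity term
        z = x - step * g                                          # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # prox of lam*||.||_1
    return x

# under-determined toy problem (m << N) with a sparse ground truth
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 60)) / np.sqrt(20)
x_true = np.zeros(60); x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
y = A @ x_true
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L the Lipschitz constant ||A||^2
x_hat = ista(A, y, lam=0.02, step=step)
```

With the step size set to the reciprocal of the Lipschitz constant, each ISTA iteration is guaranteed not to increase the objective in Eq. (2).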
A tensor is a multidimensional array. An Nth-order tensor is defined as 𝓧 ∈ ℛ^{I₁×I₂×...×I_N},
whose elements are x_{i₁i₂...i_N}, with 1 ≤ i_n ≤ I_n and n = 1, 2, ..., N. In particular, if N equals 1
or 2, the tensor degenerates to a vector or a matrix, respectively. A tensor can be multiplied by a
vector or a matrix. The mode-n product of a tensor 𝓧 with a matrix 𝑯 ∈ ℛ^{J×I_n} is defined as
𝓧 ×_n 𝑯 ∈ ℛ^{I₁×I₂×...×I_{n−1}×J×I_{n+1}×...×I_N}, whose elements are calculated as
Σ_{i_n=1}^{I_n} x_{i₁i₂...i_N} h_{j i_n}. In this work, we only consider the case where 𝓧 is a
3rd-order tensor.
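The mode-n product defined above can be written in a few lines of numpy; the dimensions here are arbitrary demo values:

```python
import numpy as np

def mode_n_product(X, H, n):
    """Mode-n product of a tensor X with a matrix H (J x I_n): contract H
    against the n-th axis of X, as in the definition above."""
    Xn = np.moveaxis(X, n, 0)                   # bring axis n to the front: (I_n, ...)
    Yn = np.tensordot(H, Xn, axes=([1], [0]))   # sum over i_n: (J, ...)
    return np.moveaxis(Yn, 0, n)                # put the new axis back in place

X = np.arange(24.0).reshape(2, 3, 4)   # I1=2, I2=3, I3=4
H = np.ones((5, 3))                    # J=5, acting on mode 1 (I2=3)
Y = mode_n_product(X, H, 1)
print(Y.shape)  # (2, 5, 4)
```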
Suppose that there is a set of 3rd-order tensors 𝓧^(t) ∈ ℛ^{I₁×I₂×I₃}, t = 1, 2, ..., T.
Tensor-based dictionary learning can be implemented by solving the following optimization
problem:
argmin_{𝑫,𝜶_t} Σ_{t=1}^{T} ‖𝓧^(t) − 𝑫 ×₄ 𝜶_t‖_F², s.t. ‖𝜶_t‖₀ ≤ L₁, (3)
where 𝑫 = {𝑫^(k)} ∈ ℛ^{I₁×I₂×I₃×K} is a tensor dictionary, K and L₁ represent the number of atoms
in the dictionary and the sparsity level respectively, and ‖·‖_F and ‖·‖₀ denote the Frobenius norm and
L0-norm respectively.
The K-CPD algorithm can be employed to train a tensor dictionary. The minimization
problem in Eq. (3) can be solved using the alternating direction minimization method (ADMM).
The first step is to update the sparse coefficient matrix using the multilinear orthogonal matching
pursuit (MOMP) technique for a fixed tensor dictionary. The second step is to update the tensor
dictionary given the sparse coefficient matrix. By alternately updating the sparse
coefficients and the tensor dictionary, both are gradually optimized.
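The MOMP step operates on tensor blocks; its matrix analogue, plain orthogonal matching pursuit, conveys the idea and can be sketched as follows (the dictionary and signal here are synthetic):

```python
import numpy as np

def omp(D, x, L1):
    """Plain orthogonal matching pursuit -- the matrix analogue of the MOMP
    sparse-coding step: greedily select at most L1 atoms of D, then
    least-squares fit the coefficients on the selected support."""
    residual, support = x.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(L1):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with residual
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(0)
D = rng.normal(size=(30, 20))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = 3.0 * D[:, 2]                        # signal = one atom, scaled
alpha = omp(D, x, L1=1)
print(np.nonzero(alpha)[0], alpha[2])    # atom 2 selected, weight ~3
```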
The tensor dictionary reconstruction model in cone-beam geometry can be formulated
as
argmin_{𝓧,𝜶_s,𝒎_s} (1/2)‖𝓨 − 𝑨𝓧‖₂² + λ(Σ_s ‖ℤ_s(𝓧) − 𝑫_m ×₄ 𝒎_s − 𝑫 ×₄ 𝜶_s‖_F² + Σ_s κ_s‖𝜶_s‖₀), (4)
where 𝓧 ∈ ℛ^{I₁×I₂×I₃} and 𝓨 ∈ ℛ^{J₁×J₂} are the reconstructed image tensor and the projection
tensor respectively, I₁, I₂ and I₃ are the dimensions of the reconstructed image volume, J₁ and J₂
are the numbers of detector cells and projection views respectively, 𝒎_s represents the mean
vector of each channel, the operator ℤ_s extracts the sth tensor block (N × N × M) from 𝓧, and
𝜶_s ∈ ℛ^K is the sparse representation coefficient of the sth tensor block. 𝑫 = {𝑫^(k)} ∈ ℛ^{N×N×S×K}
is the trained tensor dictionary, and 𝑫_m = {𝑫_m^(k)} ∈ ℛ^{N×N×S×S} represents the mean
removal process.
To solve the problem in Eq. (4), we introduce an auxiliary variable 𝓩 and convert Eq. (4) as follows:
argmin_{𝓧,𝓩,𝓦,𝜶_s,𝒎_s} (1/2)‖𝓨 − 𝑨𝓧‖₂² + (η/2)‖𝓧 − 𝓩 − 𝓦‖₂²
+ λ(Σ_s ‖ℤ_s(𝓩) − 𝑫_m ×₄ 𝒎_s − 𝑫 ×₄ 𝜶_s‖_F² + Σ_s κ_s‖𝜶_s‖₀), (5)
where η > 0 is a balancing factor. The problem in Eq. (5) can be solved by dividing it into the
following sub-problems:
argmin_𝓧 (1/2)‖𝓨 − 𝑨𝓧‖₂² + (η/2)‖𝓧 − 𝓩^(k) − 𝓦^(k)‖₂², (6)
argmin_{𝓩,𝜶_s} (1/2)‖𝓧^(k+1) − 𝓩 − 𝓦^(k)‖₂² + λ(Σ_s ‖ℤ_s(𝓩) − 𝑫_m ×₄ 𝒎_s^(k) − 𝑫 ×₄ 𝜶_s‖_F² + Σ_s κ_s‖𝜶_s‖₀), (7)
argmin_{𝒎_s} ‖ℤ_s(𝓩^(k+1)) − 𝑫_m ×₄ 𝒎_s − 𝑫 ×₄ 𝜶_s^(k+1)‖_F², s = 1, ..., S, (8)
argmin_𝓦 (1/2)‖𝓧^(k+1) − 𝓩^(k+1) − 𝓦‖₂². (9)
Based on Eq. (6), we compute 𝓧 iteratively:
𝓧^(k+1) = 𝓧^(k) − (𝑨ᵀ𝑨 + η𝑰)⁻¹(𝑨ᵀ(𝑨𝓧^(k) − 𝓨) + η(𝓧^(k) − 𝓩^(k) − 𝓦^(k))). (10)
Eq. (7) is a typical tensor dictionary learning problem and can be easily solved. The
solutions to Eqs. (8) and (9) can also be obtained directly.
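A useful observation about Eq. (10): since the sub-problem in Eq. (6) is quadratic with Hessian 𝑨ᵀ𝑨 + η𝑰, the update is an exact Newton step and lands on the minimizer in a single application. A small numpy check on a flattened toy problem (all sizes arbitrary):

```python
import numpy as np

# Eq. (6) is quadratic, so the update in Eq. (10) is an exact Newton step:
# one application zeroes the gradient regardless of the starting point.
rng = np.random.default_rng(1)
m, n, eta = 15, 40, 0.5
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
z, w = rng.normal(size=n), rng.normal(size=n)   # current Z^(k), W^(k), flattened

x_k = rng.normal(size=n)                        # arbitrary current estimate
grad = A.T @ (A @ x_k - y) + eta * (x_k - z - w)
x_next = x_k - np.linalg.solve(A.T @ A + eta * np.eye(n), grad)

# gradient of the objective in Eq. (6) vanishes at x_next
g = A.T @ (A @ x_next - y) + eta * (x_next - z - w)
print(np.linalg.norm(g))  # ~0
```

In practice 𝑨ᵀ𝑨 + η𝑰 is far too large to form or invert explicitly, so the linear system is solved approximately, e.g. by a few conjugate-gradient iterations; the check above only verifies the algebra.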
C. Deep Network Prior
For our intended dynamic cardiac preclinical CT imaging, it is not feasible to obtain the
ground truth for supervised deep reconstruction, precluding direct adoption of published deep
reconstruction networks such as FBPConvNet, RED-CNN, etc. On the other hand, deep
convolutional networks enjoy excellent performance in learning realistic image priors from many
example images. In fact, the structure of a properly designed convolutional network is sufficient
to capture a great deal of low-level information as a deep image prior. Specifically, deep image
prior (DIP) theory shows that a randomly-initialized neural network can serve as a novel image
prior with excellent results in the field of inverse problems [27]. In other words, not all image
priors must be learned from data; a great deal of image statistics can be captured by the
structure of a deep convolutional network itself. This makes it possible to capitalize on
data-driven image statistics to solve imaging problems without any ground truth.
Assume a deep decoding network defined by a parametric function 𝓧 = 𝒇_𝜽(𝐰) that maps a code
vector 𝐰 to an image 𝓧. Such a recovery network can be used to model a complex mapping
over images. The idea of DIP is that a significant amount of information about the
distribution of permissible images is reflected in the network structure itself. Without training on
a big dataset, 𝒇_𝜽 has no ability to understand specific concepts or features of a specified
object; however, it has great power to capture low-level statistics of relevant images.
Similar to a conventional prior regularizing an inverse problem, we formulate the energy
minimization problem:
𝓧* = argmin_𝓧 E(𝓧, 𝓧₀) + r(𝓧), (11)
where 𝐸(𝓧, 𝓧0) represents a task-dependent data term, 𝓧0 is a degraded image, and 𝑟(𝓧)
is a regularizer. E(𝓧, 𝓧₀) is usually chosen as the L2-norm or L1-norm distance. The regularizer
r(𝓧) is often not tied to a specific application, because it captures general knowledge about
images. Total variation (TV) is a simple example that encourages uniform regions in an image. In
our reconstruction scheme, DIP replaces an explicit analytic regularizer 𝑟(𝓧) with an implicit
prior captured by the deep neural network as follows:
𝜽* = argmin_𝜽 E(𝒇_𝜽(𝐰); 𝓧₀), 𝓧* = 𝒇_{𝜽*}(𝐰). (12)
That is, the (local) minimizer 𝜽* is obtained using an optimizer such as a gradient descent
algorithm, starting from a random initialization of the parameters 𝜽.
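A toy numpy sketch of this procedure, with a tiny fully-connected decoder standing in for the convolutional networks used in practice (the architecture and hyper-parameters below are illustrative only):

```python
import numpy as np

def fit_dip(x0, code_dim=8, hidden=64, steps=400, lr=0.01, seed=0):
    """Toy illustration of Eq. (12): a small randomly-initialized decoder
    f_theta(w) with a FIXED random code w is fitted to a degraded signal x0
    by gradient descent on theta; with early stopping, the network output
    serves as the restored signal."""
    rng = np.random.default_rng(seed)
    n = x0.size
    w = rng.normal(size=code_dim)                        # fixed code vector
    W1 = rng.normal(scale=0.1, size=(hidden, code_dim)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(n, hidden));        b2 = np.zeros(n)
    losses = []
    for _ in range(steps):
        h = np.tanh(W1 @ w + b1)                         # hidden activations
        x = W2 @ h + b2                                  # f_theta(w)
        r = x - x0                                       # dE/dx for E = 0.5*||x - x0||^2
        losses.append(0.5 * float(r @ r))
        gpre = (W2.T @ r) * (1.0 - h ** 2)               # backprop through tanh
        W2 -= lr * np.outer(r, h); b2 -= lr * r
        W1 -= lr * np.outer(gpre, w); b1 -= lr * gpre
    return x, losses

t = np.linspace(0.0, 3.0, 64)
x0 = np.sin(2.0 * t) + 0.1 * np.random.default_rng(1).normal(size=64)
x_rec, losses = fit_dip(x0)
print(losses[0], losses[-1])  # the fit improves as theta adapts to x0
```

The denoising behavior of DIP comes from the network fitting structured content faster than noise, which is why early stopping matters; this sketch only demonstrates the fitting mechanism of Eq. (12).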
D. SIP-DIP Network
Given the above-described key algorithmic ingredients, we are now ready to describe our
overall reconstruction scheme integrating both sparsified image prior (SIP) and deep image
prior (DIP), which is referred to as the SIP-DIP network. To fully understand our proposed
reconstruction methodology, let us assume a continuously moving region within an object.
All sources are initially positioned on the circular imaging ring to simultaneously radiate an
object at the current time frame, but a dataset of 29 projections appears too sparse to obtain
high-quality images.
To reconstruct a high-quality image from such an under-sampled dataset, it is helpful
to explore the synergy among different time frames. One way is to incorporate a prior image
to impose a constraint in the image space. The quality of the prior image has a great
impact on the final reconstruction. To obtain a high-quality prior image, we can collect
sufficient data from many x-ray source positions. However, because the object varies
aperiodically, such a complete dataset cannot be obtained at any single time point. As a
first-order approximation, the precision rotation table is rotated to acquire data at different
cardiac phases, with inconsistent anatomical configurations due to cardiac motion. In this
case, the resultant projection dataset from different time frames can be considered
complete (sufficiently many viewing angles) but inconsistent (varying cardiac motion
states). Then, we pre-process the projection data and perform tensor dictionary-based
reconstruction. That is, we rearrange the projections in chronological order for
spatiotemporal sparsity-promoting image reconstruction.
The overall workflow of our reconstruction approach is illustrated in Figure 2. The first
stage of our proposed SIP-DIP network performs compressed sensing-based
reconstruction using the complete but inconsistent projection dataset, where the structures
and intensities of different time frames are taken into account. In this stage, we only need to
reconstruct initial images, which are treated as the prior image. In the second stage,
we reconstruct a high-quality image using the following model:
argmin_{𝓧,𝜶_{s1},𝒎_{s1},𝜶_{s2},𝒎_{s2}} (1/2)‖𝓨 − 𝑨𝓧‖₂²
+ λ₁(Σ_{s1} ‖ℤ_{s1}(𝓧) − 𝑫_m ×₄ 𝒎_{s1} − 𝑫 ×₄ 𝜶_{s1}‖_F² + Σ_{s1} κ_{s1}‖𝜶_{s1}‖₀)
+ λ₂(Σ_{s2} ‖ℤ_{s2}(𝓧 − 𝓧_D) − 𝑫_m ×₄ 𝒎_{s2} − 𝑫 ×₄ 𝜶_{s2}‖_F² + Σ_{s2} κ_{s2}‖𝜶_{s2}‖₀). (13)
To obtain the solution of Eq. (13), a strategy similar to that for Eq. (5) is employed. Here,
we introduce two auxiliary variables 𝓩₁ and 𝓩₂ to replace 𝓧 and 𝓧 − 𝓧_D respectively,
where 𝓧_D denotes the prior image obtained in the CS-based reconstruction stage. Hence,
the counterpart of Eq. (6) becomes
argmin_𝓧 (1/2)‖𝓨 − 𝑨𝓧‖₂² + (η₁/2)‖𝓧 − 𝓩₁^(k) − 𝓦₁^(k)‖₂² + (η₂/2)‖𝓧 − 𝓩₂^(k) − 𝓦₂^(k)‖₂², (14)
where η₁ > 0 and η₂ > 0 need to be chosen empirically. Similar to what we described above,
𝓦₁ and 𝓦₂ are error feedback variables to be updated next. Finally, in the deep network
estimation stage, we incorporate the aforementioned deep image prior to further improve
image quality according to Eq. (12), where the code is generated from a noise image and
the target image is the one reconstructed via prior-constrained reconstruction. The
advantage of the deep network is that it helps remove residual image artifacts without
relying on any ground truth. In this study, we particularly designed the network architecture
to encode and decode images of interest, which are image volumes of dead animals of the
same type as those used in our in vivo studies.
Figure 2. SIP-DIP reconstruction approach for our proposed SMART system. (a) The whole SIP-DIP
workflow and (b) the adapted network to extract a deep image prior.
III. Pre-clinical Imaging Results
A. Experimental Design
(1) Setup, Data, and Codes
To validate the feasibility of our SMART system and SIP-DIP network for high temporal
resolution tomography, we performed initial pre-clinical experiments with encouraging results.
Several preclinical datasets were collected from live rats. Experimental animal studies were
performed under the Animal Research: Reporting of in vivo Experiments (ARRIVE) guidelines.
Five adult male animals of 250-300 grams were purchased from Jie Si Jie Laboratory Animal
Co., Ltd. (Shanghai, China). The animal experimental protocol was approved by the Institutional
Animal Care and Use Committee (IACUC) of Shanghai Jiao Tong University, Shanghai, China.
Since the original scans are in cone-beam geometry, we need to reconstruct multiple time
frames to observe the dynamic process. The source-to-detector distances are between
2,016 mm and 2,087 mm. The source-to-isocenter distances are between 1,079 mm and
1,167 mm. The flat-panel detector contains 768×768 cells, each of which covers an area of 0.2 × 0.2
mm². There are 29 source-detector pairs simultaneously activated in every scan. As a result,
29 cone-beam projections are distributed over a full-scan range.
To highlight the advantages of our reconstruction approach over traditional algorithms, the
total variation (TV) [28] and FBP methods were selected for comparison. In our fidelity evaluation,
a dead rat was scanned and reconstructed using the 348-view FDK and 29-view SIP-DIP
network methods respectively.
In this study, all the source codes for deep learning reconstruction were programmed in Matlab
2021 and Python with the PyTorch library on an NVIDIA RTX 3080 GPU. All programs were
run on a PC (24 CPUs @ 3.70 GHz, 32.0 GB RAM) under Windows 10.
(2) Training, Validation, and Testing
The Adam method was employed to optimize all of the networks [29]. To address the
mismatch between the sizes of the feature maps and that of the input, we padded zeros around
the boundaries before convolution. The batch size was set to 1. The number of epochs was set
to 40 in all cases. The learning rate was set to 2.5×10⁻⁴ and decayed by a factor of 0.8 every 5
epochs.
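For concreteness, this schedule can be written as follows (our reading of the decay rule described above):

```python
# Step learning-rate schedule as we read it: start at 2.5e-4 and multiply
# by 0.8 every 5 epochs.
def lr_at(epoch, base=2.5e-4, gamma=0.8, step=5):
    return base * gamma ** (epoch // step)

print(lr_at(0), lr_at(5), lr_at(10))  # 2.5e-4, then 2.0e-4, then 1.6e-4 (up to float rounding)
```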
B. Performance Evaluation
Figure 3 shows three reconstructed images from a live rat without any lesion, so we only
evaluate image quality in terms of anatomical features. Compared with the FDK results, TV
improved image quality with clear features regularized by the sparsity prior. However, the TV-
based method oversmoothed image details and edges, some tissue features were missing, and
severe blocky artifacts were evident. In contrast, our SIP-DIP network improved image quality
by incorporating both tensor dictionary learning and deep prior learning simultaneously.
Specifically, the image feature indicated by the blue arrow was well preserved in our SIP-DIP
results, while it is difficult to see in the TV-based reconstruction. Generally speaking, the
proposed SIP-DIP network recovered image features significantly better than the TV-based
counterparts, as confirmed by the details indicated by the blue arrows. In reference to the image
structures indicated by the red ellipse in Figure 3, the results reconstructed using all competing
methods were corrupted and distorted. Moreover, the image quality associated with the
competing method was severely compromised by limited-angle and sparse-view artifacts.
These cases show consistently that the imaging performance of our proposed SIP-DIP network
is better than compressed sensing-based reconstruction for this multi-source CT imaging task.
Figure 3. CT images of the live rat at time frame 1# reconstructed from 29 projections. The transverse images in (a)-
(c) were reconstructed using the FDK, TV and SIP-DIP algorithms, respectively. The images in (a) and (b) have strong
image artifacts. Typical coronal and sagittal slices were also reconstructed using the FDK, TV and SIP-DIP algorithms,
as shown in (d)-(f) and (g)-(i) respectively. Small features and image edges can only be clearly seen in
our SIP-DIP reconstructions. The display window is [0 0.065] in terms of the linear attenuation coefficient value.
To further demonstrate the advantages of SIP-DIP network, the reconstruction results from
another time frame are given in Figure 4. It can be seen in Figure 4 that our SIP-DIP results
provide more features with clearer image edges than the competing results. Specifically, the
image structure indicated by the blue arrow was well preserved in our SIP-DIP images, while it
is challenging to find these in the TV-based reconstructions. The proposed SIP-DIP method
recovers image features significantly better in these cases, as confirmed by the details indicated
by the blue arrows. In reference to the image structures indicated by the red ellipse in Figure 4,
the results reconstructed using the competing methods were corrupted, and the image
structures indicated by the arrows were significantly compromised by limited-angle and sparse-
view artifacts. In contrast, our proposed SIP-DIP network produces clearer image edges and
more natural structures. These cases show consistently that the imaging performance of our
proposed SIP-DIP network is superior for multi-source CT imaging.
Figure 4. CT images of the live rat at time frame 2# reconstructed from 29 projections. The transverse images in (a)-
(c) were reconstructed using the FDK, TV and SIP-DIP algorithms, respectively. The images in (a) and (b) have
unacceptable image artifacts, and the artifacts in (d) corrupt image structures. Typical coronal and sagittal slices were
also reconstructed using the FDK, TV and SIP-DIP algorithms, as shown in (d)-(f) and (g)-(i) respectively. The
display window is [0 0.065]. The images in (a)-(b), (d)-(e) and (g)-(h), reconstructed from sparse data using the
competing methods, demonstrate remarkable image quality variations. The images (a), (d) and (g) are unacceptable
due to strong artifacts, poor texture, and the inability to assess small and/or large structures. The images (b), (e) and
(h) have few artifacts but are too smooth, with unacceptable texture for clinical usability. The images (c), (f) and (i),
reconstructed with our proposed network, have optimal image quality in terms of texture (yellow circle), no major
artifacts, and acceptable visualization of small structures (red circle).
C. Dynamic Cardiac Imaging
To showcase a dynamic cardiac imaging capability of the SMART system, the reconstruction
results from 5 time frames are presented in Figure 5. It can be seen in Figure 5 that our SIP-
DIP network provides dynamic cardiac features defined with clear edges. Specifically, the
cardiac features indicated by the yellow oval are well visualized in our SIP-DIP images, while
they are blurry in the TV-based reconstruction and even completely disappeared in the FDK
results. The proposed SIP-DIP method improves cardiac features significantly. In reference to
the image structures indicated by the green ellipse in Figure 5, the results reconstructed using
the competing methods were inferior. These promising cases show consistently that our
proposed SIP-DIP network satisfies the requirements of dynamic cardiac imaging, while the
compressed sensing-based reconstruction method failed to do so.
Figure 5. Sequential CT images of the live rat at 5 different time frames to visualize the dynamic changes from only
29 projections. The images in (a-e), (f-j) and (k-o) were reconstructed using the FDK, TV, SIP-DIP methods,
respectively. The display window is [0 0.065] in terms of the linear attenuation coefficient.
D. Fidelity Study
Although the SIP-DIP network offers the best reconstruction performance visually, it is
important to quantify the imaging performance. Toward this goal, a dead rat was scanned and
reconstructed, as shown in Figure 6. Here, the reference results were reconstructed using FDK
from full-scan projections, i.e., 348 views (29×12). In contrast, our SIP-DIP network performed
the image reconstruction from only 29 views. Clearly, it can be inferred from Figure 6 that our
proposed SIP-DIP network can reconstruct excellent images without inducing artifacts, which
we believe represents the fastest dynamic preclinical cardiac imaging study among all similar
results reported so far.
Figure 6. CT images of the dead rat to validate the accuracy and reliability of our SIP-DIP reconstruction. The images
in (a) and (c), (b) and (d) were reconstructed using FDK from 348 views and SIP-DIP from only 29 views,
respectively. The display window is [0 0.065] in terms of the linear attenuation coefficient.
E. SIP-DIP Parameters
Our SIP-DIP network belongs to the category of hybrid reconstruction methods, since it
combines deep learning, compressed sensing and algebraic iteration, which means that there
are regularization parameters to be chosen in a task-specific fashion. In our SIP-DIP network,
η, η₁ and η₂ are the coupling factors balancing the associated components. k and L₁
represent the number of dictionary atoms and the sparsity level in the compressed sensing
reconstruction step, while k₁ and L₂ represent the corresponding quantities in the prior-
constrained reconstruction step. In this study, L₁ was set to be the same as L₂. The specific
parameter values are summarized in Table I.
Table I. SIP-DIP network parameters for dynamic cardiac imaging on the SMART system.
Parameters 𝜂 𝜂1 𝜂2 𝑘 𝐿1 𝑘1 𝐿2
Rat Study 0.01 0.2 0.2 5 0.0001 5 0.0005
IV. Discussions & Conclusion
Key features of our approach are multiple. First, our SMART system enables state-of-the-art
temporal resolution through the parallel acquisition of 29 cone-beam projections. Second, our
SIP-DIP network produces decent image quality from only 29 views, setting a record in the area
of sparse-data tomographic reconstruction. Third, the unprecedented spatiotemporal
tomographic imaging performance opens a door to many research opportunities, not only in
dynamic cardiac imaging but also in contrast-enhanced cancer studies. Finally, our SMART
system can also be viewed as a precursor to a clinical prototype.
Our SMART system has been shown to significantly improve temporal resolution, meeting the
real-time imaging requirement for cardiac imaging. Given the parallel-imaging hardware
architecture, major attention should be paid to scatter correction, geometric calibration, and
noise reduction. As far as the image reconstruction is concerned, compared with classic priors
(such as total variation), the advantages of SIP-DIP network are in the following aspects: (1)
incorporating an advanced image prior, i.e., tensor dictionary-based sparsified reconstruction,
to regularize the solution space by combining different time frames; (2) approaching an
instantaneous image reconstruction mainly based on the current data frame and effectively
regularized by the image prior; and (3) achieving superior image quality by leveraging the deep
image prior.
Future studies on cardiac imaging of larger animals will be performed to benchmark the SMART
system architecture against current CT imaging systems. An important observation in our
study is that realistic image texture and conspicuity of subtle low-contrast lesions are retained
in the SIP-DIP images [30]. If these advantages remain in large patient studies, it could lead to
an improved clinical CT imaging performance from sparsely sampled data.
In conclusion, we have for the first time reported the feasibility of a multi-source CT imaging
system for cardiac imaging applications using compressed sensing and deep learning in a hybrid
reconstruction scheme. It has been established in our preclinical imaging experiments that our
proposed SMART system and SIP-DIP network reconstruct images at ultrahigh temporal
resolution. Our SMART system has consistently produced nearly real-time reconstructions
of the beating heart in a rat model with contrast material injected. We believe that such
a SMART imaging technology has significant potential for dynamic biomedical imaging
applications in general.
References
1. Buzug, T.M., Computed tomography, in Springer handbook of medical technology. 2011,
Springer. p. 311-342.
2. Primak, A.N., et al., Relationship between noise, dose, and pitch in cardiac multi-detector row
CT. Radiographics, 2006. 26(6): p. 1785-1794.
3. Russo, V., et al., 128-slice CT angiography of the aorta without ECG-gating: efficacy of faster
gantry rotation time and iterative reconstruction in terms of image quality and radiation dose.
European Radiology, 2016. 26(2): p. 359-369.
4. Nien, H. and J.A. Fessler, Relaxed linearized algorithms for faster X-ray CT image reconstruction.
IEEE Transactions on Medical Imaging, 2015. 35(4): p. 1090-1098.
5. Samson, K., A Mobile Stroke CT Unit Cuts tPA Administration Times By One-Third: Would It Work
in the US? Neurology Today, 2012. 12(24): p. 1-16.
6. Cardiovascular diseases. 2021; Available from: https://www.who.int/health-topics/cardiovascular-diseases/.
7. Pontone, G., et al., Dynamic stress computed tomography perfusion with a whole-heart
coverage scanner in addition to coronary computed tomography angiography and fractional
flow reserve computed tomography derived. JACC: Cardiovascular Imaging, 2019. 12(12): p.
2460-2471.
8. Kalender, W.A., CT: the unexpected evolution of an imaging modality. European Radiology
Supplements, 2005. 15(4): p. d21-d24.
9. Robb, R.A., et al., The dynamic spatial reconstructor. Journal of Medical Systems, 1980. 4(2): p.
253-288.
10. Liu, Y., et al., Half-scan cone-beam CT fluoroscopy with multiple x-ray sources. Medical
Physics, 2001. 28(7): p. 1466-1471.
11. Gong, H., et al., X-ray scatter correction for multi-source interior computed tomography.
Medical Physics, 2017. 44(1): p. 71-83.
12. Wang, G. and H. Yu, The meaning of interior tomography. Physics in Medicine & Biology, 2013.
58(16): p. R161.
13. Yu, H. and G. Wang, Compressed sensing based interior tomography. Physics in Medicine &
Biology, 2009. 54(9): p. 2791.
14. Sharma, K.S., et al., Scout-view assisted interior micro-CT. Physics in Medicine & Biology, 2013.
58(12): p. 4297.
15. Russell, S. and P. Norvig, Artificial intelligence: a modern approach. 2002.
16. Litjens, G., et al., A survey on deep learning in medical image analysis. Medical Image Analysis,
2017. 42: p. 60-88.
17. Hernandez, K.A.L., et al., Deep learning in spatiotemporal cardiac imaging: A review of
methodologies and clinical usability. Computers in Biology and Medicine, 2020: p. 104200.
18. Bello, G.A., et al., Deep-learning cardiac motion analysis for human survival prediction. Nature
Machine Intelligence, 2019. 1(2): p. 95-104.
19. Vishnevskiy, V., J. Walheim, and S. Kozerke, Deep variational network for rapid 4D flow MRI
reconstruction. Nature Machine Intelligence, 2020. 2(4): p. 228-235.
20. Zhi, S., et al., CycN-Net: A Convolutional Neural Network Specialized for 4D CBCT Images
Refinement. IEEE Transactions on Medical Imaging, 2021.
21. Xi, Y., CT imaging system and imaging method. 2021: China.
22. Wang, Y., et al., A new alternating minimization algorithm for total variation image
reconstruction. SIAM Journal on Imaging Sciences, 2008. 1(3): p. 248-272.
23. Trémoulhéac, B., et al., Dynamic MR image reconstruction-separation from undersampled
(k, t)-space via low-rank plus sparse prior. IEEE Transactions on Medical Imaging, 2014.
33(8): p. 1689-1701.
24. Cong, W., et al., CT image reconstruction on a low dimensional manifold. Inverse Problems &
Imaging, 2019. 13(3).
25. Lee, H., et al., Efficient sparse coding algorithms, in Advances in Neural Information Processing
Systems. 2007.
26. Wu, W., et al., Low-dose spectral CT reconstruction using image gradient l0-norm and tensor
dictionary. Applied Mathematical Modelling, 2018. 63: p. 538-557.
27. Ulyanov, D., A. Vedaldi, and V. Lempitsky, Deep image prior, in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition. 2018.
28. Sidky, E., et al., Do CNNs solve the CT inverse problem? IEEE Transactions on Biomedical
Engineering, 2020.
29. Kingma, D.P. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
30. Singh, R., et al., Image quality and lesion detection on deep learning reconstruction and
iterative reconstruction of submillisievert chest and abdominal CT. American Journal of
Roentgenology, 2020. 214(3): p. 566-573.