LHS3-0128CA PATENT
CONTRAST-AGENT-FREE MEDICAL DIAGNOSTIC IMAGING
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to medical imaging, and more particularly to medical
imaging acquired without administration of a contrast agent.
Description of the Related Art
Contrast agent (CA) administration to a patient is a frequent prerequisite for
medical
imaging.
Taking heart imaging as an example, gadolinium-based CA is administered to a
patient
for cardiac magnetic resonance (MR) imaging, for example as part of current
ischemic heart
disease (IHD) treatment workflow in cardiac radiology (Beckett et al., 2015).
IHD
diagnosis/treatment is a relevant example of cardiac MR imaging as a majority
of patients
undergoing cardiac MR imaging are being evaluated for possible myocardial
ischemia, and IHD
in general and subtypes of IHD may be distinguished according to patterns of
contrast
enhancement. CA imaging uses chemical substances in MR scans. After the CA is
injected into
the body, CA imaging produces a late gadolinium enhancement (LGE) image to
illustrate IHD
scars that are invisible under regular MR imaging and improves the clarity of
other internal and
surrounding cardiac tissues (i.e., muscles, cavities, and even blood).
Terminology of early versus late gadolinium enhancement references a lapsed
time after
injection for acquiring imaging data. An advantage of LGE is due to a relative
contrast
enhancement change between healthy and diseased tissue at a later time after
injection of CA
favoring enhancement of diseased tissue. For example, at an early time (1-3
min post-injection)
gadolinium resides primarily in the blood pool and healthy myocardium. At a
later time (5-20
min post-injection) gadolinium is relatively cleared from healthy tissue and
is relatively retained
by diseased tissue.
After the CA imaging, manual segmentation helps radiologists to segment
multiple
cardiac tissues to delineate diagnosis-related tissues (scars, myocardium,
etc.), and the subsequent
quantitative evaluation of these segmented tissues results in various
diagnosis metrics to
accurately report the presence or the progression of IHD.
However, with this workflow (i.e., CA imaging first followed by manual
segmentation
second), there are still concerns regarding toxicity, high inter-observer
variability, and
-1-
Date Recue/Date Received 2020-12-30
ineffectiveness. 1) CAs have been highlighted in numerous clinical papers
showing their potential
toxicity, retention in the human body, and importantly, their potential to
induce fatal nephrogenic
systemic fibrosis (Ordovas and Higgins, 2011). 2) Manual segmentation has well-
known issues
regarding high inter-observer variability and non-reproducibility, which are
caused by the
difference in expertise among clinicians (Ordovas and Higgins, 2011). 3) CA
imaging followed
by segmentation leads to additional time and effort for patient and clinician,
as well as high
clinical resource costs (labor and equipment).
To date, a few initial CA-free and automatic segmentation methods have been
reported.
However, even the state-of-the-art methods only produce a binary scar image
that fails to provide
a credible diagnosis (Xu et al., 2018a; 2018b).
As another example of medical imaging acquired with CA administration, the MR
examination of liver relies heavily on CA injection. For example, in liver
cancer diagnosis, non-
contrast enhanced MR imaging (NCEMRI) obtained without CA injection can barely
distinguish
areas of hemangioma (a benign tumor) and hepatocellular carcinoma (HCC, a
malignant tumor).
By contrast, contrast-enhanced MRI (CEMRI) obtained with CA injection shows the area of
hemangioma with gradual central filling and a bright edge, and the area of HCC as entirely or
mostly bright through the whole tumor, which provides an accurate and easy way to diagnose
hemangioma and HCC.
However, gadolinium-based CA brings inevitable shortcomings: it is high-risk,
time-consuming, and expensive. The high-risk disadvantage is due to the potential
toxic effect of gadolinium-based CA injection. The time-consuming disadvantage comes from the
MRI process itself and the waiting time after CA injection. The expensive disadvantage mainly
comes from the cost of CA; in the USA alone, conservatively, if each dose of CA is $60, the
direct material expense alone equates to roughly $1.2 billion in 2016 (statistics from IQ-AI
Limited Company, USA).
Accordingly, there is a need for contrast-agent-free medical diagnostic
imaging.
SUMMARY OF THE INVENTION
In an aspect there is provided, a medical imaging method for concurrent and
simultaneous
synthesis of a medical CA-free-AI-enhanced image and medical diagnostic image
analysis
comprising:
receiving a medical image acquired by a medical scanner in absence of contrast
agent
enhancement;
providing the medical image to a computer-implemented machine learning model;
concurrently performing a medical CA-free-AI-enhanced image synthesis task and
a
medical diagnostic image analysis task with the machine learning model;
reciprocally communicating between the image synthesis task and the image
analysis task
for mutually dependent training of both tasks.
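For illustration only, the reciprocal scheme of this aspect can be sketched with hypothetical stand-in tasks. The function names and the toy arithmetic below are assumptions for the sketch, not the patented networks:

```python
def synthesize(image, mask=None):
    # Toy synthesis task: doubles intensities; a mask supplied by the
    # analysis task additionally boosts the regions it marks, mimicking
    # analysis-guided enhancement.
    if mask is None:
        return [2 * p for p in image]
    return [2 * p + m for p, m in zip(image, mask)]

def analyze(image, enhanced=None):
    # Toy analysis task: threshold "segmentation"; prefers the enhanced
    # image from the synthesis task when one is available.
    source = enhanced if enhanced is not None else image
    return [1 if p >= 4 else 0 for p in source]

def reciprocal_step(image):
    # One reciprocal iteration: each task's output conditions the other.
    enhanced = synthesize(image)              # CA-free-AI-enhanced image
    mask = analyze(image, enhanced=enhanced)  # diagnostic analysis task
    refined = synthesize(image, mask=mask)    # synthesis guided by analysis
    return refined, mask
```

In the actual method the two tasks are neural networks trained jointly; here the mutual dependence is reduced to passing each task's output into the other.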
In another aspect there is provided, a medical imaging method for concurrent
and
simultaneous synthesis and segmentation of a CA-free-AI-enhanced image
comprising:
receiving a magnetic resonance (MR) image acquired by a medical MR scanner in
absence of contrast agent enhancement;
providing the MR image to a progressive framework of a plurality of generative
adversarial networks (GAN);
inputting the MR image into a first GAN;
obtaining a coarse tissues mask from the first GAN;
inputting the coarse tissues mask and the MR image into a second GAN;
obtaining a CA-free-AI-enhanced image from the second GAN;
inputting the CA-free-AI-enhanced image and the MR image into a third GAN;
obtaining a diagnosis-related tissue segmented image from the third GAN.
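The three-GAN cascade recited above can be sketched as a data-flow pipeline. Each gan* function below is a hypothetical stand-in for a trained generator, using toy arithmetic rather than the actual PSCGAN networks:

```python
def gan1_coarse_mask(mr):
    # First GAN (priori generation): coarse tissue mask from the MR image.
    return [1 if p > 0 else 0 for p in mr]

def gan2_enhance(mask, mr):
    # Second GAN (conditional synthesis): CA-free-AI-enhanced image,
    # conditioned on both the coarse tissue mask and the original MR image.
    return [p + 3 * m for p, m in zip(mr, mask)]

def gan3_segment(enhanced, mr):
    # Third GAN (segmentation): diagnosis-related tissue labels from the
    # enhanced image together with the original MR image.
    return [2 if e - p >= 3 else (1 if p > 0 else 0)
            for e, p in zip(enhanced, mr)]

def pscgan_pipeline(mr):
    # Progressive flow: each phase's output becomes part of the next input.
    mask = gan1_coarse_mask(mr)
    enhanced = gan2_enhance(mask, mr)
    labels = gan3_segment(enhanced, mr)
    return mask, enhanced, labels
```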
In yet another aspect there is provided, a medical imaging method for
concurrent and
simultaneous synthesis of a CA-free-AI-enhanced image and tumor detection
comprising:
receiving a magnetic resonance (MR) image acquired by a medical MR scanner in
absence of contrast agent enhancement;
providing the MR image to a tripartite generative adversarial network (GAN)
comprising
a generator network, a discriminator network and a detector network;
inputting the MR image into the generator network;
obtaining a CA-free-AI-enhanced image and an attention map of tumor specific
features
from the generator network;
inputting the CA-free-AI-enhanced image and the attention map into the
detector
network;
obtaining a tumor location and a tumor classification extracted from the CA-
free-AI-
enhanced image by the detector network;
training the generator network by both adversarial learning with the
discriminator network
and back-propagation with the detector network.
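The generator-to-detector hand-off of this aspect can likewise be sketched with toy stand-ins. The thresholds, labels, and function bodies below are illustrative assumptions, not the trained Tripartite-GAN:

```python
def generator(ncemri):
    # Stand-in attention-aware generator: returns a synthetic enhanced image
    # and an attention map highlighting tumor-specific features.
    attention = [1 if p >= 5 else 0 for p in ncemri]
    synthetic = [p + 4 * a for p, a in zip(ncemri, attention)]
    return synthetic, attention

def detector(synthetic, attention):
    # Stand-in detector: localizes attended pixels and classifies the "tumor"
    # by its enhanced intensity (toy rule; real classes come from an R-CNN).
    location = [i for i, a in enumerate(attention) if a]
    label = "HCC" if any(synthetic[i] >= 10 for i in location) else "hemangioma"
    return location, label
```

In the real framework the detector's loss is back-propagated into the generator while a discriminator supplies the adversarial signal; this sketch shows only the forward data flow.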
In further aspects there are provided, systems and non-transitory computer
readable media
for execution of concurrent and simultaneous synthesis of a medical CA-free-AI-
enhanced image
and medical diagnostic image analysis described herein.
For example, there is provided, a medical imaging system for concurrent and
simultaneous synthesis of a medical CA-free-AI-enhanced image and medical
diagnostic image
analysis comprising:
an interface device configured for receiving a medical image acquired by a
medical
scanner in absence of contrast agent enhancement;
a memory configured for storing the medical image and a computer-implemented
machine learning model;
a processor configured for:
inputting the medical image to the computer-implemented machine learning
model;
concurrently performing a medical CA-free-AI-enhanced image synthesis task and
a
medical diagnostic image analysis task with the machine learning model;
reciprocally communicating between the image synthesis task and the image
analysis task
for mutually dependent training of both tasks.
As another example there is provided, a non-transitory computer readable
medium
embodying a computer program for concurrent and simultaneous synthesis of a
medical CA-free-
AI-enhanced image and medical diagnostic image analysis comprising:
computer program code for receiving a medical image acquired by a medical
scanner in
absence of contrast agent enhancement;
computer program code for providing the medical image to a computer-
implemented
machine learning model;
computer program code for concurrently performing a medical CA-free-AI-
enhanced
image synthesis task and a medical diagnostic image analysis task with the
machine learning
model;
computer program code for reciprocally communicating between the image
synthesis task
and the image analysis task for mutually dependent training of both tasks.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a schematic of a contrast-agent-free (CA-free) medical imaging
system.
Figure 2 shows a flow diagram of a CA-free medical imaging method.
Figure 3 shows a schematic of advantages of the present CA-free medical
imaging
technology (Fig. 3A) compared to existing deep learning-based methods (Fig.
3B) and existing
clinical CA-based imaging and manual segmentation by medical experts (Fig.
3C).
Figure 4 shows a schematic of a variant of the present CA-free medical
technology
formed as a progressive sequential causal GAN (PSCGAN) framework.
Figure 5 shows a schematic of a sequential causal learning network (SCLN)
component
integrated within the PSCGAN framework shown in Figure 4.
Figure 6 shows that by integrating SCLN into the GAN architecture as the
encoder of cine
MR images in the generator, the SCLN-based GAN improves the learning
effectiveness of
interest distribution from the latent space of cine MR images, thereby
effectively improving image generation.
Figure 7 shows a schematic of data flow within the PSCGAN framework and three
linked
GANs executing three progressive phases of PSCGAN: priori generation GAN
(panel a),
conditional synthesis GAN (panel b), and segmentation GAN (panel c). GANs in
the three phases
leverage adversarial training and dedicated loss terms to enhance the
performance of image
synthesis and segmentation tasks. The conditional synthesis GAN and
segmentation GAN
leverage the output of the respective previous GANs to guide the training of
the next GAN as part
of its input.
Figure 8 shows PSCGAN-based synthesis of high-quality late-gadolinium-enhanced-
equivalent (LGE-equivalent) images and PSCGAN-based accurate diagnosis-related
tissue
segmentation images. In LGE-equivalent images, the scar (dashed box, the high
contrast area in
left ventricle (LV) wall) has a clear and accurate presentation when compared
to the real LGE
image. This high contrast area is invisible in cine MR images without CA
injection. In diagnosis-
related tissue segmentation images, the segmented scar (light grey), healthy
myocardium (dark
grey), and blood pool (intermediate grey) from our method are highly
consistent with the ground
truth in terms of shape, location, and size.
Figure 9 shows that PSCGAN generated an accurate diagnosis-related tissue
segmentation
image and that each component in the PSCGAN effectively improved the
segmentation accuracy.
Figure 10 shows that PSCGAN calculated scar sizes and scar ratios are highly
consistent
with those from the current clinical workflow as shown by comparisons with
Bland-Altman
analysis.
Figure 11 shows that each component in the PSCGAN effectively improves LGE-
equivalent image quality.
Figure 12 shows that a hinge adversarial loss term in the PSCGAN improved
performance
in LGE-equivalent image synthesis.
Figure 13 shows that PSCGAN corrects the overestimation and boundary error
issues in
existing state-of-the-art scar segmentation methods.
Figure 14 shows that two-stream pathways and the weighing unit in the SCLN
effectively
improve segmentation accuracy, as does multi-scale, causal dilated
convolution.
Figure 15 shows visual examples of the synthesis and segmentation, including
both good
case and bad cases (arrows). The segmented scar appears as light grey and
intermediate grey
areas in our method and the ground truth, respectively. The segmented
myocardium appears as
dark grey and white areas in our method and the ground truth, respectively.
Figure 16 shows a schematic of advantages of the present CA-free medical
imaging
technology (synthetic CEMRI) compared to existing non-contrast enhanced MRI
(NCEMRI)
methods and compared to existing contrast enhanced MRI (CEMRI) methods. There
are four
cases of synthesizing CEMRI from NCEMRI. Subject1 and Subject2 are hemangioma,
a benign
tumor. Subject3 and Subject4 are hepatocellular carcinoma (HCC), a malignant
tumor.
Figure 17 shows a schematic of a variant of the present CA-free medical
technology
formed as a Tripartite-GAN that generates synthetic CEMRI for tumor detection
by the
combination of three associated-task networks, the attention-aware generator,
the CNN-based
discriminator and the R-CNN-based detector. The R-CNN-based detector directly
detects tumor
from the synthetic CEMRI and improves the accuracy of synthetic CEMRI
generation via back-
propagation. The CNN-based discriminator urges the generator to generate more realistic
synthetic CEMRI via an adversarial learning strategy.
Figure 18 shows that the generator in the Tripartite-GAN aims to synthesize
accurate and
realistic synthetic CEMRI. It uses a hybrid convolution including standard
convolution layers,
dilated convolution layers, and deconvolution layers. The dilated convolution
is utilized to
enlarge receptive fields. The two standard convolution layers and two
deconvolution layers are
connected to the front and back of dilated convolution, which reduces the size
of feature maps to
expand the receptive fields more efficiently. Following the hybrid
convolution, the dual attention
module (DAM including MAM and GAM) enhances the detailed feature extraction
and
aggregates long-range contextual information of the generator, which improves
the detailed
synthesis of the specificity of the tumor and the spatial continuity of the
multi-class liver MRI.
Figure 19 shows a schematic of MAM that enhances the detailed feature
extraction by
utilizing the interdependencies between channel maps X.
Figure 20 shows a schematic of GAM that captures global dependencies of multi-
class
liver MRI by encoding global contextual information into local features.
Figure 21 shows a schematic of a CNN architecture of the discriminative
network that
receives the ground truth of CEMRI and the synthetic CEMRI, and then outputs
the
discriminative results of real or fake. Its adversarial strategy supervises the
attention-aware generator to find its own mistakes, which increases the authenticity of the
synthetic CEMRI.
Figure 22 shows a schematic of architecture of the tumor detection network
that receives
synthetic CEMRI and then accurately localizes the Region of Interest (RoI) of
the tumor and
classifies the tumor. Attention maps from the generator, newly added into the detector in the
manner of a residual connection, improve the VGG-16-based convolution operation to better
extract tumor information, which improves the performance of tumor detection. Meanwhile,
the back-propagation of Lch prompts the generator to focus on the specificity between the
two types of tumors. Adding Lch into Tripartite-GAN achieves a win-win between detector and
generator via back-propagation.
Figure 23 shows that Tripartite-GAN generated synthetic CEMRI has an equal
diagnostic
value to real CEMRI. In the first and second rows, it is clear that the area of hemangioma
shows gradual central filling and a bright edge in synthetic CEMRI, and the area of HCC
becomes entirely or mostly bright through the whole tumor. The dark grey and light grey
windows/boxes represent the hemangioma and HCC, respectively, and are enlarged on the right.
The third (bottom) row is the synthesis result of healthy subjects.
Figure 24 shows that Tripartite-GAN outperforms three other methods in detailed expression
of the tumor and in the realism of synthetic CEMRI. The
pixel intensity curve
and zoomed local patches of tumor area show that our Tripartite-GAN is more
accurate than three
other methods.
Figure 25 shows ablation studies (no discriminator, no DAM, no detector, no dilated
convolution, and no residual learning) that demonstrate the contribution of various
components of Tripartite-GAN to generation of synthetic CEMRI. The pixel
components of Tripartite-GAN to generation of synthetic CEMRI. The pixel
intensity curve and
zoomed local patches of tumor area demonstrate that our Tripartite-GAN is more
accurate and
more powerful in the detailed synthesis. The horizontal coordinate denotes
pixel positions of the
white line drawn in the ground truth, and the vertical coordinate is the pixel
intensity of the
corresponding pixel.
Figure 26 shows contributions of DAM, GAM, and MAM in Tripartite-GAN generated
synthetic CEMRI. Subject1 demonstrates that DAM enhances the detailed synthesis of
anatomy specificity and the spatial continuity. Subject2 demonstrates that GAM improves the
spatial continuity of CEMRI synthesis. Subject3 demonstrates that MAM enhances the
detailed feature extraction to improve the discrimination of hemangioma and HCC. Subject3
also shows the failure to differentiate HCC and hemangioma when MAM is removed, which
incorrectly synthesizes the specificity of hemangioma into the specificity of HCC. The dark
grey windows/boxes of zoomed local patches represent the tumor
area. From left
to right, they are the NCEMRI, the synthetic CEMRI without attention module,
the synthetic
CEMRI with attention module, and the ground truth, respectively.
Figure 27 shows two examples of CEMRI synthesis. The dark grey windows/boxes
marked in the feature maps represent the difference of spatial continuity
between Tripartite-GAN
with and without GAM. The light grey windows/boxes marked in feature maps
represent the
difference of detailed feature extraction between Tripartite-GAN with and
without MAM. The
last two columns show the synthesis results and zoomed local patches of the
tumor area. It is
clear that MAM helps Tripartite-GAN enhance detailed synthesis, and GAM helps
Tripartite-
GAN improve the spatial continuity of synthetic CEMRI.
Figure 28 shows that the generator of Tripartite-GAN with residual learning
has lower
training loss compared with the Tripartite-GAN without residual learning.
Figure 29 shows that attention maps not only focus on the tumor but also attend to extracting
all features of all anatomy structures in liver MRI for multi-class liver MRI synthesis. The
feature maps of VGG-16 without attention maps are more focused on tumor information. The
feature maps of VGG-16 with attention maps also focus on tumor information but are more
accurate and detailed than those without attention maps.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
With reference to the drawings, a system and method for CA-free-AI-enhanced
imaging
devoid of CA administration is described. The system and method compare
favourably with
current CA imaging techniques. The full wording of the term CA-free-AI-
enhanced is contrast-
agent-free-artificial-intelligence-enhanced with the CA-free component
indicative of image or
scan data acquired without CA administration and the AI-enhanced component
indicative of
machine learning enhancement of image/scan data acquired without CA
administration.
Fig. 1 shows an example of a computer implemented imaging system 2, incorporating a
incorporating a
MR scanner 4. The MR scanner 4 typically comprises a static field magnet, a
gradient coil, and a
radio frequency (RF) coil disposed in a cylindrical housing of a gantry and an
adjustable, often
motorized, support or table for maintaining a subject in a desired position
(for example, a prone
or supine position) in an open central chamber or bore formed in the gantry
during a scan
procedure.
The static field magnet of the gantry is typically substantially in
cylindrical form, and
generates a static magnetic field inside the open central chamber of the
gantry which is an
imaging region of a subject (patient) using electric current provided from a
static magnetic field
power source in an excitation mode. The gradient coil is also typically
substantially in cylindrical
form, and located interior to the static field magnet. The gradient coil
applies gradient magnetic
fields to the subject in the respective directions of the X axis, the Y axis
and the Z axis, by using
the electric currents supplied from the gradient magnetic field power sources.
The RF coil
transmits RF pulses toward the subject and receives MR signals as RF radiation
emitted from the
subject due to nuclear spin excitation and relaxation. RF pulse transmission
includes an RF pulse
synthesizer and pulse amplifier communicative with an RF coil, while MR signal
reception
includes an RF coil communicative with a signal amplifier and signal
processor. One or more RF
coils may be used for RF pulse transmission and MR signal reception, such that
the RF coil for
RF pulse transmission and MR signal reception may be the same or different.
The static field
magnet, the gradient coil and the RF coil are driven by one or more
controllers.
Directed by a data acquisition scheme, the one or more controllers coordinate
a scan of
the subject by driving gradient magnetic fields, RF pulse transmission and MR
signal reception,
and then communicating the received scan data to a data acquisition component
6.
The data acquisition component 6 incorporates a data acquisition scheme or
data
acquisition computer code that receives, organizes and stores MR scan data
from the RF
coil/controller of the MR scanner. The scan data is sent to an image
reconstruction component 8
incorporating an image reconstruction computer code. The scan data can then be
processed using
the image reconstruction computer code resulting in image data including
multiple images of
predetermined sampling site(s) of the subject. The image reconstruction
computer code can easily
be varied to accommodate any available MR imaging technique. The image data
can then be
processed by a machine learning image synthesis component 10 incorporating
image synthesis
computer code tasked with processing of image data to generate a CA-free-AI-
enhanced image.
The image data can be concurrently processed by a machine learning image
analysis component
12 incorporating image analysis computer code tasked with processing of image
data to generate
a diagnostic image analysis, such as a tissue segmentation or a tumour
detection. The image
synthesis component 10 and image analysis component 12 are communicative to
reciprocally
guide their respective CA-free-AI-enhanced image synthesis and image analysis
tasks, such that a
synthesized CA-free-AI-enhanced image or a precursor thereof generated by
the image synthesis
component 10 is communicated to the image analysis component 12 to guide the
diagnostic
image analysis task, and conversely a diagnostic image result or precursor
thereof generated by
image analysis component 12 is communicated to the image synthesis component
10 to guide the
image synthesis task.
The imaging system 2 is controlled by one or more computers 16 with data and
operational commands communicated through bus 14. The imaging system 2 may
include any
additional component as desired for CA-free-AI-enhanced image synthesis and
image analysis
including multiplexers, digital/analog conversion boards, microcontrollers,
physical computer
interface devices, input/output devices, display devices, data storage devices
and the like. The
imaging system 2 may include controllers dedicated to different components of
the MR scanner
4, such as a sequence controller to provide power and timing signals to
control the gradient coil
magnetic field, RF pulse transmission and/or MR signal reception, or such as a
table controller to
provide power and timing signals to a table motor to control table position
and thereby control
position of a subject in the gantry by moving the subject along a z-axis
through an opening of the
gantry communicative with the interior open chamber of the gantry.
Fig. 2 shows a computer implemented method 20 for contrast agent-free medical
imaging.
The method 20 comprises a pre-scan preparation 30 and positioning of a subject
for MR scanning
of a desired sampling site or anatomical region of interest. Once the subject
is prepared and
positioned within a MR scanner, MR scanning 40 is performed to acquire scan
data at the
sampling site. The scan data is processed to reconstruct 50 image data from
the scan data. The
image data is then concurrently processed in an image synthesis task 60 and a
diagnostic image
analysis task 70. The image synthesis task and the image analysis task
reciprocally communicate
for mutually dependent training of both tasks.
The contrast agent-free medical imaging system and method have been validated
by
experimental testing. Experimental testing results demonstrate the ability of
the contrast agent-
free medical imaging system and method to concurrently provide CA-free-AI-
enhanced image
synthesis and diagnostic image analysis. The following experimental examples
are for illustration
purposes only and are not intended to be a limiting description.
Experimental Example 1.
The details of Experimental Example 1 are extracted from a prior scientific
publication
(Xu et al., (2020) "Contrast agent-free synthesis and segmentation of ischemic
heart disease
images using progressive sequential causal GANs", Medical Image Analysis, Vol.
62: article
101668), and this scientific publication is incorporated herein by reference
in its entirety. In the
event of inconsistency between the incorporated material and the express
disclosure of the current
document, the incorporated material should be considered supplementary to that
of the current
document; for irreconcilable inconsistencies, the current document controls.
In this Experimental Example 1, a CA-free image is an image that is
synthesized from
image data acquired in absence of contrast agent (CA) administration by a
machine learning
model to achieve an imaging equivalent to CA-enhanced imaging for purposes of
a concurrent
diagnostic image analysis by the machine learning model achieving diagnostic
results comparable
to human expert diagnosis using CA-enhanced imaging. Therefore, in
Experimental Example 1,
the term CA-free can be used interchangeably with the term CA-free-AI-enhanced
(or contrast-
agent-free-artificial-intelligence-enhanced); for example the term CA-free
image can be used
interchangeably with the term CA-free-AI-enhanced image.
Current state-of-the-art CA-free segmentation methods only produce a binary
scar image
that fails to provide a credible diagnosis (Xu et al., 2018a; 2018b). As shown
in Fig. 3B, this
binary scar image can only indicate two categories of pixels: scar and
background. This limited
resolution thus fails to highlight several relevant tissues (e.g., myocardium
and healthy
myocardium, blood pool) recommended according to the clinical protocols of
comprehensive
IHD evaluation. Subsequently, it fails to help radiologists quantitatively
assess multiple tissues to
obtain the most powerful metrics for a credible IHD diagnosis (for example as
shown in Fig. 3C,
scar ratio = size of the scar/size of the myocardium). Because the use of
multiple metrics based
on multiple tissues results in far greater accuracy than using only a metric
based on scar tissue
alone in a credible IHD diagnosis, the limitations of existing segmentation
methods need to be
addressed. Thus, clinicians desire development of more advanced CA-free
technology that can
produce an LGE-equivalent image (i.e., an image that is equivalent to an LGE
image in terms of
usefulness in an IHD diagnosis or from which clinical metrics can be obtained
without CA
injections) and a segmented image (including diagnosis-related tissues, i.e.,
scar, healthy
myocardium, and blood pools, as well as other pixels) (Leiner, 2019).
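As a concrete illustration of the scar ratio metric mentioned above (scar ratio = size of the scar / size of the myocardium), the following sketch counts labeled pixels in a segmented image. The label values, and the choice that the myocardium size includes scar pixels lying within the myocardial wall, are illustrative assumptions for this sketch:

```python
def scar_ratio(segmentation, scar_label=2, myocardium_label=1):
    # Count pixels of each label in a flattened segmented image; "size" is
    # pixel count. Myocardium size here counts healthy myocardium plus scar,
    # on the assumption that the scar lies within the myocardial wall.
    scar = sum(1 for p in segmentation if p == scar_label)
    myo = sum(1 for p in segmentation if p in (scar_label, myocardium_label))
    return scar / myo if myo else 0.0
```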
However, it is very challenging to synthesize an LGE-equivalent image and
accurately
segment diagnosis-related tissues (i.e., scar, healthy myocardium and blood
pools) from 2D+T
cine MR images. The pixel-level understanding of LGE images by representation
learning of the
2D+T cine MR images faces numerous challenges. The differences in the
enhancement effects of
the CAs on different cardiac cells result in each of the numerous pixels of
the LGE image
requiring a definite non-linear mapping from the cine MR images.
Representation learning of the
2D+T cine MR has a number of high-complexity issues. The time series
characteristics of 2D+T
cine MR images result in each non-linear mapping requiring a complex mixing of
the spatial and
temporal dependencies of a mass of pixels in the images, especially since
these pixels often have
high local variations. More importantly, a pixel-level understanding of LGE
images is needed to
differentiate between pixels that have very similar appearances (Xu et al.,
2017). The highly
similar intensity of pixels within the tissue on an LGE image often results in
high similarities
between the learned spatial and temporal dependencies of these pixels and
often causes
interference and inaccuracy during mixing.
Existing CA-free automated IHD-diagnosing methods are inefficient in the
representation
learning of cine MR images, as they must contend with a fixed local
observation in both spatial
dependency and temporal dependency extraction (e.g., only adjacent temporal
frames of optical
flow and a fixed spatial convolutional kernel size for deep learning).
However, pixels in 2D+T
cine MR images often have high local variations (i.e., different positions and
motion ranges in
different regions and timestamps). Furthermore, current spatial-temporal
feature learning
methods still struggle with constant learning weights during the mixing of
spatial dependencies
with temporal dependencies (e.g., both 3DConv and ConvLSTM often simply treat
the two
dependencies on each pixel as equal during learning) (Xu et al., 2017).
However, different pixels
have different selection requirements in terms of temporal dependencies and
spatial
dependencies.
Existing progressive networks. Recently, progressive generative adversarial
networks
(GAN) have shown great potential in the tasks of image synthesis and
segmentation (Huang et
al., 2017; Karras et al., 2017; Zhang et al., 2018b). Progressive GANs inherit the advantage of
the advantage of
adversarial semi-supervised learning from GAN to effectively learn to map from
a latent space to
a data distribution of interest. More importantly, the progressive framework
of such progressive
GAN stacks multiple sub-GAN networks as different phases to take advantage of
the result of the
previous phase to guide the performance of the next phase and greatly
stabilize training.
However, current progressive GANs are designed to train on a single task
because they lack a two-
task generation scheme to handle the synthesis task and segmentation task.
Existing generative adversarial networks (GANs). GANs (Goodfellow et al.,
2014) have
become one of the most promising deep learning architectures for either image
segmentation
tasks or synthesis tasks in recent years but may face inefficient and unstable
results when two or
more tasks must be solved. GAN comprises two networks, a generator and a
discriminator, where
one is pitted against the other. The generator network learns to map from a
latent space to a data
distribution of interest, while the discriminative network distinguishes
candidates produced by
the generator from the true data distribution. However, a GAN may learn an
erroneous data distribution or suffer a gradient explosion when the latent spaces of the
distributions of the two tasks
interfere with each other. Conditional GAN, a type of GAN implementation, has
the potential to
learn reciprocal commonalities of the two tasks to avoid interference with
each other because of
its considerable flexibility in how two hidden representations are composed
(Mirza and Osindero,
2014). In conditional GAN, a conditioned parameter y is added to the generator
to generate the
corresponding data using the following equation:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x|y)] + E_{z∼p_z(z)}[log(1 − D(G(z|y)))]
(1)
where
p_data(x) represents the distribution of the real data; and
p_z(z) represents the input distribution of the generator.
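The value function of Eq. (1) can be sketched numerically. The toy discriminator D and generator G below are hypothetical stand-ins (not the networks described herein), used only to show how the two expectation terms of the conditional objective are computed:

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x, y):
    # toy conditional discriminator: probability that x is real given condition y
    return 1.0 / (1.0 + np.exp(-(0.5 * x + 0.5 * y)))

def G(z, y):
    # toy conditional generator: shifts noise by the condition
    return z + y

def value(x_real, z, y):
    # V(D, G) = E[log D(x|y)] + E[log(1 - D(G(z|y)))]
    real_term = np.mean(np.log(D(x_real, y)))
    fake_term = np.mean(np.log(1.0 - D(G(z, y), y)))
    return real_term + fake_term

x_real = rng.normal(1.0, 0.1, 1000)   # samples from p_data
z = rng.normal(0.0, 1.0, 1000)        # samples from p_z
y = np.ones(1000)                     # a fixed condition
v = value(x_real, z, y)
print(v)
```

Both expectation terms are logs of probabilities in (0, 1), so the value is finite and negative for this toy pair.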
Existing attention model. An attention model successfully weighs the positions
that are
highly related to the task, thereby improving the performance of the
application in various tasks
(Vaswani et al., 2017). It is inspired from the way humans observe images,
wherein more
attention is paid to a key part of the image in addition to understanding an
image as a whole.
Such a model uses convolutional neural networks as basic building blocks and
calculates long-
range representations that respond to all positions in the input and output
images. It then
determines the key parts that have high responses in the long-range
representations and weights
these parts to motivate the networks to better learn the images. Recent work
on attention models
embedded an auto regressive model to achieve image synthesis and segmentation
by calculating
the response at a position in a sequence through attention to all positions
within the same
sequence (Zhang et al., 2018a). This model has also been integrated into GANs
by attending to
internal model states to efficiently find global, long-range dependencies
within the internal
representations of the images. The attention model has been formalized as a
non-local operation
to model the spatial-temporal dependencies in video sequences (Wang et al.,
2018). Despite this
progress, the attention model has not yet been explored for the internal
effects of different spatial
and temporal combinations on synthesis and segmentation in the context of
GANs.
A novel progressive sequential causal GAN. A novel progressive sequential
causal GAN
(PSCGAN), described herein, provides a CA-free technology capable of both
synthesizing an
LGE equivalent image and segmenting a diagnosis-related tissue segmentation
image (for
example, scar, healthy myocardium, and blood pools, as well as other pixels)
from cine MR
images to diagnose IHD. As shown schematically in Fig. 3A, this is the first
technology to
synthesize an image equivalent to a CA-based LGE-image and to segment multiple
tissues
equivalently to the manual segmentation performed by experts. A further
advantage of the
described technology is that it is capable of performing concurrent or
simultaneous synthesis and
segmentation.
PSCGAN builds three phases in a step-by-step cascade of three independent GANs
(i.e.,
priori generation GAN, conditional synthesis GAN, and enhanced segmentation
GAN). The first
phase uses the priori generation GAN to train the network on a coarse tissue
mask; the second
phase uses the conditional synthesis GAN to synthesize the LGE-equivalent
image; and the third
phase uses the enhanced segmentation GAN to segment the diagnosis related
tissue image. The
PSCGAN creates a pipeline to leverage the commonalities between the synthesis
task and the
segmentation task, which takes pixel categories and distributions in the
coarse tissue mask as a
priori condition to guide the LGE-equivalent image synthesis and the fine
texture in the LGE-
equivalent image as a priori condition to guide the diagnosis-related tissue
segmentation.
PSCGAN use these two reciprocal guidances between the two tasks to gain an
unprecedentedly
high performance in both tasks while performing stable training.
The PSCGAN further includes the following novel features: (1) a novel
sequential causal
learning network (SCLN) and (2) the adoption of two specially designed loss
terms. First, the
SCLN creatively builds a two-stream dependency-extraction pathway and a multi-attention
weighing unit. The two-stream pathway extracts the spatial and temporal
dependencies separately, at multiple scales, in the spatiotemporal representation of the images to
include short-range
to long-range scale variants; the multi-attention weighing unit computes the
responses within and
between spatial and temporal dependencies at task output as a weight and mixes
them according
to assigned weights. This network also integrates with GAN architecture to
further facilitate the
learning of interest dependencies of the latent space of cine MR images in all
phases. Second, the
two specially designed loss terms are a synthetic regularization loss term and
a self-supervised
segmentation auxiliary loss term for optimizing the synthesis task and the
segmentation task,
respectively. The synthetic regularization loss term uses a sparse
regularization learned from the
group relationship between the intensity of the pixels to avoid noise during
synthesis, thereby
improving the quality of the synthesized image, while the self-supervised
segmentation auxiliary
loss term uses the number of pixels in each tissue as a compensatory output
rather than only the
shape of the tissues to improve the discrimination performance of the
segmented image and
thereby improve segmentation accuracy.
Overview of PSCGAN. As depicted in Fig. 4, PSCGAN cascades three GANs to build
three phases and connects them by taking the output of the previous GAN as an
input of the next
GAN. Moreover, to reduce the randomness during training, all three GANs encode
the cine MR
images by using the same foundational network architecture, a SCLN-based GAN
that includes
an encoder-decoder generator and a discriminator to specially design and
handle time-series
images. Thus, PSCGAN not only have great training stability by using divide-
and-conquer to
separate the segmentation task and synthesis task into different phases but
also undergo effective
training by progressively taking the output of the previous phase as the
priori condition input to
guide the next phase.
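The three-phase cascade can be sketched as follows, with trivial stand-in functions (not the actual GAN generators) taking the place of Pri, Sys, and Seg, to show how each phase's output conditions the next:

```python
import numpy as np

def pri_gan(cine):                 # Phase I stand-in: coarse tissue mask from cine MR
    return (cine.mean(axis=0) > 0.5).astype(float)

def sys_gan(cine, mask):           # Phase II stand-in: LGE-equivalent image, guided by the mask
    return cine.mean(axis=0) * (1.0 + mask)

def seg_gan(cine, synth):          # Phase III stand-in: segmentation, guided by the synthesis
    return (synth > synth.mean()).astype(int)

cine = np.random.default_rng(1).random((25, 64, 64))  # 2D+T cine MR stack
m_pri = pri_gan(cine)              # coarse tissue mask (Mpri)
i_sys = sys_gan(cine, m_pri)       # LGE-equivalent image (Isys)
i_seg = seg_gan(cine, i_sys)       # diagnosis-related segmentation (Iseg)
print(m_pri.shape, i_sys.shape, i_seg.shape)
```

The point of the sketch is purely the data flow: each phase consumes the previous phase's output as a priori condition.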
Phase I: priori generation GAN. This phase uses the priori generation GAN
(Pri) to
generate a coarse tissue mask (Mpri) from the cine MR images X by adversarial
training. This
coarse segmented image is a rich priori condition, as it contains all pixel
categories and tissue
shapes, locations, and boundaries.
Phase II: conditional synthesis GAN. This phase uses the conditional synthesis
GAN
(Sys) to integrate the coarse tissue mask and the cine MR image to build a
conditional joint
mapping to use the obtained pixel attributes and distributions from the mask
to guide image
synthesis for generating a high-quality LGE-equivalent image.
Phase III: enhanced segmentation GAN. This phase uses the enhanced
segmentation GAN
(Seg) to introduce the synthesized image from Sys as a priori condition to
generate the diagnosis-
related tissue segmentation image Iseg. The synthesized image and all detailed
textures effectively
guide the classification of the tissue boundary pixels.
A component of the PSCGAN is a novel SCLN. SCLN improves the accuracy of time-
series image representations by task-specific dependence selection between and
within the extracted
spatial and temporal dependencies. By integrating SCLN into the GAN
architecture as the
encoder of cine MR images in the generator, the SCLN-based GAN improves the
learning
effectiveness of the interest distribution from the latent space of cine MR
images, thereby
effectively improving the generating performance on adversarial training.
Sequential causal learning network (SCLN). The SCLN uses a two-stream
structure that
includes a spatial perceptual pathway, a temporal perceptual pathway and a
multi-attention
weighing unit. This network gains diverse and accurate spatial and temporal
dependencies for
improving the representation of the time-series images. In addition, this is a
general layer that can
be used individually or stacked flexibly as the first or any other layer.
Two-stream structure for multi-scale spatial and temporal dependency
extraction. As
shown in Fig. 5, a two-stream structure, which includes a spatial perceptual
pathway and a
temporal perceptual pathway, correspondingly matches the two-aspect dependencies
in the time-
series image. It uses two independent, stacked dilated convolutions as multi-
scale extractors to
respectively focus on the spatial dependencies and the temporal dependencies in
the time-series
images. Dilated convolution includes sparse filters that use skip points
during convolution to
exponentially grow the receptive field to aggregate multi-scale context
information. It improves
the diversity of both spatial dependencies and temporal dependencies to
include all short-range to
long-range scale variants. The 1D/2D dilated convolutions are formulated as:
Dc
1D : (kernel 441 x)t
kernels ft-ls
S=-Dc
(2)
2D : (x */ kernel)(p) = x(s)kernel(t)
s+It=p
(3)
where x is the 1D/2D signal/image, and 1 is the dilation rater.
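As a minimal illustration of Eqs. (2)-(3), and not the layers described herein, the 1D case can be implemented directly; the kernel taps are spaced l samples apart, so one layer's receptive field is (k − 1)·l + 1 samples:

```python
import numpy as np

def dilated_conv1d(x, kernel, l):
    # valid-mode 1D dilated convolution: output p sums x at positions
    # p, p + l, p + 2l, ..., weighted by the kernel taps
    k = len(kernel)
    span = (k - 1) * l + 1            # receptive field of one layer
    out = np.zeros(len(x) - span + 1)
    for p in range(len(out)):
        out[p] = sum(kernel[t] * x[p + t * l] for t in range(k))
    return out

x = np.arange(10, dtype=float)
box = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, box, l=1))    # receptive field 3
print(dilated_conv1d(x, box, l=2))    # receptive field 5, same parameter count
```

Doubling l widens the receptive field without adding kernel weights, which is why stacking dilated convolutions grows the field exponentially.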
The spatial perceptual pathway uses 2D dilated convolution, and the temporal
perceptual
pathway uses 1D dilated convolution. The inputs of both pathways are cine MR
images. The
spatial perceptual pathway regards 2D + T cine MR images as multiple (time t
to time t + n)
independent 2D images. Each input image is learned by a 2D dilated
convolution, where the
number of 2D dilated convolution is the same as the number of frames. The
output of the 2D
dilated convolution in time t is the spatial feature convolved with the frame
of time t only. Thus,
the spatial feature of 2D + T cine MR images can be effectively captured when
combining all 2D
dilated convolution from time t to time t + n. By contrast, the temporal
perceptual pathway regards
2D + T cine MR images as a whole 1D sequence. This 1D data is learned by 1D
dilated convolutions
according to its order, where the hidden units of the 1D dilated convolution
are the same
length as the 1D form of each frame (the length of a 64x64 frame is 4096). The
output of each 1D
dilated convolution time t is the temporal feature convolved with the frame of
time t and the
earlier time in the previous layer. Thus, the temporal feature of 2D + T cine
MR can be
effectively captured when the 1D dilated convolution process reaches the time
t + n.
In this experimental example, both pathways initially stack 6 dilated
convolutions, and the
corresponding dilation rate is [1, 1, 2, 4, 6, 8]. This setting allows the
learned representation to
include all 3 x 3 to 65 x 65 motion and deformation scales. Note that the
stack number still varies
with the spatial and temporal resolution of the time-series image during
encoding. Moreover,
both spatial and temporal perceptual pathways stack 3 stacked dilated
convolutions (1D/2D)
again to build a residual block framework for deepening the network layers and
enriching
hierarchical features. In this experimental example, both paths also adopt a
causal padding to
ensure that the output at time t is only based on the convolution operation at
the previous time.
This causal-based convolution means that there is no information leakage from
the future to the
past. Advantages of this two-stream structure include: 1) two pathways used to
focus on two
aspect dependencies independently; 2) dilated convolution with residual blocks
and shortcut
connections used to extract multiscale and multilevel dependencies; and 3)
causal padding used to
understand the time order within the dependencies.
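Causal padding, point 3) above, can be illustrated with a minimal sketch (a hypothetical helper, not the patent's layer): the input is padded only on the left, so the output at time t depends solely on inputs at times ≤ t:

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, l):
    # left-pad with zeros so output length equals input length and
    # output t never sees any input later than t
    k = len(kernel)
    pad = (k - 1) * l
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(kernel[t] * xp[p + t * l] for t in range(k))
                     for p in range(len(x))])

x = np.zeros(8)
x[5] = 1.0                                    # an "event" at t = 5
y = causal_dilated_conv1d(x, np.array([1.0, 1.0]), l=2)
print(y)                                      # the event influences only t >= 5
```

The outputs before t = 5 stay zero, demonstrating that no information leaks from the future to the past.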
Multi-attention weighing unit for task-specific dependence selection. As shown
in Fig. 5,
the multi-attention weighing unit includes three independent self-attention
layers and an add
operator to adaptively weigh the high-contribution dependences between and
within spatial and
temporal dependencies at the output to perform accurate task-specific
dependence selection
(Vaswani et al., 2017). Two self-attention layers first embed behind both the
spatial perceptual
pathway and the temporal perceptual pathway to adaptively compute the response
of each
pathway's dependence at the output as their weights; then, the add operator
element-wise fuses
the weighed spatial and temporal dependencies; finally, the third self-
attention layer determines
which of the fused spatial-temporal dependences is the task-specific
dependence. The spatial
dependencies from the spatial perceptual pathway are defined as F_SConv ∈ R^(C×N),
where:
C is the number of channels; and
N is the number of dependencies.
The spatial self-attention layer first maps these spatial dependencies into
two feature spaces:
f(F_SConv) = W_f F_SConv and g(F_SConv) = W_g F_SConv.
It calculates the weight α_i of the ith dependency, where
α = (α_1, α_2, …, α_i, …, α_N) ∈ R^(C×N):

α_i = exp(s_i) / Σ_{i=1}^{N} exp(s_i), where s_i = f(F_SConv_i)ᵀ g(F_SConv_i)
(4)
The weighed spatial dependencies αF_SConv are as follows:

αF_SConv = v( Σ_{i=1}^{N} α_i h(F_SConv_i) )
(5)
h(F_SConv_i) = W_h F_SConv_i, v(F_SConv_i) = W_v F_SConv_i
(6)
where W_g, W_f, W_h, W_v are the learned weight matrices.
For memory efficiency, W_g, W_f, W_h ∈ R^(C̄×C), where
C̄ is the reduced channel number; and
C̄ = C/8.
Note that 8 is a hyperparameter.
By the same token, the temporal self-attention layer enhances the temporal
dependencies
F_TConv from the temporal perceptual pathway to an attention-weighted
βF_TConv ∈ R^(C×N),
where
β = (β_1, β_2, …, β_i, …, β_N) ∈ R^(C×N) are the weights of the
temporal dependencies.
The add operator elementwise fuses the weighed spatial dependencies and
temporal
dependencies:
F_STConv = αF_SConv + βF_TConv
(7)
The fused self-attention layer weighs the fused spatial-temporal dependencies F_STConv.
The output of this layer is O_STConv ∈ R^(C×N).
This output further adds the input of the map layer after modification with a
learnable scalar (γ).
Therefore, the final output is given by γO_STConv + F_STConv.
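A minimal numerical sketch of the multi-attention weighing unit follows. Random matrices stand in for the learned W_f, W_g, W_h, a fixed scalar stands in for the learnable γ, and the attention score s_i is the per-dependency form of Eq. (4); this is an illustration, not the trained layer:

```python
import numpy as np

rng = np.random.default_rng(0)
C, N = 16, 8
C_bar = C // 8                         # reduced channel number, C_bar = C/8

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def self_attention(F):
    # random stand-ins for the learned weight matrices W_f, W_g, W_h
    Wf, Wg, Wh = (rng.normal(size=(C_bar, C)) for _ in range(3))
    s = np.einsum('cn,cn->n', Wf @ F, Wg @ F)  # s_i = f(F_i)^T g(F_i)
    alpha = softmax(s)                         # Eq. (4): weights over the N dependencies
    return alpha * (Wh.T @ (Wh @ F))           # weighed dependencies (simplified Eqs. (5)-(6))

F_s = rng.normal(size=(C, N))           # spatial dependencies
F_t = rng.normal(size=(C, N))           # temporal dependencies
fused = self_attention(F_s) + self_attention(F_t)   # add operator, Eq. (7)
gamma = 0.1                             # learnable scalar, fixed here
out = gamma * self_attention(fused) + fused         # final output
print(out.shape)
```

The shapes track the text: each pathway contributes a C × N dependency map, and the output keeps that shape so the unit can be stacked.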
Implementation of an SCLN-based GAN for the basic network architecture. This
network
stacks 4 SCLNs and 4 corresponding up-sampling blocks to build a generator.
The network
further stacks 5 convolutional layers to build a discriminator. Both the
generator and
discriminator use conditional adversarial training to effectively perform the
segmentation and
synthesis. As shown in Fig. 6, the generator is an encode-decode 2D+T to 2D
framework
modified from U-Net (Ronneberger, O., Fischer, P., Brox, T., 2015. U-net:
Convolutional
Convolutional
networks for biomedical image segmentation, in: International Conference on
Medical image
computing and computer-assisted intervention, Springer. pp. 234-241). It first
encodes the input
X ∈ R^(25×64×64×1)
(25 frames, image size per frame 64 × 64 × 1) by using 4 SCLNs with 2, 2, 2, 2
strides on the spatial perceptual pathway and 4, 4, 4, 4 strides on the
temporal perceptual
pathway. The first SCLN uses two copies of X as the inputs into its spatial
perceptual pathway
and temporal perceptual pathway. Thus, beginning from the second SCLN, the
generator takes
the spatial and temporal perceptual pathway outputs of the previous SCLN as
the input and
encodes a 25 x 4 x 4 x 128 feature from the multi-attention weighing unit
output of the fourth
SCLN. Then, this encoded feature is further reduced to 1 x 1 x 4096 by a fully
connected layer
and is then passed to another fully connected layer to reshape the encoded
feature into a 4 x 4 x
256 feature. Four upsampling blocks (Upsampling-Conv2D-LN) then use this
reshaped feature to
encode an image (i.e., the coarse tissue mask, the LGE-equivalent image or the
diagnosis-related
tissue segmentation image) ∈ R^(64×64×1). Moreover, the generator also uses
a dot layer to reduce
the first dimension of the multi-attention weighing unit output from the first
to the third SCLN
and a skip connection that is the same as the U-Net to feed the corresponding
upsampling block
with the same feature map size.
The discriminator encodes the output of the generator of the corresponding
phase and
determines whether this output is consistent with the domain of its ground
truth. All 5
convolutional layers have strides of 2. Note that the attention layer is added
between the second
convolutional layer and the third convolutional layer. These attention layers
endow the
discriminator with the ability to verify that highly detailed features in
distant portions of the
image are consistent with each other and to improve the discrimination
performance.
An advantage of this SCLN-based GAN is an accurate encoding of interest
dependencies
from the latent space of cine MR image.
Phase I: priori generation GAN for coarse tissue mask generation. The priori
generation
GAN (Pri) is built with the same architecture as the SCLN-based GAN, as shown in Fig. 7 (part
a). It includes a generator Gpri and a discriminator Dpri. This GAN generates
a). It includes a generator Gpri and a discriminator Dpri . This GAN generates
a coarse tissue mask
Mpri , which focuses on drawing the shape, contour and correct categories for
four classifications
(scar, healthy myocardium, blood pool, and other pixels). This GAN does not
seek a final result
in one step but takes advantage of the shape, contour, and categories of this
rough segmentation
as a priori information to guide the next module to learn the attributes and
distributions of the
pixels. Training of this generator uses multi-class cross-entropy loss.
Although Mpri contains four
classes, the generator treats the samples of each class as a single binary
classification problem
by encoding both the generator output and the ground truth into one-hot
vector classes. The
generator can be formulated as follows:

L_mce^{Gpri} = Σ_{n=1}^{N} mce(Gpri(X), Iseg)
(8)
mce = −(1/N) Σ_{n=1}^{N} [Iseg log Mpri + (1 − Iseg) log(1 − Mpri)]
(9)
where
Iseg is the ground truth of Mpri, and N = 4.
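A minimal sketch of the multi-class cross-entropy of Eqs. (8)-(9) over one-hot maps (an illustrative stand-in, not the training code):

```python
import numpy as np

def mce(pred, target, eps=1e-7):
    # pred, target: (N_classes, H, W) probability / one-hot maps;
    # each class is treated as a one-vs-rest binary problem
    p = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

labels = np.array([[0, 1], [2, 3]])                # toy 2x2 label image, 4 classes
one_hot = np.eye(4)[labels].transpose(2, 0, 1)     # (4, 2, 2) one-hot ground truth
perfect = one_hot.copy()
uniform = np.full_like(one_hot, 0.25)              # uninformative prediction
print(mce(perfect, one_hot), mce(uniform, one_hot))
```

A perfect one-hot prediction scores near zero, while an uninformative one is penalized, which is the behaviour the generator training relies on.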
The discriminator training uses the adversarial loss L_Adv^{Dpri},
which adopts the recently developed hinge adversarial loss (Vaswani, A.,
Shazeer, N., Parmar,
N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017.
Attention is all you
need, in: Advances in Neural Information Processing Systems, pp. 5998-6008).
This hinge
adversarial loss maps the true sample to a range greater than 1 and maps the
false sample to an
interval less than −1. It better converges to the Nash equilibrium between the
discriminator and
generator, thus resulting in less mode collapsing and more stable training
performance than other
GAN losses. It can be formulated as follows:
L_Adv^{Dpri} = −E_{Iseg∼p_data}[min(0, −1 + Dpri(Iseg))]
− E_{X∼p_X}[min(0, −1 − Dpri(Gpri(X)))]
L_Adv^{Gpri} = −E_{X∼p_X}[Dpri(Gpri(X))]
(10)
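The hinge adversarial loss of Eq. (10) can be sketched directly; d_real and d_fake below are hypothetical discriminator scores, not outputs of the networks described herein:

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    # real scores are pushed above +1, fake scores below -1;
    # only margin violations contribute
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def g_hinge_loss(d_fake):
    # the generator maximizes the critic's score on its samples
    return -np.mean(d_fake)

d_real = np.array([1.5, 2.0, 0.5])    # one real sample violates the margin
d_fake = np.array([-1.2, -0.8])       # one fake sample violates the margin
print(d_hinge_loss(d_real, d_fake))
print(g_hinge_loss(d_fake))
```

Samples already beyond the margins add nothing to the discriminator loss, which is the property that makes the hinge form converge more stably than a saturating GAN loss.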
Phase II: conditional synthesis GAN for high-quality LGE-equivalent image
synthesis.
The conditional synthesis GAN (Sys) includes a generator Gsys and a
discriminator Dsys to
generate an LGE-equivalent image Isys. As shown in Fig. 7 (part b), this GAN
introduces the
previously generated coarse tissue mask to guide the network training by
modifying the SCLN-
based GAN with a fully connected layer in the generator to concatenate the 1 x
1 x 4096 feature
and the mask, the output of which is then fed into the following fully
connected layer and 4
upsampling blocks. Thus, this GAN builds a conditional joint mapping space
between the
segmentation and the synthesis to use the basic attributes and distributions
(i.e., shape, contour,
location, and categories) of the tissues to disentangle different tissue-
feature learning in the cine
MR images and allows the generator to perform accurate and detailed synthesis.
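The conditional concatenation described above can be sketched with scaled-down stand-in dimensions (the text's actual sizes are a 4096-long encoded feature, a 64 × 64 mask, and a 4 × 4 × 256 output feature); the fully connected weights here are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
enc_dim, mask_hw, out_shape = 64, 8, (2, 2, 4)   # scaled-down toy dimensions

encoded = rng.normal(size=enc_dim)               # encoded cine MR feature
mask = rng.integers(0, 4, size=(mask_hw, mask_hw))  # coarse tissue mask (Mpri stand-in)
cond = np.concatenate([encoded, mask.reshape(-1).astype(float)])  # conditional concat
W = rng.normal(size=(int(np.prod(out_shape)), cond.size))  # next fully connected layer
feat = (W @ cond).reshape(out_shape)             # reshaped for the upsampling blocks
print(cond.shape, feat.shape)
```

The only point of the sketch is the joint mapping: the mask is flattened and fused with the encoded feature before the decoder sees it.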
The generator uses the synthetic regularization loss L_Syn^{Gsys}
for the training. This loss incorporates an L2-regularization term and an
overlapping group
sparsity anisotropic operator into the recently developed total variation loss
to improve the
quality of the synthesized image (Pumarola, A., Agudo, A., Martinez, A.M.,
Sanfeliu, A.,
Moreno-Noguer, F., 2018. Ganimation: Anatomically-aware facial animation from
a single
image, in: Proceedings of the European Conference on Computer Vision, pp. 818-
833). The total
variation loss has recently shown the ability to significantly reduce the
noise in the synthesized
image during image synthesis. L2-regularization is further incorporated into
the total variation
loss to measure the computation complexity and prevent overfitting by
penalizing this
complexity. The overlapping group sparsity anisotropic operator is further
incorporated into the
total variation loss. It takes into account group sparsity characteristics of
image intensity
derivatives, thereby avoiding staircase artifacts that erroneously consider
smooth regions as
piecewise regions. This loss is formulated as follows:

L_Syn^{Gsys} = E[ ‖Isys − Isys^GT‖_2^2 + v(φ(∇_h Isys) + φ(∇_v Isys)) ]
(11)
where i and j are the ith and jth pixel entries of Isys (the indices over which φ sums),
v > 0 is a regularization parameter, and
φ(.) is the overlapping group sparsity function. The overlapping group sparsity
anisotropic operator is described as

φ(I) = Σ_{i,j=1}^{n} ‖I_{K(i,j)}‖_2
(12)
I_{K(i,j)} = [ I_{i−m1, j−m1}, …, I_{i+m2, j+m2} ], the K × K group of pixels centred at pixel (i, j)
(13)
where K is the group size;
m1 = ⌊(K − 1)/2⌋; and
m2 = ⌊K/2⌋.
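A sketch of the overlapping group sparsity function of Eq. (12) follows; zero padding at the image borders is an assumption made here for simplicity, and the function is an illustration rather than the loss implementation:

```python
import numpy as np

def ogs(I, K):
    # phi(I): sum of l2 norms of all (overlapping) K x K pixel groups,
    # so sparsity is encouraged at the group level rather than per pixel
    m1, m2 = (K - 1) // 2, K // 2
    Ip = np.pad(I, ((m1, m2), (m1, m2)))     # zero padding (assumption)
    total = 0.0
    for i in range(I.shape[0]):
        for j in range(I.shape[1]):
            group = Ip[i:i + K, j:j + K]     # K x K group centred at (i, j)
            total += np.sqrt(np.sum(group ** 2))
    return total

I = np.zeros((4, 4))
I[1, 1] = 3.0                                # a single nonzero pixel
print(ogs(I, K=2))
```

Because the groups overlap, the single nonzero pixel is counted by every group containing it (four groups for K = 2), which is what discourages isolated spikes and the staircase artifacts mentioned above.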
The discriminator is trained using an adversarial loss term and a synthetic
content loss
=Dcv,
.. term: 1) the synthesis adversarial loss Adv
adopts the hinge adversarial loss and can be formulated as:
L_Adv^{Dsys} = −E_{Isys∼p_data}[min(0, −1 + Dsys(Isys))]
− E_{X∼p_X}[min(0, −1 − Dsys(Gsys(X | Mpri)))]
L_Adv^{Gsys} = −E_{X∼p_X}[Dsys(Gsys(X | Mpri))]
(14)
where
Isys is the ground truth (i.e., the LGE image);
2) the synthetic content loss L_Cont^{Dsys}
uses feature maps of the 2nd, 3rd and 4th convolution layers outputted from the
discriminator to
evaluate Isys by comparing it to its ground truth.
This multiple feature map evaluation allows the discriminator to discriminate
the image in terms
of both the general detail content and higher detail abstraction during the
activation of the deeper
layers, thereby improving the discriminator performance. It is defined as
follows:
L_Cont^{Dsys} = E_{Isys∼p_data}[ Σ_i 1/(W_i H_i) Σ_{x=1}^{W_i} Σ_{y=1}^{H_i} (Dsys^i(Isys)_{x,y} − Dsys^i(Gsys(X | Mpri))_{x,y})² ]
(15)
where:
Dsys^i is the feature map; and
W_i and H_i are its width and height, obtained by the ith convolution layer (after activation).
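A minimal sketch of this feature-map matching loss, with random linear maps standing in for the discriminator's 2nd-4th convolution layers (an illustration, not the discriminator itself):

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 8)) for _ in range(3)]   # stand-ins for 3 layer maps

def features(img):
    # project a flattened image through each stand-in "layer"
    v = img.reshape(-1)[:8]
    return [W @ v for W in layers]

def content_loss(real, fake):
    # mean squared feature differences, summed over the compared layers
    return sum(np.mean((fr - ff) ** 2)
               for fr, ff in zip(features(real), features(fake)))

real = rng.random((4, 4))
print(content_loss(real, real))          # identical inputs give zero loss
print(content_loss(real, real + 0.1))
```

Comparing several layers at once is what lets the discriminator penalize mismatches in both shallow detail and deeper abstraction.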
Advantages of the conditional synthesis GAN include: 1) the coarse tissue mask
is used as
an a priori condition to guide the accurate synthesis of the tissues, 2) the
synthetic regularization
loss is used to reduce the image noise during synthesis, and 3) the synthetic
content loss is used
to improve the detail restoration in the image synthesis.
Phase III: enhanced segmentation GAN for accurate diagnosis-related tissues
segmentation. The enhanced segmentation GAN (Seg) includes a generator Gseg
and a
discriminator Dseg to generate an accurate diagnosis-related tissue
segmentation image Iseg , as
shown in Fig. 7 (part c). Compared to the basic SCLN-based GAN, this GAN has
the following two
differences: 1) it adds a fully connected layer into the generator at the same
position as that of the
conditional synthesis GAN to introduce the synthesized image output from phase
II as a
condition to guide the segmentation (the synthesized image already includes
all detailed textures
of the tissues, which effectively aids the fine classification of the tissue
boundary pixels), and 2)
it adds a linear layer at the end of the discriminator to regress the size
(number of pixels) of the 4
different segmentation categories to perform a
self-supervised
segmentation auxiliary loss. This self-supervised loss prevents the
discriminator from only
judging the segmented image based on the segmentation shape, causing the
discriminator to
extract a compensatory feature from the input image to improve its
discrimination performance.
The generator with multi-class cross-entropy loss and the discriminator with
segmentation
adversarial loss can be formulated as follows:
L_mce^{Gseg} = Σ_{n=1}^{N} mce(Gseg(X | Isys), Iseg)

L_Adv^{Dseg} = −E_{Iseg∼p_data}[min(0, −1 + Dseg(Iseg))]
− E_{X∼p_X}[min(0, −1 − Dseg(Gseg(X | Isys)))]
L_Adv^{Gseg} = −E_{X∼p_X}[Dseg(Gseg(X | Isys))]
(16)
The discriminator with self-supervised segmentation auxiliary loss can be
formulated as
follows:
L_Aux^{Dseg} = E_{Iseg∼p_data}[ ‖S_{Iseg} − S_{Gseg(X | Isys)}‖_2 ]
(17)
where
S_I = (S_I1, S_I2, S_I3, S_I4)
is the size of the 4 segmentation categories of pixels in the
image I outputted from the linear layer of the discriminator
Dseg^{Aux}.
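The pixel-count regression behind this auxiliary loss can be sketched as follows; the squared-error comparison of per-category sizes used here is a simplifying assumption of this sketch, not the exact form trained herein:

```python
import numpy as np

def category_sizes(seg, n_classes=4):
    # seg: integer label image; returns the pixel count of each category
    return np.array([np.sum(seg == c) for c in range(n_classes)], dtype=float)

def aux_loss(seg_true, seg_pred):
    # compare per-category sizes between ground truth and prediction
    return np.mean((category_sizes(seg_true) - category_sizes(seg_pred)) ** 2)

truth = np.array([[0, 0, 1], [2, 2, 3]])
same_shape = np.array([[0, 0, 1], [2, 2, 3]])
off = np.array([[0, 1, 1], [2, 2, 3]])        # one pixel flipped from class 0 to 1
print(aux_loss(truth, same_shape))
print(aux_loss(truth, off))
```

A segmentation with the right shapes but wrong category sizes is still penalized, which is the "size cue" that keeps the discriminator from judging on shape alone.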
Advantages of the enhanced segmentation GAN include: 1) the boundaries of
tissues
within synthesized images are used to guide the tissue boundary segmentation
and 2) the self-
supervised segmentation auxiliary loss is used to improve the segmentation
adversarial training.
Materials and implementation for Experimental Example 1.
A total of 280 (230 IHD and 50 normal control) patients with short-axis cine
MR images
were selected. Cardiac cine MR images were obtained using a 3-T MRI system
(Verio, Siemens,
Erlangen, Germany). Retrospectively gated balanced steady-state free-
precession non-enhanced