Preliminary Evaluation of a Mobile Platform for the
Non-Invasive Screening and Prevention of Diabetes
by
Kwabena Ofori-Atta
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Computer Science and Molecular Biology
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 2020
© Massachusetts Institute of Technology 2020. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
May 18, 2020
Certified by
Richard R. Fletcher
Research Scientist, D-Lab
Thesis Supervisor
Accepted by
Katrina LaCurts
Chair, Master of Engineering Thesis Committee
Preliminary Evaluation of a Mobile Platform for the
Non-Invasive Screening and Prevention of Diabetes
by
Kwabena Ofori-Atta
Submitted to the Department of Electrical Engineering and Computer Science on
May 18, 2020, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Computer Science and Molecular Biology
Abstract
Diabetes mellitus is a global health complication that has become increasingly prevalent. With
millions of individuals developing diabetic symptoms, and a similar number of individuals dying
from the disease, it is imperative that doctors and researchers develop tools that aid in diabetes
treatment and prevention to diminish the load on global healthcare systems. Despite
advancements in treatment technologies, many of the current tools for diabetes screening are too
expensive, too prone to causing infection, or not logistically practical for use in a majority of
developing nations.
This thesis presents a deep exploration of diabetes pathogenesis and etiology, as well as
preliminary analyses of current and emerging non-invasive technologies for diabetes detection.
Evaluation methods include an image quality analysis for patient image data, a diabetes
questionnaire analysis, and the production of a semi-supervised autoencoder for patient labeling.
The exploration of diabetes pathogenesis and etiology revealed that diabetes development
can be broken down into six stages: Healthy (Stage 0), Compensation (Stage 1), Stable
Adaptation (Stage 2), Unstable Early Decomposition (Stage 3), Stable Decomposition (Stage 4),
and Severe Decomposition (Stage 5). With this biological understanding, this thesis reviews
current and emerging non-invasive technologies for diabetes screening—including infrared
thermal imaging, skin fluorescence spectroscopy, retinal and iris imaging, nail fold
capillaroscopy, pulse wave analysis, and breath analysis. The Mobile Technology Group, within
the MIT D-Lab, has designed a mobile platform that integrates several of these non-invasive
tests for diabetes—including clinical questionnaires, thermal imaging, iris imaging, retina
imaging, and finger photoplethysmography (PPG)—that can be used to predict the severity of a
patient’s diabetic condition. These technologies are part of a clinical field study that is currently
ongoing in Mumbai and Bangalore, India. This thesis presents two image data quality metrics—
blur and saturation detection—that were developed and implemented to automatically assess the
quality of image data collected in the field. The results of the analysis showed that blur detection
via fast Fourier transform (FFT) and via Laplacian kernel are both effective methods, with the
FFT method providing a tunable and more gradual measure of blur.
The preliminary analyses of the India study data focused on the Diabetes Questionnaire.
Since most study subjects were undergoing a form of treatment for diabetes, little correlation was
found between patient diabetic indicators—as measured by the Indian Diabetes Risk Score
(IDRS)—and patient random blood sugar (RBS) measurements. However, there is moderate
correlation between patient RBS values and IDRS values among un-medicated patients,
indicating that risk score can be used as a proxy for diabetes severity. Having used the IDRS
values to create ground truths for patient labeling, a semi-supervised autoencoder was developed
to enable scalable labeling of patient data. The autoencoder performed reasonably well, with a
class-average area under the receiver operating characteristic curve (AUROC) of 0.845 and a
class-average area under the precision-recall curve (AUPR) of 0.789. However, clustering methods
using dimensionality-reduced patient features (derived via autoencoder, PCA, and t-SNE) were
less effective, although the autoencoder still outperformed the controls. Since data collection is
ongoing, the predictive power of the autoencoder and its dimensionality reduction functionality is
likely to improve with the addition of more patients and more measurements (i.e. retina, iris, and
thermal image scores, PPG scores, and other questionnaire data).
Thesis Supervisor: Richard R. Fletcher
Title: Research Scientist, D-Lab
Acknowledgements
I would first like to acknowledge my advisor, Richard Fletcher. Throughout this research
process, Dr. Fletcher has consistently guided and challenged me, pushing my work to new
heights. He is incredibly dedicated to his work, and his drive to improve global health outcomes
through innovative technologies is truly inspiring. From the Mobile Technology Group, I would
like to thank Saadiyah Husnoo for her technical guidance and direct contributions to the project.
I would also like to thank Bernardo García Bulle Bueno and Ellie Simonson for their helpful
advice throughout my research process. I would also like to acknowledge our wonderful partners
in India at S-VYASA and AJFTLE.
Finally, I would like to thank my family and friends who have shown me nothing but
unconditional love, guidance, and support throughout this entire process. I could not have
made it to this point without them.
Contents
1. Introduction and Motivation
1.1 Global Health Crisis
1.2 The Importance of Screening Tools
1.3 The Benefits of Non-Invasive Screening Tools
1.4 Current Work in Non-Invasive Diabetes Diagnostics
1.5 Scope of Thesis
2. The Time Evolution of Diabetes and Cardiometabolic Syndrome
2.1 Stage One: Compensation
2.1.1 Description
2.1.2 Symptoms
2.1.3 Risk Factors
2.1.4 Diagnostic Tests
2.1.5 Concurrent Diseases
2.2 Stage Two: Stable Adaptation
2.2.1 Description
2.2.2 Symptoms
2.2.3 Risk Factors
2.2.4 Diagnostic Tests
2.2.5 Concurrent Diseases
2.3 Stage Three: Unstable Early Decomposition
2.3.1 Description
2.3.2 Symptoms
2.3.3 Risk Factors
2.3.4 Diagnostic Tests
2.3.5 Concurrent Diseases
2.4 Stage Four: Stable Decomposition
2.4.1 Description
2.4.2 Symptoms
2.4.3 Risk Factors
2.4.4 Diagnostic Tests
2.4.5 Concurrent Diseases
2.5 Stage Five: Severe Decomposition
2.5.1 Description
2.5.2 Symptoms
2.5.3 Risk Factors
2.5.4 Diagnostic Tests
2.5.5 Concurrent Diseases
2.6 The Effects of Diabetes Medications
2.6.1 Metformin
2.6.2 Sulfonylureas and Meglitinides
2.6.3 Thiazolidinediones
2.6.4 Insulin
2.7 Discussion
3. Emerging Technologies and Tools for Non-Invasive Diabetes Detection
3.1 Infrared Thermal Imaging
3.2 Skin Fluorescence Spectroscopy
3.3 Retina and Iris Imaging
3.4 Nail Fold Capillaroscopy
3.5 Pulse Wave Analysis
3.6 Breath Analysis
4. Implementation of Non-Invasive Diabetes Screening Tools and Clinical Study
4.1 Study Design and Protocol
4.2 Available Data and Current Status
5. Image Quality Analysis for Patient Image Data
5.1 Automated Detection of Blur
5.1.1 Fast Fourier Transform (FFT) Blur Metric
5.1.2 Laplace Operator Blur Metric
5.1.3 Comparing and Contrasting Metrics
5.2 Automated Detection of Saturation
6. Diabetes Questionnaire Analysis
6.1 Data Preprocessing
6.2 Heatmap Correlation Analysis
7. Semi-Supervised Autoencoder for Patient Labeling
7.1 Motivation Behind the Semi-Supervised Autoencoder and Initial Assumptions
7.2 Methods
7.2.1 Autoencoder Input Features
7.2.2 Ground Truth Label Formation
7.2.3 Autoencoder Hyperparameters and Architecture
7.2.4 Dimensionality Reduction Analysis
7.3 Results
7.3.1 Patient Labeling via Autoencoder
7.3.2 Dimensionality Reduction via Autoencoder
7.4 Discussion
8. Conclusion and Future Work
8.1 Contributions of Work
8.1.1 Exploration into the Biological Characteristics of Diabetes and Non-Invasive Technologies to Detect Them
8.1.2 Image Quality Metrics for the Improvement of Image-Based Predictive Models
8.1.3 Preliminary Semi-Supervised Autoencoder for Patient Labeling
8.2 Future Work
8.3 Larger Impact
List of Figures
2-1 The complete time evolution of diabetes (stage 1 through stage 5) and its adjacent disorders and complications
3-1 Example infrared thermal image of the face
3-2 Application of various skin fluorescence spectroscopy devices in practice
3-3 Iridology chart for both the right and left irises
3-4 Example of capillaroscopic alterations in a diabetic patient and a healthy subject
3-5 Pulse waveform schematic depicting the measured and calculated values during pulse wave analysis
4-1 Diagram of the system architecture developed by the Mobile Technology Group for clinical study field work regarding the evaluation of non-invasive diabetes screening tools
4-2 Sample screenshots of mobile applications developed by The Mobile Technology Group to support field testing of diabetes screening tools
5-1 Examples of patient thermal, retina, and iris images (displayed left to right)
5-2 2D Laplacian kernel
5-3 Blur metric comparison using fully gaussian-blurred images with incrementally increasing blur strength
5-4 Blur metric comparison using partially gaussian-blurred images with an incrementally increasing number of blurred quadrants
5-5 Blur metric comparison using partially gaussian-blurred images that were gradually blurred to a fully blurred image
5-6 Saturation metric applied to an iris image at varied saturation levels
5-7 Saturation metric and situational corrections applied to a retina image
6-1 Heatmap analysis of 29 patient features across 174 patients
6-2 Heatmap analysis of 29 patient features across 24 patients who have not undergone any diabetes treatments
6-3 Scatterplot depicting the correlation between RBS and IDRS among the 24 untreated patients in the 174-patient population
7-1 Semi-supervised autoencoder architecture
7-2 Division of 212 patients into training and testing datasets via 50/50 split
7-3 Receiver operating characteristic (ROC) curves of the binarized multi-class predictions of the semi-supervised autoencoder
7-4 Precision-recall curves of the binarized multi-class predictions of the semi-supervised autoencoder
7-5 Dimensionality reduction of 106 patient feature vectors via autoencoder
7-6 Dimensionality reduction of 106 patient feature vectors via controls (PCA and t-SNE)
List of Tables
6-1 Numerical conversions applied to the Diabetes Questionnaire patient data
6-2 Numerical conversions applied to features derived from Diabetes Questionnaire patient data
6-3 Pearson and Spearman correlation coefficients for the correlation between RBS and IDRS among the 24 untreated patients in the 174-patient population
7-1 Average silhouette coefficients of the ground truth clusters within the original dataset and the various dimension-reduced patient representations
Chapter 1
Introduction and Motivation
The world, as a global community, is continuously striving for innovation and groundbreaking
research in the medical and life sciences—numerous studies within the fields of biomedical
engineering, pharmacology, systems biology, etc. have been published to display these feats of
human ingenuity. Yet despite these revolutionary discoveries, many individuals around the world
remain desperately in need of healthcare to combat various common, curable, and preventable
conditions.
1.1 Global Health Crisis
There are numerous causes for this innovation-healthcare disparity, but many of these
contributors ultimately boil down to two factors: the availability of healthcare workers, and the
cost of treatment. Firstly, many people don’t have access to healthcare. The World Health
Organization (WHO) has stated that approximately half of the world’s 7.3 billion people cannot
access essential health services[1]. This is mainly due to the overwhelming number of people who
are in need of medical assistance relative to the number of active and accessible physicians.
About 40% of countries have fewer than 10 doctors for every 10,000 individuals[1]. The world is
estimated to have a shortage of 18 million healthcare professionals by 2030 (mainly in lower-
income countries)[1].
Secondly, the cost of healthcare is becoming increasingly unmanageable for both citizens
and governmental bodies. In 2010, over 800 million people worldwide spent at least 10% of their
household budget on healthcare, and nearly 100 million people worldwide fell below the poverty
line as a result of their healthcare spending[1]. On a larger scale, the United States spent over
$10,000 on healthcare per capita in 2017 (more than any other country), with 20 other countries
spending over $3,000 per capita the same year[2]. With these high healthcare costs, it is incredibly
difficult for countries to maintain a standard quality of healthcare for everyone, and this burden is
especially felt in rural areas and developing nations.
1.2 The Importance of Screening Tools
In order to combat the complications of traditional healthcare, more widespread, cost-effective
methods of disease screening are emerging. These screening methods tend to be non-invasive
measurements and visualizations that are often more readily accessible than physicians. These
diagnostic tools give individuals a preliminary metric that informs them of their
potential disease state, as well as whether or not to seek further medical care. If used
appropriately, these screening tools can help individuals in need seek medical assistance in a
timely manner, or even instruct individuals on how to prevent certain medical conditions from
occurring. Most importantly, they allow physicians to tend to patients who are at high
risk (rather than needing to examine every potential patient), which would ultimately decrease
the intense burdens on various global healthcare systems.
1.3 The Benefits of Non-Invasive Screening Tools
While massively scaled screening tools are useful for identifying which individuals are in need
of treatment and targeted health education, many practical obstacles inhibit the
widespread use of screening tools. For tests that traditionally require biological specimens (such
as blood, urine, or sputum), these specimens must be collected, labelled, and transported to a
laboratory facility at another location. In addition, systems for recording and tracking patient
medical records are needed in order to ensure that each patient receives their results.
Furthermore, due to poor supply chains, tests and materials are often in short supply or out of
stock. Since stable health infrastructure is lacking in many low-resource regions around the
world, screening tests that require biological specimens have presented a great challenge for
public health.
As an alternative, new technologies are emerging that enable other methods of
diagnosing certain health conditions non-invasively. While most of these methods do not possess
the sensitivity and specificity of a biochemical laboratory test, these new methods enable simpler
and scalable screening for disease. In general, non-invasive tests are faster to perform, give
immediate results, and don’t require any consumable supplies or materials. These technologies
thus represent a significant step forward in the surveillance and management of disease.
1.4 Current Work in Non-Invasive Diabetes Diagnostics
Much work has been done to explore non-invasive diagnostic systems for diabetes due to its
global prevalence. Diabetes affects people around the world, with the number of diseased
individuals rising more rapidly in low- and middle-income countries. According to the WHO,
diabetes is the direct cause of 1.6 million deaths each year, and the global prevalence of
diabetes in adults has risen from 4.7% in 1980 to 8.5% in 2014[3]. As the number of diseased
individuals increases, so will the global cost of diabetes-related medical care. By 2030, the
global cost of diabetes is projected to rise to an all-time maximum of 2.2% of global GDP[4].
The Mobile Technology Group, headed by Dr. Fletcher, has developed numerous low-
cost, non-invasive tools to improve clinical decisions around various global diseases—some
tools include peak flow meters for detecting pulmonary disorders, mobile games to monitor
mental health, and imaging algorithms to screen for infectious diseases[5]. One of the most
significant ventures that the Mobile Technology Group has made in the field of diabetes research
is their development of a mobile application which allows individuals to screen themselves for
diabetes severity. Even though the application is still in development, great strides have been
made toward producing predictive models using non-invasive measurements that are cost-effective
and accessible for a majority of people around the world[5][6].
1.5 Scope of Thesis
The content of this thesis is focused on exploring the intricacies of diabetes and non-invasive
diagnostic/screening tools for diabetes. This thesis also presents in-depth analyses of patient
data (collected by clinicians for use in predictive model training) and methods of
cleaning and preparing patient data. The presented analyses and methodologies are intended to improve
model training and the overall predictive power of the machine learning algorithms associated with the
Mobile Technology Group’s diabetes screening mobile application. Chapter 2 of this thesis
explains the time-based pathology and etiology of diabetes and other concurrent disorders. In
Chapter 3, various emerging and common non-invasive metrics for diabetes screening are
presented and analyzed for their efficacy. Chapter 4 describes the study design and protocol for
the mobile application as well as its current status. Chapter 5 explores metrics to assure quality
control of patient image data. In Chapter 6, patient data from the Mobile Technology Group’s
Diabetes Questionnaire is assessed for its correlation to patient blood sugar levels (a common
metric for diabetes screening). Chapter 7 discusses the use of a semi-supervised autoencoder for
the diabetic severity labeling of patient data. Chapter 8 discusses conclusions derived from
all of the analyses, and future work aimed at improving the current mobile application for diabetes
screening.
Chapter 2
The Time Evolution of Diabetes and Cardiometabolic
Syndrome
Within the past decade, a significant rise in chronic diseases—such as diabetes, hypertension,
and obesity—has been observed in both industrialized and developing nations alike. With the
influx of these maladies, there is also a concurrent influx of cardiometabolic syndrome (CMS),
which is the umbrella condition that includes all these diseases[7]. CMS is a combination of
multifactorial diseases spanning maladaptive cardiovascular, renal, metabolic, prothrombotic,
and inflammatory abnormalities and dysfunctions[8]. The syndrome is mainly characterized by
insulin resistance, impaired glucose tolerance, dyslipidemia, high blood pressure, non-alcoholic
fatty liver disease, and central adiposity[9][10]. The condition of CMS continues to advance as a
threatening disease, and it has already been recognized as an entity by the World Health
Organization and the American Society of Endocrinology[9]. In order to combat the diffusion of
the disease, it is imperative to understand how CMS manifests, and the many factors which
influence its intensity.
One of the most common diseases associated with CMS and its complications is diabetes.
Diabetes was the seventh leading cause of death in the United States in 2017, and 1.5 million
Americans are diagnosed with diabetes every year[11][12]. Despite being a well-known and
researched disease, diabetes is often studied as an isolated illness. In practice, diabetes develops
concurrently with numerous other biological conditions, all under the CMS umbrella; the
etiologies of each of these unique conditions are intertwined. Revealing the time-varying
connections between diabetes and other cardiometabolic conditions may unveil new methods of
preventing and treating diabetes, concurrent diseases, and CMS as a whole.
Diabetes is strongly linked to the body’s management of insulin and blood sugar levels.
This regulation is completed via the islets of Langerhans within the pancreas. The most relevant
portion of the pancreatic islets, related to the development of diabetes, is the beta cell mass. The
beta cells are responsible for secreting insulin into the circulatory system after sensing an
increase of glucose[13]. The onset of diabetes is closely linked to abnormalities within the
function of beta cells, and the severity of diabetes grows with the decline of beta cell function.
Because of this, diabetes severity can be tracked by the presence or absence of specific metabolic
processes. There are five major stages of diabetes pathogenesis, preceded by stage zero, which
describes a healthy individual.
2.1 Stage One: Compensation
2.1.1 Description
The onset of diabetes begins with a precursor condition known as
prediabetes, which accounts for the first three stages of diabetes. Stage one is known as
Compensation[14]. During this stage, an individual moves from a healthy state to one in which
insulin resistance begins to manifest.
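
As an illustration only, the staging scheme described so far can be written down as a small ordered data structure. The sketch below is in Python and is not part of the mobile platform; the names `DiabetesStage` and `is_prediabetic` are hypothetical, and the stage labels follow the section titles used in this chapter.

```python
from enum import IntEnum


class DiabetesStage(IntEnum):
    """Stages of diabetes pathogenesis, ordered by severity (hypothetical names)."""
    HEALTHY = 0
    COMPENSATION = 1
    STABLE_ADAPTATION = 2
    UNSTABLE_EARLY_DECOMPOSITION = 3
    STABLE_DECOMPOSITION = 4
    SEVERE_DECOMPOSITION = 5


def is_prediabetic(stage: DiabetesStage) -> bool:
    """Return True for stages one through three, which the text groups under prediabetes."""
    return (DiabetesStage.COMPENSATION
            <= stage
            <= DiabetesStage.UNSTABLE_EARLY_DECOMPOSITION)
```

Using an integer-valued enumeration keeps the stages comparable, so a later stage always ranks above an earlier one.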
Due to the manifestation of insulin resistance, the beta cells within the pancreas will
increase the amount of insulin released in response to blood glucose, causing a spike in acute
insulin response (AIR)[14]. This is done by increasing the number of beta cells and/or the size of
each beta cell in the pancreas[14][15]. The increase in AIR reflects the compensatory measure the
body takes in order to counteract insulin resistance during the Compensation stage; the increase
in insulin is generally able to maintain normal blood glucose levels despite the developing
resistance. As a result of these metabolic processes, numerous symptoms may occur. For
instance, the beta cells may overcompensate when releasing increased levels of insulin, causing
blood glucose to deplete and inducing hypoglycemia—usually occurring 2-3 hours after a meal
when beta cells are most active[16].
2.1.2 Symptoms
With respect to beta cell insulin production, the increased levels of insulin in the Compensation
stage may induce hyperinsulinemia as well, driving an individual to exhibit symptoms of both
hyperinsulinemia and prediabetes. The following conditions are symptoms of hyperinsulinemia:
weight gain, strong cravings for sugar, intense or frequent feelings of hunger,
anxiety, a lack of concentration/motivation, and fatigue[17]. Some of these symptoms (such as
weight gain, cravings for sugar, and intense and/or frequent hunger) are directly linked to
eating. High insulin levels would lead to low blood sugar levels, resulting in a need to increase
blood sugar levels via dietary consumption. Similarly, other symptoms like anxiety, a lack of
concentration/motivation, and fatigue are linked to energy storage and energy depletion. Since
blood sugar levels would be too low to supply sufficient energy, various tissues in the body
wouldn’t receive enough nutrients to operate naturally and efficiently. Regardless, these
symptoms generally go unnoticed, and many symptoms are difficult to connect solely to
prediabetes given their prevalence in various other ailments.
2.1.3 Risk Factors
Insulin resistance may result from various factors. Individuals can be predisposed to resistance
through genetics, or certain lifestyle behaviors can influence the induction of the condition. It has
been postulated that free fatty acid metabolites, created from breaking down fatty acids, can
interfere with downstream insulin signaling[18]. Likewise, the dysfunction of certain surface
protein complexes, or the phosphorylation of specific intracellular proteins, may hinder insulin
signal transduction or lead to reduced insulin receptor expression[18]. Even mitochondrial
dysfunction may contribute to insulin resistance, triggering the activation of several serine
kinases and weakening insulin signal transduction[18].
Outside of genetics, free fatty acid metabolites are likely one of the most common triggers of
insulin resistance in diabetes. This conclusion ties the
manifestation of prediabetes to its risk factors. The risk factors of the Compensation stage of
prediabetes are the following: a family history of diabetes, an increased BMI, a waist size greater
than 40 inches (men) or 35 inches (women), an age of 45 years and older, belonging to an ethnic minority
(African-American, Hispanic, Native American, Asian American, Pacific Islander), a history of
smoking, general inactivity, sleeping problems or sleep disorders, increased triglyceride levels,
decreased HDL-cholesterol levels, high blood pressure (hypertension), and a history of one or
more vascular diseases[19]. From this list, it’s clear that some of the most common risk factors for
developing prediabetes are habits and conditions which increase the number of free fatty acids in
the body—the other factors being genetic, resulting in a genetic cause of insulin resistance
manifestation in those cases.
At times in which the body has excess amounts of blood glucose—possibly due to a large
meal with minimal energy consumption following—the sugar is rarely dispelled from the body.
As a precious source of energy, unused glucose is stored in the body, often being converted into
glycogen, triglycerides, and also free fatty acids by the liver. When this energy source must be
used (i.e. in times of starvation), these macromolecules are broken down by various metabolic
processes to catalyze reactions, ultimately creating the metabolites. The abundance of free
fatty acid metabolites is part of what blocks insulin signaling, initiating the Compensation stage.
2.1.4 Diagnostic Tests
Since blood glucose levels do not change in the Compensation stage, and direct symptoms are
difficult to perceive for prediabetes, there are no formal tests to determine if an individual is in
this stage. Nevertheless, the Compensation stage is marked by an increase in insulin
(hyperinsulinemia) that can be measured via simple blood tests. Having plasma insulin levels
higher than 2 µU/mL, as well as a serum glucose concentration that is less than 60 mg/dL, is
indicative of having hyperinsulinemia[20]. Nevertheless, clearly defined and elevated insulin
levels are not always present in a state of hyperinsulinemia, especially at the time of
hypoglycemia. Therefore, the detection of suppressed beta-hydroxybutyrate (less than 1 µmol/L)
in conjunction with low levels of free fatty acids (less than 1 µmol/L) during a period of
hypoglycemia may also be necessary to indicate hyperinsulinemia[20]. However, these
alternative conditions are less likely to be observed in prediabetes specifically, due to the strong
contribution that free fatty acid metabolites make to the manifestation of diabetes.
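The two criteria above can be written as a small check. The following Python sketch is illustrative only and is not a clinical tool: the function and parameter names are hypothetical, and treating a serum glucose below 60 mg/dL as the marker of hypoglycemia for the alternative criterion is an assumption. The numerical cutoffs are copied from the text as cited[20].

```python
def suggests_hyperinsulinemia(insulin_uU_per_mL, glucose_mg_per_dL,
                              bhb_umol_per_L=None, ffa_umol_per_L=None):
    """Return True if the readings meet either criterion quoted in the text.

    All names are hypothetical; the cutoffs are the ones stated above.
    """
    # Both criteria are evaluated during hypoglycemia, taken here
    # (an assumption) as serum glucose below 60 mg/dL.
    if glucose_mg_per_dL >= 60.0:
        return False

    # Primary criterion: plasma insulin above 2 uU/mL.
    if insulin_uU_per_mL > 2.0:
        return True

    # Alternative criterion: suppressed beta-hydroxybutyrate together with
    # low free fatty acids, for cases where insulin is not clearly elevated.
    if bhb_umol_per_L is None or ffa_umol_per_L is None:
        return False
    return bhb_umol_per_L < 1.0 and ffa_umol_per_L < 1.0
```

A call such as suggests_hyperinsulinemia(5.0, 55.0) exercises the primary criterion, while supplying the two optional arguments exercises the alternative one.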
Throughout the manifestation of prediabetes, individuals may develop cardiovascular
diseases (CVD) as well. Most of these concurrently developing diseases involve complications in
blood vessel integrity. These conditions can therefore be monitored using CVD diagnostic tests.
The progression and severity of CVD is linked to the progression and severity of diabetes, so it’s
beneficial to accompany diabetes diagnostic tests with measures marking CVD progression—
some CVD diagnostic tests already used for analyzing diabetic states are infrared/thermal
imaging and skin fluorescence spectroscopy[21][22].
2.1.5 Concurrent Diseases
Prior to the Compensation stage, inflammation may develop in and around the adipose tissues of
the body[23]. As adipose cells grow in mass, they recruit more immune cells within their
tissues[24]. Both the adipose and immune cells synthesize and secrete proinflammatory
adipokines, cytokines, and chemokines which produce the aforementioned inflammation[23].
These proinflammatory compounds activate cellular pathways which lead to insulin resistance—
this means that inflammation is highly correlated to abnormal insulin signaling[25]. The insulin
resistance will then result in producing higher levels of blood glucose to be turned into fat,
growing adipose tissue mass and producing even more inflammation[23][24]. This inflammation
response is strongly related to the risk factors of the Compensation stage given that adipose
tissue grows in mass with increased triglyceride levels; therefore, inflammation can precede
prediabetes or both conditions can develop simultaneously.
To further support this claim, studies have shown that in conditions of hyperinsulinemia,
individuals may show early signs of atherosclerosis—however, atherosclerosis is not guaranteed
to manifest during this stage of diabetes[26]. Atherosclerosis is one of the major vascular diseases
triggered by prediabetes, and it is characterized by the hardening of arteries due to plaque
buildup within the arterial walls—these plaques being composed of fat, cholesterol, calcium, and
other substances within the blood[27]. In a homeostatic environment, insulin is involved in the
activation of endothelial nitric oxide synthase (eNOS) which subsequently produces nitric
oxide[26]; NO dilates the blood vessels, relaxing them and allowing for improved blood
flow[26][28]. Ultimately, this process prevents atherosclerosis by preventing the arterial walls from
thickening. However, hyperinsulinemia promotes the down-regulation of the Akt/PKB signaling
pathway within endothelial cells by overstimulating the insulin receptors, leading to insulin
resistance. This leads to less eNOS activation and nitric oxide production, promoting the
hardening of blood vessel tissue and the initiation of atherosclerosis[26].
Since atherosclerosis can affect any artery in the body, there are various related diseases
that may develop from this condition, such as the following: ischemic heart disease (coronary
heart disease/coronary artery disease), carotid artery disease, peripheral artery disease, and
chronic kidney disease[27]. Likewise, a complete blockage of an artery, due to atherosclerosis
plaque buildup, could result in a heart attack or stroke[27]. Once atherosclerosis develops, all
subsequent vascular diseases are no longer directly influenced by specific stages of diabetes, and
therefore cardiovascular pathogenesis begins to proceed separately. Nevertheless, the increased
severity of diabetic symptoms proportionally increases one’s risk of developing vascular damage
and/or CVD—as well as the rate at which current cardiovascular complications advance—as will
be described in the future stages of diabetes.
Insulin resistance will continue to develop if there is no intervention during the
Compensation stage. Once insulin resistance grows to a point where beta cell function can no
longer fully compensate, prediabetes progresses to stage two: Stable Adaptation[14].
2.2 Stage Two: Stable Adaptation
2.2.1 Description
The Stable Adaptation stage is marked most notably by a gradual increase in blood glucose
above a normal level. This stage is also marked by a slight decrease in AIR. The decline in
insulin production/secretion can arise from various causes that can be linked to genetic and
environmental forces. In some cases, the immune system reacts to the influx of insulin being
produced by the beta cells. The immune system then attacks the beta cells, slowly destroying
them and inducing type 1 diabetes[29]. Over time, beta cell mass will deplete and the body will be
unable to naturally produce sufficient levels of insulin, making the individual completely
dependent on an outside source of insulin; however, this would take place in later stages of the
disease. The autoimmune response against the beta cells may be related to the body’s
autoimmune response against cancerous cells—cancerous beta cells can produce insulin in
chaotic and abundant quantities, similar to how beta cells in stage one behave. Destroying the
beta cells would decrease the amount of insulin being produced, and therefore diminish the AIR.
Similarly, decreasing insulin production would increase blood glucose. Nevertheless, the Stable
Adaptation stage is not always triggered by an autoimmune response.
Beta cells may become less responsive to high levels of blood glucose[30]—similar to how
the various cells of the body become less responsive to insulin throughout stages one and two.
Due to this glucose resistance, the beta cells would not produce as strong of a glucose-stimulated
insulin response, leading to less insulin secretion. Progression of this decrease in beta cell
activity will lead to type 2 diabetes. Over time, the body will stop producing healthy amounts of
insulin, but the individual will not be completely dependent on an outside source of insulin.
Depending on the method of beta cell decline, the Stable Adaptation stage can be fairly brief or
last a lifetime. This stage of prediabetes remains stable as long as insulin production prevents
sharp rises in blood glucose.
2.2.2 Symptoms
The main symptoms of the Stable Adaptation stage are the increase in blood glucose and the
decrease in AIR mentioned previously. Depending on the extent to which insulin production has
decreased, one may still experience symptoms of the Compensation stage of
prediabetes, including hyperinsulinemia. As blood glucose increases and insulin production
decreases, these symptoms should subside—if they are present—and symptoms of
hyperglycemia may begin to appear. These symptoms include the following: increased thirst
and/or hunger, frequent urination, sugar in the urine, headache, blurred vision, and fatigue[31].
Nevertheless, the strength of these symptoms should be weak within this stage. The symptoms of
hyperglycemia and hyperinsulinemia (symptoms from the Compensation stage) are somewhat
similar, mainly because both are related to dysglycemia—abnormal blood glucose.
2.2.3 Risk Factors
Given that the Stable Adaptation stage is a stage within prediabetes, the same risk factors
mentioned in stage one will apply as risk factors for this stage too. The only additional risk factor
would be the presence of symptoms related to hyperinsulinemia, given that hyperinsulinemia
was a symptom of the prior stage.
2.2.4 Diagnostic Tests
Since the blood glucose level increases past the normal range during this stage, there are various
tests to measure glucose metabolism in order to check whether one is in this stage of prediabetes.
However, normal blood glucose can be defined in a variety of ways, especially since glucose
levels can vary significantly within short periods of time due merely to the nature of the various
metabolic processes taking place. Nevertheless, there are three major methods of measuring
blood glucose levels which are effective when diagnosing stages of diabetes: average blood
glucose via the A1C test, fasting plasma glucose (FPG) via the FPG test, and two-hour postload
glucose via an oral glucose tolerance test (OGTT)[32]. The A1C test measures average blood
glucose levels over 2-3 months by analyzing the percentage of glycated hemoglobin in
circulation over that time period. The FPG test measures the concentration of glucose in the
blood after an eight-hour period of avoiding food. Lastly, the OGTT examines the concentration
of blood sugar remaining in the blood over a two-hour period, after ingesting 75g of sugar orally.
Each test can be used to establish a healthy baseline, as well as track how one progresses through
the stages of diabetes.
In stage one, an individual maintains normal glucose levels, meaning an A1C of less than
5.7%, a FPG of less than 5.6 mmol/L (100 mg/dL), and an OGTT of less than 7.8 mmol/L (140
mg/dL)[32]. However, in stage two, blood glucose levels rise to slightly abnormal levels: A1C of
approximately 5.7%, FPG of approximately 5.6 mmol/L (ranging between 5.0 and 6.1 mmol/L
(89–110 mg/dL))[14][33], and OGTT of around 7.8 mmol/L[32].
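The glucose thresholds in this chapter are quoted in both mmol/L and mg/dL. The following Python sketch shows the arithmetic linking the two units. It assumes a molar mass for glucose of about 180.16 g/mol, a standard value that is not taken from the text, and the function names are hypothetical. The paired values quoted in the text are rounded clinical cutoffs, so an exact conversion can differ from them slightly (for example, 5.6 mmol/L converts to roughly 100.9 mg/dL rather than exactly 100 mg/dL).

```python
# Molar mass of glucose in g/mol (equivalently mg/mmol); assumed standard value.
GLUCOSE_MG_PER_MMOL = 180.16


def mmol_per_l_to_mg_per_dl(mmol_per_l):
    """Convert a glucose concentration from mmol/L to mg/dL."""
    # mmol/L * mg/mmol gives mg/L; dividing by 10 gives mg/dL.
    return mmol_per_l * GLUCOSE_MG_PER_MMOL / 10.0


def mg_per_dl_to_mmol_per_l(mg_per_dl):
    """Convert a glucose concentration from mg/dL to mmol/L."""
    # mg/dL * 10 gives mg/L; dividing by mg/mmol gives mmol/L.
    return mg_per_dl * 10.0 / GLUCOSE_MG_PER_MMOL
```

Keeping the conversion in one place avoids mixing rounded and exact values when a reading in one unit is compared against a cutoff expressed in the other.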
2.2.5 Concurrent Diseases
In this stage, it is unlikely that any new conditions manifest, especially since the prediabetic
stages do not exhibit strong symptoms. Nevertheless, the inflammation which manifested
during/prior to stage one may spread to more areas of the body throughout the Stable Adaptation
stage. Also, dysglycemia may worsen any cardiovascular diseases obtained up to this point—
high concentrations of sugar and/or insulin can damage blood vessels[26][35]. These conditions
develop further as individuals approach stage three: Unstable Early Decomposition[14].
2.3 Stage Three: Unstable Early Decomposition
2.3.1 Description
The Unstable Early Decomposition stage is the final stage of prediabetes[14]. It is most notably
marked by a sharp, rapid increase in blood glucose, and an even further decline in insulin
production. Blood glucose begins to increase uncontrollably because beta cell decline has passed
a critical point. During this stage, impaired fasting glucose (IFG) and impaired glucose tolerance
(IGT) noticeably manifest. IFG is defined as having fasting glucose which is well above a
normal level (5.6 mmol/L). IGT is related to general insulin resistance, and the body’s impaired
ability to handle increased glucose in the blood. This stage is also associated with an increased
risk of cardiovascular pathology due to how glucose affects blood vessels[33]. As the name suggests,
Unstable Early Decomposition is generally an unstable, transient stage because blood glucose
tends to shift drastically[14]. Glucose levels could easily increase to a diabetic level, yet changes
in one’s lifestyle may reverse beta cell decline and revert the disease progression back to a
previous prediabetic stage.
2.3.2 Symptoms
Unlike the Stable Adaptation stage, the blood glucose level in the Unstable Early Decomposition
stage is well above normal. This means that symptoms of hyperglycemia are highly likely to be
experienced, especially for individuals with blood glucose levels approaching the range of
diabetic levels. These would be the same hyperglycemic symptoms mentioned in the Stable
Adaptation stage, but to a stronger degree. The presence of more long-term hyperglycemia
symptoms may also appear, including fruity-smelling breath, nausea and vomiting, shortness of
breath, dry mouth, muscular weakness, and abdominal pain[34]. However, these symptoms should
be weak, if at all present, given the transience of the stage.
2.3.3 Risk Factors
The risk factors of the Unstable Early Decomposition stage are the same risk factors of the
previous two stages of prediabetes. Nevertheless, symptoms of hyperglycemia may signal the
end of the Stable Adaptation stage, which would in turn be an indicator for disease progression
into stage three.
2.3.4 Diagnostic Tests
This stage is demarcated by an interval of blood glucose between the upper limit of normal blood
glucose levels and the lower limit of diabetic glucose levels. Nevertheless, there are different
standards for defining this interval with regard to fasting glucose. By the World Health
Organization’s (WHO) criteria, individuals in this stage would have a FPG level between 6.1
mmol/L (110 mg/dL) and 6.9 mmol/L (125 mg/dL). By the American Diabetes Association’s
(ADA) criteria, individuals in this stage would have a FPG level between 5.6 mmol/L (100
mg/dL) and 6.9 mmol/L (125 mg/dL)[33].
Besides measuring FPG, this stage can also be classified by the A1C test and OGTT. An
individual would be in this stage if A1C levels are between 5.7% and 6.4% on two separate tests.
Similarly, an individual would be in this stage if one’s postload plasma glucose is between 7.8
mmol/L (140 mg/dL) and 11.0 mmol/L (199 mg/dL) after a two-hour period[32]. The combination
of multiple tests would more accurately diagnose this stage; however, individuals are rarely found
in this stage clinically due to its transience[14].
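The intervals above can be collected into a single lookup. The following Python sketch is a hedged illustration rather than a diagnostic procedure: all names are hypothetical, the bounds are the ones quoted in the text and are treated as inclusive (an assumption), and each reading is evaluated once, so the requirement of two separate A1C tests is left to the caller.

```python
# Inclusive (low, high) bounds quoted in the text for this stage.
FPG_RANGE_MMOL_PER_L = {"WHO": (6.1, 6.9), "ADA": (5.6, 6.9)}
A1C_RANGE_PERCENT = (5.7, 6.4)
OGTT_RANGE_MMOL_PER_L = (7.8, 11.0)


def _within(value, bounds):
    """Return True if value lies inside the inclusive bounds."""
    low, high = bounds
    return low <= value <= high


def stage_three_findings(fpg_mmol_per_l=None, a1c_percent=None,
                         ogtt_mmol_per_l=None, fpg_criteria="ADA"):
    """Report, per supplied test, whether the reading falls in the interval."""
    findings = {}
    if fpg_mmol_per_l is not None:
        findings["FPG"] = _within(fpg_mmol_per_l,
                                  FPG_RANGE_MMOL_PER_L[fpg_criteria])
    if a1c_percent is not None:
        findings["A1C"] = _within(a1c_percent, A1C_RANGE_PERCENT)
    if ogtt_mmol_per_l is not None:
        findings["OGTT"] = _within(ogtt_mmol_per_l, OGTT_RANGE_MMOL_PER_L)
    return findings
```

Returning one result per test, instead of a single verdict, makes the disagreement between the WHO and ADA fasting criteria visible: a fasting reading of 5.8 mmol/L falls inside the ADA interval but outside the WHO interval.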
2.3.5 Concurrent Diseases
As mentioned before, the spike in blood glucose warrants a high possibility of developing
cardiovascular disorders during this stage. Atherosclerosis is among these cardiovascular
disorders; however, its development in the Unstable Early Decomposition stage can differ from
that in the Compensation stage. Atherosclerosis is commonly triggered by oxidative stress from
reactive oxygen species (ROS), and hyperglycemia can increase ROS formation through a
multitude of mechanisms[35]. For instance, ROS can deplete nitric oxide, increasing arterial
stiffness[28]. Also, ROS can cause the oxidative modification of low density lipoprotein and also
endothelial dysfunction, thereby promoting a vascular inflammatory response[36]—this
inflammation is with respect to the arterial walls, which is separate from inflammation stimulated
by adipose tissue. This vascular inflammation is what promotes blood vessel stiffening.
Besides promoting ROS production, the presence of ambient glucose in the setting of a
hyperglycemic state can stimulate the glycosylation of free amino groups in proteins, lipids,
and/or nucleic acids within blood vessel walls and adjacent tissues. These glycosylation products
rearrange over time to form irreversible end products that accumulate in and around the arterial
walls. These advanced glycation end products (AGEs) advance atherosclerosis and tissue
damage through a variety of mechanisms[35]. Similarly, the high concentrations of glucose, as a
result of hyperglycemia, can activate protein kinase C (PKC). One of the functions of PKC is
upstream regulation of a growth factor for the extracellular matrix. Overstimulating PKC will
result in the thickening of capillary basement membranes, leading to atherosclerosis[35].
As previously mentioned, the Unstable Early Decomposition stage marks the end of
prediabetes. Without intervention, insulin production will continue to decrease and blood glucose
will continue to increase. As the stage rapidly advances, the symptoms and associated
complications of diabetes will emerge, leading to the fourth stage of diabetes: Stable
Decomposition[14].
2.4 Stage Four: Stable Decomposition
2.4.1 Description
This stage marks the beginning of the diabetes disease, where symptoms commonly associated
with diabetes begin to show unambiguously[14]. Individuals in the Stable Decomposition stage
are nearing beta cell failure. Nevertheless, the beta cells are still able to produce enough insulin
to avoid ketoacidosis[14]. When not enough insulin is being produced, glucose can no longer be
taken in by the cells, and the body must rely on ketone bodies (produced from fat) for energy.
Ketoacidosis is caused when the body starts breaking down fat at a rate that is too fast[37].
An individual can remain in this stage for the rest of their lifetime because the severity of
hyperglycemia reaches a stable plateau; nevertheless, hyperglycemia still has the potential to
cause various other disorders, and its likelihood of doing so increases with time. The length of
this stage is mostly dependent on beta cell survival. In the case of type 2 diabetes, beta cells are
not in danger of being destroyed. Even though hyperglycemia decreases the glucose sensitivity
of beta cells, they will continue to produce enough insulin to prevent ketoacidosis. However,
beta cell mass may decrease very slowly over time due to apoptosis[14]. In the case of type 1
diabetes, the immune system is continuously attacking and destroying the beta cells. An
individual with type 1 diabetes could rapidly progress through stage four as their beta cell mass
depletes and insulin production halts[14].
2.4.2 Symptoms
The major symptoms of diabetes begin to appear during this stage, which include the following:
tingling, numb, or painful sensations in the hands or feet, slow-healing cuts and wounds, patches
of dark skin, itchy skin/yeast infections, and symptoms of hyperglycemia (mentioned in stage
two)[38]. Some of these symptoms are related to the complications which may develop as a result
of diabetes—poor blood circulation can lead to the skin and nerve conditions that are alluded to
through these symptoms.
2.4.3 Risk Factors
The main risk factor of Stable Decomposition is prediabetes since prediabetes must precede
diabetes. Similarly, the risk factors of the three stages of prediabetes would also be the risk
factors of the Stable Decomposition stage.
2.4.4 Diagnostic Tests
The diabetic threshold for fasting blood glucose is the same under both the WHO and the
ADA criteria. Individuals in this stage would have a FPG level of 7.0 mmol/L (126 mg/dL) or greater[32].
An individual would also be classified in this stage if they have A1C levels of 6.5% or greater on
two separate tests, and/or a postload plasma glucose level of 11.1 mmol/L (200 mg/dL) or
greater after a two-hour period[32]. As previously mentioned, the combination of multiple tests
would more accurately diagnose this stage.
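The diabetic cutoffs above can be combined into one check. The following Python sketch is a minimal illustration under stated assumptions and is not a clinical tool: the function and parameter names are hypothetical, "two separate tests" is read as at least two A1C readings at or above 6.5%, and meeting any one of the three criteria is treated as sufficient, following the "and/or" wording of the text.

```python
def meets_diabetic_criteria(fpg_mmol_per_l=None, a1c_percents=(),
                            ogtt_mmol_per_l=None):
    """Return True if any diabetic-range criterion quoted in the text is met."""
    # Fasting plasma glucose of 7.0 mmol/L (126 mg/dL) or greater.
    fpg_hit = fpg_mmol_per_l is not None and fpg_mmol_per_l >= 7.0

    # A1C of 6.5% or greater on at least two separate readings.
    a1c_hit = sum(1 for a1c in a1c_percents if a1c >= 6.5) >= 2

    # Two-hour postload glucose of 11.1 mmol/L (200 mg/dL) or greater.
    ogtt_hit = ogtt_mmol_per_l is not None and ogtt_mmol_per_l >= 11.1

    return fpg_hit or a1c_hit or ogtt_hit
```

A single elevated A1C reading does not satisfy the check on its own, which mirrors the requirement for a confirmatory second test.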
2.4.5 Concurrent Diseases
The Stable Decomposition stage of diabetes is mainly characterized by its lack of concurrent
disease complications, so there are not many external disorders that are linked to this stage.
However, this does not mean that existing complications will not increase in severity. An
individual in this stage would still have conditions from previous stages, such as hyperglycemia.
Likewise, atherosclerosis, which is promoted by hyperglycemia, may start or continue to
progress in severity. Other cardiovascular complications may occur once atherosclerosis
develops in multiple regions of the vasculature. Once the advancement of vascular damage
triggers the onset of other CVD conditions, the Stable Decomposition stage progresses to the
final stage of diabetes: Severe Decomposition[14].
2.5 Stage Five: Severe Decomposition
2.5.1 Description
The Severe Decomposition stage can be classified as stage four diabetes with added
complications. In this stage, various disorders connected to diabetes may manifest, and/or the
severity of diabetes may reach the critical point. Individuals often become ketotic in this stage,
meaning they are undergoing ketoacidosis and their blood is becoming increasingly acidic[14][37].
If an individual is ketotic, it is likely that their beta cells are depleted to a point where they are
completely dependent on outside sources of insulin for glucose-based energy production[14]. This
stage often occurs after a long period of time in the Stable Decomposition stage, where
conditions like atherosclerosis reach various parts of the body—the major complications of stage
five arise when blood vessel walls harden in different places.
2.5.2 Symptoms
During the Severe Decomposition stage, one will experience symptoms of stage four diabetes
along with potential symptoms of various other complications. Some of the major complications
include the following: cardiovascular disease, nephropathy, retinopathy, neuropathy, and
periodontitis[39][41]. Some of the other symptoms that may occur during this stage are high ketone
levels in one’s urine, sexual complications, high blood pressure, high cholesterol, and strokes[38].
The connection between diabetes and these other diseases can be observed when
analyzing how these major complications arise. Cardiovascular disease is an umbrella disease for
various heart disorders related to diseased vessels, structural problems, and blood clots[40]. The
atherosclerosis that develops from diabetes will directly cause damage to vessel walls, so it
directly influences the manifestation of cardiovascular disorders[39]. Nephropathy involves
damage to the kidneys, which are organs responsible for filtering waste out of the blood.
Diabetic symptoms like high blood pressure, together with plaque-related damage to the blood
vessel walls around the kidneys’ filtering structures, will lead to this kidney dysfunction[39]. Retinopathy refers to
the damaging of the retina, which can be caused by blood vessel damage in the eye. Diabetic
retinopathy (DR) is caused by atherosclerosis in and around ocular vessels, leading to blindness
or other ocular disorders[39]. Neuropathy refers to nerve damage, which may occur in the body’s
extremities (hands and feet). Diabetes causes poor blood circulation, leading to the death of
nerve cells. If left untreated, these extremities may develop sores and infections that will need
eventual amputation[39]. Periodontitis refers to infections of the gums and bones which secure the
teeth in one’s mouth. Diabetes may cause inflamed gums, dark spots or holes in the
teeth, and painful oral complications due to high glucose levels and poor blood
circulation[41].
2.5.3 Risk Factors
Similar to the Stable Decomposition stage, the main risk factor of Severe Decomposition is
prediabetes since prediabetes must precede diabetes. However, the presence of diabetic
symptoms for a prolonged period of time can also be considered a risk factor of the Severe
Decomposition stage.
2.5.4 Diagnostic Tests
The metrics for the classification of the Severe Decomposition stage are the same as those for the
Stable Decomposition stage, mainly because the diabetic complications of stage five diabetes can
arise without major lifestyle changes from stage four diabetes. Therefore, individuals in this
stage would have a FPG level of 7.0 mmol/L (126 mg/dL) or greater[32]. In the same way, an
individual would be in this stage with A1C levels of 6.5% or greater on two separate tests, and/or
a postload plasma glucose level of 11.1 mmol/L (200 mg/dL) or greater after a two-hour
period[32]. Nevertheless, if an individual is ketotic, they may experience spikes in blood glucose
that are magnitudes greater than the lower limits of these tests—this would be indicative of a
stage five diabetic condition. Along with these tests, any qualitative or quantitative diagnostic
tests which measure the severity of diabetes’ complications (cardiovascular disease,
nephropathy, retinopathy, neuropathy, periodontitis, etc.) can be used in conjunction to
accurately diagnose this stage of diabetes.
2.5.5 Concurrent Diseases
The various complications mentioned throughout the symptoms of stage five diabetes account
for the various diseases that may occur in coordination with this stage. Many of these diseases
result in permanent damage to the body and/or death.
Figure 2-1. The complete time evolution of diabetes (stage 1 through stage 5) and its adjacent disorders
and complications.
The Severe Decomposition stage concludes the progression of the diabetes disease
(Figure 2-1), providing some insight into the biological mechanisms of CMS. Diabetes is a major
factor of CMS, along with many other conditions. From analyzing the stages of diabetes, it is
clear that these diseases are linked to each other. Even though each condition under CMS has the
potential to manifest on its own, the continued progression of one condition increases the
probability of other CMS-related conditions manifesting within the body—as seen through the
lens of diabetes.
2.6 The Effects of Diabetes Medications
The use of diabetes medications can complicate the task of screening for the disease. Diabetes
treatments and medications are designed to suppress diabetic symptoms—initially on a
molecular scale, to eventually develop into a macroscopic phenotypic change over time.
Therefore, various tools, technologies, and methods designed to detect diabetic symptoms may
fail due to the medications that a patient is taking. Prior knowledge of diabetes medications can
enable clinicians and researchers to anticipate and adapt to unexpected complications when
employing specific diabetes diagnostics tests. This section highlights some of the most common
diabetes medications and their molecular effects within the body.
2.6.1 Metformin
Metformin is one of the first medications to be prescribed to type 2 diabetes patients. The drug
acts on the liver, slowing down its production of glucose[42]. The liver stores excess blood sugar
in the form of glycogen; this activity is promoted by high insulin levels and low glucagon
levels. As previously mentioned, insulin levels can fall as diabetes progresses, sending the body
the false signal that it is in a period of starvation. This triggers the release of glucagon and
subsequently causes the liver to release glucose into the blood. Since this was only a false signal
caused by low insulin levels due to beta cell decline, the liver’s release of glucose can cause
blood sugar levels to rise above a tolerable threshold—this eventually leads to the various
diabetic complications previously discussed.
Metformin hinders the liver’s ability to produce glucose in response to glucagon, preventing blood
sugar levels from increasing and causing vessel and tissue damage. With respect to diagnostic
tests, this medication would cause short-term blood test measurements to produce misleading
results. Diagnostic tests that measure tissue and vessel damage would still be effective depending on
the extent of the damage prior to treatment, and the amount of time that the patient has been
taking metformin; minor damages may not be detectable and extensive therapy time can reverse
diabetes progression, bringing the patient closer to a healthy state.
2.6.2 Sulfonylureas and Meglitinides
These types of medications are designed to increase the secretion of insulin within the body[42]. A
boost in insulin secretion can combat the effects of beta cell decline. The additional insulin
would assist in overpowering any insulin resistance that has developed throughout the
progression of diabetes, enabling the uptake of blood glucose. Nevertheless, these medications
would be ineffective if not coupled with healthy lifestyle choices. High blood sugar, often caused
by high carbohydrate diets, will cause insulin to be released; the repetitive use of insulin would
only further promote insulin resistance, rendering sulfonylureas and meglitinides ineffective.
Although it’s through an indirect mechanism, these medications are designed to control
blood sugar levels. Nevertheless, sulfonylureas operate on a slower time-scale than
meglitinides. It takes longer for sulfonylureas to change pancreatic activity, but their effects
last longer than those of meglitinides. This means that short-term blood test
measurements may produce meaningful results for patients taking meglitinides as long as a few
days have passed since the patient’s last dose. Short-term blood test measurements are less likely
to be effective when examining patients on sulfonylureas.
2.6.3 Thiazolidinediones
These medications are designed to make bodily tissues more sensitive to insulin[42]. This directly
targets the problem of developing insulin resistance in diabetes progression. This medication
would be effective as long as the beta cells are still able to produce a healthy level of insulin.
However, these medications are linked with serious side effects like an increased risk of heart
failure and anemia. As with other medications, patients using thiazolidinediones may produce
misleading results for blood tests.
2.6.4 Insulin
Often as a last resort, or as part of type 1 diabetes treatment plans, a direct supply of insulin is
taken by injection to compensate for the lack of insulin being produced by the beta cells in the
pancreas[42]. There are various types of insulin that can have different effects on the body. In
general, type 2 diabetes patients begin by taking one long-lasting insulin shot per day. This
means that blood tests may potentially produce meaningful results given a few days have passed
since the last insulin injection.
2.7 Discussion
This breakdown of diabetes and diabetes medications should assist with targeting and/or
diagnosing CMS, along with addressing how some CMS disorders can beget others.
Understanding the connectivity of these disorders would allow researchers to approach
pharmaceutical remedies and therapies for life-threatening diseases—like heart disease,
atherosclerosis, and diabetes—from a completely new perspective. With future research, a
complete mapping of CMS could be developed; such a finding may even give rise to innovative
treatment plans that tackle all these disease complications at once and successfully impede the
threat of CMS altogether.
Chapter 3
Emerging Technologies and Tools for Non-Invasive
Diabetes Detection
As described previously, diabetes mellitus is one of the world’s most common and deadly
diseases—it’s estimated that a person dies every seven seconds due to diabetes or its
complications[12]. As a common and deadly disease, diabetes is heavily researched in order to
reveal new methods of preventing and treating the condition. However, remedies for diabetes are
ineffective for individuals who have the condition, but have not yet been diagnosed.
Unfortunately, many communities do not have the healthcare infrastructure necessary to
complete standard diagnostic tests for prevalent medical conditions like diabetes mellitus[43]. In
these regions, traditional tests may be too expensive for recurrent use, and the invasiveness of the
testing procedures can create a consequential risk of infection[44][45].
To match the healthcare needs of these developing nations, it’s imperative to explore
diabetes screening and diagnostic methods that are effective, inexpensive, and non-invasive.
Despite traditional blood tests like the A1C test, fasting plasma glucose (FPG) test, and oral
glucose tolerance test (OGTT) being considered the gold standard for diabetes diagnosis[39], there
exist a plethora of emerging technologies and tools that are capable of measuring features of
diabetes progression without disrupting an individual’s bodily integrity. This chapter will
enumerate these technologies, revealing each tool’s functionality towards diabetes analysis.
3.1 Infrared Thermal Imaging
Infrared thermal imaging is a non-invasive technique that captures the amount of natural infrared
(IR) radiation being emitted from the body[46]. Infrared thermal imaging has many uses as a
method for medical screening/investigations—certain applications of the assay can reveal the
extent of a patient’s vascular tissue damage based on the levels of IR radiation detected[21]. The
technique works by measuring IR radiation emitted from the body’s external tissues[21]. Infrared
imaging is most effective on surface-level dermal tissue, where there are fewer heat sources
interfering with the thermal characteristics of the vasculature. Infrared thermal imaging is
effective in observing irregularities in one’s blood flow[47]. Blood carries heat from proximal
regions of the body to distal ones, and this heat can be observed as IR radiation[47]. If a blood
vessel is restricted or mechanically obstructed, then blood flow will be reduced, making the
vessel appear cooler in temperature in comparison to normal conditions[47]. Therefore, these
changes in vessel temperature can be captured by infrared thermal imaging metrics, allowing
researchers to identify locations of irregular blood flow. An example of an infrared thermal
image is presented in Figure 3-1.
Figure 3-1. Example infrared thermal image of the face.
Infrared thermal imaging has been used to analyze vascular dysfunctions in various
cardiovascular diseases (CVDs)[21]. Given that CVD often progresses in conjunction with
diabetes, it’s plausible that infrared thermal imaging can be repurposed for diabetes screening.
For instance, atherosclerosis is a vascular disease very closely associated with diabetes and
cardiometabolic syndrome (CMS) development. Atherosclerosis can arise as early as stage one
of diabetes development, and the severity of the condition is often directly correlated with
diabetes progression. It has been shown that infrared thermal imaging is capable of tracking
atherosclerosis severity[21], which can provide insight into diabetes severity.
Brånemark and coauthors conducted a study to evaluate the use of infrared thermal
imaging for the recognition of peripheral vascular diseases in association with diabetes[48]. Using
infrared thermography, the researchers revealed characteristic abnormalities in the thermal
emission patterns of 16 diabetic subjects with and without vascular complications, concluding
that imaging the hands and feet of diabetic patients can provide insight into diabetes severity[48].
Fushimi and coauthors conducted a related study to analyze the effectiveness of
infrared thermal imaging to classify autonomic neuropathy in diabetic cases[49]. The researchers
concluded that infrared thermography was one of the most reliable and reproducible non-
invasive methods for detecting and monitoring diabetic vasosympathetic abnormalities[49]. Based
on the science of the technique, and the successful studies that implemented the technology,
infrared thermal imaging seems well-suited for future studies regarding blood circulation and
metabolism in relation to diabetes. However, the exact interpretation of the thermal patterns in
the face is still the subject of ongoing research.
3.2 Skin Fluorescence Spectroscopy
Skin fluorescence spectroscopy (SFS) is a non-invasive technique that measures the
accumulation of advanced glycation end products (AGEs) in skin tissue[50]. AGEs are formed in
hyperglycemic environments through a multistep process that causes the glycation and oxidation
of free amino groups in proteins, lipids, and/or nucleic acids[50]. These AGEs can damage tissues
by creating cross-linkages between free molecules and AGE receptors[50]. During diabetes
progression, AGEs accumulate in blood vessel walls and surrounding tissues, leading to varied
complications depending on the damaged tissues’ functions and the severity of the destruction.
There are numerous techniques used to measure AGE accumulation in the skin, such as
skin autofluorescence (SAF) and skin intrinsic fluorescence (SIF)[50]—even in the lens of the
eye, AGE concentration can be measured using lens autofluorescence (LAF)[51]. All these
techniques are designed to measure the relative abundance of AGEs in various areas of skin (or
crystalline lens) using a fluorescence reader (Figure 3-2). Several AGEs exhibit a characteristic
fluorescence with an excitation wavelength in the range of 350-390 nm and an emission range of
400-620 nm[52], making it possible for these technologies to work. To test the viability of SFS,
Dekker and coauthors completed a cross-sectional study to determine if SAF measurements
correlated with atherosclerosis severity (independent of diabetes severity)[22]. The researchers
discovered that SAF measurements were indeed higher in patients with subclinical and clinical
atherosclerosis (with respect to their control baseline)[22], which aligns with the current
understanding of how hyperglycemia, AGEs, and atherosclerosis are related.
Figure 3-2. Application of various skin fluorescence spectroscopy devices in practice[53].
There have been multiple recent studies exploring the direct use of SFS for diabetes
screening applications. Olsen and coauthors found in their study that SFS has similar
performance to FPG and A1C tests in terms of screening for abnormal glucose tolerance[54],
which is a key progenitor of diabetes development. Tentolouris and coauthors’ findings also
validated the SFS diagnostic measurement, discovering that SFS was superior to random blood
glucose (RBG) testing and the American Diabetes Association’s (ADA’s) Diabetes Risk Test[55]
in recognizing dysglycemia levels indicative of diabetes[56]. From these studies, SFS appears to
be an effective, non-invasive method for measuring diabetes severity.
3.3 Retinal and Iris Imaging
Retinal imaging is a non-invasive technique used to capture a visual representation of one’s
retina to analyze for the presence of ocular irregularities. Retinal imaging has been frequently
used to screen for diabetic retinopathy (DR), which is a complication associated with advanced
stages of diabetes[57]. DR is a disorder of the eye that occurs when the blood vessels within the
retina become damaged due to complications caused by hyperglycemic conditions. Depending
on the severity of DR, the condition can lead to blurred vision/blindness as well as directly
trigger other ocular disorders such as diabetic macular edema, neovascular glaucoma, and retinal
detachment[58]. With retinal imaging, clinicians are able to recognize the microaneurysms,
neovascularization, scarring, and other abnormalities on the retina that are indicative of DR[59].
Detecting the severity and relative abundance of these abnormalities will also provide insight
into overall diabetes progression, making retinal imaging an effective non-invasive tool for
diabetes screening.
While retinal imaging has been widely used to screen for DR, the use of the anterior
segment of the eye—namely the iris—has shown potential as another non-invasive diabetes
screening technique. Also known as iridology, this method originates from a branch of
alternative medicine known as naturopathy, and is a controversial field of study in Western
allopathic medicine. Proponents of iridology claim that medical conditions and disorders can be
observed and diagnosed through changes of the iris[60]. Specifically, iridologists state that
disorders of the body can provoke changes in the pigmentation and texture of various regions
within the iris[61]; these regions can be seen in iridology charts, as shown in Figure 3-3. There are
numerous studies that discredit the practice of iridology for varied reasons[62]; however, certain
principles of iridology have been shown to work in specific contexts when applied correctly.
Figure 3-3. Iridology chart for both the right and left irises[63].
Researchers Lin Ma and Naimin Li completed a study in 2007 proposing a computerized
iris diagnosis method aimed at eliminating the subjective and qualitative characteristics of the
traditional iridology[61]. In analyzing the results of their support vector machine (SVM) classifier,
they found their model had an 85.4% accuracy when classifying patients with nerve system
disease and alimentary canal disease[61]. Similarly, researchers Piyush Samant and Ravinder
Agarwal examined different machine learning models for the task of classifying patients as either
diabetic or non-diabetic using only iris images; they found that the best classification accuracy
was 89.66% using a random forest classifier[64]. In general, these studies have demonstrated that
one of the major flaws in current iridology practices is the use of manual diagnosis. The addition
of computer vision techniques for image preprocessing seems to help remove ingrained bias
from traditional iridology. Therefore, the employment of these computational technologies could
potentially enable iris imaging and analysis to become a viable and effective method for non-
invasive diabetes detection.
3.4 Nail Fold Capillaroscopy
Nail fold capillaroscopy (also known as nail fold dermoscopy/dermatoscopy) is a non-invasive
diagnostic technique designed to evaluate small vessels of the microcirculation, often used to
analyze rheumatic-associated diseases[65]. The microcirculation is composed of a branching
network of vessels classified as arterioles, capillaries, and venules[66]. The microcirculation is the
site where the exchange of heat, solutes, and inflammatory cells occurs between the blood and
tissue[66]. Given that the roles of glucose (as a solute) and inflammation are significant in
diabetes progression[25][36], it’s incredibly valuable to explore non-invasive metrics that allow for
the examination of the microcirculation, especially since diabetes has been shown to affect
capillary microarchitecture in other parts of the body, such as the retina[66].
Diabetes cannot be directly labeled as a rheumatic-associated disease, so nail fold
capillaroscopy results using traditional analytical methods would be ineffective for this
application. In order to compensate for this, Maldonado and coauthors conducted a study to
reveal symptoms—found using nail fold capillaroscopy—that directly correlate with the
presence of diabetes (and potentially other non-rheumatic diseases)[67]. Specific symptoms of
diabetes-related microcirculation damage include capillary dilatation, avascular zones, and
tortuous capillaries[67]. Some of these physiological features can be viewed in Figure 3-4.
Figure 3-4. Example of capillaroscopic alterations in a diabetic patient and a healthy subject[67]. (A) a.
capillary dilation, b. capillaries, cross-linked and tortuous. (B) Normal capillaroscopy, homogeneous
arrangement of the capillaries of the last distal row, diameters within normal parameters.
Various studies have been completed to determine the effectiveness of nail fold
capillaroscopy as a diagnostic technique, but few have explored its potential as a diagnostic
technique for diabetes. In 2018, Bakirci and coauthors completed a study to examine whether
nail fold capillaroscopy could be used to screen type 2 diabetes patients for DR[68]. The benefit of
predicting for DR rather than predicting for the presence of diabetes is that patients with DR are
nearly guaranteed to have some level of damage in their nail fold microcirculation; this is due to
the fact that the microcirculation in their eyes shows damage. In Bakirci’s study, the ground truth
labels used for algorithmic prediction were determined by an ophthalmologist who examined all
patients for whether or not they had DR, and the nail fold capillaroscopy results were then
acquired by a rheumatologist who performed the capillaroscopy analysis on the clinical
data[68]. The study found that capillary hemorrhages, instances of ectasia, instances of giant
capillaries, and neovascularization occurred more frequently in patients with DR, but the results
were not significant[68].
Computational researchers have theorized that there is inherent ambiguity in human
judgment when interpreting nail fold capillaroscopy results[69], which creates a challenge for
obtaining standardized labels for the images to be used in supervised learning methods. Disease
classification methods using this diagnostic technique may be improved by utilizing
computational frameworks—which is a similar point of contention in iridology and iris imaging
analysis. Nevertheless, more research is necessary to support this assertion.
3.5 Pulse Wave Analysis
Pulse wave analysis is a non-invasive technique used to examine vascular stiffness[70]. The
diagnostic technique works by recording a patient’s incident and peripheral pressure waveforms
from a specified blood vessel[70]. The incident pressure waveforms (ejection wave) correspond to
the pressure waves generated by the contraction of the left ventricle of the heart, while the peripheral
pressure waveforms (reflected wave) correspond to pressure waves returning to the heart after
reflecting from small distal blood vessels. Measuring these waveforms enables the calculation of
a patient’s central aortic pressure, pulse wave velocity (PWV), and augmentation index (AIx or
AI)[70][71]; a schematic of how these values are generated is shown in Figure 3-5. These
calculated metrics vary due to a variety of factors including age, physical fitness, diet, heart rate,
intensity/regularity of exercise, body height, and biological sex[72]. Likewise, the central aortic
pressure, PWV, and AIx can change due to the use of drugs or the presence of diseases like
atherosclerosis, heart failure, and diabetes[72].
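As a rough illustration of how two of these metrics are commonly computed, the sketch below uses the standard textbook definitions rather than anything taken from the cited studies: PWV as the path length between two measurement sites divided by the pulse transit time, and AIx as the augmentation pressure expressed as a percentage of pulse pressure. The function names, parameter names, and example values are hypothetical.

```python
def pulse_wave_velocity(path_length_m, transit_time_s):
    """PWV in m/s: distance between two arterial sites over the pulse transit time."""
    if transit_time_s <= 0:
        raise ValueError("transit time must be positive")
    return path_length_m / transit_time_s


def augmentation_index(p1_mmhg, p2_mmhg, diastolic_mmhg):
    """AIx in percent, using the common definition (P2 - P1) / pulse pressure.

    p1_mmhg: pressure at the first systolic peak (incident/ejection wave).
    p2_mmhg: pressure at the second systolic peak (reflected wave).
    """
    systolic = max(p1_mmhg, p2_mmhg)
    pulse_pressure = systolic - diastolic_mmhg
    if pulse_pressure <= 0:
        raise ValueError("pulse pressure must be positive")
    return 100.0 * (p2_mmhg - p1_mmhg) / pulse_pressure


# Hypothetical values for illustration only (not patient data).
pwv = pulse_wave_velocity(path_length_m=0.5, transit_time_s=0.0625)
aix = augmentation_index(p1_mmhg=110, p2_mmhg=120, diastolic_mmhg=80)
print(f"PWV = {pwv} m/s, AIx = {aix} %")
```

Under this definition, a reflected wave that arrives early and adds to the systolic peak yields a positive AIx, while a late-arriving reflection yields a negative one.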
Figure 3-5. Pulse waveform schematic depicting the measured and calculated values during pulse wave
analysis[73]. The schematic juxtaposes the typical waveforms of young individuals as opposed to elderly
individuals to display how vessel stiffening affects the shape of the measured waveform.
High blood pressure (high central aortic pressure) is strongly correlated with diabetes due
to the development of arterial inflammation and atherosclerosis within diabetes progression;
however, it’s not as clear how diabetes affects PWV and AIx. Zhang and coauthors conducted a
study to explore these trends between diabetes, PWV, and AIx, using a sample of 79 Chinese
type 2 diabetes patients[74]. The researchers’ findings showed that type 2 diabetes was a
significant independent determinant for carotid‐femoral pulse wave velocity (CF-PWV), fasting
glucose was a significant independent determinant for carotid‐radial pulse wave velocity (CR-
PWV), but neither type 2 diabetes nor fasting glucose was significantly associated with carotid‐
ankle pulse wave velocity (CA-PWV). AIx was actually shown to decrease in the Chinese
diabetes patient sample[74]; however, evidence from an adjacent study conducted by Lacy and
coauthors demonstrated that there was little change in AIx between diabetic and control
samples[75], showing that the association of AIx and diabetes is still inconclusive.
Overall, pulse wave analysis is an excellent diagnostic tool to determine arterial stiffness,
and it has many applications across CVD-related conditions. Central aortic pressure is an
effective indicator of atherosclerosis and diabetes progression, but caution must be taken when
using PWV and AIx as predictors for diabetes.
3.6 Breath Analysis
Breath analysis (breath testing) is a non-invasive method of measuring the amount of specified
gases, volatile organic compounds (VOCs), and other aerosolized particles within a patient’s
breath for use in disease diagnoses. Breath tests have been used as diagnostic tests for a variety
of conditions like lactose intolerance, the presence of Helicobacter pylori (H. pylori), fructose
intolerance, small intestinal bacterial overgrowth (SIBO) syndrome, and various other
disorders[76]. Similar to breathalyzer tests which measure blood alcohol content (BAC), breath
analysis works by monitoring a patient’s breath for gases and particulates related to the disorder
of interest—this often involves taking baseline measurements, administering an oral solution
(sugar-water, milk, etc.) and exhaling into a breathalyzer at regular time intervals[76].
Exhaled gases and aerosols may be generated endogenously in the pulmonary tract,
blood, or peripheral tissues as metabolic byproducts of human cells[77]. With this in mind, further
research has shown that there are various biomarkers within the breath that correlate with blood
sugar levels[78]—this reveals the potential of using breath analysis as a non-invasive diagnostic
tool for diabetes. The presence of aerosolized glucose, ethanol, methanol, propionic acids, and
butanoic acids is indicative of elevated glucose and sucrose in the blood, tissues, and
gastrointestinal system[77]. Additionally, the presence of alkyl nitrates, carbon dioxide, carbon
monoxide, ethane, pentane, and propane is indicative of oxidative species and oxidative stress in
the body[77]—reactive oxidative species (ROS) promote vascular inflammatory responses that
trigger atherosclerosis[36]. Finally, the presence of ketones is indicative of ketoacidosis (which
develops at the end of diabetes progression), and the presence of isoprene is indicative of
cholesterol synthesis[77].
Currently, breath analysis methods for diabetes have been focused on measuring acetone
levels (as a metric for ketoacidosis)[77]. Wang and coauthors found a linear correlation between
the group mean acetone concentrations of type 1 diabetes patients and their categorized blood
glucose levels as well as their categorized hemoglobin A1C levels—however, a similar
correlation was not found for type 2 diabetes patients[79]. Adjacent studies confirm this
discrepancy between patients with type 1 diabetes and type 2 diabetes[80], meaning that breath
analysis for type 2 diabetes may need to be refocused on other gases and aerosols in order to be
effective as a diagnostic tool. Nevertheless, acetone breath analysis appears to be a relatively
effective diagnostic with respect to type 1 diabetes patients.
Chapter 4
Implementation of Non-Invasive Diabetes Screening
Tools and Clinical Study
In order to create a portable non-invasive diabetes screening platform, Dr. Fletcher’s group at
MIT has developed a mobile platform for field testing two different non-invasive diagnostic
methods—thermal imaging and iris analysis—in addition to an expanded diabetes risk
questionnaire. This platform includes a comprehensive Android mobile application as well as a
Django server that supports the integration of machine learning algorithms (Figures 4-1 and 4-2).
Figure 4-1. Diagram of the system architecture developed by the Mobile Technology Group for clinical
study field work regarding the evaluation of non-invasive diabetes screening tools.
Figure 4-2. Sample screenshots of mobile applications developed by The Mobile Technology Group to
support field testing of diabetes screening tools.
4.1 Study Design and Protocol
Data collection is ongoing at two different sites in India: the Aditya Jyot Foundation for
Twinkling Little Eyes (AJFTLE) in Mumbai, and the Swami Vivekananda Yoga Anusandhana
Samsthana (S-VYASA) site in Bangalore. At AJFTLE all the measurements except for the
psychology-related questionnaires are taken since this site sees more patients, and does not have
the bandwidth to administer the psychology questionnaires to each patient. At S-VYASA, all
non-invasive measurements except for the vision test are taken. In order to validate the Mobile
Technology Group’s non-invasive measurements against standard screening tools, and to help
develop machine learning algorithms, both sites also administer blood glucose tests using a
standard Alere glucometer and perform retina scanning using the Remidio Fundus camera. At
each site, after a subject agrees to the study and signs the consent form, the following
measurements are taken:
• Diabetes Questionnaire: This questionnaire asks about general diabetes risk factors and
symptoms. The technician administering the questionnaire also takes several standard
measurements, such as height, weight, and blood pressure as part of this measurement.
• Psychology-related questionnaires: If the subject is at the S-VYASA site, they are
administered several psychology-related questionnaires.
• Iris images: The iris imaging device is used to take two pictures each of the patient’s left
and right eyes, for a total of four iris images per patient.
• Thermal images: The Seek Thermal camera is then used to capture two thermal images of
the face.
• Vision test: At the AJFTLE site, the subject will take a standard vision test, which
consists of reading a chart at a distance of six meters. The results of the test are recorded
in the Android mobile app.
• Blood test: At both sites, a finger prick blood test is read by an Alere glucometer and used
to record the random blood sugar (RBS) value. Additional blood glucose tests such as HbA1c may also be
recorded.
• Photoplethysmogram (PPG): The photoplethysmographic waveform is recorded using
the Android mobile phone.
• Retina images: Following these steps, the patient’s eyes are dilated for retinal imaging.
Though the previous measurements can be collected in any order, retinal scanning
happens last after all of the other measurements have been recorded because iris images
cannot be collected once the pupil has been dilated. A total of four retina images are
collected per eye with eight images in total.
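The protocol above can be summarized as a per-subject record with a few consistency rules. The sketch below is one hypothetical way to encode it; the class, field, and method names are illustrative assumptions and do not describe the actual Android application or Django server. It captures the image counts stated in the list, the site-specific measurements, and the constraint that iris images must precede retinal imaging.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Image counts per subject as stated in the protocol text.
EXPECTED_COUNTS = {"iris": 4, "thermal": 2, "retina": 8}


@dataclass
class SubjectRecord:
    """Hypothetical container for one subject's measurements."""
    subject_id: str
    site: str  # "AJFTLE" or "S-VYASA"
    consent_signed: bool = False
    diabetes_questionnaire_done: bool = False
    psychology_questionnaires_done: bool = False  # administered at S-VYASA only
    vision_test_result: Optional[str] = None      # taken at AJFTLE only
    blood_glucose: Optional[float] = None         # glucometer reading (units assumed)
    ppg_recorded: bool = False
    images: dict = field(default_factory=lambda: {k: [] for k in EXPECTED_COUNTS})

    def add_image(self, kind: str, path: str) -> None:
        if kind not in EXPECTED_COUNTS:
            raise ValueError(f"unknown image kind: {kind}")
        # Iris images cannot be collected once the pupils are dilated for retinal imaging.
        if kind == "iris" and self.images["retina"]:
            raise ValueError("iris images must be captured before retinal imaging")
        self.images[kind].append(path)

    def missing_items(self) -> List[str]:
        """List the protocol steps that have not yet been completed."""
        missing = []
        if not self.consent_signed:
            missing.append("consent")
        if not self.diabetes_questionnaire_done:
            missing.append("diabetes questionnaire")
        if self.blood_glucose is None:
            missing.append("blood test")
        if not self.ppg_recorded:
            missing.append("PPG")
        for kind, expected in EXPECTED_COUNTS.items():
            if len(self.images[kind]) < expected:
                missing.append(f"{kind} images")
        if self.site == "AJFTLE" and self.vision_test_result is None:
            missing.append("vision test")
        if self.site == "S-VYASA" and not self.psychology_questionnaires_done:
            missing.append("psychology questionnaires")
        return missing


record = SubjectRecord(subject_id="demo-001", site="AJFTLE", consent_signed=True)
record.add_image("thermal", "thermal_0.png")
print(record.missing_items())
```

A check of this kind could be run before a record is uploaded, so that incomplete sessions are flagged while the subject is still present.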
4.2 Current Status and Available Data
Data collection for the study is ongoing. As of now, 282 unique patients have been sampled
within the study. Of these patients:
• 270 have recorded thermal images
• 262 have recorded iris images
• 240 have recorded blood measurements
• 238 have completed the vision test
• 213 have completed the Diabetes Questionnaire
• 4 have completed the Anxiety Questionnaire
• 4 have completed the Perseverative Thinking Questionnaire
• 4 have completed the Depression Questionnaire
• 3 have completed the Sleep Questionnaire
It is currently unknown how many patients have had retina images recorded or have had their
PPG measurement taken. Also, currently none of the patients have been clinically labeled for any
level of diabetic severity.
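To make the completeness of each measurement easier to compare, the counts listed above can be expressed as a fraction of the 282 sampled patients. The short sketch below only performs that arithmetic; the dictionary keys and the function name are illustrative.

```python
TOTAL_PATIENTS = 282

# Counts copied from the list of available data.
recorded_counts = {
    "thermal images": 270,
    "iris images": 262,
    "blood measurements": 240,
    "vision test": 238,
    "diabetes questionnaire": 213,
    "anxiety questionnaire": 4,
    "perseverative thinking questionnaire": 4,
    "depression questionnaire": 4,
    "sleep questionnaire": 3,
}


def coverage_percent(count, total=TOTAL_PATIENTS):
    """Share of sampled patients with a given measurement, in percent."""
    return 100.0 * count / total


for name, count in recorded_counts.items():
    print(f"{name:<38}{count:>4}  {coverage_percent(count):5.1f}%")
```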
Over the past years, the Mobile Technology Group has developed a Bayesian network
model for the non-invasive screening of diabetes—the network is designed to be incorporated
into the mobile application for public use. Nevertheless, the model is still undergoing
improvements. One of the major strategies for improving the model is incorporating more
patient data. The Mobile Technology Group is currently developing predictive models for
thermal and iris patient image data to address this task. These models are being designed to
output scores depicting a patient’s probability of being within a given diabetic stage. These
individual outputs will then be taken into the Bayesian network to improve its predictive power.
Additionally, patient retina images are being assessed for different ocular maladies that are
indicative of diabetes or diabetic retinopathy. This information can then be directly included as
an input for the Bayesian network.
The inclusion of image-based data could greatly improve the performance of the
Bayesian network; however, additional screening and preprocessing steps are needed to ensure
model performance is improved rather than hindered by the additional data. Chapter 5 discusses in
depth the metrics that help ensure image quality for image-based predictive models.
Chapter 5
Image Quality Analysis for Patient Image Data
The diabetes screening mobile application, created by the Mobile Technology Group, is designed
to accept multiple non-invasive measurements—these measurements are obtained by using non-
invasive diabetes diagnostic techniques (most of which are highlighted in Chapter 3). Some of
these non-invasive measurements are collected in the form of images; these images are derived
from infrared thermal imaging, retina imaging, and iris imaging data collection methods—an
example of each type of patient image is shown in Figure 5-1. The current issue regarding
processing image data is that incoming images are not screened for quality prior to being used in
predictive models. Developing a training dataset with a noticeable percentage of images being
off center, too blurry, or too bright/dark to detect features would cause model performance to
suffer. Currently the Mobile Technology Group has developed preliminary models for thermal
and iris image classification, and each model individually tackles issues associated with
centering/cropping an image—this is completed via face detection and pupil centering for the
thermal and iris models, respectively. Therefore, this chapter will focus on discussing methods
that separate out images that are too blurry, overexposed, and/or underexposed for model use.
Figure 5-1: Examples of patient thermal, retina, and iris images (displayed left to right). Patient image
data was collected by clinicians for the diabetes screening mobile application.
5.1 Automated Detection of Blur
There are numerous methods one can use to determine if an image is blurry. However, most
methodologies are focused on one of two approaches: analyzing the frequency domain of an
image to detect blur, or analyzing the spatial domain of an image to detect blur. To examine each
of these approaches, this analysis will compare and contrast two algorithms which employ these
blur detection approaches: the fast Fourier transform (FFT) algorithm which determines and
analyzes an image’s 2D discrete frequencies, and the Laplace operator which detects edges when
incorporated in convolutional image processing techniques. Since color is not directly tied to the
blur of an image, all analyses were performed on grayscale versions of the input images.
5.1.1 Fast Fourier Transform (FFT) Blur Metric
The FFT algorithm is designed to take discrete, equally-spaced values in the time/spatial domain,
and convert them into the discrete frequency domain[81]. This algorithm can be applied
to the blur detection task because an image’s blurriness/sharpness is defined by its pixel
values, which form the image’s spatial domain. If there are sharp differences between sets
of pixel values, the FFT algorithm will detect these differences as high frequencies in the
frequency domain. Likewise, gradual changes in pixel values, over stretches of an image, would
produce low frequencies in the frequency domain. Blurry images tend to have more gradual
changes in pixel values while sharper images have more defined edges that produce strong
contrasts between pixels.
The FFT blur metric was developed by measuring the percent of high frequencies in the
frequency domain of an image, using a threshold to define what constitutes a high frequency.
The NumPy Python library (version 1.18.1) was used to perform a 2D FFT algorithm on each
image tested—a 2D FFT operation is necessary in order to obtain the frequency domain of a two-
dimensional data structure like a grayscale image. This blur metric outputs only a score
representing the percentage of high frequencies in the image, rather than a label depicting blur;
however, a blur threshold can easily be set by a user, where an image is considered blurry if the
output score falls below the preset threshold. Both the high-frequency threshold and the blur
threshold mentioned depend on the images being examined and the desired stringency of the
metric. For the comparison analyses conducted, the high-frequency threshold was set by equation
5.1.
High-Frequency Threshold = 0.33 · [(Input Image Height) + (Input Image Width)] (5.1)
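A minimal sketch of how such a metric might be computed with NumPy is given below. The text does not spell out whether the threshold of equation 5.1 is compared against coefficient magnitudes or against spatial frequency; the sketch assumes the former, counting a coefficient as "high" when its magnitude exceeds the threshold. That reading, along with the function names, is an assumption and may differ from the implementation actually used.

```python
import numpy as np


def high_frequency_threshold(height, width):
    """Threshold from equation 5.1."""
    return 0.33 * (height + width)


def fft_blur_score(gray):
    """Percentage of 2D FFT coefficients whose magnitude exceeds the threshold.

    `gray` is a 2D array (grayscale image). Comparing the threshold against
    coefficient magnitudes is an assumed interpretation of the metric.
    """
    gray = np.asarray(gray, dtype=np.float64)
    if gray.ndim != 2:
        raise ValueError("expected a 2D grayscale image")
    height, width = gray.shape
    magnitude = np.abs(np.fft.fft2(gray))
    threshold = high_frequency_threshold(height, width)
    return 100.0 * np.count_nonzero(magnitude > threshold) / magnitude.size


def is_blurry(gray, blur_threshold):
    """Label an image as blurry when its score falls below a user-chosen threshold."""
    return fft_blur_score(gray) < blur_threshold


# Synthetic example: a noisy image has many strong coefficients, a flat one has almost none.
rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(64, 64))
flat = np.full((64, 64), 128)
print(fft_blur_score(noisy), fft_blur_score(flat))
```

The blur threshold passed to `is_blurry` would need to be tuned for each image type, as noted above.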
5.1.2 Laplace Operator Blur Metric
In image processing methods, the Laplace operator is often portrayed as a matrix (convolutional
kernel/filter) that is designed to detect edges present in the spatial domain of an image[82]. The
Laplacian kernel functions by comparing a given pixel with its nearest neighbors. The kernel
starkly emphasizes differences in local pixel values, while essentially zeroing out all areas in an
image without sharp contrasts. The 3x3 convolutional kernel is applied to the whole image,
commonly with a stride length of one, to produce a matrix representing the edges within an
image. The discrete 2D Laplacian kernel is shown in Figure 5-2.
Figure 5-2. 2D Laplacian kernel.
The Laplacian blur metric was developed by applying the Laplacian kernel to an image
(with a stride length of one), and then obtaining the variance computed from all of the values
within the matrix produced. Variance is used because sharper images have
more contrast, producing more contrasting values in the output matrix. Therefore, output
matrices with higher variances directly correspond to sharper images. The OpenCV Python
library (version 4.2.0) was used to apply the Laplacian kernel on each image, as well as to obtain
the variance of the output matrix. Again, this blur metric only outputs a score rather than a label
depicting blur—the score for this metric represents the variance of the post-convolution image
matrix. Like the FFT blur metric, a blur threshold can easily be set by a user, where an image is
considered blurry if the output score falls below the preset threshold. The blur threshold depends
on the images being examined and the desired stringency of the metric.
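The Laplacian blur metric described above can be sketched without any external dependencies. This is a minimal illustration, not the implementation used for the analyses: it assumes the common 4-neighbor form of the 3x3 Laplacian kernel, it only convolves interior pixels, and the toy images and threshold value are made up. With OpenCV, a comparable score is commonly obtained with `cv2.Laplacian(gray, cv2.CV_64F).var()`, although border handling differs from this sketch.

```python
from statistics import pvariance

# 4-neighbor discrete Laplacian kernel (assumed form of the 3x3 kernel).
LAPLACIAN_KERNEL = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]


def laplacian_response(image):
    """Apply the 3x3 kernel with a stride of one over the interior pixels."""
    rows, cols = len(image), len(image[0])
    response = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += LAPLACIAN_KERNEL[dr + 1][dc + 1] * image[r + dr][c + dc]
            response.append(total)
    return response


def laplacian_blur_score(image):
    """Variance of the Laplacian response; higher values indicate a sharper image."""
    return pvariance(laplacian_response(image))


def is_blurry(image, blur_threshold):
    """Label an image as blurry when its score falls below the preset threshold."""
    return laplacian_blur_score(image) < blur_threshold


# Toy 6x6 grayscale images: a hard vertical edge versus a gradual ramp.
sharp = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
smooth = [[0, 51, 102, 153, 204, 255] for _ in range(6)]

print(laplacian_blur_score(sharp), laplacian_blur_score(smooth))
```

The hard edge produces large positive and negative responses and therefore a high variance, while the linear ramp produces a response of zero everywhere, which is the behavior the metric relies on.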
5.1.3 Comparing and Contrasting Metrics
Three similar analyses were performed to examine how well each metric classifies the blur of an
image. For the sake of consistency, all analyses were performed on the same patient image, and
any blur not found in the original picture was artificially added via Photoshop’s gaussian blur
tool (version 20.0). The image used was a thermal image of a patient’s face, which was selected
arbitrarily from the current pool of patient image data.
The first analysis explored how applying a gaussian blur affects the output scores of each
metric. For this analysis, 11 images were assessed in order of increasing levels of blur strength.
Blurs were applied to the whole image and blur strengths ranged from 0% blur to 100% blur. The
results of this analysis are displayed in Figure 5-3.
Figure 5-3. Blur metric comparison using fully gaussian-blurred images with incrementally increasing blur
strength. (A) The 11 images used in the analysis. Images are fully gaussian-blurred, with blur strengths
increasing incrementally from 0% blur to 100% blur. (B) FFT blur metric scores with each of the 11
images. (C) Laplacian blur metric scores with each of the 11 images.
From this initial comparison, both metrics appear to respond to incrementally increasing
image blur with exponentially decaying output scores. The Laplacian metric appears to be more
stringent than the FFT metric, given that the first instance of image blur causes a
sharper drop in the metric’s output score. While this analysis reveals how each metric handles
images that are fully gaussian-blurred, it’s important to explore images that are partially blurred
as well—this would occur if portions of an image are in focus, while others are out of focus.
The second analysis explored how partially applying a gaussian blur affects the output
scores of each metric. For this analysis, 5 images were assessed with an increasing percentage of
each picture covered by a gaussian blur. Blurs were applied to the quadrants of an image, with
each sequential image having more quadrants blurred. Also, each blur was set with 100% blur
strength upon application. The results of this analysis are displayed in Figure 5-4.
Figure 5-4. Blur metric comparison using partially gaussian-blurred images with an incrementally
increasing number of blurred quadrants. (A) The 5 images used in the analysis. Images are partially
gaussian-blurred, having blur applied to image quadrants. The blur strength of each blurred quadrant is
100%. The number of blurred quadrants in each image increases from 0 quadrants to 4 quadrants across
the 5 images. (B) FFT blur metric scores with each of the 5 images. (C) Laplacian blur metric scores with
each of the 5 images.
There are clear differences to note in this comparison. The FFT metric output scores
resemble an inverted parabola, while the Laplacian metric output scores resemble a more
linear distribution. Given that the experimental setup applies incremental quadrant-based blurs to
images, the Laplacian metric seems to better capture the linear distortions of the data.
Nevertheless, this performance isn’t always desired as a method of screening images. For
instance, if the background of an image is blurry while the foreground is in focus, the Laplacian
may still output a low score despite important image features still being clear. In this case, the
FFT would be a better metric given that the FFT metric score only drops severely once blur is
applied to the entire image.
The final analysis combined the first two analyses by exploring how each metric would
handle partially blurred images that were gradually blurred to a fully blurred image. This test
simulates the situation previously described, where portions of an image are in focus while other
portions are out of focus. The goal of this analysis is to see how well each metric grades these
images when incremental blur is applied to the in-focus section of each image. For this analysis,
test images were created by applying a gaussian blur to the left half of the image, where the blur
strength was at 100%. Gaussian blurs of 20% strength were then incrementally added to the right
half of each sequential image, producing 6 images in total. Finally, the original image (with no
blur applied) was included as image number 0, to establish a baseline to compare the results of
this analysis with each metric’s output score for the original image. In all, 7 images were
assessed for this analysis, and the results are displayed in Figure 5-5.
Figure 5-5. Blur metric comparison using partially gaussian-blurred images that were gradually blurred to
a fully blurred image. (A) The 7 images used in the analysis. Besides image number 0, each image
applied a gaussian blur to its left half, where the blur strength was at 100%. Gaussian blurs of 20%
strength were then incrementally added to the right half of each sequential image. (B) FFT blur metric
scores with each of the 7 images. (C) Laplacian blur metric scores with each of the 7 images.
The results from this final comparison mimic the results from the first two analyses. The
FFT metric output score appears to gradually decrease when only some portions of an image
contain blur, but declines sharply once any level of blur is applied to the whole image. The
Laplacian metric output score appears to decrease linearly when only some portions of an image
contain blur, but also declines sharply once any level of blur is applied to the whole image. It is
apparent that both the FFT and Laplacian blur metrics present appropriate descending trends
when analyzing increasingly blurred photos. Nevertheless, the Laplacian metric consistently
appears to be the more stringent metric. Between the two metrics, it seems that the FFT metric is
generally better able to account for the sharper portions of images.
The choice between the FFT and Laplacian metrics seems to depend on the problem
being solved and the image data required. If images contain a background and
foreground, and only certain regions must remain in focus for model performance to be effective,
then the FFT metric seems better suited. If the important features of an image constitute the entire
image, then very little blur should be tolerated, making the Laplacian metric more effective.
Given the nature of the patient image data (Figure 5-1), there is often a background present
which is not essential for the Mobile Technology Group’s classification models. With this in
mind, the FFT metric would be the best blur metric for screening patient images; fine-tuning the
high-frequency threshold can somewhat increase the metric’s stringency if necessary.
5.2 Automated Detection of Saturation
The saturation detection problem is actually a simpler task in comparison to the blur detection
problem. A common method of determining whether natural images are
overexposed/underexposed is to measure the percentage of ‘dark’ and ‘bright’ pixels in the
image. This analysis allows for the quick detection of whether an image contains a majority of
low-intensity or high-intensity pixels. Unfortunately, this method of testing only applies to natural
images, mainly because pixel values within synthetic images don’t necessarily correlate to
saturation—however this is not an issue within the scope of the Mobile Technology Group’s
patient image measurements.
To evaluate the described saturation detection method, a means of separating results is
necessary—this will allow for the creation of ‘underexposed’, ‘normal’, and ‘overexposed’
labels. A potential method is to create a range of pixel intensities that define a ‘normal’ image—
this range will vary depending on the type of image data being analyzed. The initial analysis
examined a patient iris image with varied levels of saturation applied to it; the exposure filters
were applied using Photoshop. A total of 5 images were examined with saturation levels
increasing from -100% strength to 100% strength. The results of the initial analysis are shown in
Figure 5-6—again, all analyses were performed on black and white versions of the input images.
Thermal images were not used again for the saturation analysis because saturation
is not relevant in thermal imaging, given that thermal cameras do not detect visible light.
However, the methodologies described in this analysis can be repurposed to analyze abnormal heat
signatures in thermal images.
Figure 5-6. Saturation metric applied to an iris image at varied saturation levels. The top row of the figure
displays the same iris image with incrementally increasing saturation (ranging from -100% strength to
100% strength). Under each image is the image’s distribution of pixel intensities. Mean values which fall
in the range of a ‘normal’ image are illustrated in green, while mean values that fall outside this range are
illustrated in red. The ‘normal’ range used for this analysis was [97.5, 157.5].
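The labeling step described above can be sketched as follows. This is a minimal illustration using the ‘normal’ range of [97.5, 157.5] quoted for Figure 5-6; the function names and the ‘dark’ and ‘bright’ cut-offs are assumptions made for the example, and a grayscale image is represented simply as a flat list of 0-255 intensities.

```python
from statistics import mean

NORMAL_RANGE = (97.5, 157.5)  # 'normal' mean-intensity range quoted for Figure 5-6


def exposure_label(pixels, normal_range=NORMAL_RANGE):
    """Label a grayscale image (flat list of 0-255 values) by its mean intensity."""
    low, high = normal_range
    mean_intensity = mean(pixels)
    if mean_intensity < low:
        return "underexposed"
    if mean_intensity > high:
        return "overexposed"
    return "normal"


def dark_bright_fractions(pixels, dark_max=63, bright_min=192):
    """Fractions of 'dark' and 'bright' pixels; both cut-offs are illustrative."""
    pixels = list(pixels)
    dark = sum(1 for p in pixels if p <= dark_max) / len(pixels)
    bright = sum(1 for p in pixels if p >= bright_min) / len(pixels)
    return dark, bright


print(exposure_label([30, 40, 50, 60]))
print(exposure_label([120, 130, 140]))
print(exposure_label([200, 220, 240]))
```

The appropriate range would need to be recalibrated for each type of image data, as noted in the text.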
Figure 5-6 also highlights one of the major concerns around the saturation problem, with
regard to conducting image-based analyses, known as clipping. In both Image Number 0 and
Image Number 4, the variance of the pixel values within the image decreases due to pixel values
butting up against the bounds of their dynamic range. This creates a loss of information which
cannot be regained via normal color-correcting means. Analyzing an image’s pixel distribution
allows one to detect potential instances of clipping.
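One simple way to flag potential clipping from a pixel distribution is to measure how many pixels sit at the bounds of the dynamic range. The sketch below assumes 8-bit grayscale intensities, and the 5% cut-off is a hypothetical value chosen only for illustration.

```python
def clipping_fractions(pixels, floor=0, ceiling=255):
    """Return the fractions of pixels sitting at the bottom and top of the range."""
    pixels = list(pixels)
    at_floor = sum(1 for p in pixels if p <= floor) / len(pixels)
    at_ceiling = sum(1 for p in pixels if p >= ceiling) / len(pixels)
    return at_floor, at_ceiling


def is_clipped(pixels, max_fraction=0.05):
    """Flag an image when too many pixels pile up at either bound (hypothetical cut-off)."""
    at_floor, at_ceiling = clipping_fractions(pixels)
    return at_floor > max_fraction or at_ceiling > max_fraction


print(is_clipped([255] * 10 + [100] * 90))
print(is_clipped(list(range(50, 150))))
```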
While this method appears to function well with iris image data, there are complications
applying the same methodology to retina image data. Since retina images collected by the
Mobile Technology Group are taken using the Remidio Fundus Camera[6], the majority of the
background is pure black. This greatly skews the distribution of pixel intensities to the right,
which may render the metric ineffective. The naïve solution to correct this issue would be to
recalibrate the ‘normal’ range to account for this shift; however, the percent of a retina image
that is occupied by background varies from image to image—therefore, the amount by which the
distribution will shift will also vary. Nevertheless, this problem can be mitigated by simply
excising the lowest intensity values from the distribution, which essentially removes all the
background values from the intensity distribution (Figure 5-7). Overall, this procedure for
analyzing images for saturation is effective across all current patient image data types.
Figure 5-7. Saturation metric and situational corrections applied to a retina image. (A) The original retina
image. (B) The applied saturation detection metric with specific intensities removed from each pixel
intensity distribution. Mean values which fall in the range of a ‘normal’ image are illustrated in green, while
mean values that fall outside this range are illustrated in red. The ‘normal’ range used for this analysis
was [97.5, 157.5]. (C) A representation of the input image entering the saturation detection metric within
each case. For visualization purposes, pixels that were removed from the intensity distribution were
replaced with pixels of maximum intensity (255) in the images. Images depicted are in grayscale due to
being post-analysis outputs (metric inputs are grayscale images).
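The background correction applied to retina images can be sketched as a filter on the intensity distribution before the mean is taken. The cut-off below which a pixel is treated as background is a hypothetical parameter, and the toy pixel values are made up.

```python
from statistics import mean


def mean_without_background(pixels, background_max=10):
    """Mean intensity after dropping pixels at or below `background_max`.

    `background_max` is a hypothetical cut-off for the black fundus border.
    Returns None when every pixel is treated as background.
    """
    foreground = [p for p in pixels if p > background_max]
    return mean(foreground) if foreground else None


# Toy retina-like image: 60% pure-black background, 40% mid-gray content.
pixels = [0] * 60 + [130] * 40
print(mean(pixels))                     # dragged down by the background
print(mean_without_background(pixels))  # reflects only the retina content
```

Because the excised values are removed rather than rescaled, the result does not depend on how much of the frame the background occupies.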
Chapter 6
Diabetes Questionnaire Analysis
Besides patient image data, the main source of data collection for the diabetes screening mobile
application is questionnaire data. Patients are asked to self-report on numerous
questionnaires, including questionnaires about anxiety, sleep, perseverative thinking, depression,
and lifestyle habits relevant to diabetes diagnosis. This analysis will specifically look at the
Diabetes Questionnaire, which assesses the direct risk factors related to diabetes development—
these risk factors are discussed in Chapter 2. The purpose of this analysis is to detect patterns in
the data that can be leveraged by prediction models, potentially leading to simpler solutions
which still enable high predictive power.
Similarly, these patterns may reveal a set of features that can be used to label patients
with different diabetic stages. The Mobile Technology Group does not track the gold-standard
blood tests discussed in Chapter 2; therefore clusters within the data can potentially be used as a
proxy for different diabetic stages if said clusters correlate with diabetes progression.
6.1 Data Preprocessing
In order to conduct meaningful analyses with the questionnaire data, all of the patient’s answers
must be converted into a numerical representation. The Diabetes Questionnaire asks multiple
questions which vary in structure. Some questions require specific numerical answers, such as
queries about height, weight, blood pressure, etc. Some questions are multiple choice, requiring
yes/no responses or a single selection from a set of options. Finally, the rest of the questions are
multiple select, in which patients are able to select all answers that apply within the bounds of
the question. While the numerical questionnaire data did not need a transformation applied, the
categorical data needed to be encoded into a numerical space.
For multiple choice questions, values were assigned using integer encoding, where each
answer choice was assigned an integer value ranging from 0 to one less than the total number of choices.
Higher numbers were generally associated with answer choices which provided more
information and enabled efficient pattern detection in the data. Nevertheless, the integer
assignment process was subjective and may be an area of improvement. For multiple select
questions, values were converted using a custom binary instance encoding. Each answer choice
within a question was assigned a digit placement (i.e. ones, tens, hundreds, etc.)—the number of
digit placements matched the total number of choices. Again, answer choices which provided
more information were generally associated with higher digit placements. For the specific set of
answer options which a patient selected for a given question, the digit placement value of each
associated option would be 1, indicating that the choice was selected. The remaining choices
would have digit placement values of 0 to indicate that those choices were not selected.
Therefore, every possible set of selected choices produces a unique string of 0s and 1s
representing a binary number. This binary number was then converted to a base 10 integer to
produce a numerical value representing the patient’s answer for said question. More information
regarding how each question in the Diabetes Questionnaire was converted to a numerical value is
provided in Table 6-1.
Question | Conversion Method
    Original Answer Choice | Assignment
1 Height | None
    Decimal | Decimal
2 Weight | None
    Decimal | Decimal
3 Systolic Blood Pressure | None
    Integer | Integer
4 Diastolic Blood Pressure | None
    Integer | Integer
5 Waist Circumference | None
    Decimal | Decimal
6 Hip Circumference | None
    Decimal | Decimal
7 Have you ever been tested for diabetes? | Integer Encoding
    Yes | 2
    No | 1
    I don’t know | 0
8 Have you ever been diagnosed with diabetes? | Integer Encoding
    Yes | 2
    No | 1
    I don’t know | 0
9 If you have been diagnosed with diabetes, do you know which type of diabetes you have? | Integer Encoding
    Yes, Type 2 Diabetes | 3
    Yes, Type 1 Diabetes | 2
    I don’t know | 1
    N/A | 0
10 If you have been diagnosed with diabetes, when were you diagnosed? | Integer Encoding
    More than 15 years ago | 7
    11-15 years ago | 6
    6-10 years ago | 5
    2-5 years ago | 4
    1-2 years ago | 3
    7-12 months ago | 2
    0-6 months ago | 1
    N/A | 0
11 If you have diabetes, what treatments are you using for your diabetes? | Custom Binary Instance Encoding
    Ayurvedic or non-allopathic medicine | 7th digit
    Tablets | 6th digit
    Insulin | 5th digit
    Diet | 4th digit
    Exercise | 3rd digit
    No treatment | 2nd digit
    N/A | 1st digit
12 Do you have a family history of diabetes? | Integer Encoding
    Yes, both parents | 3
    Yes, one parent | 2
    No | 1
    I don’t know | 0
13 Have you ever been diagnosed with any of these diseases? | Custom Binary Instance Encoding
    Cardiovascular disease | 9th digit
    Hypertension | 8th digit
    Anemia | 7th digit
    Renal or kidney disease | 6th digit
    Thyroid disease | 5th digit
    Pulmonary disease (COPD, Asthma, TB, ILD) | 4th digit
    Cancer | 3rd digit
    Other | 2nd digit
    N/A | 1st digit
14 Besides diabetes medication, are you also taking medicine for other diseases also? | Custom Binary Instance Encoding
    Cardiovascular disease | 11th digit
    Hypertension | 10th digit
    Anemia | 9th digit
    Renal or kidney disease | 8th digit
    Thyroid disease | 7th digit
    Pulmonary disease (COPD, Asthma, TB, ILD) | 6th digit
    Pain | 5th digit
    Sleep | 4th digit
    Cancer | 3rd digit
    Other | 2nd digit
    N/A | 1st digit
15 How much physical exercise do you do? | Integer Encoding
    I do vigorous physical exercise at my work or outside work on most days | 3
    I do some (moderate) physical exercise at my work or outside work | 2
    I do a little bit (mild) physical exercise at my work or outside work | 1
    I have no physical exercise at my work or outside work | 0
16 What is your usual diet? | Integer Encoding
    Vegetarian | 2
    Vegan | 1
    Non-veg (My diet includes meat) | 0
17 How often do you drink alcohol? | Integer Encoding
    Often (more than 2 times per week) | 3
    Seldom (1 time per week or less) | 2
    Never | 1
    Prefer not to answer | 0
18 Over the past 2 weeks, how often do you have difficulty sleeping or difficulty falling asleep? | Integer Encoding
    Nearly every day | 3
    More than half of the days | 2
    A few days | 1
    Not at all | 0
19 Over the past 2 weeks, how often have you been feeling nervous, anxious, or worried about many things? | Integer Encoding
    Nearly every day | 3
    More than half of the days | 2
    A few days | 1
    Not at all | 0
20 Over the past 2 weeks, how often have you been sad, depressed or hopeless? | Integer Encoding
    Nearly every day | 3
    More than half of the days | 2
    A few days | 1
    Not at all | 0
21 Do you often feel fatigued? | Integer Encoding
    Yes | 2
    No | 1
    I don’t know | 0
22 Approximately how many times per day do you urinate? | Integer Encoding (Modified)
    More than 10 | 15
    8-10 | 9
    6-7 | 6.5
    5 | 5
    4 | 4
    3 | 3
    2 | 2
    1 | 1
23 Do you often feel pain in your limbs? | Integer Encoding
    Yes | 2
    No | 1
    I don’t know | 0
24 Do you often feel numbness in your limbs? | Integer Encoding
    Yes | 2
    No | 1
    I don’t know | 0
Table 6-1. Numerical conversions applied to the Diabetes Questionnaire patient data.
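The two encodings summarized in Table 6-1 can be sketched as follows. This is an illustrative sketch rather than the code used for the analysis: the function names are hypothetical, and the 1st digit is assumed to be the least significant (ones) place of the binary number.

```python
def integer_encode(answer, choices_low_to_high):
    """Integer encoding: the position of the answer in a low-to-high ordered list."""
    return choices_low_to_high.index(answer)


def binary_instance_encode(selected, choices_by_digit):
    """Custom binary instance encoding for multiple-select questions.

    `choices_by_digit` lists the options from the 1st digit upward; the 1st
    digit is assumed to be the least significant bit.
    """
    bits = "".join(
        "1" if choice in selected else "0" for choice in reversed(choices_by_digit)
    )
    return int(bits, 2)


# Question 7 from Table 6-1, ordered from assignment 0 upward.
TESTED_CHOICES = ["I don't know", "No", "Yes"]

# Question 11 from Table 6-1, ordered from the 1st digit to the 7th digit.
TREATMENT_CHOICES = [
    "N/A",
    "No treatment",
    "Exercise",
    "Diet",
    "Insulin",
    "Tablets",
    "Ayurvedic or non-allopathic medicine",
]

print(integer_encode("Yes", TESTED_CHOICES))
print(binary_instance_encode({"Tablets", "Diet"}, TREATMENT_CHOICES))
```

Under these assumptions, a patient who selects "Tablets" (6th digit) and "Diet" (4th digit) for Question 11 produces the binary string 0101000, which converts to the base 10 value 40.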
Along with these questions, additional features were added to assess trends in patient
data—these features included a numerical value representing a patient’s body-mass index (BMI),
a patient’s label for hypertension severity, and a patient’s Indian Diabetes Risk Score (IDRS)
measurement. Each patient’s BMI was calculated using equation 6.1. The hypertension severity
label was determined using the American Heart Association’s (AHA’s) guidelines for the
detection, prevention, management, and treatment of high blood pressure[83]. Based on these
guidelines, patients were categorized into one of five hypertension severity levels: normal,
elevated, stage 1, stage 2, and hypertensive crisis. The IDRS is a measurement, created by the
Madras Diabetes Research Foundation (MDRF), designed to aid in the detection of undiagnosed
type 2 diabetes[84]. The IDRS calculation takes in patient features like age, biological sex, waist
circumference, physical activity level, and a patient’s potential family history of diabetes to
output a score which represents the patient’s risk of having or developing diabetic symptoms—
outputted scores are multiples of 10 ranging from 0 to 100. A research study conducted by
Dudeja and coauthors concluded that IDRS values correlate strongly with the presence of type 2
diabetic symptoms, and that the IDRS can be used as a non-invasive measure for diabetes
screening[84]. Numerical conversions for these derived features are shown in Table 6-2.
Question | Conversion Method
    Original Answer Choice | Assignment
1 BMI | None
    Decimal | Decimal
2 Hypertension Severity | Integer Encoding
    Hypertensive Crisis | 4
    Hypertension: Stage 2 | 3
    Hypertension: Stage 1 | 2
    Elevated | 1
    Normal | 0
3 IDRS (risk score) | None
    Integer | Integer
Table 6-2. Numerical conversions applied to features derived from Diabetes Questionnaire patient data.
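Two of the derived features can be sketched directly. The BMI function uses the standard definition (weight in kilograms divided by the square of height in meters); the units recorded by the questionnaire are an assumption here. The blood pressure cut-offs follow the published AHA category chart, mapped to the integer labels in Table 6-2, and the exact implementation used for the analysis may differ.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body-mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def hypertension_severity(systolic: int, diastolic: int) -> int:
    """Return the Table 6-2 integer label (0 = normal ... 4 = hypertensive crisis)."""
    if systolic > 180 or diastolic > 120:
        return 4  # hypertensive crisis
    if systolic >= 140 or diastolic >= 90:
        return 3  # stage 2
    if systolic >= 130 or diastolic >= 80:
        return 2  # stage 1
    if systolic >= 120:
        return 1  # elevated (diastolic is below 80 at this point)
    return 0  # normal


print(round(bmi(70.0, 1.75), 1))
print(hypertension_severity(132, 78))
```

Checking the most severe category first keeps each later branch simple, since any reading that reaches it has already failed the stricter conditions above.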
6.2 Heatmap Correlation Analysis
From Dudeja et al.’s study, it seems likely that patient IDRS values should correlate strongly
with patient random blood sugar (RBS) measurements, especially since abnormal blood sugar
levels are a strong indicator of diabetes manifestation and severity. With this in mind, Figure 6-1
was created to display the RBS-IDRS correlation, as well as reveal other patterns that can be
found throughout the Diabetes Questionnaire data. As mentioned before, any patterns/clusters
found in the Diabetes Questionnaire data could be leveraged for patient labeling. This figure was
created by taking all the features within the Diabetes Questionnaire, the features derived from the
Diabetes Questionnaire (BMI, hypertension severity, and IDRS), and each patient’s RBS
measurement and associated RBS measurement instrument (as an integer encoding). This
yielded a data table of 174 patients and 29 features. All data values were then scaled such that each
feature had unit variance (values ranging between 0 and 1), and the resulting data table was
plotted in the form of a heatmap.
Figure 6-1. Heatmap analysis of 29 patient features across 174 patients. (A) The heatmap is sorted by
‘RBS’ values in descending order (from top to bottom). (B) The heatmap is sorted by ‘Diabetes
Treatments’ values in descending order (from top to bottom).
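The preparation of the heatmap table can be sketched as a per-feature rescaling followed by a sort on one feature. The text describes scaled values lying between 0 and 1, which is what min-max scaling produces, so min-max scaling is assumed here; the function names and patient values are hypothetical.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]; constant columns map to 0."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


def scale_and_sort(rows, feature_names, sort_by):
    """Scale each feature column, then sort patients by one feature, descending."""
    columns = [min_max_scale([row[i] for row in rows]) for i in range(len(feature_names))]
    scaled_rows = [list(values) for values in zip(*columns)]
    key_index = feature_names.index(sort_by)
    return sorted(scaled_rows, key=lambda row: row[key_index], reverse=True)


# Three hypothetical patients with made-up values.
features = ["RBS", "IDRS"]
patients = [[110, 30], [250, 70], [180, 50]]
for row in scale_and_sort(patients, features, sort_by="RBS"):
    print(row)
```

The sorted table can then be passed to any heatmap plotting routine, with one row per patient and one column per feature.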
Figure 6-1A displayed the heatmap data sorted by each patient’s RBS value. Surprisingly
enough, the figure showed that there is little-to-no correlation between RBS and IDRS values
within the dataset. In fact, Figure 6-1A revealed that there are no observable correlations
between RBS and any of the other features in the data. Given that both RBS and IDRS are
verified methods of screening for diabetes, this discrepancy implied that some underlying patient
feature may be influencing other patient feature values in an unpredicted manner.
A trend in the data was eventually observed once patients were sorted by their diabetes
treatment information (Figure 6-1B). In Figure 6-1B, there is a correlation shown at the bottom
of the figure across specific patient features: ‘Tested for Diabetes’, ‘Diagnosed with Diabetes’,
‘Diabetes Type’, ‘Diagnosis History’, and ‘Diabetes Treatments’. Further inspection unearthed
the meaning behind this apparent correlation: these values would all be related for patients who
have never been tested for diabetes. To elaborate, these untested patients would have also not
been diagnosed, they would not have a diabetes type without a diagnosis, they would also not
have a diagnosis history, and finally they would have no reason to be taking diabetes
medications.
Nevertheless, further analyses were conducted on the patients who were neither taking
diabetes medications nor undergoing treatment. This is because the most common form of
medication taken among the patient population was discovered to be metformin (tablets), which
is designed to control blood sugar levels[85]. This discovery invalidates the methodology of using
RBS measurements as a metric for diabetes severity. However, patient RBS measurements could
still be used as a valid metric within the subset of patients that are not undergoing diabetes treatments.
With this in mind, a heatmap similar to the one presented in Figure 6-1A was created, but only
with the 24 patients who were not undergoing diabetes treatments (Figure 6-2).
Figure 6-2. Heatmap analysis of 29 patient features across 24 patients who have not undergone any
diabetes treatments. The heatmap is sorted by ‘RBS’ values in descending order (from top to bottom).
Figure 6-2 appears to display moderate correlations within the data, which is an
improvement from the results depicted in Figure 6-1A. It appears that blood pressure, waist and
hip circumference, and risk score generally increase as RBS increases. If the patient risk scores
correlate strongly with the patient RBS values among the 24-patient population, it’s possible that
the IDRS metric could be used as a proxy for classifying diabetes within the whole 174-patient
population (since RBS is not usable with medicated patients). The IDRS would
be usable because IDRS values are not significantly affected by patient medical treatments. An
IDRS value is computed from patient lifestyle-based questions rather than biomolecular
measurements; therefore, IDRS would make an excellent proxy for the whole dataset if
applicable. Figure 6-3 was created to examine the validity of using IDRS as a proxy. The figure
displays a scatter plot showing the relationship between RBS and IDRS within the 24-patient
population. Table 6-3 complements Figure 6-3 by presenting the correlation coefficients from
both Pearson and Spearman correlation metrics.
Figure 6-3. Scatterplot depicting the correlation between RBS and IDRS among the 24 untreated patients
in the 174-patient population.
Correlation Metric | Correlation Coefficient | P-value
Pearson Correlation | 0.436 | 0.033
Spearman Correlation | 0.538 | 0.007
Table 6-3. Pearson and Spearman correlation coefficients for the correlation between RBS and IDRS
among the 24 untreated patients in the 174-patient population. The coefficients from both the Pearson
and the Spearman metric imply that there is a moderate correlation between the two features. With a
significance threshold of α=0.05, both coefficients are statistically significant.
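For reference, the two correlation metrics can be sketched with the standard library alone. Pearson measures linear association on the raw values, while Spearman applies the same formula to ranks (with tied values sharing an average rank, which matters for IDRS because its scores are multiples of 10). The sample values below are made up and are not patient data; p-values are not computed here, although library routines such as `scipy.stats.pearsonr` and `scipy.stats.spearmanr` report them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up RBS and IDRS values for five hypothetical patients.
rbs = [95, 110, 140, 180, 400]
idrs = [20, 30, 30, 60, 70]
print(round(pearson(rbs, idrs), 3), round(spearman(rbs, idrs), 3))
```

Because Spearman depends only on rank order, it is less sensitive than Pearson to a single extreme RBS reading, which is one reason the two coefficients can differ on a small sample.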
The results shown in Figure 6-3 and Table 6-3 show that there is a moderate correlation
between the RBS values and IDRS values among the untreated population of patients, which
aligns with the initial assumption that IDRS is an effective diagnostic for diabetes severity. From
this analysis, it’s clear that the questionnaire is not fine-tuned to detect signals of diabetes in
medicated patients. So far, the IDRS is the only patient feature which may correlate strongly
with diabetic severity. To improve the Diabetes Questionnaire’s efficacy, further research must
be conducted on diabetic symptoms that are still detectable when undergoing each type of
diabetes treatment currently available—this will allow for more tailored questions which reveal
useful information across all patients. Additionally, collecting more data from patients
who are not undergoing treatments would improve the statistical power of the RBS-IDRS
correlation analyses presented in Figure 6-3 and Table 6-3.
Since the IDRS appears to correlate with diabetic severity, and the IDRS is derived
from other patient features within the Diabetes Questionnaire, there may be underlying patterns
in the patient data that cannot be detected via various ordered sorts. Chapter 7 will discuss
the extent to which machine learning frameworks are able to uncover these patterns and leverage
them for the prediction of different diabetic stages.
Chapter 7
Semi-Supervised Autoencoder for Patient Labeling
A major limitation of the current dataset for the diabetes screening mobile application is that
patients are not labeled for diabetic severity—this means they aren’t labeled for the
presence/absence of diabetic symptoms, nor are they labeled under the specific diabetic stages
discussed in Chapter 2. For any supervised machine learning model to function, the model must
be provided with ground truth labels for training (in terms of what the model will be predicting).
Since the end goal for the mobile application is to be able to output a label depicting the stage of
diabetes which a patient is in, there is interest in developing a method to label each measurement
with respect to specific diabetic stages.
The best method to label this medical data is to have it labeled by a physician who is an expert in diabetes.
However, with an increasing amount of patient data being collected for the mobile application,
the workload necessary to label this data manually is unreasonable. Nevertheless, there are
computational techniques that are able to utilize a small set of labels in order to classify/label
whole datasets of patients. This chapter will discuss how a semi-supervised autoencoder can
be employed for the task of patient labeling.
7.1 Motivation Behind the Semi-Supervised Autoencoder and Initial
Assumptions
A semi-supervised autoencoder was selected for the task because, while the model
aims to improve classification of the data, the model also attempts to maintain the data’s
representation. This both helps to prevent overfitting on the training data and ensures that
classification is done using meaningful features and values from the training data. Nevertheless,
the semi-supervised autoencoder will only produce meaningful results if three
major assumptions about the data remain true: the continuity assumption, the cluster assumption,
and the manifold assumption[86].
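One common way to express the balance between preserving the data’s representation and improving classification is a combined training objective. The form below is a generic sketch under stated assumptions, not the loss function defined for this model: the symbols, the use of squared error and cross-entropy, and the weighting term λ are all illustrative.

```latex
\mathcal{L}(\theta) \;=\;
\underbrace{\frac{1}{N}\sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert_2^2}_{\text{reconstruction, all patients}}
\;+\;
\lambda \,
\underbrace{\frac{1}{|\mathcal{S}|}\sum_{i \in \mathcal{S}} \mathrm{CE}\!\left(y_i, \hat{y}_i\right)}_{\text{classification, labeled patients only}}
```

Here x_i is a patient’s input feature vector, x̂_i is the decoder’s reconstruction, S is the subset of patients with labels, y_i and ŷ_i are the true and predicted class labels, CE is the cross-entropy, and λ controls how strongly the classification term competes with reconstruction.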
Firstly, the continuity assumption postulates that points which are close to each other are
more likely to share a label. This means that patients who share similar values with
respect to their input features are expected to be in similar diabetic stages. Secondly, the cluster
assumption postulates that datapoints tend to form discrete clusters, and points within the same
cluster are more likely to share a label. Like the continuity assumption, patients with similar
feature values are expected to be near each other. These patients would then form clusters,
ideally displaying the distinct diabetic stages represented within the patient population. Finally,
the manifold assumption postulates that distinctive feature information within the data lies on a
manifold of much lower dimension than the input space. This means that the patient data can be
encoded into a lower dimensional representation while still preserving important feature
distinctions. This would allow for the autoencoder to learn patterns based on these encoded
patient features, rather than learn on a noisy and complex input space.
7.2 Methods
7.2.1 Autoencoder Input Features
There are multiple sources of data from which the autoencoder can extract features, including
patient image data and questionnaire data; however, there are limitations to what data would
actually enhance the model. For instance, patient image data is very valuable, but important
features must be extracted from image data prior to model input in order to provide useful
information. While images are able to be included as direct inputs into an autoencoder, the
competing classification task and decoding task of the semi-supervised learning system would
make feature extraction less efficient. The Mobile Technology Group is currently developing
algorithms and models to effectively extract relevant features from the patient image data; these
outputs can then be incorporated into the autoencoder to improve the autoencoder’s performance.
Nevertheless, features related to image data—including thermal, iris, and retina images—will
have to be excluded for this initial analysis of the autoencoder.
The remaining patient data includes blood test measurements, visual acuity test
measurements, and patient questionnaire data, which would all be effective as direct inputs into
the autoencoder model in a general context. However, specific complications made certain
features unusable for this initial analysis. For instance, blood test measurements were excluded
because a majority of the patient population was undergoing diabetic treatments via
medication—as depicted in Chapter 6, medicated patients don’t have blood sugar levels that
correlate with diabetic symptoms. Since the dataset has more patients on medication than without
medication, the addition of blood test measurements would provide more noise for the model
than useful information.
Some measurement features were disregarded because a significant number of patients
had not yet taken the specified measurement tests. Without a high volume of patient data, all
analyses must be conducted using only complete patient feature vectors, because
missing values in the data cannot be imputed without an abundance of other patient data to
form a baseline. Therefore, only patients who had completed all the measurement tests
used as input features could be included in the autoencoder analysis; this limited the
number of measurement tests that could be used while maintaining a sufficient number of
patients.
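The complete-case restriction described above can be sketched as follows. This is only an illustration: the record layout and the feature names (`age`, `waist_cm`, `visual_acuity`) are hypothetical and are not taken from the project database.

```python
from typing import Dict, List, Optional

# Hypothetical patient records: None marks a measurement that was never taken.
Record = Dict[str, Optional[float]]


def complete_cases(records: List[Record], features: List[str]) -> List[Record]:
    """Keep only patients who have a value for every selected feature."""
    return [
        r for r in records
        if all(r.get(f) is not None for f in features)
    ]


patients: List[Record] = [
    {"age": 54.0, "waist_cm": 92.0, "visual_acuity": 0.8},
    {"age": 61.0, "waist_cm": None, "visual_acuity": 1.0},
    {"age": 47.0, "waist_cm": 88.0, "visual_acuity": None},
]

# Requiring more features shrinks the usable cohort.
questionnaire_only = complete_cases(patients, ["age"])
with_all_tests = complete_cases(patients, ["age", "waist_cm", "visual_acuity"])
```

In this toy cohort, adding two more required measurements reduces three usable patients to one, which is the trade-off between feature count and patient count discussed in this section.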
Ultimately, in order to maximize the number of patients and the number of patient
features included in this initial analysis, only the Diabetes Questionnaire data was used to create
the autoencoder input features. From the Diabetes Questionnaire, 25 essential features were
extracted for use in the analysis. The inclusion of any other measurement tests would not have
allowed for a multiclass prediction—this experimental setup will be described further in Sections
7.2.2 and 7.2.3. With the inclusion of more patient data, across more measurement tests in
general, a stronger and more robust analysis can be completed in the future.
7.2.2 Ground Truth Label Formation
Despite the autoencoder being designed for the task of labeling patients, the model would be
unable to train without a small sample of patient labels; this is necessary for the supervised
aspect of the semi-supervised autoencoder. A “small sample” of labels is a loose definition: at
least one labeled patient is necessary for every class that the model is attempting to
predict, and the accuracy of the model increases with the number of labels within each class.
As previously mentioned, none of the patients within the dataset are clinically labeled, so
a proxy must be used to conduct this analysis. As concluded in Chapter 6, patient IDRS values
make a fairly good proxy for labeling patients under different diabetic states. For this analysis,
IDRS values were used to label patients under three classes: Class 0 (non-diabetic), Class 1
(prediabetic), and Class 2 (diabetic). A three-class system was used rather than the ideal
six-class system because of the limitations of IDRS values: they provide no distinct
metric for classifying each of the unique diabetic stages, but they can be used to classify
patients into low risk, medium risk, and high risk groups. The low risk group corresponds to
Class 0, which spans risk scores in the range [0, 30); the medium risk group corresponds to Class
1, which spans risk scores in the range [30, 60); and the high risk group corresponds to Class 2,
which spans risk scores in the range [60, 100]. All patients were labeled under one of these
designated diabetic classes based on these criteria.
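The labeling rule can be written as a short function. The function name and the range check are illustrative additions; the thresholds are the ones stated above.

```python
def idrs_to_class(idrs: float) -> int:
    """Map an IDRS value to the three-class proxy label used in this section."""
    if not 0 <= idrs <= 100:
        raise ValueError(f"IDRS value out of range: {idrs}")
    if idrs < 30:
        return 0  # low risk, used as the non-diabetic proxy
    if idrs < 60:
        return 1  # medium risk, used as the prediabetic proxy
    return 2      # high risk, used as the diabetic proxy


# Boundary values fall into the upper class because the lower ranges are half-open.
labels = [idrs_to_class(score) for score in (0, 20, 30, 50, 60, 100)]
```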
7.2.3 Autoencoder Hyperparameters and Architecture
The autoencoder was designed as a five-layer feed-forward neural network, with two encoding
layers (an input layer and a hidden layer), two decoding layers (a hidden layer and an output
layer), and a softmax classification layer attached to the latent encoded layer (Figure 7-1). Given
that each patient has 25 features within their feature vector, the length of the input layer is 25
units; this also means that the length of the output layer is 25 units. Each hidden layer has a
length of 10 units. The encoded layer is designed to be 2 units in length in order to produce a
two-dimensional encoding for each patient. The dimensionality of the encoding does not
strictly need to be 2; however, this setup allows the dimensionality reduction analysis to be
conducted on a 2D Cartesian plane. The length of the classification layer of the autoencoder
depends on the number of classes that the autoencoder will be classifying patients into. There are
three diabetic classes in this preliminary analysis, so the classification layer is 3 units long.
Besides the classification layer, all layers within the autoencoder were activated by the sigmoid
activation function. The autoencoder was created using Python and the TensorFlow Python
library (version 2.1.0).
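To make the layer sizes concrete, the following is a minimal NumPy sketch of one forward pass through a network with these dimensions. It is not the thesis implementation, which was built in TensorFlow. The weights are random placeholders, all names are illustrative, and the use of a sigmoid output layer assumes input features scaled to [0, 1], which the text does not state.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    z = x - x.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dense(n_in, n_out):
    """Random placeholder weights and zero biases for one fully connected layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


N_FEATURES, N_HIDDEN, N_LATENT, N_CLASSES = 25, 10, 2, 3

W_enc, b_enc = dense(N_FEATURES, N_HIDDEN)   # encoder hidden layer
W_lat, b_lat = dense(N_HIDDEN, N_LATENT)     # 2-unit encoded layer
W_dec, b_dec = dense(N_LATENT, N_HIDDEN)     # decoder hidden layer
W_out, b_out = dense(N_HIDDEN, N_FEATURES)   # reconstruction output layer
W_cls, b_cls = dense(N_LATENT, N_CLASSES)    # softmax head on the encoding


def forward(x):
    h_enc = sigmoid(x @ W_enc + b_enc)
    z = sigmoid(h_enc @ W_lat + b_lat)        # 2D patient encoding
    h_dec = sigmoid(z @ W_dec + b_dec)
    x_hat = sigmoid(h_dec @ W_out + b_out)    # reconstructed features
    probs = softmax(z @ W_cls + b_cls)        # class probabilities
    return z, x_hat, probs


batch = rng.random((10, N_FEATURES))  # one mini-batch of 10 synthetic patients
z, x_hat, probs = forward(batch)
```

Both the decoder and the classifier read from the same 2-unit encoding, which is why the reconstruction and classification objectives compete for the same two dimensions.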
Figure 7-1. Semi-supervised autoencoder architecture.
The autoencoder was trained using both the mean absolute error and the categorical
cross-entropy loss functions, with both lasso (L1) and ridge (L2) regularization applied. The
mean absolute error loss function was used for the decoding task, while the categorical
cross-entropy loss function was used for the classification task. The model was trained using
mini-batch gradient descent for 2,000 epochs with a batch size of 10 patients.
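A sketch of how the two task losses and the two penalties could combine into one training objective is given below. The equal weighting of the task losses and the penalty coefficients `l1` and `l2` are assumptions made for illustration; the text does not report these values.

```python
import numpy as np


def mean_absolute_error(x, x_hat):
    """Reconstruction loss for the decoding task."""
    return float(np.mean(np.abs(x - x_hat)))


def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """Classification loss for the softmax head."""
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))


def l1_l2_penalty(weights, l1, l2):
    """Lasso (L1) plus ridge (L2) penalty over a list of weight matrices."""
    return float(sum(l1 * np.abs(w).sum() + l2 * np.square(w).sum() for w in weights))


def total_loss(x, x_hat, y_onehot, probs, weights, l1=1e-4, l2=1e-4):
    # Equal task weighting and the l1/l2 values are assumed, not reported.
    return (
        mean_absolute_error(x, x_hat)
        + categorical_cross_entropy(y_onehot, probs)
        + l1_l2_penalty(weights, l1, l2)
    )


# Tiny hand-checkable example with two patients and two features.
x = np.array([[0.0, 1.0], [1.0, 0.0]])
x_hat = np.array([[0.5, 0.5], [1.0, 0.0]])
y = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
probs = np.array([[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]])
loss = total_loss(x, x_hat, y, probs, weights=[np.ones((2, 2))])
```

With 106 training patients and a batch size of 10, one epoch would correspond to 11 weight updates if the final partial batch is kept.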
In total, there were 212 patients included in this analysis: 2 patients in Class 0, 48 patients
in Class 1, and 162 patients in Class 2. A 50/50 split was applied to the dataset to produce a
training dataset and a testing dataset for the autoencoder’s creation and evaluation, and the
division of the patient data is highlighted in Figure 7-2. The original dataset was randomly
shuffled prior to the 50/50 split, so each patient had an equal chance of being in either the
training or testing dataset.
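The shuffle-and-split step can be sketched as below. The function name and the seed are illustrative; the thesis does not state how the shuffle was seeded or implemented.

```python
import numpy as np


def shuffle_split(n_patients, seed=None):
    """Randomly shuffle patient indices and split them into two halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_patients)
    half = n_patients // 2
    return order[:half], order[half:]


train_idx, test_idx = shuffle_split(212, seed=0)
```

Because the split is not stratified by class, the two Class 0 patients land in different halves in only about half of all random shuffles (probability 106/211). A stratified split would be one way to guarantee that each half contains a Class 0 patient.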
Figure 7-2. Division of 212 patients into training and testing datasets via 50/50 split.
7.2.4 Dimensionality Reduction Analysis
As discussed in Section 7.1, many assumptions are made in the process of
developing the semi-supervised autoencoder. This dimensionality reduction analysis is designed
to confirm whether these assumptions hold true within the dataset of patients used for the
creation and evaluation of the autoencoder.
The encoded layer of the autoencoder provides the autoencoder’s representation of the
patient data in a two-dimensional space. If the assumptions hold true, and the design of the
autoencoder is effective, then the 2D patient representations produced by the autoencoder should
ideally produce three distinct clusters which correspond to the three unique ground truth labels. The
dimensionality reduction analysis determined whether the autoencoder’s representation matched
this expected result by performing a k-means clustering of the autoencoder’s 2D patient
representation. This clustering was then compared to an identical representation where cluster
assignments were based on ground truth labels; as previously mentioned, the clusters in both
representations should be identical under the correct conditions. Given that there are three unique
classes within the dataset, k-means clustering was completed using three clusters (k=3).
To also analyze the strength of the autoencoder’s 2D patient representations, the same
dimensionality reduction analysis was conducted on 2D patient representations using PCA (a
linear dimensionality reduction method) and t-SNE (a non-linear dimensionality reduction
method) as controls. The average silhouette coefficient of the ground truth clusters was used to
quantitatively evaluate the quality of each method’s dimensionality reduction. All of the analyses
were conducted using only the 106 patients in the testing dataset. The k-means
clustering, PCA, t-SNE, and silhouette coefficient operations were completed using the
scikit-learn Python library (version 0.21.3); the perplexity of t-SNE was set to 20.
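For readers unfamiliar with the metric, the average silhouette coefficient can be sketched in plain NumPy as shown below. This is a reference sketch using Euclidean distance, not the scikit-learn routine used in the analysis. For each point, `a` is the mean distance to the other points with the same label, `b` is the lowest mean distance to the points of any other label, and the score is (b - a) / max(a, b). Scoring a single-member cluster as 0 is the usual convention and is assumed here.

```python
import numpy as np


def average_silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distances."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    classes = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score stays at 0
        # a: mean distance to the other members of the point's own cluster
        a = dists[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dists[i, labels == c].mean() for c in classes if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Synthetic 2D points: two tight groups that are far apart.
pts = [[0, 0], [0, 1], [10, 0], [10, 1]]
separated = average_silhouette(pts, [0, 0, 1, 1])  # labels match the groups
mixed = average_silhouette(pts, [0, 1, 0, 1])      # labels cut across the groups
```

Labels that match the spatial groups give a score near 1, while labels that cut across the groups give a negative score. Values near or below zero, as reported for several representations in this chapter, therefore indicate that points are on average no closer to their own class than to another class.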
7.3 Results
7.3.1 Patient Labeling via Autoencoder
The results of the autoencoder’s classification task can be seen in Figure 7-3 and Figure 7-4,
which display the receiver operating characteristic (ROC) curves and the precision-recall curves of the
binarized multi-class autoencoder predictions, respectively. As shown in the legend of Figure
7-3, the areas under the ROC curves (AUROCs) for the binarized class predictions indicate
positive performance of the overall model. The AUROC of the binarized Class 0, however, is
misleading because there was only one patient in Class 0 within the testing dataset;
nevertheless, the model predicted that patient’s class correctly.
The legend of Figure 7-4 shows the areas under the precision-recall curves (AUPRs) for
the binarized class predictions. Performance is mixed among the different classes, but
there is an apparent trend: the more samples the autoencoder is trained on for a
given class, the higher the autoencoder’s positive predictive value is for that class. The exception
to this is the model’s apparent predictive power for Class 0. Again, the AUPR of the binarized
Class 0 is misleading because there was only one patient in Class 0 within the testing
dataset. The addition of more Class 0 patients in future iterations of the semi-supervised
autoencoder analysis will lead to more conclusive results.
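The sensitivity of a one-vs-rest AUROC to a single positive sample can be illustrated with a small sketch. The scores below are synthetic and the helper names are illustrative; the AUROC is computed from its rank interpretation (the probability that a positive sample is scored above a negative one) rather than with the library routine used for the figures.

```python
import numpy as np


def binarize(labels, positive_class):
    """One-vs-rest targets: 1 for the chosen class, 0 for every other class."""
    return (np.asarray(labels) == positive_class).astype(int)


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


# Synthetic example: a single Class 0 patient among five test patients.
labels = [0, 1, 2, 2, 2]
class0_scores = [0.6, 0.3, 0.2, 0.1, 0.4]  # hypothetical P(Class 0) per patient
y0 = binarize(labels, positive_class=0)
auc_top = auroc(y0, class0_scores)

# Ranking that one patient below a single negative lowers the AUROC by 1/4.
auc_shifted = auroc(y0, [0.35, 0.3, 0.2, 0.1, 0.4])
```

With one positive sample, the AUROC depends entirely on where that single patient is ranked, so a perfect value reflects one correct ranking rather than a stable estimate of class-level performance.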
7.3.2 Dimensionality Reduction via Autoencoder
The dimension-reduced patient features via autoencoder are shown in Figure 7-5. Upon visual
inspection, there are many differences between the clusters presented in the k-means clustering
of the data and the ground truth clusters. This could mean that either the autoencoder is unable to
properly separate the ground truth clusters due to its design, or the data is not easily separable
because one or more of the fundamental assumptions is untrue.
The dimension-reduced patient features via PCA and via t-SNE are shown in Figure 7-6.
These figures display a similar trend in that clusters presented in the k-means clustering do not
coincide with the ground truth clusters. Given that PCA and t-SNE are common and effective
methods to perform dimensionality reduction, these results suggest that the problem lies in the
data rather than the autoencoder.
The average silhouette coefficients of the original dataset and each dimension-reduced
dataset are displayed in Table 7-1. This table shows that the autoencoder clustered
the ground truth labels better than the original dataset and the control dimensionality
reduction methods; it was the only representation with a positive coefficient.
Figure 7-3. Receiver operating characteristic (ROC) curves of the binarized multi-class predictions of the
semi-supervised autoencoder.
Figure 7-4. Precision-recall curves of the binarized multi-class predictions of the semi-supervised
autoencoder.
Figure 7-5. Dimensionality reduction of 106 patient feature vectors via autoencoder. (A) K-means
clustering of the dimensionality reduction via autoencoder; k=3. (B) Ground truth clustering of the
dimensionality reduction via autoencoder.
Figure 7-6. Dimensionality reduction of 106 patient feature vectors via controls (PCA and t-SNE). The top
two plots display the k-means clustering of the dimensionality reduction (A) via PCA and (B) via t-SNE;
k=3. The bottom two plots display the ground truth clustering of the dimensionality reduction (C) via PCA
and (D) via t-SNE.
Dimensionality Reduction Method    Dimensionality of Data    Average Silhouette Coefficient
None                               25-dimensional            -0.0059
Semi-Supervised Autoencoder        2-dimensional              0.1864
PCA                                2-dimensional             -0.0798
t-SNE                              2-dimensional             -0.1043
Table 7-1. Average silhouette coefficients of the ground truth clusters within the original dataset and the
various dimension-reduced patient representations.
7.4 Discussion
The results of the autoencoder’s patient labeling analysis indicate that the model is well suited to
labeling patient data despite limited resources. The model shows promise in terms of handling the
multi-class labeling task; however, further analysis is necessary when significantly more patient
data under the other diabetic classes becomes available. The performance of the autoencoder’s
classification implies that there are patterns within the Diabetes Questionnaire data that the
autoencoder was able to leverage for patient labeling, despite the lack of patterns observed in the
correlation analyses conducted in Chapter 6. The addition of more data from other
measurement tests may improve the predictive power of the autoencoder in future analyses.
The results of the autoencoder’s dimensionality reduction analysis show that the
autoencoder is unable to separate the patient data into ground truth labels in the lower
dimensional space. However, all the dimensionality reduction methods seem to struggle in
executing this task, which hints that the underlying problem may be within the data. The results
imply that the data does not align with one or more of the assumptions presented in Section 7.1,
meaning that there is too much variation among the patient features for patients within the same
ground truth cluster. This conclusion aligns with conclusions derived in Chapter 6, as the
Diabetes Questionnaire dataset had so much variation that no clear patterns were observable.
The results of the autoencoder’s patient labeling analysis and the results of the
autoencoder’s dimensionality reduction analysis seemingly conflict, because the former implies
that there are patterns in the Diabetes Questionnaire data while the latter implies that there is
mainly noise in the Diabetes Questionnaire data. This contradiction can be explained by the fact
that the autoencoder’s classification task can be somewhat successful if the model is able to pick
up on at least one feature that is indicative of the ground truth labels. However, the autoencoder’s
encoding task is only successful if there are patterns across multiple features that are indicative
of the ground truth labels. This conclusion can also be derived from the autoencoder’s encoding
function, which appears to perceive that the data can be split using only a single dimension; this
is why the autoencoder outputs patient datapoints along a diagonal. The only reason the
autoencoder’s 2D encoding would display a 1D solution is if the autoencoder’s classifier
determined that only one feature in the dataset was indicative of ground truth labels.
Nevertheless, the inclusion of other patient measurements, especially the scores from
image-based and PPG-based predictive models, is likely to improve the performance of both of the
autoencoder’s tasks.
Chapter 8
Conclusion and Future Work
8.1 Contributions of Work
8.1.1 Exploration into the Biological Characteristics of Diabetes and Non-Invasive
Technologies to Detect Them
This thesis discussed the pathology and etiology of CMS with a focus on diabetes, enabling
guided research for the development of new tools, or the repurposing of current tools, that detect
diabetic symptoms. Some non-invasive detection tools were explored throughout this thesis as
well, highlighting the potential of non-invasive diabetes diagnostics in easing the burden on the
global health care system. The inclusion of these tools, and the creation of predictive models to
calibrate these tools for the diabetes classification task, would greatly improve the current
diabetes screening mobile application.
8.1.2 Image Quality Metrics for the Improvement of Image-Based Predictive Models
This thesis included a discussion about analyses performed on image quality metrics which can
be employed to screen for high quality patient images. Adding a screen as an additional
preprocessing step would ultimately improve the extraction of important image-based features,
and potentially improve the predictive power of the Mobile Technology Group’s image-based
predictive models.
8.1.3 Preliminary Semi-Supervised Autoencoder for Patient Labeling
This thesis included a discussion about the creation and analysis of a
semi-supervised autoencoder for the task of labeling patients within the Mobile Technology Group’s
patient database. The discussion included information regarding the complications of using blood
sugar measurements with medicated patients, and the need to use patient IDRS values as a proxy
on which to base the autoencoder’s ground truth labels. The classification aspect of the autoencoder
performed moderately well, with a class-average area under the receiver operating characteristic
curve (AUROC) of 0.845, and a class-average area under the precision-recall curve (AUPR) of 0.789.
The encoding function developed by the autoencoder was not effective in separating the patient
data into ground truth clusters; nevertheless, the autoencoder was still the most effective method
for clustering the ground truth labels with a silhouette coefficient of 0.1864. With the results of
this preliminary analysis, the autoencoder shows great promise in becoming an effective tool for
patient labeling.
8.2 Future Work
Analyses which explored batches of patient data (Chapter 6 and Chapter 7) would be greatly
improved in terms of statistical power with the addition of more patient data across more
measurement tests; this would enable more meaningful conclusions to be derived from these
analyses. In this vein, it would be beneficial to expand the Diabetes Questionnaire analysis to
other questionnaires once a significant number of patients have completed them, which
would help ensure that the analyses are robust. Similarly, it would be important in the future to
collect more data from patients who are not taking medication in order to conduct data
correlation analyses using blood test measurements. Blood tests are the gold standard
for measuring diabetes progression and enable a more accurate separation of
patients into different diabetic stages.
The accumulation of more patient data across different measurement tests, with a balance
of medicated and non-medicated patients among each measurement test, would also greatly
improve the predictive capability of the semi-supervised autoencoder: the autoencoder would be
able to explore more patient features and recognize stronger patterns in the data which are
indicative of each diabetic stage irrespective of the use of medication. The functionality of the
autoencoder would also be truly realized once a subset of patients is classified via direct
physician examination. The proxies used for the preliminary autoencoder defeat the purpose of
the autoencoder: if ground truth labels are created based on a single feature, there would be no
need for a model as long as the patient data for that single feature continues to be collected. The
use of blood sugar measurements for labeling would have been viable had the target patient
demographic solely been patients not on medication.
The image quality metrics would benefit from field testing once the predictive models for
each patient image type are finalized. These future analyses would examine whether
models trained using images screened by the quality metrics outperform both models trained
using only the excised images and models trained using all patient images. Future work would
also include creating quality metrics for photoplethysmography (PPG) measurements and
questionnaire data; questionnaire data would need a quality metric because questionnaires
can contain aleatoric error due to misinformation or incorrectly input values. Once quality
metrics are developed for
all the relevant measurement tests, it would be useful to develop and implement data correction
methods which improve data of poor quality—this would also increase the number of patients
available for use within various analyses.
Data collection for the mobile application can be improved by using additional
non-invasive technologies as measurement tests, many of which are discussed in Chapter 3. These
diagnostic tests explore different aspects of diabetes pathogenesis and etiology, enabling better
characterization of prediabetic and diabetic stages. The inclusion of measurements from these
additional non-invasive diabetes screening tools could greatly improve the autoencoder’s patient
labeling process, as well as the Bayesian network model developed by Shivani Chauhan, a prior
Master’s student of the Mobile Technology Group [6].
8.3 Larger Impact
The work presented in this thesis aimed to evaluate and enhance the aspects of data collection,
data processing, and model creation for the Mobile Technology Group’s non-invasive diabetes
screening mobile application. Nevertheless, many of the technologies and analyses presented
throughout this thesis can be scaled to other projects, both within and outside of the Mobile
Technology Group, in the pursuit of developing technologies for non-invasive diagnoses. A
higher goal is to leverage these tools to diagnose the progression of various
umbrella diseases, such as cardiometabolic syndrome (CMS), in order to aid a wider population
of individuals who are still unable to receive proper medical care. Ultimately, it is the hope of
our group that this work will galvanize further research into the intricacies of interconnected
conditions in order to bring innovation to biomedical research and technology.
Bibliography
[1] World health statistics. World Health Organization, World Health Organization.
www.who.int/gho/world-health-statistics.
[2] Loudenback Tanza. The average cost of healthcare in 21 different countries. Business
Insider, Business Insider. 2019 Mar 7. www.businessinsider.com/personal-finance/cost-of-
healthcare-countries-ranked-2019-3.
[3] Diabetes. World Health Organization, World Health Organization. 2018 Oct 30.
www.who.int/news-room/fact-sheets/detail/diabetes.
[4] Bommer C, et al. Global economic burden of diabetes in adults: projections from 2015 to
2030. Diabetes Care, American Diabetes Association. 2018 May.
https://care.diabetesjournals.org/content/41/5/963.
[5] Yu Tania. Iris imaging for health diagnostics. Master’s thesis, MIT. 2018.
[6] Chauhan Shivani. A mobile platform for non-invasive diabetes screening. Master’s thesis,
MIT. 2019.
[7] Malik Vasanti S, et al. Global obesity: trends, risk factors and policy implications. Nature
Reviews. Endocrinology, U.S. National Library of Medicine. 2013 Jan.
doi.org/10.1038/nrendo.2012.199.
[8] Castro Jonathan P, et al. Cardiometabolic syndrome: pathophysiology and treatment.
Current Hypertension Reports, U.S. National Library of Medicine. 2003 Oct.
doi.org/10.1007/s11906-003-0085-y.
[9] Saljoughian Manouchehr. Cardiometabolic syndrome: a global health issue. U.S.
Pharmacist – The Leading Journal in Pharmacy. 2017 Feb 16.
www.uspharmacist.com/article/cardiometabolic-syndrome-a-global-health-issue.
[10] What is cardiometabolic disease and how is it different from cardiovascular disease?
Health & Nutrition Letter – Your Guide to Living Healthier Longer. 2018 Mar.
www.nutritionletter.tufts.edu/issues/14_3/ask-experts/Q-What-is-cardiometabolic-disease-
and-how_2308-1.html.
[11] Holland Kimberly. 12 leading causes of death in the United States. Medically Reviewed by
Deborah Weatherspoon, Healthline, Healthline Media. 2018 Nov 1.
www.healthline.com/health/leading-causes-of-death.
[12] Statistics about diabetes. American Diabetes Association. 2018 Mar 22.
www.diabetes.org/diabetes-basics/statistics/.
[13] The Editors of Encyclopaedia Britannica. Islets of Langerhans. Encyclopædia Britannica,
Encyclopædia Britannica, Inc. 2018 July 11. www.britannica.com/science/islets-of-
Langerhans.
[14] Weir Gordon C, Bonner-Weir Susan. Five stages of evolving beta-cell dysfunction during
progression to diabetes. Diabetes, American Diabetes Association. 2004 Dec 1.
diabetes.diabetesjournals.org/content/53/suppl_3/S16.
[15] Tabák Adam G, et al. Prediabetes: a high-risk state for diabetes development. Lancet
(London, England), U.S. National Library of Medicine. 2012 June 16.
doi.org/10.1016/S0140-6736(12)60283-9.
[16] Ramachandran A. Know the signs and symptoms of diabetes. Indian J Med Res, 2014 Nov;
140(5):579‐581. www.ncbi.nlm.nih.gov/pmc/articles/PMC4311308/.
[17] Hyperinsulinemia. Diabetes.co.uk – The Global Diabetes Community. 2019 Jan 15.
www.diabetes.co.uk/hyperinsulinemia.html.
[18] Saini Vandana. Molecular mechanisms of insulin resistance in type 2 diabetes mellitus.
World Journal of Diabetes, Baishideng Publishing Group Co., Limited. 2010 July 15;
1(3):68‐75. doi.org/10.4239/wjd.v1.i3.68
[19] Mayo Clinic Staff. Prediabetes. Mayo Clinic, Mayo Foundation for Medical Education and
Research. 2017 Aug 2. www.mayoclinic.org/diseases-conditions/prediabetes/symptoms-
causes/syc-20355278.
[20] Sinha Sunil K. Hyperinsulinism workup: laboratory studies, imaging studies, other tests.
Edited by Stephen Kemp, Medscape. 2019 Feb 2. emedicine.medscape.com/article/921258-
workup.
[21] Thiruvengadam, J., Anburajan, M., Menaka, M., Venkatraman, B. Potential of thermal
imaging as a tool for prediction of cardiovascular disease. Journal of Medical Physics.
2014 Apr; 39(2):98‐105. doi.org/10.4103/0971-6203.131283.
[22] Dekker Martijn A M den, et al. Skin autofluorescence, a non-invasive marker for AGE
accumulation, is associated with the degree of atherosclerosis. PloS One, Public Library of
Science. 2013 Dec 23. doi.org/10.1371/journal.pone.0083084.
[23] Ferrante Anthony W. The immune cells in adipose tissue. Diabetes, Obesity & Metabolism,
U.S. National Library of Medicine. 2013 Sep; 15 Suppl 3(0 3):34‐38.
doi.org/10.1111/dom.12154.
[24] WebMD Medical Reference. Are diabetes and inflammation connected? Medically
Reviewed by Michael Dansinger, WebMD, WebMD. 2017 June 21.
www.webmd.com/diabetes/type-2-diabetes-guide/inflammation-and-diabetes#1.
[25] Shoelson Steven E, et al. Inflammation and insulin resistance. The Journal of Clinical
Investigation. 2006 July 3; 116(7):1793‐1801. doi.org/10.1172/JCI29069.
[26] Bornfeldt Karin E, Tabas Ira. Insulin resistance, hyperglycemia, and atherosclerosis. Cell
Metabolism. 2011 Nov 2; 14(5):575‐585. doi.org/10.1016/j.cmet.2011.07.015
[27] Atherosclerosis. National Heart Lung and Blood Institute, U.S. Department of Health and
Human Services, www.nhlbi.nih.gov/health-topics/atherosclerosis.
[28] University of Rochester Medical Center. How diabetes drives atherosclerosis.
ScienceDaily, ScienceDaily. 2008 Mar 17.
www.sciencedaily.com/releases/2008/03/080313124430.htm.
[29] What is type 1 diabetes? Joslin Diabetes Center, 2019.
www.joslin.org/info/what_is_type_1_diabetes.html.
[30] Cantley James, Ashcroft Frances M. Q&A: insulin secretion and type 2 diabetes: why do β-
cells fail? BMC Biology. 2015 May 16. doi.org/10.1186/s12915-015-0140-6
[31] Hess-Fischl Amy. Hyperglycemia: when your blood glucose level goes too high. Medically
Reviewed by Brigid Gregg, EndocrineWeb. 2018 Sep 7.
www.endocrineweb.com/conditions/hyperglycemia/hyperglycemia-when-your-blood-
glucose-level-goes-too-high.
[32] Diagnosing diabetes and learning about prediabetes. American Diabetes Association. 2016
Nov 21. www.diabetes.org/diabetes-basics/diagnosis/.
[33] Prediabetes. Wikipedia, Wikimedia Foundation. Accessed: 2019 Apr 10.
en.wikipedia.org/wiki/Prediabetes.
[34] Mayo Clinic Staff. Hyperglycemia in diabetes. Mayo Clinic, Mayo Foundation for Medical
Education and Research. 2018 Nov 3. www.mayoclinic.org/diseases-
conditions/hyperglycemia/symptoms-causes-syc-20373631.
[35] Aronson Doron, Rayfield Elliot J. How hyperglycemia promotes atherosclerosis: Molecular
Mechanisms. Cardiovascular Diabetology, BioMed Central. 2002 Apr 8.
cardiab.biomedcentral.com/articles/10.1186/1475-2840-1-1.
[36] Jezovnik Mateja K, Poredos Pavel. Oxidative stress and atherosclerosis. European Society
of Cardiology. 2007 Oct 9. www.escardio.org/Journals/E-Journal-of-Cardiology-
Practice/Volume-6/Oxidative-stress-and-atherosclerosis-Title-Oxidative-stress-and-
atheroscleros.
[37] Wisse Brent. Diabetic ketoacidosis. Medically Reviewed by David Zieve, MedlinePlus,
U.S. National Library of Medicine. 2018 Jan 16. medlineplus.gov/ency/article/000320.htm
[38] Galan Nicole. 9 early warning signs and symptoms of type 2 diabetes. Medically Reviewed
by Maria Prelipcean, Medical News Today, MediLexicon International. 2018 Sep 26.
www.medicalnewstoday.com/articles/323185.php.
[39] Mayo Clinic Staff. Diabetes. Mayo Clinic, Mayo Foundation for Medical Education and
Research. 2018 Aug 8. www.mayoclinic.org/diseases-conditions/diabetes/symptoms-
causes/syc-20371444.
[40] Mayo Clinic Staff. Heart disease. Mayo Clinic, Mayo Foundation for Medical Education
and Research. 2018 Mar 22. www.mayoclinic.org/diseases-conditions/heart-
disease/symptoms-causes/syc-20353118.
[41] Diabetes, gum disease, & other dental problems. National Institute of Diabetes and
Digestive and Kidney Diseases, U.S. Department of Health and Human Services. 2014 Sep
1. www.niddk.nih.gov/health-information/diabetes/overview/preventing-problems/gum-
disease-dental-problems.
[42] Mayo Clinic Staff. Type 2 diabetes. Mayo Clinic, Mayo Foundation for Medical Education
and Research. 2019 Jan 9. www.mayoclinic.org/diseases-conditions/type-2-
diabetes/diagnosis-treatment/drc-20351199.
[43] Al-Lawati J A. Diabetes mellitus: a local and global public health emergency! Oman Med
J. 2017; 32(3):177-179. doi:10.5001/omj.2017.34.
[44] WHO guidelines on hand hygiene in health care: first global patient safety challenge clean
care is safer care. Geneva: World Health Organization, The burden of health care-
associated infection. 2009; 3. Available from: www.ncbi.nlm.nih.gov/books/NBK144030/.
[45] Healthcare-acquired infections (HAIs). PatientCareLink. Massachusetts Health & Hospital
Association. Accessed: 2020 May 5. patientcarelink.org/improving-patient-care/healthcare-
acquired-infections-hais/.
[46] Ring E F J, Ammer K. Infrared thermal imaging in medicine. Institute of Physics and
Engineering in Medicine. 2012. doi.org/10.1088/0967-3334/33/3/R33.
[47] Ring Francis. Thermal Imaging Today and Its Relevance to Diabetes. Journal of Diabetes
Science and Technology. Diabetes Technology Society. 2010 July 1.
doi.org/10.1177/193229681000400414.
[48] Brånemark P, Fagerberg S, Langer L, et al. Infrared thermography in diabetes mellitus a
preliminary study. Diabetologia. 1967; 3:529-532. doi.org/10.1007/BF01213572.
[49] Fushimi H, Inoue T, Nishikawa M, Matsuyama Y, Kitagawa J. A new index of autonomic
neuropathy in diabetes mellitus: heat stimulated thermographic patterns. Diabetes Res.
Clin. Pract. 1985; 1(2):103-107. doi.org/10.1016/S0168-8227(85)80035-8.
[50] Fokkens Bernardina T, Smit Andries J. Skin fluorescence as a clinical tool for non-invasive
assessment of advanced glycation and long-term complications of diabetes. Glycoconjugate
Journal, Springer US. 2016 June 10. doi.org/10.1007/s10719-016-9683-1.
[51] Cahn F, Burd J, Ignotz K, Mishra S. Measurement of lens autofluorescence can distinguish
subjects with diabetes from those without. J Diabetes Sci Technol. 2014 Jan; 8(1):43-49.
doi.org/10.1177/1932296813516955.
[52] Steenbeke M, et al. UV fluorescence-based determination of urinary advanced glycation
end products in patients with chronic kidney disease. Diagnostics. 2020.
doi.org/10.3390/diagnostics10010034.
[53] Paolillo F R, Mattos V S, de Oliveira A O, Guimarães F E G, Bagnato V S, de Castro Neto
J C. Noninvasive assessments of skin glycated proteins by fluorescence and Raman
techniques in diabetics and nondiabetics. J Biophotonics. 2019 Jan; 12(1):e201800162.
doi.org/10.1002/jbio.201800162.
[54] Olson B P, Matter N I, Ediger M N, Hull E L, Maynard J D. Noninvasive skin fluorescence
spectroscopy is comparable to hemoglobin A1c and fasting plasma glucose for detection of
abnormal glucose tolerance. J Diabetes Sci Technol. 2013 July 1; 7(4):990-1000.
doi.org/10.1177/193229681300700422.
[55] Type 2 diabetes risk test. American Diabetes Association. www.diabetes.org/risk-test.
[56] Tentolouris Nicholas, et al. Screening for HbA1c-defined prediabetes and diabetes in an at-
risk Greek population: performance comparison of random capillary glucose, the ADA
diabetes risk test and skin fluorescence spectroscopy. Diabetes Research and Clinical
Practice. 2013 Jan 28.
www.sciencedirect.com/science/article/abs/pii/S016882271300003X?via%3Dihub.
[57] Goh J K, Cheung C Y, Sim S S, Tan P C, Tan G S, Wong T Y. Retinal imaging techniques
for diabetic retinopathy screening. J Diabetes Sci Technol. 2016 Feb 1; 10(2):282-294.
doi.org/10.1177/1932296816629491.
[58] Diabetic retinopathy. National Eye Institute. Accessed: 2020 May 6.
www.nei.nih.gov/learn-about-eye-health/eye-conditions-and-diseases/diabetic-retinopathy.
[59] Diabetic retinopathy. American Optometric Association. Accessed: 2020 May 6.
www.aoa.org/patients-and-public/eye-and-vision-problems/glossary-of-eye-and-vision-conditions/diabetic-retinopathy.
[60] Jensen Bernard, et al. Iridology simplified: an introduction to the science of iridology and
its relation to nutrition. Iridologists International. 1980.
[61] Ma Lin, Li Naimin. Texture feature extraction and classification for iris diagnosis.
SpringerLink, Lecture Notes in Computer Science, vol 4901. Springer, Berlin, Heidelberg,
2008 Jan 4. doi.org/10.1007/978-3-540-77413-6_22.
[62] Ernst E. Iridology: not useful and potentially harmful. Archives of Ophthalmology. 2000
Jan; 118(1):120-121. doi.org/10.1001/archopht.118.1.120.
[63] Hussein S E, Hassan O A, Granat M H. Assessment of the potential iridology for
diagnosing kidney disease using wavelet analysis and neural networks. Biomedical Signal
Processing and Control. 2013 Oct; 8(6):534-541. doi.org/10.1016/j.bspc.2013.04.006.
[64] Samant Piyush, Agarwal Ravinder. Comparative analysis of classification based algorithms
for diabetes diagnosis using iris images. Journal of Medical Engineering & Technology.
2018 Jan 4; 42(1):35-42. doi.org/10.1080/03091902.2018.1412521.
[65] Cutolo M, Sulli A, Secchi M E, Paolino S, Pizzorni C. Nailfold capillaroscopy is useful for
the diagnosis and follow-up of autoimmune rheumatic diseases. A future tool for the
analysis of microvascular heart involvement? Rheumatology (Oxford). 2006 Oct;
45 Suppl 4:iv43-6. doi.org/10.1093/rheumatology/kel310.
[66] Jackson William F. Microcirculation. Muscle. 2012 June 26; 2:1197-1206.
doi.org/10.1016/B978-0-12-381510-1.00089-2.
[67] Maldonado G, Guerrero R, Paredes C, Ríos C. Nailfold capillaroscopy in diabetes
mellitus. Microvascular Research. 2017 July 6; 112:41-46.
doi.org/10.1016/j.mvr.2017.03.001.
[68] Bakirci S, Celik E, Acikgoz S B, et al. The evaluation of nailfold videocapillaroscopy
findings in patients with type 2 diabetes with and without diabetic retinopathy. North Clin
Istanb. 2018 Oct 31; 6(2):146-150. doi.org/10.14744/nci.2018.02222.
[69] Suma K V, Manjunath Nitishi, Indira K, Rao Bheemsain. Segmentation of nailfold
capillary images for study of microcirculation in diabetes mellitus in Indian population.
Elsevier Publications. 2014 July.
www.researchgate.net/publication/318820243_Segmentation_of_Nailfold_Capillary_Images_for_Study_of_Microcirculation_in_Diabetes_Mellitus_in_Indian_Population.
[70] Davies Justine I, Struthers Allan D. Beyond blood pressure: pulse wave analysis – a better
way of assessing cardiovascular risk? Future Cardiology. 2005 Nov 24; 1(1):69-78.
doi.org/10.1517/14796678.1.1.69.
[71] Gajdova J, Karasek D, Goldmannova D, Krystynik O, Schovanek J, Vaverkova H, Zadrazil
J. Pulse wave analysis and diabetes mellitus. A systematic review. Biomed Pap Med Fac
Univ Palacky Olomouc Czech Repub. 2017 Sep 26; 161(3):223-233.
doi.org/10.5507/bp.2017.028.
[72] O'Rourke M F, Pauca A, Jiang X J. Pulse wave analysis. Br J Clin Pharmacol. 2001 June;
51(6):507-522. doi.org/10.1046/j.0306-5251.2001.01400.x.
[73] Mikael Luana de Rezende, et al. Vascular aging and arterial stiffness. Arquivos Brasileiros
de Cardiologia. 2017 June 29; 109(3):1678-4170. doi.org/10.5935/abc.20170091.
[74] Zhang M, Bai Y, Ye P, Luo L, Xiao W, Wu H, Liu D. Type 2 diabetes is associated with
increased pulse wave velocity measured at different sites of the arterial system but not
augmentation index in a Chinese population. Clin Cardiol. 2011 Oct 12; 34(10):622-7.
doi.org/10.1002/clc.20956.
[75] Lacy P S, O'Brien D G, Stanley A G, et al. Increased pulse wave velocity is not associated
with elevated augmentation index in patients with diabetes. J Hypertens. 2004
Oct; 22:1937-1944. doi.org/10.1097/00004872-200410000-00016.
[76] Breath Testing. Johns Hopkins Division of Gastroenterology and Hepatology. Accessed:
2020 May 5.
www.hopkinsmedicine.org/gastroenterology_hepatology/clinical_services/specialty_services/breath_testing.html.
[77] Minh T D C, Blake D R, Galassetti P R. The clinical potential of exhaled breath analysis for
diabetes mellitus. Diabetes Res Clin Pract. 2012 Mar 10; 97(2):195-205.
doi.org/10.1016/j.diabres.2012.02.006.
[78] Zhang D, Guo D, Yan K. A breath analysis system for diabetes screening and blood
glucose level prediction. Breath Analysis for Medical Applications. 2017 June 24;
1(1):259-279. link.springer.com/chapter/10.1007/978-981-10-4322-2_14.
[79] Wang C, Mbi A, Shepherd M. A study on breath acetone in diabetic patients using a cavity
ringdown breath analyzer: exploring correlations of breath acetone with blood glucose and
glycohemoglobin A1C. IEEE Sensors Journal. 2010 Jan; 10(1):54-63.
doi.org/10.1109/JSEN.2009.2035730.
[80] Tanda N, Hinokio Y, Washio J, Takahashi N, Koseki T. Breath acetone in type 1 and type
2 diabetes mellitus. In: Sasaki K, Suzuki O, Takahashi N. (eds) Interface Oral Health
Science. Springer, Tokyo. 2012; 1:212-214. doi.org/10.1007/978-4-431-54070-0_59.
[81] Fast Fourier transform. Wikipedia, Wikimedia Foundation. Accessed: 2020 May 10.
en.wikipedia.org/wiki/Fast_Fourier_transform.
[82] Discrete Laplace operator. Wikipedia, Wikimedia Foundation. Accessed: 2020 May 10.
en.wikipedia.org/wiki/Discrete_Laplace_operator.
[83] New ACC/AHA high blood pressure guidelines lower definition of hypertension.
American College of Cardiology. 2017 Nov 13.
www.acc.org/latest-in-cardiology/articles/2017/11/08/11/47/mon-5pm-bp-guideline-aha-2017.
[84] Dudeja P, Singh G, Gadekar T, Mukherji S. Performance of Indian Diabetes Risk Score
(IDRS) as screening tool for diabetes in an urban slum. Med J Armed Forces India. 2017
Apr; 73(2):123-128. doi.org/10.1016/j.mjafi.2016.08.007.
[85] Metformin: side effects, dosage & uses. Medically Reviewed by Sanjai Sinha, Drugs.com.
Accessed: 2020 May 13. www.drugs.com/metformin.html.
[86] Semi-supervised learning. Wikipedia, Wikimedia Foundation. Accessed: 2020 May 12.
en.wikipedia.org/wiki/Semi-supervised_learning.