Prof. PK.Ragunath
Sri Ramachandra University
Department of Bioinformatics
INTRODUCTION
Chronic obstructive pulmonary disease (COPD) and asthma are the
most frequent causes of respiratory ill health.
They affect all ages, and several cases of comorbidity between the
two conditions have been reported.
Asthma and COPD are different diseases, each with a unique natural
history and pathophysiology.
Differentiating the underlying cause of their symptoms is difficult
and often leads to generalized treatment protocols.
ASTHMA
Asthma is characterized by airflow obstruction.
300 million people of all ages and all ethnic backgrounds worldwide have asthma. Globally, 250,000 people die of asthma every year (WHO, 2013-14).
India has an estimated 15-20 million asthmatics (ICMR, 2012).
It is a condition caused by inflammation of the air passages, which obstructs the flow of air in and out of the lungs. The inflammation also affects the sensitivity of the nerve endings in the airways, so they become easily irritated.
It is characterized by recurrent attacks of breathlessness and wheezing, which vary in severity and frequency.
According to the National Asthma Education and Prevention Program (NAEPP) and the Global Initiative for Asthma, asthma is additionally typified by variable and recurring symptoms, bronchial hyperresponsiveness and underlying inflammation of the airways.
COPD
Chronic obstructive pulmonary disease (COPD) is a “preventable and treatable disease
with some significant extrapulmonary effects that may contribute to the severity in
individual patients.” - American Thoracic Society (ATS)
Its pulmonary component is characterized by airflow limitation that is not fully
reversible.
The airflow limitation is usually progressive and associated with an abnormal inflammatory
response of the lung to noxious particles or gases.
COPD affects 210 million people (WHO, 2013-14).
Conditions that contribute to COPD:
Mucous hypersecretion with enlargement of tracheo-bronchial submucosal glands and a disproportionate increase of mucous acini.
Inflammation of bronchioli, mucous metaplasia and hyperplasia, with increased intraluminal mucus, increased wall muscle, fibrosis and airway stenoses.
Respiratory bronchiolitis is a critically important early lesion which may predispose to the development of centrilobular emphysema.
The severity of destruction of the alveolar wall in emphysema appears to be the most important determinant of chronic deterioration of airflow.
Asthma and chronic obstructive pulmonary disease (COPD) are complex conditions with imprecise definitions.
Definitive morphological comparisons are difficult.
Broadly, the airways in asthma are occluded by tenacious plugs of
exudate and mucus.
There is fragility of the airway surface epithelium, thickening of the reticular layer
beneath the epithelial basal lamina, and bronchial vessel congestion and
edema.
There is an increased inflammatory infiltrate comprising ‘activated’ lymphocytes
and eosinophils, with release of granular content in the latter.
There is enlargement of bronchial smooth muscle, particularly in
medium-sized bronchi.
METABOLOMICS
Metabolomics involves quantitative measurement of the time-related multiparametric
metabolic response of living systems to pathophysiological stimuli or
changes in gene expression profile (Daviss, Bennett, April 2005).
Comprehensive and simultaneous systematic determination of metabolite levels
in the metabolome and their changes over time as a consequence of stimuli.
Steps in analysing metabolomics data:
Efficient and unbiased separation of analytes
Detection
Identification and quantification
LINEAR REGRESSION
In correlation, the two variables are treated as equals. In
regression, one variable is considered the independent (predictor)
variable X and the other the dependent (outcome) variable Y.
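The asymmetry between predictor and outcome can be made concrete with a minimal least-squares sketch. The function name `fit_line` and the numbers are illustrative assumptions, not data or software from this study; the example only shows how a slope and intercept for Y given X are obtained.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative values only (hypothetical, not from the study).
x = [1.0, 2.0, 3.0, 4.0]   # independent (predictor) variable X
y = [3.0, 5.0, 7.0, 9.0]   # dependent (outcome) variable Y
slope, intercept = fit_line(x, y)
print(f"Y = {intercept:.2f} + {slope:.2f} * X")
```

Swapping the roles of `x` and `y` fits a different model (X predicted from Y), which is the sense in which regression, unlike correlation, does not treat the two variables as equals.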
PRINCIPAL COMPONENT ANALYSIS
Multivariate analysis based on projection methods
Main tool used in chemometrics
Extracts and displays the systematic variation in the data
Each Principal Component (PC) is a linear combination of
the original data parameters
Each successive PC explains the maximum amount of
variance possible, not accounted for by the previous PCs
PCs are orthogonal to each other
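The properties listed above (linear combinations, decreasing variance, orthogonality) can be checked on a toy example. This is a minimal sketch using power iteration with deflation on the covariance matrix; the helper names and the two-column data set are assumptions made for illustration, and a statistics package would normally be used instead.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    p = len(matrix)
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, v


def principal_components(rows, k):
    """First k (variance, loading vector) pairs, largest variance first."""
    cov = covariance_matrix(rows)
    p = len(cov)
    components = []
    for _ in range(k):
        eigenvalue, v = leading_eigenpair(cov)
        components.append((eigenvalue, v))
        # Deflation: remove the variance already explained by this PC.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return components


# Hypothetical two-descriptor data set, for illustration only.
data = [[2.0, 1.9], [0.5, 0.7], [1.0, 1.1], [3.0, 2.8], [1.5, 1.4]]
(var1, pc1), (var2, pc2) = principal_components(data, 2)
print("variance explained by PC1, PC2:", var1, var2)
print("PC1 . PC2 =", sum(a * b for a, b in zip(pc1, pc2)))
```

Each loading vector gives the weights of the linear combination of the original columns, the variances come out in decreasing order, and the dot product of the two loading vectors is numerically zero.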
MACHINE LEARNING - MULTILAYER PERCEPTRON
The Multilayer Perceptron is a type of machine learning model
Used extensively for the solution of a number of different problems, such as
pattern recognition and interpolation
The basic Multilayer Perceptron learning algorithm is outlined below.
1. Initialize the network, with all weights set to random numbers between -1
and +1.
2. Present the first training pattern, and obtain the output.
3. Compare the network output with the target output.
4. Propagate the error backwards.
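The four steps above can be sketched for a network with one hidden layer. This is an illustrative sketch, not the software used in the study: the sigmoid activation, the learning rate, the hidden layer size, the toy logical-OR patterns, and the weight update with repetition over patterns (the usual continuation of step 4, not listed above) are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, rng):
    # Step 1: all weights (last entry of each list is a bias) in [-1, +1].
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(net, x):
    hidden, output = net
    h = [sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in hidden]
    y = sigmoid(output[-1] + sum(wi * hi for wi, hi in zip(output, h)))
    return h, y


def train_pattern(net, x, target, rate=0.5):
    hidden, output = net
    h, y = forward(net, x)              # Step 2: present pattern, get output
    error = target - y                  # Step 3: compare with target output
    # Step 4: propagate the error backwards (sigmoid derivative is y*(1-y)).
    delta_out = error * y * (1 - y)
    delta_hidden = [delta_out * output[j] * h[j] * (1 - h[j])
                    for j in range(len(h))]
    # Assumed continuation: adjust weights along the error gradient.
    for j in range(len(h)):
        output[j] += rate * delta_out * h[j]
    output[-1] += rate * delta_out
    for j, w in enumerate(hidden):
        for i in range(len(x)):
            w[i] += rate * delta_hidden[j] * x[i]
        w[-1] += rate * delta_hidden[j]


def total_error(net, patterns):
    return sum((t - forward(net, x)[1]) ** 2 for x, t in patterns)


# Toy training patterns (logical OR), purely for illustration.
patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
net = init_network(2, 3, random.Random(0))
error_before = total_error(net, patterns)
for _ in range(2000):                   # repeat over all training patterns
    for x, t in patterns:
        train_pattern(net, x, t)
error_after = total_error(net, patterns)
print("squared error before and after training:", error_before, error_after)
```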
AIM & OBJECTIVES OF THE STUDY
To build a machine learning based model to classify metabolites associated with Asthma and COPD.
Objectives:
To list the metabolites associated with Asthma and COPD
To generate molecular descriptors for all the chosen metabolites
To perform feature selection using Linear Regression to identify the best descriptors
To perform feature extraction using PCA to generate Component Matrices
To build Multilayer Perceptron models that classify metabolites associated with Asthma and COPD, one from the Linear Regression input and one from the PCA input, and to compare the efficiency of the two models.
WORK FLOW
Text-mine to identify metabolites associated with COPD & Asthma
Generate molecular descriptors for all the chosen metabolites using the Descriptor Calculation Wizard of Molegro Virtual Docker
Feature Selection using Linear Regression to identify best descriptors
Feature Extraction using PCA to generate Component Matrices
Generate a model to classify metabolites associated with Asthma & COPD by employing machine learning with a Multilayer Perceptron
TEXT MINING
Condition   Decreased   Increased   Total
Asthma      21          5           26
COPD        18          28          46
Common      0           3           3
Total       39          36          75
A comprehensive literature mining of all eligible studies on IDC gene
expression was carried out by searching PubMed (as of March
2015) using the following key terms:
M AND ((“C” OR “c”) AND (H OR h))
M AND (“A” AND (H OR h))
where
M = Gene expression; C = Chronic Obstructive Pulmonary Disease; c = COPD; A = Asthma; H = Homo sapiens; h = human
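As an illustration of how the two key-term templates expand, the sketch below substitutes the definitions of M, C, c, A, H and h into each template. The function names are hypothetical, and the resulting strings show only the expansion, not the exact text submitted to PubMed.

```python
TERMS = {
    "M": "Gene expression",
    "C": "Chronic Obstructive Pulmonary Disease",
    "c": "COPD",
    "A": "Asthma",
    "H": "Homo sapiens",
    "h": "human",
}


def copd_query(t):
    # Template: M AND (("C" OR "c") AND (H OR h))
    return f'{t["M"]} AND (("{t["C"]}" OR "{t["c"]}") AND ({t["H"]} OR {t["h"]}))'


def asthma_query(t):
    # Template: M AND ("A" AND (H OR h))
    return f'{t["M"]} AND ("{t["A"]}" AND ({t["H"]} OR {t["h"]}))'


print(copd_query(TERMS))
print(asthma_query(TERMS))
```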
S.No  Descriptor  Category                          Details
1     Csp2        Simple Descriptors                Number of sp2-hybridized carbon atoms in ligand
2     HD-HD-Min   Chemical Feature Distance Matrix  Hydrogen Donor - Hydrogen Donor - Minimum Distance
3     R-R-Min     Chemical Feature Distance Matrix  Ring - Ring - Minimum Distance
4     Aro         Simple Descriptors                Number of aromatic groups
5     OH          Simple Descriptors                Number of hydroxyl groups
6     HD-R-Mean   Chemical Feature Distance Matrix  Hydrogen Donor - Ring - Mean Distance
The CFDM descriptors are obtained by calculating the minimum, maximum, and mean topological distance between all pairs of chemical features. The topological distance is defined as the smallest number of covalent bonds between the two features.
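The definition of topological distance can be illustrated with a breadth-first search over the bond graph. This is a sketch under simplifying assumptions: each chemical feature is represented by a single atom index, the molecule is a hypothetical six-atom graph, and the function names are invented for the example. It is not the Molegro Virtual Docker implementation.

```python
from collections import deque


def bond_distances(bonds, start):
    """Smallest number of covalent bonds from `start` to every reachable atom."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency.get(atom, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def cfdm(bonds, features_a, features_b):
    """Minimum, maximum and mean topological distance between feature pairs."""
    distances = []
    for a in features_a:
        dist = bond_distances(bonds, a)
        distances += [dist[b] for b in features_b if b != a and b in dist]
    return min(distances), max(distances), sum(distances) / len(distances)


# Hypothetical molecule: chain 0-1-2-3-4 with a branch 2-5.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]
donors = [0, 4, 5]   # assumed positions of hydrogen-donor features
hd_hd_min, hd_hd_max, hd_hd_mean = cfdm(bonds, donors, donors)
print(hd_hd_min, hd_hd_max, hd_hd_mean)
```

When both feature lists are the same, each pair is visited in both directions, which leaves the minimum, maximum and mean unchanged.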
Feature Selection Using Linear Regression To Identify Best Descriptors
SCREE PLOT DEPICTING PRINCIPAL COMPONENTS
Only the top 7 principal components, which had
an eigenvalue > 1, were selected
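The eigenvalue > 1 rule can be sketched as a simple filter over the eigenvalues read off a scree plot. The eigenvalues below are hypothetical (five standardized variables, so they sum to 5); the study's own eigenvalues are not listed here, so this shows the selection rule only.

```python
def select_components(eigenvalues, threshold=1.0):
    """Keep components whose eigenvalue exceeds the threshold.

    Returns (component number, eigenvalue, share of total variance) tuples.
    """
    total = sum(eigenvalues)
    return [(i + 1, ev, ev / total)
            for i, ev in enumerate(eigenvalues) if ev > threshold]


# Hypothetical eigenvalues, for illustration only.
eigenvalues = [2.4, 1.3, 0.7, 0.4, 0.2]
kept = select_components(eigenvalues)
explained = sum(share for _, _, share in kept)
for number, ev, share in kept:
    print(f"PC{number}: eigenvalue {ev}, {100 * share:.1f}% of variance")
print(f"retained components explain {100 * explained:.1f}% of the variance")
```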
SCATTER PLOT OF PC1 VS PC2
Multi Layer Perceptron model to classify metabolites associated with Asthma and COPD based on features selected by Linear Regression
Case Processing Summary
                      N     Percent
Sample    Training    51    70.8%
          Testing     21    29.2%
Valid                 72    100.0%
Excluded              3
Total                 75

Classification
                                Predicted
Sample     Observed           .00      1.00     Percent Correct
Training   .00                15       6        71.4%
           1.00               1        29       96.7%
           Overall Percent    31.4%    68.6%    86.3%
Testing    .00                3        2        60.0%
           1.00               1        15       93.8%
           Overall Percent    19.0%    81.0%    85.7%
Multi Layer Perceptron model to classify metabolites associated with Asthma and COPD based on Feature Extraction by PCA
Case Processing Summary
                      N     Percent
Sample    Training    51    70.8%
          Testing     21    29.2%
Valid                 72    100.0%
Excluded              3
Total                 75

Classification
                                Predicted
Sample     Observed           0        1        Percent Correct
Training   0                  16       5        76.2%
           1                  0        30       100.0%
           Overall Percent    31.4%    68.6%    90.2%
Testing    0                  4        1        80.0%
           1                  1        15       93.8%
           Overall Percent    23.8%    76.2%    90.5%
Dependent Variable: Grouping
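The percentages in the two classification tables can be re-derived from the reported counts. The sketch below does that arithmetic; the counts come from the tables, while the function and key names are illustrative, and the groups are kept as 0 and 1 because the coding of the Grouping variable is not stated.

```python
def summarize(matrix):
    """Percent correct per observed group and overall, from a 2x2 table.

    matrix[i][j] = number of cases observed in group i and predicted as group j.
    """
    (c00, c01), (c10, c11) = matrix
    total = c00 + c01 + c10 + c11
    return {
        "group_0_correct": round(100 * c00 / (c00 + c01), 1),
        "group_1_correct": round(100 * c11 / (c10 + c11), 1),
        "overall_correct": round(100 * (c00 + c11) / total, 1),
    }


# Counts as reported in the classification tables.
results = {
    ("linear regression", "training"): [[15, 6], [1, 29]],
    ("linear regression", "testing"): [[3, 2], [1, 15]],
    ("PCA", "training"): [[16, 5], [0, 30]],
    ("PCA", "testing"): [[4, 1], [1, 15]],
}

summary = {key: summarize(matrix) for key, matrix in results.items()}
for (model, sample), stats in summary.items():
    print(model, sample, stats)
```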
Figure: ROC plot of sensitivity versus specificity
Insights From The Result
The Multi Layer Perceptron model to classify metabolites associated with
Asthma and COPD based on Feature Extraction by PCA showed a
greater efficiency (>90% correct in both training and testing) than the model based on
feature selection by linear regression, which showed an efficiency of
~86%.
Scope: Similar metabolite classifying models can be built using a Radial
Basis Function network, and their efficiency can be compared with the current
model.
A model for classifying metabolites associated with comorbid conditions
can be attempted in future.
Acknowledgement
Our sincere thanks to Molegro Virtual Docker (MVD) for
providing a limited-period trial version with which the molecular
descriptors were calculated.
We thank Dr. Baljit Ubhi, Ph.D., Global Technical Marketing
Manager, Metabolomics & Lipidomics, for providing critical insights
on applications of metabolomics in COPD, without which this study
would have been impossible.
My Research Team
Sri Ramachandra University
Thank You