Prof. PK.Ragunath Sri Ramachandra University Department of Bioinformatics.

Date post: 13-Jan-2016

INTRODUCTION

Chronic obstructive pulmonary disease (COPD) and asthma are the most frequent causes of respiratory ill health.

They affect people of all ages, and several cases of comorbidity between the two conditions have been reported.

Asthma and COPD are different diseases, each with a unique natural history and pathophysiology.

Differentiating the underlying cause of their symptoms is difficult and often leads to generalized treatment protocols.

Page 3: Prof. PK.Ragunath Sri Ramachandra University Department of Bioinformatics.

ASTHMA

Asthma is characterized by airflow obstruction.

An estimated 300 million people of all ages and ethnic backgrounds worldwide have asthma. Globally, 250,000 people die of asthma every year (WHO, 2013-14).

India has an estimated 15-20 million asthmatics (ICMR, 2012).

Asthma is a condition caused by inflammation of the air passages, which obstructs the flow of air into and out of the lungs. It also affects the sensitivity of the nerve endings in the airways, so that they become easily irritated.

It is characterized by recurrent attacks of breathlessness and wheezing, which vary in severity and frequency.

According to the National Asthma Education and Prevention Program (NAEPP) and the Global Initiative for Asthma (GINA), asthma is additionally typified by variable and recurring symptoms, bronchial hyperresponsiveness, and underlying inflammation of the airways.

COPD

Chronic obstructive pulmonary disease (COPD) is “a preventable and treatable disease with some significant extrapulmonary effects that may contribute to the severity in individual patients” (American Thoracic Society, ATS).

Its pulmonary component is characterized by airflow limitation that is not fully reversible.

The airflow limitation is usually progressive and associated with an abnormal inflammatory response of the lung to noxious particles or gases.

COPD affects 210 million people (WHO, 2013-14).

Conditions that contribute to COPD: mucus hypersecretion with enlargement of the tracheobronchial submucosal glands and a disproportionate increase of mucous acini; inflammation of the bronchioles, with mucous metaplasia and hyperplasia, increased intraluminal mucus, increased wall muscle, fibrosis and airway stenoses.

Respiratory bronchiolitis is a critically important early lesion that may predispose to the development of centrilobular emphysema. The severity of alveolar wall destruction in emphysema appears to be the most important determinant of the chronic deterioration of airflow.

Asthma and chronic obstructive pulmonary disease (COPD) are complex conditions with imprecise definitions, which makes definitive morphological comparisons difficult.

Broadly, the airways in asthma are occluded by tenacious plugs of exudate and mucus.

There is fragility of the airway surface epithelium, thickening of the reticular layer beneath the epithelial basal lamina, and bronchial vessel congestion and edema.

There is an increased inflammatory infiltrate comprising ‘activated’ lymphocytes and eosinophils, with release of granular content by the latter.

There is enlargement of bronchial smooth muscle, particularly in medium-sized bronchi.

METABOLOMICS

Metabolomics involves the quantitative measurement of the time-related, multiparametric metabolic response of living systems to pathophysiological stimuli or to changes in gene expression profile (Daviss, Bennett, April 2005).

It is the comprehensive, simultaneous and systematic determination of metabolite levels in the metabolome and of their changes over time as a consequence of stimuli.

Steps in metabolomic analysis:

1. Efficient and unbiased separation of analytes
2. Detection
3. Identification and quantification

LINEAR REGRESSION

In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (predictor) variable, X, and the other the dependent (outcome) variable, Y.
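To make the distinction concrete, here is a minimal Python sketch on hypothetical toy data (not the study's descriptors), assuming NumPy is available. Pearson's correlation gives the same value whichever variable comes first, whereas the least-squares slope of Y on X is not the reciprocal of the slope of X on Y; the product of the two slopes equals r².

```python
import numpy as np

def ols_fit(x, y):
    """Least-squares line y = b0 + b1 * x, with x as predictor and y as outcome."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

def pearson_r(x, y):
    """Pearson correlation coefficient; symmetric in its two arguments."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical toy data, not taken from the study.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
_, slope_y_on_x = ols_fit(x, y)   # y treated as the outcome
_, slope_x_on_y = ols_fit(y, x)   # roles swapped: a different line

print(r, pearson_r(y, x))                   # same value both ways
print(slope_y_on_x, slope_x_on_y)           # the two slopes differ
print(slope_y_on_x * slope_x_on_y, r ** 2)  # their product equals r squared
```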

PRINCIPAL COMPONENT ANALYSIS

A multivariate analysis based on projection methods

The main tool used in chemometrics

Extracts and displays the systematic variation in the data

Each principal component (PC) is a linear combination of the original variables

Each successive PC explains the maximum amount of variance possible that is not accounted for by the previous PCs

The PCs are orthogonal to each other
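A minimal sketch of these properties in Python with NumPy, using a hypothetical random data matrix rather than the study's descriptors. It computes PCA through a singular value decomposition of the mean-centred data; the loading vectors come out orthonormal and the explained variances come out in non-increasing order. This is one standard way to compute PCA, not necessarily what the study's software does internally.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centred data matrix (rows = samples, columns = variables).

    Returns scores, loadings and the variance explained by each retained PC.
    Column k of `loadings` holds the weights that make PC k a linear
    combination of the original variables.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre every variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T               # shape (n_variables, n_components)
    scores = Xc @ loadings                       # samples projected onto the PCs
    explained = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, loadings, explained

# Hypothetical data: 10 samples x 4 variables, not the study's descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
scores, loadings, explained = pca(X, n_components=4)

print(np.round(loadings.T @ loadings, 6))   # identity: the PCs are orthogonal
print(explained)                            # non-increasing: each PC takes the largest remaining variance
```

Because the PC scores are uncorrelated, their covariance matrix is diagonal, with the explained variances on the diagonal.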

MACHINE LEARNING - MULTILAYER PERCEPTRON

The multilayer perceptron is a type of artificial neural network used in machine learning.

It has been used extensively to solve a number of different problems, such as pattern recognition and interpolation.

The basic multilayer perceptron learning algorithm is outlined below.

1. Initialize the network, with all weights set to random numbers between -1 and +1.

2. Present the first training pattern, and obtain the output.

3. Compare the network output with the target output.

4. Propagate the error backwards.
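The four steps above can be sketched as online backpropagation for a network with one hidden layer of sigmoid units. This is a minimal NumPy illustration under assumed settings: the hidden-layer size, learning rate, epoch count, squared-error loss and the toy OR problem are choices made here, not the study's configuration. Repeating steps 2 to 4 over every pattern for many epochs is also an assumption, since the outline stops at step 4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, targets, n_hidden=3, lr=0.5, epochs=3000, seed=0):
    """Online backpropagation for one hidden layer of sigmoid units (squared-error loss)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialise all weights and biases to random numbers between -1 and +1.
    W1 = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))
    b1 = rng.uniform(-1, 1, size=n_hidden)
    W2 = rng.uniform(-1, 1, size=n_hidden)
    b2 = rng.uniform(-1, 1)
    for _ in range(epochs):                       # repeat over the training set (assumed)
        for x, t in zip(X, targets):
            # Step 2: present a training pattern and obtain the output.
            h = sigmoid(x @ W1 + b1)
            out = sigmoid(h @ W2 + b2)
            # Step 3: compare the network output with the target output.
            err = t - out
            # Step 4: propagate the error backwards and adjust the weights.
            delta_out = err * out * (1 - out)
            delta_h = delta_out * W2 * h * (1 - h)    # uses W2 before it is updated
            W2 += lr * delta_out * h
            b2 += lr * delta_out
            W1 += lr * np.outer(x, delta_h)
            b1 += lr * delta_h
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

# Usage on a toy problem (logical OR), not the study's data.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 1, 1, 1], dtype=float)
params = train_mlp(X, targets)
print(np.round(predict(params, X), 2))
```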

AIM & OBJECTIVES OF THE STUDY

To build a machine-learning model to classify metabolites associated with asthma and COPD.

Objectives:

To list the metabolites associated with asthma and COPD

To generate molecular descriptors for all the chosen metabolites

To perform feature selection using linear regression to identify the best descriptors

To perform feature extraction using PCA to generate component matrices

To build machine-learning (multilayer perceptron) models that classify metabolites associated with asthma and COPD based on input data from linear regression and from PCA, and to compare the efficiency of the two models

WORKFLOW

Text-mine to identify metabolites associated with COPD and asthma

Generate molecular descriptors for all the chosen metabolites using the Descriptor Calculation Wizard of Molegro Virtual Docker

Feature selection using linear regression to identify the best descriptors

Feature extraction using PCA to generate component matrices

Generate a model to classify metabolites associated with asthma and COPD by machine learning with a multilayer perceptron
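The slides do not say which regression procedure was used for feature selection. One common option is greedy forward selection, sketched below in Python with NumPy: at each round it adds the descriptor that most increases the R² of an ordinary least-squares fit, and it stops when the gain drops below a threshold. The function names, the `min_gain` threshold and the synthetic data are hypothetical; in a classification setting the 0/1 group label could serve as `y`.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X, with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, names, min_gain=0.01):
    """Greedy forward selection: add the descriptor that most increases R^2 until the gain is small."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # R^2 gain from adding each remaining descriptor to the current set
        gains = [(r_squared(X[:, selected + [j]], y) - best_r2, j) for j in remaining]
        gain, j = max(gains)
        if gain < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 += gain
    return [names[j] for j in selected], best_r2

# Hypothetical data: 72 compounds x 5 descriptors; y depends on descriptors d0 and d3 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(72, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=72)
names = [f"d{i}" for i in range(5)]

chosen, r2 = forward_select(X, y, names)
print(chosen, round(r2, 3))
```

Greedy selection is cheap but can miss combinations of descriptors that are only informative together; exhaustive or penalized approaches trade speed for that.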

TEXT MINING

Number of metabolites with decreased or increased levels, by condition:

Condition   Decreased   Increased   Total
Asthma      21          5           26
COPD        18          28          46
Common      0           3           3
Total       39          36          75

A comprehensive literature mining of all eligible studies on gene expression in COPD and asthma was carried out by searching PubMed (as of March 2015) with the following key terms:

M AND (("C" OR "c") AND (H OR h))

M AND ("A" AND (H OR h))

Where,

M = Gene expression; C = Chronic Obstructive Pulmonary Disease; c = COPD; A = Asthma; H = Homo sapiens; h = human
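A small Python sketch (standard library only) that expands the abbreviations into the two full query strings and builds an NCBI E-utilities `esearch` URL for PubMed. The quoting follows the slide exactly; the search cut-off date (March 2015) is not encoded here, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Abbreviations as defined on the slide.
TERMS = {
    "M": "Gene expression",
    "C": "Chronic Obstructive Pulmonary Disease",
    "c": "COPD",
    "A": "Asthma",
    "H": "Homo sapiens",
    "h": "human",
}

def copd_query(t=TERMS):
    """M AND (("C" OR "c") AND (H OR h))"""
    return f'{t["M"]} AND (("{t["C"]}" OR "{t["c"]}") AND ({t["H"]} OR {t["h"]}))'

def asthma_query(t=TERMS):
    """M AND ("A" AND (H OR h))"""
    return f'{t["M"]} AND ("{t["A"]}" AND ({t["H"]} OR {t["h"]}))'

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(query):
    """PubMed search URL for NCBI E-utilities; db and term are standard esearch parameters."""
    params = urlencode({"db": "pubmed", "term": query})
    return f"{ESEARCH}?{params}"

for q in (copd_query(), asthma_query()):
    print(q)
    print(esearch_url(q))
```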

FEATURE SELECTION USING LINEAR REGRESSION TO IDENTIFY BEST DESCRIPTORS

S.No  Descriptor  Category                          Details
1     Csp2        Simple descriptors                Number of sp2-hybridized carbon atoms in the ligand
2     HD-HD-Min   Chemical feature distance matrix  Hydrogen donor to hydrogen donor, minimum distance
3     R-R-Min     Chemical feature distance matrix  Ring to ring, minimum distance
4     Aro         Simple descriptors                Number of aromatic groups
5     OH          Simple descriptors                Number of hydroxyl groups
6     HD-R-Mean   Chemical feature distance matrix  Hydrogen donor to ring, mean distance

The chemical feature distance matrix (CFDM) descriptors are obtained by calculating the minimum, maximum, and mean topological distance between all pairs of chemical features. The topological distance is defined as the smallest number of covalent bonds between the two features.
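To illustrate the topological distance defined above, this Python sketch (standard library only) runs a breadth-first search over a hypothetical molecular graph, with atoms as nodes and covalent bonds as edges, and reports the minimum, maximum and mean bond-count distance between pairs of feature atoms. Representing each chemical feature by a single atom is a simplifying assumption; this is not Molegro Virtual Docker's implementation, which may treat multi-atom features such as rings differently.

```python
from collections import deque
from itertools import combinations, product
from statistics import mean

def bond_distances(adjacency, source):
    """Breadth-first search: smallest number of covalent bonds from `source` to each reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist

def feature_distance_stats(adjacency, feats_a, feats_b=None):
    """Min, max and mean topological distance over all pairs of feature atoms.

    With feats_b=None, pairs are taken within feats_a (e.g. HD-HD);
    otherwise every (a, b) pair across the two feature types (e.g. HD-R).
    Assumes at least one pair and a connected molecular graph.
    """
    pairs = combinations(feats_a, 2) if feats_b is None else product(feats_a, feats_b)
    dists = [bond_distances(adjacency, a)[b] for a, b in pairs]
    return min(dists), max(dists), mean(dists)

# Hypothetical six-atom chain 0-1-2-3-4-5 with donor atoms 0, 3 and 5.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
donors = [0, 3, 5]
print(feature_distance_stats(adjacency, donors))   # HD-HD style: min, max, mean
```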

SCREE PLOT DEPICTING PRINCIPAL COMPONENTS

Only the top 7 principal components, which had eigenvalues > 1, were selected.
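The eigenvalue > 1 rule (the Kaiser criterion) is normally applied to PCA on standardized variables, that is, on the correlation matrix, where each variable has variance 1, so an eigenvalue above 1 means the component explains more variance than a single original variable. A minimal NumPy sketch with hypothetical data, assuming the PCA was run on the correlation matrix:

```python
import numpy as np

def kaiser_select(X):
    """PCA on the correlation matrix; keep components whose eigenvalue exceeds 1."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # descending, as plotted in a scree plot
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    return eigvals, eigvecs[:, keep]

# Hypothetical data: 72 samples x 8 variables driven by two latent factors plus noise.
rng = np.random.default_rng(2)
factors = rng.normal(size=(72, 2))
X = factors @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(72, 8))
eigvals, retained = kaiser_select(X)
print(np.round(eigvals, 2))       # scree-plot values
print(retained.shape[1], "components retained")
```

The eigenvalues of a correlation matrix always sum to the number of variables, which is why 1 serves as the "average" cut-off.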

SCATTER PLOT OF PC1 VS PC2

Multilayer Perceptron model to classify metabolites associated with asthma and COPD based on features selected by linear regression

Case Processing Summary

                     N    Percent
Sample   Training    51   70.8%
         Testing     21   29.2%
Valid                72   100.0%
Excluded              3
Total                75

Classification

                            Predicted
Sample    Observed          .00      1.00     Percent Correct
Training  .00               15       6        71.4%
          1.00              1        29       96.7%
          Overall Percent   31.4%    68.6%    86.3%
Testing   .00               3        2        60.0%
          1.00              1        15       93.8%
          Overall Percent   19.0%    81.0%    85.7%

Multilayer Perceptron model to classify metabolites associated with asthma and COPD based on feature extraction by PCA

Case Processing Summary

                     N    Percent
Sample   Training    51   70.8%
         Testing     21   29.2%
Valid                72   100.0%
Excluded              3
Total                75

Classification

                            Predicted
Sample    Observed          0        1        Percent Correct
Training  0                 16       5        76.2%
          1                 0        30       100.0%
          Overall Percent   31.4%    68.6%    90.2%
Testing   0                 4        1        80.0%
          1                 1        15       93.8%
          Overall Percent   23.8%    76.2%    90.5%

Dependent Variable: Grouping

ROC plot of sensitivity versus specificity
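An ROC plot summarizes how sensitivity and specificity trade off as the decision threshold on the network's predicted probability changes; by convention the horizontal axis is 1 - specificity. A minimal standard-library Python sketch computes the ROC points and the area under the curve with the trapezoidal rule. The scores and labels are hypothetical, and no AUC value for the study is implied.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs obtained by sweeping a threshold down the scores.

    labels are 1 for the positive class and 0 for the negative class; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical predicted probabilities and true classes.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts)
print(auc(pts))
```

Tied scores are grouped under one threshold, so they produce a diagonal segment, which the trapezoidal rule credits as half correct.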

Insights From The Result

The Multilayer Perceptron model that classifies metabolites associated with asthma and COPD based on feature extraction by PCA showed a higher overall percent correct, above 90% (90.2% training, 90.5% testing), than the model based on feature selection by linear regression, which reached about 86% (86.3% training, 85.7% testing).

Scope: Similar metabolite-classifying models can be built using a radial basis function (RBF) network, and their efficiency can be compared with the current model.

A model for classifying metabolites associated with comorbid conditions can be attempted in the future.

Acknowledgement

Our sincere thanks to Molegro Virtual Docker (MVD) for providing a limited-period trial version with which the molecular descriptors were calculated.

We thank Dr. Baljit Ubhi, Ph.D., Global Technical Marketing Manager, Metabolomics & Lipidomics, for providing critical insights on applications of metabolomics in COPD, without which this study would have been impossible.

My Research Team, Sri Ramachandra University

Thank You

