Date post: 22-Jun-2020
Source: prr.hec.gov.pk/jspui/bitstream/123456789/1546/2/2110S-0.pdf

CONTENTS

Table of Contents

Abstract

1. INTRODUCTION

1.1 Multivariate Biomedical Data

1.2 Biomedical Cancer Genomic Data

1.3 Microarray and Gene Expression Levels

1.4 Data Under Study

1.4.1 Leukemia Cancer Gene Expression Data Set

1.5 Objectives of the Project

1.6 Summary Statistics for Multivariate Data Set

1.7 Relative Variance Covariance Matrix

2. EXPLORATORY DATA ANALYSIS

2.1 Histograms

2.2 Box and Whiskers Plots

2.3 Transformations

2.3.1 Box-Cox Transformation

2.4 Exploratory Data Analysis: A Graphical View

2.5 Probability Plots

2.6 Fitting a Probability Distribution

2.7 Extreme Value Distributions

2.7.1 Extreme Value Distribution Argument

2.7.2 Generalized Extreme Value Distributions

2.7.3 Gumbel Distribution

2.7.4 Fréchet Distribution

2.7.5 Weibull Distribution

2.8 Goodness of Fit Test

2.9 Probability Difference Plots

2.10 Correlation Structure of the Data Matrix

2.10.1 Interpreting Coefficient of Correlation

3. PRINCIPAL COMPONENT ANALYSIS AND ITS USE IN CLUSTERING TISSUE SAMPLES

3.1 Clustering Gene Expression Data

3.2 Cluster Analysis: A Comparison

3.3 Principal Component Analysis and Clustering

3.4 Principal Component Analysis

3.5 Principal Components

3.5.1 Principal Components Using Variance Covariance Matrix

3.5.2 Principal Components Using Correlation Matrix

3.5.3 Principal Components Using Relative Variance Matrix

3.6 Principal Components: Relative Variance Matrix versus Correlation Matrix

3.7 Principal Component Loadings

3.8 PCA Clustering: A Literature Review

3.9 Kaiser’s Criterion for Retaining Principal Components

3.10 PCA in Graphical Representation

3.10.1 The PC Plots

4. SCREENING AND CLUSTERING OF GENES

4.1 Gene Clustering

4.2 Screening of Genes

4.3 Role of Minimum Threshold Value ‘20’

4.4 Garcia’s Criterion of Relative Variance

4.5 The High Variant Cluster of Genes

4.6 Discriminant Analysis

4.7 Discriminating the High Variant Gene Group

5. DISCUSSIONS, CONCLUSIONS WITH FUTURE RECOMMENDATIONS

5.1 Main Issues in Genomic Data Set

5.2 Addressing the Issues

5.3 Recommendations

APPENDIX

REFERENCES

