Post on 05-Jan-2016
MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia
Armstrong et al, Nature Genetics 30, 41-47 (2002)
Blank slide/colon data
gene1
Hsa.37937 3' UTR 2a 197371 MYOSIN HEAVY CHAIN, NONMUSCLE (Gallus gallus)
tumor: 1.62 1.33 0.79 0.41 0.39 0.38 1.22 1.57 0.72 0.97 1.12 0.61 0.79 0.36 0.52 0.58 0.44 0.35 0.53 0.52 0.46 0.59 0.68 0.27 0.67 0.49 0.49 0.53 0.35 1.44 0.55 0.33 1.70 0.59 0.73 1.54 1.03 0.54 0.66 0.33
mean = 0.73 std = 0.4
normal: 2.81 2.18 2.68 2.17 2.84 2.58 4.97 2.12 2.76 3.41 2.72 3.26 2.51 1.24 2.83 1.25 4.22 1.06 2.30 0.44 1.21 1.57
mean = 2.41 std = 1.05
histograms
HISTOGRAM, BINS OF 0.5 (normal samples of gene1)
bin center: 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
count:      1   3   2   3   5   4   2   1   0   1
NORMALIZED (FREQUENCIES)
tumor: mean = 0.73 std = 0.4
normal: mean = 2.41 std = 1.05
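The histogram step can be sketched in a few lines. This is a minimal illustration, not the code behind the slides. It assumes bins of width 0.5 centered on the tick values (0.5, 1.0, ...), which is the binning that matches the counts shown for the 22 normal samples. The function name `histogram` is hypothetical.

```python
import math
from collections import Counter

# Normal-sample values of gene1, copied from the slide.
normal = [2.81, 2.18, 2.68, 2.17, 2.84, 2.58, 4.97, 2.12, 2.76, 3.41, 2.72,
          3.26, 2.51, 1.24, 2.83, 1.25, 4.22, 1.06, 2.30, 0.44, 1.21, 1.57]

BIN_WIDTH = 0.5  # "BINS OF 0.5"


def histogram(values, width):
    """Count values in bins of the given width, centered on multiples of width."""
    # floor(v / width + 0.5) picks the nearest bin center (ties go upward).
    counts = Counter(math.floor(v / width + 0.5) * width for v in values)
    return dict(sorted(counts.items()))


counts = histogram(normal, BIN_WIDTH)

# "Normalized" histogram: divide each count by the number of samples.
frequencies = {center: n / len(normal) for center, n in counts.items()}

for center, n in counts.items():
    print(f"{center:.1f}  count={n}  frequency={frequencies[center]:.3f}")
```

Empty bins (here the one centered at 4.5) simply do not appear in the returned dictionary.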
t-test
T = -9.04 P = 10 e-14
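As a rough check of the T value, here is a sketch of a two-sample t statistic on the gene1 data. It assumes the equal-variance (pooled) Student form. The slides do not say which variant was used, but the means and standard deviations quoted for gene1 are consistent with the pooled form. The helper name `pooled_t` is hypothetical.

```python
import math
import statistics

# gene1 expression values, copied from the slide.
tumor = [1.62, 1.33, 0.79, 0.41, 0.39, 0.38, 1.22, 1.57, 0.72, 0.97,
         1.12, 0.61, 0.79, 0.36, 0.52, 0.58, 0.44, 0.35, 0.53, 0.52,
         0.46, 0.59, 0.68, 0.27, 0.67, 0.49, 0.49, 0.53, 0.35, 1.44,
         0.55, 0.33, 1.70, 0.59, 0.73, 1.54, 1.03, 0.54, 0.66, 0.33]
normal = [2.81, 2.18, 2.68, 2.17, 2.84, 2.58, 4.97, 2.12, 2.76, 3.41, 2.72,
          3.26, 2.51, 1.24, 2.83, 1.25, 4.22, 1.06, 2.30, 0.44, 1.21, 1.57]


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate (assumed variant)."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (divides by n - 1).
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t_stat, na + nb - 2


t, dof = pooled_t(tumor, normal)
print(f"T = {t:.2f}, degrees of freedom = {dof}")
```

The p-value then comes from the t distribution with `dof` degrees of freedom. That step needs a statistics library and is left out of this stdlib-only sketch.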
gene1000
Hsa.37192 3' UTR 2a 186603 EUKARYOTIC INITIATION FACTOR 4B (Homo sapiens)
tumor: 0.21 0.38 0.51 0.23 0.23 0.32 0.20 0.53 0.33 0.47 0.25 0.22 0.36 0.26 0.27 0.26 0.26 0.33 0.30 0.15 0.25 0.18 0.19 0.28 0.25 0.25 0.54 0.20 0.41 0.47 0.49 0.39 0.33 0.44 0.37 0.42 0.34 0.35 0.56 0.37
mean = 0.328 std = 0.111
normal: 0.20 0.32 0.62 0.21 0.31 0.25 0.24 0.40 0.25 0.50 0.19 0.37 0.63 0.33 0.41 0.48 0.59 0.45 0.48 0.31 0.30 0.41
mean = 0.375 std = 0.134
NORMALIZED (FREQUENCIES)
t-test
T = -1.48 P = 0.15
85%
gene2000
Hsa.1829 gene 1 Human mRNA fragment for class II histocompatibility antigen beta-chain (pII-beta-4).
tumor: 1.50 2.53 2.38 3.16 3.01 2.45 1.70 2.10 3.14 2.76 1.57 4.15 3.60 5.32 2.20 1.82 2.81 5.33 4.03 2.28 1.48 2.03 1.75 1.64 2.92 1.26 1.75 2.03 2.45 2.25 2.82 3.87 1.67 1.22 2.49 1.74 4.96 1.49 1.38 5.98
mean = 2.6258 std = 1.2039
normal: 1.56 3.07 4.15 8.12 3.41 3.78 1.42 0.96 2.09 2.63 2.29 2.11 1.26 1.85 1.61 3.18 2.23 1.02 3.36 3.63 2.11 1.93
mean = 2.6261 std = 1.536
histograms
NORMALIZED (FREQUENCIES)
t-test
T = -0.001 P = 0.9992
E, C&N_log2E
colon data expression matrix E
log2 E, center, normalize
genes ordered by p-value
726 genes with p < 0.05, ordered by difference of means (normal - tumor)
after ttest 0.05 order by diffmeans
genes with p < 0.05
RANDOM DATA
sorted p
Q=0.15
I=758
how many out of 726 are false?
0.14
FDR: 726*0.14=101 false separating genes
how many genes at FDR=0.05?
516*0.05=26 false separating genes
26 out of 516 - false
26 - false
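The false-discovery arithmetic above can be sketched as follows. Two things here are assumptions rather than statements from the slides: the total number of genes tested is taken to be 2000, and the "sorted p" slide is read as the Benjamini-Hochberg step-up rule. The function names and the toy p-values are hypothetical.

```python
def estimated_fdr(p_cutoff, n_tests, n_selected):
    """Expected false positives (p_cutoff * n_tests) as a fraction of selected genes."""
    return p_cutoff * n_tests / n_selected


def benjamini_hochberg(p_values, q):
    """Indices of hypotheses kept at FDR level q (step-up rule on sorted p-values)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    last_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the line rank * q / n.
        if p_values[idx] <= rank * q / n:
            last_rank = rank
    return sorted(order[:last_rank])


N_GENES = 2000  # assumption: total number of genes tested

# 726 genes pass p < 0.05; roughly 0.05 * 2000 = 100 of them are expected by chance.
fdr_726 = estimated_fdr(0.05, N_GENES, 726)
print(f"estimated FDR at p < 0.05: {fdr_726:.2f}")

# Toy p-values (hypothetical) to show the step-up rule itself.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
kept = benjamini_hochberg(toy_p, q=0.05)
print("kept indices:", kept)
```

Under these assumptions the estimate for 726 genes comes out near 0.14, in line with the "726*0.14=101" figure on the slide.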
random data
100 separating (p<0.001), 1900 random
MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia
Armstrong et al, Nature Genetics 30, 41-47 (2002)
separation
[Figure: ALL and MLL samples in the (E1, E2) plane with the separating line E1 - 2E2 = 0; the two sides are marked E1 - 2E2 < 0 and E1 - 2E2 > 0]
projection 1
[Figure: ALL and MLL samples in the (E1, E2) plane, with projection direction w]
+/- PROJECTIONS ON w – DO SEPARATE ALL FROM MLL
projection 2
[Figure: ALL and MLL samples in the (E1, E2) plane, with a different projection direction w]
+/- PROJECTIONS ON w – DO NOT SEPARATE ALL FROM MLL
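The two projection slides can be illustrated with a small sketch. Everything numeric here is hypothetical: the sample coordinates are made up, and which class falls on which side of the line is an arbitrary choice. The "good" direction is taken as the normal to the line E1 - 2E2 = 0 from the separation slide.

```python
import math


def project(points, w):
    """Signed projection of each (E1, E2) point onto the unit vector along w."""
    norm = math.hypot(w[0], w[1])
    return [(e1 * w[0] + e2 * w[1]) / norm for e1, e2 in points]


def separates(proj_a, proj_b):
    """True if one class projects entirely positive and the other entirely negative."""
    a_pos, a_neg = all(p > 0 for p in proj_a), all(p < 0 for p in proj_a)
    b_pos, b_neg = all(p > 0 for p in proj_b), all(p < 0 for p in proj_b)
    return (a_pos and b_neg) or (a_neg and b_pos)


# Hypothetical (E1, E2) coordinates for the two classes.
all_samples = [(2.0, 0.5), (3.0, 1.0), (4.0, 1.0), (3.0, 0.0)]
mll_samples = [(0.0, 1.0), (1.0, 2.0), (0.5, 1.5), (-1.0, 0.5)]

w_good = (1.0, -2.0)  # normal to the line E1 - 2*E2 = 0
w_bad = (2.0, 1.0)    # direction along that line

good = separates(project(all_samples, w_good), project(mll_samples, w_good))
bad = separates(project(all_samples, w_bad), project(mll_samples, w_bad))
print("w_good separates:", good)
print("w_bad separates:", bad)
```

The sign of the projection acts as the class label, so choosing w is the whole problem.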
projection 3
[Figure: two clouds of points in the (E1, E2) plane]
WELL SEPARATED CENTERS OF MASS - NO SEPARATION OF THE TWO CLOUDS
projection 4
[Figure: two clouds of points in the (E1, E2) plane]
WEAK SEPARATION OF CENTERS OF MASS - GOOD SEPARATION OF THE TWO CLOUDS
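Projections 3 and 4 show that the distance between centers of mass is not enough on its own; the spread of each cloud along the projection direction matters as well. One common way to combine the two is a Fisher-style score: the squared gap between the projected means divided by the sum of the projected variances. The sketch below uses made-up, elongated clouds, and the function names are hypothetical.

```python
import statistics


def projections(points, w):
    """Unnormalized projection of each (E1, E2) point onto direction w."""
    return [e1 * w[0] + e2 * w[1] for e1, e2 in points]


def center_gap(points_a, points_b, w):
    """Distance between the projected centers of mass."""
    return abs(statistics.mean(projections(points_a, w))
               - statistics.mean(projections(points_b, w)))


def fisher_score(points_a, points_b, w):
    """Squared gap between projected means over the summed projected variances."""
    pa, pb = projections(points_a, w), projections(points_b, w)
    gap = statistics.mean(pa) - statistics.mean(pb)
    return gap ** 2 / (statistics.variance(pa) + statistics.variance(pb))


# Hypothetical clouds: long along E1, thin along E2.
cloud_a = [(-4.0, 0.0), (-2.0, 0.1), (0.0, -0.1), (2.0, 0.0), (4.0, 0.1), (6.0, -0.1)]
cloud_b = [(-1.0, 1.0), (1.0, 1.1), (3.0, 0.9), (5.0, 1.0), (7.0, 1.1), (9.0, 0.9)]

along_e1, along_e2 = (1.0, 0.0), (0.0, 1.0)

gap_e1 = center_gap(cloud_a, cloud_b, along_e1)
gap_e2 = center_gap(cloud_a, cloud_b, along_e2)
score_e1 = fisher_score(cloud_a, cloud_b, along_e1)
score_e2 = fisher_score(cloud_a, cloud_b, along_e2)

print(f"E1: center gap {gap_e1:.2f}, score {score_e1:.2f}")
print(f"E2: center gap {gap_e2:.2f}, score {score_e2:.2f}")
```

Along E1 the centers are farther apart but the clouds overlap, so the score is low. Along E2 the centers are closer but the clouds are thin, so the score is high.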
Fisher to perceptron
[Figure: ALL and MLL samples in the (E1, E2) plane]
OPTIMAL LINE TO PROJECT ON
FISHER
PERCEPTRON
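For the perceptron side, here is a minimal sketch of the classic perceptron update rule on made-up, linearly separable points. The data, the label convention (+1 for ALL, -1 for MLL) and the function name are hypothetical. The slides give no implementation details, so this is only one plain version of the rule.

```python
def train_perceptron(samples, labels, max_epochs=1000):
    """Classic perceptron: nudge (w, b) toward every misclassified sample."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (e1, e2), y in zip(samples, labels):
            # A sample is misclassified if it lies on the wrong side (or on the line).
            if y * (w[0] * e1 + w[1] * e2 + b) <= 0:
                w[0] += y * e1
                w[1] += y * e2
                b += y
                mistakes += 1
        if mistakes == 0:  # every sample is on its correct side: stop
            break
    return w, b


# Hypothetical (E1, E2) samples; +1 stands for ALL, -1 for MLL.
samples = [(2.0, 0.5), (3.0, 1.0), (4.0, 1.0), (3.0, 0.0),
           (0.0, 1.0), (1.0, 2.0), (0.5, 1.5), (-1.0, 0.5)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_perceptron(samples, labels)
margins = [y * (w[0] * e1 + w[1] * e2 + b) for (e1, e2), y in zip(samples, labels)]
print("w =", w, "b =", b)
print("all samples correctly classified:", all(m > 0 for m in margins))
```

The perceptron stops at the first line that separates the training samples, whereas a Fisher-type criterion picks a direction from class means and spreads, so the two can give different lines on the same data.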
UNSUPERVISED ANALYSIS
• GOAL A: FIND GROUPS OF GENES THAT HAVE CORRELATED EXPRESSION PROFILES. THESE GENES ARE BELIEVED TO BELONG TO THE SAME BIOLOGICAL PROCESS.
• GOAL B: DIVIDE TISSUES INTO GROUPS WITH SIMILAR GENE EXPRESSION PROFILES. THESE TISSUES ARE EXPECTED TO BE IN THE SAME BIOLOGICAL (CLINICAL) STATE.
CLUSTERING
Unsupervised analysis
Giraffe
DEFINITION OF THE CLUSTERING PROBLEM
CLUSTER ANALYSIS YIELDS DENDROGRAM
T (RESOLUTION)
Giraffe + Okapi
BUT WHAT ABOUT THE OKAPI ?
STATEMENT OF THE PROBLEM
GIVEN DATA POINTS Xi, i = 1, 2, ..., N, EMBEDDED IN D-DIMENSIONAL SPACE, IDENTIFY THE UNDERLYING STRUCTURE OF THE DATA.
AIMS: PARTITION THE DATA INTO M CLUSTERS,
POINTS OF SAME CLUSTER - "MORE SIMILAR"
M ALSO TO BE DETERMINED!
GENERATE DENDROGRAM,
IDENTIFY SIGNIFICANT, "STABLE" CLUSTERS
"ILL POSED": WHAT IS "MORE SIMILAR"?
RESOLUTION
Statement of the problem2
CLUSTER ANALYSIS YIELDS DENDROGRAM
Dendrogram2
T
LINEAR ORDERING OF DATA
YOUNG OLD
Agglomerative Hierarchical Clustering
[Figure: five data points labeled 1 to 5 and the resulting dendrogram, leaf order 5 2 4 1 3]
Distance between joined clusters
Need to define the distance between the new cluster and the other clusters.
Single Linkage: distance between closest pair.
Complete Linkage: distance between farthest pair.
Average Linkage: average distance between all pairs
or distance between cluster centers
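The linkage rules above can be compared in a short sketch of agglomerative clustering. This is a naive illustration on made-up one-dimensional points, not an efficient or authoritative implementation. The function names are hypothetical, and the "distance between cluster centers" variant is left out.

```python
import math
import statistics


def cluster_distance(points, members_a, members_b, linkage):
    """Distance between two clusters under the chosen linkage rule."""
    pair_distances = [math.dist(points[i], points[j])
                      for i in members_a for j in members_b]
    if linkage == "single":      # closest pair
        return min(pair_distances)
    if linkage == "complete":    # farthest pair
        return max(pair_distances)
    if linkage == "average":     # average over all pairs
        return statistics.mean(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, linkage="single"):
    """Repeatedly merge the two closest clusters; return (id_a, id_b, height) merges."""
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = cluster_distance(points, clusters[a], clusters[b], linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # The merged cluster gets a new id; its height is the joining distance.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Hypothetical one-dimensional data points.
points = [(0.0,), (1.0,), (3.0,), (7.0,), (8.0,)]

heights = {name: [h for _, _, h in agglomerate(points, name)]
           for name in ("single", "complete", "average")}
for name, values in heights.items():
    print(name, [round(h, 2) for h in values])
```

The merge heights are what a dendrogram draws on its vertical axis. On this toy data the three rules agree on the early merges and differ in the heights of the later ones.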
Dendrogram
The dendrogram induces a linear ordering of the data points
Hierarchical Clustering - Summary
• Results depend on distance update method
• Greedy iterative process
• NOT robust against noise
• No inherent measure to identify stable clusters
2 good clouds
COMPACT WELL SEPARATED CLOUDS – EVERYTHING WORKS
2 flat clouds
2 FLAT CLOUDS - SINGLE LINKAGE WORKS
filament
SINGLE LINKAGE SENSITIVE TO NOISE
Average linkage
[Figure: five data points labeled 1 to 5 and the resulting dendrogram, leaf order 5 2 4 1 3]
Distance between joined clusters
Need to define the distance between the new cluster and the other clusters.
Average Linkage: average distance between all pairs
Dendrogram