Lecture 9,10: DNA Microarrays: Novel Applications

Page 1: Lecture 9,10: DNA Microarrays: Novel Applications

10.555-Bioinformatics-MIT-2003

L 09-10: Microarrays 1-Novel applications

10.555 Bioinformatics: Principles, Methods and Applications

MIT, Spring term, 2003

Lecture 9,10: DNA Microarrays: Novel Applications

Page 2: Lecture 9,10: DNA Microarrays: Novel Applications

Probing cellular function

Diagram labels: DNA microarrays, Environment, DNA, mRNA, Fluxes, Proteins, Proteomics, Metabolic Flux Analysis, Signal Transduction

Page 3: Lecture 9,10: DNA Microarrays: Novel Applications

DNA Micro-Array Methodology

Diagram: DNA, mRNA, Protein. mRNA from the Sample and from the Control is reverse transcribed; the two are mixed equally and applied to the microarray.

Page 4: Lecture 9,10: DNA Microarrays: Novel Applications

DNA Micro-Arraying Technology

<100 µm

Page 5: Lecture 9,10: DNA Microarrays: Novel Applications

Mountains of Biological Data

19,200 Human Gene Array

from TIGR

Page 6: Lecture 9,10: DNA Microarrays: Novel Applications

Beyond transcriptional studies: New applications of microarrays

1. Genome-wide screening for genes conferring specific cellular traits (Gill et al., PNAS, May 2002)

Page 7: Lecture 9,10: DNA Microarrays: Novel Applications

Fragment genomic DNA and gel purify to an average size of 0.5-3 kbp

Ligate into plasmid to get Genomic Library [pTAGL]

Purify plasmids from time points throughout growth in selective conditions

Fragment plasmids and label with fluorescent nucleotide

Heterogeneous selective growth of overexpression library

Identify genes by DNA Micro-Array

Figure labels: 33%, 68%, 88%; Plasmid DNA; Genomic DNA; Cy5; Cy3; Growth Advantage; Growth Disadvantage

Page 8: Lecture 9,10: DNA Microarrays: Novel Applications

Determining Selective Conditions

Plot: ln(OD600) versus time (hours, 0-25) for Wild-Type Cell and Library Transformed Wild-Type Cell at 0.0%, 0.25%, and >1% Pine-Sol, with the region of selective advantage for the overexpression library marked.

Which plasmids are enriched?

Page 9: Lecture 9,10: DNA Microarrays: Novel Applications

0.4% Pine-Sol mid-exponential phase

0.4% Pine-Sol early stationary phase

Control early stationary phase

Same culture + 3 hours

Page 10: Lecture 9,10: DNA Microarrays: Novel Applications

Dynamic Screening of Plasmid Populations

Plot: ln(OD600) versus time (hours, 0-30) for 0.0% Pine-Sol and 0.4% Pine-Sol.

Page 11: Lecture 9,10: DNA Microarrays: Novel Applications

0.4% Exponential 0.4% Early Stationary 0.0% Early Stationary

Plasmids: Resistance Genes

Genomic DNA: Control

Discovering Antibiotic Resistance and Susceptibility Genes

Susceptibility Gene?

Page 12: Lecture 9,10: DNA Microarrays: Novel Applications

Quantifying Plasmid Levels to Identify Resistance Genes

Scatter plots: Cy3 Signal (0-14000) versus Cy5 Signal (0-2000), one panel for 0% Pine-Sol and one for 0.4% Pine-Sol.

Labeled genes: ygcA, sucD, gabD, icdA, pheP, putA, fdoG, metR, pheA, cysH, yhgB, glnA, lpxD, livG, livJ, fdhF, proA, kdsB, trpE

Page 13: Lecture 9,10: DNA Microarrays: Novel Applications

Antibiotic Resistance Genes

Gene    Function
ygcA    Unknown
gabD    Succinate-semialdehyde dehydrogenase
pheP    Phenylalanine specific permease
sucD    Succinyl-CoA synthetase
putA    Proline dehydrogenase
icdA    Isocitrate dehydrogenase
fdoG    Formate dehydrogenase
metR    Positive regulator for methionine genes
lpxD    Glucosamine-N-acyltransferase (lipid A)
pheA    Phenylalanine dehydrogenase
cysH    Adenylsulfate reductase
glnA    Glutamine synthetase
yhgB    Unknown
livG    High affinity branched chain amino acid transporter
livJ    High affinity branched chain amino acid transporter
fdhF    Formate dehydrogenase
trpE    Anthranilate synthase component (tryptophan)
kdsB    CMP-3-deoxy-D-manno-octulosonate cytidylyltransferase (lipid A)
proA    Glutamyl P reductase

Page 14: Lecture 9,10: DNA Microarrays: Novel Applications

Validation Experiments

Table 2

Transformant        MIC(1) Ratio(2)   Trait
pUC19 Control       1.0(3)
Library             1.20
Enriched Library    1.40
pBAD-pheP           1.18
pBAD-leuC           1.06
pBAD-hybC           1.06
pBAD-trpE           1.14              Res
pBAD-livJ           1.19              Res

Remaining column values, in the order they appear in the slide transcript:
Trait (remaining six rows): Res, Null, Null, Res, Res, Control
N(6): 9, 4, 4, 4, 4, 4, 4, 29
p-value(5): 0.040, 0.015, 0.029, 0.135, 0.135, 0.020, 0*(7), NA
CV(4): 0.112, 0.080, 0.000, 0.065, 0.065, 0.165, 0.000, 0

(1) MIC: Minimum Inhibitory Concentration.
(2) Ratio of the MIC of the corresponding strain to the MIC of the pUC19 control.
(3) MIC for pUC control was 0.6 +/- 0.063 % (v/v) Pine-Sol in LBA.
(4) Coefficient of Variation = SD/mean. Each strain was tested four to nine times, with the pUC control tested 29 times.
(5) p-value is the one-tailed probability that the mean MICs of the transformant and the pUC19 control are equal, using a Student's t-test for two samples of unequal variances. The null hypothesis that the MIC means of the control and the reported strains are the same is therefore rejected with 95% confidence for all strains except those of the null genes.
(6) Number of separate MIC assays.
(7) Variance in library MIC assays was zero.
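The statistics named in the Table 2 footnotes (MIC ratio, CV = SD/mean, and a t-test for two samples of unequal variances) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the MIC readings are invented, the function names are hypothetical, and only the t statistic and the Welch-Satterthwaite degrees of freedom are computed. Turning them into a one-tailed p-value would need a t-distribution from a statistics library.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV = SD / mean, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values)


def welch_t(sample, control):
    """Return (t, df) for a two-sample t-test with unequal variances."""
    n1, n2 = len(sample), len(control)
    m1, m2 = statistics.mean(sample), statistics.mean(control)
    v1, v2 = statistics.variance(sample), statistics.variance(control)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the difference
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical MIC readings in % (v/v), invented for illustration only.
control_mic = [0.55, 0.60, 0.65, 0.60, 0.58, 0.62]
strain_mic = [0.70, 0.72, 0.68, 0.74]

mic_ratio = statistics.mean(strain_mic) / statistics.mean(control_mic)
t_stat, dof = welch_t(strain_mic, control_mic)
print(f"MIC ratio = {mic_ratio:.2f}, CV = {coefficient_of_variation(strain_mic):.3f}")
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

A strain with zero variance in both groups would make the standard error zero, so a real analysis would need to handle that case separately (compare footnote 7).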

Page 15: Lecture 9,10: DNA Microarrays: Novel Applications

Beyond transcriptional studies: New applications of microarrays

1. Genome-wide screening for genes conferring specific cellular traits (Gill et al., 2002)

2. Genome-wide location and function of DNA binding proteins (Ren et al., Science, December 22, 2000)

Page 16: Lecture 9,10: DNA Microarrays: Novel Applications

Research Goal

• Understand how DNA binding proteins regulate global gene expression

• Study genes whose expression is directly controlled by Gal4 and Ste12

Page 17: Lecture 9,10: DNA Microarrays: Novel Applications

ChIP Chip

• ChIP: Chromatin Immuno-Precipitation
• Fix cells with formaldehyde and harvest
• Use an antibody to precipitate DNA fragments bound to protein of interest
• Remove cross-links, amplify, label (Cy5)
• Mix with unenriched sample (Cy3)
• Bind to DNA microarray

Page 18: Lecture 9,10: DNA Microarrays: Novel Applications

Example of results

Page 19: Lecture 9,10: DNA Microarrays: Novel Applications

Why?

• Microarrays alone cannot distinguish primary effects from secondary effects

• Identification of where protein binds gives exact interaction

Diagram labels: g1, g2, g3

Page 20: Lecture 9,10: DNA Microarrays: Novel Applications

Example: Gal4 in yeast

• Gal4 activates the genes necessary for galactose metabolism

• Find genes which are upregulated in galactose and bound by Gal4

• 10 targets found with P value ≤ 0.001

Page 21: Lecture 9,10: DNA Microarrays: Novel Applications

Results

Page 22: Lecture 9,10: DNA Microarrays: Novel Applications

Results, Part II

• 7 genes known to be bound by Gal4
  – GAL1, GAL2, GAL3, GAL7, GAL10, GAL80, GCY1

• 3 others (MTH1, PCL10, FUR4)

• The consensus binding sequence (CGGN11CCG) occurs in many other places
  “…sequence alone is not sufficient to account for specificity of Gal4 binding in vivo.”
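To see why the consensus alone finds many candidate sites, one can scan a sequence for CGG, any 11 bases, then CCG. The sketch below is illustrative only: the function name and the test sequence are invented, and a match says nothing about in vivo binding. Because the reverse complement of CGG-N11-CCG is again CGG-N11-CCG, scanning one strand is enough.

```python
import re

# CGG, then any 11 bases, then CCG. The lookahead lets overlapping sites be reported.
GAL4_SITE = re.compile(r"(?=(CGG[ACGT]{11}CCG))")


def find_gal4_sites(sequence):
    """Return (start, site) for every consensus match, with 0-based start positions."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in GAL4_SITE.finditer(seq)]


# Hypothetical sequence, invented for illustration.
promoter = "ttaCGGattagaagccgCCGagcgggtga"
print(find_gal4_sites(promoter))
```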

Page 23: Lecture 9,10: DNA Microarrays: Novel Applications

Confirmation

+/-: Strains with or without tagged Gal4

I/P: Unenriched or Enriched DNA

Standard Chromatin IP

Page 24: Lecture 9,10: DNA Microarrays: Novel Applications

Additional confirmation

RT-PCR to quantify expression of 3 unexpected genes

Wild type vs gal4-

“Galactose-induced expression of FUR4, MTH1, and PCL10 is Gal-4 dependent”

Page 25: Lecture 9,10: DNA Microarrays: Novel Applications

Results, Part III

Induce transport and conversion

Reduce other transporters

Maximize energy from galactose

Increase pyrimidines for UDP (?)

Page 26: Lecture 9,10: DNA Microarrays: Novel Applications

Beyond transcriptional studies: New applications of microarrays

1. Genome-wide screening for genes conferring specific cellular traits (Gill et al., PNAS)

2. Genome-wide location and function of DNA binding proteins (Ren et al., Science, December 22, 2000)

3. Experimental annotation of human genome (Shoemaker et al., Nature 409:922-927, 2001; Rosetta Inpharmatics)

Page 27: Lecture 9,10: DNA Microarrays: Novel Applications

WHAT THE PAPER CLAIMS...

Microarray technology:
• High-throughput microarray based experimental method to validate predicted exons
• Group the exons into genes by co-regulated expression
• Define full-length mRNA transcripts

Approaches:
1 Exon array approach - high-throughput method
2 Tiling array approach - higher-resolution method

Page 28: Lecture 9,10: DNA Microarrays: Novel Applications

EXON ARRAY APPROACH

Analysis of human chromosome 22q

• Exon arrays consisting of long oligonucleotide probes derived from the 8,183 predicted exons annotated on chromosome 22q were fabricated

• Hybridization with fluorescently labeled cDNAs derived from 69 experiments conducted on pairs of conditions (e.g., kidney vs. lung, testes vs. uterus, etc.) using Cy3/Cy5 labels

Page 29: Lecture 9,10: DNA Microarrays: Novel Applications

FABRICATION OF EXON ARRAYS

Page 30: Lecture 9,10: DNA Microarrays: Novel Applications

COMPENDIUM OF EXPERIMENTS

Page 31: Lecture 9,10: DNA Microarrays: Novel Applications

VALIDATION OF GENE BOUNDARIES

Page 32: Lecture 9,10: DNA Microarrays: Novel Applications

SUMMARY OF RESULTS

Page 33: Lecture 9,10: DNA Microarrays: Novel Applications

TILING ARRAY APPROACH

• Higher-resolution view of a genomic region of interest
• Can potentially reveal exons not identified by current gene prediction algorithms
• Provides information about alternative splicing
• Can be applied specifically on initial and terminal exons where gene prediction programs are not very accurate
• Does not need any a priori information on exon content of genomic sequence

Page 34: Lecture 9,10: DNA Microarrays: Novel Applications

TESTES TRANSCRIPT REFINED USING TILING ARRAYS

Page 35: Lecture 9,10: DNA Microarrays: Novel Applications

HUMAN GENOME SCAN USING EXON ARRAYS

Page 36: Lecture 9,10: DNA Microarrays: Novel Applications

DISCUSSION

• A microarray approach to the simultaneous validation of gene predictions and study of the transcriptome under any number of medically interesting conditions

• Exon-based approach provides high-throughput screening of diverse cell types, growth conditions and disease states

• Method could be used for rapid annotation of sequence information from the Human Genome project

Page 37: Lecture 9,10: DNA Microarrays: Novel Applications

Beyond transcriptional studies: New applications of microarrays

1. Genome-wide screening for genes conferring specific cellular traits

2. Genome-wide location and function of DNA binding proteins (Ren et al., Science, December 22, 2000)

3. Experimental annotation of human genome

4. Large scale identification of secreted and membrane associated gene products (Diehn et al., Nature Genetics, 25:58-62, 2000)

Page 38: Lecture 9,10: DNA Microarrays: Novel Applications

Motivation

• Importance of membrane-associated and secreted proteins
  – Receptors
  – Transporters
  – Adhesion molecules
  – Hormones
  – Cytokines

Page 39: Lecture 9,10: DNA Microarrays: Novel Applications

Current Methods

• Computational
  – Potential amino-terminal membrane-targeting signals
  – Trans-membrane domains

• Require knowledge of entire coding sequence

• Experimental
  – No large-scale genomic methods

Page 40: Lecture 9,10: DNA Microarrays: Novel Applications

Method for isolating membrane-bound polysomes

Cy5, Cy3

Equilibrium density centrifugation in a sucrose gradient

Page 41: Lecture 9,10: DNA Microarrays: Novel Applications

Evaluation for genes with known localization

Distinctly membrane-bound

Distinctly cytosolic

Overlap:
• Confirmed in other organisms and from literature
• No simple classification, biological explanation

Page 42: Lecture 9,10: DNA Microarrays: Novel Applications

Characterization of unclassified genes

Moving Average Methodology
• Sort genes by the ratio value
• Compute fraction of the genes known to be membrane-associated in a given window of specified size
• Plot this fraction against the expression of the central gene in window

Compare with the already characterized genes, in the following graphs.
Fraction = probability

Human Cells (Jurkat)
Window size: 151 genes

Yeast Cells
Window size: 175 genes
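The moving average steps on this slide can be sketched as below. This is a minimal illustration under stated assumptions: each gene is a hypothetical (ratio, is_membrane) pair, the window size is odd, windows that would run off either end are skipped, and the toy data are invented (the slide uses windows of 151 and 175 genes).

```python
def moving_membrane_fraction(genes, window):
    """genes: list of (ratio, is_membrane) pairs; window: odd window size.

    Returns one (central_ratio, fraction) pair per full window, where fraction
    is the share of genes in the window known to be membrane-associated.
    """
    ordered = sorted(genes, key=lambda g: g[0])  # sort genes by ratio value
    half = window // 2
    points = []
    for center in range(half, len(ordered) - half):
        block = ordered[center - half:center + half + 1]
        fraction = sum(1 for _, is_membrane in block if is_membrane) / window
        points.append((ordered[center][0], fraction))
    return points


# Toy data, invented for illustration: (ratio, known membrane-associated?)
toy_genes = [(-2.0, False), (-1.5, False), (-1.0, False), (-0.5, True), (0.0, False),
             (0.5, True), (1.0, True), (1.5, True), (2.0, True)]
curve = moving_membrane_fraction(toy_genes, window=3)
for ratio, fraction in curve:
    print(f"{ratio:+.1f}  {fraction:.2f}")
```

Reading the fraction as a probability, as the slide does, an unclassified gene can be assigned the value of the curve at its own ratio.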

Page 43: Lecture 9,10: DNA Microarrays: Novel Applications

Comparison with computational predictions

Genes enriched in the membrane fraction

Only for Yeast genes, since fully sequenced

Also identify several genes that are membrane-associated but were not computationally predicted

Page 44: Lecture 9,10: DNA Microarrays: Novel Applications

Misclassifications/Exceptions

• Cytosolic protein in expt. membrane fraction
  – HAC1: spliced in cytoplasm by interaction with Ire1p, an ER transmembrane protein

• Cytosolic protein with membrane binding domain
  – Calcineurin B: associated to cytoplasmic side of plasma membrane. Ribosomes may be recruited to PM

• Alternative splice sites
  – The present microarray cannot distinguish between the various forms

Page 45: Lecture 9,10: DNA Microarrays: Novel Applications

Analysis of Microarray Data

1. Internal Validation
• Background
• Normalization
• Confidence intervals

2. Analysis of Static expression data
• PCA, Decision trees

3. Analysis of Dynamic expression data
• Clustering (Hierarchical, SOMs)

Page 46: Lecture 9,10: DNA Microarrays: Novel Applications

Analysis of Microarray Data

1. Internal Validation

RNA Diagnostics (yield, purity)
Average Yield: 3.5 +/- 0.8 µg RNA/10^8 cells (Cyanobacteria)
Average Purity: A260/280 = 1.85 +/- 0.1

Data Filtering (Removes precipitants, ghost spots, weak signals)
Filter 1: (Background_local - Background_avg) > x * SD_background
Filter 2: (Signal_local - Background_local) < y * SD_B'local

Filter 1      Filter 2          S/N Cy3   S/N Cy5   % Spots
x (SD_avg)    y (SD_B'local)                        Retained
1             0                 4.1       2.2       98%
3             0.5               5.3       2.8       91%
1             1                 7.8       3.8       75%
2             2                 10.3      4.5       57%
0.5           2                 6.7       3.4       69%
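One possible reading of the two filters is sketched below. The assumptions are significant and not confirmed by the slide: each inequality is treated as the condition for discarding a spot, SD_background is taken as the standard deviation of local backgrounds across the array, SD_B'local as a per-spot standard deviation of the local background, and all field names and numbers are invented.

```python
import statistics


def filter_spots(spots, x, y):
    """Keep spots that trigger neither filter.

    spots: list of dicts with 'signal', 'bg_local', 'bg_sd_local' (hypothetical fields).
    Filter 1 flags unusually high local background (e.g. precipitate).
    Filter 2 flags signals that barely rise above their local background.
    """
    backgrounds = [s["bg_local"] for s in spots]
    bg_avg = statistics.mean(backgrounds)
    bg_sd = statistics.stdev(backgrounds)
    kept = []
    for s in spots:
        high_background = (s["bg_local"] - bg_avg) > x * bg_sd
        weak_signal = (s["signal"] - s["bg_local"]) < y * s["bg_sd_local"]
        if not (high_background or weak_signal):
            kept.append(s)
    return kept


# Toy spots, invented for illustration.
toy_spots = [
    {"id": "a", "signal": 1200, "bg_local": 100, "bg_sd_local": 20},
    {"id": "b", "signal": 130, "bg_local": 110, "bg_sd_local": 25},  # weak signal
    {"id": "c", "signal": 900, "bg_local": 400, "bg_sd_local": 30},  # high background
    {"id": "d", "signal": 800, "bg_local": 90, "bg_sd_local": 15},
    {"id": "e", "signal": 950, "bg_local": 105, "bg_sd_local": 20},
]
kept_ids = [s["id"] for s in filter_spots(toy_spots, x=1, y=2)]
print(kept_ids, f"{100 * len(kept_ids) / len(toy_spots):.0f}% retained")
```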

Page 47: Lecture 9,10: DNA Microarrays: Novel Applications

Internal Validation of Micro-Arraying Methodology

Filtering: Precipitation, Ghost Spots

Adjustment: Signal, Background

                      Cy3           Cy5
Labeling              112 NT/Cy3    293 NT/Cy5
Brightness (Ex*QY)    57,000        70,000

Page 48: Lecture 9,10: DNA Microarrays: Novel Applications

Signal/Noise: S/N = (Signal_i - Background_i)/σ_back

For S/N > 1.96 the signal is different from the background at the 95% confidence level

Labeling Diagnostics

        Extinction     Quantum      Brightness   Avg. Incorp'n
        Coeff. (EC)    Yield (QY)   (EC*QY)      (NT/Cy)
Cy3:    150,000        0.38         57,000       112 +/- 47
Cy5:    250,000        0.28         70,000       293 +/- 154

Ratio NTCy5/NTCy3
Molar         2.62
Brightness    3.21
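The numbers on this slide can be checked with a few lines of arithmetic. The sketch below uses only values from the table; the variable names are hypothetical, and the observation that 3.21 equals the molar ratio scaled by the Cy5/Cy3 brightness ratio is an inference from the numbers, not something the slide states.

```python
def signal_to_noise(signal, background, sd_background):
    """S/N = (Signal_i - Background_i) / sigma_back."""
    return (signal - background) / sd_background


# Values taken from the Labeling Diagnostics table.
dyes = {
    "Cy3": {"ec": 150_000, "qy": 0.38, "nt_per_dye": 112},
    "Cy5": {"ec": 250_000, "qy": 0.28, "nt_per_dye": 293},
}

# Brightness = extinction coefficient * quantum yield
brightness = {name: d["ec"] * d["qy"] for name, d in dyes.items()}

molar_ratio = dyes["Cy5"]["nt_per_dye"] / dyes["Cy3"]["nt_per_dye"]
brightness_weighted_ratio = molar_ratio * brightness["Cy5"] / brightness["Cy3"]

print(f"Brightness Cy3 = {brightness['Cy3']:.0f}, Cy5 = {brightness['Cy5']:.0f}")
print(f"Molar ratio = {molar_ratio:.2f}, brightness ratio = {brightness_weighted_ratio:.2f}")

# A hypothetical spot: 500 counts of signal over a background of 100 +/- 50.
print(signal_to_noise(500, 100, 50) > 1.96)
```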

Page 49: Lecture 9,10: DNA Microarrays: Novel Applications

Internal Validation of Micro-Arraying Methodology

How Confident Are We That Signal is not Noise?

Histogram: # Spots (0-250) versus Confidence, with Confidence = (Signal - Background)/SD_background.

How Repeatable are Our Results?

CV = 30-45%

95% Confidence Interval = 1.6-1.9 = Intensity Sample / Intensity Control

Page 50: Lecture 9,10: DNA Microarrays: Novel Applications

Normalization:

1. By the average or total signal for each fluorophore (accounts for variations in brightness and total RNA)

• Values now are reported as fractions of total RNA. This may change under certain conditions (partial arrays)

2. By the average of rRNA genes. They can change too

3. Backgrounds: BCy3 / BCy5

4. Specific brightness: BRCy3 / BRCy5

5. Total signal: SCy3 / SCy5

6. Total signal minus background: ((S-B)3 / (S-B)5)

Is class discovery dependent upon filtering and normalization?
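Option 1, normalizing by the total signal of each fluorophore, can be sketched as follows. The assumptions are labeled: the function name is hypothetical, the intensities are invented and already background-subtracted, and Cy5 is arbitrarily treated as the sample channel with Cy3 as the control.

```python
import math


def normalize_by_total(cy3, cy5):
    """Scale each channel to fractions of its total, then return log2(Cy5/Cy3) per spot."""
    total3, total5 = sum(cy3), sum(cy5)
    return [math.log2((s5 / total5) / (s3 / total3)) for s3, s5 in zip(cy3, cy5)]


# Toy intensities, invented for illustration.
cy3 = [100.0, 200.0, 300.0, 400.0]

# Cy5 uniformly twice as bright: normalization removes the dye effect entirely.
uniform = normalize_by_total(cy3, [200.0, 400.0, 600.0, 800.0])
print([round(v, 3) for v in uniform])

# One strongly induced gene changes the total, so every other gene looks repressed.
skewed = normalize_by_total(cy3, [200.0, 400.0, 600.0, 2800.0])
print([round(v, 3) for v in skewed])
```

The second call illustrates the caveat in the bullet above: values are fractions of the total, so a large change in a few genes, or a partial array, shifts the apparent ratios of all the others.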

Page 51: Lecture 9,10: DNA Microarrays: Novel Applications

Internal Validation of Micro-Arraying Methodology

Histogram: # of Genes (0-200) versus Log2 (Sample/Control), from about -5.93 to 5.53, with Avg. Induced and Avg. Repressed marked.

Series (Filtering and Adjustment):
• Filtered then Adjusted by Average Signal
• Filtered then Adjusted by Background and Brightness
• No Filter, Adjusted by Background and Brightness

Page 52: Lecture 9,10: DNA Microarrays: Novel Applications

Analysis of Microarray Data

1. Internal Validation
• Background
• Normalization
• Confidence intervals

2. Analysis of Static expression data
• PCA, Decision trees

Key Questions
• Sample classification-Diagnosis
• Identification of discriminatory genes

3. Analysis of Dynamic expression data
• Clustering (Hierarchical, SOMs)

Page 53: Lecture 9,10: DNA Microarrays: Novel Applications

Data Analysis and Pattern Classification

Problem-1: Consider N samples and M genes with their corresponding expression levels, $e_i$, where i = 1, …, M. $M_1$ of these tissues are characterized as “Healthy”, while the other $M_2$ are labeled as “Pathological”. Find the set of discriminatory genes whose expression levels can diagnose the state, i.e. healthy or pathological, of a new sample tissue.

Feature Space: The space of expression levels for the M genes, i.e. $FS = \{e_1, e_2, e_3, \ldots, e_{M-1}, e_M\}$

Class: A set of samples characterized by the same label, e.g. $C_1$ = “Healthy” and $C_2$ = “Pathological”.

Pattern: The specific M-tuple of expression levels which characterizes a tissue as belonging to a specific class, i.e. $p^{(2)} = \{e^{(2)}_1, e^{(2)}_2, e^{(2)}_3, \ldots, e^{(2)}_{M-1}, e^{(2)}_M\}$, the pattern for “Pathological” tissues.

Page 54: Lecture 9,10: DNA Microarrays: Novel Applications

Data Analysis and Pattern Classification

Pattern Classification: The process through which the feature space, FS, is partitioned into K exclusive regions, $FS_i$, i = 1, 2, …, K. Thus,

$FS_i \cap FS_j = \emptyset$ for $i \neq j$, and $\bigcup_{i=1}^{K} FS_i = FS$

Discriminant Functions: $d(p) = d(e_1, e_2, e_3, \ldots, e_{M-1}, e_M)$ define the partition of the feature space into the K regions.
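As one concrete example of discriminant functions partitioning a feature space, the sketch below uses a nearest-centroid rule: d_k(p) is the negative squared distance from pattern p to the mean pattern of class k, and p is assigned to the class with the largest d_k. This particular form is an illustrative choice, not one prescribed by the lecture, and the expression values are invented.

```python
def centroid(patterns):
    """Mean expression level of each gene over a list of patterns."""
    n_genes = len(patterns[0])
    return [sum(p[j] for p in patterns) / len(patterns) for j in range(n_genes)]


def train(examples):
    """examples: dict mapping class label -> list of patterns. Returns class centroids."""
    return {label: centroid(patterns) for label, patterns in examples.items()}


def discriminant(pattern, center):
    """d_k(p) = -||p - mu_k||^2, so larger means closer to class k."""
    return -sum((e - c) ** 2 for e, c in zip(pattern, center))


def classify(pattern, centroids):
    """Assign the pattern to the class whose discriminant is largest."""
    return max(centroids, key=lambda label: discriminant(pattern, centroids[label]))


# Toy training patterns over M = 3 genes, invented for illustration.
training = {
    "Healthy": [[1.0, 5.0, 2.0], [1.2, 4.8, 2.1]],
    "Pathological": [[3.0, 1.0, 2.0], [2.8, 1.2, 1.9]],
}
centroids = train(training)
print(classify([1.1, 4.9, 2.0], centroids))
print(classify([2.9, 1.5, 2.0], centroids))
```

Every point of the feature space gets exactly one label, so the regions are exclusive and their union is the whole space, apart from ties on the boundaries.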

Page 55: Lecture 9,10: DNA Microarrays: Novel Applications

Page 56: Lecture 9,10: DNA Microarrays: Novel Applications

Data Analysis and Pattern Discovery

• Problem-2: Consider N samples and M genes with their corresponding expression levels, e_i, where i = 1, …, M. “Discover” the patterns in gene expression levels which are common in a number of samples, i.e. find the groups of samples, each of which is characterized by a common pattern in gene expression, and define this common pattern of gene expression levels for each group of samples.

• Problem-3: Consider one type of sample and the gene expression levels for M genes over a period of L time points. “Discover” the patterns in gene expression levels which are common for a particular group of genes, and cluster the genes with similar patterns into the same group.
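A very small version of Problem-3 can be sketched by grouping genes whose time profiles are strongly correlated. The assumptions are labeled: Pearson correlation is used as the similarity measure, genes are merged by a simple single-linkage threshold rather than a full hierarchical or SOM procedure, and the profiles and gene names are invented.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length profiles (neither may be constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def cluster_genes(profiles, min_corr):
    """Single-linkage grouping: genes linked by a correlation >= min_corr share a cluster."""
    names = list(profiles)
    parent = {g: g for g in names}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_corr:
                parent[find(a)] = find(b)

    groups = {}
    for g in names:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Toy profiles over L = 5 time points, invented for illustration.
toy_profiles = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],
    "geneC": [5, 4, 3, 2, 1],
    "geneD": [10, 8, 6.5, 4, 2],
    "geneE": [1, 3, 1, 3, 1],
}
clusters = cluster_genes(toy_profiles, min_corr=0.9)
print(clusters)
```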

Page 57: Lecture 9,10: DNA Microarrays: Novel Applications

Data Analysis and Pattern Classification

Training: The process through which one determines the discriminant functions, using past examples of “pattern”-“class” associations, i.e. associations between pattern $p^{(i)} = \{e^{(i)}_1, e^{(i)}_2, e^{(i)}_3, \ldots, e^{(i)}_{M-1}, e^{(i)}_M\}$ and class $C^{(i)}$

Types of Problems:
• Static: when the gene expression levels represent the expression at a single time.
• Dynamic, or Time-Dependent: when the expression levels are measured over a period of time at various time intervals.
  • Equal sampling intervals.
  • Unequal sampling intervals.

Page 58: Lecture 9,10: DNA Microarrays: Novel Applications

Data Analysis and Pattern Classification

• Issues to Resolve:
  – Labeling the various samples
  – Representation: Selecting the distinguishing features for classification; particularly important for time-dependent data, e.g. do you use the values, or the time derivatives of expression levels for classification?
  – Selecting the form of the discriminant function
  – Do you have statistically “enough” data for training?
  – Do you have enough data for testing?
  – What is the “noise” in your measurements?
  – What is the sensitivity of the generated discriminant function?
  – What is the robustness of the resulting classification scheme?

Page 59: Lecture 9,10: DNA Microarrays: Novel Applications

Information Theory: Decision Trees in Pattern Classification

Let N be the total number of examples (e.g. samples) and $M_i$ the number of samples in each of the K classes. The Shannon entropy provides a measure of the information content in the data set,

$$I(M_1, M_2, \ldots, M_K) = -\sum_{i=1}^{K} \frac{M_i}{N} \log_2 \frac{M_i}{N}$$

• If all examples belong in the same class then I = 0.
• The smaller the entropy, the less variety of classes (more order) in the data set.

Split the data into two groups G1 and G2 with M(1) and M(2) examples (samples) in each group. Compute the information content for each group and for the whole set of examples.
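The entropy calculation and the comparison between the whole set and the two groups can be sketched as below. The function names and class counts are invented for illustration; scoring a split by the entropy of the whole set minus the size-weighted entropy of the groups is the usual decision-tree criterion (information gain), assumed here rather than taken from the slide.

```python
import math


def entropy(counts):
    """Shannon entropy in bits for a list of per-class example counts."""
    n = sum(counts)
    return -sum((m / n) * math.log2(m / n) for m in counts if m > 0)


def information_gain(parent_counts, group_counts):
    """Entropy of the whole set minus the size-weighted entropy of the groups."""
    n = sum(parent_counts)
    weighted = sum(sum(group) / n * entropy(group) for group in group_counts)
    return entropy(parent_counts) - weighted


# Toy counts, invented: 5 "Healthy" and 5 "Pathological" samples.
whole_set = [5, 5]
perfect_split = [[5, 0], [0, 5]]  # G1 all Healthy, G2 all Pathological
poor_split = [[3, 2], [2, 3]]     # both groups still mixed

print(entropy(whole_set))
print(information_gain(whole_set, perfect_split))
print(round(information_gain(whole_set, poor_split), 3))
```

A gene whose expression threshold produces the perfect split removes all of the uncertainty about the class, while the poor split leaves it almost unchanged.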

Page 60: Lecture 9,10: DNA Microarrays: Novel Applications

Page 61: Lecture 9,10: DNA Microarrays: Novel Applications

Page 62: Lecture 9,10: DNA Microarrays: Novel Applications

Figure 1: Selection of relevant genes using the loadings on the principal components.

Panels (a)-(d) include: Loading plot of the Genes (Principal Component 1 vs. Principal Component 2) and Score Plot of the Tissue Samples (Principal Component 1 vs. Principal Component 2).

Sample labels: Colon, Esoph, Lu-1, Lu-2, Lu-3, Lu-4, Stom, Blood, VU-1, VU-2, VU-3, Cervix, Prost, Endo-1, Endo-2, Plc-1, Plc-2, Mu-1, Mu-2, Mu-3, Testes, Spleen, Myo-1, Myo-2, Kid-1, Kid-2, Kid-3, Kid-4, Br-Ag, Br4, Br-MC, Br-HC, Br-CP, Br-Ce, Ovary1, Ovary2, Li-1, Li-2, Bre-1, Bre-2

Page 63: Lecture 9,10: DNA Microarrays: Novel Applications

Figure 2: Projection of the samples using the genes in the specific structures observed in the Loading plot. (a) The projection using genes in structure A separated the skeletal muscle tissue. The genes in this structure contained several troponins and skeletal muscle related genes. (b) The projection using genes in structure B separated the liver and lung. These genes contained albumins and apolipoproteins, among others. (c) The genes in structure C separated the brain samples from the remaining tissues. The structure contained several brain specific genes and ribosome-related genes.

Panels: Loading Plot of the Genes (structures A, B, C marked) and Score Plot of the Samples (a)-(c), each Principal Component 1 vs. Principal Component 2.

Sample labels: Spleen, BR-AG, PE1, VU2, BR-HC, Liver, CER2, Blood, PE2, BR-CA, Kid-3, Lung, Ovary, Sk Mus, CER1, Kid-2, Myo-1, Plac, Kid-4, BR-L4, VU1, Kid-1, Myo-2, BR-MOT

Page 64: Lecture 9,10: DNA Microarrays: Novel Applications

Histogram of the angles of the genes in the loading plot: Number of Genes (0-30) versus Angle (0-1.5), with Structure A, Structure B, and Structure C marked.

Score plots (Principal Component 1 vs. Principal Component 2) with sample labels Li-1, Li-2, NLi-1, NLi-2, NLi-3 and Mu-1, Mu-2, Mu-3, NMu-1, NMu-2, NMu-3.

