Gene Expression Index

Stat 115

2012


Outline

• Gene expression index
  – MAS4, average
  – MAS5, Tukey Biweight
  – dChip, model based, multi-array
  – RMA, model based, multi-array
  – Method comparison
    • Latin Square spike-in experiment
  – Importance of probe mapping

These are perhaps the most popular of the many methods for normalizing and computing expression measures from Affymetrix data. Currently over 50 methods are described and compared at http://affycomp.biostat.jhsph.edu/.


cDNA Microarrays

• Fold change: ratio Cy5 / Cy3

• When fold change is negative

Log2(Cy5 / Cy3), genes (rows) by arrays (columns):

Gene  array 1  array 2  array 3  array 4  array 5  …
1        0.46     0.30     0.80     1.51     0.90  ...
2       -0.10     0.49     0.24     0.06     0.46  ...
3        0.15     0.74     0.04     0.10     0.20  ...
4       -0.45    -1.03    -0.79    -0.56    -0.32  ...
5       -0.06     1.06     1.35     1.09    -1.09  ...
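Each entry in the table is a log2 ratio of the two channel intensities. As a minimal sketch (the function name and the example intensities are illustrative, not from the slides), the value for one spot could be computed as:

```python
import math

def log2_ratio(cy5, cy3):
    """Return log2(Cy5 / Cy3) for one spot; both intensities must be positive."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)

print(log2_ratio(2000, 1000))  # two-fold higher in Cy5 -> 1.0
print(log2_ratio(1000, 2000))  # two-fold lower in Cy5  -> -1.0
```

On the log2 scale, a two-fold increase and a two-fold decrease are symmetric around zero, which is why the table contains both positive and negative values.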


Affymetrix Microarray Expression Index

• How to summarize probes in a probeset?

Brighter PM probes usually carry more information, but not always (e.g., cross-hybridization)


MAS4

• GeneChip® older software, Microarray Suite (MAS) 4.0, uses AvgDiff:

$$\mathrm{AvgDiff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$$

• A: a set of suitable pairs chosen by software
  – Remove highest/lowest
  – Calculate mean, sd from remaining probes
  – Eliminate probes more than 3 sd from mean

• Drawback (naïve algorithm):
  – Can omit 30-40% probes
  – Can give negative values
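The trimming steps can be sketched as follows. This is one reading of the slide, not Affymetrix's actual code: it assumes the highest and lowest differences are set aside only when computing the mean and sd, and that the set A then contains every pair within 3 sd of that mean. The function name and the example data are hypothetical.

```python
from statistics import mean, stdev

def avg_diff(pm, mm):
    """AvgDiff over the retained pair set A (needs at least 4 probe pairs)."""
    diffs = [p - m for p, m in zip(pm, mm)]
    # Step 1: set aside one highest and one lowest difference.
    trimmed = sorted(diffs)[1:-1]
    # Step 2: mean and sd from the remaining differences.
    center, spread = mean(trimmed), stdev(trimmed)
    # Step 3: A = pairs whose difference lies within 3 sd of that mean.
    kept = [d for d in diffs if abs(d - center) <= 3 * spread]
    return mean(kept)

# Hypothetical probe set with one very bright PM probe (900).
pm = [120, 200, 180, 150, 900, 160]
mm = [100, 120, 110, 100, 100, 170]
print(avg_diff(pm, mm))
```

When MM exceeds PM for most pairs, the same function returns a negative value, which is the second drawback listed above.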


MAS5

• GeneChip® newest version

• CT* (change threshold): a version of MM that is never bigger than PM
  – If MM < PM, CT* = MM
  – If MM > PM, estimate typical case MM for PM
    • Tukey biweight of MMs with similar PM values (~70% of PM)
  – If typical MMs > PM, set CT* = PM minus a small positive amount

• Robust weighting to down-weight outliers:

$$\mathrm{signal} = \mathrm{TukeyBiweight}\{\log(PM_j - CT^*_j)\}$$
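The robust average in the signal formula can be sketched with a one-step Tukey biweight. The tuning constant `c=5` and the small `eps` are assumptions here (values commonly quoted for MAS5, not stated on the slide), as are the log base 2 and the function names.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    center = median(values)
    mad = median([abs(v - center) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - center) / (c * mad + eps)  # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mas5_style_signal(pm, ct):
    """Tukey biweight of log2(PM_j - CT*_j); requires PM_j > CT*_j for every pair."""
    return tukey_biweight([math.log2(p - t) for p, t in zip(pm, ct)])

# The outlier 50 gets weight 0, so the estimate stays near 10.
print(tukey_biweight([10, 10.1, 9.9, 10, 50]))
```

Because CT* is constructed to be smaller than PM, the logarithm is always defined, which avoids the negative values that AvgDiff can produce.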


Li & Wong (dChip)

Important observation: relative values of probes within a probeset are very stable across multiple samples.


Model-Based Expression Index

• Look at multiple samples at a time, give different probes a different weight

• Each probe signal is proportional to
  – Amount of target sample: θi
  – Affinity of specific probe sequence to the target: φj

[Figure: signals of probes 1, 2, 3 in sample 1 and sample 2]


Li & Wong (dChip)

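• Model:

$$PM_{ij} - MM_{ij} = \theta_i \phi_j + \varepsilon_{ij}$$

  – θi: concentration in sample i
  – φj: probe affinity of probe j
  – εij: error

• Iteratively estimate θi and φj to minimize εij

• Try to minimize the sum of squared errors

$$
\begin{array}{c|cccc}
 & \text{Probe 1 } (\phi_1) & \text{Probe 2 } (\phi_2) & \text{Probe 3 } (\phi_3) & \cdots \\ \hline
\text{Sample 1 } (\theta_1) & (PM - MM)_{11} & (PM - MM)_{12} & (PM - MM)_{13} & \cdots \\
\text{Sample 2 } (\theta_2) & (PM - MM)_{21} & (PM - MM)_{22} & (PM - MM)_{23} & \cdots \\
\text{Sample 3 } (\theta_3) & (PM - MM)_{31} & (PM - MM)_{32} & (PM - MM)_{33} & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array}
$$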
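The iterative fit can be sketched as alternating least squares on the matrix of PM − MM differences. This is a simplified illustration, not the dChip implementation: it assumes the identifiability constraint that the squared φj sum to the number of probes, and it omits the outlier detection that dChip performs. Function and variable names are hypothetical.

```python
def fit_li_wong(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares."""
    n_samples, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_samples
    for _ in range(n_iter):
        # Given phi, each theta_i is a least-squares slope through the origin.
        phi_ss = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / phi_ss
                 for i in range(n_samples)]
        # Given theta, each phi_j is fitted the same way.
        theta_ss = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_samples)) / theta_ss
               for j in range(n_probes)]
        # Rescale so that sum(phi^2) equals the number of probes;
        # theta absorbs the inverse scale, leaving theta_i * phi_j unchanged.
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]
        theta = [t / scale for t in theta]
    return theta, phi

# Hypothetical noise-free data: 3 samples x 3 probes.
true_theta, true_phi = [100.0, 200.0, 400.0], [0.5, 1.0, 1.5]
y = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_li_wong(y)
print(theta, phi)
```

The fitted θi serve as the expression index for each sample, and probes with small φj contribute little to it.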


RMA = Robust Multi-chip Analysis

• Irizarry & Speed, 2003

• Eliminates MM probes

• Probe intensity background adjustment

• Quantile normalize the background adjusted PM

• Take Log of PM

• Robust probe summary
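The quantile normalization step in the list above can be sketched as follows. This is a minimal version that breaks ties by position rather than averaging them, and the function name is illustrative.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted values at each rank."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: average across arrays at each rank.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda idx: a[idx])  # indices, smallest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays with three probes each.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# -> [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After this step every array has identical sorted values, so remaining differences between arrays reflect only the ordering of probes within each array.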


RMA Background Subtraction

• Signal + BG = PM

• Signal ~ exponential; BG ~ normal

[Figure: Signal + Noise = Observed]


RMA Background Subtraction

• BG distribution


Why Log(PM)

• Captures the fact that higher value probes are more variable

• Assume probe noise is comparable on log scale


RMA

• For each probe set, PMij = θi φj, so

$$\log(PM_{ij}) = \log(\theta_i) + \log(\phi_j)$$

• Fit the model:

$$\log_2 n(PM_{ij} - bg) = a_i + b_j + \varepsilon_{ij}$$

  – ai is expression index, bj is probe effect
  – log2 n() stands for logarithm after quantile normalization of n samples

• Iteratively refit ai and bj (similar to dChip)
  – Main difference is to minimize error at log PM

RMA model fitting: Median Polish

• For a given probe set with J probe pairs, let yij denote the background-adjusted, base-2-logged, and quantile-normalized value for GeneChip i and probe j.

• Assume yij = μi + αj + eij where α1 + α2 + ... + αJ = 0.
  – μi: gene expression of the probe set on GeneChip i
  – αj: probe affinity effect for the jth probe in the probe set
  – eij: residual

• Perform Tukey’s Median Polish on the matrix of yij values with yij in the ith row and jth column.


An Example (from Dan Nettleton)

Suppose the following are background-adjusted, log2-transformed, quantile-normalized PM intensities for a single probe set. Determine the final RMA expression measures for this probe set.

                Probe
GeneChip    1   2   3   4   5
    1       4   3   6   4   7
    2       8   1  10   5  11
    3       6   2   7   8   8
    4       9   4  12   9  12
    5       7   5   9   6  10


An Example (continued)

Original matrix, with row medians:

  4   3   6   4   7  |   4
  8   1  10   5  11  |   8
  6   2   7   8   8  |   7
  9   4  12   9  12  |   9
  7   5   9   6  10  |   7

Matrix after removing row medians:

  0  -1   2   0   3
  0  -7   2  -3   3
 -1  -5   0   1   1
  0  -5   3   0   3
  0  -2   2  -1   3


An Example (continued)

Current matrix, with column medians in the last row:

  0  -1   2   0   3
  0  -7   2  -3   3
 -1  -5   0   1   1
  0  -5   3   0   3
  0  -2   2  -1   3
 -------------------
  0  -5   2   0   3   column medians

Matrix after subtracting column medians:

  0   4   0   0   0
  0  -2   0  -3   0
 -1   0  -2   1  -2
  0   0   1   0   0
  0   3   0  -1   0


An Example (continued)

Current matrix, with row medians:

  0   4   0   0   0  |   0
  0  -2   0  -3   0  |   0
 -1   0  -2   1  -2  |  -1
  0   0   1   0   0  |   0
  0   3   0  -1   0  |   0

Matrix after removing row medians:

  0   4   0   0   0
  0  -2   0  -3   0
  0   1  -1   2  -1
  0   0   1   0   0
  0   3   0  -1   0


An Example (continued)

Current matrix, with column medians in the last row:

  0   4   0   0   0
  0  -2   0  -3   0
  0   1  -1   2  -1
  0   0   1   0   0
  0   3   0  -1   0
 -------------------
  0   1   0   0   0   column medians

Matrix after subtracting column medians:

  0   3   0   0   0
  0  -3   0  -3   0
  0   0  -1   2  -1
  0  -1   1   0   0
  0   2   0  -1   0


An Example (continued)

  0   3   0   0   0
  0  -3   0  -3   0
  0   0  -1   2  -1
  0  -1   1   0   0
  0   2   0  -1   0

All row medians and column medians are 0. Thus the median polish procedure has converged. The above is the residual matrix that we will subtract from the original matrix to obtain the fitted values.


An Example (continued)

Original matrix:

  4   3   6   4   7
  8   1  10   5  11
  6   2   7   8   8
  9   4  12   9  12
  7   5   9   6  10

Residuals from median polish:

  0   3   0   0   0
  0  -3   0  -3   0
  0   0  -1   2  -1
  0  -1   1   0   0
  0   2   0  -1   0

Matrix of fitted values (original matrix minus residuals), with row means:

  4   0   6   4   7  |  4.2 = μ̂1
  8   4  10   8  11  |  8.2 = μ̂2
  6   2   8   6   9  |  6.2 = μ̂3
  9   5  11   9  12  |  9.2 = μ̂4
  7   3   9   7  10  |  7.2 = μ̂5

The row means are the RMA expression measures for the 5 GeneChips.
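The worked example can be reproduced with a short sketch. It follows the recipe used in the example (row medians first, then column medians, repeated until all medians are zero, then row means of the fitted values). The function names are illustrative, and production RMA implementations may differ in details such as iteration limits.

```python
from statistics import median

def median_polish(y, max_iter=10, tol=1e-12):
    """Alternately sweep out row and column medians; return the residual matrix."""
    resid = [row[:] for row in y]
    for _ in range(max_iter):
        row_meds = [median(row) for row in resid]
        resid = [[v - m for v in row] for row, m in zip(resid, row_meds)]
        col_meds = [median(col) for col in zip(*resid)]
        resid = [[v - m for v, m in zip(row, col_meds)] for row in resid]
        # Converged when a full sweep removes nothing.
        if all(abs(m) <= tol for m in row_meds + col_meds):
            break
    return resid

def rma_summary(y):
    """Row means of fitted values, where fitted = original - residual."""
    resid = median_polish(y)
    fitted = [[v - r for v, r in zip(y_row, r_row)] for y_row, r_row in zip(y, resid)]
    return [sum(row) / len(row) for row in fitted]

y = [[4, 3, 6, 4, 7],
     [8, 1, 10, 5, 11],
     [6, 2, 7, 8, 8],
     [9, 4, 12, 9, 12],
     [7, 5, 9, 6, 10]]
print(rma_summary(y))  # approximately [4.2, 8.2, 6.2, 9.2, 7.2]
```

Because medians rather than means are swept out, a single aberrant probe (such as probe 2 on GeneChip 1 in this example) ends up in the residuals instead of distorting the expression measure.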


Method Comparison Standard

• Spike-ins: introduce markers with known concentration (intensity) to RNA samples
  – Should cover a broad range of concentrations
  – Run two samples with and without spike-in, see whether algorithm can detect the spike-in (differential expression)

• Dilutions:
  – Serial dilutions: 1:2, 1:4, 1:8…

• Latin square spike-in captures both approaches above

• Compare both accuracy (qualitatively) and expression index (quantitatively)


Latin Square Spike-ins
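In a Latin square design each spike-in group appears at each concentration exactly once across the experiments, and each experiment contains every concentration exactly once. A cyclic shift is one simple way to build such a layout. The sketch below is illustrative only: the concentrations are hypothetical two-fold dilutions, not the ones used in the actual experiment.

```python
def latin_square_design(concentrations):
    """Cyclic Latin square: entry [experiment][group] is that group's spike-in concentration."""
    n = len(concentrations)
    return [[concentrations[(group + experiment) % n] for group in range(n)]
            for experiment in range(n)]

# Hypothetical serial two-fold dilutions (arbitrary units).
design = latin_square_design([0, 1, 2, 4, 8])
for experiment in design:
    print(experiment)
```

Comparing any two experiments then gives known fold changes for every spike-in group, over a broad range of absolute concentrations.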


Method Comparison of Spike-in

[Figure: four panels (MAS4, MAS5, dChip, RMA); red numbers indicate spiked genes]


Method Comparison Conclusion

• No one uses MAS4 now

• With fold change, RMA > dChip > MAS5

• With p-value, RMA ~ MAS5 > dChip

• MAS 5.0 does a good job on abundant genes

• dChip and RMA do better on less abundant genes

• Affy developed multi-chip model-based PLIER, currently open source, although no documentation

• All five models are implemented in Bioconductor (open source R packages)


214019_at: CCND1


Probe Mapping in Affymetrix Expression arrays

• Inconsistencies in ~5% of NetAffx probe-to-gene annotations (Perez-Iratxeta et al. 2005).

• Remapping all the probes with documented human transcripts resulted in the redefinition of ~37% of probes in Affy’s newest U133 Plus 2.0 array (Harbig et al. 2005).
  – Provide new and better .cdf file for probe mapping

• Evolving gene/transcript definitions can cause ~30% difference in the differentially expressed genes (Dai et al. 2005).
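Remapping amounts to regrouping the same probes under updated targets before summarizing them. A toy sketch of that regrouping is below. All probe IDs, probe set names and gene names are hypothetical, and real remapping works from sequence alignments and a replacement .cdf file rather than a Python dictionary.

```python
from collections import defaultdict

def regroup_probes(probe_to_target):
    """Group probe IDs by target, dropping probes with no valid target (None)."""
    groups = defaultdict(list)
    for probe_id, target in probe_to_target.items():
        if target is not None:
            groups[target].append(probe_id)
    return dict(groups)

def fraction_redefined(old_map, new_map):
    """Share of probes whose target differs between the two annotations."""
    changed = sum(1 for p in old_map if new_map.get(p) != old_map[p])
    return changed / len(old_map)

# Hypothetical annotations for four probes.
old_map = {"p1": "setA", "p2": "setA", "p3": "setB", "p4": "setB"}
new_map = {"p1": "geneX", "p2": "geneX", "p3": "geneX", "p4": None}
print(regroup_probes(new_map))
print(fraction_redefined(old_map, new_map))
```

Any expression index computed afterwards (MAS5, dChip, RMA) uses the new groups, which is how a change in annotation alone can change the list of differentially expressed genes.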


Acknowledgment

• Terry Speed, Rafael Irizarry & group
• Kevin Coombes & Keith Baggerly
• Erick Rouchka
• Wing Wong & Cheng Li
• Mark Reimers
• Erin Conlon
• Larry Hunter
• Zhijin Wu
• Wei Li

