Microarray normalization, error models, quality Wolfgang Huber EMBL Brixen 15 June 2009.


Wolfgang Huber, EMBL
Brixen, 15 June 2009

Brief history

Late 1980s: Poustka, Lennon, Lehrach: cDNAs spotted on nylon membranes

1990s: Affymetrix adapts microchip production technology for in situ oligonucleotide synthesis ("commercial and heavily patent-fenced")

1990s: Brown lab in Stanford develops two-colour spotted array technology ("open and free")

1998: Yeast cell cycle expression profiling on spotted arrays (Spellman) and Affymetrix (Cho)

1999: Tumor type discrimination based on mRNA profiles (Golub)

2000-ca. 2004: Affymetrix dominates the commercial microarray market

Since ~2003: Nimblegen, Illumina, Agilent (and many others)

Throughout the 2000s: CGH, CNVs, SNPs, ChIP, tiling arrays

Since ~2007: Next-generation sequencing (454, Solexa, ABI SOLiD, ...)


Oligonucleotide microarrays

Base Pairing

The ability to use hybridisation for constructing specific and sensitive probes at will is unique to DNA (cf. proteins, RNA, metabolites)

Oligonucleotide microarrays

[Figure: GeneChip, 1.28 cm wide, with up to 6.5 million different probe patches of 5 µm each. Each patch holds millions of copies of a specific oligonucleotide probe molecule. Labels: target (single stranded cDNA), oligonucleotide probe, hybridized probe cell, image of array after hybridisation and staining.]

Probe sets

Terminology for transcription arrays

Each target molecule (transcript) is represented by several oligonucleotides of (intended) length 25 bases

Probe: one of these 25-mer oligonucleotides
Probe set: a collection of probes (e.g. 11) targeting the same transcript

MGED/MIAME: "probe" is ambiguous!
Reporter: the sequence
Feature: a physical patch on the array with molecules intended to have the same reporter sequence (one reporter can be represented by multiple features)

Image analysis

• several dozen pixels per feature
• segmentation
• summarisation into one number representing the intensity level for this feature

CEL file

array data

[Figure: samples (mRNA from tissue biopsies, cell lines) are hybridised to arrays (probes = gene-specific DNA strands). Fluorescent detection of the amount of sample-probe binding yields a table of intensity values with one row per gene (MCAM, LAMA4, CASP4, ALDH4, VIM, ErbB2) and one column per sample (tissue A, tissue B, tissue C).]

Why do you need 'normalisation'?

Systematic drift effects

[Figure: from the lymphoma dataset in the vsn package; Alizadeh et al., Nature 2000]

MA-plot

$$M = \log_2 R - \log_2 G = \log_2 \frac{R}{G}, \qquad A = \frac{1}{2}\left(\log_2 R + \log_2 G\right) = \frac{1}{2}\log_2 (RG)$$

[Figure: M plotted against A]
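As a small illustration of the definitions on this slide, the following sketch computes M and A for a handful of made-up red and green intensities (the numbers are hypothetical, not taken from any dataset in the talk).

```r
# Hypothetical red (R) and green (G) channel intensities for five features
R <- c(1200, 340, 5600, 90, 15000)
G <- c(1000, 400, 2800, 95, 14000)

M <- log2(R) - log2(G)          # log-ratio, equals log2(R / G)
A <- (log2(R) + log2(G)) / 2    # average log-intensity, equals log2(R * G) / 2

plot(A, M, main = "MA-plot")
abline(h = 0, lty = 2)          # M = 0 corresponds to R = G
```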

[Figure: log2 intensity (0 to 15) for arrays / dyes 1 to 16]

[Figure: density of log2 intensity for 8 arrays from the lymphoma data (Alizadeh 2000)]

Non-linearity

[Figure: spike-in data on the log2 scale; Cope et al., Bioinformatics 2003]

ratio compression

[Figure: observed ratios for nominal 3:1, nominal 1:1 and nominal 1:3; Yue et al. (Incyte Genomics), NAR (2001) 29, e41]

A complex measurement process lies between mRNA concentrations and intensities

o RNA degradation

o quality of actual probe sequences (vs intended)

o image segmentation

o amplification efficiency

o scratches and spatial gradients on the array

o signal quantification

o reverse transcription efficiency

o cross-talk across features

o signal "preprocessing"

o hybridization efficiency and specificity

o cross-hybridisation

o labeling efficiency

o optical noise

The problem is less that these steps are ‘not perfect’; it is that they vary from array to array, experiment to experiment.


Preprocessing Terminology

Calibration, normalisation: adjust for systematic drifts associated with dye, array (and sometimes position within array)

Background correction: adjust for the non-linearity at the lower end of the dynamic range

Transformation: bring data to a scale appropriate for the analysis (e.g. logarithm; variance stabilisation)

Log-ratio: adjust for unknown scale (units) of the data

Existing approaches differ in the order in which these steps are done: some are exactly stepwise ("greedy"), others aim to gain strength by doing things simultaneously.


Why do you need statistics?

Which genes are differentially transcribed?

[Figure: log-ratios for a tumor-normal comparison and for a same-same comparison]

Statistics 101:

[Figure: bias (accuracy) and variance (precision)]

Basic dogma of data analysis

Can always increase sensitivity at the cost of specificity, or vice versa; the art is to

- optimize both, then

- find the best trade-off.

Questions

How to compare microarray intensities with each other?

How to address measurement uncertainty?

How to calibrate ("normalize") for systematic differences between samples?

How to deal with non-linearity (esp. at the lower end, "background")?

Sources of variation

amount of RNA in the biopsy
efficiencies of
- RNA extraction
- reverse transcription
- labeling
- fluorescent detection

probe purity and length distribution
spotting efficiency, spot size
cross-/unspecific hybridization
stray signal

Systematic (calibration)
o similar effect on many measurements
o corrections can be estimated from data

Stochastic (error model)
o too random to be explicitly accounted for
o remain as "noise"

Quantile normalisation

Ben Bolstad 2001

library("affydata")        # Dilution example data; also loads affy
library("preprocessCore")  # normalize.quantiles
data("Dilution")
nq = normalize.quantiles(exprs(Dilution))
nr = apply(exprs(Dilution), 2, rank)
for(i in 1:4)
  plot(nr[,i], nq[,i], pch=".", log="y", xlab="rank",
       ylab="quantile normalized", main=sampleNames(Dilution)[i])

[Figure: densities of log2(exprs(Dilution)) before, and of log2(nq) after quantile normalisation]

Quantile normalisation is: per array rank-transformation followed by replacing ranks with values from a common reference distribution

[Figure: histogram of log2(nq[, 1])]
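The description above can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions: the function name `quantile_normalise` is made up here, the reference distribution is taken to be the mean of the sorted columns, and ties are broken by order rather than averaged, so it is not a substitute for `normalize.quantiles`.

```r
# Minimal quantile normalisation for a features x arrays matrix
quantile_normalise <- function(x) {
  ranks     <- apply(x, 2, rank, ties.method = "first")  # per-array ranks
  reference <- rowMeans(apply(x, 2, sort))               # mean of the sorted columns
  out <- apply(ranks, 2, function(r) reference[r])       # replace ranks by reference values
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# two simulated arrays that differ by an offset and a scale factor
x  <- cbind(a1 = rexp(1000, 1/200), a2 = 50 + 3 * rexp(1000, 1/200))
xn <- quantile_normalise(x)
apply(xn, 2, quantile)   # both columns now have the same quantiles
```

After normalisation every array has exactly the same set of values; only the assignment of values to features differs.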


Quantile normalisation

+ Simple, fast, easy to implement

+ Always works, needs no user interaction / tuning

+ Non-parametric: can correct for quite nasty non-linearities (saturation, background) in the data

- Always "works", even if data are bad / inappropriate

- May be conservative: rank transformation loses information and may yield less power to detect differentially expressed genes

- Aggressive: if there is an excess of up- (or down) regulated genes, it removes not just technical, but also biological variation

loess normalisation

"loess" normalisation

loess (locally weighted scatterplot smoothing): an algorithm for robust local polynomial regression by W. S. Cleveland and colleagues (AT&T, 1980s) and handily available in R

Local polynomial regression

Global polynomial regression

$$y(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_p x^p$$

applied to data $(x_1, y_1), \dots, (x_n, y_n)$, with equal weights, resulting in global fit $(a_0, \dots, a_p)$

Local polynomial regression around $v$ with weights $h_b(x - v)$, resulting in local fit $(a_0(v), \dots, a_p(v))$

bandwidth b

Robust regression

[Figure: scatterplot of y against x with fits by lm and rlm]

OLS: $\sum_{i=1}^{n} (y_i - f(x_i))^2 \to \min$

M-est.: $\sum_{i=1}^{n} M(y_i - f(x_i)) \to \min$

LTS: $F\{\, y_i - f(x_i) \mid i = 1, \dots, n \,\} \to \min$, with $F$ a trimmed sum of squares
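To make the contrast between `lm` and `rlm` concrete, here is a small sketch on made-up data with a single gross outlier. It assumes the MASS package is installed; `rlm` there fits a Huber M-estimator by default.

```r
library(MASS)   # provides rlm

x <- 1:10
y <- 2 * x + 1 + 0.1 * sin(x)   # points close to the line y = 2x + 1
y[10] <- 100                    # one gross outlier

fit_ols <- lm(y ~ x)            # ordinary least squares
fit_rob <- rlm(y ~ x)           # M-estimation

coef(fit_ols)                   # slope pulled far away from 2
coef(fit_rob)                   # slope much closer to 2
```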

C. Loader, Local Regression and Likelihood, Springer Verlag

loess normalisation

[Figure: MA-plots before and after]

• local polynomial regression of M against A
• 'normalised' M-values are the residuals
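The two bullet points can be tried out on simulated data. In this sketch the intensity-dependent bias, the noise level and the `span` value are all assumptions chosen for illustration; `family = "symmetric"` asks `loess` for its robust fitting variant.

```r
set.seed(2)
n    <- 2000
A    <- runif(n, 6, 14)                     # average log2 intensity
bias <- 0.5 * sin((A - 6) / 8 * pi)         # assumed intensity-dependent dye bias
M    <- bias + rnorm(n, sd = 0.2)           # most genes unchanged: true M is 0

fit    <- loess(M ~ A, span = 0.5, degree = 1, family = "symmetric")
M_norm <- residuals(fit)                    # 'normalised' M-values

plot(A, M, pch = ".", main = "before")
lines(sort(A), fitted(fit)[order(A)], col = "red")
plot(A, M_norm, pch = ".", main = "after")
```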


local polynomial regression normalisation in >2 dimensions

n-dimensional local regression model for microarray normalisation

$$Y_{kij} = \alpha_k + \beta_{ij}(\alpha_k) + \delta_{ik} + \sigma(\alpha_k)\,\varepsilon_{kij}$$

$Y_{kij}$: log-intensity of gene $k$ in condition $i$, replicate $j$
$\alpha_k$: baseline value of gene $k$ (A-value)
$\delta_{ik}$: effect of treatment $i$ on gene $k$
$\beta_{ij}(\alpha_k)$: intensity-dependent normalisation function for array $ij$
$\sigma(\alpha_k)$: intensity-dependent error scale function
$\varepsilon_{kij}$: i.i.d. error term

An algorithm for fitting this robustly is described (roughly) in the paper. The authors only provided software as a compiled binary for Windows. The method has not found much use.

Estimating relative expression (fold-changes)

ratios and fold changes

Fold changes are useful to describe continuous changes in expression

[Figure: expression levels A = 1000, B = 1500, C = 3000, with fold changes x1.5 and x3]

But what if the gene is "off" (below detection limit) in one condition?

[Figure: expression levels A = 0, B = 200, C = 3000, with fold changes marked "?"]

ratios and fold changes

The idea of the log-ratio (base 2)
0: no change
+1: up by factor of 2^1 = 2
+2: up by factor of 2^2 = 4
-1: down by factor of 2^-1 = 1/2
-2: down by factor of 2^-2 = 1/4

What about a change from 0 to 500?
- conceptually
- noise, measurement precision

A unit for measuring changes in expression: assumes that a change from 1000 to 2000 units has a similar biological meaning to one from 5000 to 10000. ... data reduction
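A quick check of these statements, using the numbers from the slide:

```r
a <- c(1000, 5000, 0)      # expression in the first condition
b <- c(2000, 10000, 500)   # expression in the second condition
lr <- log2(b / a)
lr                         # 1, 1, Inf: the change from 0 to 500 has no finite log-ratio
```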

What is wrong with microarray data?

Many data are measured in definite units:

• time in seconds
• lengths in meters
• energy in Joule, etc.

Climb Mount Plose (2465 m) from Brixen (559 m) with weight of 76 kg, working against a gravitation field of strength 9.81 m/s^2:

(2465 - 559) · 76 · 9.81 m kg m/s^2

= 1 421 037 kg m^2 s^-2

= 1 421.037 kJ

Two component error model and variance stabilisation

The two component model

measured intensity = offset + gain × true abundance

$$y_{ik} = a_{ik} + b_{ik}\, x_k$$

$$a_{ik} = a_i + \varepsilon_{ik}$$

$a_i$: per-sample offset
$\varepsilon_{ik}$: additive noise

$$b_{ik} = b_i\, b_k \exp(\eta_{ik})$$

$b_i$: per-sample gain factor
$b_k$: sequence-wise probe efficiency
$\eta_{ik}$: multiplicative noise

The two-component model

[Figure: data on the raw scale and on the log scale, showing "additive" noise and "multiplicative" noise; B. Durbin, D. Rocke, JCB 2001]


The additive-multiplicative error model

Trey Ideker et al.: JCB (2000)

David Rocke and Blythe Durbin: JCB (2001), Bioinformatics (2002)

Use for robust affine regression normalisation: W. Huber, Anja von Heydebreck et al. Bioinformatics (2002).

For background correction in RMA: R. Irizarry et al., Biostatistics (2003).

Parameterization

$$y = a + \varepsilon + b \cdot x \cdot (1 + \eta)$$

$$y = a + \varepsilon + b \cdot x \cdot e^{\eta}$$

two practically equivalent forms ($\eta \ll 1$)

a: average background
o on one array, for one color, the same for all features
o also dependent on the reporter sequence

$\varepsilon$: background fluctuations
o same distribution in whole experiment
o different distributions

b: average gain factor
o on one array, for one color, the same for all features
o intensity dependent

$\eta$: gain fluctuations
o same distribution in whole experiment
o different distributions

variance stabilizing transformations

$X_u$ a family of random variables with $E(X_u) = u$ and $\mathrm{Var}(X_u) = v(u)$. Define

$$f(x) = \int^{x} \frac{du}{\sqrt{v(u)}}$$

Then $\mathrm{Var}\, f(X_u)$ does not depend on $u$

Derivation: linear approximation, relies on smoothness of v(u).
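The linear approximation mentioned on this slide can be written out as follows. This is a sketch of the standard argument (often called the delta method), with the target variance set to 1 as an arbitrary choice of scale:

```latex
f(X_u) \approx f(u) + f'(u)\,(X_u - u)
\quad\Longrightarrow\quad
\mathrm{Var}\, f(X_u) \approx f'(u)^2 \, v(u).

\text{Requiring } f'(u)^2\, v(u) = 1 \text{ for all } u \text{ gives }
f'(u) = \frac{1}{\sqrt{v(u)}},
\qquad
f(x) = \int^{x} \frac{du}{\sqrt{v(u)}}.
```

The approximation is good when $f$ is close to linear over the range in which $X_u$ fluctuates, which is where the smoothness of $v(u)$ comes in.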

[Figure: variance stabilizing transformation f(x); transformed scale plotted against raw scale (0 to 60000)]

variance stabilizing transformations

$$f(x) = \int^{x} \frac{1}{\sqrt{v(u)}}\, du$$

1.) constant variance ('additive'): $v(u) = s^2 \Rightarrow f \propto u$

2.) constant CV ('multiplicative'): $v(u) \propto u^2 \Rightarrow f \propto \log u$

3.) offset: $v(u) \propto (u + u_0)^2 \Rightarrow f \propto \log(u + u_0)$

4.) additive and multiplicative: $v(u) \propto (u + u_0)^2 + s^2 \Rightarrow f \propto \operatorname{arsinh}\frac{u + u_0}{s}$
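Case 4 can be checked by simulation. The sketch below draws data with additive and multiplicative noise and compares standard deviations before and after an arsinh transformation; the noise levels `s_add` and `s_mult`, the abundance levels, and the choice of zero offset and unit gain are assumptions made for this example.

```r
set.seed(3)
s_add  <- 20     # sd of additive noise (assumed value)
s_mult <- 0.1    # sd of multiplicative noise (assumed value)
x <- rep(c(0, 100, 1000, 10000), each = 10000)            # true abundances
y <- rnorm(length(x), sd = s_add) + x * exp(rnorm(length(x), sd = s_mult))

# variance function v(u) = s_add^2 + (s_mult * u)^2, integrated as on the slide
f <- function(u) asinh(s_mult * u / s_add) / s_mult

sd_raw <- sapply(split(y, x), sd)       # grows with the abundance
sd_vst <- sapply(split(f(y), x), sd)    # roughly constant, close to 1
round(rbind(sd_raw, sd_vst), 2)
```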

the "glog" transformation

P. Munson, 2001

D. Rocke & B. Durbin, ISMB 2002

W. Huber et al., ISMB 2002

$$\operatorname{glog}_2(x, c) = \log_2 \frac{x + \sqrt{x^2 + c^2}}{2}$$

$$\operatorname{glog}_e(x, 1) + \log_e 2 = \operatorname{arsinh}(x)$$
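The glog formula and its relation to arsinh are easy to verify numerically. In this sketch the function names `glog2` and `glog_e` and the value 100 for the parameter are illustrative choices, not part of any package.

```r
# glog to base 2 with parameter cpar (the 'c' of the formula)
glog2 <- function(x, cpar) log2((x + sqrt(x^2 + cpar^2)) / 2)

x <- c(-50, 0, 50, 500, 50000)
glog2(x, cpar = 100)                      # finite for zero and negative values
log2(50000) - glog2(50000, cpar = 100)    # nearly 0: glog approaches log for x >> c

# natural-log version with c = 1 differs from arsinh only by the constant log(2)
glog_e <- function(x) log((x + sqrt(x^2 + 1)) / 2)
all.equal(glog_e(x) + log(2), asinh(x))
```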

[Figure: comparison of scales. Raw scale: difference. Log scale: log-ratio. glog scale: generalized log-ratio. Variance: constant part, proportional part.]

[Figure: difference red-green plotted against rank(average)]

Parameter estimation

$$\operatorname{arsinh}\left(\frac{Y_{ki} - a_i}{b_i}\right) = \mu_k + \varepsilon_{ki}, \qquad \varepsilon_{ki} \sim N(0, c^2)$$

o maximum likelihood estimator: straightforward, but sensitive to deviations from normality

o model holds for genes that are unchanged; differentially transcribed genes act as outliers.

o robust variant of ML estimator, à la Least Trimmed Sum of Squares regression.

o works well as long as <50% of genes are differentially transcribed (and may still work otherwise)

measured intensity = offset + gain * true abundance

$$y_{ik} = a_{ik} + b_{ik}\, x_{ik}$$

$$a_{ik} = a_i + L_{ik} + \varepsilon_{ik}$$

$a_i$: per-sample offset
$L_{ik}$: local background provided by image analysis
$\varepsilon_{ik} \sim N(0, b_i^2 s_1^2)$: "additive noise"

$$b_{ik} = b_i\, b_k \exp(\eta_{ik})$$

$b_i$: per-sample normalization factor
$b_k$: sequence-wise labeling efficiency
$\eta_{ik} \sim N(0, s_2^2)$: "multiplicative noise"

Least trimmed sum of squares regression

minimize

$$\sum_{i=1}^{n/2} \left(y_{(i)} - f(x_{(i)})\right)^2$$

[Figure: scatterplot of y against x with two fitted lines: least sum of squares and least trimmed sum of squares]

P. Rousseeuw, 1980s
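A rough way to see what the trimmed criterion does is the following sketch. It uses a simple iteration (fit, keep the n/2 points with the smallest squared residuals, refit) started from the ordinary least squares fit; this heuristic, the function name `lts_line` and the toy data are assumptions for illustration, and real implementations search more carefully for the global minimum.

```r
# Least trimmed squares for a straight line via repeated trimming
lts_line <- function(x, y, n_iter = 20) {
  h    <- floor(length(x) / 2)
  keep <- seq_along(x)                      # start from the ordinary least squares fit
  for (i in seq_len(n_iter)) {
    fit  <- lm(y ~ x, subset = keep)
    res2 <- (y - (coef(fit)[1] + coef(fit)[2] * x))^2
    keep <- order(res2)[seq_len(h)]         # indices of the h smallest squared residuals
  }
  coef(lm(y ~ x, subset = keep))
}

x <- 1:20
y <- as.numeric(x)                          # inliers lie on the line y = x
out <- c(3, 7, 11, 15, 19)
y[out] <- y[out] + 10                       # 25% of the points are shifted upwards

coef(lm(y ~ x))      # intercept pulled upwards by the outliers
lts_line(x, y)       # close to intercept 0, slope 1
```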

"usual" log-ratio

$$\log \frac{x_1}{x_2}$$

'glog' (generalized log-ratio)

$$\log \frac{x_1 + \sqrt{x_1^2 + c_1^2}}{x_2 + \sqrt{x_2^2 + c_2^2}}$$

c1, c2 are experiment specific parameters (~level of background noise)
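The difference between the two quantities shows up at low intensities. In this sketch the helper `glog_ratio`, the use of base 2, the value 100 for the parameters and the choice c1 = c2 are assumptions made to keep the example short.

```r
# generalized log-ratio (base 2); c2 defaults to c1 here for simplicity
glog_ratio <- function(x1, x2, c1, c2 = c1)
  log2((x1 + sqrt(x1^2 + c1^2)) / (x2 + sqrt(x2^2 + c2^2)))

log2(20 / 10)                          # 1
glog_ratio(20, 10, c1 = 100)           # low intensity: shrunk towards 0
log2(20000 / 10000)                    # 1
glog_ratio(20000, 10000, c1 = 100)     # high intensity: nearly equal to the log-ratio
```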

Variance Bias Trade-Off

[Figure: estimated log-fold-change plotted against signal intensity, for log and glog]

Variance-bias trade-off and shrinkage estimators

Shrinkage estimators: a general technology in statistics: pay a small price in bias for a large decrease of variance, so overall the mean-squared-error (MSE) is reduced.

Particularly useful if you have few replicates.

Generalized log-ratio is a shrinkage estimator for log fold change

Variance-bias trade-off and shrinkage estimators

[Figure: same-same comparison, log-ratio and glog-ratio. Lines: 29 data points with observed ratio of 2. Fig. 5.11 from Hahne et al.'s useR-book]

Linear and Non-linear

[Figure: three panels: linear, affine linear, "genuinely" non-linear]


Always affine?

vsn provides a combination of glog-transformation and affine between-array* normalisation

What if you want to normalise for genuine non-linear effects, and still use the nice transformation?

Set parameter calib in vsn2 function to none (default: affine) and do your own normalisation beforehand (do not (log-)transform). The vignette shows an example for use with quantile normalisation.

* print-tip groups or other stratifications are also possible


Background correction

Background correction

[Figure: panels labelled 0 pM, 500 fM, 750 fM and 1 pM; Irizarry et al., Biostatistics 2003]

RMA Background correction

$$PM = B + S$$

$B \sim$ log-normal with mean and sd read off $MM$ values

$S \sim$ exponential

closed form expression for $E[S \mid PM]$, use this as $\hat{s}$ ($> 0$).

(NB, $P[S > 0] = 1$ is not realistic)

Irizarry et al. (2002)

Background correction:

raw intensities x

biased background correction: s = E[S|data]

unbiased background correction: s = x - b

log2(s)    glog2(s|data)    ?

Comparison between RMA and VSN background correction

[Figure from the vsn package vignette]

Dilution data

[Figure from the vsn package vignette]


Summaries for Affymetrix genechip probe sets

Data and notation

$PM_{ikg}$, $MM_{ikg}$ = intensities for perfect match and mismatch probe k for gene g on chip i

i = 1,…, n: one to hundreds of chips

k = 1,…, J: usually 11 probe pairs

g = 1,…, G: tens of thousands of probe sets.

Tasks:
o calibrate (normalize) the measurements from different chips (samples)
o summarize for each probe set the probe level data, i.e., 11 PM and MM pairs, into a single expression measure.
o compare between chips (samples) for detecting differential expression.

Expression measures: MAS 4.0

Affymetrix GeneChip MAS 4.0 software used AvDiff, a trimmed mean:

o sort $d_k = PM_k - MM_k$
o exclude highest and lowest value
o K := those pairs within 3 standard deviations of the average

$$\mathrm{AvDiff} = \frac{1}{\#K} \sum_{k \in K} (PM_k - MM_k)$$
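The three steps can be sketched as follows. This is one reading of the slide, not the vendor's implementation: it assumes that the average and standard deviation are computed after excluding the highest and lowest difference, and that K is then selected from all pairs. The function name `avdiff` and the toy numbers are made up.

```r
avdiff <- function(pm, mm) {
  d       <- pm - mm
  trimmed <- sort(d)[-c(1, length(d))]            # exclude lowest and highest value
  K       <- abs(d - mean(trimmed)) <= 3 * sd(trimmed)
  mean(d[K])                                      # average difference over the set K
}

pm <- c(10, 12, 14, 16, 100)
mm <- rep(0, 5)
avdiff(pm, mm)   # 13: the pair with d = 100 is excluded
```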

Expression measures MAS 5.0

Instead of MM, use "repaired" version CT

CT = MM if MM < PM
   = PM / "typical log-ratio" if MM >= PM

Signal = Weighted mean of the values log(PM-CT)

weights follow Tukey Biweight function (location = data median, scale a fixed multiple of MAD)

[Figure: Tukey Biweight function, weight w plotted against x]
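A one-step Tukey biweight mean along these lines might look like the sketch below. The tuning constant `k = 5`, the small `eps` that guards against a zero MAD, and the function name are assumptions for illustration; the slide only says that the scale is a fixed multiple of the MAD.

```r
tukey_biweight <- function(x, k = 5, eps = 1e-4) {
  m <- median(x)                              # location: data median
  s <- mad(x, center = m, constant = 1)       # raw median absolute deviation
  u <- (x - m) / (k * s + eps)
  w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)     # biweight: zero weight beyond k * MAD
  sum(w * x) / sum(w)
}

x <- c(1, 2, 3, 4, 100)
tukey_biweight(x)   # the outlier 100 gets weight 0
mean(x)             # 22
```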

Expression measures: Li & Wong

dChip fits a model for each gene

$$PM_{ki} - MM_{ki} = \phi_k\, \theta_i + \varepsilon_{ki}, \qquad \varepsilon_{ki} \sim N(0, \sigma^2)$$

where

$\theta_i$: expression measure for the gene in sample i

$\phi_k$: probe effect

$\theta_i$ is estimated by maximum likelihood

Expression measures RMA: Irizarry et al. (2002)

dChip

$$Y_{ki} = \phi_k\, \theta_i + \varepsilon_{ki}, \qquad \varepsilon_{ki} \sim N(0, \sigma^2)$$

RMA

$$\log_2 Y_{ki} = a_k + b_i + \varepsilon_{ki}$$

$b_i$ is estimated using the robust method median polish (successively remove row and column medians, accumulate terms, until convergence).
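Median polish is available in base R as `medpolish`. The following sketch applies it to a small made-up matrix of log2 intensities (probes in rows, arrays in columns) that follows the additive model exactly, apart from one outlying value; the probe effects and expression values are hypothetical.

```r
a <- c(0, 1, 2, 3, 4)          # hypothetical probe effects a_k (5 probes)
b <- c(8, 9, 10)               # hypothetical expression values b_i (3 arrays)
Y <- outer(a, b, "+")          # log2 intensities following a_k + b_i
Y[1, 1] <- Y[1, 1] + 5         # one outlying probe measurement

fit <- medpolish(Y, trace.iter = FALSE)
fit$overall + fit$col          # 10 11 12: array differences recovered despite the outlier
colMeans(Y)                    # 11 11 12: the column mean is distorted by the outlier
```

The outlier ends up in `fit$residuals` instead of shifting the estimate for the first array.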

However, median (and hence median polish) is not always so robust...

[Figure: two histograms of x (frequency), each with the median and the trimmed mean (0.15) marked]

See also: Casneuf T. et al. (2007), In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. BMC Bioinformatics 2007;8(1): 461


Probe effect adjustment by using gDNA reference

Huber et al., Bioinformatics 2006

Genechip S. cerevisiae Tiling Array

4 bp tiling path over complete genome (12 M basepairs, 16 chromosomes)

Sense and Antisense strands
6.5 million oligonucleotides
5 µm feature size

manufactured by Affymetrix
designed by Lars Steinmetz (EMBL & Stanford Genome Center)


RNA Hybridization


Before normalization

Probe specific response normalization

[Figure: panels with axes $\log_2 y_i$ and $\log_2 s_i$, comparing the normalisations below; S/N values shown: 3.22, 3.47, 4.04, 4.58, 4.36]

$$q_i = \log_2 \frac{y_i}{s_i}$$

$$q_i = \operatorname{glog}_2 \frac{y_i - b(s_i)}{s_i}$$

remove 'dead' probes

$$q_i = \operatorname{glog}_2 \frac{PM_i - MM_i}{s_i}$$

Probe-specific response normalization

$$q_i = \operatorname{glog}_2 \frac{y_i - b(s_i)}{s_i}$$

$s_i$: probe specific response factor. Estimate taken from DNA hybridization data

$b_i = b(s_i)$: probe specific background term. Estimation: for strata of probes with similar $s_i$, estimate b through location estimator of distribution of intergenic probes, then interpolate to obtain continuous b(s)

Estimation of b: joint distribution of (DNA, RNA) values of intergenic PM probes

[Figure: log2 RNA intensity plotted against log2 DNA intensity, with regions labelled unannotated transcripts and background, and the curve b(s)]


After normalization


Quality assessment


Quality Assessment and Control

arrayQualityMetrics package by Audrey Kauffmann

This afternoon!


References

Bioinformatics and computational biology solutions using R and Bioconductor, R. Gentleman, V. Carey, W. Huber, R. Irizarry, S. Dudoit, Springer (2005).

Variance stabilization applied to microarray data calibration and to the quantification of differential expression. W. Huber, A. von Heydebreck, H. Sültmann, A. Poustka, M. Vingron. Bioinformatics 18 suppl. 1 (2002), S96-S104.

Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. R. Irizarry, B. Hobbs, F. Collins, …, T. Speed. Biostatistics 4 (2003) 249-264.

Error models for microarray intensities. W. Huber, A. von Heydebreck, and M. Vingron. Encyclopedia of Genomics, Proteomics and Bioinformatics. John Wiley & Sons (2005).

Normalization and analysis of DNA microarray data by self-consistency and local regression. T.B. Kepler, L. Crosby, K. Morgan. Genome Biology. 3(7):research0037 (2002)

Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. S. Dudoit, Y.H. Yang, M. J. Callow, T. P. Speed. Technical report # 578, August 2000 (UC Berkeley Dep. Statistics)

A Benchmark for Affymetrix GeneChip Expression Measures. L.M. Cope, R.A. Irizarry, H. A. Jaffee, Z. Wu, T.P. Speed. Bioinformatics (2003).

....many, many more...

Acknowledgements

Anja von Heydebreck (Darmstadt)
Robert Gentleman (Seattle)
Günther Sawitzki (Heidelberg)
Martin Vingron (Berlin)
Rafael Irizarry (Baltimore)
Terry Speed (Berkeley)
Judith Boer (Leiden)
Anke Schroth (Wiesloch)
Friederike Wilmer (Hilden)
Jörn Tödling (Cambridge)
Lars Steinmetz (Heidelberg)
Audrey Kauffmann (Cambridge)
