normalization - Departmentsririzarr/Teaching/688/normalization.pdf · 1 Normalization...

Date post: 16-Feb-2018
Normalization

• Normalization is needed to ensure that differences in intensities are indeed due to differential expression, and not some printing, hybridization, or scanning artifact.

• Normalization is necessary before any analysis which involves within or between slides comparisons of intensities, e.g., clustering, testing.

• Somewhat different approaches are used in two-color and one-color technologies

Example of Replicate Data

Here different scanners were used

Example of Replicate Data


Most Common Problem

Intensity-dependent effect: different background level most likely culprit

Scatter Plot

Demonstrates importance of MA plot

Two-color platforms

• Platforms that use printing robots are prone to many systematic effects:
  – Dye
  – Print-tip
  – Plates
  – Print order
  – Spatial

• Some examples follow


Print-tip Effect

spotting pin quality decline

after delivery of 3×10^5 spots

after delivery of 5×10^5 spots

H. Sueltmann DKFZ/MGA

Plate effect


Bad Plate Effect

Bad Plate Effect

Print Order Effect


Spatial Effect

Spatial Effects

[Figure: spatial images of an array, panels R, Rb, and R-Rb, color scale by rank. Spotted cDNA arrays, Stanford-type.]

[Figure: another array, showing a print-tip pattern; one panel with color scale ~log(G), one with color scale ~rank(G).]

[Figure: heat map of array-to-array differences, both axes 1:nrhyb.]

Batches: array to array differences $d_{ij} = \mathrm{mad}_k(h_{ik} - h_{jk})$

arrays i=1…63; roughly sorted by time
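The array-to-array difference d_ij = mad_k(h_ik - h_jk) can be sketched as follows. This is a minimal illustration, not the code behind the slide: the data are simulated, the names `mad`, `array_distances`, and `batch_effect` are hypothetical, and the MAD is left unscaled (some software multiplies it by a constant).

```python
import numpy as np

def mad(v):
    """Median absolute deviation about the median (unscaled; an assumption)."""
    v = np.asarray(v, dtype=float)
    return np.median(np.abs(v - np.median(v)))

def array_distances(h):
    """h is a probes x arrays matrix; returns d[i, j] = mad over probes k of h[k, i] - h[k, j]."""
    h = np.asarray(h, dtype=float)
    n = h.shape[1]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = mad(h[:, i] - h[:, j])
    return d

# Simulated example: 500 probes, 6 arrays hybridized in two batches of 3.
rng = np.random.default_rng(1)
truth = rng.normal(8, 2, size=(500, 1))
batch = np.array([0, 0, 0, 1, 1, 1])
batch_effect = rng.normal(0, 0.5, size=(500, 2))   # probe-specific shift per batch
h = truth + batch_effect[:, batch] + rng.normal(0, 0.1, size=(500, 6))

d = array_distances(h)
print(np.round(d, 2))   # within-batch entries are small, between-batch entries larger
```

Displayed as an image, a matrix like `d` shows batches as blocks of small values along the diagonal.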


What can we do?

• Throw away the data and start again? Maybe.
• Statistics offers hope:
  – Use control genes to adjust
  – Assume most genes are not differentially expressed
  – Assume the distributions of expression are the same

Simplest Idea

• Assume all arrays have the same median log expression or relative log expression

• Subtract the median from each array

• In two-color platforms, we typically correct the Ms. Median correction forces the median log ratio to be 0
  – Note: we assume there are as many over-expressed as under-expressed genes

• For Affymetrix arrays we usually add a constant that takes us back to the original range
  – It is common to use the median of the medians
  – Typically, we subtract in the log-scale

• Usually this is not enough, e.g. it will not account for intensity-dependent bias
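A minimal sketch of the median correction described above, assuming the data are already on the log scale and stored as a probes x arrays matrix. The function names and the simulated data are hypothetical.

```python
import numpy as np

def median_center(M):
    """Two-color case: subtract each array's median log ratio, so every column has median 0."""
    M = np.asarray(M, dtype=float)
    return M - np.median(M, axis=0)

def median_normalize(log_expr):
    """One-color case: subtract each array's median, then add back the median of the medians."""
    log_expr = np.asarray(log_expr, dtype=float)
    med = np.median(log_expr, axis=0)
    return log_expr - med + np.median(med)

# Simulated log intensities: 1000 probes on 4 arrays with different overall levels.
rng = np.random.default_rng(0)
log_expr = rng.normal(8, 1.5, size=(1000, 4)) + np.array([0.0, 0.4, -0.3, 0.8])

normalized = median_normalize(log_expr)
print(np.median(log_expr, axis=0))     # four different medians
print(np.median(normalized, axis=0))   # one common median
```

Only the location of each array changes; the shape of each distribution, and any intensity-dependent bias, is left as it was.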

House Keeping Genes

I rarely find house keeping genes useful


More Elaborate Solutions

• Proposed solutions
  – Force distributions (not just medians) to be the same:
    • Amaratunga and Cabrera (2001)
    • Bolstad et al. (2003)
  – Use curve estimators, e.g. loess, to adjust for the effect:
    • Li and Wong (2001). Note: they also use a rank invariant set
    • Colantuoni et al. (2002)
    • Dudoit et al. (2002)
  – Use adjustments based on additive/multiplicative model:
    • Rocke and Durbin (2003)
    • Huber et al. (2002)
    • Cui et al. (2003)

Quantile normalization

• All these non-linear methods perform similarly
• Quantiles is my favorite because it's fast and conceptually simple
• Basic idea:
  – Order the values in each array
  – Take the average across arrays at each rank
  – Substitute each probe intensity with the average for its rank
  – Put in original order

Example of quantile normalization

Rows are probes, columns are arrays (averages shown rounded to integers).

Original      Ordered       Averaged      Re-ordered
 2  4  4       2  3  4       3  3  3       3  5  3
 5  4 14       3  4  8       5  5  5       8  5  8
 4  6  8       3  4  8       5  5  5       6  8  5
 3  5  8       4  5  9       6  6  6       5  6  5
 3  3  9       5  6 14       8  8  8       5  3  6
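The four steps (order, average, substitute, re-order) can be sketched in a few lines. This is an illustrative implementation, not the one used to produce the slides. It assumes a probes x arrays matrix with no missing values, and tied values within an array are not treated specially. The function name is hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")       # 1. order the values in each array
    sorted_x = np.take_along_axis(x, order, axis=0)
    rank_means = sorted_x.mean(axis=1)                 # 2. average across arrays at each rank
    out = np.empty_like(x)
    # 3. and 4. substitute the averages and put them back in the original order
    np.put_along_axis(out, order, rank_means[:, None], axis=0)
    return out

# The 5 probes x 3 arrays example from the slide.
original = np.array([[2, 4, 4],
                     [5, 4, 14],
                     [4, 6, 8],
                     [3, 5, 8],
                     [3, 3, 9]])

normalized = quantile_normalize(original)
print(np.round(normalized))
```

After normalization every array has exactly the same set of values, so all quantiles agree, not just the median.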


Before Quantile Normalization

After Quantile Normalization

A worry is that it over-corrects

Two-color Platforms

• Quantile normalization is popular with high-density one-channel arrays

• With two-color platforms we have many effects to worry about, and it seems we should take advantage of the paired structure


ANOVA

• One of the first approaches was to fit ANOVA models to log intensities with a global effect for each dye

• This does not correct for the non-linear dependence on intensity

• Recent implementations subtract a constant from the original scale to remove the non-linear effect

For references look at papers by Gary Churchill

Different Background

Above is MA for R=50+S, G=100+S
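To see why a background difference alone produces an intensity-dependent effect, here is a small sketch of the R = 50 + S, G = 100 + S case. The signal S is simulated (an assumption), and M and A use the usual definitions M = log2(R/G), A = ½ log2(R·G).

```python
import numpy as np

rng = np.random.default_rng(0)
S = 2.0 ** rng.uniform(4, 16, size=1000)   # hypothetical true signal, identical in both channels

R = 50 + S      # red channel: background 50
G = 100 + S     # green channel: background 100

M = np.log2(R / G)          # log ratio
A = 0.5 * np.log2(R * G)    # average log intensity

# Every spot has the same true signal in R and G, yet M is not 0:
# it is strongly negative at low A and approaches 0 as A grows.
lowest, highest = np.argmin(A), np.argmax(A)
print(M[lowest], M[highest])
```

Plotting M against A for these values gives a curve that bends towards 0 at high intensity, which is hard to see in a plain scatter plot of R against G.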

Correcting M approaches

• Most popular approach is to correct M directly
• We assume that we observe M + Bias and that Bias depends on intensity (A), print-tip, plate, spatial location, etc.
• Idea: estimate the bias and remove it
• For continuous variables we assume the dependence is smooth and use loess to estimate it
• The normalized M is M - estimated Bias
• Most versatile method

For details look for papers by Terry Speed and Gordon Smyth


Example: Intensity Effect

• The most common problem is intensity-dependent effects
  – Probably due to different background

• Loess is used to estimate and remove these effects
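A rough sketch of intensity-dependent normalization: estimate the bias in M as a smooth function of A, then subtract it. The smoother below is a simplified local linear fit with tricube weights, written out only for illustration. It omits the robustness iterations of a full loess, and the function name, the `frac` parameter, and the simulated data are all assumptions.

```python
import numpy as np

def loess_fit(x, y, frac=0.4):
    """Local linear fit with tricube weights, evaluated at every x (simplified loess)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(3, int(np.ceil(frac * n)))            # number of neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argpartition(dist, k - 1)[:k]    # indices of the k nearest neighbours
        h = dist[idx].max() or 1.0                # local bandwidth (guard against 0)
        w = (1 - (dist[idx] / h) ** 3) ** 3       # tricube weights
        X = np.column_stack([np.ones(k), x[idx] - x[i]])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = beta[0]                       # intercept = fitted value at x[i]
    return fitted

# Simulated two-color array with different backgrounds and multiplicative noise.
rng = np.random.default_rng(2)
S = 2.0 ** rng.uniform(4, 14, size=300)
R = (50 + S) * np.exp(rng.normal(0, 0.1, size=300))
G = (100 + S) * np.exp(rng.normal(0, 0.1, size=300))
M = np.log2(R / G)
A = 0.5 * np.log2(R * G)

bias = loess_fit(A, M)     # estimated intensity-dependent bias
M_norm = M - bias          # normalized M = M - estimated bias
```

The same idea extends to print-tip loess by running the fit separately within each print-tip group.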

Loess


Print-tip Loess

Error model approaches

• Error model approaches describe the need for normalization with an additive background plus stochastic multiplicative error model

• From this model a variance stabilizing transformation is obtained

• Log ratios are no longer the measure of differential expression

For details see papers by Wolfgang Huber and David Rocke


Following Slides Provided by Wolfgang Huber

Error models

Describe the possible outcomes of a set of measurements

Outcomes depend on:
- true value of the measured quantity (abundances of specific molecules in biological sample)
- measurement apparatus (cascade of biochemical reactions, optical detection system with laser scanner or CCD camera)

The two component model

measured intensity = offset + gain × true abundance

$y_{ik} = a_{ik} + b_{ik} x_k$

$a_{ik} = a_i + \varepsilon_{ik}$
- $a_i$: per-sample offset
- $\varepsilon_{ik} \sim N(0, b_i^2 s_1^2)$: "additive noise"

$b_{ik} = b_i b_k \exp(\eta_{ik})$
- $b_i$: per-sample normalization factor
- $b_k$: sequence-wise probe efficiency
- $\eta_{ik} \sim N(0, s_2^2)$: "multiplicative noise"
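A small simulation of the two-component model, measured intensity = offset + gain × true abundance, with additive noise in the offset and multiplicative noise in the gain. All parameter values are hypothetical and chosen only to make the two noise regimes visible.

```python
import numpy as np

rng = np.random.default_rng(3)
n_rep = 20000

a_i, b_i, b_k = 100.0, 1.0, 1.0   # hypothetical offset, normalization factor, probe efficiency
s1, s2 = 20.0, 0.1                # hypothetical additive and multiplicative noise levels

def simulate(x_k):
    """Draw n_rep measurements y = a_ik + b_ik * x_k for true abundance x_k."""
    eps = rng.normal(0.0, b_i * s1, n_rep)   # additive noise, variance b_i^2 * s1^2
    eta = rng.normal(0.0, s2, n_rep)         # multiplicative noise, variance s2^2
    a_ik = a_i + eps
    b_ik = b_i * b_k * np.exp(eta)
    return a_ik + b_ik * x_k

low = simulate(0.0)      # unexpressed probe: only additive noise matters
high = simulate(1e5)     # highly expressed probe: multiplicative noise dominates

print(low.std())                          # close to s1
print(high.std() / (high.mean() - a_i))   # coefficient of variation close to s2
```

At low abundance the standard deviation is roughly constant, while at high abundance it is roughly proportional to the signal.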


The two-component model

[Figure: the two-component model on the raw scale and on the log scale, with "additive" noise and "multiplicative" noise marked.]

B. Durbin, D. Rocke, JCB 2001

Parameterization

$y = a + \varepsilon + b \cdot x \cdot (1 + \eta)$

$y = a + \varepsilon + b \cdot x \cdot \exp(\eta)$

two practically equivalent forms ($\eta \ll 1$)

- a, systematic background: same for all probes (per array x color), or per array x color x print-tip group
- ε, random background: iid in whole experiment, or iid per array
- b, systematic gain factor: per array x color, or per array x color x print-tip group
- η, random gain fluctuations: iid in whole experiment, or iid per array

Important issues for model fitting

Parameterization: variance vs bias

"Heteroskedasticity" (unequal variances) ⇒ weighted regression or variance stabilizing transformation

Outliers ⇒ use a robust method

Algorithm: If likelihood is not quadratic, need non-linear optimization. Local minima / concavity of likelihood?


variance stabilizing transformations

$X_u$ a family of random variables with $E X_u = u$, $\mathrm{Var}\, X_u = v(u)$. Define

$f(x) = \int^x \frac{du}{\sqrt{v(u)}}$

⇒ $\mathrm{var}\, f(X_u) \approx$ independent of u

derivation: linear approximation, $\mathrm{var}\, f(X_u) \approx f'(u)^2 \, v(u) = 1$

[Figure: f(x) against x; horizontal axis raw scale, vertical axis transformed scale.]

variance stabilizing transformations

1.) constant variance ('additive'): $v(u) = s^2 \Rightarrow f \propto u$

2.) constant CV ('multiplicative'): $v(u) \propto u^2 \Rightarrow f \propto \log u$

3.) offset: $v(u) \propto (u + u_0)^2 \Rightarrow f \propto \log(u + u_0)$

4.) additive and multiplicative: $v(u) \propto (u + u_0)^2 + s^2 \Rightarrow f \propto \mathrm{arsinh}\,\frac{u + u_0}{s}$
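The additive-and-multiplicative case can be checked numerically. The sketch below assumes a variance function v(u) = s2²·u² + s1² (case 4 with u0 = 0 and s = s1/s2), simulates data at several mean levels, and compares the standard deviation before and after the arsinh transformation. The parameter values and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
s1, s2 = 20.0, 0.1     # hypothetical additive sd and multiplicative sd
n_rep = 20000

def f(y):
    """Integral of 1/sqrt(v(u)) for v(u) = s2^2 * u^2 + s1^2."""
    return np.arcsinh(s2 * y / s1) / s2

levels = [0.0, 50.0, 200.0, 1000.0, 20000.0]
raw_sd, vst_sd = [], []
for u in levels:
    # mean approximately u, variance approximately s2^2 * u^2 + s1^2
    y = u * np.exp(rng.normal(0, s2, n_rep)) + rng.normal(0, s1, n_rep)
    raw_sd.append(y.std())
    vst_sd.append(f(y).std())

print(np.round(raw_sd, 1))   # grows with the mean level
print(np.round(vst_sd, 2))   # stays close to 1 at every level
```

The log transform would stabilize only the high-intensity end and is undefined for the negative values that occur near u = 0.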


the "glog" transformation

[Figure: dashed line f(x) = log(x); solid line $h_s(x) = \mathrm{asinh}(x/s)$]

$\mathrm{arsinh}(x) = \log\left(x + \sqrt{x^2 + 1}\right)$

$\lim_{x \to \infty} \left(\mathrm{arsinh}\, x - \log x - \log 2\right) = 0$

P. Munson, 2001
D. Rocke & B. Durbin, ISMB 2002
W. Huber et al., ISMB 2002

raw scale      log          glog
difference     log-ratio    generalized log-ratio

[Figure: variance against intensity, showing a constant part and a proportional part.]
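A quick numerical check of the two arsinh identities above, using NumPy's `arcsinh`. The test points are arbitrary.

```python
import numpy as np

x = np.array([-5.0, 0.0, 1.0, 10.0, 1e3, 1e6])

# arsinh(x) = log(x + sqrt(x^2 + 1)), defined for zero and negative x as well
closed_form = np.log(x + np.sqrt(x ** 2 + 1))
identity_holds = np.allclose(np.arcsinh(x), closed_form)

# arsinh(x) - log(x) - log(2) -> 0 as x grows: glog behaves like log at high intensity
pos = x[x > 0]
gap = np.arcsinh(pos) - np.log(pos) - np.log(2.0)

print(identity_holds)
print(gap)   # shrinks towards 0 as x increases
```

So the transformation agrees with the logarithm (up to a constant) for large values and stays finite and roughly linear near zero, where the logarithm breaks down.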

the transformed model

$\mathrm{arsinh}\left(\frac{Y_{ki} - a_{si}}{b_{si}}\right) = \mu_k + \varepsilon_{ki}, \qquad \varepsilon_{ki} \sim N(0, c^2)$

i: arrays
k: probes
s: probe strata (e.g. print-tip, region)


profile log-likelihood

$pll(a, b) = \sup_{\mu, c} \, ll(a, b, c, \mu)$

Here:

Least trimmed sum of squares regression

[Figure: scatter plot of y against x with two fitted lines: least sum of squares and least trimmed sum of squares.]

minimize $\sum_{i=1}^{n/2} \left(y_{(i)} - f(x_{(i)})\right)^2$

P. Rousseeuw, 1980s
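A sketch of least trimmed squares for a straight line, minimizing the sum of the n/2 smallest squared residuals. The search below (random two-point starts followed by concentration steps) is a simple heuristic chosen for illustration, not an exact solver and not necessarily what the slides' software does. The function name and the simulated data are hypothetical.

```python
import numpy as np

def lts_line(x, y, n_starts=50, n_csteps=20, seed=0):
    """Heuristic least trimmed squares fit of y = b0 + b1 * x using h = n // 2 points."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = n // 2
    X = np.column_stack([np.ones(n), x])
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=2, replace=False)     # random elemental start
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            r2 = (y - X @ beta) ** 2
            new_subset = np.argsort(r2)[:h]               # keep the h best-fitting points
            if set(new_subset) == set(subset):
                break
            subset = new_subset
        obj = np.sort(r2)[:h].sum()                       # trimmed sum of squares for beta
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta

# Simulated data: 70 points on a line with slope 0.5, plus 30 outliers at the upper left.
rng = np.random.default_rng(5)
x = np.concatenate([rng.uniform(0, 8, 70), rng.uniform(0, 2, 30)])
y = np.concatenate([1 + 0.5 * x[:70] + rng.normal(0, 0.2, 70),
                    7 + rng.normal(0, 0.2, 30)])

ols_slope = np.polyfit(x, y, 1)[0]     # ordinary least squares, pulled by the outliers
lts_intercept, lts_slope = lts_line(x, y)
print(ols_slope, lts_slope)
```

Because only the best-fitting half of the points enters the criterion, the outliers have no influence on the trimmed fit once they fall outside that half.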

"usual" log-ratio: $\log \frac{x_1}{x_2}$

'glog' (generalized log-ratio): $\log \frac{x_1 + \sqrt{x_1^2 + c_1^2}}{x_2 + \sqrt{x_2^2 + c_2^2}}$

$c_1$, $c_2$ are experiment specific parameters (~level of background noise)


Variance Bias Trade-Off

[Figure: estimated log-fold-change against signal intensity, for log and glog.]

Variance-bias trade-off and shrinkage estimators

Shrinkage estimators: pay a small price in bias for a large decrease of variance, so overall the mean-squared-error (MSE) is reduced.

Particularly useful if you have few replicates.

Generalized log-ratio: a shrinkage estimator for fold change

There are many possible choices, we chose "variance-stabilization":
+ interpretable even in cases where genes are off in some conditions
+ can subsequently use standard statistical methods (hypothesis testing, ANOVA, clustering, classification…) without the worries about low-level variability that are often warranted on the log-scale

"Single color normalization"

n red-green arrays (R1, G1, R2, G2, … Rn, Gn)

within/between slides:
for (i = 1:n): calculate Mi = log(Ri/Gi), Ai = ½ log(Ri*Gi); normalize Mi vs Ai
then normalize M1…Mn all at once

single color normalization:
normalize the matrix of (R, G), then calculate log-ratios or any other contrast you like


Back to you Rafa!

Concluding Remarks

• Notice that normalization and background correction are related
• Current procedures are based on assumptions
• Many new problems clearly violate these assumptions
• We will discuss this problem in another lecture

