Normalization - cbs.dtu.dk · Normalization Image analysis The DNA Array Analysis Pipeline. DNA...

Date post: 23-Jun-2020
DNA Microarray Bioinformatics - #27612

Normalization

Getting the numbers comparable

The DNA Array Analysis Pipeline

• Question / Experimental Design
• Array design / Probe design, or Buy Chip/Array
• Sample Preparation / Hybridization
• Image analysis
• Normalization
• Expression Index Calculation
• Comparable Gene Expression Data
• Statistical Analysis / Fit to Model (time series)
• Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network

Intensities are not just mRNA concentrations

• Tissue contamination
• RNA degradation
• RNA purification
• Reverse transcription
• Amplification efficiency
• Dye effect (Cy3/Cy5)

• Spotting
• DNA-support binding
• Other issues related to array manufacturing

• ‘Background’ correction
• Image segmentation
• Hybridization efficiency and specificity

• Spatial effects

Example of spatial effects on microarrays

[Figure: two panels, "Raw data" and "Spatial bias estimate"]

The distribution of solvents and temperature over the array surface, and the washing procedure, may result in spatial effects.

Two kinds of variation

Global variation (systematic)
• Amount of RNA in the biopsy
• Efficiencies of:
  – RNA extraction
  – Reverse transcription
  – Amplification
  – Labeling
  – Photodetection

Gene-specific variation (stochastic)
• Spotting efficiency
  – Spot size
  – Spot shape
• Cross-/unspecific hybridization
• Biological variation
  – Effect
  – Noise

Stochastic noise: we use statistics to deal with it

[Figure: PCA plot of 34 patients, 8973 dimensions (genes) reduced to 2]

...like we will see tomorrow

PCA for 100 most significant genes reduced to 2 dimensions

Sources of variation

Systematic: array-specific variation
• Similar effect on many measurements
• Corrections can be estimated from data
• Dealt with by normalization

Stochastic: gene-specific variation
• Too random to be explicitly accounted for
• “Noise”
• Dealt with by statistical testing

Calibration = Normalization = Scaling
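
The simplest form of scaling is one multiplicative constant per array. The sketch below is only an illustration under stated assumptions: each array is a list of positive intensities, and the arrays are rescaled so that they share a common median. The choice of the median, the default target and the name `scale_to_common_median` are assumptions, not taken from the slides.

```python
from statistics import median

def scale_to_common_median(arrays, target=None):
    """Rescale each array by a single constant so that all arrays share one median."""
    medians = [median(a) for a in arrays]
    if target is None:
        # Assumed default: the mean of the per-array medians is the common target.
        target = sum(medians) / len(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]

# Two hypothetical arrays measuring the same sample at different overall brightness.
arrays = [[100.0, 200.0, 400.0], [50.0, 100.0, 200.0]]
scaled = scale_to_common_median(arrays)
```

A single factor per array can only correct differences that are the same at every intensity, which is the motivation for the nonlinear methods that follow.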

Nonlinear normalization

The Qspline method

From the empirical distribution, a number of quantiles are calculated for each of the channels to be normalized (one channel shown in red) and for the reference distribution (shown in black).

A QQ-plot is made and a normalization curve is constructed by fitting a cubic spline function.

As reference one can use an artificial “median array” for a set of arrays, or use a log-normal distribution, which is a good approximation.
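
The quantile-matching idea can be sketched as follows. This is a simplified illustration, not the Qspline implementation: the cubic spline is replaced by piecewise-linear interpolation through the QQ points to keep the example short, and the function names and the default of 11 quantiles are assumptions.

```python
from bisect import bisect_right

def quantiles(values, n):
    """Return n evenly spaced empirical quantiles, interpolating between order statistics."""
    s = sorted(values)
    out = []
    for k in range(n):
        pos = k * (len(s) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out

def fit_normalization_curve(channel, reference, n_quantiles=11):
    """Build a function mapping channel intensities onto the reference distribution."""
    xq = quantiles(channel, n_quantiles)    # x-coordinates of the QQ points
    yq = quantiles(reference, n_quantiles)  # y-coordinates of the QQ points

    def curve(x):
        # Piecewise-linear stand-in for the cubic spline; values beyond the
        # outermost quantiles are extrapolated from the end segments.
        i = min(max(bisect_right(xq, x) - 1, 0), len(xq) - 2)
        x0, x1, y0, y1 = xq[i], xq[i + 1], yq[i], yq[i + 1]
        if x1 == x0:
            return y0
        return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

    return curve
```

A channel would then be normalized with `[curve(x) for x in channel]`, where `curve = fit_normalization_curve(channel, reference)`.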

Once again…qspline

When many microarrays are to be normalized to each other, an average array can be used as target

Accumulating quantiles
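
One way to build such an average target is sketched below. It assumes that "average array" means averaging the sorted intensities rank by rank across arrays, and that all arrays have the same number of probes. Both are assumptions, as is the function name.

```python
def average_array_quantiles(arrays):
    """Sort each array and average the values rank by rank across arrays."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # zip(*...) groups the smallest values together, then the second smallest, and so on.
    return [sum(column) / len(column) for column in zip(*sorted_arrays)]

# Two small hypothetical arrays.
arrays = [[5.0, 1.0, 3.0], [2.0, 6.0, 4.0]]
target = average_array_quantiles(arrays)
```

The resulting sorted vector can serve as the reference distribution against which each individual array is normalized.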

Lowess Normalization

One of the most commonly utilized normalization techniques is the LOcally WEighted Scatterplot Smoothing (LOWESS) algorithm.

[Figure: scatter plot of M against A]
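
A minimal sketch of lowess normalization on an MA plot follows. It assumes the usual two-channel definitions, M = log2(R/G) and A = (log2 R + log2 G)/2, which the slide does not spell out. The smoother is a plain locally weighted linear regression with tricube weights and no robustness iterations. The function names and the `frac` default are assumptions.

```python
import math

def ma_values(red, green):
    """M is the log ratio of the two channels, A their average log intensity."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear regression of y on x, evaluated at every x."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points contributing to each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]  # tricube weights
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def lowess_normalize(red, green, frac=0.5):
    """Subtract the intensity-dependent trend in M; returns (normalized M, A)."""
    m, a = ma_values(red, green)
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)], a
```

After normalization the M values scatter around zero at every intensity, so a dye bias that depends on A no longer looks like differential expression.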

Invariant set normalization (Li and Wong)

An invariant set of probes is used

- Probes that do not change intensity rank between arrays

- A piecewise linear median line is calculated

- This curve is used for normalization
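
The three steps above can be sketched as below. This is a single-pass simplification with a fixed rank-difference threshold, not the published Li and Wong procedure. The threshold, the number of bins and all function names are assumptions made for the example.

```python
from statistics import median

def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def invariant_set(array, baseline, max_rank_diff):
    """Indices of probes whose rank differs by at most max_rank_diff between arrays."""
    ra, rb = ranks(array), ranks(baseline)
    return [i for i in range(len(array)) if abs(ra[i] - rb[i]) <= max_rank_diff]

def piecewise_median_curve(array, baseline, idx, n_bins=4):
    """Knots (median array value, median baseline value) per bin of invariant probes."""
    pairs = sorted((array[i], baseline[i]) for i in idx)
    size = max(1, len(pairs) // n_bins)
    knots = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        knots.append((median(p[0] for p in chunk), median(p[1] for p in chunk)))
    return knots

def apply_curve(x, knots):
    """Linear interpolation between knots; end segments are extended beyond them."""
    if len(knots) == 1:
        x0, y0 = knots[0]
        return x * y0 / x0
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x <= x1:
            break
    if x1 == x0:
        return y0
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Hypothetical data: the array is twice as bright as the baseline, and probe 0 has
# genuinely changed, so its rank moves and it is left out of the invariant set.
baseline = [float(v) for v in range(10, 110, 10)]
array = [2.0 * b for b in baseline]
array[0] = 500.0
idx = invariant_set(array, baseline, max_rank_diff=1)
knots = piecewise_median_curve(array, baseline, idx)
normalized = [apply_curve(x, knots) for x in array]
```

Because the curve is estimated only from probes that keep their rank, a strongly changed probe such as probe 0 does not distort the normalization of the others.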

Spatial normalization

[Figure: four panels, "Raw data", "After intensity normalization", "Spatial bias estimate" and "After spatial normalization"]

Expression index value

Some microarrays have multiple probes addressing the expression of the same gene

– Affymetrix chips have 11-20 probe pairs per gene

- Perfect Match (PM)

- MisMatch (MM)

PM: CGATCAATTGCACTATGTCATTTCT
MM: CGATCAATTGCAGTATGTCATTTCT

However, for downstream analysis we often want to deal with only one value per gene. Therefore we want to collapse the intensities from many probes into one value: a gene expression index value.
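
In the probe pair shown, the MM sequence differs from the PM sequence only at the middle (13th) base, where C is replaced by its complement G. A small sketch, assuming this middle-base substitution is the general rule; the function name `make_mismatch` is hypothetical:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def make_mismatch(pm):
    """Return the PM sequence with its middle base replaced by the complementary base."""
    mid = len(pm) // 2  # index 12 for a 25-mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

pm = "CGATCAATTGCACTATGTCATTTCT"
mm = make_mismatch(pm)
```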

Expression index calculation

Simplest method? Median

But more sophisticated methods exist: dChip, RMA and MAS 5 (from Affymetrix)
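
As a small illustration of the median approach, assuming the probe intensities are already grouped per gene in a dictionary (the data layout, gene names and function name are hypothetical):

```python
from statistics import median

def median_expression_index(probe_intensities):
    """Collapse the probe intensities of each probe set into one value per gene."""
    return {gene: median(values) for gene, values in probe_intensities.items()}

# Hypothetical probe sets; geneA contains one outlying probe.
probe_sets = {
    "geneA": [120.0, 135.0, 90.0, 2000.0, 110.0],
    "geneB": [15.0, 22.0, 18.0],
}
index = median_expression_index(probe_sets)
```

The outlying probe in `geneA` does not move the median, but the median also ignores that some probes respond more strongly than others, which is what the model-based methods try to account for.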

dChip (Li & Wong)

Model: $PM_{ij} = \theta_i \phi_j + \varepsilon_{ij}$

Outlier removal:
– Identify extreme residuals
– Remove
– Re-fit
– Iterate

Distribution of errors $\varepsilon_{ij}$ assumed independent of signal strength

(Li and Wong, 2001)
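
A minimal sketch of fitting the multiplicative model by alternating least squares, reading i as the array index and j as the probe index. The constraint that the squared probe effects sum to the number of probes is assumed here to make the fit identifiable. The outlier removal loop is not implemented, and the function name is hypothetical.

```python
import math

def fit_multiplicative_model(pm, n_iter=50):
    """Fit pm[i][j] ~ theta[i] * phi[j]; returns (theta, phi, residuals)."""
    n_arrays, n_probes = len(pm), len(pm[0])
    phi = [1.0] * n_probes

    def update_theta(phi):
        # Least-squares theta for fixed phi.
        s = sum(p * p for p in phi)
        return [sum(pm[i][j] * phi[j] for j in range(n_probes)) / s
                for i in range(n_arrays)]

    for _ in range(n_iter):
        theta = update_theta(phi)
        s = sum(t * t for t in theta)
        # Least-squares phi for fixed theta.
        phi = [sum(pm[i][j] * theta[i] for i in range(n_arrays)) / s
               for j in range(n_probes)]
        # Assumed constraint: rescale so that sum(phi_j^2) equals the number of probes.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    theta = update_theta(phi)  # make theta consistent with the rescaled phi
    residuals = [[pm[i][j] - theta[i] * phi[j] for j in range(n_probes)]
                 for i in range(n_arrays)]
    return theta, phi, residuals

# Hypothetical noise-free probe set: 3 arrays, 3 probes.
true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5]
pm = [[t * p for p in true_phi] for t in true_theta]
theta, phi, residuals = fit_multiplicative_model(pm)
```

The returned residuals are what the outlier step would inspect: extreme values would be flagged and removed, and the model fitted again.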

RMA

Robust Multi-array Average (RMA) expression measure (Irizarry et al., Biostatistics, 2003)

For each probe set, rewrite $PM_{ij} = \theta_i \phi_j$ as: $\log(PM_{ij}) = \log(\theta_i) + \log(\phi_j)$

Fit this additive model by iteratively re-weighted least-squares or median polish
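
The median polish option can be sketched as follows, with arrays as rows (i) and probes as columns (j). This shows the summarization step only and assumes log base 2. The function names, iteration limit and tolerance are assumptions.

```python
import math
from statistics import median

def median_polish(table, n_iter=10, tol=1e-9):
    """Fit table[i][j] ~ overall + row[i] + col[j] by sweeping out medians."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(r) for r in table]
    overall, row, col = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        rdelta = [median(r) for r in resid]
        for i in range(n_rows):
            row[i] += rdelta[i]
            resid[i] = [v - rdelta[i] for v in resid[i]]
        d = median(col)
        overall += d
        col = [c - d for c in col]
        # Sweep column medians out of the residuals into the column effects.
        cdelta = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col[j] += cdelta[j]
            for i in range(n_rows):
                resid[i][j] -= cdelta[j]
        d = median(row)
        overall += d
        row = [r - d for r in row]
        if max(abs(v) for v in rdelta + cdelta) < tol:
            break
    return overall, row, col, resid

def summarize_probe_set(pm):
    """Log2 expression per array: overall level plus the array (row) effect."""
    table = [[math.log2(v) for v in r] for r in pm]
    overall, row, _, _ = median_polish(table)
    return [overall + r for r in row]
```

`summarize_probe_set(pm)` returns one log2 expression value per array for a single probe set. Medians make the fit insensitive to a few outlying probes.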

MAS 5

MicroArray Suite version 5 uses

$$\text{signal} = \text{TukeyBiweight}\{\log(PM_j - MM_j^{*})\}$$

MM* is an adjusted MM that is never bigger than PM.

Tukey biweight is a robust average procedure with weights and outlier rejection.
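
A sketch of a one-step Tukey biweight applied to log2(PM − MM*). The tuning constant `c=5`, the small `eps`, the log base and the function names are assumptions. The computation of MM* itself is not shown, so the adjusted mismatch values are taken as given.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """Weighted mean whose weights fall to zero for points far from the median."""
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)  # distance from the median in scaled units
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)  # outliers get weight 0
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mas5_like_signal(pm, mm_star):
    """Tukey biweight of log2(PM_j - MM*_j); every mm_star[j] must be below pm[j]."""
    return tukey_biweight([math.log2(p - m) for p, m in zip(pm, mm_star)])

# Four consistent values and one outlier: the outlier receives zero weight.
robust_mean = tukey_biweight([9.8, 10.0, 10.1, 10.2, 50.0])
```

Here `robust_mean` stays close to 10, whereas the ordinary mean of the same five values is about 18.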

Methods compared on expression variance

[Figure: standard deviation of gene measures from 20 replicate arrays, plotted against expression level. Blue and red: RMA; black: dChip; green: MAS 5.0]

From Terry Speed

Robustness: MAS 5.0

[Figure: log fold change estimate from 20 µg cRNA plotted against log fold change estimate from 1.25 µg cRNA, for MAS 5.0]

(Irizarry et al., Biostatistics, 2003)

Robustness: dChip

[Figure: log fold change estimate from 20 µg cRNA plotted against log fold change estimate from 1.25 µg cRNA, for dChip]

(Irizarry et al., Biostatistics, 2003)

Robustness: RMA

[Figure: log fold change estimate from 20 µg cRNA plotted against log fold change estimate from 1.25 µg cRNA, for RMA]

(Irizarry et al., Biostatistics, 2003)

All of this is implemented in…

R

In the Bioconductor package ‘affy’ (Gautier et al., 2003).

References

Li and Wong (2001). Model-based analysis of oligonucleotide arrays: Model validation, design issues and standard error application. Genome Biology 2:1–11.

Irizarry, Bolstad, Collin, Cope, Hobbs and Speed (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31(4):e15.

Affymetrix (2001). Affymetrix Microarray Suite User Guide, version 5 edition. Affymetrix, Santa Clara, CA.

Gautier, Cope, Bolstad and Irizarry (2003). affy - an R package for the analysis of Affymetrix GeneChip data at the probe level. Bioinformatics

