Post on 19-Dec-2015
transcript
DNA Microarray Bioinformatics - #27612
Normalization
Getting the numbers comparable
The DNA Array Analysis Pipeline
• Question / Experimental Design
• Array design / Probe design, or Buy Chip/Array
• Sample Preparation / Hybridization
• Image analysis
• Normalization
• Expression Index Calculation
• Comparable Gene Expression Data
• Statistical Analysis / Fit to Model (time series)
• Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network
Expression intensities are not just target concentrations
• Sample contamination
• RNA quality
• Sample preparation
• Dye effect (Cy3/Cy5)
• Probe affinity
• Hybridization
• Unspecific signal (background)
• Saturation
• Spotting
• Other issues related to array manufacturing
• Image segmentation
• Array spatial effects
Two kinds of variation in the signal
Gene-specific variation (stochastic)
• Spotting (size and shape)
• Cross-hybridization
• Dye
• Biological variation
– Effect
– Noise
Global variation (systematic)
• RNA quality
• Sample preparation
• Dye
• Hybridization
• Photodetection
Sources of variation
Gene-specific variation (stochastic):
• Too random to be explicitly accounted for
• “Noise”
• Handled by statistical testing
Global variation (systematic):
• Similar effect on many measurements
• Corrections can be estimated from data
• Handled by normalization
Calibration = Normalization = Scaling
Nonlinear normalization
Lowess Normalization
One of the most commonly utilized normalization techniques is the LOcally Weighted Scatterplot Smoothing (LOWESS) algorithm.
[Figure: scatter plot of M versus A]
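A minimal sketch of LOWESS normalization on M and A values, in pure Python. It assumes the usual two-channel definitions M = log2(R/G) and A = 0.5·log2(R·G), and uses a single pass of locally weighted linear regression with tricube weights (no robustness iterations). The function names and the `frac` parameter are illustrative, not taken from any particular package.

```python
import math


def ma_values(red, green):
    """Return (M, A): M = log2(R/G), A = 0.5 * log2(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit of y on x, evaluated at every x value.

    Simplified: one pass, tricube weights, no robustness iterations.
    """
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))  # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def lowess_normalize(red, green, frac=0.5):
    """Subtract the intensity-dependent trend of M on A; return (M_norm, A)."""
    m, a = ma_values(red, green)
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)], a
```

After normalization, M should scatter around zero across the whole range of A if most genes are not differentially expressed.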
The Qspline method
From the empirical distribution, a number of quantiles are calculated for each of the channels to be normalized (one channel shown in red) and for the reference distribution (shown in black).
A QQ-plot is made and a normalization curve is constructed by fitting a cubic spline function.
As reference one can use an artificial “median array” for a set of arrays, or a log-normal distribution, which is a good approximation.
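A rough sketch of the quantile-based normalization curve, in pure Python. To stay short it swaps the cubic spline for piecewise-linear interpolation between quantiles, so it is not qspline itself. The reference quantiles are taken as the median of each quantile across arrays, which is one assumed way of building a “median array”. All function names are illustrative.

```python
from statistics import median


def quantiles(values, n_q):
    """n_q evenly spaced empirical quantiles, including min and max."""
    s = sorted(values)
    out = []
    for k in range(n_q):
        pos = k * (len(s) - 1) / (n_q - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out


def interpolate(x, xs, ys):
    """Piecewise-linear map through (xs, ys); clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            if xs[i] == xs[i - 1]:
                return ys[i]
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def quantile_curve_normalize(arrays, n_q=5):
    """Map each array onto the reference via its quantile-quantile curve."""
    q = [quantiles(a, n_q) for a in arrays]
    # Reference quantiles: median across arrays (assumed "median array").
    target = [median(col) for col in zip(*q)]
    return [[interpolate(v, qa, target) for v in a]
            for a, qa in zip(arrays, q)]
```

A cubic spline through the same quantile pairs would give a smoother curve than the linear segments used here.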
Once again…qspline
When many microarrays are to be normalized to each other, an average array can be used as the target
Accumulating quantiles
Invariant set normalization (Li and Wong)
An invariant set of probes is used:
– Probes that do not change intensity rank between arrays
– A piecewise linear median line is calculated
– This curve is used for normalization
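The three steps of invariant set normalization can be sketched as follows, in pure Python. This is a simplified single pass, not the published procedure: the rank-difference threshold `max_rank_diff` and the bin count `n_bins` are hypothetical parameters chosen for illustration.

```python
from statistics import median


def ranks(values):
    """Rank of each value (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def invariant_set(baseline, array, max_rank_diff):
    """Indices of probes whose rank barely differs between the two arrays."""
    rb, ra = ranks(baseline), ranks(array)
    return [i for i in range(len(array)) if abs(rb[i] - ra[i]) <= max_rank_diff]


def median_curve(baseline, array, idx, n_bins):
    """Knots (median array value, median baseline value), one per bin."""
    pts = sorted((array[i], baseline[i]) for i in idx)
    size = max(1, len(pts) // n_bins)
    knots = []
    for s in range(0, len(pts), size):
        chunk = pts[s:s + size]
        knots.append((median(p[0] for p in chunk), median(p[1] for p in chunk)))
    return knots


def apply_curve(x, knots):
    """Piecewise-linear interpolation; end segments are extended outward."""
    if len(knots) == 1:
        return x + (knots[0][1] - knots[0][0])
    i = 1
    while i < len(knots) - 1 and x > knots[i][0]:
        i += 1
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    if x1 == x0:
        return y1
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def invariant_set_normalize(baseline, array, max_rank_diff=1, n_bins=4):
    """Normalize `array` to `baseline` using only rank-invariant probes."""
    idx = invariant_set(baseline, array, max_rank_diff)
    knots = median_curve(baseline, array, idx, n_bins)
    return [apply_curve(v, knots) for v in array]
```

Probes outside the invariant set are still transformed by the curve, but they do not influence how the curve is estimated.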
Spatial normalization
[Figure panels: raw data; after intensity normalization; spatial bias estimate; after spatial normalization]
Expression index value
Some microarrays have multiple probes addressing the expression of the same target
– Affymetrix GeneChips have 11-20 probe pairs per gene
- Perfect Match (PM)
- MisMatch (MM)
PM: CGATCAATTGCACTATGTCATTTCT
MM: CGATCAATTGCAGTATGTCATTTCT
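As the PM/MM pair above shows, the mismatch probe differs from the perfect match only at the middle base (position 13 of 25), which is replaced by its complement. A small sketch; the function name is illustrative:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm):
    """Return the MM probe: the PM sequence with its middle base complemented."""
    mid = len(pm) // 2  # index 12 for a 25-mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


print(mismatch_probe("CGATCAATTGCACTATGTCATTTCT"))
```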
Expression index calculation
Simplest method? Median
But more sophisticated methods exist: dChip, RMA and MAS 5
dChip (Li & Wong)
Model: $PM_{ij} = \theta_i \phi_j + \varepsilon_{ij}$
Outlier removal:
– Identify extreme residuals
– Remove
– Re-fit
– Iterate
Distribution of errors $\varepsilon_{ij}$ assumed independent of signal strength
(Li and Wong, 2001)
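A minimal sketch of fitting the multiplicative model by alternating least squares, in pure Python. Here i indexes arrays (expression index θ_i) and j indexes the probes of one probe set (probe affinity φ_j). The scaling Σ φ_j² = number of probes is assumed as the identifiability constraint. The outlier-removal loop is not implemented: the returned residuals are what it would inspect. The function name is illustrative, not dChip's.

```python
def fit_multiplicative_model(pm, n_iter=20):
    """Least-squares fit of pm[i][j] ~ theta[i] * phi[j].

    i indexes arrays, j indexes probes of one probe set.
    phi is rescaled so that sum(phi_j^2) equals the number of probes.
    Returns (theta, phi, residuals).
    """
    n_arrays, n_probes = len(pm), len(pm[0])
    phi = [1.0] * n_probes

    def update_theta(phi):
        # Least-squares theta_i for fixed phi.
        pp = sum(p * p for p in phi)
        return [sum(pm[i][j] * phi[j] for j in range(n_probes)) / pp
                for i in range(n_arrays)]

    for _ in range(n_iter):
        theta = update_theta(phi)
        # Least-squares phi_j for fixed theta, then rescale.
        tt = sum(t * t for t in theta)
        phi = [sum(pm[i][j] * theta[i] for i in range(n_arrays)) / tt
               for j in range(n_probes)]
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]

    theta = update_theta(phi)
    residuals = [[pm[i][j] - theta[i] * phi[j] for j in range(n_probes)]
                 for i in range(n_arrays)]
    return theta, phi, residuals
```

Outlier removal would wrap this fit: flag probes or arrays with extreme residuals, drop them, and re-fit until nothing more is flagged.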
RMA
Robust Multi-array Average (RMA) expression measure (Irizarry et al., Biostatistics, 2003)
For each probe set, re-write $PM_{ij} = \theta_i \phi_j$ as:
$\log(PM_{ij}) = \log(\theta_i) + \log(\phi_j)$
Fit this additive model by iteratively re-weighted least-squares or median polish
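A sketch of the median polish option for the additive model, in pure Python. Rows are arrays (i) and columns are probes (j). It covers only the summarization step: RMA's background correction and normalization are not included. Function names and the fixed iteration count are illustrative assumptions.

```python
import math
from statistics import median


def median_polish(table, n_iter=10):
    """Tukey's median polish: table[i][j] ~ overall + row[i] + col[j]."""
    resid = [row[:] for row in table]
    n_r, n_c = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_r
    col_eff = [0.0] * n_c
    for _ in range(n_iter):
        # Sweep row medians into the row effects.
        for i in range(n_r):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        d = median(col_eff)
        overall += d
        col_eff = [c - d for c in col_eff]
        # Sweep column medians into the column effects.
        for j in range(n_c):
            m = median(resid[i][j] for i in range(n_r))
            col_eff[j] += m
            for i in range(n_r):
                resid[i][j] -= m
        d = median(row_eff)
        overall += d
        row_eff = [r - d for r in row_eff]
    return overall, row_eff, col_eff, resid


def summarize_probe_set(pm):
    """Log2 expression per array for one probe set; pm[i][j] = array i, probe j."""
    logs = [[math.log2(v) for v in row] for row in pm]
    overall, row_eff, _, _ = median_polish(logs)
    return [overall + r for r in row_eff]
```

Because medians are used instead of means, a single aberrant probe has little influence on the per-array estimate.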
MAS 5
MicroArray Suite version 5 uses
$signal = \mathrm{TukeyBiweight}\{\log(PM_j - MM^*_j)\}$
MM* is an adjusted MM that is never bigger than PM
Tukey biweight is a robust average procedure with weights and outlier rejection
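A sketch of a one-step Tukey biweight average applied to log2(PM − MM*), in pure Python. The tuning constant c = 5 and the small ε = 0.0001 are assumptions about the defaults. The adjusted mismatch MM* is taken as given input, since its calculation is not described here. Function names are illustrative.

```python
import math
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: weighted mean that down-weights outliers.

    Distances from the median are scaled by c * MAD + eps; points with a
    scaled distance of 1 or more get weight zero (outlier rejection).
    """
    m = median(values)
    mad = median(abs(v - m) for v in values)
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        num += w * v
        den += w
    return num / den


def mas5_like_signal(pm, mm_star):
    """Robust average of log2(PM_j - MM*_j); requires PM_j > MM*_j for all j."""
    return tukey_biweight([math.log2(p - m) for p, m in zip(pm, mm_star)])
```

Values close to the median get weights near one, so for clean data the result is close to an ordinary mean of the log differences.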
Methods compared on expression variance
[Figure: standard deviation of gene measures from 20 replicate arrays versus expression level. RMA: blue and red; MAS5: green; dChip: black. From Terry Speed]
Robustness: MAS 5.0
(Irizarry et al., Biostatistics, 2003)
[Figure: log fold change estimate from 20 µg cRNA plotted against log fold change estimate from 1.25 µg cRNA]
Robustness: dChip
(Irizarry et al., Biostatistics, 2003)
[Figure: log fold change estimate from 20 µg cRNA plotted against log fold change estimate from 1.25 µg cRNA]
Robustness: RMA
(Irizarry et al., Biostatistics, 2003)
[Figure: log fold change estimate from 20 µg cRNA plotted against log fold change estimate from 1.25 µg cRNA]
All of this is implemented in…
R, in the Bioconductor package ‘affy’ (Gautier et al., 2003).
References
Li and Wong (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2:1–11.
Irizarry, Bolstad, Collin, Cope, Hobbs and Speed (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31(4):e15.
Affymetrix (2001). Affymetrix Microarray Suite User Guide, version 5 edition. Affymetrix, Santa Clara, CA.
Gautier, Cope, Bolstad and Irizarry (2003). affy - an R package for the analysis of Affymetrix GeneChip data at the probe level. Bioinformatics