A robust neural networks approach for spatial and intensity-dependent normalization of cDNA...

Post on 22-Dec-2015


A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data

A.L. Tarca, J.E.K. Cooke and J. MacKay

Presented by Dana Mohamed

Microarrays

Importance of Microarrays (and that the data is correct)

• Assumption: microarray data linearly reflects the amount of mRNA present in the cell
  – In turn, this reflects gene expression levels
• If the data is incorrect, so is our interpretation of gene expression
• And therefore all the science built on that interpretation is also incorrect

Where the error is

• Intensity of fluorescence
  – Overall imbalance of dye intensity
    • 2 dyes: Cy5 (R) and Cy3 (G)
    • If R & G are expressed at equal levels, R/G = 1
• Space
  – Intensities vary with spot coordinates
    • Can be “dirty” on the sides of the microarray

Previous Methods

• Many address intensity bias
• Few address spatial bias
• Most rely on M* = M - m
  – M* is the normalized value
  – M is the raw log-ratio, M = log2(R/G)
  – m is the estimate of the bias

Important Variables

• M = log2(R/G)
  – The log-ratio converts multiplicative error to additive error
• A = (1/2) log2(RG)
  – Average of the log-intensities
• Minus-add (MA) plots
  – M vs. A
  – Useful for assessing systematic bias
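The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `ma_values` and the sample intensities are made up for the example, and real pipelines would normally background-correct R and G first.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red (Cy5) and green (Cy3) intensities."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    m = math.log2(red / green)        # M: log-ratio
    a = 0.5 * math.log2(red * green)  # A: average of the two log-intensities
    return m, a

# Hypothetical (R, G) pairs: balanced, red four times brighter, green four times brighter.
spots = [(1200.0, 1200.0), (4000.0, 1000.0), (500.0, 2000.0)]
ma = [ma_values(r, g) for r, g in spots]
```

A balanced spot gives M = 0, and a four-fold imbalance in either direction gives M = +2 or M = -2, which is why the log-ratio is symmetric around zero while the raw ratio R/G is not.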

Calculating m in other methods

• gMed – global median normalization
  – m = median(Mi)
  – Mi are all the values of M
• pLo – print tip loess
  – m = ci(A)
  – ci(A) is the loess estimate of M using A inside the ith print tip
• pLoGS – found in GeneSight (biodiscovery.com)
  – Local group median (3x3 square regions) + print tip loess
• cPLo2D – print tip loess + pure 2D normalization
  – BioConductor (bioconductor.org)
  – m = α ci(A) + β ci(SpotRow, SpotCol)
  – ci(SpotRow, SpotCol) is the loess estimate of M using spot row and column coordinates inside the ith print tip
• gLoMedF – global loess normalization + spatial median filter
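As a concrete instance of M* = M - m, here is a minimal sketch of the simplest method in the list, gMed, where m is one global median for the whole slide. The function name `gmed_normalize` and the sample log-ratios are assumptions for illustration; the loess-based methods would replace the single `bias` value with a fitted value per spot.

```python
from statistics import median

def gmed_normalize(m_values):
    """Global median normalization: subtract the slide-wide median from every log-ratio."""
    bias = median(m_values)  # m = median(Mi)
    return [m - bias for m in m_values]

# Hypothetical raw log-ratios from one slide, shifted upward by a dye imbalance.
raw = [0.4, 0.1, 0.9, 0.3, 0.5]
normalized = gmed_normalize(raw)
```

After this step the median of the normalized log-ratios is zero, but any dependence of M on A or on spot position is left untouched, which is the gap the other methods try to close.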

Robust Neural Networks Technique

pNN2DA – print tip robust neural nets 2D and A

– Attempt to find the best fit of M using A and the 2-D space coordinates of the spots:

m = ci(A, X, Y)

• Instead of using individual print tips, use 3x3 “bins” of them
  – X and Y
  – Accounts for spatial bias

Neural Nets Terminology

• Uses a multi-layer feedforward network
• Sigmoid function

Neural Networks

• x is the input vector (X, Y, A, 1)
• I = 3
• w are the weights
• σ(1) (sigma one) represents the hidden neurons, which are sigmoid functions
• σ(2) (sigma two) is the single neuron in the output layer, which is also sigmoid
• σ(1) with index J+1 accounts for the second-layer bias
• J represents the number of neurons in the hidden layer of the network
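One way to write the network output that is consistent with the symbols listed above is the standard single-hidden-layer form below. This is a sketch under assumptions: the indexing of the weights (superscripts for the layer, w_{ij} for input i to hidden neuron j) and the hat on M are notation chosen here, not taken from the slide.

```latex
\hat{M}(\mathbf{x}) \;=\; \sigma^{(2)}\!\left( \sum_{j=1}^{J+1} w^{(2)}_{j}\,
    \sigma^{(1)}_{j}\!\left( \sum_{i=1}^{I+1} w^{(1)}_{ij}\, x_i \right) \right),
\qquad \mathbf{x} = (X,\, Y,\, A,\, 1),\quad I = 3,\quad \sigma^{(1)}_{J+1} \equiv 1
```

The constant fourth input supplies the first-layer bias, and the constant unit with index J+1 supplies the second-layer bias.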

Multi-layered Feedforward

Usually J = 3, chosen to take care of outliers while avoiding over-fitting
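A forward pass through a network of this shape (three inputs plus bias, J sigmoid hidden neurons, one sigmoid output) can be sketched as follows. The function names, the weight layout and the zero weights in the usage lines are assumptions for illustration; the sketch covers only the prediction step, not the robust fitting of the weights.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, y, a, hidden_weights, output_weights):
    """Evaluate the network for one spot.

    hidden_weights: J rows of 4 weights, ordered (X, Y, A, bias).
    output_weights: J + 1 weights, the last one being the second-layer bias.
    """
    inputs = (x, y, a, 1.0)  # the trailing 1 carries the first-layer bias
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs)))
              for row in hidden_weights]
    hidden.append(1.0)       # constant unit carrying the second-layer bias
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Usage with J = 3 and placeholder (all-zero) weights.
J = 3
hidden_w = [[0.0, 0.0, 0.0, 0.0] for _ in range(J)]
output_w = [0.0] * (J + 1)
estimate = forward(0.2, 0.3, 0.5, hidden_w, output_w)
```

Because the output neuron is a sigmoid, the result always lies strictly between 0 and 1, so in a sketch like this the log-ratios would have to be rescaled into that range before fitting and mapped back afterwards.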

Criteria & Datasets

Criteria:

a) reduce variability of log-ratios between replicated slides and within slides

b) ability to distinguish truly regulated genes from the other genes

Datasets:

1) Apo AI: a,b

2) Swirl Zebra Fish: a

3) Poplar experiment: a

4) Perturbed Apo AI: b

Classic Neural Nets vs. Robust NNets

Criteria refresher

• The ability to reduce the variability of log-ratios between replicated slides and within slides

• The ability to distinguish truly regulated genes from the other genes

Impact on Variability

Cont. – 3 Data Sets

Downregulated Gene Sorting – Apo AI set

DRGS – Perturbed Apo AI set

Spatial Uniformity of M values distribution

Results Table

Strengths/Weaknesses

• Seems promising
• Uses multiple tests to determine efficacy
• Doesn’t use enough datasets
• Uses a patterned perturbed dataset
  – But no “real” perturbed dataset

Future Work

• More datasets

• When should this normalization technique be used over other techniques?

• Should this technique be combined with elements of other techniques to further improve it?

References

• Tarca, A.L., J.E.K. Cooke, and J. MacKay. “A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data.” Bioinformatics, Jun 2005; 21: 2674-2683.

• Haykin, Simon. Neural Networks: A Comprehensive Foundation. New Jersey: Prentice Hall, 1999.

• Mount, David W. Bioinformatics: sequence and genome analysis. New York: Cold Spring Harbor Laboratory Press, 2001.