Structure and Analysis of Affymetrix Arrays
Monnie McGee
Department of Statistical Science
Southern Methodist University
UTSW Microarray Analysis Course, October 28, 2005 – p.1/56
Outline
Brief Review of Spotted Array Technology
Structure of Affymetrix Arrays
Exploratory Data Analysis
Affymetrix Data Files
Obtaining Gene Expression Values
Software
Microarray Measurements
All raw measurements are fluorescence intensities
Target cDNA (or mRNA) is fluorescently labeled
Molecules in dye are excited using a laser
Measurement is a count of the photons emitted
Entire slide or chip is scanned, and the result is a digital image
Image is processed to locate probes and assign intensity measurements to each probe
Microarray Technologies
Two-Channel Spotted Arrays
  Robotic microspotting
  Probes are 300 to 3000 base pairs in length
  Long-oligo arrays: probes are uniformly 60 to 90 bp
  Commercial arrays using inkjet technology
Single-Channel Arrays
  High-density short-oligo (25 bp) arrays (Affymetrix, Nimblegen)
Spotted Arrays
Diagram courtesy of Columbia Department of Computer Science
The Affymetrix Chip
Human Genome U133 Plus 2.0 Array
Courtesy of Affymetrix
Some Definitions
Probes = 25 bp sequences
Probe sets = 11 to 20 probes corresponding to a particular gene or EST
Chip contains 54K probe sets
In situ Synthesis of Probes
Image Courtesy of Affymetrix
Probe Selection: HG-U133 Plus 2.0
Sequence data for new content obtained from dbEST, GenBank, and RefSeq.
Draft assembly of Human Genome (NCBI Build 31) used to assess sequence orientation and quality.
Probes selected from the 600 bases most proximal to the 3′ end of each transcript.
Probe selection regions defined by the following:
  3′ ends of RefSeq and complete CDS mRNA sequences
  Eight or more 3′ EST reads terminating at the same position (evidence for polyadenylation)
  3′ end of the assembly (consensus end).
Details found in Affymetrix Technical Note (2003).
Types of Probe Sets
No suffix: predicted to perfectly match a single transcript
“_a” suffix: recognize multiple alternative transcripts from the same gene
“_s” suffix: common probes among multiple transcripts from separate genes
“_x” suffix: contain some probes that are identical or highly similar to other sequences.
mRNA Hybridizes to Probes
Image Courtesy of Affymetrix
Sizes of Various GeneChips
Arrays for 27 organisms
Arabidopsis (2), Drosophila (2), Mouse (5), Human (8), Yeast (2)
Arabidopsis: 24K genes, 11 pairs per probe set
C. elegans: 22.5K genes, 11 pairs per probe set
Drosophila: 13.5K genes, 14 pairs per probe set
Human HG-U133 Plus 2.0: 54K genes, 11-20 pairs per probe set.
Source: http://www.affymetrix.com/support/technical/datasheets.affx
Perfect Match vs. Mismatch
PM Probe = 25 bp probe perfectly complementary to a specific region of a gene
MM Probe = 25 bp probe agreeing with a PM apart from the middle base.
The middle (13th) base is replaced by its complement (A ⇐⇒ T, C ⇐⇒ G)
Image Courtesy of Affymetrix
Riddle of the Mismatches
Mismatches were designed to capture non-specific hybridization
Hypothesized True Signal = PM - MM
Problem: Approximately 30% of the mismatches are greater than their corresponding perfect matches.
Why?
PM and MM Example
Target Transcript for Human recA gene:
ctcagcttaagtcatggaattctagaggatgtatctcacaagtaggatcaag
c t c a g c t t a a g t c a t g g a a t t c t a g PM1
c t c a g c t t a a g t g a t g g a a t t c t a g MM1
t c a g c t t a a g t c a t g g a a t t c t a g a PM2
t c a g c t t a a g t c t t g g a a t t c t a g a MM2
a t t c t a g a g g a t g t a t c t c a c a a g t PM3
a t t c t a g a g g a t c t a t c t c a c a a g t MM3
a g g a t g t a t c t c a c a a g t a g g a t c a PM4
a g g a t g t a t c t c t c a a g t a g g a t c a MM4
Morals: Large Overlap of sequences and variable GC content
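The PM/MM pairs above follow one simple rule, which can be sketched in a few lines of Python. This is only an illustration, assuming lowercase DNA strings of odd length; the name `make_mismatch` is ours and not part of any Affymetrix software.

```python
# Complement of each DNA base (a <-> t, c <-> g).
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def make_mismatch(pm: str) -> str:
    """Return the MM probe for a PM probe by complementing its middle base."""
    if len(pm) % 2 == 0:
        raise ValueError("probe length must be odd so that a middle base exists")
    mid = len(pm) // 2  # index 12 (the 13th base) for a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm1 = "ctcagcttaagtcatggaattctag"
print(make_mismatch(pm1))  # ctcagcttaagtgatggaattctag (MM1 above)
```

Because neighbouring probes overlap heavily, their MM partners differ from each other in only a few positions as well.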
Other Sources of Variation
Systematic
  Amount of RNA in biopsy extraction; efficiencies of RNA extraction, reverse transcription, labeling, photodetection; GC content of probes
  Similar effect on many measurements
  Corrections can be estimated from data
  Calibration corrections
Stochastic
  PCR yield, DNA quality, spotting efficiency, spot size, non-specific hybridization, stray signal
  Too random to be explicitly accounted for in a model
  Noise components & “Schmutz”
Quality Control
We wish to find and eliminate problem probes before analyzing the data
Problems may be local (scratch on the array, inadequate washing) or global (background set too high)
Look at image plots, histograms, MA plots, boxplots, etc.
Contaminated Image
Image courtesy of http://www.biostat.harvard.edu/complab/dchip
Why Normalize ?
Ensure that differences in intensities are truly due to differential expression, not printing, hybridization, or scanning artifacts
Must be done before an analysis which involves comparison of intensities within or between slides
Procedures depend on the array technology
Dilution Data
Human liver tissue hybridized to human array HGU95A
Large range of proportions and dilutions
Our data hybridized at 10.0 and 20.0 µg
Two replicate arrays for each generated cRNA
Each array replicate was processed in a different scanner
For more information, see http://qlotus02.genelogic.com/datasets.nsf/
Histograms from Dilution Study
[Figure: density histograms of log intensity for the dilution study arrays; x-axis: log intensity (6 to 14), y-axis: density (0.0 to 0.6)]
Boxplots
[Figure: boxplots of log intensity (6 to 14) for arrays X20A, X20B, X10A, and X10B]
Small part of dilution study
M-A Plots
Plot of log fold change for gene j (Mj) versus the average log intensity for that gene (Aj).
[Figure: M-A plot, 10B vs pseudo-median reference chip; x-axis: A (6 to 14), y-axis: M (−1 to 4); Median: −0.535, IQR: 0.207]
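The M and A values behind such a plot can be sketched as follows. This is a minimal illustration, assuming log base 2 and taking the "pseudo-median reference chip" to be the probe-wise median across arrays; the function names are ours.

```python
import math
from statistics import median


def pseudo_median_reference(arrays):
    """Probe-wise median across arrays (each array is a list of intensities)."""
    return [median(probe) for probe in zip(*arrays)]


def ma_values(array, reference):
    """Return (M, A) lists for one array against a reference, on the log2 scale."""
    m_vals, a_vals = [], []
    for x, ref in zip(array, reference):
        lx, lr = math.log2(x), math.log2(ref)
        m_vals.append(lx - lr)        # M: log fold change
        a_vals.append((lx + lr) / 2)  # A: average log intensity
    return m_vals, a_vals


arrays = [[64.0, 256.0, 1024.0], [128.0, 256.0, 512.0], [256.0, 256.0, 2048.0]]
ref = pseudo_median_reference(arrays)  # [128.0, 256.0, 1024.0]
m, a = ma_values(arrays[0], ref)
print(m)  # [-1.0, 0.0, 0.0]
print(a)  # [6.5, 8.0, 10.0]
```

A well-normalized array should give M values scattered around zero across the whole range of A.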
Exploratory Data Analysis
Exploratory Data Analysis (cont’d)
Affymetrix Files
CDF file: Chip description file, describes which probes go into which probe sets
DAT file: TIFF image file, $10^7$ pixels, ∼ 50 MB
CEL file: Probe intensities, ∼ 600,000 numbers
CHP file: Gene expression values as calculated by GeneChip Operating Software (GCOS)
Probe sets correspond to genes, gene fragments, or ESTs
Affymetrix DAT file
Scan of whole chip (left) and top left-hand corner (right) of Arabidopsis thaliana Genome Array.
Images courtesy of NASCA Arrays Help.
From DAT to CEL
CEL files contain fluorescence intensity values for all probe pairs and all probe sets.
Use gridding to estimate location of probe cell centers
Remove outer 36 pixels → 8 × 8 pixels
PM (MM) intensity is the 75th percentile of the 8 × 8 pixel values
Background: Average of the lowest 2% of probe cells is subtracted
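The per-cell summary can be sketched as below. Two things are assumptions here: that a probe cell is a 10 × 10 pixel grid (100 pixels, of which the 36 border pixels are dropped), and the interpolation rule used for the 75th percentile. The scanner software may compute the percentile differently.

```python
from statistics import quantiles


def cell_intensity(pixels):
    """75th percentile of the interior pixels of one probe cell.

    `pixels` is a square grid (list of rows); the one-pixel border is discarded.
    """
    inner = [value for row in pixels[1:-1] for value in row[1:-1]]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    return quantiles(inner, n=4, method="inclusive")[2]


# A 10 x 10 cell: 100 pixels, 36 on the border, 64 in the interior.
cell = [[10 * r + c for c in range(10)] for r in range(10)]
print(len([v for row in cell[1:-1] for v in row[1:-1]]))  # 64
print(cell_intensity(cell))
```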
Analysis Tasks
Identify up- and down-regulated genes.
Find groups of genes with similar expression profiles.
Find groups of experiments (tissues) with similar expression profiles.
Find genes that explain observed differences among tissues (feature selection).
From CEL to Gene Expression
Computing Expression Values for each probe set requires three steps which begin with probe level data:
Central Dogma of Microarray Analysis:
Background correction (local vs. global)
Normalization (baseline array vs. complete data)
Summarization (single vs. multiple chips)
From CEL to Gene Expression
The “Big Four” algorithms for correcting, normalizing, and summarizing probe level data.
Microarray Analysis Suite 5.0 (MAS5 - Affymetrix, 2001, 2003)
Model Based Expression Index (MBEI - Li and Wong, 2001a,b)
Robust Multichip Analysis (RMA - Irizarry et al., 2003)
Significance Analysis of Microarrays (SAM - Tusher, Tibshirani, and Chu, 2001)
Background Correction in MAS 5.0
Affymetrix proposed two methods: location specific adjustment and ideal mismatch.
Location Specific Adjustment:
Array is split into K rectangular zones, denoted $Z_k$, $k = 1, \ldots, K$. The default for K is 16.
Control cells and masked cells are not used in the calculation
Intensities within zones are ranked and the lowest 2% is chosen as the background b for that zone ($b_{Z_k}$)
Standard deviation of the lowest 2% is calculated as an estimate of the background variability n for each zone ($n_{Z_k}$).
Background Correction in MAS 5.0
Result is smoothed via the following formula
$$w_k(x, y) = \frac{1}{d_k^2(x, y) + \psi}$$
The background is given by
$$b(x, y) = \frac{1}{\sum_{k=1}^{K} w_k(x, y)} \sum_{k=1}^{K} w_k(x, y)\, b_{Z_k}$$
where $d_k(x, y)$ is the Euclidean distance between chip coordinate (x, y) and the center of the kth zone and ψ is a smoothing parameter (100 by default).
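The smoothing step can be sketched directly from the two formulas. This is a minimal illustration; the zone centers and zone backgrounds in the example are hypothetical numbers, and the function name is ours.

```python
def smoothed_background(x, y, zone_centers, zone_backgrounds, psi=100.0):
    """Weighted average of the zone backgrounds b_Zk at chip coordinate (x, y)."""
    weights = []
    for cx, cy in zone_centers:
        d2 = (x - cx) ** 2 + (y - cy) ** 2  # squared Euclidean distance d_k^2
        weights.append(1.0 / (d2 + psi))    # w_k(x, y)
    total = sum(weights)
    return sum(w * b for w, b in zip(weights, zone_backgrounds)) / total


centers = [(50.0, 50.0), (150.0, 50.0)]  # hypothetical zone centers
b_zones = [40.0, 60.0]                   # hypothetical zone backgrounds
# Midway between the two zones the result is close to their average, 50.
print(smoothed_background(100.0, 50.0, centers, b_zones))
```

Near a zone center the nearest zone dominates, so the background changes smoothly across zone boundaries instead of jumping.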
LSA Continued
Calculate a local noise background n based on the standard deviation of the lowest 2% of the background in that zone ($n_{Z_k}$).
Weight $n_{Z_k}$ for background values using the same formula as for smoothing of the background correction
Set threshold and floor such that no value is adjusted below that threshold.
Compute the Adjusted Intensity, A(x, y), via
$$A(x, y) = \max\left(I'(x, y) - b(x, y),\; f\, n(x, y)\right)$$
where $I'(x, y) = \max(I(x, y), 0.5)$, $I(x, y)$ is the cell intensity at chip coordinates (x, y), and f (default 0.5) is the threshold.
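As a sketch, the adjustment for a single cell looks like this. The parameter names `noise_frac` (for f) and `floor` (for the 0.5 in I′) are ours; the background and noise values are assumed to have been smoothed already.

```python
def adjusted_intensity(intensity, background, noise, noise_frac=0.5, floor=0.5):
    """A(x, y) = max(I'(x, y) - b(x, y), f * n(x, y)), with I' = max(I, floor)."""
    shifted = max(intensity, floor)  # I'(x, y)
    return max(shifted - background, noise_frac * noise)


print(adjusted_intensity(120.0, 50.0, 8.0))  # 70.0
print(adjusted_intensity(52.0, 50.0, 8.0))   # 4.0 (clamped at f * n)
```

The clamp keeps adjusted intensities positive, so logarithms can be taken later.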
Affy Method 2: Ideal Mismatch
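$$IM_{ij} = \begin{cases} MM_{ij} & MM_{ij} < PM_{ij} \\ \dfrac{PM_{ij}}{2^{SB_i}} & MM_{ij} \ge PM_{ij} \text{ and } SB_i > \tau_c \\ \dfrac{PM_{ij}}{2^{\tau_c / \left(1 + \frac{\tau_c - SB_i}{\tau_s}\right)}} & MM_{ij} \ge PM_{ij} \text{ and } SB_i \le \tau_c \end{cases}$$
where τs is a cutoff describing the variability of the probe pairs within the probe set, and τc is some tolerance level.
Defaults: τc = 0.03, τs = 10.
Now $\text{Signal} = T_{bi}(PV_{i,1}, \ldots, PV_{i,n_i})$, where $PV_{i,j} = \log_2(\max(PM_{ij} - IM_{ij}, \delta))$, for δ small.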
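The three cases of the ideal mismatch translate into a short function. This is a sketch only: the specific background SB_i is taken as an input, and the value of δ (here 2^-20) is an assumption on our part.

```python
import math


def ideal_mismatch(pm, mm, sb, tau_c=0.03, tau_s=10.0):
    """Ideal mismatch IM_ij for one probe pair, given the probe-set value SB_i."""
    if mm < pm:
        return mm                  # MM is usable as it is
    if sb > tau_c:
        return pm / 2.0 ** sb      # shrink PM by the typical log ratio
    return pm / 2.0 ** (tau_c / (1.0 + (tau_c - sb) / tau_s))


def probe_value(pm, mm, sb, delta=2.0 ** -20):
    """PV_ij = log2(max(PM_ij - IM_ij, delta)); delta is an assumed small constant."""
    return math.log2(max(pm - ideal_mismatch(pm, mm, sb), delta))


print(ideal_mismatch(200.0, 150.0, sb=1.0))  # 150.0
print(ideal_mismatch(200.0, 250.0, sb=1.0))  # 100.0
```

In every case IM is smaller than PM, so PM − IM is positive.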
Normalization in MAS 5.0
Let X be a p × n matrix with columns representing arrays and rows probes or probe sets.
Pick a column of X = log(X) to serve as baseline array, say column j.
1. Compute (trimmed) mean of column j. Call this $\tilde{X}_j$.
2. Compute (trimmed) mean of column i. Call this $\tilde{X}_i$.
3. Compute $\beta_i = \tilde{X}_j / \tilde{X}_i$.
4. Multiply elements of column i by $\beta_i$.
Repeat steps 2 to 4 for all columns.
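The scaling steps can be sketched as follows. The trimming fraction (2% from each end) is an assumption on our part, and the function names are ours.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def scale_to_baseline(columns, baseline=0, trim=0.02):
    """Multiply each array i by beta_i = trimmed mean(baseline) / trimmed mean(i)."""
    target = trimmed_mean(columns[baseline], trim)
    scaled = []
    for col in columns:
        beta = target / trimmed_mean(col, trim)
        scaled.append([beta * v for v in col])
    return scaled


arrays = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
print(scale_to_baseline(arrays))  # the second array is rescaled onto the first
```

This is a linear, baseline-array method: each array gets one multiplicative constant, so it cannot remove intensity-dependent differences between arrays.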
Summarization in MAS 5.0
A signal (expression) value is calculated by combining the probe intensities for each probe pair within a probe set.
Find a typical log ratio of PM to MM for probe pair j in probe set i, known as Specific Background
$$SB_i = T_{bi}\left(\log_2(PM_{ij}) - \log_2(MM_{ij}) : j = 1, \ldots, n_i\right)$$
where $T_{bi}$ is the Tukey Biweight.
If $SB_i$ is large, values from the probe set are used directly to construct the ideal mismatch (IM) for a probe pair.
If $SB_i$ is small (as defined by $\tau_c$), smooth MM to use more of PM value as IM.
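A one-step Tukey biweight can be sketched as below. The tuning constants c = 5 and ε = 0.0001 are what we recall from the Affymetrix algorithm description and should be treated as assumptions; the function names are ours.

```python
import math
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight estimate of the center of `values`."""
    center = median(values)
    mad = median(abs(v - center) for v in values)  # median absolute deviation
    num = den = 0.0
    for v in values:
        u = (v - center) / (c * mad + eps)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # distant points get weight 0
        num += w * v
        den += w
    return num / den


def specific_background(pm, mm):
    """SB_i: biweight of log2(PM_ij) - log2(MM_ij) over the pairs of one probe set."""
    return tukey_biweight([math.log2(p) - math.log2(m) for p, m in zip(pm, mm)])


print(tukey_biweight([1.0, 2.0, 3.0, 4.0, 100.0]))  # the outlier 100 is ignored
```

Unlike a plain mean, the biweight gives zero weight to values far from the median, so one bad probe pair does not distort the probe set summary.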
Content of the CHP file
Data analysis output for a Single Array Analysis includes the following:
List of probes (transcripts)
Stat Pairs: Number of probe pairs to interrogate each gene
Stat Pairs Used: Number of pairs used to calculate signal
Signal: Raw Adjusted Intensity
Detection Call: presence or absence of transcript
Detection P-value: p-value used to determine presence or absence of transcript
What is a P-value?
The probability of obtaining a test statistic as extreme as or more extreme than the one observed, assuming that the null hypothesis of the test is true.
For probe pairs, the null hypothesis is that there is no difference in intensity between PM and MM values for the same probe pair.
Absolute Analysis of One Array
Four steps to calculating presence/absence of transcripts:
1. Remove saturated probe pairs and ignore probe pairs where PM ≈ MM + τ (default: τ = 0.015).
2. Calculate discrimination scores ($R_i$) for each probe pair
$$R_i = \frac{PM_i - MM_i}{PM_i + MM_i}$$
3. Use Wilcoxon’s signed-rank test to calculate a p-value for each pair
4. Compare the p-value with preset significance levels as follows:
Present if p < α1 (default: α1 = 0.04).
Marginal if α1 ≤ p < α2
Absent if p ≥ α2 (default: α2 = 0.06).
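Steps 2 and 4 can be sketched as below. The signed-rank test of step 3 is deliberately left out, so the p-value is taken as an input; the function names are ours.

```python
def discrimination_score(pm, mm):
    """R_i = (PM_i - MM_i) / (PM_i + MM_i); ranges from -1 to 1."""
    return (pm - mm) / (pm + mm)


def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Map a detection p-value to Present / Marginal / Absent."""
    if p_value < alpha1:
        return "Present"
    if p_value < alpha2:
        return "Marginal"
    return "Absent"


print(discrimination_score(300.0, 100.0))  # 0.5
print(detection_call(0.05))                # Marginal
```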
Comparisons of Multiple Arrays
Let γ1 and γ2 be user defined thresholds for change calls such that 0 < γ1 < γ2 < 1.
p = Change p-value, calculated using signed rank test comparing PM and MM differences for each probe pair in a probe set present on both arrays being compared.
Possible Outcomes:
Increase (p < γ1)
Marginal Increase (γ1 ≤ p < γ2)
No Change (γ2 ≤ p ≤ 1 − γ2)
Marginal Decrease (1 − γ2 < p ≤ 1 − γ1)
Decrease (p > 1 − γ1)
Source: http://www.wadsworth.org/genomics/microarray/
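The five outcomes partition the interval [0, 1], which is easy to check in code. This is a sketch; the check that γ2 < 0.5 is our addition, since the intervals only stay disjoint under that condition, and the thresholds in the example are illustrative values only.

```python
def change_call(p_value, gamma1, gamma2):
    """Map a change p-value to one of the five change calls."""
    if not 0.0 < gamma1 < gamma2 < 0.5:
        raise ValueError("need 0 < gamma1 < gamma2 < 0.5 for non-overlapping intervals")
    if p_value < gamma1:
        return "Increase"
    if p_value < gamma2:
        return "Marginal Increase"
    if p_value <= 1.0 - gamma2:
        return "No Change"
    if p_value <= 1.0 - gamma1:
        return "Marginal Decrease"
    return "Decrease"


print(change_call(0.001, gamma1=0.0025, gamma2=0.003))  # Increase
print(change_call(0.5, gamma1=0.0025, gamma2=0.003))    # No Change
```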
Marginal Calls
What do I do with Marginal Calls?
Ignore them (treat them as absent)
Include them (treat them as present)
Include them with some probability (detection filter - McClintick et al., 2003)
Examine literature
Examine other arrays for the call of that same transcript
Rules for the inclusion of marginal calls seem to be an open research question.
Problem: Multiple Comparisons
The Type I error rate is the probability of rejecting the null hypothesis when it is true (1 - specificity).
α1 & γ1 are meant to control P(Type I Error).
If α1 = 0.04, there are 4 chances in 100 that we will obtain a false positive result when the null hypothesis is true.
For absolute analysis, approximately 600,000 statistical tests are done for each array.
At α = 0.04, we expect 600,000 × 0.04 = 24,000 false positive results!
Solutions: Bonferroni Adjustment, False Discovery Rate, etc.
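Both adjustments named above are short enough to sketch. The false discovery rate procedure shown here is the Benjamini-Hochberg step-up method, which is one common choice and an assumption on our part, since the slide does not name a specific procedure.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

A test is then declared significant when its adjusted p-value falls below the chosen level. Bonferroni is the more conservative of the two.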
Model Based Expression Index
Fit the following model using multiple chips for one gene:
$$y_{ij} = PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}$$
where
  $\theta_i$ is the expression index in chip i
  $\phi_j$ is a scaling factor characterizing probe pair j
  $\epsilon_{ij}$ are normal errors
Least squares estimates for parameters are carried out by iteratively fitting the set of θs and φs, treating the other set as known.
Standard errors of θ used to identify array outliers
Standard errors of φ used to identify probe outliers
MBEI model can also be based on PM-only values
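The alternating fit can be sketched as follows. This is a bare-bones illustration, not dChip's implementation: it omits outlier detection and standard errors, and it assumes the identifiability constraint that the squared φ values sum to the number of probe pairs.

```python
def fit_mbei(y, n_iter=50):
    """Alternating least squares for y[i][j] ~ theta[i] * phi[j].

    `y` holds PM - MM differences: rows are chips i, columns are probe pairs j.
    phi is rescaled each round so that sum(phi_j^2) equals the number of probes.
    """
    n_chips, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_chips
    for _ in range(n_iter):
        # Update theta with phi held fixed.
        ss_phi = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / ss_phi
                 for i in range(n_chips)]
        # Update phi with theta held fixed.
        ss_theta = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_chips)) / ss_theta
               for j in range(n_probes)]
        # Identifiability constraint: sum of squared phi equals n_probes.
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]
    # Final theta update so that theta matches the rescaled phi.
    ss_phi = sum(p * p for p in phi)
    theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / ss_phi
             for i in range(n_chips)]
    return theta, phi


phi_true = [0.5 ** 0.5, 1.5 ** 0.5]
theta_true = [100.0, 200.0, 300.0]
y = [[t * p for p in phi_true] for t in theta_true]
theta_hat, phi_hat = fit_mbei(y)
print(theta_hat, phi_hat)
```

On noise-free data such as this toy example the fit recovers the generating values up to rounding error.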
Normalization in MBEI
Non–linear, baseline array method
1. Pick a column of X to serve as baseline array, say column j. For MBEI, the common baseline array is one having median overall brightness.
2. Fit a smooth non-linear relationship mapping column i to the baseline. Call this $\hat{f}_i$.
3. Normalized values for column i are given by $\hat{f}_i(X_i)$.
4. Repeat 2 and 3 for all columns of X.
Various non-linear relationships are possible: cross-validated splines, running median lines, loess smoothers, etc.
Summarization in MBEI
For each probe set $n = 1, \ldots, N_P$, fit the model
$$\log_2\left(y_{ij}^{(n)}\right) = \beta_j^{(n)} + \alpha_i^{(n)} + \epsilon_{ij}^{(n)}$$
where $\alpha_i^{(n)}$ is a probe effect and $\epsilon_{ij}^{(n)}$ are errors.
Use standard linear regression techniques to fit the model.
The estimated $\beta_j^{(n)}$ are the base 2 log expression values.
Outlier arrays, probes, and individual intensities are removed prior to summarization.
Background Correction in RMA
Assumption: X = S + Y
where
X = observed probe-level intensity
S ∼ Exp(α) = true signal (exponential)
Y ∼ TN(µ, σ²) = background noise (truncated normal)
Reference: Irizarry et al., Biostatistics, 2003
RMA for the Right–Brained ...
Image courtesy of Terry Speed
Parameter Estimation
Background corrected intensity is $E_{ij} = E(S_{ij} \mid X_{ij})$, where $i = 1, \ldots, G$ and $j = 1, \ldots, J$.
We need to estimate µ, σ, and α.
How does RMA estimate the parameters?
µ = Mode of observations to the left of the overall mode
σ = Sample standard deviation for observations to left of overall mode
α = Mode of observations to the right of the overall mode
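Once µ, σ, and α are estimated, the conditional expectation has a closed form. The sketch below uses the expression as we recall it from Bolstad (2004), with a = x − µ − σ²α and b = σ; treat the exact formula as an assumption and check it against that reference before relying on it.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rma_background_correct(x, mu, sigma, alpha):
    """E(S | X = x) for S ~ Exponential(rate alpha), Y ~ Normal(mu, sigma^2) truncated at 0."""
    a = x - mu - sigma * sigma * alpha
    b = sigma
    numerator = _pdf(a / b) - _pdf((x - a) / b)
    denominator = _cdf(a / b) + _cdf((x - a) / b) - 1.0
    return a + b * numerator / denominator


# Hypothetical parameter values, for illustration only.
print(rma_background_correct(1000.0, mu=100.0, sigma=20.0, alpha=0.01))
print(rma_background_correct(50.0, mu=100.0, sigma=20.0, alpha=0.01))
```

For bright probes the correction is close to subtracting µ + σ²α. For dim probes the result stays positive, which plain subtraction does not guarantee.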
Normalization in RMA
Quantile Normalization Algorithm
Given n arrays of length p, form matrix X of dimension p × n where each array is a column.
Sort each column of X to give $X_{sort}$.
Take the mean across rows of $X_{sort}$.
Assign this mean to each element in the row to get quantile equalized $X'_{sort}$.
Rearrange each column of $X'_{sort}$ to have the same ordering as the original matrix X to obtain $X_{normalized}$.
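The algorithm above fits in a few lines. This sketch stores each array as a list (one list per column of X) and breaks ties by position rather than averaging them, which is a simplification.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of arrays (columns) of equal length.

    Ties are broken by position; no tie averaging is attempted.
    """
    n_arrays, n_probes = len(columns), len(columns[0])
    # Sort each column, remembering where each value came from.
    orders = [sorted(range(n_probes), key=col.__getitem__) for col in columns]
    # Mean of the r-th smallest values across arrays.
    row_means = [sum(columns[a][orders[a][r]] for a in range(n_arrays)) / n_arrays
                 for r in range(n_probes)]
    # Put the row means back in each column's original order.
    normalized = [[0.0] * n_probes for _ in range(n_arrays)]
    for a in range(n_arrays):
        for r, i in enumerate(orders[a]):
            normalized[a][i] = row_means[r]
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every array contains the same set of values, only in a different order, so all arrays share one empirical distribution.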
Summarization in RMA
Median Polish Algorithm (Tukey 1977, Bolstad 2004)
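Fits the following model
$$\log_2\left(y_{ij}^{(n)}\right) = \mu^{(n)} + \theta_j^{(n)} + \alpha_i^{(n)} + \epsilon_{ij}^{(n)}$$
with constraints
$$\operatorname{median}(\theta_j) = \operatorname{median}(\alpha_i) = 0$$
$$\operatorname{median}_i(\epsilon_{ij}) = \operatorname{median}_j(\epsilon_{ij}) = 0.$$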
Median Polish Algorithm
Form a matrix for each probe set n such that the probes are in rows and the arrays are in columns.
Add a row and a column to give matrix of the form:
$$\begin{array}{cccc} e_{11} & \cdots & e_{1 N_A} & a_1 \\ \vdots & \ddots & \vdots & \vdots \\ e_{I_n 1} & \cdots & e_{I_n N_A} & a_{I_n} \\ b_1 & \cdots & b_{N_A} & m \end{array}$$
where, initially, $e_{ij} = y_{ij}^{(n)}$ and $a_i = b_j = m = 0$.
Median Polish (continued)
Take the median of each row (across columns), subtract it from each element in that row, and add it to the final column
Take the median of each column (across rows), subtract it from each element in that column, and add it to the final row.
Continue until the changes become small or zero
In conclusion: $\hat{\mu} = m$, $\hat{\theta}_j = b_j$, and $\hat{\alpha}_i = a_i$.
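A compact sketch of the procedure follows. It applies the two sweeps to the augmented matrix, so the medians of the final row and column are moved into m as well; the stopping rule and iteration cap are our choices, and the input is assumed to be on the log2 scale already.

```python
from statistics import median


def median_polish(y, max_iter=10, tol=1e-8):
    """Tukey's median polish of y (rows = probes i, columns = arrays j).

    Returns (m, a, b, e): overall level, probe effects, array effects, residuals.
    """
    e = [row[:] for row in y]
    n_rows, n_cols = len(e), len(e[0])
    a, b, m = [0.0] * n_rows, [0.0] * n_cols, 0.0
    for _ in range(max_iter):
        # Row sweep: move row medians into the final column a.
        row_med = [median(row) for row in e]
        e = [[v - row_med[i] for v in e[i]] for i in range(n_rows)]
        a = [a[i] + row_med[i] for i in range(n_rows)]
        shift = median(b)
        b = [v - shift for v in b]
        m += shift
        # Column sweep: move column medians into the final row b.
        col_med = [median(col) for col in zip(*e)]
        e = [[e[i][j] - col_med[j] for j in range(n_cols)] for i in range(n_rows)]
        b = [b[j] + col_med[j] for j in range(n_cols)]
        shift = median(a)
        a = [v - shift for v in a]
        m += shift
        if max(map(abs, row_med + col_med)) < tol:
            break
    return m, a, b, e


mu, alpha, theta = 8.0, [-1.0, 0.0, 2.0], [-0.5, 0.0, 1.0]
y = [[mu + ai + tj for tj in theta] for ai in alpha]
print(median_polish(y)[:3])  # (8.0, [-1.0, 0.0, 2.0], [-0.5, 0.0, 1.0])
```

The log2 expression value for array j is then m + b[j].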
Significance Analysis of Microarrays
Algorithm to determine “significantly” expressed genes
Original article mentions use of GeneChip Analysis Suite software for background correction, normalization and summarization.
Assigns a score to each gene on the basis of change in gene expression relative to the standard deviation of repeated measurements.
If the score exceeds a threshold, use permutations of repeated measurements to estimate the percentage of genes identified by chance.
More Information: http://www-stat.stanford.edu/~tibs/SAM/.
Microarray Software
Open Source
  Bioconductor: Calculates RMA, MBEI, MAS5, http://www.bioconductor.org
  dChip (MBEI only, http://biosun1.harvard.edu/complab/dchip/)
  Significance Analysis of Microarrays (SAM)
  Generalized Probe Model (GPM - Fan et al., 2005, http://qge.fhcrc.org/probeplus)
Commercial
  GCOS, MAS 5.0 (Affymetrix)
  S-Plus ArrayAnalyzer: Calculates RMA, MBEI, MAS5*
  Iobion GeneTraffic: RMA, MBEI, MAS5*
References
1. Affymetrix, Inc. (2001). "Statistical Algorithms Reference". Data Analysis Fundamentals Technical Manual, Chapter 5. www.affymetrix.com.
2. Affymetrix Technical Note (2003). Design and Performance of the GeneChip Human Genome U133 Plus 2.0 and Human Genome U133A Plus 2.0 Arrays. www.affymetrix.com.
3. Affymetrix, Inc. (2002). Statistical Algorithms Description Document. www.affymetrix.com.
4. Bolstad, B. (2004). Low Level Analysis of High-density Oligonucleotide Array Data: Background, Normalization and Summarization. Dissertation. University of California, Berkeley.
5. Fan, W., Pritchard, J. I., Olson, J. M., Khalid, N., and Zhao, L. P. (2005). A class of models for analyzing gene expression analysis array data. BMC Genomics, 6:16, http://www.biomedcentral.com/1471-2164/6/16/.
6. Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. P. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research, 31 (4): e15.
7. Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., and Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4: 249–264.
8. Li, C. and Wong, W. H. (2001). Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences, 98 (1): 31-36.
9. Li, C. and Wong, W. H. (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology, 2 (8): research0032.1-0032.11.
10. McClintick, J. N., Jerome, R. E., Nicholson, C. R., Crabb, D. W., and Edenberg, H. J. (2003). Reproducibility of oligonucleotide arrays using small samples. BMC Genomics, 4:4, http://www.biomedcentral.com/1471-2164/4/4.
11. Naef, F. and Magnasco (2003). Solving the riddle of the bright mismatches: Labeling and effective binding in oligonucleotide arrays. Physical Review, 68.
12. Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, Massachusetts.
13. Tusher, V. G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences, 98: 5116-5121 (Apr 24).