Array Platforms
• 16K Agilent inkjet-printed cDNA arrays
  – The recently developed inkjet printing method (Agilent Technologies) produces more uniform spots than pin-spotting techniques
  – Array includes cDNAs selected from the RIKEN FANTOM collection, supplemented by cDNAs from the AfCS protein list
• Affymetrix GeneChip system
  – U74A v.2 chip (represents approx. 13,000 mouse genes)
• 16K Agilent inkjet-printed oligonucleotide arrays (in preparation)
  – Operon 70-mers (13,443) and Compugen 65-mers (2,304)
Ligand Screen Transcript Analysis
• B-cell samples were prepared by the Cell Lab.
• Cells were cultured for different time periods (0.5, 1, 2, and 4 hr) in the presence or absence of ligands before harvesting for total RNA isolation.
• Treated and untreated time-course samples were hybridized against a spleen reference.
• After removing the common spleen denominator, comparison to the 0 hr time point reflects the changes in mRNA levels due to ligand treatment and/or time in culture.
• All experiments were done in triplicate; including controls, more than 450 arrays were used.
Molecular Biology Laboratory
Microarray & Analysis
Sangdun Choi
Xiaocui Zhu, Rebecca Hart
Anna Cao, Mi Sook Chang, Jong Woo Kim, Sun Young Lee
a. Calculate the gene expression value: compute $\log_2(\mathrm{Treated}/0\,\mathrm{hr}) = \log_2(\mathrm{Treated}/\mathrm{Spleen}) - \log_2(0\,\mathrm{hr}/\mathrm{Spleen})$ using processedSignalIntensity.
b. Hierarchical clustering: cluster the genes showing a ≥ 2-fold change in at least one condition, keeping the ligands in alphabetical/time-course order.
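Steps a and b can be sketched in Python. This is a minimal illustration, not the AfCS pipeline: it assumes a genes × conditions NumPy array of processedSignalIntensity ratios to the spleen reference plus a per-gene 0 hr ratio to the same reference. The function names, the average-linkage/Pearson-distance choice for ordering genes, and the toy ratios are hypothetical. Only the genes are reordered; the condition columns keep their given order.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist


def log2_vs_0hr(treated_vs_spleen, zero_hr_vs_spleen):
    """log2(Treated/0hr) = log2(Treated/Spleen) - log2(0hr/Spleen).

    treated_vs_spleen: genes x conditions array of ratios to the spleen reference.
    zero_hr_vs_spleen: per-gene 0 hr ratio to the same spleen reference.
    The common spleen denominator cancels in the subtraction.
    """
    return np.log2(treated_vs_spleen) - np.log2(zero_hr_vs_spleen)[:, None]


def cluster_changed_genes(log_ratios, min_fold=2.0):
    """Return gene row indices in dendrogram order (hypothetical helper).

    Keeps genes with at least a min_fold change, up or down, in at least one
    condition; the condition columns are never reordered.
    """
    keep = np.flatnonzero(np.any(np.abs(log_ratios) >= np.log2(min_fold), axis=1))
    if keep.size < 2:
        return keep  # nothing to cluster
    # Assumption: average linkage on Pearson distance (1 - r) between genes.
    tree = linkage(pdist(log_ratios[keep], metric="correlation"), method="average")
    return keep[leaves_list(tree)]


# Toy ratios to the spleen reference: 4 genes x 3 conditions (made up).
ratios = np.array([[4.0, 8.0, 1.0],    # induced gene
                   [1.1, 0.9, 1.0],    # never changes 2-fold, filtered out
                   [0.25, 0.5, 1.0],   # repressed gene
                   [2.0, 6.0, 1.5]])   # induced gene
zero_hr = np.ones(4)
order = cluster_changed_genes(log2_vs_0hr(ratios, zero_hr))
```

Filtering on |log2 ratio| ≥ 1 treats a 2-fold increase and a 2-fold decrease symmetrically, which is one reading of "≥ 2-fold change".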
Clustering Analysis of Gene Expression Profile Using log2Ratio (Treated/0hr)
[Figure: heatmap of 5281 genes (rows, clustered) × 132 conditions (columns: ligands in time-course order, e.g. medium 30 min, 1 hr, 2 hr, 4 hr; 2MA 30 min, 1 hr, 2 hr, 4 hr; AIG 30 min, 1 hr, 2 hr, 4 hr; …). Annotations: average of triplicates; average of 6-23 replicates.]
Genes up-regulated in AIG, CD40L, IL4, LPS and CpG
[Figure: heatmap panel; columns None, IL4, LPS, CD40L, AIG, CpG; 317 features; labeled genes include Ccnd2, Cdk4, Caspase 4, Bax, Ak2, Hk2, Atf, cdk6, Ifrd2; image contrast 1.07]
Genes down-regulated in AIG, CD40L, IL4, LPS and CpG
[Figure: heatmap panel; columns None, IL4, LPS, CD40L, AIG, CpG; 319 features; labeled genes include id3, Bnip3l, Gnai2, Gprk6, Bcap31, cAMP-GEFII; image contrast 1.07]
Genes showing AIG- and CD40L-specific changes
[Figure: heatmap panel; columns None, IL4, LPS, CD40L, AIG, CpG; 235 features; labeled genes include Par-6, Gadd45b, Dagk1, Mapk12, IL3ra, IL10ra; image contrast 1.16]
Genes up-regulated in IL4
[Figure: heatmap panel; columns None, IL4, LPS, CD40L, AIG, CpG; 42 features; labeled genes include Socs1, Caspase 6, Xbp1, Rgs14, Dapp1; image contrast 1.14]
Genes showing AIG-specific changes
[Figure: heatmap panel; columns None, IL4, LPS, CD40L, AIG, CpG; 65 features; labeled genes include stress induced protein, Bak1, apolipoprotein E, Bcl2l11, LTb; image contrast 1.54]
Madhusudan Natarajan, Rama Ranganathan
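[Figure: basal and observed distributions, marked basal, observed value, σ_basal and σ_obs]
$z = \dfrac{\mathrm{obs} - \mathrm{basal}}{\sigma_{\mathrm{basal}}}$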
Clustering Analysis of Gene Expression Profile Using Z Score
Z score: the distance between an observed value and the mean of a population, measured in units of the population standard deviation
a. Calculate the gene expression metric x: for each gene i on a given chip j, $x_{ij} = \dfrac{\mathrm{rMedianIntensity}_{ij}(\mathrm{treated}) / \mathrm{gMedianIntensity}_{ij}(\mathrm{spleen})}{\bar{x}_j}$, where $\bar{x}_j$ is the mean intensity ratio of all genes on chip j.
c. Calculate the mean and standard deviation of gene expression in the 27 sets of 0 hr untreated data: for each gene i, calculate the mean $\mu_i$ and the standard deviation $\sigma_i$ of expression across the 27 0 hr chips.
d. Calculate the Z score as a measure of differential expression from the 0 hr condition: for each gene i on a given chip j, $Z_{ij} = (x_{ij} - \mu_i)/\sigma_i$.
f. Cluster genes and ligands using Z scores, keeping the genes with Z > 2 for at least one ligand.
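The Z-score steps can be sketched in NumPy as follows, assuming genes × chips arrays of rMedianIntensity and gMedianIntensity. The function names are hypothetical, and using the sample standard deviation (ddof=1) for σ_i is an assumption; this is an illustration of the formulas, not the original analysis code.

```python
import numpy as np


def chip_normalized_ratios(r_treated, g_spleen):
    """x_ij = (r_ij / g_ij) / xbar_j, where xbar_j is the mean ratio on chip j.

    r_treated, g_spleen: genes x chips arrays of rMedianIntensity (treated)
    and gMedianIntensity (spleen).
    """
    ratio = r_treated / g_spleen
    return ratio / ratio.mean(axis=0, keepdims=True)


def z_scores(x, x_0hr):
    """Z_ij = (x_ij - mu_i) / sigma_i, with mu_i and sigma_i from the 0 hr chips.

    x: genes x chips metric for the chips to score.
    x_0hr: genes x chips metric for the 0 hr untreated chips (27 on the slide).
    """
    mu = x_0hr.mean(axis=1, keepdims=True)
    sigma = x_0hr.std(axis=1, ddof=1, keepdims=True)  # sample SD (assumed)
    return (x - mu) / sigma


def genes_to_cluster(z, threshold=2.0):
    """Row indices of genes with Z > threshold on at least one chip."""
    return np.flatnonzero(np.any(z > threshold, axis=1))
```

Because each chip is divided by its own mean ratio, the normalized metric on every chip averages to 1, which removes chip-wide intensity differences before the per-gene baseline is applied.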
Clustering ligands based on Z scores
AfCS Data Analysis: Microarray
Dennis Mock
UC Principal Statistician
University of California, San Diego
Director: Shankar Subramaniam
Acknowledgment: Eugene Ke, Bob Sinkovits, Brian Saunders
Two-way unsupervised hierarchical clustering: ligands (n = 33) at 0, 0.5, 1, 2, and 4 hr
Note: the ligand conditions cluster into early and late time points with 90-100% accuracy.
(Distance metrics: samples, Euclidean; genes, Pearson correlation)
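A two-way clustering with the slide's distance metrics (Euclidean between samples, Pearson between genes) might look like this with SciPy. The average-linkage method, the cut of the sample tree into two clusters (to compare against an early/late split), the function name, and the toy matrix are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import pdist


def two_way_cluster(data, n_sample_clusters=2, method="average"):
    """Two-way hierarchical clustering of a genes x samples matrix.

    Samples (ligand/time conditions) use Euclidean distance; genes use
    Pearson distance (1 - r). The linkage method is an assumption.
    Returns (gene_order, sample_order, sample_labels).
    """
    gene_tree = linkage(pdist(data, metric="correlation"), method=method)
    sample_tree = linkage(pdist(data.T, metric="euclidean"), method=method)
    labels = fcluster(sample_tree, t=n_sample_clusters, criterion="maxclust")
    return leaves_list(gene_tree), leaves_list(sample_tree), labels


# Toy matrix: 4 genes x 4 samples; samples 0-1 "early", samples 2-3 "late".
data = np.array([[0.0, 0.1, 5.0, 5.2],
                 [0.2, 0.0, 4.8, 5.0],
                 [1.0, 1.1, -3.0, -3.1],
                 [0.9, 1.0, -2.9, -3.2]])
gene_order, sample_order, sample_labels = two_way_cluster(data)
```

Comparing sample_labels with the known early/late assignment of each condition is one way to compute an agreement figure like the 90-100% quoted above.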
[Figure: two-way clustering dendrogram; sample clusters labeled 0 hr, early (0.5-1 hr), and late (2-4 hr), with groups annotated as non-mitogenic, mitogenic, and interleukins]
Dennis Mock - UCSD
Significance Analysis of Microarrays (SAM) (R. Tibshirani, G. Chu 2002)
Objective: using the replicate expression values for each gene at the 4 hr time point (untreated vs. ligand), determine whether the gene is statistically significantly up- or down-regulated.
The t-statistics for all genes are computed and sorted. The sample labels are then permuted, and the statistics are recomputed and sorted again. After many permutations, the sorted permuted statistics are averaged rank by rank to give the expected statistic at each rank. Genes whose observed statistic departs sufficiently from this expectation are called significant, with the threshold chosen to give a target false discovery rate (FDR), the expected proportion of false positives among the genes called significant.
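For each gene i, define the adjusted “t-statistic” as follows:
$d_i = \dfrac{\bar{x}_{\mathrm{treated},i} - \bar{x}_{\mathrm{untreated},i}}{\sigma_i + s_0}$
where $\bar{x}$ is the mean of the replicates, $\sigma_i$ is the standard deviation for the gene, and $s_0$ is the adjustment factor.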
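A simplified SAM-style procedure in NumPy is sketched below. It follows the outline above (adjusted statistic, label permutations, rank-wise expected statistics, FDR estimated from permutations), but it is not the SAM software. σ_i is computed as the pooled standard error used in the original SAM method; the fixed s_0, the rule of calling genes whose gap from the expected statistic exceeds delta, and the median-based FDR estimate (without SAM's π0 correction) are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np


def sam_d(x, labels, s0):
    """SAM statistic d_i = (mean_treated - mean_untreated) / (s_i + s0).

    x: genes x arrays; labels: 1 = treated, 0 = untreated.
    s_i is the pooled standard error of the difference in means.
    """
    labels = np.asarray(labels)
    t, u = x[:, labels == 1], x[:, labels == 0]
    n1, n2 = t.shape[1], u.shape[1]
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    ss = ((t - t.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) + (
        (u - u.mean(axis=1, keepdims=True)) ** 2
    ).sum(axis=1)
    return (t.mean(axis=1) - u.mean(axis=1)) / (np.sqrt(a * ss) + s0)


def sam(x, labels, s0=0.1, delta=1.0, n_perm=200, seed=0):
    """Simplified SAM: returns (called gene indices, estimated FDR)."""
    rng = np.random.default_rng(seed)
    d = sam_d(x, labels, s0)
    order = np.argsort(d)
    # Sort each permutation's statistics, then average rank by rank.
    perm = np.sort(
        [sam_d(x, rng.permutation(labels), s0) for _ in range(n_perm)], axis=1
    )
    gap = d[order] - perm.mean(axis=0)
    up, down = order[gap > delta], order[gap < -delta]
    called = np.concatenate([down, up])
    if called.size == 0:
        return called, 0.0
    hi = d[up].min() if up.size else np.inf
    lo = d[down].max() if down.size else -np.inf
    # Falsely called genes per permutation: permuted statistics beyond the cutoffs.
    false_calls = ((perm >= hi) | (perm <= lo)).sum(axis=1)
    return called, float(np.median(false_calls)) / called.size
```

The adjustment factor s_0 keeps genes with tiny replicate variance from producing huge statistics from small differences; raising delta trades fewer calls for a lower estimated FDR.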
Differentially expressed genes for ligands vs. untreated at 4 hr (SAM; false discovery rate in parentheses)
[Figure: bar chart of the number of genes (probes) differentially expressed for each ligand at 4 hr, split into down-regulated and up-regulated; axis 0-1100 genes]
Ligands (4 hr), with FDR in parentheses: 40L (1%), LPS (1%), AIG (1%), IL4 (1%), CPG (1%), IFB (1.5%), GRH (1%), 2MA (18%), LPA (17%), CGS (2.9%), BOM (35%), IGF (8%), S1P (38%), PAF (2.4%), 70L (6%), NPY (10%), DIM (9%), LB4 (23%), M3A (3.5%), FML (11%), TGF (2.5%), TER (35%), IL10 (20%), ELC (26%), PGE (11%), BAFF (11%), BLC (57%), NGF (42%), TNF (33%), SDF (20%), IFG (25%), NEB (25%), SLC (NA)
Concordance of significantly up-regulated (+) or down-regulated (-) genes among mitogenic ligands (FDR = 1%)
[Figure: mosaic plot of the numbers of significantly up-regulated (+) and down-regulated (-) genes for each mitogenic ligand, with the "up-regulated" and "down-regulated" matches shared between ligand pairs]
Example: CD40L had 756 down-regulated and 1082 up-regulated genes. Of those, 337 down-regulated and 578 up-regulated genes were similarly regulated in AIG.

Discordance matrix
        40L   AIG   CPG   LPS   IL4   IL10
40L      -    17     0     0     9     0
AIG      4     -     0     1     3     0
CPG      0     6     -     0     2     0
LPS      1    17     0     -    11     1
IL4      3     3     0     0     -     0
IL10     0     0     0     0     0     -
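The concordance counts in the example are set intersections, which can be sketched in plain Python. The slide does not define how the discordance matrix is computed; the discordance function below assumes, hypothetically, that it counts genes up-regulated by the row ligand but down-regulated by the column ligand, one reading consistent with the matrix being asymmetric. The gene IDs are made up.

```python
def concordance(up, down, a, b):
    """Genes regulated in the same direction by ligands a and b.

    up, down: dict mapping ligand name -> set of significant gene IDs.
    """
    return {"down": len(down[a] & down[b]), "up": len(up[a] & up[b])}


def discordance(up, down, a, b):
    """Hypothetical definition: genes up-regulated by ligand a (row) but
    down-regulated by ligand b (column)."""
    return len(up[a] & down[b])


up = {"40L": {"g1", "g2", "g3"}, "AIG": {"g2", "g3", "g4"}}
down = {"40L": {"g5"}, "AIG": {"g1", "g5"}}
print(concordance(up, down, "40L", "AIG"))  # {'down': 1, 'up': 2}
```

Under this definition, discordance(up, down, "40L", "AIG") and discordance(up, down, "AIG", "40L") generally differ, as the two off-diagonal entries for a ligand pair do in the matrix above.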
Beyond Clustering
• How can we obtain biological information from array data at the level of individual genes and correlations in expression between genes?
• Can we use the correlations to build a connection network that reflects correlations in expression? Is there biological significance to this?
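One simple way to turn expression correlations into a network is to connect gene pairs whose profiles across conditions are strongly correlated. The sketch below is an illustration only: the |r| ≥ 0.9 threshold, the function name, and the toy profiles are assumptions, not a method from these slides.

```python
import numpy as np


def correlation_network(expr, genes, min_abs_r=0.9):
    """Edges (gene_a, gene_b, r) between genes whose expression profiles
    (rows of expr, one column per condition) have |Pearson r| >= min_abs_r."""
    r = np.corrcoef(expr)
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(r[i, j]) >= min_abs_r:
                edges.append((genes[i], genes[j], float(r[i, j])))
    return edges


expr = np.array([[1.0, 2.0, 3.0, 4.0],   # gA
                 [2.0, 4.0, 6.0, 8.0],   # gB: perfectly correlated with gA
                 [4.0, 3.0, 2.0, 1.0],   # gC: perfectly anti-correlated with gA
                 [1.0, 3.0, 2.0, 4.0]])  # gD: r = 0.8 with gA, below threshold
edges = correlation_network(expr, ["gA", "gB", "gC", "gD"])
```

Keeping the sign of r on each edge distinguishes co-regulated genes from oppositely regulated ones, which matters when asking whether the network has biological meaning.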
Two-way hierarchical clustering: mean ratio (vs. control) of phosphoprotein levels by ligand
Note: the ligands that elicit an ERK response (chemokines plus AIG and CD40L) clustered together.
The transcription factor encoded by fos is stabilized by ERK and goes on to affect other immediate-early (IE) genes, such as jun.
From Nature Cell Biology, August 2002, vol. 4, issue 8.
Observations
To a first approximation, the resting B cells behave as if they have evolved to respond with massive transcriptional changes to a very specific subset of ligands. Some common pathways are activated, and some gene expression changes are restricted to individual ligands.
Caveats:
a) The experimental design was optimized to allow comparison of many data sets by using a spleen reference. Would direct comparisons between treated and untreated time-matched samples have allowed small, transient changes in gene expression to be readily seen? Some comparative experiments have been done by the Cell Lab (UTSWM) and are being analyzed.
b) There is more information in this data set. The analysis thus far de-emphasizes time-series data, and supervised analysis methods suggest that changes correlated with some of the apparently less active ligands can be unearthed. See posters by D. Mock (UCSD) and R. Scheurmann (UTSWM).
c) These resting B cells are beginning to undergo apoptosis, so the experiments were done over a short time period, and the full response to the active ligands is not observed. Experiments with Bcl-2 resting B cells: B. Seaman and T. Roach (UCSF).
d) Transcription changes the “state of the cell”. Double-ligand experiments will therefore need to account for the order of addition of the ligands as well as their concentrations. Some preliminary experiments have been done by the Cell Lab (UTSWM) and are being analyzed.
A clear lesson that we must implement as soon as possible is to shorten the cycle time from experimental design → data collection → data analysis → conclusions and models → experimental redesign. In the past, the rate-limiting step has been data analysis.
[Diagram: Input Signals; Signal Processing; Translocation; Gene Expression; Cytoskeleton; Transcription/Translation]