Microarrays
Lecture Slides Courtesy of Dr. Tim [email protected]
Outline:
• Microarray experiments
• Different types of microarrays
• Clustering and interpretation
www.accessexcellence.org/AB/GG/nucleic.html
Nucleic Acid Hybridization
[Figure: cDNA pools from control and treatment (drug, mutation) samples, containing transcripts x, y, z; spot outcomes: up, down, unchanged, not present]
Typical use of cDNA microarrays: “Internal” normalization using two colors
“cDNA microarrays” are essentially dot-blots on glass slides
http://arrayit.com/Products/Printing/Stealth/stealth.html
• This slide was made with 16 pins
• 4.5 mm pin spacing matches 384-well plates (16 x 24)
• Done with robotics
• Slides usually coated with poly-lysine
• Spots are usually 100-150 microns
• Spot spacing is usually 200-300 microns
• Slides are 25 x 75 mm
• Easy to deposit 20K spots/slide
[Figure label: 0.45 mm]
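The "20K spots/slide" figure can be checked against the slide size and spot spacing quoted above. The sketch below is only an upper-bound estimate: it assumes a square grid and that the whole 25 x 75 mm surface is printable, which is not the case in practice. The function name is illustrative.

```python
def grid_capacity(slide_w_um, slide_h_um, pitch_um):
    """Upper bound on spots for a square grid with the given centre-to-centre pitch."""
    cols = slide_w_um // pitch_um   # spots that fit across the slide
    rows = slide_h_um // pitch_um   # spots that fit along the slide
    return cols * rows

# Slide dimensions from the bullet list above, in microns.
SLIDE_W_UM, SLIDE_H_UM = 25_000, 75_000

for pitch in (200, 300):            # the quoted spot-spacing range
    print(pitch, "micron pitch:", grid_capacity(SLIDE_W_UM, SLIDE_H_UM, pitch), "spots")
```

Even at the coarser 300 micron spacing the bound is 20,750 spots, consistent with 20K spots per slide being easy to reach.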
Microarray expression profiling by 2-color assay (“cDNA arrays”)
Array: PCR products, 6250 yeast ORFs
Hybridized cDNAs: green = control, red = experiment
*Schena et al., 1995
Image processing and normalization: what is microarray data?
Microarray data is summary information from image files that come out of the scanner.
Image processing: line up grids, flag bad spots, quantitate.
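As a rough illustration of what "summary information" means for a two-color array, the sketch below turns per-spot red and green intensities into log10 ratios. The slides do not name a normalization method. Global median centering, the geometric mean as the intensity measure, and the (red, green, flagged) input layout are all assumptions made for this example.

```python
import math
import statistics

def normalized_log_ratios(spots):
    """spots: list of (red, green, flagged) tuples from image quantitation.

    Returns (log10 intensity, normalized log10 ratio) for each usable spot.
    Flagged spots and spots with non-positive signal are dropped.
    """
    good = [(r, g) for r, g, flagged in spots if not flagged and r > 0 and g > 0]
    raw = [math.log10(r / g) for r, g in good]
    # Assumption: intensity is summarized as the geometric mean of the two channels.
    intensity = [math.log10(math.sqrt(r * g)) for r, g in good]
    # Assumption: global median centering, so the typical spot has log ratio 0.
    center = statistics.median(raw)
    return [(a, m - center) for a, m in zip(intensity, raw)]

# Made-up intensities; the last spot is flagged as bad and is ignored.
example = [(100, 100, False), (200, 100, False), (400, 100, False), (5, 0, True)]
for log_intensity, log_ratio in normalized_log_ratios(example):
    print(round(log_intensity, 3), round(log_ratio, 3))
```

The two returned values correspond to the two axes used when plotting data from a single experiment: intensity on one axis, expression ratio on the other.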
Looking at data from a single experiment
[Figure: two scatter plots of Log10(Expression Ratio) versus Log10(Intensity), both axes from -2 to 2, with points at P-value < 0.01 highlighted. Panels: 3-AT vs. no drug; wild-type vs. wild-type. Slides: 11120c01 - 11121c01; 11857c01 - 11858c01]
[Figure: two scatter plots of log10(ratio) versus log10(average intensity), both axes from -2 to 2]
Other types of arrays
Photolithographic arrays (Affymetrix)
Building up oligonucleotides on a surface:
http://www.affymetrix.com/technology/manufacturing/index.affx
Photolithographic arrays (Affymetrix)
aka “GeneChip”
Probes are typically 25-mers, each paired with a “mismatch” control probe for specificity
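A minimal sketch of the mismatch idea: the control probe is the same 25-mer with its central (13th) base swapped for its complement, so it should bind the true target poorly while still picking up non-specific signal. The probe sequence and function name below are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(perfect_match):
    """Return the probe with its central base replaced by the complementary base."""
    mid = len(perfect_match) // 2          # index 12 for a 25-mer, i.e. the 13th base
    return (perfect_match[:mid]
            + COMPLEMENT[perfect_match[mid]]
            + perfect_match[mid + 1:])

pm = "ACGTACGTACGTACGTACGTACGTA"           # hypothetical 25-mer
mm = mismatch_probe(pm)
print(pm)
print(mm)
```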
Photolithographic arrays (Affymetrix)
Advantages:
Density is limited essentially by the 5 micron resolution of scanners (solution: larger arrays).
Well-developed protocols.
“Industry standard” (largely self-driven).
Disadvantages:
Not all probes work well. Affymetrix has evolved a complicated system to compensate for this, but even “believers” use at least four probes per gene, and usually more.
Single color.
Sample preparation typically requires amplification.
Single supplier; historically, intellectual property issues (i.e. comparisons).
• 25,000 oligos / 1 x 3 inches
• Sequence completely flexible
• 60-mers
Ink-jet arrays (Agilent)
Hughes TR et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol. 2001 Apr;19(4):342-7.
Ink-jet arrays generally agree with spotted cDNA arrays
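Yeast IJS array: ~8 oligos per gene; Spo vs. SC
[Figure: two scatter plots comparing ink-jet array and cDNA array measurements: multiple oligos vs. cDNA array, and single oligo vs. cDNA array; r = 0.96 and r = 0.97; labeled genes: HXT1, HXT3, HXT4]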
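The agreement between platforms can be summarized as in the figure, assuming r is the Pearson correlation of log ratios (the slide does not say). The sketch below averages the oligos for each gene and correlates the result with the cDNA measurement. All numbers are synthetic and the function names are illustrative.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def per_gene_mean(oligo_log_ratios):
    """Collapse several oligo measurements per gene into one value."""
    return {gene: mean(values) for gene, values in oligo_log_ratios.items()}

# Synthetic log10 ratios: several oligos per gene on the ink-jet array,
# one value per gene on the cDNA array.
oligo = {"geneA": [0.9, 1.1, 1.0], "geneB": [-0.5, -0.4, -0.6], "geneC": [0.0, 0.1, -0.1]}
cdna = {"geneA": 0.95, "geneB": -0.55, "geneC": 0.05}

averaged = per_gene_mean(oligo)
genes = sorted(cdna)
r = pearson([averaged[g] for g in genes], [cdna[g] for g in genes])
print("r =", round(r, 3))
```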
Ink-jet arrays (Agilent)
Advantages:
User-specified sequences; “no questions asked”
Sensitivity and specificity are defined and exceed the requirements of most expression profiling applications; no amplification required
Virtually every 60-mer is functional
Data correlates well with spotted cDNA arrays
Disadvantages:
Density currently limited to ~45,000 spots per array.
Single supplier (although a protocol is in press for making your own synthesizer!)
2-D clustering
Step 1: cut experiments and transcripts falling below P-value and ratio thresholds
[Figure: heat map of 44 experiments x 407 genes; axes: transcript response index, experiment index; color scale from fold repression (-10, -5, -2) through 1 to fold induction (2, 5, 10)]
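Step 1 can be sketched as a simple filter. The thresholds below (2-fold, P-value 0.01, at least one significant measurement) are illustrative assumptions, not the values used for the figure, and the function names are hypothetical.

```python
import math

def is_hit(log10_ratio, p_value, fold_min=2.0, p_max=0.01):
    """A measurement passes if it is both significant and large enough."""
    return p_value <= p_max and abs(log10_ratio) >= math.log10(fold_min)

def cut(log_ratios, p_values, min_hits=1, fold_min=2.0, p_max=0.01):
    """log_ratios[g][e] and p_values[g][e] are genes x experiments tables.

    Returns the indices of genes and of experiments that have at least
    min_hits passing measurements.
    """
    n_genes, n_exps = len(log_ratios), len(log_ratios[0])
    hits = [[is_hit(log_ratios[g][e], p_values[g][e], fold_min, p_max)
             for e in range(n_exps)] for g in range(n_genes)]
    genes = [g for g in range(n_genes) if sum(hits[g]) >= min_hits]
    exps = [e for e in range(n_exps)
            if sum(hits[g][e] for g in range(n_genes)) >= min_hits]
    return genes, exps

# Made-up data: 3 genes x 2 experiments.
ratios = [[0.5, 0.0], [0.1, 0.05], [-0.4, 0.0]]
pvals = [[0.001, 0.5], [0.001, 0.5], [0.005, 0.2]]
print(cut(ratios, pvals))
```

In this toy example the second gene is significant but changes by less than 2-fold, so it is cut, and the second experiment has no passing genes, so it is cut as well.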
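2-D clustering
Step 2: cluster experiments and transcripts
[Figure: clustered heat map; axes: transcript response index, experiment index; color scale from fold repression (-10, -5, -2) through 1 to fold induction (2, 5, 10); annotated experiment groups: RHO O/X, PKC O/X; ste mutants; treatment with alpha-factor]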
Data from Roberts et al., Science (2000)
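Step 2 can be sketched as clustering the rows (transcripts) and then the columns (experiments) of the same table and reordering both. The slides do not state which algorithm or distance was used. The example assumes average-linkage agglomerative clustering with 1 - Pearson correlation as the distance, and only produces a leaf order, not a drawn tree. Function names are illustrative.

```python
from math import sqrt

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0

def cluster_order(rows):
    """Average-linkage agglomerative clustering; returns the final leaf order."""
    def dist(i, j):
        return 1.0 - pearson(rows[i], rows[j])
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance over all pairs of members of the two clusters.
                d = (sum(dist(i, j) for i in clusters[a] for j in clusters[b])
                     / (len(clusters[a]) * len(clusters[b])))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]

def cluster_2d(matrix):
    """Cluster rows, then columns, and return both orders and the reordered table."""
    row_order = cluster_order(matrix)
    col_order = cluster_order([list(col) for col in zip(*matrix)])
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return row_order, col_order, reordered

# Made-up log ratios: 4 transcripts x 4 experiments with two opposite patterns.
data = [[1.0, 1.0, -1.0, -1.0],
        [-1.0, -1.0, 1.0, 1.0],
        [0.9, 1.1, -1.0, -0.8],
        [-1.1, -0.9, 0.8, 1.0]]
rows, cols, table = cluster_2d(data)
print(rows, cols)
```

This brute-force version recomputes distances at every merge, which is fine for a toy table but slow for hundreds of genes; dedicated clustering software is the practical choice at that scale.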
There are many types of clustering. One example: K-means (must choose K)
[Figure: K-means example with K = 10; panels #1, #2, #3]
See: Sherlock G. Analysis of large-scale gene expression data. Curr Opin Immunol. 2000 Apr;12(2):201-5.
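A minimal K-means sketch, to show where the choice of K enters. Squared Euclidean distance and the farthest-first choice of starting centers are assumptions made here to keep the example short and deterministic; K-means is more commonly started from random centers.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic starting centers: the first point, then repeatedly the
    point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers

def kmeans(points, k, max_iter=100):
    """Alternate between assigning points to the nearest center and moving
    each center to the mean of its points, until assignments stop changing."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Made-up expression profiles forming two obvious groups.
profiles = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

Running the same data with a different k gives a different partition, which is why K has to be chosen with some care.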
Basics of clustering freeware: Eisen’s “Cluster” and “Treeview”
Mike Eisen's web site: rana.lbl.gov/EisenSoftware.htm
“Cluster” loads an Excel file (save as tab-delimited text) in the following format:
[Screenshots: Cluster; Treeview]
(also: “TreeArrange” - http://monod.uwaterloo.ca/downloads/treearrange/)
There are also many commercial programs available.
[Figure: cell schematic with labels: mRNA, protein, nucleus, cell]
Microarray expression data
Co-regulated groups of genes
Functional categories
Predict functions of new genes
cis, trans regulators
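GO-Biological Process categories
[Table: example categories at four levels of specificity (Very Broad, Broad, Mid-level, Narrow), with # annotated genes (mouse)]
Categories: eye pigment metabolism (Narrow); eye morphogenesis; pigment metabolism; striated muscle contraction; ATP biosynthesis; vision; CNS development; insulin secretion; metabolism (Very Broad); development (2341 genes)
Other gene counts shown (pairing with categories lost): 163, 137, 21, 36, 25, 33, 34, 1548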
GO-Biological Process hierarchy
[Figure: hierarchy relating the categories eye pigment metabolism, eye morphogenesis, pigment metabolism, CNS development, metabolism, development]
Other types of categorical annotations:
KEGG, EC numbers (describe biochemical “pathways”)
MIPS, YPD (yeast databases – older than GO)
Results of individual studies (localization, 2-hybrid screens, protein complexes, etc.)
Sequence motifs, structural domains (Pfam, SMART)
Cluster label                    Sample genes
amino acid metabolism            TRP4, HIS3
arginine biosynthesis            ARG1, ARG3
arginine catabolism              CAR1, CAR2
aromatic AA metabolism           ARO9, ARO10
asparagine biosynthesis          ASN1, ASN2
branched chain AA synth          ILV1,2,3,6
lysine biosynthesis              LYS2, LYS9
methionine biosynthesis          MET3,16,28
sulfur AA tnsprt, metab          MUP1, MHT1
adenine biosynthesis             ADE1,4,8
aldehyde metabolism              AAD4,14,16
biotin biosynthesis              BIO3,4
citrate metabolism               CIT1,2
ergosterol biosynthesis          ERG1,5,11
fatty acid biosynthesis          FAS1, FAS2
gluconeogenesis                  PGK1, TDH1,2,3
NAD biosynthesis                 BNA4,6
one-carbon metabolism            GCV1,2,3
pyridoxine metabolism            SNO1, SNZ1
thiamin biosynthesis 1           THI5,12
thiamin biosynthesis 2           THI2,20
hexose transport                 HXT4, GSY1
sodium ion transport             ENA1,2,5
polyamine transport              TPO2,3
nucleocytoplasmic transport      KAP123, NUP100
ribosome/RNA biogenesis          MAK16, CBF5
ribosomal proteins               RPS1A, RPL28
translational elongation         TEF1,2
protein folding                  SSA1, HSP60
secretion                        VTH1, KRE11
protein glycosylation            ALG6, CAX4
vesicle-mediated transport       VPS5, IMH1
proteasome                       RPN6, RPT5
vacuole fusion                   VTC1,3,4, PHO84
mitoribosome/respiration         MRPL1, MRPS5
Mitochond. electron trans.       ATP1, COX4
iron transport/TCA cycle         FRE1, FET3
Chromatin/transcription          SNF2, CHD1, DOT6
histones                         HTA1, HHF1
MCM2/3/6/CDC47                   MCM2,3,6
DNA replication                  RFA1, POL12
mitotic cell cycle               SPC110, CIN8
CLB1/CLB6/BBP1                   CLB1,6
cytokinesis                      CTS1, EGT2
development                      PAM1, GIC2
pheromone response               FUS3, FAR1
conjugation                      CIK1, KAR3
sporulation/meiosis              SPO11, SPO19
response to oxidative stress     GDH3, HYR1
stress/heat shock                HSP104, SSA4

Candidate regulator (in slide order; clusters without a candidate have an empty cell on the slide, so the entries below do not align one-to-one with the rows above):
GCN4; ARG80/81; ARG80/81/UME6/RPD3; ARO80; GCN4/HAP1/HAP2; LEU3, GCN4; LYS14; CBF1, MET28, MET32; MET31, MET32; BAS1, BAS2, GCN4
RTG3; ECM22/UPC2; INO4; GCR1
THI2/THI3; THI2/THI3; GCR1; NRG1, MIG1; HAA1; RRPE-binding factor; PAC/RRPE-binding factors
HAC1, ROX1; RLM1; XBP1
RPN4; PHO4
HAP2/3/4/5; MAC1/RCS1/AFT1/PDR1/3
HIR1, HIR2; ECB; MCB; HCM1; FKH1; ACE2, SWI4
MATALPHA2, STE12; KAR4; NDT80; ROX1, MSN2, MSN4; MSN2, MSN4
[Figure: heat map of non-overlapping yeast gene expression clusters; 424 experiments; gene-axis labels: 249 genes and 1,226 genes]
Chua et al., 2004
Analyzing clusters:
amino acid biosynthesis (p < 10^-14)**
amino acid metabolism (p < 10^-14)**
methionine metabolism (p = 1.07 × 10^-7)
**When testing clusters against many different types of categorical annotations, one should consider correcting for multiple testing, and also keep in mind that categories are often not independent.
**http://area51.med.utoronto.ca/FUNSPEC.html
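P-values like those above are commonly obtained from a hypergeometric test on the overlap between a cluster and a category. The slides do not name the test or the correction used, so the sketch below is an assumption on both counts, with Bonferroni as the simplest multiple-testing correction. The gene counts in the example are made up.

```python
from math import comb

def enrichment_p(n_total, n_category, n_cluster, n_overlap):
    """Hypergeometric P(overlap >= n_overlap): the chance of seeing at least
    this many category genes in a cluster drawn at random from n_total genes."""
    upper = min(n_cluster, n_category)
    favourable = sum(comb(n_category, i) * comb(n_total - n_category, n_cluster - i)
                     for i in range(n_overlap, upper + 1))
    return favourable / comb(n_total, n_cluster)

def bonferroni(p, n_tests):
    """Multiply by the number of categories tested, capped at 1."""
    return min(1.0, p * n_tests)

# Hypothetical numbers: 6000 genes, a 50-gene category, a 30-gene cluster
# containing 12 genes from that category, tested against 500 categories.
p = enrichment_p(6000, 50, 30, 12)
print(p, bonferroni(p, 500))
```

Bonferroni treats the tests as unrelated. Because categories overlap and nest, it tends to be conservative here, which is one way of reading the caveat that categories are often not independent.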