Introduction to DNA Microarrays

Post on 13-Jan-2016



Michael F. Miles, M.D., Ph.D.

Depts. of Pharmacology/Toxicology and Neurology and the Center for Study of Biological Complexity

mfmiles@vcu.edu

225-4054

Biological Regulation: “You are what you express”

• Levels of regulation

• Methods of measurement

• Concept of genomics

Regulation of Gene Expression

• Transcriptional
– Altered DNA binding protein complex abundance or function
• Post-transcriptional
– mRNA stability
– mRNA processing (alternative splicing)
• Translational
– RNA trafficking
– RNA binding proteins
• Post-translational
– Many forms!

Regulation of Gene Expression

• Genes are expressed when they are transcribed into RNA
• Amount of mRNA indicates gene activity
• Some genes are expressed in all tissues -- but are still regulated!
• Some genes are expressed selectively depending on tissue, disease, environment
• Dynamic regulation of gene expression allows long-term responses to environment

[Figure: model of the progression from drug exposure to addiction]
• Acute Drug Use (mesolimbic dopamine? other?): Reinforcement, Intoxication
• Chronic Drug Use: Tolerance, Dependence, Sensitization (altered signaling, gene expression, ?synaptic remodeling)
• Compulsive Drug Use, “Addiction” (?synaptic remodeling, persistent gene expression)

Progress in Studies on Gene Regulation

[Timeline, 1960 to 2000, in chronological order]
• mRNA, tRNA discovered
• Nucleic acid hybridization, protein/RNA electrophoresis
• Molecular cloning; Southern, Northern & Western blots; 2-D gels
• Subtractive hybridization, PCR, differential display, MALDI/TOF MS
• Genome sequencing
• DNA/protein microarrays

Nucleic Acid Hybridization: How It Works

Primer on Nucleic Acid Hybridization

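• Hybridization rate depends on time, the concentration of nucleic acids, and the reassociation rate constant for the nucleic acid:

C/C0 = 1/(1 + k·C0·t)

where C is the concentration of single-stranded nucleic acid remaining at time t, C0 is the initial concentration, and k is the reassociation rate constant.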
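The reassociation equation can be evaluated directly. The following is a minimal sketch in Python; the function names and the example rate constant are illustrative assumptions, not values from the slides.

```python
def fraction_single_stranded(c0, t, k):
    """Return C/C0 for second-order reassociation: 1 / (1 + k*C0*t)."""
    if c0 < 0 or t < 0 or k < 0:
        raise ValueError("c0, t and k must be non-negative")
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k):
    """Return the C0*t value at which half the strands have reassociated (1/k)."""
    return 1.0 / k


if __name__ == "__main__":
    k = 0.5  # hypothetical rate constant, for illustration only
    for cot in (0.02, 0.2, 2.0, 20.0, 200.0):
        # Only the product C0*t matters, so fix C0 = 1 and vary t.
        frac = fraction_single_stranded(1.0, cot, k)
        print(f"C0t = {cot:>6}: C/C0 = {frac:.3f}")
```

Because only the product C0·t enters the equation, a dilute sample hybridized for a long time behaves like a concentrated sample hybridized briefly.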

Biological Networks

Types of Biological Networks

Gene Regulation Network

Examining Biological Networks: Experimental Design

Examining Biological Networks

A Bit of History

~1992-1996: Oligo arrays developed by Fodor, Stryer, Lockhart, others at Stanford/Affymetrix and Southern in Great Britain

~1994-1995: cDNA arrays are usually attributed to Pat Brown and Dari Shalon at Stanford, who first used a robot to print the arrays. In 1994, Shalon started Synteni, which was bought by Incyte in 1998.

However, in 1982 Augenlicht and Kobrin proposed a DNA array (Cancer Research), and in 1984 they made a 4000-element array to interrogate human cancer cells.

(Rejected by Science, Nature and the NIH)

High Density DNA Microarrays

Expression Profiling: A Non-biased, Genomic Approach to Understanding Complex CNS Disease

Candidate Gene Studies

Molecular Triangulation: Genomics, Genetics and Pharmacology

Bioinformatics:
• Genetical genomics
• Functional grouping
• Literature networks
• Protein interactions
• Promoter motif grouping

Utility of Expression Profiling

• Non-biased, genome-wide
• Hypothesis generating
• Gene hunting
• Pattern identification:
– Insight into gene function
– Molecular classification
– Phenotypic mechanisms

Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns

[Figure: hierarchical clustering of PFC, HIP, NAC and VTA samples, computed from AvgDiff and from S-score; color scale shows relative change from -2 to +2]

Experimental Design with DNA Microarrays

Sources of Variance in Microarray Experiments

Type of Variance -- Factors

• Biological
– Animal-animal differences (intra/inter cage, supplier)
– Genotype
– Circadian rhythms
– Stress
• Technical
– Sample treatment/harvesting (dissections, injections)
– Target preparation (enzyme lots, mRNA quality)
– Lot-to-lot chip variation
– Chip processing (scanning order)
• Environmental
– Temperature
– Handling
– Noise/odors

High Density DNA Microarrays

Synthesis and Analysis of 2-color Spotted cDNA Arrays: “Brown Chips”

Comparative Hybridization with Spotted cDNA Microarrays

Synthesis of High Density Oligonucleotide Arrays by Photolithography/Photochemistry

GeneChip Features

• Parallel analysis of >30K human, rat or mouse genes/EST clusters with 15-20 oligos (25-mer) per gene/EST
• Entire genome analysis (human, yeast, mouse)
• 3-4 orders of magnitude dynamic range (1-10,000 copies/cell)
• Quantitative for changes >25% ??
• SNP analysis

Oligonucleotide Array Analysis

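[Figure: target preparation and detection workflow]
• Total RNA (poly-A tailed mRNA) primed with Oligo(dT)-T7; RTase/Pol II
• dsDNA carrying the T7 promoter
• T7 pol with CTP-biotin
• Biotin-cRNA
• Hybridization
• Streptavidin-phycoerythrin
• Scanning
• PM (perfect match) and MM (mismatch) probe signals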

Stepwise Analysis of Microarray Data

• Low-level analysis -- image analysis, expression quantitation

• Primary analysis -- is there a change in expression?

• Secondary analysis -- what genes show correlated patterns of expression? (supervised vs. unsupervised)

• Tertiary analysis -- is there a phenotypic “trace” for a given expression pattern?

Affymetrix Arrays: Image Analysis

[Figure: “.DAT” file (scanned image) and “.CEL” file (cell intensities)]

Affymetrix Arrays: PM-MM Difference Calculation

Probe pairs control for non-specific hybridization of oligonucleotides
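The PM-MM difference for one probe set can be sketched as follows. This is a minimal illustration in Python, not Affymetrix's implementation: the function names are hypothetical, and the fixed-count trimming stands in for the outlier exclusion rule that AvgDiff actually uses.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and the same length")
    return mean(p - m for p, m in zip(pm, mm))


def trimmed_average_difference(pm, mm, trim=1):
    """Mean of PM - MM after dropping the `trim` lowest and highest differences.

    The fixed-count trim is an assumption made for illustration only.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must be the same length")
    diffs = sorted(p - m for p, m in zip(pm, mm))
    if len(diffs) <= 2 * trim:
        raise ValueError("not enough probe pairs to trim")
    return mean(diffs[trim:len(diffs) - trim])


if __name__ == "__main__":
    # Made-up intensities for five probe pairs; the last PM is an outlier.
    pm = [1200, 950, 1800, 400, 5000]
    mm = [300, 250, 600, 350, 200]
    print(average_difference(pm, mm))
    print(trimmed_average_difference(pm, mm))
```

Subtracting MM removes signal that binds regardless of the central base, and trimming keeps one aberrant probe pair from dominating the estimate.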

[Figure (a): Variability in ln(FC). Plot of Ln(FC1) vs. Ln(FC2) for ln(PFC1AS/VTA1AS), both axes from -4 to 4; R = 0.71. Panels: ln(FoldChange) and S-score]

Probe Level Analysis Methods

• AvgDiff -- Affymetrix 1996, trimmed mean with exclusion of outliers, PM-MM
• MAS 5 -- Affymetrix 2001, modeled correction of MM, Tukey’s bi-weight, PM-MM or PM-m
• MBEI -- Li and Wong 2001, modeled correction and outlier detection, PM-MM or PM only
• RMA (Robust Multichip Analysis) -- Irizarry et al. 2002, PM only
• PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for probe interactions, PM only
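Tukey's bi-weight, named above for MAS 5, is a robust average that down-weights probes far from the median. The sketch below is a generic one-step version in Python. The tuning constant c = 5 and the small epsilon are commonly cited defaults and should be treated as assumptions here. The full MAS 5 pipeline (background correction, idealized mismatch, log transform) is not reproduced.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight estimate of the center of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    center = median(values)
    # Median absolute deviation from the median (MAD) sets the scale.
    mad = median(abs(v - center) for v in values)
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - center) / (c * mad + epsilon)
        # Values more than c MADs from the median get zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


if __name__ == "__main__":
    probe_signals = [10, 11, 9, 10, 100]  # made-up values with one outlier
    print(tukey_biweight(probe_signals))
```

In this example the outlying value receives zero weight, so the estimate stays near the bulk of the probes where a plain mean would not.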

“Lowess” Normalization, Pin-specific Profiles

After Print-tip Normalization

Slide Normalization: Pieces and Pins

See also: Schuchhardt, J. et al., NAR 28: e47 (2000)

http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
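Print-tip normalization adjusts the log ratios of a spotted array separately for each pin. The sketch below is a deliberately simplified stand-in in Python: it subtracts each pin group's median log ratio and does not fit an intensity-dependent lowess curve. The function and parameter names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def print_tip_median_center(log_ratios, pin_ids):
    """Subtract each print-tip group's median log ratio from its spots."""
    if len(log_ratios) != len(pin_ids):
        raise ValueError("log_ratios and pin_ids must be the same length")
    by_pin = defaultdict(list)
    for m, pin in zip(log_ratios, pin_ids):
        by_pin[pin].append(m)
    pin_median = {pin: median(ms) for pin, ms in by_pin.items()}
    return [m - pin_median[pin] for m, pin in zip(log_ratios, pin_ids)]


if __name__ == "__main__":
    # Made-up log ratios: pin 1 reads high and pin 2 reads low.
    m_values = [0.5, 0.7, 0.6, -0.2, -0.4, -0.3]
    pins = [1, 1, 1, 2, 2, 2]
    print(print_tip_median_center(m_values, pins))
```

Median centering removes a constant offset per pin. A lowess fit within each pin group would additionally remove offsets that vary with spot intensity.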

Normalization Confounds: Non-linearity

[Figure panels: Normal vs. Normal; Normal vs. Tumor]

Statistical Analysis of Microarrays: “Not Your Father’s Oldsmobile”

Secondary Analysis: Expression Patterns

• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM

Clustering Methods

• Distance measurement -- Euclidean most frequently used (d^2 = Σ_i (x_i - y_i)^2)
• Clustering techniques
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical -- single vs. complete vs. average linkage
– K-means -- have to estimate “k” initially
– SOM -- self-organizing maps
– Principal components analysis
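The Euclidean distance and the three linkage rules named above can be sketched in a few lines of Python. This is a naive illustration with hypothetical function names. It recomputes distances at every merge, so it suits only small examples.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cluster_distance(a, b, profiles, linkage):
    """Distance between clusters a and b (lists of profile indices)."""
    dists = [euclidean(profiles[i], profiles[j]) for i in a for j in b]
    if linkage == "single":      # closest pair of members
        return min(dists)
    if linkage == "complete":    # farthest pair of members
        return max(dists)
    if linkage == "average":     # mean over all member pairs
        return sum(dists) / len(dists)
    raise ValueError("linkage must be 'single', 'complete' or 'average'")


def agglomerate(profiles, n_clusters, linkage="average"):
    """Merge the closest pair of clusters until n_clusters remain."""
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and len(profiles)")
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q], profiles, linkage)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    genes = [[0, 0], [0, 1], [10, 10], [10, 11]]  # made-up two-sample profiles
    print(agglomerate(genes, 2, linkage="average"))
```

Recording the order of merges, rather than stopping at a fixed number of clusters, gives the tree (dendrogram) usually drawn beside a clustered heat map.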

K-means vs. Hierarchical Clustering

• K-means: select the number of groups, divide genes randomly into those groups, and calculate inter- and intra-group distances. Move genes between groups until inter-group differences are maximized and intra-group differences are minimized.

• Hierarchical: calculate all pairwise distances (correlations) and order genes accordingly.
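The K-means procedure described above can be sketched in Python as follows. This uses the standard reassignment loop (each gene moves to the nearest group centroid), which minimizes intra-group distances and is assumed here to be what the slide intends. The function names, the seed, and the handling of empty groups are illustrative choices.

```python
import random


def squared_distance(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of profiles."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def k_means(profiles, k, seed=0, max_iter=100):
    """Return a group label (0..k-1) for each profile."""
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    rng = random.Random(seed)
    # Random initial division of genes into k groups.
    labels = [rng.randrange(k) for _ in profiles]
    for _ in range(max_iter):
        centers = []
        for g in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == g]
            # An empty group is re-seeded with a randomly chosen profile.
            centers.append(centroid(members) if members else list(rng.choice(profiles)))
        new_labels = [
            min(range(k), key=lambda g: squared_distance(p, centers[g]))
            for p in profiles
        ]
        if new_labels == labels:  # no gene moved, so the grouping is stable
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    genes = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]  # made-up data
    print(k_means(genes, 2))
```

The result depends on k and on the random starting division, which is why k has to be estimated up front and why repeated runs with different seeds are often compared.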

Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns

[Figure: hierarchical clustering of PFC, HIP, NAC and VTA samples, computed from AvgDiff and from S-score; color scale shows relative change from -2 to +2]

Expression Profiling:

“It is possible that the expression profile could serve as a universal phenotype … Using a comprehensive database of reference profiles, the pathway(s) perturbed by an uncharacterized mutation would be ascertained by simply asking which expression patterns in the database its profile most strongly resembles … it should be equally effective at determining consequences of pharmaceutical treatments and disease states”

Hughes et al. Cell 102:109-126 (2000)
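The idea in the quotation, asking which reference profile a new profile most resembles, can be sketched as a ranked correlation search. The Python below uses plain Pearson correlation as the similarity measure; that choice, the function names, and the profile names are assumptions for illustration and do not reproduce the error-weighted statistics of Hughes et al.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def rank_matches(query, compendium):
    """Return (name, r) pairs sorted from most to least similar to `query`."""
    scored = [(name, pearson(query, ref)) for name, ref in compendium.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical log-ratio profiles over the same four genes.
    compendium = {
        "ref_a": [1, 2, 3, 4],
        "ref_b": [4, 3, 2, 1],
        "ref_c": [1, 3, 2, 4],
    }
    for name, r in rank_matches([2, 4, 6, 8], compendium):
        print(f"{name}: r = {r:.2f}")
```

The top-ranked reference suggests a candidate pathway for the uncharacterized perturbation, which would then need confirmation by an independent assay.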

Use of Expression Profile “Compendium” to Characterize Gene or Drug Function

Hughes et al. Cell 102:109-126 (2000)

Key features:
• established error model
• profiled large number of mutants/drugs under highly controlled conditions
• statistical treatment of expression patterns
• verified array results with biochemical/phenotypic assays

Correlation in Expression Profiles of Drugs/Genes Affecting Same Pathways

[Figure panels]
• cup5 and vma8, components of H+/ATPase complex
• Unrelated gene mutants
• HMG CoA-reductase mutant vs. lovastatin, an inhibitor of HMG2

Red symbols = significant change (p<0.05) in both treatments

Hughes et al. Cell 102:109-126 (2000)

Assigning Function to Uncharacterized Genes by Expression Profiles

Hughes et al. Cell 102:109-126 (2000)

Tertiary Analysis: Connecting Function with Expression Patterns

• Annotation
– UniGene/Swiss-Prot, SOURCE, DAVID
• Biased functional assessment
– Manual, GenMAPP, GeneSpring
• Non-biased functional queries
– PubGene
– MAPPFinder, DAVID/EASE, GEPAS, GOTree Machine, others
• Overlaying genomics and genetics
– WebQTL

Non-biased (semi) Functional Group Analysis: GenMAPP

Expression Analysis Systematic Explorer -- EASE

http://apps1.niaid.nih.gov/david/upload.jsp

Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
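Tools of this kind ask whether a functional category appears in a gene list more often than chance would predict. A common way to score that is the upper tail of the hypergeometric distribution (a one-sided Fisher exact test). The Python sketch below shows only that generic test, with hypothetical parameter names and made-up counts; EASE applies a more conservative modification that is not reproduced here.

```python
from math import comb


def hypergeom_upper_tail(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when list_total genes are drawn without replacement
    from pop_total genes, pop_hits of which carry the annotation."""
    if not (0 <= list_hits <= list_total <= pop_total and 0 <= pop_hits <= pop_total):
        raise ValueError("inconsistent counts")
    denom = comb(pop_total, list_total)
    upper = min(list_total, pop_hits)
    total = sum(
        comb(pop_hits, x) * comb(pop_total - pop_hits, list_total - x)
        for x in range(list_hits, upper + 1)
    )
    return total / denom


if __name__ == "__main__":
    # Made-up counts: 8 of 50 regulated genes fall in a category that
    # covers 200 of the 10,000 genes on the array.
    p = hypergeom_upper_tail(8, 50, 200, 10000)
    print(f"p = {p:.3g}")
```

Because many categories are tested at once, the resulting p-values generally need a multiple-testing correction before categories are ranked.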

EASE -- Options in Analysis

Efforts to Integrate Diverse Biological Databases with Expression Information: PubGene

www.PubGen.org

Expression Networks

[Figure: network linking Expression Profiling, Pharmacology, Genetics, Complex Trait, Prot-Prot Interactions, Ontology, HomoloGene, and BioMed Lit Relations]

Quaternary Analysis: Profiles to Physiology

Analysis Stages for Oligonucleotide Microarrays

Analysis Stage -- Description; Examples of Methods

• Normalization -- Equalizes overall signal across arrays to be compared, ensures linearity of response across abundance classes
– Whole chip (26)
– Quantile (27)
• Probe reduction -- Combines signals from multiple probes or probe pairs to define “expression level”. Identifies genes with invalid or hyper-variable expression levels.
– Weighted average (MAS 4) (29)
– Tukey bi-weight (MAS 5) (30)
– Model-based (MBEI) (31)
– Log scale linear additive (RMA) (32)
– Position-dependent stacking energy modeling (PDNN) (33)
• Comparative -- Compares expression of a gene across two or more arrays to determine significant changes in expression
– t-test
– rank order (MAS 5) (30)
– permutation (SAM) (46, 47)
– S-score (48)
• Multivariate studies -- Identifies significant correlations in expression data across experiments/conditions
– hierarchical clustering
– k-means clustering
– self-organizing maps
– principal components analysis
– & many more (34, 49)
• Biological overlay -- Identify functions for given genes, clusters of genes; hypothesis generation
– Multiple database access (SOURCE) (50)
– PubMed correlations (PubGene) (51)
– Gene Ontology rankings (GenMAPP, MAPPFinder, DAVID/EASE) (52, 53)
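As one concrete instance of the normalization stage, quantile normalization forces every array to share the same intensity distribution. The following is a minimal Python sketch with a hypothetical function name. Ties within an array are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    `arrays` is a list of equal-length lists, one list per array (chip).
    """
    if not arrays:
        return []
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    chips = [[5, 2, 3], [4, 1, 6]]  # made-up intensities for three probes
    print(quantile_normalize(chips))
```

After normalization each probe keeps its rank within its own array, while the values it can take are identical across arrays.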

Bioinformatics Resources for Microarray Experiments

Name -- Description; Link

• SOURCE -- Human, rat, mouse gene compilation from multiple databases; allows batch submissions for annotation
– http://source.stanford.edu/cgi-bin/sourceSearch
• GeneLynx -- Human, mouse gene compilation; multiple database links regarding gene/protein structure and function
– http://www.genelynx.org/
• DAVID/EASE -- Mines gene list for frequency of GO categories; annotation of gene list; statistical analysis of biological themes in gene list (EASE)
– http://apps1.niaid.nih.gov/David/upload.asp
• GenMAPP/MAPPFinder -- Superimposes array data on biological pathways; statistical ranking of functional groups
– http://www.genmapp.org/
• FatiGO -- Mines gene list for occurrence of GO terms; statistical comparison of two lists for over-representation
– http://fatigo.bioinfo.cnio.es/
• PubGene -- Finds associations between genes in biomedical literature; superimposes array data on literature links; commercial version available
– http://www.pubgene.org/
• MEME -- Searches promoter regions of genes in list/cluster for conserved motifs
– http://meme.sdsc.edu/meme/website/intro.html