CS 5263 Bioinformatics Lecture 22 Introduction to Microarray.

Date post: 29-Dec-2015
Upload: darrell-horn
Page 1: CS 5263 Bioinformatics Lecture 22 Introduction to Microarray.

CS 5263 Bioinformatics

Lecture 22

Introduction to Microarray

Page 2

Outline

• What is a microarray
• Basic categories of microarrays
• How microarrays can be used
• Computational and statistical methods involved in microarray analysis
  – Probe design
  – Image processing
  – Pre-processing
  – Differentially expressed gene identification
  – Clustering / classification
  – Network / pathway modeling

Page 3

Gene expression

• Genes have different activities at different times and locations

• DNA microarrays
  – Measure gene transcription (the amount of mRNA) in a high-throughput fashion
  – A surrogate for gene activity

Reverse transcription (in the lab): the product is called cDNA

Page 4

http://www.escience.ws/b572/L13/north.html

Northern Blot (an old technique for measuring mRNA expression)

1. mRNA extracted and purified.

2. mRNA loaded for electrophoresis.

Lane 1: size standards. Lane 2: RNA to be tested.

3. An electric field is applied across the gel and the RNA migrates through it according to size (molecular weight).

4. The RNA is transferred from the gel to a membrane.

5. A labeled probe specific for the RNA of interest is incubated with the blot (hybridization), so that RNA can be detected.

Requires a relatively large amount of mRNA.

Page 5

See animation of RT-PCR: http://www.bio.davidson.edu/courses/Immunology/Flash/RT_PCR.html

RT-PCR (reverse transcription-polymerase chain reaction)

http://www.ambion.com/techlib/basics/rtpcr/

1. RNA is reverse transcribed to DNA.
2. PCR is used to amplify the DNA at an exponential rate.
3. The amplified product is quantified on a gel.

---- A semi-quantitative method. A smaller amount of sample is needed.

Real-time RT-PCR

1. The PCR amplification is monitored by fluorescence in “real time”.

2. The fluorescence value recorded in each cycle reflects the amount of amplified product.

---- A quantitative method. Currently the most advanced and accurate method for measuring mRNA abundance; often used to validate microarray results.
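Why per-cycle fluorescence is quantitative can be shown with a toy model: if each cycle multiplies the product by (1 + E), a sample with more starting template reaches a fixed fluorescence threshold in fewer cycles, and the difference in threshold cycles gives the ratio of starting amounts. This is a minimal sketch, assuming perfect doubling (E = 1), noise-free signal and whole-cycle resolution; the function names and the threshold value are illustrative, not taken from any instrument's software.

```python
def amplify(n0, cycles, efficiency=1.0):
    """Amount of product after each cycle 0..cycles, assuming a constant
    per-cycle efficiency (1.0 means perfect doubling)."""
    return [n0 * (1 + efficiency) ** c for c in range(cycles + 1)]


def threshold_cycle(curve, threshold):
    """First cycle at which the (noise-free) signal reaches the threshold."""
    for cycle, amount in enumerate(curve):
        if amount >= threshold:
            return cycle
    return None  # threshold never reached


def relative_amount(ct_sample, ct_reference, efficiency=1.0):
    """Starting amount of the sample relative to the reference:
    (1 + E) ** (Ct_reference - Ct_sample)."""
    return (1 + efficiency) ** (ct_reference - ct_sample)


reference = amplify(1_000, 40)
sample = amplify(8_000, 40)  # hypothetical sample with 8x more starting template
ct_ref = threshold_cycle(reference, 1e9)
ct_sample = threshold_cycle(sample, 1e9)
print(ct_ref, ct_sample, relative_amount(ct_sample, ct_ref))  # prints: 20 17 8.0
```

Real instruments interpolate the crossing point and must correct for efficiencies below 1, so this only illustrates the principle.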

Page 6

Limitations of the old techniques

1. Labor intensive

2. Can measure at most dozens of genes (gene-by-gene analysis)

Page 7

What is a Microarray

Conceptually similar to a (reverse) Northern blot

Many probes, rather than mRNAs, are fixed on a surface in an ordered layout

[Figure: array of probe spots, each corresponding to a gene, e.g., Gene 102 and Gene 305]

Page 8

What is a microarray (2)

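• A 2D array of DNA sequences from thousands of genes

• Each spot holds many copies of the same probe sequence for one gene

• Labeled targets derived from a sample's mRNA are allowed to hybridize to the spots

• The amount of hybridization at each spot is measured (as fluorescence intensity)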

Page 9

Goals of a Microarray Experiment

1. Find the genes that change expression between experimental and control samples

2. Classify samples based on a gene expression profile

3. Find patterns: Groups of biologically related genes that change expression together across samples/treatments

Page 10

Microarray categories

• cDNA microarray
  – Each probe is the cDNA of a gene (hundreds to thousands of bp)
  – Stanford, Brown Lab

• Oligonucleotide microarray
  – Each probe is a synthesized short DNA sequence (uniquely corresponding to a substring of a gene)
  – Affymetrix: ~25-mers
  – Agilent: ~60-mers

• Others

Page 11

Spotted cDNA microarray

Page 12

Array Manufacturing

Each tube contains cDNA corresponding to a unique gene. The cDNA is pre-amplified and spotted onto a glass slide.

Page 13

Experiment

[Figure: two samples labeled with Cy3 and Cy5 hybridized to the same array]

Page 14

Data acquisition

Computer programs are used to process the image into digital signals.

• Segmentation: determine the boundary between signal and background

• Results: gene expression ratios between two samples
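The segmentation and ratio steps can be illustrated with fixed-circle segmentation, one of the simplest approaches: pixels inside a circle around the spot centre count as signal, the rest of the patch as background. This is a minimal sketch, assuming the spot centre and radius are already known from the array layout and that each channel is given as a small 2D list of pixel values; `fixed_circle_intensity`, `spot_ratio` and `make_patch` are hypothetical helpers.

```python
import statistics


def fixed_circle_intensity(patch, cx, cy, radius):
    """Split a pixel patch into a circular spot and the surrounding background.
    Returns (mean spot intensity, median background intensity)."""
    spot, background = [], []
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                spot.append(value)
            else:
                background.append(value)
    return statistics.mean(spot), statistics.median(background)


def spot_ratio(cy5_patch, cy3_patch, cx, cy, radius):
    """Background-subtracted Cy5/Cy3 ratio for one spot."""
    s5, b5 = fixed_circle_intensity(cy5_patch, cx, cy, radius)
    s3, b3 = fixed_circle_intensity(cy3_patch, cx, cy, radius)
    return (s5 - b5) / (s3 - b3)


def make_patch(spot_value, bg_value, size=5, cx=2, cy=2, radius=1):
    """Synthetic test patch: a bright disc on a flat background."""
    return [[spot_value if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else bg_value
             for x in range(size)] for y in range(size)]


cy5 = make_patch(210, 10)
cy3 = make_patch(110, 10)
print(spot_ratio(cy5, cy3, cx=2, cy=2, radius=1))  # prints: 2.0
```

The median is used for the background because it is less affected by neighbouring spots or dust. A spot whose signal does not exceed the background would give a zero or negative denominator here and needs special handling in real data.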

Page 16

Affymetrix GeneChip®

Page 17

Array Design

25-mer unique oligo (perfect match, PM)

Mismatch (MM) partner with a mismatch at the middle nucleotide

Multiple probe pairs (11~16) for each gene

from Affymetrix Inc.
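The PM/MM pairing can be made concrete: an MM probe is the PM sequence with its middle (13th) base replaced by the complementary base. This is a minimal sketch, assuming plain A/C/G/T strings; the probe sequence below is made up for illustration and does not come from any real array.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm):
    """Return the MM partner of a 25-mer PM probe: identical except that the
    middle (13th) nucleotide is replaced by its complement."""
    if len(pm) != 25:
        raise ValueError("expected a 25-mer probe")
    mid = len(pm) // 2  # index 12, i.e. the 13th nucleotide
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Because the MM probe differs from its PM partner at a single central position, it was intended to capture non-specific binding; each gene is represented by 11-16 such pairs.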

Page 18

Array Manufacturing

In situ synthesis of oligonucleotides

Technology adapted from the semiconductor industry (photolithography and combinatorial chemistry).

from Affymetrix Inc.

Page 19

GeneChip® Probe Arrays

[Figure: GeneChip probe array (1.28 cm) with >200,000 different complementary probes. Each 24 µm hybridized probe cell holds millions of copies of a specific oligonucleotide probe, which binds single-stranded, labeled RNA target. Includes an image of a hybridized probe array.]

Page 20

from Affymetrix Inc.

Overview of the Affymetrix GeneChip technology

The probes in each probe set are combined to give an absolute expression level. Image segmentation is relatively easy, but how to use the MM signal is debatable.

Page 21

Comparison of cDNA array and GeneChip

Feature | cDNA array | GeneChip
Probe preparation | Probes are cDNA fragments, usually amplified by PCR and spotted by a robot | Probes are short oligos synthesized using a photolithographic approach
Colors | Two-color (measures relative intensity) | One-color (measures absolute intensity)
Gene representation | One probe per gene | 11-16 probe pairs per gene
Probe length | Long, varying lengths (hundreds to 1K bp) | 25-mers
Density | Maximum of ~15,000 probes | 38,500 genes × 11 probes = 423,500 probes

Page 22

Affymetrix GeneChip: one-color design

cDNA microarray: two-color design

Why the difference?

Page 23

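Affymetrix GeneChip: photolithography

(The amount of oligo synthesized in each probe cell is well controlled, so absolute intensities are comparable.)

cDNA microarray: robotic spotting

(The amount of cDNA deposited in each spot may vary greatly, so two differently labeled samples are hybridized to the same spot and their ratio is used, which cancels the spot-to-spot variation.)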

Page 24

Advantages and disadvantages of cDNA arrays and GeneChips

cDNA microarray | Affymetrix GeneChip
The data can be noisy, with variable quality | Specific and sensitive; results very reproducible
Cross (non-specific) hybridization often happens | Hybridization is more specific
May need an RNA amplification procedure | Can use a small amount of RNA
Image analysis is more difficult | Image analysis and intensity extraction are easier
Need to search databases for gene annotation | More widely used; better quality of gene annotation
Cheap (both initial cost and per-slide cost) | Expensive (~$400 per array, plus labeling and hybridization)
Can be custom made for special species | Available for only several popular species
Does not require the exact DNA sequence | Requires the DNA sequence for probe selection

Page 25

Computational aspects

• Probe design

• Image processing

• Pre-processing

• Differentially expressed gene identification

• Clustering / classification

• Network / pathway modeling

Page 26

First step: pre-processing

• Transformation
  – Transforms intensities or ratios to a different scale
  – Why?
    • For convenience
    • To convert the data into a distribution (e.g., normal) assumed by many other statistical procedures

• Normalization
  – Corrects for systematic errors
  – Makes data from different samples comparable

Garbage in => Garbage out

Page 27

Where can errors come from?

• Random errors
  – Repeating the same experiment twice gives different results
  – Using multiple replicates reduces the problem

• Systematic errors
  – Arrays manufactured at different times
  – On the same array, probes printed with different printer tips may have different biases
  – Dye effect: difference between Cy5 and Cy3 labeling
  – Experimental factors
    • More mRNA applied to array A than to array B
    • Sample preparation procedure
    • Experiments carried out at different times, by different users, etc.

Page 28

cDNA microarray data preprocessing

Page 29

Typical experiments
• Wild-type cells vs mutated cells
• Diseased cells vs normal cells
• Cells under normal growth conditions vs cells treated with chemicals
• Typically repeated several times

[Figure: data matrix of ratios, one row per probe (gene)]

Page 30

Transforming cDNA microarray data
• Data: Cy5/Cy3 ratios as well as raw intensities
• Most common is the log2 transformation

• 2-fold increase => log2(2) = 1

• 2-fold decrease => log2(1/2) = -1

[Figure: histograms of frequency vs. Cy5/Cy3 ratio and frequency vs. log2(Cy5/Cy3)]
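The log2 transformation is a one-liner; the point of the example is the symmetry: a 2-fold increase and a 2-fold decrease become +1 and -1, whereas on the ratio scale they are 2 and 0.5. This is a minimal sketch with made-up intensities; `log2_ratios` is a hypothetical helper name.

```python
import math


def log2_ratios(cy5, cy3):
    """log2(Cy5/Cy3) for each spot; intensities must be positive."""
    return [math.log2(r / g) for r, g in zip(cy5, cy3)]


cy5 = [200, 50, 100, 400, 25]
cy3 = [100, 100, 100, 100, 100]
print(log2_ratios(cy5, cy3))  # prints: [1.0, -1.0, 0.0, 2.0, -2.0]
```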

Page 31

Dye effect

cDNA microarray experiments using two identical samples: Cy5 is consistently lower than Cy3. Solution: dye swapping.

Page 32

Dye swapping

• Chip 1: label test by Cy5 and control by Cy3
• Chip 2: label test by Cy3 and control by Cy5
• Ideally cy5/cy3 on chip 1 = cy3/cy5 on chip 2 (both equal test/control)
• Not so, due to the dye effect
• Compute the average log ratio:

½ log2(cy5/cy3 on chip 1) + ½ log2(cy3/cy5 on chip 2)
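Why the average works: if the dye effect multiplies every Cy5 reading by the same factor b, chip 1 gives log2(T/C) + log2(b) and chip 2 gives log2(T/C) - log2(b), so the bias cancels in the average. This is a minimal sketch, assuming a purely multiplicative dye bias that is identical on both chips (real dye effects are often intensity-dependent); the numbers are simulated.

```python
import math


def dye_swap_log_ratio(chip1_cy5, chip1_cy3, chip2_cy5, chip2_cy3):
    """Chip 1: test labeled Cy5, control Cy3. Chip 2: test Cy3, control Cy5.
    Average of the two test/control log2 ratios."""
    return 0.5 * math.log2(chip1_cy5 / chip1_cy3) + 0.5 * math.log2(chip2_cy3 / chip2_cy5)


test, control, bias = 400.0, 100.0, 0.8  # true ratio 4; Cy5 reads at 0.8x
naive = math.log2((test * bias) / control)  # chip 1 alone, biased
swapped = dye_swap_log_ratio(test * bias, control, control * bias, test)
print(round(naive, 3), round(swapped, 3))  # prints: 1.678 2.0
```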

Page 33

Total intensity normalization

• Even after dye swapping, systematic biases may remain

• Assume the total amount of mRNA should not change between the two samples
  – Not necessarily true
  – Rescale so that the two colors have the same total intensity
  – Rescale according to a subset of genes
    • House-keeping genes
    • Middle 90% (for example) of genes
    • Spike-in genes

[Figure: two histograms of log2(Cy5/Cy3) frequency]
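Rescaling to equal total intensity, computed over the middle 90% of genes, can be sketched as follows. This is a minimal sketch, assuming raw Cy5 and Cy3 intensities per gene and trimming on the Cy5/Cy3 ratio (equivalent to trimming on the log ratio, since the log is monotonic); `total_intensity_normalize` and the data are hypothetical.

```python
def total_intensity_normalize(cy5, cy3, trim=0.05):
    """Rescale Cy5 so that both channels have the same total intensity over
    the genes kept after trimming. trim=0.05 drops the 5% of genes with the
    lowest and the 5% with the highest Cy5/Cy3 ratio (middle 90% kept)."""
    order = sorted(range(len(cy5)), key=lambda i: cy5[i] / cy3[i])
    k = int(len(order) * trim)
    kept = order[k:len(order) - k]
    factor = sum(cy3[i] for i in kept) / sum(cy5[i] for i in kept)
    return [x * factor for x in cy5]


# 18 genes with Cy5 at half of Cy3 (a global bias), plus two genuine outliers
cy3 = [100.0] * 20
cy5 = [50.0] * 18 + [400.0, 5.0]
print(total_intensity_normalize(cy5, cy3)[:3])  # prints: [100.0, 100.0, 100.0]
```

Without trimming, the two outliers would pull the scaling factor to 2000/1305 ≈ 1.53 instead of 2.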

Page 34

M-A plot
• Also known as the ratio-intensity plot

• M: log2(cy5 / cy3) = log2(cy5) – log2(cy3)

• A: ½ log2(cy5 * cy3) = (log2(cy5) + log2(cy3)) / 2

[Figure: M-A plot, M vs. A]

Ideal:

• M centered at zero

• Variance of M does not depend on A

However:

• Systematic dependence between M and A

• High variance of M for smaller A
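Computing M and A from the two channels, and recovering the channels from them, is simple arithmetic on the formulas above. This is a minimal sketch; `ma_values` and `channels_from_ma` are hypothetical helpers and the intensities are made up.

```python
import math


def ma_values(cy5, cy3):
    """Return (M, A): M = log2(cy5) - log2(cy3), A = (log2(cy5) + log2(cy3)) / 2."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(cy5, cy3)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(cy5, cy3)]
    return m, a


def channels_from_ma(m, a):
    """Invert the transform: cy5 = 2**(A + M/2), cy3 = 2**(A - M/2)."""
    cy5 = [2 ** (ai + mi / 2) for mi, ai in zip(m, a)]
    cy3 = [2 ** (ai - mi / 2) for mi, ai in zip(m, a)]
    return cy5, cy3


m, a = ma_values([64.0, 1024.0], [256.0, 256.0])
print(m, a)  # prints: [-2.0, 2.0] [7.0, 9.0]
```

Working in (M, A) coordinates is essentially a 45-degree rotation of the log-log scatter plot of the two channels, which makes an intensity-dependent bias visible as a trend of M against A.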

Page 35

Lowess normalization

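• Lowess: locally weighted regression (scatterplot smoothing)

• Fit local polynomial functions of A to M

• Adjust M by subtracting the fitted curve: M' = M - fit(A)

[Figure: M vs. A before, and M' vs. A after, lowess normalization]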
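Lowess normalization fits a smooth curve of M against A and subtracts it. The sketch below uses a plain local linear fit with tricube weights over the nearest `frac` fraction of points, without the robustness iterations that full lowess adds; `lowess_fit` and `lowess_normalize` are hypothetical names, and in practice a library implementation would normally be used (for example `lowess` in Python's statsmodels, or the loess-based normalization tools in Bioconductor).

```python
import math


def lowess_fit(x, y, frac=0.3):
    """Local linear regression with tricube weights (no robustness iterations).
    Returns the fitted value at every x; O(n^2 log n), fine for a sketch."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbours used per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest neighbour of x0.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for xi, yi in zip(x, y):
            u = abs(xi - x0) / h
            w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
            sw += w
            swx += w * xi
            swy += w * yi
            swxx += w * xi * xi
            swxy += w * xi * yi
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:
            # Weight concentrated on one x value: fall back to a weighted mean.
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


def lowess_normalize(m, a, frac=0.3):
    """Lowess-normalized log ratios: M' = M - (fitted trend of M against A)."""
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


a = [6 + 0.5 * i for i in range(20)]
m = [0.8 - 0.1 * ai for ai in a]  # simulated intensity-dependent dye bias
m_norm = lowess_normalize(m, a)
print(max(abs(v) for v in m_norm) < 1e-9)  # prints: True
```

Because the fit is local, it can also remove curved, intensity-dependent trends that a single global scaling factor cannot; `frac` controls how local the fit is.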

Page 36

Replicate filtering

• Experiments repeated
• Genes with very high variability across replicates are questionable

[Figure: scatter plot of log2(ratio 1) vs. log2(ratio 2) from two replicates]
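Replicate filtering can be as simple as dropping genes whose replicate log2 ratios disagree too much. This is a minimal sketch, assuming at least two replicates per gene; the standard-deviation cutoff `max_sd` and the gene names are hypothetical and would need to be chosen for a real data set.

```python
import statistics


def filter_replicates(log_ratios, max_sd=0.5):
    """log_ratios maps gene -> list of replicate log2 ratios.
    Keep genes whose replicates agree (sample SD <= max_sd) and
    return {gene: mean log2 ratio} for them."""
    kept = {}
    for gene, values in log_ratios.items():
        if statistics.stdev(values) <= max_sd:
            kept[gene] = statistics.mean(values)
    return kept


data = {
    "geneA": [1.0, 1.2, 0.8],   # consistent up-regulation
    "geneB": [2.0, -1.5, 0.5],  # replicates disagree
    "geneC": [0.0, 0.1, -0.1],  # consistently unchanged
}
print(sorted(filter_replicates(data)))  # prints: ['geneA', 'geneC']
```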

Page 37

oligo microarray data preprocessing

(Affymetrix chip)

Page 38

Typical experiments

• Multiple microarrays
  – n samples (from different times, locations, conditions, treatments, etc.)
  – k replicates for each sample

• For example
  – Samples collected from 100 healthy people and 100 cancer patients
  – Cells treated with a drug, sampled every 10 minutes

• Repeat on 3–5 microarrays for each sample
  – Improves reliability of the results
  – Often averaged after some preprocessing

Page 39

Main characteristics

• For each gene, there are multiple PM and MM probes (11-16 pairs)
  – How to obtain an overall intensity from these probe-level intensities?

• Array outputs are absolute values rather than ratios
  – Cross-array normalization is important for them to be comparable

Page 40

How to use MM information?

• Earlier approach:
  – First remove outlier probes
  – Actual intensity = I_PM - I_MM
  – Assumes I_PM = I_MM + I_specific (questionable)

• Recent trend
  – Tend to ignore I_MM or use it in a different way

• Various software packages
  – MAS5 (by Affymetrix)
  – dChip
  – RMA
  – GCRMA

[Figure: scatter plot of PM vs. MM probe intensities]
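The two ways of using MM described above can be contrasted on a single probe set. This is a minimal sketch, assuming raw PM and MM intensities for one gene's probe pairs; the outlier rule (drop differences more than `z` sample standard deviations from the mean) and the function names are hypothetical simplifications, and the packages listed above (MAS5, dChip, RMA, GCRMA) use more elaborate, robust estimators.

```python
import math
import statistics


def average_difference(pm, mm, z=2.0):
    """Earlier-style summary: remove outlier probe pairs, then average PM - MM.
    The result can be negative when MM exceeds PM."""
    diffs = [p - m for p, m in zip(pm, mm)]
    mu, sd = statistics.mean(diffs), statistics.stdev(diffs)
    kept = [d for d in diffs if sd == 0 or abs(d - mu) <= z * sd]
    return statistics.mean(kept)


def pm_only(pm):
    """Ignore MM: median of log2(PM) over the probe set."""
    return statistics.median(math.log2(p) for p in pm)


# Hypothetical probe set: 11 pairs, one PM probe with an outlying signal
pm = [200.0] * 10 + [2000.0]
mm = [100.0] * 11
print(average_difference(pm, mm))  # prints: 100.0
print(round(pm_only(pm), 2))       # prints: 7.64
```

In this toy case the bright PM probe is discarded by the outlier step and does not move the median either. One practical problem with plain subtraction is that PM - MM can be negative and then cannot be log-transformed.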

Page 41

Normalization

• Similar to cDNA microarrays

• Total intensity normalization
  – Each array has the same mean intensity
  – Can be based on all genes or a selected subset of genes
    • House-keeping genes
    • Middle 90% (for example) of genes
    • Spike-in genes

• Lowess with a common reference

• Many useful tools implemented in Bioconductor
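Total intensity normalization across one-color arrays can be sketched as scaling every array to a common trimmed mean (the middle 90% of intensities when `trim=0.05`). This is a minimal sketch, assuming each array is a list of intensities on the linear scale; `scale_arrays` is a hypothetical name, and using the average of the arrays' trimmed means as the target is an illustrative choice.

```python
def trimmed_mean(values, trim=0.05):
    """Mean of the values left after dropping the lowest and highest trim fraction."""
    s = sorted(values)
    k = int(len(s) * trim)
    middle = s[k:len(s) - k]
    return sum(middle) / len(middle)


def scale_arrays(arrays, trim=0.05, target=None):
    """Rescale each array so its trimmed mean equals a common target
    (default: the average of the arrays' trimmed means)."""
    means = [trimmed_mean(a, trim) for a in arrays]
    if target is None:
        target = sum(means) / len(means)
    return [[v * target / m for v in a] for a, m in zip(arrays, means)]


array1 = [10.0 * i for i in range(1, 21)]
array2 = [2 * v for v in array1]  # same sample, but twice the overall brightness
scaled = scale_arrays([array1, array2])
print(scaled[0][:3], scaled[1][:3])  # prints: [15.0, 30.0, 45.0] [15.0, 30.0, 45.0]
```

Scaling equalizes only the overall brightness; intensity-dependent differences between arrays are what lowess against a common reference is meant to handle.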

Page 42

Conclusions
• Microarrays measure the expression of thousands of genes simultaneously, making global monitoring of cellular activities possible.

• The method produces noisy data, so normalization is crucial.

• Real-time RT-PCR is used to validate a small number of genes.

Page 43

Limitations

• Measures mRNA instead of proteins. Actual protein abundance and post-translational modifications cannot be detected.

• Suitable for global monitoring; results should be used to generate further hypotheses or be combined with other carefully designed experiments.

Page 44

Microarray preprocessing questions
• What kind of array is it?
  – Two-color? One-color?
  – Oligo array? cDNA array?

• How is the experiment designed?
  – Time series? Test vs control?

• What kind of preprocessing has been done?
  – What values: raw intensities or ratios?
  – Transformation? Log scale? Linear scale?
  – Normalization: within-array? Cross-array?

• What are the next steps?
  – Identifying differentially expressed genes?
  – Clustering?

Page 45

Some real data

• Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown, “Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale”, Science, 278: 680 – 686, 1997

