GeneChips and Microarray Expression Data David Paoletti.

Post on 23-Dec-2015


GeneChips and Microarray Expression Data

David Paoletti

The Problem

• Determine gene expression (activity)

• What proteins are being produced by a group of cells?

The Assumption

• The RNA present in the cell determines what proteins are being produced

• Efficiency

The Why

• Understanding

• Toxicology

• Drug design
– Evaluation
– Specificity
– Response

What is a GeneChip?

• 1.28 x 1.28 cm glass wafer
• 500,000 features
– 24 x 24 µm probe site
– 25-mer oligo, complementary

• PM: perfect match

• MM: mismatch

2.5 M copies

GeneChip

The Solution

The Gains

• Speed

• Possibility

• Sensitivity

• Reproducibility

The Process

• Cells → poly-A RNA (AAAA) → cDNA
• IVT (in vitro transcription) → biotin-labeled antisense cRNA (L = label)
• Fragment (heat, Mg2+) → labeled fragments
• Hybridize → wash/stain → scan

Hybridization and Staining

[Figure: GeneChip + biotin-labeled cRNA (L = label) → hybridized array; the hybridized array is then stained with SAPE (streptavidin-phycoerythrin)]

Specialized Equipment

How Features Are Chosen

[Figure: gene sequence (5’ to 3’); multiple oligo probes (25-mers); perfect match and mismatch]

Feature Values

83 112 96 32

47 382 165 87

55 246 140 93

104 552 187 65

Remove outermost rows and columns

Find 75th percentile of remaining values

This value is taken as representative of this feature
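The feature-value rule above can be sketched in a few lines of Python using the 4 x 4 pixel grid from this slide. The slide does not say how the percentile is interpolated, so the linear-interpolation rule in the hypothetical `percentile` helper is an assumption; other conventions would give a different number for such a small sample.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between ranks (assumed rule)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def feature_value(pixels, q=75):
    """Drop the outermost rows and columns, then take the q-th percentile."""
    inner = [row[1:-1] for row in pixels[1:-1]]
    flat = [v for row in inner for v in row]
    return percentile(flat, q)


# Pixel intensities from the slide.
pixels = [
    [83, 112, 96, 32],
    [47, 382, 165, 87],
    [55, 246, 140, 93],
    [104, 552, 187, 65],
]
print(feature_value(pixels))  # 280.0 under this interpolation rule
```

Only the four inner pixels (382, 165, 246, 140) contribute here; a real feature covers many more pixels, so the choice of interpolation matters far less in practice.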

Background Noise Removal

• The array is divided into 16 equal sectors

• For each sector
– Find the lowest 2% of the feature intensities
– Average these
– Subtract this average from the intensity value of all features in the sector
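A minimal sketch of the sector-wise background removal, assuming the 16 sectors form a 4 x 4 grid over the array (the slide only says "16 equal sectors") and that at least one feature is always used for the "lowest 2%". The function name and parameters are illustrative only.

```python
def subtract_background(grid, sectors_per_side=4, fraction=0.02):
    """Return a copy of grid with each sector's background level removed.

    grid is a list of equal-length rows of feature intensities.
    A 4 x 4 sector layout is assumed.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = [list(row) for row in grid]
    for sr in range(sectors_per_side):
        r0 = sr * n_rows // sectors_per_side
        r1 = (sr + 1) * n_rows // sectors_per_side
        for sc in range(sectors_per_side):
            c0 = sc * n_cols // sectors_per_side
            c1 = (sc + 1) * n_cols // sectors_per_side
            values = sorted(
                grid[r][c] for r in range(r0, r1) for c in range(c0, c1)
            )
            if not values:
                continue
            # Lowest 2% of the sector, but never fewer than one feature.
            k = max(1, int(len(values) * fraction))
            background = sum(values[:k]) / k
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = grid[r][c] - background
    return out


# Toy 8 x 8 array: each sector is 2 x 2, so the "lowest 2%" is its minimum.
grid = [[10 + r + c for c in range(8)] for r in range(8)]
corrected = subtract_background(grid)
print(corrected[0][0], corrected[1][1])
```

The input grid is left unmodified, so raw and background-corrected intensities can be compared side by side.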

Noise Calculation

$$Q_{raw} = \frac{1}{N} \sum_{i \in bg} \frac{stdev_i}{\sqrt{pixel_i}}$$

$$Q = Q_{raw} \cdot SF \cdot NF$$
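As a rough illustration of the noise calculation, the sketch below reads the slide's formula as: average, over the N background features, of each feature's pixel standard deviation divided by the square root of its pixel count, then multiplied by SF and NF (taken here to be the scaling and normalization factors). The square root and the meaning of SF and NF are a reading of the formula, not something the slide states; the function name is hypothetical.

```python
import math


def noise_q(background_cells, sf=1.0, nf=1.0):
    """Noise Q from background features.

    background_cells: list of (stdev, pixel_count) tuples, one per
    background feature. sf and nf are assumed to be the scaling and
    normalization factors.
    """
    raw_q = sum(sd / math.sqrt(n) for sd, n in background_cells)
    raw_q /= len(background_cells)
    return raw_q * sf * nf


# Two background features: 8/sqrt(16) = 2 and 12/sqrt(36) = 2.
print(noise_q([(8.0, 16), (12.0, 36)]))
```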

Average Difference Intensity

• For a given gene
– For each probe pair for the given gene
• Calculate the difference PM - MM
– Calculate μ, σ for this set
– If abs((PM - MM) - μ) > 3σ, delete from set
– Remaining set is "pairs in avg"

$$AvgDiff = \frac{1}{\#\,\text{pairs in avg}} \sum_{i \in \text{pairs in avg}} (PM_i - MM_i)$$
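The average difference can be sketched as follows, reading the rule as: compute the mean and standard deviation of the PM - MM differences, drop pairs lying more than three standard deviations from the mean, and average what remains. Using the sample standard deviation (`statistics.stdev`) and a strict "more than" cutoff are assumptions; the slide does not specify either.

```python
from statistics import mean, stdev


def average_difference(pairs, n_sigma=3.0):
    """Average PM - MM over probe pairs, after removing outliers.

    pairs: list of (pm, mm) intensities for one gene.
    """
    diffs = [pm - mm for pm, mm in pairs]
    if len(diffs) < 2:
        return mean(diffs)  # too few pairs to estimate a spread
    mu, sigma = mean(diffs), stdev(diffs)
    # Keep the "pairs in avg": those within n_sigma of the mean.
    kept = [d for d in diffs if abs(d - mu) <= n_sigma * sigma]
    return mean(kept)


# 19 well-behaved pairs plus one extreme pair that gets excluded.
pairs = [(300, 200)] * 19 + [(10200, 200)]
print(average_difference(pairs))  # 100
```

With only a handful of pairs no value can lie three standard deviations from the mean, so the filter only has an effect on probe sets of roughly a dozen pairs or more.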

Positive & Negative Probe Pairs

• PM - MM ≥ SDT
• PM/MM ≥ SRT

If both true, mark as positive

• MM - PM ≥ SDT
• MM/PM ≥ SRT

If both true, mark as negative

SDT = Q · STDmult

By default, SRT = 1.5, STDmult = 2.0 (low density), 4.0 (high density)
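A small sketch of the positive/negative test for one probe pair, using the defaults from this slide (SRT = 1.5, STDmult = 2.0). The comparisons are taken as "greater than or equal", the label "neutral" for pairs that are neither positive nor negative is made up for illustration, and the guards against dividing by zero are an added assumption.

```python
def classify_pair(pm, mm, q, srt=1.5, std_mult=2.0):
    """Label one probe pair as 'positive', 'negative' or 'neutral'.

    q is the noise value Q; SDT = Q * STDmult as on the slide.
    """
    sdt = q * std_mult
    # Positive: PM exceeds MM both in difference and in ratio.
    if pm - mm >= sdt and mm > 0 and pm / mm >= srt:
        return "positive"
    # Negative: the mirror-image test with PM and MM swapped.
    if mm - pm >= sdt and pm > 0 and mm / pm >= srt:
        return "negative"
    return "neutral"


print(classify_pair(300, 100, q=10))   # positive
print(classify_pair(100, 300, q=10))   # negative
print(classify_pair(1000, 900, q=10))  # neutral: ratio below SRT
```

Requiring both tests means a pair on a bright background (large difference, small ratio) or a dim one (large ratio, tiny difference) does not count either way.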

Voting Methods for Absolute Call

• Positive/negative ratio

PNR = #pos / #neg

• Positive fraction

PF = #pos / #used

• Log average ratio

$$LA = 10 \cdot \frac{1}{\#\,\text{pairs in avg}} \sum_{i \in \text{pairs in avg}} \log(PM_i / MM_i)$$
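The three voting metrics can be computed together, as in this sketch. It assumes that "#used" is simply the number of probe pairs supplied, that the log average ratio is ten times the mean base-10 log of PM/MM over those same pairs, and that PNR is reported as infinity when there are no negative pairs. The labels are passed in as plain strings; how they were assigned is outside this function.

```python
import math


def voting_metrics(pairs, labels):
    """Return (PNR, PF, LA) for one gene.

    pairs:  list of (pm, mm) intensities, all strictly positive.
    labels: one of 'positive' / 'negative' / anything else, per pair.
    """
    n_pos = labels.count("positive")
    n_neg = labels.count("negative")
    pnr = n_pos / n_neg if n_neg else math.inf  # positive/negative ratio
    pf = n_pos / len(pairs)                     # positive fraction
    la = 10 * sum(math.log10(pm / mm) for pm, mm in pairs) / len(pairs)
    return pnr, pf, la


pairs = [(1000, 100), (100, 100), (100, 1000), (1000, 100)]
labels = ["positive", "neutral", "negative", "positive"]
pnr, pf, la = voting_metrics(pairs, labels)
print(pnr, pf, la)
```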

Decision Matrix

        Absent | Marginal    Marginal | Present
PNR          3.00                  4.00
PF           0.33                  0.43
LA           0.90                  1.30
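One way to turn the decision matrix into a call is sketched below. The thresholds are the ones on this slide, read as the boundaries between Absent, Marginal and Present. The slide does not say how the three metrics are combined or which side a value exactly on a boundary falls, so the majority vote in the hypothetical `absolute_call` and the boundary handling in `metric_call` are assumptions, not the vendor's actual decision rule.

```python
# (Absent/Marginal boundary, Marginal/Present boundary) from the slide.
THRESHOLDS = {"PNR": (3.00, 4.00), "PF": (0.33, 0.43), "LA": (0.90, 1.30)}


def metric_call(name, value):
    """Call for a single metric; boundary values go to the higher call (assumed)."""
    low, high = THRESHOLDS[name]
    if value < low:
        return "Absent"
    if value < high:
        return "Marginal"
    return "Present"


def absolute_call(pnr, pf, la):
    """Combine the three per-metric calls by majority vote (assumed rule)."""
    votes = [
        metric_call("PNR", pnr),
        metric_call("PF", pf),
        metric_call("LA", la),
    ]
    for call in ("Present", "Absent"):
        if votes.count(call) >= 2:
            return call
    return "Marginal"


print(absolute_call(5.0, 0.50, 2.0))  # Present
print(absolute_call(1.0, 0.10, 0.2))  # Absent
print(absolute_call(3.5, 0.40, 1.0))  # Marginal
```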

Average Difference and Absolute Call

• Which of these do you base a decision on, for whether a gene is being expressed?

• Use the absolute call for decision

• Use average difference to compare those which are present

Conclusions

• Incredible amalgam of biological and computational processes

• Allows analyses that would not be performed otherwise

• Already of proven worth

References

• Moore, S K; Making chips to probe genes, IEEE Spectrum, March 2001, 54-60.

• GeneChip Gene Expression Algorithm Training, Part I: Absolute Analysis; Affymetrix.

• Berberich, S, and McGorry, M; GeneChip protocols; Wright State University.