Date post: | 19-Dec-2015 |
DNA microarray and array data analysis
Some of the slides are adapted from the lecture notes of Dr. Patrick Leahy of the Gene Expression Array Core Facility at CWRU
What is DNA Microarray
DNA microarray is a new technology to measure the level of the mRNA gene products of a living cell.
A microarray chip is a rectangular chip on which is imposed a grid of DNA spots. These spots form a two-dimensional array.
Each spot in the array contains millions of copies of some DNA strand, bonded to the chip.
Chips are made tiny so that a small amount of RNA is needed from experimental cells.
DNA Microarray
Many applications in both basic and clinical research
determining the role a gene plays in a pathway, disease, diagnostics and pharmacology, …
There are three main platforms for performing microarray analyses.
cDNA arrays (generic, multiple manufacturers)
Oligonucleotide arrays (genechips) (Affymetrix)
cDNA membranes (radioactive detection)
cDNA Microarray
Spot cloned cDNAs onto a glass/nylon microscope slide
usually PCR amplified segments of plasmids
Complementary hybridization
-- CTAGCAGG actual gene
-- GATCGTCC cDNA (Reverse transcriptase)
-- CUAGCAGG mRNA
Label 2 mRNA samples with 2 different colors of fluorescent dye -- control vs. experimental
Mix two labeled mRNAs and hybridize to the chip
Make two scans - one for each color
Combine the images to calculate ratios of amounts of each mRNA that bind to each spot
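The complementary-hybridization example on this slide can be checked with a few lines of Python. This is only a sketch: the function names `complement` and `to_mrna` are made up for illustration, and the sequences are written in the same left-to-right order as on the slide (no strand reversal).

```python
# Watson-Crick base pairing for DNA (illustrative helper, not from the slides).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(dna: str) -> str:
    """Return the base-by-base complement, in the same order as the input."""
    return "".join(DNA_COMPLEMENT[base] for base in dna)


def to_mrna(dna: str) -> str:
    """Write a DNA sequence in the RNA alphabet (T becomes U)."""
    return dna.replace("T", "U")


gene = "CTAGCAGG"
print(complement(gene))  # GATCGTCC, the cDNA line on the slide
print(to_mrna(gene))     # CUAGCAGG, the mRNA line on the slide
```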
Spotted Microarray Process
[Figure: spotted microarray process, with samples labeled CTRL and TEST]
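The ratio step for the two scans (control vs. experimental) might be sketched as below. The slides only say that ratios are calculated per spot. The use of log2, the intensity floor, and the sample numbers are assumptions made for illustration.

```python
import math


def log2_ratios(test, ctrl, floor=1.0):
    """Per-spot log2(test / control).

    Intensities are floored (an assumed safeguard) so that a zero
    reading does not cause a division by zero or log of zero.
    """
    return [
        math.log2(max(t, floor) / max(c, floor))
        for t, c in zip(test, ctrl)
    ]


# Hypothetical spot intensities from the two scans of one slide.
ctrl = [500.0, 1200.0, 800.0]
test = [1000.0, 1200.0, 200.0]

for ratio in log2_ratios(test, ctrl):
    print(ratio)  # 1.0 (up 2x), 0.0 (unchanged), -2.0 (down 4x)
```

On a log2 scale a two-fold increase and a two-fold decrease are symmetric (+1 and -1), which is one common reason to report log ratios instead of raw ratios.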
cDNA Array Experiment Movie
http://www.bio.davidson.edu/courses/genomics/chip/chip.html
“Long Oligos”
Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
Relies on genome sequence database and bioinformatics
Reduces cross hybridization
Cheaper and possibly more sensitive than Affy. system
Affymetrix
Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
cRNA labeled and scanned in a single “color”
one sample per chip
Can have as many as 47,000 probes on a chip (HG-U133 Plus 2.0 Array)
Arrays get smaller every year (more genes)
Chips are expensive (about $400/chip)
Proprietary system: “black box” software, can only use their chips
Affymetrix Genome Arrays
Affymetrix GeneChip® Probe Arrays
[Figure: GeneChip Probe Array (1.28 cm) with features of 24~50 µm; image of hybridized probe array (BGT108_DukeUniv); hybridized probe cell showing the oligonucleotide probe and the single stranded, fluorescently labeled cRNA target]
Each probe cell or feature contains millions of copies of a specific oligonucleotide probe
Affymetrix GeneChip
Probe: 25 bases long single stranded DNA oligos
Probe Cell: Single square-shaped feature on an array containing one type of probe. Contains millions of probe molecules
Probe Pair: Perfect Match/Mismatch
Probe Set
Array Design
Twenty oligo probes are selected from the last 600 bases from the 3’ end of the gene
For each probe selected (Perfect Match, a 25 mer DNA oligo), a partner containing a central mutation is also made (Mismatch)
For each gene a total of 20 probe pairs are arrayed on the chip
[Figure: gene drawn 5’ to 3’; a Probe Set is made of Probe Pairs; each Probe Pair has a PM and an MM Probe Cell, 24 µm on a side]
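The probe-pair design above can be illustrated with a small sketch. Several things here are assumptions not stated on the slide: the "central mutation" is modeled as complementing the middle (13th) base of the 25-mer, the probes are taken at evenly spaced offsets, which stands in for real probe-selection rules, and the probes are written in the same sense as the target for simplicity. The names `mismatch_probe` and `probe_pairs` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Copy the perfect-match probe and complement its central base."""
    mid = len(pm) // 2  # index 12, the 13th base of a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def probe_pairs(target: str, n_probes: int = 20, length: int = 25,
                window: int = 600):
    """Return (PM, MM) pairs drawn from the last `window` bases.

    Evenly spaced start positions are an illustrative simplification.
    """
    region = target[-window:]
    step = max(1, (len(region) - length) // max(n_probes - 1, 1))
    pairs = []
    for i in range(n_probes):
        pm = region[i * step:i * step + length]
        pairs.append((pm, mismatch_probe(pm)))
    return pairs


# Hypothetical 800-base transcript, used only to exercise the functions.
transcript = "ACGT" * 200
pairs = probe_pairs(transcript)
print(len(pairs))  # 20 probe pairs
print(pairs[0])    # PM and MM differ only at the central base
```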
Probe Sub-types on chips
Known genes
Specific transcripts
Exemplars
Consensus
Housekeeping genes
Expressed sequence tags (ESTs)
Spiked control transcripts
cRNA preparation
Total RNA (5-8 µg), with poly(A) tail: AAAAAAAAA
cDNA Strand 1 synthesis: SS II reverse transcriptase, primed from a TTTTTTTTT primer carrying the T7 RNA pol. promoter
cDNA Strand 2 synthesis: E. coli DNA pol. I
IVT cRNA synthesis (T7 RNA pol.) amplifies and labels transcripts with Biotin
Fragmented cRNA
cRNA is now ready for hybridization to test chip
[Figure: biotin-labeled (B) cRNA targets hybridizing to the cDNA probes, showing Non-Specific Binding and Specific Binding; post hybridization washes; Streptavidin (SFL) attached to the biotin labels]
Microarray experiment
Cells
Poly (A)+ RNA
cDNA
IVT (B-UTP)
Biotin-Labeled cRNA transcript
Fragment (heat, Mg2+)
Biotin-Labeled cRNA fragments
Hybridize (1-18 hours)
Wash, Stain
Scan
The chip image data file (or “.dat” file) is the first part of data acquisition and appears on the computer screen upon completion of the laser scan.
Here, we zoom in to see an individual probe set that has been highlighted.
[Figure: .dat file with the highlighted probe set]
The first image is “sample1.dat”. Note the pixel-to-pixel variation within a probe cell.
A “*.cel” file is automatically generated when the “*.dat” image first appears on the screen. Note that this derivative file has homogeneous signal intensity within its probe cells.
[Figure: .cel file]
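Going from the pixel-level “.dat” image to one intensity per probe cell in the “.cel” file amounts to collapsing each cell's pixels into a single number. The sketch below shows one way this could be done. Dropping the outer ring of pixels and taking the 75th percentile of the rest are assumptions for illustration, not a statement of what the scanner software does, and the pixel values are invented.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def cell_intensity(pixels, q=75, border=1):
    """Collapse one probe cell's 2-D block of pixels into one value.

    `border` pixels are trimmed from every edge (assumed, to avoid
    bleed-over from neighboring cells) before taking the percentile.
    """
    rows = pixels[border:len(pixels) - border]
    inner = [p for row in rows for p in row[border:len(row) - border]]
    return percentile(inner, q)


# Hypothetical 4x4 pixel block for a single probe cell.
cell = [
    [10, 12, 11, 9],
    [11, 40, 44, 10],
    [12, 42, 48, 11],
    [9, 10, 12, 10],
]
print(cell_intensity(cell))  # 44
```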
Affymetrix Algorithms
1. Signal
1.1 Adjusting MMs to purge negative values
All MMs < PMs: no adjustment necessary
Few MMs > PMs: change MMs based on weighted mean of other MMs
Most MMs > PMs: change MMs to be slightly less than PM
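The three MM-adjustment cases can be sketched in code as follows. This is a simplified illustration, not the vendor's implementation. The rule is applied per probe pair, the probe set's typical log2(PM/MM) is taken as a plain median, not a weighted mean, and the constants `contrast_tau` and `scale_tau` are assumed values. The function name `adjust_mm` is hypothetical.

```python
import math
from statistics import median


def adjust_mm(pm, mm, contrast_tau=0.03, scale_tau=10.0):
    """Return adjusted MM values that are always below their PM."""
    # Typical log2(PM/MM) across the probe set (median as a robust average).
    typical = median(math.log2(p / m) for p, m in zip(pm, mm))
    adjusted = []
    for p, m in zip(pm, mm):
        if m < p:
            # MM already below PM: no adjustment necessary.
            adjusted.append(float(m))
        elif typical > contrast_tau:
            # Only a few MMs exceed their PM: borrow the typical ratio
            # seen in the other probe pairs.
            adjusted.append(p / 2 ** typical)
        else:
            # Most MMs exceed their PM: set MM slightly below PM.
            shrink = contrast_tau / (1 + (contrast_tau - typical) / scale_tau)
            adjusted.append(p / 2 ** shrink)
    return adjusted


# Hypothetical probe set in which one MM exceeds its PM.
pm = [1000, 5000, 430, 300]
mm = [900, 2000, 230, 350]
print(adjust_mm(pm, mm))
```

After this step every PM minus adjusted MM is positive, which is what "purge negative values" refers to.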
Affymetrix Algorithms: Signal Calculation
Having adjusted the MM values, we now calculate the signal.
The PM values, the MM values, and the calculated PM-MM values:
PM      1000  5000  430  765  355  98  3005  413  20333  590
MM       900  2000  230   25  331  40  1200  203   6197  230
PM-MM    100  3000  200  740   24  58  1805  210  14136  360
Unweighted mean = 2063
Using Tukey’s biweight, mean = 1780
Signal (expression level) = 1780
[Figure: weight factor (up to 1) plotted against standard deviations from the mean (1 to 6)]
The unweighted mean is vulnerable to outlier data. In order to protect against this, we dampen the effect of outliers by using the Tukey bi-weight mean. PM-MM values that are a number of standard deviations away from the mean are given low weights in accordance with the graph shown here. Individual PM-MM data are multiplied by the weight factor before calculation of the mean. The weighted mean is then called the “signal.”
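The effect of down-weighting outliers can be tried on the PM and MM values from this slide. The sketch below uses a common one-step Tukey biweight formulation based on the median and the median absolute deviation (MAD), with tuning constant `c = 5`. That differs from the standard-deviation weighting curve drawn on the slide, so it is not expected to reproduce the slide's value of 1780. It only shows how a far-out value such as 14136 can be given zero weight.

```python
from statistics import mean, median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight; returns (robust mean, per-value weights)."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    weights = []
    for v in values:
        u = (v - center) / (c * mad + eps)  # scaled distance from center
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    robust = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return robust, weights


pm = [1000, 5000, 430, 765, 355, 98, 3005, 413, 20333, 590]
mm = [900, 2000, 230, 25, 331, 40, 1200, 203, 6197, 230]
diffs = [p - m for p, m in zip(pm, mm)]

print(round(mean(diffs)))  # 2063, the unweighted mean from the slide
robust, weights = tukey_biweight(diffs)
print(robust < mean(diffs))  # True: the outliers no longer dominate
print(weights[8])            # 0.0, the weight given to 14136
```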
.xls file
ALL_vs_AML_train_set_38_sorted.res
ALL_vs_AML_train_set_38_sorted.cls
38 2 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
(27 samples in class 0, 11 samples in class 1)
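A class-label file like the “.cls” file shown here can be read with a short parser. This is a minimal sketch that assumes the layout visible on the slide: a header line with the number of samples, the number of classes and a trailing 1, followed by one integer label per sample. Skipping lines that start with `#` is an added assumption, and `parse_cls` is a hypothetical name.

```python
from collections import Counter


def parse_cls(text):
    """Parse class labels; returns (labels, counts per class)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_samples, n_classes, _ = (int(x) for x in lines[0].split())
    labels = [
        int(tok)
        for ln in lines[1:]
        if not ln.startswith("#")  # assumed: optional comment lines
        for tok in ln.split()
    ]
    counts = Counter(labels)
    if len(labels) != n_samples or len(counts) != n_classes:
        raise ValueError("header does not match the label line")
    return labels, counts


# Same content as the .cls file on the slide: 27 zeros, then 11 ones.
example = "38 2 1\n" + " ".join(["0"] * 27 + ["1"] * 11) + "\n"
labels, counts = parse_cls(example)
print(counts[0], counts[1])  # 27 11
```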