Post on 08-Oct-2020

Predicting Microarray Signals by Physical Modeling

Josh Deutsch

University of California

Santa Cruz

Predicting Microarray Signals by Physical Modeling – p.1/39

Collaborators

Shoudan Liang

NASA Ames

Onuttom Narayan

UCSC

Outline

Background
    Gene transcription and regulation
    What are microarrays
    Example application: Cancer diagnosis

The problem: how to improve sensitivity?

Microarrays in detail
    Physical constraints on how well they can work
    Different approaches to signal extraction

Physical model of the system
    Different types of binding
    Experimental comparison
    Comparison with other approaches

Conclusions

Gene transcription

[Figure: transcription factors and RNA polymerase on double-stranded DNA (ATCATGTACGTA / TAGTACATGCAT); the pre-mRNA transcript (AUCAUGUACGUA) is processed by the spliceosome, which removes junk to give mRNA.]

DNA → messenger RNA (mRNA)

DNA is read by RNA polymerase

Producing pre-mRNA

Which is spliced to get rid of junk

Gene transcription

[Figure: a ribosome translating mRNA into protein.]

The ribosome translates every three bases (one codon) into one amino acid, e.g. CAU → Histidine.

(Miller, Hamkalo, and Thomas)

Gene regulation

Sequence specific transcription factors

[Figure: transcription factors and RNA polymerase bound to DNA at the promoter site (TATAAAA).]

Many genes and cell types use the same proteins to regulate transcription. A specific combination of these is probably needed to turn on a gene.

Genetic networks

The state of the cell can be understood from the way different proteins regulate each other and respond to external inputs.

What are microarrays?

[Figure: array regions labeled Gene 1, Gene 2, Gene 3, and Gene 4.]

Fabricate a chip by synthesizing 25 base pair oligomers attached to a substrate.

Linkage to substrate

E. Southern, K. Mir, M. Shchepinov Nature Genetics (1999).

The oligomer probes are attached to the surface via polymeric linker molecules. Their density is about 1 probe every 40 square angstroms.

Sample preparation

Extract mRNA from cell samples

mRNA → cDNA (via reverse transcriptase)

Hybridization to array

Add fluorescent tags to targets:

Then hybridize to probes:

Linkage to substrate

E. Southern, K. Mir, M. Shchepinov Nature Genetics (1999).

In many experiments, target molecules can be longer or shorter than probe molecules. When they’re longer, secondary structure formation can also occur.

A real microarray

From Liming Shi, gene-chips.com

Affymetrix gene-chip

K.A. Baggerly (2002)

This is a 640 × 640 cell U95Av2 chip. Note the vertical bands, which are evidence of misalignment with the scanner.

Application: Cancer diagnosis

Microarrays have been used to differentially diagnose different kinds of cancer, for example:

Diffuse large B-cell Lymphoma

Ovarian Cancer

Leukemia

Breast Cancer

Small Round Blue Cell Tumors

Often it is hard to distinguish different sub-types of cancer by other means. These distinctions are very important because they often determine the course of treatment.

Small Round Blue Cell Tumors

Recent work by Khan et al. has used microarrays to diagnose four types of pediatric tumors, the Small Round Blue Cell Tumors (SRBCT):

neuroblastoma

Ewing family tumors

Rhabdomyosarcoma

non-Hodgkin lymphoma

Using 63 samples for training and 20 for testing, Khan et al. were able to correctly predict all data using only 96 genes from a pool of many thousands.

Prediction using minimal gene set

[Figure: prediction results per sample (sample indices 0 to 62), grouped by class: EWS, BL, NB, RMS.]

Results for the prediction of samples for different kinds of SRBCT, using GESSES (J.M. Deutsch, Bioinformatics (2003)).

High sensitivity needed

Many genes critical to the function of a cell are only present in minute concentrations, for example in the initial stages of a regulatory cascade.

With many orders of magnitude of concentrations present for different kinds of mRNA, this pushes the limits of microarray technology.

[Figure: input and output level versus index (0 to 250), logarithmic level axis from 0.1 to 100000.]

Data from “1532 Series” Affymetrix spike-in human gene experiments.

Affymetrix Latin Square Experiments

Rows: GeneChip® experiment number. Columns: groups of transcripts (A to N). Entries: concentration in pM.

Exp.     A     B     C     D     E     F     G     H     I     J     K     L     M     N
   1     0  0.25   0.5     1     2     4     8    16    32    64   128   256   512  1024
   2  0.25   0.5     1     2     4     8    16    32    64   128   256   512  1024     0
   3   0.5     1     2     4     8    16    32    64   128   256   512  1024     0  0.25
   4     1     2     4     8    16    32    64   128   256   512  1024     0  0.25   0.5
   5     2     4     8    16    32    64   128   256   512  1024     0  0.25   0.5     1
   6     4     8    16    32    64   128   256   512  1024     0  0.25   0.5     1     2
   7     8    16    32    64   128   256   512  1024     0  0.25   0.5     1     2     4
   8    16    32    64   128   256   512  1024     0  0.25   0.5     1     2     4     8
   9    32    64   128   256   512  1024     0  0.25   0.5     1     2     4     8    16
  10    64   128   256   512  1024     0  0.25   0.5     1     2     4     8    16    32
  11   128   256   512  1024     0  0.25   0.5     1     2     4     8    16    32    64
  12   256   512  1024     0  0.25   0.5     1     2     4     8    16    32    64   128
  13   512  1024     0  0.25   0.5     1     2     4     8    16    32    64   128   256
  14  1024     0  0.25   0.5     1     2     4     8    16    32    64   128   256   512
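The Latin square design on this slide is a cyclic shift of one list of 14 concentration levels, so that every transcript group sees every concentration exactly once across the 14 experiments. A minimal Python sketch that regenerates the layout is given here; the names `latin_square`, `LEVELS_PM`, and `design` are illustrative and not part of any Affymetrix tool.

```python
def latin_square(levels):
    """Return a cyclic Latin square: row r is `levels` shifted left by r places."""
    n = len(levels)
    return [[levels[(r + c) % n] for c in range(n)] for r in range(n)]


# Concentration levels in pM, copied from the first row of the slide's table.
LEVELS_PM = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# design[r][c] is the concentration of transcript group c (A..N)
# in experiment r + 1.
design = latin_square(LEVELS_PM)

for exp_number, row in enumerate(design, start=1):
    print(exp_number, *row)
```

Because each row is a shift of the previous one, each column also contains every level once, which is what lets probe-specific response be separated from concentration.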

Is this variation all noise?

This variation is reproducible:

[Figure: level versus index (0 to 250), logarithmic level axis from 0.1 to 100000; legend: input, output, output.]

It is likely due to the physicochemical differences between target and probe molecules.

The output measures the amount of binding.

In the case of these Affymetrix arrays, the hybridization is between cRNA targets in solution and DNA probes.

Clearly not all target molecules are binding to the probes.

Why not design it to be less variable?

It is hard to find optimal conditions for a microarray to work. An important parameter is the melting temperature for DNA/RNA hybridization of a probe and complementary target molecule.

At temperatures << melting temperature, the time scale for relaxation becomes very long. The system becomes too sticky and irreversible. (Oligomers are 25 base pairs.)

At temperatures >> melting temperature, nothing hybridizes.

Therefore it must operate just under a typical melting temperature.

The melting temperature depends on the probe’s chemical sequence.

Because the melting temperature varies from probe to probe, probes with lower-than-average melting temperatures will show less hybridization.

Affymetrix’s approach

mRNA sequence:

...ATTCTCAGGATACTGCCCGTTACTGTCTATGGTCGGCTTCGAATGATACTCTCGTATATCGATCGGCTTATACGCGATTATACGC...

Perfect Match: TACTGTCTATGGTCGGCTTCGAATG

Mismatch:      TACTGTCTATGGACGGCTTCGAATG

[Figure: probe intensities for the perfect match and mismatch probes.]
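The two 25-base probes shown on this slide differ only at the 13th (middle) base, where T is replaced by its complement A. The sketch below reproduces that pair in Python. Treating the mismatch as "complement the middle base" is an assumption read off this one example, and `make_mismatch` is a hypothetical helper name, not an Affymetrix function.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def make_mismatch(perfect_match: str) -> str:
    """Return the probe with its middle base replaced by the complementary base.

    Assumption: the mismatch probe differs from the perfect match only at the
    central position (index 12 of a 25-mer), as in the slide's example.
    """
    middle = len(perfect_match) // 2
    return (
        perfect_match[:middle]
        + COMPLEMENT[perfect_match[middle]]
        + perfect_match[middle + 1:]
    )


# The perfect match sequence as it appears inside the mRNA sequence on the slide.
pm = "TACTGTCTATGGTCGGCTTCGAATG"
mm = make_mismatch(pm)
print(pm)
print(mm)
```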

Affymetrix uses statistics

Include probes that have one base altered so that they do not hybridize as readily to the target sequence.

Make many probes (typically 16) per mRNA, to average out over this variation as follows:

    Affymetrix MAS5.0 uses a "one-step Tukey’s biweight estimate".
    It subtracts off the mismatch signal from the perfect match in cases where the perfect match is larger.

Do statistics on these numbers to try to obtain a good estimate for the input target concentration.

This approach ignores the fact that these variations are largely reproducible and depend on the sequence of the probes.
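To make the statistical approach concrete, here is a minimal Python sketch of a one-step Tukey biweight average applied to perfect match minus mismatch differences. It is a simplified illustration, not the MAS5.0 implementation: the tuning constants `c=5` and `eps=1e-4`, the use of log2, and the choice to simply skip probe pairs where the mismatch is larger are all assumptions of this sketch, and the function names are hypothetical.

```python
import math
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers.

    Each value is weighted by (1 - u^2)^2, where u is its distance from the
    median in units of c * MAD; values with |u| >= 1 get zero weight.
    """
    values = list(values)
    center = median(values)
    mad = median([abs(v - center) for v in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - center) / (c * mad + eps)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


def probe_set_signal(perfect_match, mismatch):
    """Robust average of log2(PM - MM) over probe pairs with PM > MM."""
    logs = [math.log2(p - m) for p, m in zip(perfect_match, mismatch) if p > m]
    if not logs:
        raise ValueError("no probe pair has PM larger than MM")
    return 2.0 ** tukey_biweight(logs)


# Toy intensities (made up for illustration); the third pair has MM > PM.
signal = probe_set_signal([110.0, 210.0, 50.0], [10.0, 10.0, 60.0])
print(signal)
```

Note that nothing in this estimate uses the probe sequences, which is the limitation the slide points out.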

Using sequence information

Zhang, Miles, and Aldape (Nature Biotechnology 2003) invented a model to try to predict hybridization intensities using sequence information.

The model they use postulates an "energy" of binding that depends on position along the sequence. One expects the ends to be less tightly bound than the middle. There are two kinds of energy:

    Non-specific term that gives the baseline binding when the matching target is not present.
    Specific term, in which the measured signal depends linearly on the input target concentration.

Minimizing over (16*2 + 24*2 + 3) = 83 parameters, they obtain results far better than the MAS5.0 approach of Affymetrix.

But does this model capture the physics?

Important physical effects

[Figure: schematic of target-target binding and nonspecific binding.]

Nonspecific binding and target-target binding are important effects to consider.

Evidence for target target binding

[Figure: output (0 to 20000) versus input (0 to 1200), linear axes.]

Probes asymptote at different maximum values, often correlating with their sequences being similar.

Evidence for nonspecific binding

[Figure: output (50 to 350) versus input (0 to 1), linear axes.]

When the input target concentration is zero or small, there is still a substantial signal from the corresponding probes.

Partial zippering

Because the binding energy is strongly dependent on base pair composition near the melting temperature, we expect that molecules are only partially bound at any one time.

For a segment bound from monomer m to monomer n, the free energy of the nearest neighbor model is:

$$ \Delta G_{mn} = \sum_{i=m}^{n-1} \varepsilon(i, i+1) + \varepsilon_{\text{initiation}} $$
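The nearest neighbor free energy of a bound segment can be sketched in a few lines of Python. The energy table `EPS` and the value `EPS_INITIATION` below are hypothetical placeholders chosen only so the example runs (GC-rich pairs are made slightly more favorable); they are not measured or fitted values from the talk. Monomers are numbered from 1, with m < n.

```python
from itertools import product

BASES = "ACGT"

# HYPOTHETICAL stacking energies eps(b1, b2), in arbitrary units.
# Each neighboring pair contributes -1.0, plus -0.5 for every G or C in it.
EPS = {
    (a, b): -1.0 - 0.5 * ((a in "GC") + (b in "GC"))
    for a, b in product(BASES, repeat=2)
}

# HYPOTHETICAL initiation penalty, same arbitrary units.
EPS_INITIATION = 2.0


def delta_g(seq, m, n, eps=EPS, eps_initiation=EPS_INITIATION):
    """Free energy of a segment bound from monomer m to monomer n (1-based).

    Sums eps(i, i+1) for i = m .. n-1 and adds the initiation term.
    """
    if not 1 <= m < n <= len(seq):
        raise ValueError("need 1 <= m < n <= len(seq)")
    stacking = sum(eps[(seq[i - 1], seq[i])] for i in range(m, n))
    return stacking + eps_initiation


print(delta_g("GCAT", 1, 2))  # one G-C stacking term plus initiation
print(delta_g("GCAT", 1, 4))  # fully zippered 4-mer
```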

Partition function

The partition function for a target molecule hybridized to a probe, Z = exp(−β∆G), gives the total free energy for binding. It sums over all possible zippered states starting at m and ending at n. For a probe with a total of N monomers:

$$ Z = \sum_{n=2}^{N} \sum_{m<n} e^{-\beta \Delta G_{mn}} = \sum_{n=2}^{N} \sum_{m<n} e^{-\beta\left(\sum_{i=m}^{n-1} \varepsilon(i,i+1) + \varepsilon_{\text{initiation}}\right)} $$

The first expression is the sum over all begin and end points; the second uses the nearest neighbor model.

This depends strongly on the temperature and the probe sequence.

It can be efficiently computed in O(N) operations using a recursion relation for each probe considered.
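One recursion that achieves O(N) is sketched below in Python; the slides do not spell out the recursion actually used, so this is an assumption. Let A_n be the summed Boltzmann weight of all segments that end at monomer n. Extending every such segment by one monomer, or starting a new segment at the previous monomer, gives A_n = w_{n-1} (1 + A_{n-1}) with w_i = exp(−β ε(i, i+1)), and Z is exp(−β ε_initiation) times the sum of the A_n. A brute-force double sum is included only as a check, and the energies in `eps_example` are made up.

```python
import math


def partition_function(eps, eps_initiation, beta):
    """O(N) evaluation of Z for a probe with N = len(eps) + 1 monomers.

    eps[k] is the nearest neighbor energy between monomers k+1 and k+2.
    """
    ending_here = 0.0  # A_n: total weight of segments ending at this monomer
    total = 0.0
    for e in eps:
        ending_here = math.exp(-beta * e) * (1.0 + ending_here)
        total += ending_here
    return math.exp(-beta * eps_initiation) * total


def partition_function_brute(eps, eps_initiation, beta):
    """Direct sum over all begin and end points, for checking only."""
    total = 0.0
    for start in range(len(eps)):
        for stop in range(start + 1, len(eps) + 1):
            energy = sum(eps[start:stop]) + eps_initiation
            total += math.exp(-beta * energy)
    return total


def binding_free_energy(z, beta):
    """Invert Z = exp(-beta * dG) to get the total free energy of binding."""
    return -math.log(z) / beta


eps_example = [-1.0, -1.5, -0.5, -2.0]  # made-up energies for a 5-mer
z_fast = partition_function(eps_example, 2.0, 0.7)
z_slow = partition_function_brute(eps_example, 2.0, 0.7)
print(z_fast, z_slow, binding_free_energy(z_fast, 0.7))
```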

Fraction bound

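Z (or ∆G) and the chemical potential give the affinity for binding. That is, the fraction f of bound probes is

$$ f = \frac{1}{1 + e^{\beta(\Delta G - \mu)}} $$

[Figure: free energy difference ∆G between the probe-bound state and the solution.]

The chemical potential

$$ \mu = \text{const} + \log(\text{concentration})/\beta $$

is determined by the requirement that

$$ \mu_{\text{solution}} = \mu_{\text{probes}} $$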
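Substituting the chemical potential into the expression for f gives a saturating (Langmuir-type) dependence on concentration, f = Kc / (1 + Kc) with K = exp(β(const − ∆G)). The Python sketch below checks that numerically; the numerical values of `beta`, `delta_g_example`, and the concentration are arbitrary illustrative choices, and the additive constant in µ is set to zero.

```python
import math


def chemical_potential(concentration, beta, const=0.0):
    """mu = const + log(concentration) / beta."""
    return const + math.log(concentration) / beta


def fraction_bound(delta_g, mu, beta):
    """Fraction of bound probes, f = 1 / (1 + exp(beta * (delta_g - mu)))."""
    return 1.0 / (1.0 + math.exp(beta * (delta_g - mu)))


beta = 1.5               # arbitrary inverse temperature
delta_g_example = -3.0   # arbitrary binding free energy
conc = 0.2               # arbitrary target concentration

f = fraction_bound(delta_g_example, chemical_potential(conc, beta), beta)

# Same quantity written in Langmuir form.
k_affinity = math.exp(-beta * delta_g_example)
f_langmuir = k_affinity * conc / (1.0 + k_affinity * conc)
print(f, f_langmuir)
```

The saturation of f at high concentration is the nonlinearity that the model has to account for when relating intensity to input concentration.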

Nonspecific free energy

A probe can still bind, though more weakly, to target sequences that have mismatches. The above method of calculation still applies.

However, we don’t know the concentrations of the target solution because Affymetrix has added uncharacterized concentrations of human RNA as a background.

If the binding is weak, we can replace Z with an annealed average over different sequences. This leads to an effective nearest neighbor model for this non-specific binding:

$$ \Delta G^{NS}_{mn} = \sum_{i=m}^{n-1} \varepsilon^{NS}(i, i+1) + \varepsilon^{NS}_{\text{initiation}} $$

Here the ε^NS’s have to be empirically determined.

Target-target binding

For similar reasons we don’t know the concentration of targets. Therefore we replace those interactions with an effective annealed model.

$$ \Delta G^{TT}_{mn} = \sum_{i=m}^{n-1} \varepsilon^{TT}(i, i+1) + \varepsilon^{TT}_{\text{initiation}} $$

The full model

Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are:

Energies in the nearest neighbor model ε(b1, b2), e.g. ε(G, T). There are 16 possibilities and 3 sets of terms: specific, non-specific, and target-target.

3 initiation factors.

Small additive background constant.

Number of probe molecules.

Proportionality factor between binding and light intensity.

A total of 54 parameters (3 × 16 + 3 + 3), to fit 2464 data points from the Latin Square experiments.

Parameter fitting

We minimize the fitness function:

$$ I = \sum_{\text{all data}} \left( \log(\text{predicted conc.}) - \log(\text{observed conc.}) \right)^2 $$

with respect to all parameters.

This is hard because there are many minima in parameter space. This can be solved using simulated annealing Monte Carlo and parallel tempering.
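The fitness function is a sum of squared differences of logarithms, so it penalizes relative rather than absolute errors. A minimal Python sketch follows; the use of the natural logarithm and the absence of any normalization by the number of data points are assumptions, since the slide does not specify either, and the sample numbers are invented.

```python
import math


def fitness(predicted, observed):
    """Sum over all data of (log(predicted) - log(observed))^2."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum(
        (math.log(p) - math.log(o)) ** 2
        for p, o in zip(predicted, observed)
    )


# Invented values: a factor-of-two error costs the same at any scale.
small_scale = fitness([2.0], [1.0])
large_scale = fitness([2000.0], [1000.0])
print(small_scale, large_scale)
```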

Parallel tempering

[Figure: replicas 1, 2, 3, 4, ..., N−3, N−2, N−1, N laid out along a temperature axis T.]

Simulate N copies of a system at temperatures T1, T2, ..., TN.

Exchange neighboring system configurations depending on the relative energies of the systems ∆E and the temperature difference ∆β, with probability

$$p = \begin{cases} \exp(\Delta\beta\,\Delta E) & \text{for } \Delta\beta\,\Delta E < 0, \\ 1 & \text{otherwise.} \end{cases}$$

It is very efficient at finding low-energy states of systems with many degrees of freedom.
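The exchange rule can be sketched as below, assuming ∆β and ∆E are differences taken in the same order between the two neighboring replicas, with β = 1/T. The toy energy `rugged`, the temperature ladder, and the function names are hypothetical choices for illustration only.

```python
import math
import random

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """p = exp(dBeta * dE) if dBeta * dE < 0, else 1."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return math.exp(delta) if delta < 0 else 1.0

def parallel_tempering(energy, x0, temperatures, n_sweeps=2000, step=0.5, seed=0):
    """Run one Metropolis chain per temperature and swap neighboring replicas."""
    rng = random.Random(seed)
    betas = [1.0 / t for t in temperatures]
    states = [x0 for _ in temperatures]
    energies = [energy(x0) for _ in temperatures]
    best_x, best_e = x0, energies[0]
    for _ in range(n_sweeps):
        # Metropolis update of each replica at its own temperature.
        for k, beta in enumerate(betas):
            trial = states[k] + rng.gauss(0.0, step)
            e_trial = energy(trial)
            d_e = e_trial - energies[k]
            if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                states[k], energies[k] = trial, e_trial
                if e_trial < best_e:
                    best_x, best_e = trial, e_trial
        # Attempt to exchange the configurations of neighboring replicas.
        for k in range(len(betas) - 1):
            p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return best_x, best_e

def rugged(x):
    """Toy one-dimensional energy with many local minima (illustrative only)."""
    return 0.1 * x * x + math.sin(5.0 * x) + 1.0

x_start = 3.0
best_x, best_e = parallel_tempering(rugged, x_start, [0.05, 0.2, 0.8, 3.0])
print(best_x, best_e)
```

The hot replicas cross barriers easily, and the swaps pass promising configurations down to the cold replicas, which refine them.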

Results

$I_{\min} = 0.188$

[Figure: level (log scale, 0.1 to 100000) versus index (0 to 180); curves labeled real, pred, and input.]

Leave out one

Predict first transcript (16 probes) training on all the other data.

[Figure: level (log scale, 100 to 100000) versus index (0 to 16); curves labeled real and pred.]
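The leave-out-one split itself is simple to sketch: hold back every probe of one transcript, fit on the rest, then predict the held-out probes. The data layout (a dictionary from transcript name to probe values), the transcript names, and the numbers below are placeholders, not data from the talk.

```python
def leave_one_out(data_by_transcript, held_out):
    """Split the data into a training set and the probes of one held-out transcript."""
    train = {name: probes for name, probes in data_by_transcript.items()
             if name != held_out}
    return train, data_by_transcript[held_out]

# Placeholder data: three hypothetical transcripts with 16 probe values each.
toy_data = {name: [100.0 * (i + 1) for i in range(16)] for name in ("t1", "t2", "t3")}

train, test = leave_one_out(toy_data, "t1")
print(sorted(train))   # ['t2', 't3']
print(len(test))       # 16
```

Because the held-out transcript never enters the fit, agreement between predicted and real levels on it tests the model rather than its ability to memorize the training data.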


Partial binding

The ends of the hybridized targets tend to be frayed.

The fraction of the time a nucleotide is unbound depends on the sequence.

[Figure: fraction unbound (0 to 1) versus position (5 to 20).]
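One simple way to see why the ends fray is a zipper-style toy model, sketched here under strong assumptions that are not taken from the talk: the bound region is a single contiguous segment, its Boltzmann weight is set by the stacking energies inside it, and the target is bound somewhere. The function name, the uniform placeholder energies, and β = 1 are all hypothetical.

```python
import math

def fraction_unbound(stack_energies, beta=1.0):
    """Per-nucleotide unbound fraction when the bound region is one contiguous segment.

    stack_energies[k] is the (placeholder) energy of the stack between
    nucleotides k and k+1; a segment [i, j] has weight exp(-beta * E(i, j)).
    """
    n = len(stack_energies) + 1
    # Prefix sums, so that the energy of segment [i, j] is prefix[j] - prefix[i].
    prefix = [0.0]
    for e in stack_energies:
        prefix.append(prefix[-1] + e)
    z = 0.0
    bound_weight = [0.0] * n
    for i in range(n):
        for j in range(i, n):
            w = math.exp(-beta * (prefix[j] - prefix[i]))
            z += w
            for k in range(i, j + 1):
                bound_weight[k] += w
    return [1.0 - bw / z for bw in bound_weight]

# 25 nucleotides with identical placeholder stacking energies.
profile = fraction_unbound([-1.0] * 24)
print(profile[0], profile[12], profile[-1])
```

Even with identical energies the end positions are unbound more often than the middle, because fewer bound segments cover them; sequence-dependent energies then shape the profile further.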

Conclusions

The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve

Partially zippered RNA targets and DNA probes.

Non-specific binding of other target molecules.

Binding of target molecules to each other.

The nonlinearity (saturation) of binding.

This understanding leads to a model that predicts mRNA levels well.

It is hoped that this model will be turned into software that will benefit researchers wanting a more accurate determination of mRNA levels in their experiments.
