Date post: 28-Mar-2015
Computational Molecular Biology
Biochem 218 – BioMedical Informatics 231
http://biochem218.stanford.edu/

Discovering Transcription Factor Binding Sites in Co-Regulated Genes

Doug Brutlag, Professor Emeritus
Biochemistry & Medicine (by courtesy)

Motivation

Searching for conserved sequence motifs that regulate gene expression

Microarray analysis of whole-genome gene expression

Clustering of genes based on their expression patterns


Megacluster of Yeast Gene Expression

Human Gene Expression Signatures

[Figure: clustered expression signatures. Panel labels: T Cells Signaling; DNA Damage; Fibroblast Stimulation; B Cells Signaling; CMV Infection; Anoxia; Polio Infection; Monocytes Signaling IL4; Hormone]

Finding Transcription Factor Binding Sites

Upstream Regions of Co-expressed Genes

GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGA...AGAATGACTGGC

Figure labels: Pho 5, Pho 8, Pho 81, Pho 84, Pho …; Transcription Start

Finding Transcription Factor Binding Sites

Upstream Regions of Co-expressed Genes

GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT

Finding Transcription Factor Binding Sites

Upstream Regions of Co-expressed Genes

ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT

Figure label: Pho4 binding
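
As a rough illustration of why these upstream regions suggest a shared site, the sketch below looks for k-mers that occur in every sequence. It is not one of the algorithms from the lecture, only a naive baseline; the function names are hypothetical, and only the left-hand parts of the slide sequences are used because the rest is elided with "...". With the first version of the sequences the 6-mer CACGTG is shared by all five; with the later version, where two sites read CACGTT, no exact 6-mer survives at that site, which is the kind of variation the weight-matrix methods are meant to handle.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs, k):
    """Return the k-mers that occur in every sequence."""
    common = kmers(seqs[0], k)
    for seq in seqs[1:]:
        common &= kmers(seq, k)
    return common


# Left-hand parts of the upstream regions from the slides; everything after
# the "..." is elided on the slides and is left out here.
exact_sites = [
    "GATGGCTGCACCACGTGTATGC",
    "CACATCGCATCACGTGACCAGT",
    "GCCTCGCACGTGGTGGTACAGT",
    "TCTCGTTAGGACCATCACGTGA",
    "CGCTAGCCCACGTGGATCTTGT",
]
variant_sites = [
    "ATGGCTGCACCACGTTTATGC",
    "CACATCGCATCACGTGACCAGT",
    "GCCTCGCACGTGGTGGTACAGT",
    "TTAGGACCATCACGTGA",
    "CGCTAGCCCACGTTGATCTTGT",
]

print(sorted(shared_kmers(exact_sites, 6)))
print(sorted(shared_kmers(variant_sites, 6)))
print(sorted(shared_kmers(variant_sites, 5)))
```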

Three Algorithms

• BioProspector
   o Presented in 2000
   o Extends Gibbs sampling (stochastic method)
   o For any cluster of sequences
• MDscan
   o Deterministic approach
   o Enumerative
   o Very fast
   o For sequences with some ranking information
• MotifCut and MotifScan
   o Graph-based
   o Does not use PSSMs
   o Novel and sensitive

Representing Ambiguous DNA Motifs

• Sequence Patterns (Regular expressions)
• IUPAC nomenclature for DNA ambiguities

Consensus motif: CACAAAA
Degenerate motif: CRCAAAW (R = A/G, W = A/T)
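
The link between the two representations can be sketched in a few lines: each IUPAC ambiguity code expands to a regular-expression character class. The table below uses the standard IUPAC nucleotide codes; the function name `iupac_to_regex` is a hypothetical helper, not part of any tool mentioned in the lecture.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Expand a degenerate IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[ch] for ch in motif.upper())


pattern = re.compile(iupac_to_regex("CRCAAAW"))
print(pattern.pattern)
print(bool(pattern.fullmatch("CACAAAA")))  # the consensus motif from the slide
```

A regular expression of this kind either matches or does not, so it cannot express that one base is merely preferred at a position; that is what the weight matrix adds.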

Weight Matrix for Transcription Factor Binding Sites

A DNA motif as a position-specific frequency weight matrix

Sites
ATGGCATG
AGGGTGCG
ATCGCATG
TTGCCACG
ATGGTATT
ATTGCACG
AGGGCGTT
ATGACATG
ATGGCATG
ACTGGATG

Alignment Matrix
Pos   A   C   G   T
1     9   0   0   1
2     0   1   2   7
3     0   1   7   2
4     1   1   8   0
5     0   7   1   2
6     8   0   2   0
7     0   3   0   7
8     0   0   8   2

Frequency Weight Matrix
Pos   A     C     G     T     Con
1     0.9   0     0     0.1   A
2     0     0.1   0.2   0.7   T
3     0     0.1   0.7   0.2   G
4     0.1   0.1   0.8   0     G
5     0     0.7   0.1   0.2   C
6     0.8   0     0.2   0     A
7     0     0.3   0     0.7   T
8     0     0     0.8   0.2   G
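
The two matrices can be reproduced directly from the ten sites. The sketch below counts bases per column to get the alignment matrix, divides by the number of sites to get the frequency matrix, and takes the most frequent base per column as the consensus. Function names are hypothetical.

```python
from collections import Counter

SITES = [
    "ATGGCATG", "AGGGTGCG", "ATCGCATG", "TTGCCACG", "ATGGTATT",
    "ATTGCACG", "AGGGCGTT", "ATGACATG", "ATGGCATG", "ACTGGATG",
]


def alignment_matrix(sites):
    """Count each base at each position (one Counter per column)."""
    width = len(sites[0])
    return [Counter(site[i] for site in sites) for i in range(width)]


def frequency_matrix(counts, n_sites):
    """Turn column counts into per-position base frequencies."""
    return [{base: col[base] / n_sites for base in "ACGT"} for col in counts]


counts = alignment_matrix(SITES)
freqs = frequency_matrix(counts, len(SITES))
# Consensus: the most frequent base in each column.
consensus = "".join(max("ACGT", key=lambda b: col[b]) for col in counts)

for pos, col in enumerate(counts, start=1):
    print(pos, [col[b] for b in "ACGT"], consensus[pos - 1])
```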

Weight Matrix with Consensus Sequence & Logotype with Degenerate Consensus

Degenerate consensus: TTWHYCGGHY

Weight Matrix or Position Specific Scoring Matrix

BioProspector Initialization

Gather together upstream regulatory regions

Actual location of the regulatory motifs (a1, a2, a3, a4, …, ak) is unknown

Randomly initialize the beginning motif (initial motif built from segments a1', a2', a3', a4', …, ak')

BioProspector Iterative Update

Take out one sequence at a time with its segment (here a1', leaving the motif without the a1' segment, built from a2', a3', a4', …, ak')

Score each segment with the current motif

[Figure: bar chart "Segment Scores of Sequence 1"; x-axis: starting position of segment (1-20); y-axis: segment score (0-30); scored against the motif without the a1' segment]

Segment   Score
(1-6)     1.5
(2-7)     3
(3-8)     2.7
(4-9)     9.0
(5-10)    3.2
(6-11)    27.1
(7-12)    11.2
(8-13)    2.9
(9-14)    9.1

Score sequence 1 in all possible alignments (new segment a1" enters the candidate motif)

Repeat the process until convergence (next: motif without the a2' segment)
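
The leave-one-out update above can be sketched as a small Gibbs site sampler. This is a minimal sketch under stated assumptions, not BioProspector itself: it assumes exactly one site per sequence, a uniform background, a pseudocount of 1, and a likelihood-ratio segment score, all of which are choices made here for illustration. The function names are hypothetical.

```python
import random


def build_counts(segments, w, pseudo=1.0):
    """Column-wise base counts (with pseudocounts) for the current motif."""
    counts = [{base: pseudo for base in "ACGT"} for _ in range(w)]
    for seg in segments:
        for i, base in enumerate(seg):
            counts[i][base] += 1
    return counts


def segment_score(segment, counts, background=0.25):
    """Likelihood ratio of a segment: motif model versus uniform background."""
    score = 1.0
    for i, base in enumerate(segment):
        total = sum(counts[i].values())
        score *= (counts[i][base] / total) / background
    return score


def gibbs_motif_search(seqs, w, sweeps=200, seed=0):
    """Return one sampled motif start per sequence."""
    rng = random.Random(seed)
    # Randomly initialize one segment start in every sequence.
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(sweeps):
        for k, seq in enumerate(seqs):
            # Motif built from every sequence except sequence k.
            others = [s[a:a + w]
                      for j, (s, a) in enumerate(zip(seqs, starts)) if j != k]
            counts = build_counts(others, w)
            # Score every possible segment of sequence k, then sample a new
            # start with probability proportional to its score.
            scores = [segment_score(seq[a:a + w], counts)
                      for a in range(len(seq) - w + 1)]
            starts[k] = rng.choices(range(len(scores)), weights=scores)[0]
    return starts


seqs = [
    "GATGGCTGCACCACGTGTATGC",
    "CACATCGCATCACGTGACCAGT",
    "GCCTCGCACGTGGTGGTACAGT",
    "TCTCGTTAGGACCATCACGTGA",
    "CGCTAGCCCACGTGGATCTTGT",
]
starts = gibbs_motif_search(seqs, w=6, sweeps=50, seed=1)
print([s[a:a + 6] for s, a in zip(seqs, starts)])
```

Because the new start is sampled rather than chosen greedily, different seeds can give different answers, which is why a stochastic sampler of this kind is usually run several times.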

Challenges for BioProspector
http://bioprospector.stanford.edu/

• Variable (0-n) motif sites per sequence
• Motif enriched only in upstream sequences, not in the whole genome
• Some motifs could have two conserved blocks separated by a variable-length gap
• Motifs are not highly conserved (~50%)
• Some motifs show a palindromic symmetry
• Assign motifs a measure of statistical significance

Thresholds Allow for Variable Motif Copies

• Sequences that do not have the motif
• Sequences with multiple copies of motif

[Figure: bar chart "Sampling with Two Thresholds"; x-axis: starting position of segment (1-20); y-axis: segment score (0-20); threshold lines TH and TL; regions labeled Sample, Discard, Keep]
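
One way to read the two-threshold figure is sketched below: segments scoring at or above TH are always kept, segments below TL are discarded, and segments in between are candidates for sampling. That mapping of Keep, Sample, and Discard onto the thresholds is an assumption drawn from the figure labels, and sampling at most one in-between segment per pass is a simplification made here. The function name and example scores are hypothetical.

```python
import random


def select_sites(scores, t_high, t_low, rng=None):
    """Pick segment indices using a high and a low score threshold."""
    rng = rng or random.Random(0)
    # Always keep segments at or above the high threshold.
    kept = [i for i, s in enumerate(scores) if s >= t_high]
    # Segments between the thresholds compete; anything below t_low is dropped.
    middle = [i for i, s in enumerate(scores) if t_low <= s < t_high]
    weights = [scores[i] for i in middle]
    if middle and sum(weights) > 0:
        # Assumption: sample one in-between segment, proportional to score.
        kept.append(rng.choices(middle, weights=weights)[0])
    return sorted(kept)


scores = [1, 3, 18, 2, 9, 4]  # hypothetical segment scores
print(select_sites(scores, t_high=15, t_low=5))
```

With this rule a sequence whose scores all fall below TL contributes no site, and a sequence with several scores above TH contributes several, which covers both cases listed on the slide.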

BioProspector Finds Motif With Two Blocks

Two-block motifs:

GACACATTACCTATGC TGGCCCTACGACCTCTCGC
CACAATTACCACCA TGGCGTGATCTCAGACACGGACGGC
GCCTCGATTACCGTGGTA TGGCTAGTTCTCAAACCTGACTAAA
TCTCGTTAGATTACCACCCA TGGCCGTATCGAGAGCG
CGCTAGCCATTACCGAT TGGCGTTCTCGAGAATTGCCTAT

BioProspector Finds Motifs With Two Blocks

Two-block motifs

[Figure: one sequence with blk1 and candidate block2 positions between Min Gap and Max Gap; candidate scores 22.4, 26.5, 30.1, 18.9 (total 97.9); label: Sample Block2 start]
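
The figure's numbers suggest the following reading, sketched here as an assumption rather than as BioProspector's exact procedure: for a fixed block 1, every block 2 start allowed by the gap range gets a score, the scores are summed (22.4 + 26.5 + 30.1 + 18.9 = 97.9), and a block 2 start is then sampled in proportion to its score. The positions, block width, and gap limits below are hypothetical; only the four scores come from the slide.

```python
import random


def block2_starts(block1_start, w1, min_gap, max_gap):
    """Allowed block-2 start positions for a block 1 at block1_start."""
    first = block1_start + w1 + min_gap
    last = block1_start + w1 + max_gap
    return list(range(first, last + 1))


def sample_block2(starts, scores, rng):
    """Return (summed score, block-2 start sampled proportional to score)."""
    total = sum(scores)
    chosen = rng.choices(starts, weights=scores)[0]
    return total, chosen


# Hypothetical geometry: block 1 of width 6 at position 10, gap of 3 to 6.
starts = block2_starts(block1_start=10, w1=6, min_gap=3, max_gap=6)
scores = [22.4, 26.5, 30.1, 18.9]  # values read off the slide
total, chosen = sample_block2(starts, scores, random.Random(0))
print(starts, round(total, 1), chosen)
```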

BioProspector Finds Motif With Inverse Complementary Blocks

Two-block motifs
Palindrome motifs:

AATGCG
GCGTAA
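
For a palindromic two-block motif, the second block is the reverse complement of the first, so only one block needs to be modeled. The sketch below computes it with hypothetical helper names. The reverse complement of AATGCG is CGCATT; the slide's second string, GCGTAA, is the base-by-base complement of CGCATT, which is one way to read it as the opposite strand beneath the second block.

```python
# Translation table for base-by-base complementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Complement each base without reversing."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


def is_inverse_complementary(block1, block2):
    """True if block2 is the reverse complement of block1."""
    return block2 == reverse_complement(block1)


print(reverse_complement("AATGCG"))
print(complement("CGCATT"))
```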

BioProspector Results: B. subtilis two-block promoter

• B. subtilis transcription best studied
• 136 σA-dependent promoter sequences [-100, 15]
• Look for w1 = w2 = 5, gap[15, 20] two-block motif
• Correctly identified motif [TTGACA, TATAAT] and 70% of all the sites
• Occasionally predicted two promoters ("Correct" site, Second site)

abrB   TTGACG   TACAAT
veg    TTGACA   TATAAT
f105   TTTACA   TACAAT

BioProspector Web Server: http://bioprospector.stanford.edu/

Compare Prospector
http://compareprospector.stanford.edu/

[Figure labels: 1 kb; Regions conserved between two species; Motif]

[Figure: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, …, Gene n, with thresholds Tch and Tcl marked]

Biased sampling:
Initial iterations: Tch
Later iterations: Tcl

Liu et al, 2004, Genome Res 14(3): 451-458.
(Liu Y et al, Nucleic Acids Res 32:W204-7)

Yeast Rap1 Sequences

• Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment
   o Cross-link protein-DNA interaction
   o Shear DNA
   o Immunoprecipitation
   o Purify DNA
   o PCR amplify and label DNA
   o Hybridize with microarray and measure reading
• Rap1 IP enriched 727 DNA fragments
   o 45% are intergenic
   o Average length 1-2 kb
   o Some are false positives
   o Some have multiple Rap1 sites

[Figure: Chromatin Immune Precipitation]

Useful Insights

• In ChIP-array experiments, highly enriched sequences are usually the real targets
• Transcription factor binding sites occur more abundantly in these real targets
• Search for TF sites in the high-confidence sequences first, before examining the remaining sequences?

Motif Discovery Scan (MDscan)

MDscan Algorithm: Define m-matches

For a given w-mer and any other random w-mer:

TGTAACGT   8-mer
TGTAACGT   matched 8
AGTAACGT   matched 7
TGCAACAT   matched 6
TGACACGG   matched 5
AATAACAG   matched 4

m-matches for an 8-mer

Pick a reasonable m, e.g. in yeast
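
The match counts in the table are position-by-position identities between the given w-mer and each other w-mer. A minimal sketch, with hypothetical function names:

```python
def matched_positions(a, b):
    """Count positions at which two equal-length w-mers agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))


def is_m_match(seed, other, m):
    """True if `other` agrees with `seed` at m or more positions."""
    return matched_positions(seed, other) >= m


seed = "TGTAACGT"
for other in ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]:
    print(other, matched_positions(seed, other))
```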

MDscan Algorithm: Finding candidate motifs

[Figure, shown in three steps for Seed 1, Seed 2, and Seed 3; labels: TopSeqs, All IP enriched sequences, m-matches]
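
The seed step can be sketched as an enumeration: every w-mer in the top sequences acts as a seed, and the w-mers that m-match it are collected into a candidate motif. This is a simplified sketch under assumptions made here (matches are collected from the top sequences only, a single strand is scanned, and the two example sequences are invented); function names are hypothetical.

```python
def windows(seq, w):
    """Yield every w-mer of seq, left to right."""
    for i in range(len(seq) - w + 1):
        yield seq[i:i + w]


def matched_positions(a, b):
    """Count positions at which two equal-length w-mers agree."""
    return sum(x == y for x, y in zip(a, b))


def m_matches(seed, seqs, w, m):
    """Collect all w-mers in seqs that agree with seed at >= m positions."""
    return [seg for s in seqs for seg in windows(s, w)
            if matched_positions(seed, seg) >= m]


def candidate_motifs(top_seqs, w, m):
    """Map each distinct seed w-mer in the top sequences to its m-matches."""
    candidates = {}
    for s in top_seqs:
        for seed in windows(s, w):
            if seed not in candidates:
                candidates[seed] = m_matches(seed, top_seqs, w, m)
    return candidates


top_seqs = ["AATGTAACGTCC", "GGAGTAACGTAA"]  # hypothetical top sequences
candidates = candidate_motifs(top_seqs, w=8, m=7)
print(candidates["TGTAACGT"])
```

Because the procedure enumerates seeds in a fixed order and involves no random draws, the same input always yields the same candidates.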

MDscan Algorithm: Scanning sequences with top motifs

• Keep 30-50 top scoring candidate motifs
• Scan the rest of the sequences with the candidate motifs

[Figure: scoring criteria, labeled: Motif Signal Abundance; Conserved Positions; Specificity (unlikely in genome)]
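
To make the three labeled criteria concrete, the sketch below combines them in one hypothetical score: a log term in the number of aligned segments (abundance), the sum of p·log p over matrix cells (conserved positions), and the average log background probability of the segments (specificity). The exact form is an assumption made for illustration and should not be taken as MDscan's published scoring function; the background here is a simple independent-base model.

```python
import math
from collections import Counter


def motif_score(segments, background):
    """Hypothetical score combining abundance, conservation, specificity."""
    x = len(segments)        # abundance: number of aligned segments
    w = len(segments[0])     # motif width
    # Conservation: sum of p * log(p) over every column (0 when fully conserved).
    conservation = 0.0
    for i in range(w):
        column = Counter(seg[i] for seg in segments)
        for n in column.values():
            p = n / x
            conservation += p * math.log(p)
    # Specificity: average log probability of the segments under the background.
    avg_log_bg = sum(math.log(background[b])
                     for seg in segments for b in seg) / x
    return math.log(x) / w * (conservation - avg_log_bg)


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
conserved = ["ACGT"] * 4
degenerate = ["ACGT", "ACGA", "TCGT", "ACCT"]
print(motif_score(conserved, uniform), motif_score(degenerate, uniform))
```

Under this form a motif scores higher when it has more sites, when its columns are more conserved, and when its segments are less likely under the background model.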

MDscan Algorithm: Finding All Motif Instances

[Figure: Seed 3 with its m-matches; labels: TopSeqs, All IP enriched sequences, m-matches]

MDscan Algorithm: Refine the motifs

[Figure: same layout as above, with three motif instances marked X]

MDscan Simulation

• Nine motif matrix models with 3 widths and 3 degeneracies

W8S1 (More Conserved):
GACTCCCA GATTGCCT GGCTACCT GACTACCA GAGTACCA GACTATCT GAGTACCA GGCTCCCA GACTCCCA

W8S3 (Less Conserved):
GACTCCGA GGGAACCA GCTTCCAA GACTACCA CAGTACGA GGCTAGCA GACTGCCG GACTACCA GACTCCCG

MDscan Simulation

Each test set:
• 100 sequences of 600 bases from yeast intergenic regions
• Motif segments generated and inserted according to the following abundance:

[Figure labels: Higher confidence; Motif more abundant]

• 100 tests for 3 widths × 3 strengths (degeneracy) × 4 abundances = 3600 tests
• MDscan speed: 3 X Consensus, 14 X BioProspector, 27 X AlignACE

MDscan Simulation Accuracy

[Figures: accuracy plots for w = 8, w = 12, and w = 16]

MDscan Biological Tests

• Gal4 & Ste12 [Ren et al. Science 2000]
   o Gal4: galactose metabolism
   o Ste12: responds to mating pheromones
• SBF & MBF [Iyer et al. Nature 2001]
   o SBF: Swi4 + Swi6; budding, membrane, cell wall biosynthesis
   o MBF: Mbp1 + Swi6; DNA replication and repair
• Rap1 [Lieb et al. Nature Genetics 2001]
   o Repressor activator
   o 37% pol II events in exponentially growing cells

TAMO: Tools for the Analysis of Motifs
http://fraenkel.mit.edu/TAMO/

WebMotifs
http://fraenkel.mit.edu/webmotifs/

Melina: Comparing Motifs
http://melina1.hgc.jp/

Single Microarray Determination of Transcription Factor Motifs

One microarray experiment, no clustering needed

Basic idea: more affected sequences may contain more TF motif sites

[Figure: expression log ratio (y-axis) plotted by gene (x-axis), with Induced and Repressed regions marked]
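
The ranking idea can be sketched as follows: order genes by how strongly they respond in the single experiment and treat the most affected ones as the high-confidence set whose upstream regions are searched first. Ranking by absolute log ratio, so that induced and repressed genes are pooled, is an assumption made here; the gene names and values are hypothetical.

```python
def rank_by_response(log_ratios, top_n):
    """Return the top_n gene names ordered by absolute expression log ratio."""
    ranked = sorted(log_ratios, key=lambda gene: abs(log_ratios[gene]),
                    reverse=True)
    return ranked[:top_n]


# Hypothetical log ratios from one microarray experiment.
log_ratios = {"gene1": 0.1, "gene2": 2.5, "gene3": -3.0, "gene4": 0.4}
print(rank_by_response(log_ratios, top_n=2))
```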

Summary

• BioProspector is stochastic
• BioProspector can get trapped in local maxima
• BioProspector must be run multiple times to discover the true globally optimal motif
• BioProspector is slow
• MDscan is deterministic
• MDscan always gives the same answer with the same data
• MDscan is fast
• MDscan uses rank order data to accelerate the search process and to allow it to be deterministic
• MDscan is fast enough to search intergenic regions from entire genomes
• MDscan is not as sensitive as BioProspector

