Some Statistical Methods for Detecting Clustering in Biological Sequences
John L. Spouge
National Center for Biotechnology Information
Bldg. 45, Rm. 6AS 47J
NCBI, NLM, NIH
Bethesda MD 20894
NCBI
01101011
Overview
Clustering in bacterial genomes
  Minimum distance statistic (with Fisher omnibus test)
  Kolmogorov-Smirnov tests
  Scan tests
  Local run (BLAST-like) tests using Poisson process (PP)
Clustering of intergenic conservation
  Hypergeometric test
Clustering of PSSM motifs
  Chi-square for 1/0 “motifs”
  Compound Poisson process (CPP) models for PSSM motifs
  Local run (BLAST-like) tests for PSSM motifs using CPP
Clustering in Bacterial Genomes
IK Jordan et al (2001) Genome Res 11:555-565
Given a small gene family in several bacterial genomes, do its genes tend to cluster?
Fisher Omnibus Test
The Fisher omnibus test combines several weak, independent, one-sided continuous p-values $p_1, p_2, \dots, p_n$ to test the aggregate for significance. Under the null hypothesis,
$$-2\sum_{i=1}^{n}\ln p_i$$
is chi-square with $2n$ degrees of freedom.
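A minimal Python sketch of the test above, assuming the p-values are independent and each uniform on (0, 1] under the null hypothesis. The function name `fisher_omnibus` is hypothetical. Because the degrees of freedom $2n$ are even, the chi-square upper tail has a closed form (a Poisson sum), so no statistics library is needed; if SciPy is available, `scipy.stats.combine_pvalues(pvalues, method='fisher')` is an alternative.

```python
import math

# Sketch: the function name is illustrative, not from the original slides.


def fisher_omnibus(pvalues):
    """Combine independent one-sided continuous p-values by Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    is chi-square with 2n degrees of freedom under the null hypothesis.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # -sum(ln p_i) = stat / 2 is gamma(1, n); for integer shape n its upper tail is
    # P(Gamma > s) = exp(-s) * sum_{k=0}^{n-1} s^k / k!
    half = stat / 2.0
    term = total = 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return stat, math.exp(-half) * total


stat, p = fisher_omnibus([0.1, 0.1])
print(f"statistic = {stat:.3f}, combined p = {p:.4f}")
```

Two p-values of 0.1 combine to about 0.056, so two individually weak results add up to only modestly stronger evidence.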
Fisher Omnibus Test
For any one-sided continuous p-value $p$, $X = -\ln p$ is exponential (1) distributed:
$$\mathbb{P}\{-\ln p > x\} = \mathbb{P}\{p < e^{-x}\} = e^{-x}.$$
Hence, for independent $p_1, \dots, p_n$, $-\sum_{i=1}^{n}\ln p_i$ is gamma $(1, n)$ distributed, and $-2\sum_{i=1}^{n}\ln p_i$ is chi-square with $2n$ degrees of freedom.
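Minimum Distance
Let $0 = \tau^*_0 < \tau^*_1 < \tau^*_2 < \tau^*_3 < \cdots < \tau^*_n < \tau^*_{n+1} = 1$, where $\tau^*_1, \dots, \tau^*_n$ are the order statistics of $n$ independent uniform points on $[0, 1]$. Then, for $0 \le \delta \le 1/n$,
$$\mathbb{P}\left\{\min_{1 \le i \le n}\left(\tau^*_i - \tau^*_{i-1}\right) > \delta\right\} = (1 - n\delta)^n.$$
S Karlin & HM Taylor (1981) A Second Course in Stochastic Processes, p. 132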
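The tail formula above turns an observed minimum gap into a one-sided p-value for clustering, since an unusually small minimum gap is evidence of clustering. A minimal sketch, assuming (as on the slide) that $\tau^*_0 = 0$ is a fixed reference point, that the last gap up to $\tau^*_{n+1} = 1$ is unconstrained, and that positions are already rescaled to $[0, 1]$; the function names are hypothetical.

```python
# Sketch: function names are illustrative, not from the original slides.


def min_gap_tail(n, delta):
    """P{min_{1<=i<=n} (tau*_i - tau*_{i-1}) > delta}, tau*_0 = 0, n uniform points on [0, 1]."""
    if delta <= 0.0:
        return 1.0
    return max(0.0, 1.0 - n * delta) ** n


def min_gap_pvalue(points):
    """One-sided p-value P{min gap <= observed min gap} for points rescaled to [0, 1]."""
    taus = sorted(points)
    n = len(taus)
    if n == 0:
        raise ValueError("need at least one point")
    # Gaps tau*_i - tau*_{i-1} for i = 1..n, starting from the fixed point tau*_0 = 0
    gaps = [b - a for a, b in zip([0.0] + taus[:-1], taus)]
    return 1.0 - min_gap_tail(n, min(gaps))


print(min_gap_pvalue([0.20, 0.21, 0.70]))
```

Per-genome p-values of this kind are one-sided and continuous, which is the input the Fisher omnibus test expects when a small gene family is spread over several genomes.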
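De Finetti's Formula
Let $0 = \tau_0 < \tau_1 < \tau_2 < \tau_3 < \cdots < \tau_n < \tau_{n+1} = 1$, and let $\delta_0, \delta_1, \dots, \delta_{n-1} \ge 0$ be lower bounds on the consecutive gaps. The constrained region has volume
$$\mathrm{Vol}\left\{0 < \tau_1 < \cdots < \tau_n < 1 : \tau_i - \tau_{i-1} > \delta_{i-1},\ i = 1, \dots, n\right\} = \frac{\left(1 - \delta_0 - \delta_1 - \cdots - \delta_{n-1}\right)_+^n}{n!},$$
so for the order statistics of $n$ independent uniform points,
$$\mathbb{P}\left\{\tau_i - \tau_{i-1} > \delta_{i-1},\ i = 1, \dots, n\right\} = \left(1 - \delta_0 - \delta_1 - \cdots - \delta_{n-1}\right)_+^n.$$
B de Finetti (1964) Giornale Istituto Italiano degli Attuari 27:151
W Feller (1971) An Introduction to Probability Theory…, Vol. 2, p. 42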
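A quick Monte Carlo check of de Finetti's formula, written as a sketch: sort $n$ uniform points, test each gap $\tau_i - \tau_{i-1}$ against its own bound $\delta_{i-1}$, and compare the hit rate with $(1 - \delta_0 - \cdots - \delta_{n-1})_+^n$. The seed, trial count, and function names are arbitrary illustrative choices.

```python
import random

# Sketch: function names, seed, and trial count are illustrative choices.


def definetti_prob(deltas):
    """(1 - delta_0 - ... - delta_{n-1})_+^n for n = len(deltas) uniform points on [0, 1]."""
    n = len(deltas)
    return max(0.0, 1.0 - sum(deltas)) ** n


def definetti_mc(deltas, trials=100_000, seed=1):
    """Monte Carlo estimate of P{tau_i - tau_{i-1} > delta_{i-1}, i = 1..n}, tau_0 = 0."""
    rng = random.Random(seed)
    n = len(deltas)
    hits = 0
    for _ in range(trials):
        taus = sorted(rng.random() for _ in range(n))
        prev = 0.0
        for tau, d in zip(taus, deltas):
            if tau - prev <= d:
                break
            prev = tau
        else:
            hits += 1  # every gap cleared its bound
    return hits / trials


deltas = [0.10, 0.20, 0.05]
print(definetti_prob(deltas), definetti_mc(deltas))
```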
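Discrete Version
Let $X_0 = 0 < X_1 < X_2 < X_3 < \cdots < X_n < X_{n+1} = t + 1$, with $X_1, \dots, X_n$ integers in $\{1, 2, \dots, t\}$, and let $x_0, x_1, \dots, x_n \ge 0$ be integers. Then
$$\#_t\left\{X_i - X_{i-1} > x_{i-1},\ i = 1, \dots, n+1\right\} = \binom{t - x_0 - x_1 - \cdots - x_n}{n} \approx \frac{\left(t - x_0 - x_1 - \cdots - x_n\right)^n}{n!}.$$
The substitution $Y_i = X_i - (x_0 + x_1 + \cdots + x_{i-1})$ maps each such configuration one-to-one onto $n$ distinct integers $1 \le Y_1 < Y_2 < \cdots < Y_n \le t - x_0 - x_1 - \cdots - x_n$.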
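Special Cases
Special cases of $\#_t\{X_i - X_{i-1} > x_{i-1}\} = \binom{t - x_0 - \cdots - x_n}{n}$, with $X_0 = 0$ and $X_{n+1} = t + 1$:
With $(x_0, x_1, \dots, x_n) = (0, 0, \dots, 0)$, every choice of $n$ distinct numbers from $\{1, 2, \dots, t\}$ qualifies:
$$\#_t\left\{X_i - X_{i-1} > 0\right\} = \binom{t}{n}.$$
With $(x_0, x_1, x_2, \dots, x_{n-1}, x_n) = (0, 1, 1, \dots, 1, 0)$, no two chosen numbers are adjacent:
$$\#_t\left\{X_i - X_{i-1} > x_{i-1}\right\} = \binom{t - n + 1}{n}.$$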
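The discrete count can be checked by brute force for small $t$. The sketch below (hypothetical helper names) enumerates all $n$-subsets of $\{1, \dots, t\}$, pads them with $X_0 = 0$ and $X_{n+1} = t + 1$, and compares the number satisfying $X_i - X_{i-1} > x_{i-1}$ with $\binom{t - x_0 - \cdots - x_n}{n}$; the list `xs` holds $x_0, \dots, x_n$.

```python
from itertools import combinations
from math import comb

# Sketch: helper names are illustrative, not from the original slides.


def count_formula(t, xs):
    """binom(t - x_0 - ... - x_n, n), where xs = [x_0, ..., x_n] and n = len(xs) - 1."""
    n = len(xs) - 1
    top = t - sum(xs)
    return comb(top, n) if top >= 0 else 0


def count_brute(t, xs):
    """Count n-subsets of {1..t} with X_i - X_{i-1} > x_{i-1}, X_0 = 0, X_{n+1} = t + 1."""
    n = len(xs) - 1
    total = 0
    for chosen in combinations(range(1, t + 1), n):
        X = (0,) + chosen + (t + 1,)
        if all(X[i] - X[i - 1] > xs[i - 1] for i in range(1, n + 2)):
            total += 1
    return total


t = 10
for xs in ([0, 0, 0, 0], [0, 1, 1, 0], [1, 2, 0, 1]):
    print(xs, count_formula(t, xs), count_brute(t, xs))
```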
Minimum Distance
Choose $n$ distinct numbers from $\{1, 2, \dots, t\}$ such that the minimum distance between consecutive order statistics exceeds $x \ge 0$. Setting $(x_0, x_1, \dots, x_{n-1}, x_n) = (0, x, \dots, x, 0)$, with $X_0 = 0$ and $X_{n+1} = t + 1$, gives
$$\#_t\left\{X_i - X_{i-1} > x_{i-1}\right\} = \binom{t - (n-1)x}{n} \approx \frac{\left(t - (n-1)x\right)^n}{n!}.$$
S Karlin & HM Taylor (1981) A Second Course in Stochastic Processes
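Dividing the count by $\binom{t}{n}$ gives the probability that $n$ numbers drawn without replacement from $\{1, \dots, t\}$ have all consecutive gaps larger than $x$, and hence a one-sided p-value for an observed minimum gap. A minimal sketch, assuming gene positions are distinct integers on a linear genome of $t$ positions; the function names are hypothetical.

```python
from math import comb

# Sketch: function names are illustrative, not from the original slides.


def min_distance_tail(t, n, x):
    """P{all consecutive gaps > x} for n numbers drawn without replacement from 1..t."""
    top = t - (n - 1) * x
    if top < 0:
        return 0.0
    return comb(top, n) / comb(t, n)


def min_distance_pvalue(positions, t):
    """One-sided p-value P{min gap <= observed min gap} on a linear genome of t positions."""
    xs = sorted(positions)
    n = len(xs)
    if n < 2 or len(set(xs)) != n:
        raise ValueError("need at least two distinct positions")
    observed = min(b - a for a, b in zip(xs, xs[1:]))
    return 1.0 - min_distance_tail(t, n, observed)


print(min_distance_pvalue([2, 3, 9], t=12))
```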
Threading Configurations
m = #{threading configurations}
{ ( ) }x X X l xi i i i i in 1 0 X { }X i i
[Figure: example configuration with labels A1, A2, A4, A11, A19, A28, A33]
Clustering in Bacterial Genomes
Given a large gene family in one bacterial genome, do its genes tend to cluster?
IK Jordan et al (2001) Genome Res 11:555-565
Kolmogorov-Smirnov Tests
M Kendall & A Stuart (1979) The Advanced Theory of Statistics, Vol. 2, p. 476
The Kolmogorov-Smirnov test examines whether $\hat X_1, \hat X_2, \dots, \hat X_n$ come from the distribution function $F(x)$. If $F$ is continuous and $X$ has distribution function $F$, then
$$\mathbb{P}\{F(X) \le F(x)\} = \mathbb{P}\{X \le x\} = F(x),$$
so $F(\hat X_1), F(\hat X_2), \dots, F(\hat X_n)$ are uniformly distributed.
Are $\hat U_1 = F(\hat X_1), \hat U_2 = F(\hat X_2), \dots, \hat U_n = F(\hat X_n)$ uniformly distributed?
Kolmogorov-Smirnov Tests
Are $\hat U_1, \hat U_2, \dots, \hat U_n$ uniformly distributed? With order statistics $U^*_1 < U^*_2 < \cdots < U^*_n$,
$$D_n = \sqrt{n}\max_{k=1,2,\dots,n}\left|U^*_k - \frac{k}{n}\right|.$$
[Plot: the points $(U^*_k, k/n)$, cumulative distribution against value, on the unit square]
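A sketch of the statistic in Python (function names hypothetical). `ks_slide_statistic` follows the definition above literally; `ks_two_sided` is the usual version, which compares each $U^*_k$ with both $k/n$ and $(k-1)/n$ and so is never smaller. If SciPy is available, `scipy.stats.kstest` computes the usual version without the $\sqrt{n}$ scaling.

```python
import math

# Sketch: function names are illustrative, not from the original slides.


def ks_slide_statistic(u):
    """D_n = sqrt(n) * max_k |U*_k - k/n|, following the slide's definition."""
    us = sorted(u)
    n = len(us)
    return math.sqrt(n) * max(abs(v - k / n) for k, v in enumerate(us, start=1))


def ks_two_sided(u):
    """Usual two-sided statistic sqrt(n) * sup_x |F_n(x) - x| for values in [0, 1]."""
    us = sorted(u)
    n = len(us)
    d = max(max(k / n - v, v - (k - 1) / n) for k, v in enumerate(us, start=1))
    return math.sqrt(n) * d


sample = [0.05, 0.10, 0.12, 0.40, 0.95]
print(ks_slide_statistic(sample), ks_two_sided(sample))
```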
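Kolmogorov-Smirnov Tests
L Breiman (1992) Probability
[Diagram: consecutive spacings $E_1, E_2, E_3, \dots, E_n, E_{n+1}$]
$$U^*_k = \frac{Z_k}{Z_{n+1}}, \quad \text{where } Z_k = \sum_{i=1}^{k}E_i \text{ and the } E_i \text{ are independent, exponential (1) distributed.}$$
With $S_k = Z_k - k = \sum_{i=1}^{k}\left(E_i - 1\right)$,
$$\sqrt{n}\left(U^*_k - \frac{k}{n}\right) = \sqrt{n}\left(\frac{Z_k}{Z_{n+1}} - \frac{k}{n}\right) \approx \frac{S_k}{\sqrt{n}} - \frac{k}{n}\,\frac{S_n}{\sqrt{n}} \approx B\!\left(\frac{k}{n}\right) - \frac{k}{n}B(1),$$
where $B$ is standard Brownian motion.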
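The representation $U^*_k = Z_k / Z_{n+1}$ also gives a direct way to simulate uniform order statistics without sorting. A minimal sketch using the standard library's `random.expovariate`; the function name is hypothetical.

```python
import random

# Sketch: the function name is illustrative, not from the original slides.


def uniform_order_stats(n, rng):
    """Return U*_1 <= ... <= U*_n as Z_k / Z_{n+1}, Z_k = E_1 + ... + E_k, E_i ~ exponential(1)."""
    partial = []
    z = 0.0
    for _ in range(n + 1):
        z += rng.expovariate(1.0)
        partial.append(z)
    total = partial[-1]  # Z_{n+1}
    return [zk / total for zk in partial[:-1]]


rng = random.Random(0)
n, reps = 4, 20_000
mean_first = sum(uniform_order_stats(n, rng)[0] for _ in range(reps)) / reps
print(mean_first)  # should be close to 1 / (n + 1)
```

Averaging $U^*_1$ over many draws should approach $\mathbb{E}U^*_1 = 1/(n+1)$, the mean of the smallest of $n$ uniform points.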
Kolmogorov-Smirnov Tests
Are $\hat U_1, \hat U_2, \dots, \hat U_n$ uniformly distributed?
$$D_n = \sqrt{n}\max_{k=1,2,\dots,n}\left|U^*_k - \frac{k}{n}\right| \approx \max_{0 \le t \le 1}\left|B(t) - tB(1)\right|$$
Clustering in Bacterial Genomes
Given a large gene family in one linear genome, do its genes tend to cluster?
[Diagram: gene positions $0 < U^*_1 < U^*_2 < U^*_3 < \cdots < U^*_n$]
$$\mathbb{P}\left\{D_n = \sqrt{n}\max_{k=1,2,\dots,n}\left|U^*_k - \frac{k}{n}\right| \ge y\right\} \approx \mathbb{P}\left\{\max_{0 \le t \le 1}\left|B(t) - tB(1)\right| \ge y\right\}$$
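The right-hand side is the Kolmogorov distribution. A minimal sketch of its upper tail, using the two standard series $2\sum_{j\ge1}(-1)^{j-1}e^{-2j^2y^2}$ (for $y \ge 1$) and $1 - \frac{\sqrt{2\pi}}{y}\sum_{j\ge1}e^{-(2j-1)^2\pi^2/(8y^2)}$ (for $y < 1$), each of which converges quickly in its range. The function name and the switch point are arbitrary choices.

```python
import math

# Sketch: the function name and series switch point are illustrative choices.


def kolmogorov_sf(y, terms=100):
    """P{max_{0<=t<=1} |B(t) - t B(1)| > y} for standard Brownian motion B."""
    if y < 0.05:
        return 1.0  # the distribution function is negligibly small here
    if y < 1.0:
        # Theta-function series for the distribution function; fast for small y
        s = sum(math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8.0 * y * y))
                for j in range(1, terms + 1))
        return 1.0 - math.sqrt(2.0 * math.pi) / y * s
    # Alternating series for the upper tail; fast for y >= 1
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * y * y) for j in range(1, terms + 1))
    return 2.0 * s


for y in (0.5, 1.0, 1.36, 1.63):
    print(y, kolmogorov_sf(y))
```

For example, $D_n \approx 1.36$ corresponds to an asymptotic p-value near 0.05.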
Clustering in Bacterial Genomes
Given a large gene family in one circular genome, do its genes tend to cluster?
IK Jordan et al (2001) Genome Res 11:555-565
[Diagram: gene positions $0, U^*_1, U^*_2, U^*_3, \dots, U^*_n, U^*_{n+1}$]
$$U^*_k = \frac{Z_k}{Z_{n+1}}, \quad \text{where } Z_k = \sum_{i=1}^{k}E_i \text{ and } E_i \text{ is exponential (1) distributed.}$$
Clustering in Bacterial Genomes
[Diagram: gene positions $0, U^*_1, U^*_2, U^*_3, \dots, U^*_n, U^*_{n+1}$]
$$U^*_k = \frac{Z_k}{Z_{n+1}}, \quad \text{where } Z_k = \sum_{i=1}^{k}E_i \text{ and } E_i \text{ is exponential (1) distributed,}$$
so, since $Z_{n+1}/n \to 1$,
$$n\left(U^*_k - U^*_{k-1}\right) = \frac{nE_k}{Z_{n+1}} \approx E_k,$$
and the spacings $U^*_k - U^*_{k-1}$ are approximately exponential $(n)$ distributed (rate $n$, mean $1/n$).
Clustering in Bacterial Genomes
Given a large gene family in one circular genome, do its genes tend to cluster?
[Diagram: gene positions $0, U^*_1, U^*_2, U^*_3, \dots, U^*_n, U^*_{n+1}$]
With the transformed spacings $1 - \exp\{-n(U^*_k - U^*_{k-1})\}$, $k = 1, 2, \dots, n+1$, taken in increasing order,
$$\mathbb{P}\left\{D_n = \sqrt{n}\max_{k=1,2,\dots,n+1}\left|1 - \exp\left\{-n\left(U^*_k - U^*_{k-1}\right)\right\} - \frac{k}{n}\right| \ge y\right\} \approx \mathbb{P}\left\{\max_{0 \le t \le 1}\left|B(t) - tB(1)\right| \ge y\right\}$$
IK Jordan et al (2001) Genome Res 11:555-565
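A sketch of the circular-genome statistic, under two assumptions that are not on the slide: the genome is read starting from one of the genes (so $n + 1$ genes split the circle into $n + 1$ arcs), and the sorted transformed arcs are compared with the usual two-sided KS distance over the $n + 1$ values rather than with $k/n$. The function name is hypothetical; an asymptotic p-value would again come from the Kolmogorov tail.

```python
import math

# Sketch: the function name and the normalization choices are illustrative assumptions.


def circular_spacing_ks(positions, circumference):
    """KS-type statistic for gene spacings on a circular genome.

    The m = n + 1 genes split the circle into n + 1 arcs; each normalized arc g is
    mapped to 1 - exp(-n g), approximately uniform if the genes are placed at random.
    """
    m = len(positions)
    if m < 2:
        raise ValueError("need at least two genes")
    n = m - 1
    pts = sorted(p % circumference for p in positions)
    # Arcs between cyclically consecutive genes, as fractions of the circumference
    arcs = [((pts[(k + 1) % m] - pts[k]) % circumference) / circumference for k in range(m)]
    v = sorted(1.0 - math.exp(-n * g) for g in arcs)
    # Usual two-sided KS distance of the m transformed arcs from the uniform distribution
    d = max(max(k / m - vk, vk - (k - 1) / m) for k, vk in enumerate(v, start=1))
    return math.sqrt(m) * d


print(circular_spacing_ks([0, 25, 50, 75], 100))
print(circular_spacing_ks([10, 12, 15, 17, 60], 100))
```

Because the test is two-sided, very regular spacing also produces a large statistic: four evenly spaced genes give $2(1 - e^{-3/4}) \approx 1.06$.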
Clustering in Bacterial Genomes
Given a set of restriction sites in a genome, do the sites tend to cluster?
S Karlin & C Macken (1991) J Amer Stat Assoc 86:27-35
[Diagram: site positions $0, U^*_1, U^*_2, U^*_3, \dots, U^*_n, U^*_{n+1}$]
With spacings $X_i = U^*_i - U^*_{i-1}$, an $r$-scan is a sum of $r$ consecutive spacings; for $r = 3$: $X_1 + X_2 + X_3,\ X_2 + X_3 + X_4,\ \dots,\ X_{n-1} + X_n + X_{n+1}$.
$m^r_k$ = $k$th minimum in an $r$-scan
$M^r_k$ = $k$th maximum in an $r$-scan
Clustering in Bacterial Genomes
For large $n$, Poisson approximations for the extreme $r$-scans are
$$\mathbb{P}\left\{nM^r_k - \ln n - (r-1)\ln\ln n \le x\right\} \approx \exp(-\lambda_x)\sum_{i=0}^{k-1}\frac{\lambda_x^i}{i!}, \qquad \lambda_x = \frac{e^{-x}}{(r-1)!},$$
$$\mathbb{P}\left\{n^{1+1/r}\,m^r_k \ge x\right\} \approx \exp(-\mu_x)\sum_{i=0}^{k-1}\frac{\mu_x^i}{i!}, \qquad \mu_x = \frac{x^r}{r!}.$$
A Dembo & S Karlin (1992) Ann Appl Prob 2:329-357
C Chen & S Karlin (2000) J Appl Prob 37:865-880
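A sketch of the $r$-scan computation and of the two Poisson approximations, with hypothetical function names. Sites are rescaled to $[0, 1]$, the end points 0 and 1 close the first and last spacings, and the $k$th largest or smallest entry of the returned list gives $M^r_k$ or $m^r_k$. The example sites are made up, and with so few sites the asymptotic approximations are only illustrative.

```python
import math

# Sketch: function names and the example sites are illustrative.


def r_scans(positions, length, r):
    """Sums of r consecutive spacings, with sites rescaled to [0, 1] and 0, 1 as end points."""
    pts = [0.0] + sorted(p / length for p in positions) + [1.0]
    x = [b - a for a, b in zip(pts, pts[1:])]  # the n + 1 spacings X_1, ..., X_{n+1}
    return [sum(x[i:i + r]) for i in range(len(x) - r + 1)]


def poisson_below(k, lam):
    """P{Poisson(lam) <= k - 1} = exp(-lam) * sum_{i=0}^{k-1} lam^i / i!."""
    term = total = 1.0
    for i in range(1, k):
        term *= lam / i
        total += term
    return math.exp(-lam) * total


def max_scan_cdf(x, r, k):
    """Approximate P{n M^r_k - ln n - (r - 1) ln ln n <= x}."""
    return poisson_below(k, math.exp(-x) / math.factorial(r - 1))


def min_scan_tail(x, r, k):
    """Approximate P{n^(1 + 1/r) m^r_k >= x}."""
    return poisson_below(k, x ** r / math.factorial(r))


sites = [120, 400, 410, 415, 2000, 3500, 3510, 7000, 9100]
n, r, k = len(sites), 3, 1
scans = sorted(r_scans(sites, 10_000, r))
x_min = n ** (1 + 1 / r) * scans[k - 1]  # k-th smallest r-scan, rescaled
print("clustering p-value ~", 1.0 - min_scan_tail(x_min, r, k))
```

An unusually short $r$-scan signals clustering, so the approximate p-value is $1 - \mathbb{P}\{n^{1+1/r}m^r_k \ge x\}$ at the observed $x$; an unusually long gap would instead be judged with `max_scan_cdf`.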
Clustering of Conservation
[Diagram: positions 1 to L, each a conserved or non-conserved nucleotide]
After accounting for edge effects, could uniformly random conserved and non-conserved nucleotides be as clustered as the data from intergenic regions?
Alternative with some very long conserved clusters: a scan or local run test is powerful against this alternative.
Alternative with many short conserved clusters: the hypergeometric test offers more power against this alternative.
Clustering of Conservation
Given $m$ conserved positions and $n$ non-conserved positions, calculate the probability that exactly $k$ of the conserved positions are followed by a non-conserved position.
Extreme cases: $k = 0$ or $1$ corresponds to complete separation of conserved and non-conserved positions; $k = \min\{m, n\}$ corresponds to complete mixing.
[Diagram: positions 1 to L, each a conserved or non-conserved nucleotide]
Clustering of Conservation
Given $m$ conserved positions and $n$ non-conserved positions, calculate the probability that exactly $k$ of the conserved positions are followed by a non-conserved position.
Hypergeometric Distribution
$$p(m, n; k) = \binom{m}{k}\binom{n}{k}\bigg/\binom{m+n}{n}$$
[Diagram: positions 1 to L, each a conserved or non-conserved nucleotide]
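The hypergeometric probability and a one-sided p-value (few $1 \to 0$ transitions mean clustered conservation) take a few lines with `math.comb`. A minimal sketch with hypothetical function names; the observed $k$ is simply the number of occurrences of "10" in the 0/1 conservation string, and the example string is illustrative.

```python
from math import comb

# Sketch: function names and the example string are illustrative.


def p_transitions(m, n, k):
    """p(m, n; k) = C(m, k) C(n, k) / C(m + n, n): exactly k 1's followed by a 0."""
    return comb(m, k) * comb(n, k) / comb(m + n, n)


def clustering_pvalue(m, n, k_obs):
    """One-sided p-value P{K <= k_obs}: few 1 -> 0 transitions indicate clustering."""
    return sum(p_transitions(m, n, k) for k in range(k_obs + 1))


pattern = "0110001001110011"  # 1 = conserved, 0 = non-conserved
m, n = pattern.count("1"), pattern.count("0")
k_obs = pattern.count("10")
print(m, n, k_obs, clustering_pvalue(m, n, k_obs))
```

The probabilities sum to 1 over $k$ by Vandermonde's identity, $\sum_k \binom{m}{k}\binom{n}{n-k} = \binom{m+n}{n}$.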
Clustering of Conservation
Count the number of ways of placing $m$ conserved positions (1) and $n$ non-conserved positions (0) so that exactly $k$ of the conserved positions are followed by a non-conserved position (10).
[Diagram: positions 1 to L, each a conserved or non-conserved nucleotide]
01100010011100110110001001110011
01011010000010100110111010011011
Equivalently, count the number of ways of placing $k$ 10's, $n-k$ 0's, and $m-k$ 1's so that none of the 1's is followed by a 0.
Clustering of Conservation
Count the number of ways of placing $k$ 10's, $n-k$ 0's, and $m-k$ 1's so that no 0 follows a 1.
Place the $k$ 10's and $m-k$ 1's in arbitrary order: $\binom{m}{k}$ ways.
111010 1010 11 111010 11 11
01011010000010100110111010011011
Place the $n-k$ 0's in $k+1$ bins (before the first token or immediately after a 10): $\binom{n}{k}$ ways.
0.00.0.0 0.00.0.0
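The two-step construction can be checked against direct enumeration for small $m$ and $n$. In this sketch (hypothetical helper names), the number of ways to drop $n - k$ identical 0's into $k + 1$ bins is the stars-and-bars count $\binom{(n-k) + (k+1) - 1}{k} = \binom{n}{k}$.

```python
from itertools import combinations
from math import comb

# Sketch: helper names are illustrative, not from the original slides.


def count_by_enumeration(m, n, k):
    """Arrangements of m 1's and n 0's in which exactly k 1's are followed by a 0."""
    length = m + n
    total = 0
    for ones in combinations(range(length), m):
        s = ["0"] * length
        for i in ones:
            s[i] = "1"
        # "10" cannot overlap itself, so str.count gives the number of 1 -> 0 transitions
        if "".join(s).count("10") == k:
            total += 1
    return total


def count_by_construction(m, n, k):
    """Order k '10' tokens and m - k '1' tokens, then drop n - k 0's into k + 1 bins."""
    if k > m or k > n:
        return 0
    orderings = comb(m, k)
    bins = comb((n - k) + (k + 1) - 1, k)  # stars and bars, equals comb(n, k)
    return orderings * bins


for m in range(5):
    for n in range(5):
        assert all(count_by_enumeration(m, n, k) == count_by_construction(m, n, k)
                   for k in range(5))
print("construction matches enumeration for m, n < 5")
```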
PSSM Motif Clustering
Given PSSM signals in pieces of DNA, does any piece have an unusual number of signals?
For 1/0 signals, a $\chi^2$ test suffices.
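A sketch of the $\chi^2$ statistic for 1/0 signals, assuming the expected number of signals in each piece is proportional to its length (an assumption not spelled out on the slide). The per-piece contributions show which piece drives a large statistic; with SciPy, `scipy.stats.chi2.sf(stat, df)` would give the p-value. The function name and example counts are hypothetical, and the chi-square approximation is commonly considered unreliable when expected counts are small.

```python
# Sketch: the function name and example data are illustrative.


def chi_square_counts(counts, lengths):
    """Pearson chi-square for signal counts in DNA pieces.

    Assumes the expected count in each piece is proportional to its length.
    Returns (statistic, degrees of freedom, per-piece contributions).
    """
    if len(counts) != len(lengths) or len(counts) < 2:
        raise ValueError("need matching counts and lengths for at least two pieces")
    total, total_len = sum(counts), sum(lengths)
    if total == 0:
        raise ValueError("no signals observed")
    contributions = []
    for c, length in zip(counts, lengths):
        expected = total * length / total_len
        contributions.append((c - expected) ** 2 / expected)
    return sum(contributions), len(counts) - 1, contributions


stat, df, parts = chi_square_counts([4, 6, 19, 5], [500, 800, 900, 700])
print(stat, df, [round(p, 2) for p in parts])
```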
PSSM Motif Clustering
Given PSSM signals in pieces of DNA, does any piece have an unusual number of signals?
Consider the strength of the signal.
M Frith & Z Weng (2001)
PSSM Motif Clustering
Given PSSM signals in pieces of DNA, does any piece have an unusual number of signals?
(1) Assume an independent, identically distributed DNA base composition.
(2) Assume the PSSM signals, appropriately truncated, follow a compound Poisson process with rate $\lambda$ and signal strengths $Z \ge 0$.
S Schbath et al. (1998) J Comp Biol 5:223-253
PSSM Motif Clustering
[Diagram: compound Poisson process with rate $\lambda$ and signal strengths $Z \ge 0$ along "time" from 0 to $L$]
The tail probability $\mathbb{P}\{N_Z(L) \ge t\}$ can be calculated by small sample asymptotics, using the cumulant generating function of the sum of signals $N_Z(L)$:
$$\mathbb{E}\exp\{\theta N_Z(L)\} = \exp\left\{\lambda L\left(\mathbb{E}e^{\theta Z} - 1\right)\right\}.$$
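Small sample asymptotics here means a saddlepoint approximation. The sketch below uses the Lugannani-Rice formula, one standard choice, for $\mathbb{P}\{N_Z(L) \ge s\}$ when the signal strength $Z$ takes finitely many values; `mu` stands for $\lambda L$, and the function names and example inputs are hypothetical. It treats the sum as continuous: for integer-valued signals a continuity correction would be needed, and the uncorrected formula can noticeably understate the tail.

```python
import math

# Sketch: function names and example inputs are illustrative assumptions.


def cpp_cgf(theta, mu, values, probs):
    """K(theta) = mu * (E exp(theta Z) - 1) with K'(theta), K''(theta); mu = lambda * L."""
    e0 = sum(p * math.exp(theta * z) for z, p in zip(values, probs))
    e1 = sum(p * z * math.exp(theta * z) for z, p in zip(values, probs))
    e2 = sum(p * z * z * math.exp(theta * z) for z, p in zip(values, probs))
    return mu * (e0 - 1.0), mu * e1, mu * e2


def solve_saddlepoint(s, mu, values, probs):
    """Bisection for theta > 0 with K'(theta) = s (needs s above the mean)."""
    if max(values) <= 0.0:
        raise ValueError("need some positive signal strength")
    if s <= cpp_cgf(0.0, mu, values, probs)[1]:
        raise ValueError("s must exceed the mean mu * E Z")
    lo, hi = 0.0, 1.0
    while cpp_cgf(hi, mu, values, probs)[1] < s:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cpp_cgf(mid, mu, values, probs)[1] < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cpp_tail(s, mu, values, probs):
    """Lugannani-Rice saddlepoint approximation to P{N_Z(L) >= s}."""
    theta = solve_saddlepoint(s, mu, values, probs)
    k, _, k2 = cpp_cgf(theta, mu, values, probs)
    w = math.sqrt(2.0 * (theta * s - k))
    u = theta * math.sqrt(k2)
    upper_normal = 0.5 * math.erfc(w / math.sqrt(2.0))
    density = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return upper_normal + density * (1.0 / u - 1.0 / w)


# Hypothetical example: signal strengths 1, 2, 3 with probabilities 0.5, 0.3, 0.2
print(cpp_tail(25.0, 8.0, [1.0, 2.0, 3.0], [0.5, 0.3, 0.2]))
```

With a single signal strength ($Z \equiv 1$), $N_Z(L)$ is Poisson with mean $\mu$ and the saddlepoint is $\hat\theta = \ln(s/\mu)$, which is a convenient check.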
PSSM Motif Clustering
Given different PSSM signals in a piece of DNA, are any of the signals unusually concentrated?
(1) Assume an independent, identically distributed DNA base composition.
(2) Assume the PSSM signals, appropriately truncated, follow a compound Poisson process with rate $\lambda$ and signal strengths $Z \ge 0$.
PSSM Motif Clustering
$$\hat M(L) = \sup_{0 \le u \le v \le L}\left[S(v) - S(u)\right], \qquad S(t) = N_Z(t) - at$$
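Between signals $S(t)$ only decreases, so the supremum is reached just after a signal, and $\hat M(L)$ can be computed in one pass with the same renewal-style recursion used for local alignment scores: carry the current local score forward, subtract the drift $a$ times the distance travelled, reset at 0, and add each signal strength. A minimal sketch with a hypothetical function name; events are `(position, strength)` pairs and the example signals are made up.

```python
# Sketch: the function name and example signals are illustrative.


def local_score(events, a, length):
    """M^(L) = sup_{0 <= u <= v <= L} [S(v) - S(u)] with S(t) = N_Z(t) - a t.

    events: (position, strength) pairs; strengths are assumed nonnegative and
    only events with 0 <= position <= length are used.
    """
    best = 0.0
    current = 0.0  # local score: S(t) minus its running minimum, just after the last event
    last = 0.0
    for pos, z in sorted(e for e in events if 0.0 <= e[0] <= length):
        # Drift lowers the score between events; a score below 0 renews at 0
        current = max(0.0, current - a * (pos - last)) + z
        best = max(best, current)
        last = pos
    return best


signals = [(120.0, 4.2), (135.0, 3.1), (150.0, 5.0), (900.0, 6.0)]
print(local_score(signals, a=0.1, length=1000.0))
```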
Alignment Matrices
Alignment Score Renewal
[Figure: local alignment score on a single diagonal; whenever the local score falls below 0, a renewal occurs]
$K$ = random renewal length
Alignment Score Success
[Figure: local alignment score on a single diagonal; a success occurs when the local score rises above $y$]
$\mathbb{P}(\mathcal{E}_y)$ = probability of success
HSP Poisson Distribution
S Karlin & A Dembo (1992) Adv Appl Prob 24:113
For an $m \times n$ alignment matrix, let $N_y$ be the number of successes (HSPs with local score above $y$). Then
$$0 < \mathbb{E} = \lim_{y\to\infty} mn\,\frac{\mathbb{P}(\mathcal{E}_y)}{\mathbb{E}K}, \qquad \lim_{y\to\infty}\mathbb{P}(\mathcal{E}_y) = 0,$$
$$\lim_{y\to\infty}\mathbb{P}\{N_y = j\} = e^{-\mathbb{E}}\,\frac{\mathbb{E}^j}{j!}.$$
Finite-Size Effect
[Figure: local alignment score on a single diagonal; success = local score above $y$]
$\mathbb{E}(T \mid \mathcal{E}_y)$ = expected time to success
Finite-Size Effect
$$\hat{\mathbb{E}}_{AG} = \lim_{y\to\infty}\left[m - \mathbb{E}(T \mid \mathcal{E}_y)\right]\left[n - \mathbb{E}(T \mid \mathcal{E}_y)\right]\frac{\mathbb{P}(\mathcal{E}_y)}{\mathbb{E}K}, \qquad \lim_{y\to\infty}\mathbb{P}(\mathcal{E}_y) = 0,$$
$$\lim_{y\to\infty}\mathbb{P}\{N_y = j\} = e^{-\hat{\mathbb{E}}_{AG}}\,\frac{\hat{\mathbb{E}}_{AG}^{\,j}}{j!}.$$
S Altschul & W Gish (1996) Methods Enzymology 266
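A sketch of how the quantities on the last slides combine at a fixed score threshold $y$, with hypothetical parameter names: `p_success` stands for $\mathbb{P}(\mathcal{E}_y)$, `mean_renewal` for $\mathbb{E}K$, and `mean_time_to_success` for $\mathbb{E}(T \mid \mathcal{E}_y)$, all of which would have to be estimated separately. Setting the last one to 0 recovers the uncorrected count $mn\,\mathbb{P}(\mathcal{E}_y)/\mathbb{E}K$. The example inputs are made up.

```python
import math

# Sketch: parameter names and example inputs are illustrative assumptions.


def expected_hsps(m, n, p_success, mean_renewal, mean_time_to_success=0.0):
    """Finite-y version of [m - E(T|E_y)] [n - E(T|E_y)] P(E_y) / E K."""
    m_eff = max(0.0, m - mean_time_to_success)
    n_eff = max(0.0, n - mean_time_to_success)
    return m_eff * n_eff * p_success / mean_renewal


def hsp_pmf(j, expected):
    """Poisson probability of exactly j HSPs above y."""
    return math.exp(-expected) * expected ** j / math.factorial(j)


def prob_at_least_one(expected):
    """P{N_y >= 1} = 1 - exp(-E)."""
    return 1.0 - math.exp(-expected)


# Hypothetical inputs for a 300 x 5000 comparison
plain = expected_hsps(300, 5000, 2e-7, 1.5)
corrected = expected_hsps(300, 5000, 2e-7, 1.5, mean_time_to_success=40.0)
print(plain, corrected, prob_at_least_one(corrected))
```

The correction shrinks the effective search space from $mn$ to $[m - \mathbb{E}(T \mid \mathcal{E}_y)][n - \mathbb{E}(T \mid \mathcal{E}_y)]$, which matters most when $m$ or $n$ is short.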
PSSM Motif Clustering
$$\hat M(L) = \sup_{0 \le u \le v \le L}\left[S(v) - S(u)\right]$$
L
*
*
2
0 *
1
1y
Z
Ze aL
Ze
E
E
M̂ L y e P
* *exp 1Z E
1a
Summary of Techniques
Clustering in bacterial genomes
  Minimum distance statistic (with Fisher omnibus test)
  Kolmogorov-Smirnov tests
  Scan tests
  Local run (BLAST-like) tests using Poisson process (PP)
Clustering of intergenic conservation
  Hypergeometric test
Clustering of PSSM motifs
  Chi-square for 1/0 “motifs”
  Small sample asymptotic methods for PSSM motifs
  Local run (BLAST-like) tests for PSSM motifs using compound PP