+ All Categories
Home > Documents > Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides...

Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides...

Date post: 23-Jan-2016
Category:
View: 214 times
Download: 0 times
Share this document with a friend
Popular Tags:
73
http://creativecommons.org/licenses/by- sa/2.0/
Transcript
Page 1: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

http://creativecommons.org/licenses/by-sa/2.0/

Page 2: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

BNFO 602, Lecture 2

Usman Roshan

Some of the slides are based upon material by David Wishart of University of Alberta and Ron Shamirof Tel Aviv University

Page 3: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Previously…

• Model of DNA sequence evolution– Poisson model under two state characters– Derivation of expected number of changes on

a single edge of a tree– Jukes-Cantor for four state model (DNA)– Estimating expected number of changes of

two DNA sequences using maximum likelihood

Page 4: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Previously…

• Distance based phylogeny reconstruction methods– UPGMA --- not additive– Neighbor Joining --- additive– Both are fast --- polynomial time– Easy to implement

Page 5: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Previously

• Simulation – Method for simulating evolution of DNA

sequences on a fixed tree– Comparing two different phylogenies for

computing the error rate– Effect of accuracy on real methods as a

function of sequence length, number of sequences, and other factors

Page 6: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Sequencing SuccessesT7 bacteriophagecompleted in 198339,937 bp, 59 coded proteins

Escherichia colicompleted in 19984,639,221 bp, 4293 ORFs

Sacchoromyces cerevisaecompleted in 199612,069,252 bp, 5800 genes

Page 7: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Sequencing SuccessesCaenorhabditis eleganscompleted in 199895,078,296 bp, 19,099 genes

Drosophila melanogastercompleted in 2000116,117,226 bp, 13,601 genes

Homo sapienscompleted in 20033,201,762,515 bp, 31,780 genes

Page 8: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Genomes to Date• 8 vertebrates (human, mouse, rat, fugu, zebrafish)• 3 plants (arabadopsis, rice, poplar)• 2 insects (fruit fly, mosquito)• 2 nematodes (C. elegans, C. briggsae)• 1 sea squirt• 4 parasites (plasmodium, guillardia)• 4 fungi (S. cerevisae, S. pombe)• 200+ bacteria and archebacteria• 2000+ viruses

Page 9: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

So what do we do with all this sequence data?

Page 10: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

So what do we do with all So what do we do with all this sequence data?this sequence data?

Comparative bioinformatics

Page 11: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

DNA Sequence Evolution

AAGACTT -3 mil yrs

-2 mil yrs

-1 mil yrs

today

AAGACTT

T_GACTTAAGGCTT

_GGGCTT TAGACCTT A_CACTT

ACCTT (Cat)

ACACTTC (Lion)

TAGCCCTTA (Monkey)

TAGGCCTT (Human)

GGCTT(Mouse)

T_GACTTAAGGCTT

AAGACTT

_GGGCTT TAGACCTT A_CACTT

AAGGCTT T_GACTT

AAGACTT

TAGGCCTT (Human)

TAGCCCTTA (Monkey)

A_C_CTT (Cat)

A_CACTTC (Lion)

_G_GCTT (Mouse)

_GGGCTT TAGACCTT A_CACTT

AAGGCTT T_GACTT

AAGACTT

Page 12: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Sequence alignments

They tell us about

• Function or activity of a new gene/protein

• Structure or shape of a new protein

• Location or preferred location of a protein

• Stability of a gene or protein

• Origin of a gene or protein

• Origin or phylogeny of an organelle

• Origin or phylogeny of an organism

• And more…

Page 13: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Pairwise alignment

• How to align two sequences?

Page 14: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Pairwise alignment

• How to align two sequences?How to align two sequences?• We use dynamic programming• Treat DNA sequences as strings over the

alphabet {A, C, G, T}

Page 15: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Pairwise alignment

Page 16: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Dynamic programming

Define V(i,j) to be the optimal pairwise alignment score between S1..i and T1..j (|S|=m, |T|=n)

Page 17: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Dynamic programming

Time and space complexity is O(mn)

Define V(i,j) to be the optimal pairwise alignment score between S1..i and T1..j (|S|=m, |T|=n)

Page 18: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Tabular computation of scores

Page 19: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Traceback to get alignment

Page 20: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Local alignment

Finding optimally aligned local regions

Page 21: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Local alignment

Page 22: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Database searching

• Suppose we have a set of 1,000,000 sequences

• You have a query sequence q and want to find the m closest ones in the database---that means 1,000,000 pairwise alignments!

• How to speed up pairwise alignments?

Page 23: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

FASTA

• FASTA was the first software for quick searching of a database

• Introduced the idea of searching for k-mers

• Can be done quickly by preprocessing database

Page 24: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

FASTA: combine high scoring hits into diagonal runs

Page 25: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

BLAST

Key idea: search for k-mers (short matchig substrings) quickly by preprocessing the database.

Page 26: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

BLAST

This key idea can also be used for speeding up pairwise alignments when doing multiple sequence alignments

Page 27: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Biologically realistic scoring matrices

• PAM and BLOSUM are most popular

• PAM was developed by Margaret Dayhoff and co-workers in 1978 by examining 1572 mutations between 71 families of closely related proteins

• BLOSUM is more recent and computed from blocks of sequences with sufficient similarity

Page 28: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

PAM

• We need to compute the probability transition matrix M which defines the probability of amino acid i converting to j

• Examine a set of closely related sequences which are easy to align---for PAM 1572 mutations between 71 families

• Compute probabilities of change and background probabilities by simple counting

Page 29: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

PAM• In this model the unit of evolution is the amount

of evolution that will change 1 in 100 amino acids on the average

The scoring matrix Sab is the ratio of Mab to pb

Page 30: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

PAM Mij matrix (x10000)

Page 31: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Multiple sequence alignment

• “Two sequences whisper, multiple sequences shout out loud”---Arthur Lesk

• Computationally very hard---NP-hard

Page 32: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Formally…

Page 33: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Multiple sequence alignment

Unaligned sequences

GGCTT

TAGGCCTT

TAGCCCTTA

ACACTTC

ACTT

Aligned sequences

_G_ _ GCTT_

TAGGCCTT_

TAGCCCTTA

A_ _CACTTC

A_ _C_ CTT_ Conserved regions help us to identify functionality

Page 34: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Sum of pairs score

Page 35: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Sum of pairs score

• What is the sum of pairs score of this alignment?

Page 36: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Tree alignment score

Page 37: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Tree alignment score

Page 38: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Tree Alignment

TAGGCCTT (Human)

TAGCCCTTA (Monkey)

ACCTT (Cat)

ACACTTC (Lion)

GGCTT (Mouse)

Page 39: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Tree Alignment

TAGGCCTT_ (Human)

TAGCCCTTA (Monkey)

A__C_CTT_ (Cat)

A__CACTTC (Lion)

_G__GCTT_ (Mouse)

TAGGCCTT_ A__CACTT_

TGGGGCTT_

AGGGACTT_

0 2

2

11

3

3

2

Tree alignment score = 14

Page 40: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Tree Alignment---depends on tree

TAGGCCTT_ (Human)

TAGCCCTTA (Monkey)

A__C_CTT_ (Cat)

A__CACTTC (Lion)

_G__GCTT_ (Mouse)

TA_CCCTT_ TA_CCCTTA

TA_CCCTT_

TA_CCCTTA

2 3

1

41

0

4

0

Tree alignment score = 15 Switch monkey and cat

Page 41: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Profiles

• Before we see how to construct multiple alignments, how do we align two alignments?

• Idea: summarize an alignment using its profile and align the two profiles

Page 42: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Profile alignment

Page 43: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Iterative alignment(heuristic for sum-of-pairs)

• Pick a random sequence from input set S• Do (n-1) pairwise alignments and align to

closest one t in S• Remove t from S and compute profile of

alignment• While sequences remaining in S

– Do |S| pairwise alignments and align to closest one t

– Remove t from S

Page 44: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Iterative alignment

• Once alignment is computed randomly divide it into two parts

• Compute profile of each sub-alignment and realign the profiles

• If sum-of-pairs of the new alignment is better than the previous then keep, otherwise continue with a different division until specified iteration limit

Page 45: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Progressive alignment

• Idea: perform profile alignments in the order dictated by a tree

• Given a guide-tree do a post-order search and align sequences in that order

• Widely used heuristic

• Can be used for solving tree alignment

Page 46: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Simultaneous alignment and phylogeny reconstruction

• Given unaligned sequences produce both alignment and phylogeny

• Known as the generalized tree alignment problem---MAX-SNP hard

• Iterative improvement heuristic:– Take starting tree– Modify it using say NNI, SPR, or TBR– Compute tree alignment score– If better then select tree otherwise continue until

reached a local minimum

Page 47: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Median alignment

• Idea: iterate over the phylogeny and align every triplet of sequences---takes o(m3) (in general for n sequences it takes O(2nmn) time

• Same profiles can be used as in progressive alignment

• Produces better tree alignment scores (as observed in experiments)

• Iteration continues for a specified limit

Page 48: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Popular alignment programs

• ClustalW: most popular, progressive alignment• MUSCLE: fast and accurate, progressive and

iterative combination• T-COFFEE: slow but accurate, consistency

based alignment (align sequences in multiple alignment to be close to the optimal pairwise alignment)

• PROBCONS: slow but highly accurate, probabilistic consistency progressive based scheme

• DIALIGN: very good for local alignments

Page 49: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

MUSCLE

Page 50: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

MUSCLE

Page 51: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

MUSCLE

Profile sum-of-pairs score

Log expectation score used by MUSCLE

Page 52: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Evaluation of multiple sequence alignments

• Compare to benchmark “true” alignments

• Use simulation

• Measure conservation of an alignment

• Measure accuracy of phylogenetic trees

• How well does it align motifs?

• More…

Page 53: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

BAliBASE

• Most popular benchmark of alignments

• Alignments are based upon structure

BAliBASE currently consists of 142 reference alignments, containing over 1000 sequences. Of the 200,000 residues in the database, 58% are defined within the core blocks. The remaining 42% are in ambiguous regions that cannot be reliably aligned. The alignments are divided into four hierarchical reference sets, reference 1 providing the basis for construction of the following sets. Each of the main sets may be further sub-divided into smaller groups, according to sequence length and percent similarity.

Page 54: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

BAliBASE

• The sequences included in the database are selected from alignments in either the FSSP or HOMSTRAD structural databases, or from manually constructed structural alignments taken from the literature. When sufficient structures are not available, additional sequences are included from the HSSP database (Schneider et al., 1997). The VAST Web server (Madej, 1995) is used to confirm that the sequences in each alignment are structural neighbours and can be structurally superimposed. Functional sites are identified using the PDBsum database (Laskowski et al., 1997) and the alignments are manually verified and adjusted, in order to ensure that conserved residues are aligned as well as the secondary structure elements.

Page 55: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

BAliBASE

• Reference 1 contains alignments of (less than 6) equi-distant sequences, ie. the percent identity between two sequences is within a specified range. All the sequences are of similar length, with no large insertions or extensions. Reference 2 aligns up to three "orphan" sequences (less than 25% identical) from reference 1 with a family of at least 15 closely related sequences. Reference 3 consists of up to 4 sub-groups, with less than 25% residue identity between sequences from different groups. The alignments are constructed by adding homologous family members to the more distantly related sequences in reference 1. Reference 4 is divided into two sub-categories containing alignments of up to 20 sequences including N/C-terminal extensions (up to 400 residues), and insertions (up to 100 residues).

Page 56: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Comparison of alignments on BAliBASE

Page 57: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Parsimonious aligner (PAl)

1. Construct progressive alignment A

2. Construct MP tree T on A

3. Construct progressive alignment A’ on guide-tree T

4. Set A=A’ and go to 3

5. Output alignment and tree with best MP score

Page 58: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

PAl

• Faster than iterative improvement• Speed and accuracy both depend upon

progressive alignment and MP heuristic• In practice MUSCLE and TNT are used for

constructing alignments and MP trees• How does PAl compare against traditional

methods?• PAl not designed for aligning structural regions

but focuses on evolutionary conserved regions • Let’s look at performance under simulation

Page 59: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Evaluating alignments under simulation

• We first need a way to evolve sequences with insertions and deletions

• NOTE: evolutionary models we have encountered so far do not account for insertions and deletions

• Not known exactly how to model insertions and deletions

Page 60: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

ROSE

• Evolve sequences under an i.i.d. Markov Model• Root sequence: probabilities given by a probability vector

(for proteins default is Dayhoff et. al. values)• Substitutions

– Edge length are integers– Probability matrix M is given as input (default is PAM1*)– For edge of length b probabilty of x y is given by Mb

xy

• Insertion and deletions:– Insertions and deletions follow the same probabilistic model– For each edge probability to insert is iins . – Length of insertion is given by discrete probability distribution

(normally exponential)– For edge of length b this is repeated b times.

• Model tree can be specified as input

Page 61: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Evaluation of alignments

Let’s simulate alignments and

phylogenies and compare them under

simulation!!

Page 62: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Parameters for simulation study

• Model trees: uniform random distribution and uniformly selected random edge lengths

• Model of evolution: PAM with insertions and deletions probabilities selected from a gamma distribution (see ROSE software package)

• Replicate settings: Settings of 50, 100, and 400 taxa, mean sequence lengths of 200 and 500 and avg branch lengths of 10, 25, and 50 were selected. For each setting 10 datasets were produced

Page 63: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Phylogeny accuracy

Page 64: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Alignment accuracy

Page 65: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Running time

Page 66: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Conclusions

• DIALIGN seems to perform best followed by PAl, MUSCLE, and PROBCONS

• DIALIGN, however, is slower than PAl

• Does this mean DIALIGN is the best alignment program?

Page 67: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Conclusions

• DIALIGN seems to perform best followed by PAl, MUSCLE, and PROBCONS

• DIALIGN, however, is slower than PAl• Does this mean DIALIGN is the best

alignment program?• Not necessarily: experiments were

performed under uniform random trees with uniform random edge lengths. Not clear if this emulates the real deal.

Page 68: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Conclusions

• DIALIGN seems to perform best followed DIALIGN seems to perform best followed by PAl, MUSCLE, and PROBCONSby PAl, MUSCLE, and PROBCONS

• DIALIGN, however, is slower than PAlDIALIGN, however, is slower than PAl• Does this mean DIALIGN is the best Does this mean DIALIGN is the best

alignment program?alignment program?• Not necessarily: experiments were Not necessarily: experiments were

performed under uniform random trees performed under uniform random trees with uniform random edge lengths. Not with uniform random edge lengths. Not clear if this emulates the real deal.clear if this emulates the real deal.

• What about sum-of-pairs vs MP scores?

Page 69: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Sum-of-pairs vs MP score

Page 70: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Sum-of-pairs vs MP score

Page 71: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Conclusions

• Optimizing MP scores under this simulation model leads to better phylogenies and alignments

Page 72: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Conclusions

• Optimizing MP scores under this simulation model leads to better phylogenies and alignments

• What other models can we try?

Page 73: Http://creativecommons.org/licenses/by-sa/2.0/. BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.

Conclusions

• Optimizing MP scores under this simulation model leads to better phylogenies and alignments

• What other models can we try?• Real data phylogenies as model trees• Birth-death model trees• Other distributions for model trees…• Branch lengths: similar issues…• Evolutionary model parameters estimated

from real data


Recommended