1
Linear Sequence Alignment
Travis Hillenbrand
2
Methods of Comparison
Dot Matrix
Dynamic Programming Algorithm
Greedy X-drop Approach
Linear Alignment
3
Dot Matrix Method
http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html
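A dot matrix is simple enough to sketch in a few lines. The example below is only an illustration, not the tool behind the URL above; the names `dot_matrix` and `render` and the `*` and `.` markers are assumptions made for this sketch. It marks every cell where a base of one sequence equals a base of the other, so shared regions appear as diagonal runs.

```python
def dot_matrix(seq_a, seq_b):
    """Return one string per base of seq_a: '*' where it equals the base of seq_b, '.' otherwise."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


def render(seq_a, seq_b):
    """Format the dot matrix with seq_b across the top and seq_a down the side."""
    lines = ["  " + seq_b]
    lines += [f"{a} {row}" for a, row in zip(seq_a, dot_matrix(seq_a, seq_b))]
    return "\n".join(lines)


print(render("GAC", "GATC"))
```

The break in the diagonal between A and C is where the extra T of the longer sequence sits.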
4
Sequence Alignment
ATCGATACG, ATGGATTACG
3 possibilities
Mismatch
…C…
…G…
Indel
…C…
…-…
Match
…C…
 |
…C…
5
Global Pairwise Alignment
ATCGATACG, ATGGATTACG
ATCGAT-ACG
|| ||| |||
ATGGATTACG
Matches: +1 +1 +1 +1 +1 +1 +1 +1 = +8
Mismatches: -1 = -1
Gaps: -2 = -2
Total score = +5
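The score above can be reproduced column by column. The sketch below assumes the scoring values implied by the slide (match +1, mismatch -1, gap -2); the function name `score_alignment` is hypothetical.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two aligned strings of equal length; '-' marks an indel."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    total = (counts["match"] * match
             + counts["mismatch"] * mismatch
             + counts["gap"] * gap)
    return total, counts


total, counts = score_alignment("ATCGAT-ACG", "ATGGATTACG")
print(counts, total)
```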
6
Dynamic Programming
    -   G   A   T   C
-   0
G
A
C
Global alignment (Needleman-Wunsch) algorithm
7
Dynamic Programming
    -   G   A   T   C
-   0  -2  -4  -6  -8
G
A
C
Global alignment (Needleman-Wunsch) algorithm
8
Dynamic Programming
    -   G   A   T   C
-   0  -2  -4  -6  -8
G  -2
A  -4
C  -6
Global alignment (Needleman-Wunsch) algorithm
9
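Dynamic Programming
Each cell = max(diagonal + MATCH or MISMATCH, left + GAP, above + GAP)
    -   G   A   T   C
-   0  -2  -4  -6  -8
G  -2  +1
A  -4
C  -6
Cell (G, G): diagonal 0 + 1 = +1, left -2 - 2 = -4, above -2 - 2 = -4; Max = 1
Global alignment (Needleman-Wunsch) algorithm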
10
Dynamic Programming
    -   G   A   T   C
-   0  -2  -4  -6  -8
G  -2   1  -1  -3  -5
A  -4  -1   2   0  -2
C  -6  -3   0   1   1
Global alignment (Needleman-Wunsch) algorithm
11
Dynamic Programming
    -   G   A   T   C
-   0  -2  -4  -6  -8
G  -2   1  -1  -3  -5
A  -4  -1   2   0  -2
C  -6  -3   0   1   1
GATC
|| |
GA-C
Global alignment (Needleman-Wunsch) algorithm
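The matrix fill and traceback on the preceding slides can be sketched as follows. This is a minimal illustration, assuming match +1, mismatch -1 and gap -2 (the values that reproduce the matrix shown); the function name and the tie-breaking order in the traceback (diagonal, then left, then up) are choices made for this sketch.

```python
def needleman_wunsch(rows, cols, match=1, mismatch=-1, gap=-2):
    """Global alignment of `rows` (matrix rows) against `cols` (matrix columns)."""
    n, m = len(rows), len(cols)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: leading gaps
        F[i][0] = i * gap
    for j in range(1, m + 1):          # first row: leading gaps
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if rows[i - 1] == cols[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # diagonal
                          F[i - 1][j] + gap,     # above
                          F[i][j - 1] + gap)     # left

    # Traceback from the bottom-right corner to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if rows[i - 1] == cols[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(cols[j - 1])
                bottom.append(rows[i - 1])
                i, j = i - 1, j - 1
                continue
        if j > 0 and F[i][j] == F[i][j - 1] + gap:
            top.append(cols[j - 1])
            bottom.append("-")
            j -= 1
        else:
            top.append("-")
            bottom.append(rows[i - 1])
            i -= 1
    return F, "".join(reversed(top)), "".join(reversed(bottom))


F, top, bottom = needleman_wunsch("GAC", "GATC")
for row in F:
    print(row)
print(top)
print(bottom)
```

Filling the matrix takes time and memory proportional to the product of the two sequence lengths, which is the cost the other methods in this talk try to reduce.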
12
Greedy X-drop Alignment
Aligns sequences that differ by sequencing errors
Works with measure of difference
Restricts indel penalty: indel = mis - mat/2
Zhang et al. 2000
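The restriction on the indel penalty is what lets the method work with a count of differences instead of a score. When indel = mis - mat/2, an alignment that covers i bases of one sequence and j bases of the other with d differences (mismatches plus indels) scores (i + j)·mat/2 - d·(mat - mis), which follows from counting columns. The sketch below checks that identity on the example alignment used earlier; the function names are hypothetical, and mat=10, mis=-6 are the values quoted later in this talk.

```python
def indel_penalty(mat, mis):
    """Indel penalty tied to the match and mismatch scores."""
    return mis - mat / 2


def score_from_differences(i, j, d, mat, mis):
    """Score implied by the consumed lengths i, j and the number of differences d."""
    return (i + j) * mat / 2 - d * (mat - mis)


def direct_score(top, bottom, mat, mis):
    """Column-by-column score of an alignment, using the tied indel penalty."""
    ind = indel_penalty(mat, mis)
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += ind
        elif a == b:
            score += mat
        else:
            score += mis
    return score


top, bottom = "ATCGAT-ACG", "ATGGATTACG"
i = len(top.replace("-", ""))                       # bases consumed from the first sequence
j = len(bottom.replace("-", ""))                    # bases consumed from the second sequence
d = sum(1 for a, b in zip(top, bottom) if a != b)   # mismatches + indels

print(indel_penalty(10, -6))
print(direct_score(top, bottom, 10, -6), score_from_differences(i, j, d, 10, -6))
```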
13
Greedy X-drop Alignment
Zhang et al. 2000
14
Greedy X-drop Alignment
C
A 0
G 0
- 0
- G A T C
15
Greedy X-drop Alignment
C 1 1 1
A 0 1
G 0
- 0
- G A T C
X-drop condition saves computation
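To show why an X-drop condition saves work, the sketch below applies it to a deliberately simplified case: an ungapped extension that stops once the running score falls more than X below the best score seen so far. This is not the greedy algorithm of Zhang et al. 2000, which also handles indels; the function name and the default scores are assumptions for illustration.

```python
def xdrop_extend(a, b, mat=1, mis=-1, x=2):
    """Extend an ungapped alignment from the start of both sequences.

    Returns (best score, length at which it was reached, columns examined).
    """
    best = score = 0
    best_len = examined = 0
    for ca, cb in zip(a, b):
        examined += 1
        score += mat if ca == cb else mis
        if score > best:
            best, best_len = score, examined
        elif best - score > x:
            break  # X-drop: the score has fallen too far, stop extending
    return best, best_len, examined


a = "GATC" + "A" * 20
b = "GATC" + "T" * 20
print(xdrop_extend(a, b))
```

Here only 7 of the 24 columns are examined before the extension is abandoned, and the best-scoring prefix (the four matching bases) is still found.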
16
Linear Alignment
17
Linear Alignment
Index of coincidence
– Maximum number of matches between two sequences
– Ungapped alignment
ATCGATACG
ATGGATTACG

ATCGATACG
ATGGATTACG

ATCGATACG
ATGGATTACG
|

ATCGATACG
ATGGATTACG
|| |||

ATCGATACG
ATGGATTACG
…
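The sliding comparison above can be sketched as a brute-force search over every ungapped offset. The function names and the sign convention for the offset (position i of the first sequence is placed over position i + offset of the second) are assumptions for this sketch, not necessarily those of the author's program.

```python
def matches_at_offset(a, b, offset):
    """Count matching positions when a[i] is placed over b[i + offset], with no gaps."""
    return sum(1 for i, ch in enumerate(a)
               if 0 <= i + offset < len(b) and ch == b[i + offset])


def max_coincidence(a, b):
    """Return (offset, matches) for the ungapped offset with the most matches."""
    offsets = range(-(len(a) - 1), len(b))
    best = max(offsets, key=lambda k: matches_at_offset(a, b, k))
    return best, matches_at_offset(a, b, best)


a, b = "ATCGATACG", "ATGGATTACG"
print(max_coincidence(a, b))
print(matches_at_offset(a, b, 1), matches_at_offset(a, b, -1))
```

For the two example sequences the best offset is 0 with 5 matches, which is the `|| |||` pairing shown above.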
18
Linear Alignment
Attempt to increase similarity
ATCGATACG
|| |||
ATGGATTACG

-ATCGATACG
      ||||
ATGGATTACG

ATCGATACG
   |
-ATGGATTACG

Window score: 2 -3 -3

ATCGATACG
|| |||
ATGGATTACG
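The slide does not say how the window score is computed. As one possible reading, the sketch below scores only the first few columns of each candidate (no gap, gap in the first sequence, gap in the second). The window width of 2 and the scores +1, -1, -2 are assumptions, chosen because they reproduce the three numbers on the slide; the function name is hypothetical.

```python
def window_score(top, bottom, width, match=1, mismatch=-1, gap=-2):
    """Score the first `width` columns of two aligned strings ('-' marks an indel)."""
    score = 0
    for a, b in list(zip(top, bottom))[:width]:
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


a, b = "ATCGATACG", "ATGGATTACG"
candidates = [(a, b), ("-" + a, b), (a, "-" + b)]
scores = [window_score(t, u, width=2) for t, u in candidates]
print(scores)
```

Under these assumptions neither gap improves the window score, so the ungapped pairing is kept.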
19
Comparison of alignments
9 human/mouse homologous gene CDS pairs retrieved (Jareborg et al. 1999)
Greedy alignment run first
– mat=10, mis=-6, X=2200 (indel=-11)
Dynamic Programming and Linear alignment using truncated sequences
20
Comparison of alignments
Similarity scores
[Chart: score (axis 0 to 45000) for each method (IOC, Linear, Greedy, DynProg); series: AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4]
21
Comparison of alignments
Similarity percentage
[Chart: similarity (%) (axis 0 to 100) for each method (IOC, Linear, Greedy, DynProg); series: AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4]
22
Comparison of alignments
[Chart: time (ms) (logarithmic axis, 1 to 100000) for each gene pair (AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4); series: Dyn Prog, Greedy, Linear]
24
Comparison of alignments
[Chart: similarity (%) (axis 0 to 100) with and without IOC (w/ IOC, w/o IOC); series: PACAP, PANK3, CD4, PBX2, Protein C, AHSG, Cyp21, H2 TAP1, CREB-RP, C4, notch4]
25
Comparison of alignments
Maximum coincidence alignment: Offset -72 yielded 1642 matches of 2175 possible (75.4943% similarity), score 6611
ACAGTACTGCTACTTCTCGCCGACTGGGTGCTGCTCCGGACCGCGCTGCCCCGCATATTCTCCCTGCTGGTGCCCACCGCGCTGCCACTGCTCCGGGT
|  |    ||   |       |     |  |||||||     | |      |   |  |    |   || | ||        |  | ||| |
ATGGCTGCGCACGTCTGGCTGGCGGCCGCCCTGCTCCTTCTGGTGGACTGGCTGCTGCTGCGGCCCATGCTCCCGGGAATCTTCTCCCTGTTGGTTCC
ACGGGCCGCCTCACTGACTGGATTCTACAAGATGGCTCAGCCGATACCTTCACTCGAAACTTAACTCTCATGTCCATTCTCACCATAGCCAGTGCAGT
||||||||| |||||||||||||||| || |||    |||    || |||||| || ||| ||   ||||||||||||||||||||||||||  |||
ACGGGCCGCATCACTGACTGGATTCTTCAGGATAAGACAGTTCCTAGCTTCACCCGCAACATATGGCTCATGTCCATTCTCACCATAGCCAGCACAGC
Decreasing the gap penalty allows similar regions to be aligned without using IOC
26
Comparison of alignments
References
Needleman, S. B. and Wunsch, C. D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48:443-453.
Setubal, J. and Meidanis, J. 1997. Introduction to Computational Molecular Biology. Pacific Grove, California: Brooks/Cole.
Zhang, Z., Schwartz, S., Wagner, L., and Miller, W. 2000. A greedy algorithm for aligning DNA sequences. Journal of Computational Biology 7:203-214.