Linear Sequence Alignment

Date post: 18-Jan-2016
Transcript
Page 1: Linear Sequence Alignment

Linear Sequence Alignment

Travis Hillenbrand

Page 2: Linear Sequence Alignment

Methods of Comparison

Dot Matrix

Dynamic Programming Algorithm

Greedy X-drop Approach

Linear Alignment

Page 3: Linear Sequence Alignment

Dot Matrix Method

http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html
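
As a rough illustration of the dot matrix idea, the sketch below marks every cell where a residue of one sequence equals a residue of the other; runs of marks along a diagonal point to similar regions. The function name `dot_matrix` is hypothetical, and the two sequences are the example pair used later in this talk. Real dot-plot tools usually add a window size and a threshold to suppress noise, which this sketch omits.

```python
def dot_matrix(seq_a: str, seq_b: str) -> list[str]:
    """Return one text row per residue of seq_a; '*' marks seq_a[i] == seq_b[j]."""
    return [
        "".join("*" if a == b else "." for b in seq_b)
        for a in seq_a
    ]


if __name__ == "__main__":
    a, b = "ATCGATACG", "ATGGATTACG"
    print("  " + b)                      # column labels
    for residue, row in zip(a, dot_matrix(a, b)):
        print(residue + " " + row)       # row label followed by the marks
```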

Page 4: Linear Sequence Alignment

Sequence Alignment

ATCGATACG, ATGGATTACG

3 possibilities

Match

…C…
 |
…C…

Mismatch

…C…

…G…

Indel

…C…

…-…

Page 5: Linear Sequence Alignment

Global Pairwise Alignment

ATCGATACG, ATGGATTACG

ATCGAT-ACG
|| ||| |||
ATGGATTACG

Matches: 8 × (+1) = +8

Mismatches: 1 × (-1) = -1

Gaps: 1 × (-2) = -2

Total score = +5
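
The arithmetic on this slide can be checked with a few lines of code. This is a minimal sketch that assumes the slide's scoring values (match +1, mismatch -1, gap -2); the function name `score_alignment` is hypothetical.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum the column scores of two already-aligned rows ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += gap        # indel column
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total


print(score_alignment("ATCGAT-ACG", "ATGGATTACG"))  # 8 - 1 - 2 = 5
```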

Page 6: Linear Sequence Alignment

Dynamic Programming

Global alignment (Needleman-Wunsch) algorithm

     -    G    A    T    C
-    0
G
A
C

Page 7: Linear Sequence Alignment

Dynamic Programming

Global alignment (Needleman-Wunsch) algorithm

     -    G    A    T    C
-    0   -2   -4   -6   -8
G
A
C

Page 8: Linear Sequence Alignment

Dynamic Programming

Global alignment (Needleman-Wunsch) algorithm

     -    G    A    T    C
-    0   -2   -4   -6   -8
G   -2
A   -4
C   -6

Page 9: Linear Sequence Alignment

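Dynamic Programming

Global alignment (Needleman-Wunsch) algorithm

Each cell takes the maximum of three candidates: the diagonal neighbor + MATCH (or mismatch), the cell above + GAP, and the cell to the left + GAP.

     -    G    A    T    C
-    0   -2   -4   -6   -8
G   -2
A   -4
C   -6

Cell (G, G): max(0 + 1, -2 - 2, -2 - 2), so Max = 1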

Page 10: Linear Sequence Alignment

Dynamic Programming

Global alignment (Needleman-Wunsch) algorithm

     -    G    A    T    C
-    0   -2   -4   -6   -8
G   -2    1   -1   -3   -5
A   -4   -1    2    0   -2
C   -6   -3    0    1    1

Page 11: Linear Sequence Alignment

Dynamic Programming

Global alignment (Needleman-Wunsch algorithm)

     -    G    A    T    C
-    0   -2   -4   -6   -8
G   -2    1   -1   -3   -5
A   -4   -1    2    0   -2
C   -6   -3    0    1    1

GATC
|| |
GA-C
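
The matrix fill and traceback shown on these slides can be sketched as follows. This is an illustrative implementation under the slide's scoring (match +1, mismatch -1, gap -2), not the presenter's code; the function name `needleman_wunsch` and the tie-breaking order in the traceback (diagonal first, then up, then left) are assumptions.

```python
def needleman_wunsch(rows: str, cols: str,
                     match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment. Returns (score matrix, aligned rows, aligned cols)."""
    n, m = len(rows), len(cols)
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # First column and first row: only gaps are possible.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Each cell is the best of diagonal, up (+gap) and left (+gap).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if rows[i - 1] == cols[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner to the origin.
    out_r, out_c = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if rows[i - 1] == cols[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_r.append(rows[i - 1])
                out_c.append(cols[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_r.append(rows[i - 1])   # residue of `rows` against a gap
            out_c.append("-")
            i -= 1
        else:
            out_r.append("-")           # residue of `cols` against a gap
            out_c.append(cols[j - 1])
            j -= 1
    return score, "".join(reversed(out_r)), "".join(reversed(out_c))


matrix, aligned_rows, aligned_cols = needleman_wunsch("GAC", "GATC")
print(aligned_cols)   # GATC
print(aligned_rows)   # GA-C
print(matrix[-1][-1]) # 1
```

With the sequences from the slides, the filled matrix and the recovered alignment agree with the values shown above.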

Page 12: Linear Sequence Alignment

Greedy X-drop Alignment

Aligns sequences that differ by sequencing errors

Works with measure of difference

Restricts indel penalty: $\mathrm{indel} = \mathrm{mis} - \frac{\mathrm{mat}}{2}$

Zhang et al. 2000
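
Zhang et al. (2000) restrict the indel penalty to indel = mis - mat/2, which agrees with the mat=10, mis=-6, indel=-11 settings quoted later in this talk. Under that restriction the score of an alignment depends only on the prefix lengths i and j and the number of differences d (mismatches plus indels), which is why the method can work with a measure of difference. The sketch below checks that identity on a small example; the function names are hypothetical.

```python
def indel_penalty(mat: float, mis: float) -> float:
    """Indel penalty implied by the restriction indel = mis - mat/2."""
    return mis - mat / 2


def score_from_distance(i: int, j: int, d: int, mat: float, mis: float) -> float:
    """Score of an alignment of prefixes of lengths i and j with d differences."""
    return (i + j) * mat / 2 - d * (mat - mis)


def score_by_columns(row_a: str, row_b: str, mat: float, mis: float) -> float:
    """Column-by-column score of two aligned rows, using the restricted indel."""
    indel = indel_penalty(mat, mis)
    total = 0.0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += indel
        elif a == b:
            total += mat
        else:
            total += mis
    return total


mat, mis = 10, -6
print(indel_penalty(mat, mis))                               # -11.0
# ATCGAT-ACG / ATGGATTACG: lengths 9 and 10, one mismatch + one indel (d = 2)
print(score_from_distance(9, 10, 2, mat, mis))               # 63.0
print(score_by_columns("ATCGAT-ACG", "ATGGATTACG", mat, mis))  # 63.0
```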

Page 13: Linear Sequence Alignment

Greedy X-drop Alignment

Zhang et al. 2000

Page 14: Linear Sequence Alignment

Greedy X-drop Alignment

C
A              0
G         0
-    0
     -    G    A    T    C

Page 15: Linear Sequence Alignment

Greedy X-drop Alignment

C              1    1    1
A              0    1
G         0
-    0
     -    G    A    T    C

X-drop condition saves computation
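
To show how an X-drop condition cuts work short, here is a deliberately simplified sketch: an ungapped extension that stops once the running score falls more than X below the best score seen so far. It is not the gapped greedy algorithm of Zhang et al. 2000; the function name `xdrop_extend` and the default parameter values are assumptions chosen for illustration.

```python
def xdrop_extend(seq_a: str, seq_b: str,
                 mat: int = 10, mis: int = -6, x: int = 20) -> tuple[int, int, int]:
    """Ungapped extension from the start of both sequences.

    Returns (best score, length of the best-scoring prefix,
    number of residue pairs actually compared).
    """
    best = score = 0
    best_len = compared = 0
    for a, b in zip(seq_a, seq_b):
        compared += 1
        score += mat if a == b else mis
        if score > best:
            best, best_len = score, compared
        elif best - score > x:
            break   # X-drop: score fell too far below the best, stop early
    return best, best_len, compared


# With match +1, mismatch -1 and X = 2, the extension gives up after
# 7 of the 8 columns; a very large X would inspect all 8.
print(xdrop_extend("AAAATTTT", "AAAACCCC", mat=1, mis=-1, x=2))    # (4, 4, 7)
print(xdrop_extend("AAAATTTT", "AAAACCCC", mat=1, mis=-1, x=100))  # (4, 4, 8)
```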

Page 16: Linear Sequence Alignment

Linear Alignment

Page 17: Linear Sequence Alignment

Linear Alignment

Index of coincidence
– Maximum number of matches between two sequences
– Ungapped alignment

The two sequences are slid past each other, and the matches are counted at each offset:

ATCGATACG
|| |||
ATGGATTACG

…
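
The index of coincidence as described here (the ungapped offset with the most matches) can be sketched as below. The function names and the offset sign convention (offset k aligns `seq_a[i]` with `seq_b[i + k]`) are assumptions; the talk does not state which convention it uses.

```python
def index_of_coincidence(seq_a: str, seq_b: str) -> tuple[int, int, int]:
    """Try every ungapped offset; return (offset, matches, overlap) of the best one.

    Offset k pairs seq_a[i] with seq_b[i + k]. On ties the first offset found wins.
    """
    best = (0, -1, 0)
    for k in range(-(len(seq_a) - 1), len(seq_b)):
        start_a = max(0, -k)            # first paired position in seq_a
        start_b = max(0, k)             # first paired position in seq_b
        overlap = min(len(seq_a) - start_a, len(seq_b) - start_b)
        matches = sum(seq_a[start_a + t] == seq_b[start_b + t]
                      for t in range(overlap))
        if matches > best[1]:
            best = (k, matches, overlap)
    return best


def similarity_percent(matches: int, possible: int) -> float:
    """Matches as a percentage of the positions that could have matched."""
    return 100.0 * matches / possible


print(index_of_coincidence("ATCGATACG", "ATGGATTACG"))  # (0, 5, 9)
```

For the example pair, the best placement is the unshifted one with 5 matches, the same placement the slide marks with `|| |||`.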

Page 18: Linear Sequence Alignment

Linear Alignment

Attempt to increase similarity

ATCGATACG
|| |||
ATGGATTACG

-ATCGATACG
      ||||
ATGGATTACG

ATCGATACG
   |
-ATGGATTACG

Window score: 2 -3 -3

ATCGATACG
|| |||
ATGGATTACG

Page 19: Linear Sequence Alignment

Comparison of alignments

9 human/mouse homologous gene CDS pairs retrieved (Jareborg et al. 1999)

Greedy alignment run first: mat=10, mis=-6, X=2200 (indel=-11)

Dynamic Programming and Linear alignment using truncated sequences

Page 20: Linear Sequence Alignment

Comparison of alignments

Similarity scores

[Bar chart: score (axis 0 to 45000) for the IOC, Linear, Greedy and DynProg methods; one series per gene: AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4]

Page 21: Linear Sequence Alignment

Comparison of alignments

Similarity percentage

[Bar chart: similarity (%, axis 0 to 100) for the IOC, Linear, Greedy and DynProg methods; one series per gene: AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4]

Page 22: Linear Sequence Alignment

Comparison of alignments

[Chart: time (ms, logarithmic axis from 1 to 100000) per gene (AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4) for the Dyn Prog, Greedy and Linear methods]

Page 24: Linear Sequence Alignment

Comparison of alignments

[Bar chart: similarity (%, axis 0 to 100) w/ IOC and w/o IOC; one series per gene: PACAP, PANK3, CD4, PBX2, Protein C, AHSG, Cyp21, H2 TAP1, CREB-RP, C4, notch4]

Page 25: Linear Sequence Alignment

Comparison of alignments

Maximum coincidence alignment: Offset -72 yielded 1642 matches of 2175 possible (75.4943% similarity), score 6611

ACAGTACTGCTACTTCTCGCCGACTGGGTGCTGCTCCGGACCGCGCTGCCCCGCATATTCTCCCTGCTGGTGCCCACCGCGCTGCCACTGCTCCGGGT

| | || | | | ||||||| | | | | | | || | || | | ||| |

ATGGCTGCGCACGTCTGGCTGGCGGCCGCCCTGCTCCTTCTGGTGGACTGGCTGCTGCTGCGGCCCATGCTCCCGGGAATCTTCTCCCTGTTGGTTCC

ACGGGCCGCCTCACTGACTGGATTCTACAAGATGGCTCAGCCGATACCTTCACTCGAAACTTAACTCTCATGTCCATTCTCACCATAGCCAGTGCAGT

||||||||| |||||||||||||||| || ||| ||| || |||||| || ||| || |||||||||||||||||||||||||| |||

ACGGGCCGCATCACTGACTGGATTCTTCAGGATAAGACAGTTCCTAGCTTCACCCGCAACATATGGCTCATGTCCATTCTCACCATAGCCAGCACAGC

Decreasing the gap penalty allows similar regions to be aligned without using IOC

Page 26: Linear Sequence Alignment

Comparison of alignments

References

Needleman, S. B. and Wunsch, C. D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48:443-453.

Setubal, J. and Meidanis, J. 1997. Introduction to Computational Molecular Biology. Pacific Grove, California: Brooks/Cole.

Zhang, Z., Schwartz, S., Wagner, L. and Miller, W. 2000. A greedy algorithm for aligning DNA sequences. Journal of Computational Biology 7:203-214.

Page 27: Linear Sequence Alignment

Linear Sequence Alignment

Travis Hillenbrand

