Sequence alignment
Gabor T. Marth
Department of Biology, Boston [email protected]
BI420 – Introduction to Bioinformatics
Sequence alignment – Biology
http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html
Biologically significant sequence alignment
Biologically plausible sequence alignment
Spurious alignment
Examples from: Biological sequence analysis. Durbin, Eddy, Krogh, Mitchison
Alignment types
Examples from: BLAST. Korf, Yandell, Bedell
How do we align the words: CRANE and FRAME?
CRANE
 || |
FRAME
3 matches, 2 mismatches
How do we align words that are different in length?
COELACANTH
  || |||
P-ELICAN--

COELACANTH
  || |||
-PELICAN--
5 matches, 2 mismatches, 3 gaps
In this case, if we assign +1 point for each match and -1 for each mismatch or gap, we get 5 x 1 + 2 x (-1) + 3 x (-1) = 0. This is the alignment score.
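The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `alignment_score` and its parameter names are assumptions made for this example, and it expects the two strings to be already aligned, with `-` marking a gap.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score two already-aligned strings of equal length ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # a gap in either sequence
        elif a == b:
            score += match      # identical letters
        else:
            score += mismatch   # different letters
    return score


print(alignment_score("CRANE", "FRAME"))            # 3 matches, 2 mismatches -> 1
print(alignment_score("COELACANTH", "P-ELICAN--"))  # 5 matches, 2 mismatches, 3 gaps -> 0
```

Changing the keyword arguments shows how strongly the score depends on the chosen penalties.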
Finding the “best” alignment
COELACANTH
  || |||
P-ELICAN--
S = 0

COELACANTH
   | |||
PE-LICAN--
S = -2

COELACANTH
  ||
P-EL-ICAN-
S = -6

COELACANTH
PELICAN---
S = -10
Global alignment – Needleman-Wunsch
Scores: match = +1, mismatch = -1, gap = -1

         C   O   E   L   A   C   A   N   T   H
     0  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10
P   -1  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10
E   -2  -2  -2  -1  -2  -3  -4  -5  -6  -7  -8
L   -3  -3  -3  -2   0  -1  -2  -3  -4  -5  -6
I   -4  -4  -4  -3  -1  -1  -2  -3  -4  -5  -6
C   -5  -3  -4  -4  -2  -2   0  -1  -2  -3  -4
A   -6  -4  -4  -5  -3  -1  -1   1   0  -1  -2
N   -7  -5  -5  -5  -4  -2  -2   0   2   1   0
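A matrix like the one above can be filled in, and an alignment read back from it, with a short dynamic-programming routine. The following is a minimal sketch, assuming the simple scores used earlier (match +1, mismatch -1, gap -1); the function name `needleman_wunsch` and the tie-breaking order in the traceback are choices made for this example, so other equally good alignments may exist.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a (columns) and b (rows).

    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(b) + 1, len(a) + 1
    F = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        F[i][0] = i * gap
    # Each cell is the best of: diagonal (match/mismatch), up (gap), left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[j - 1] == b[i - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Traceback from the bottom-right corner to the top-left corner.
    i, j = rows - 1, cols - 1
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[j - 1] == b[i - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[j - 1])
                out_b.append(b[i - 1])
                i -= 1
                j -= 1
                continue
        if j > 0 and F[i][j] == F[i][j - 1] + gap:
            out_a.append(a[j - 1])   # gap in b
            out_b.append("-")
            j -= 1
        else:
            out_a.append("-")        # gap in a
            out_b.append(b[i - 1])
            i -= 1
    return F[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("COELACANTH", "PELICAN")
print(score)   # bottom-right cell of the matrix: 0
print(top)
print(bottom)
```

The score is read from the bottom-right cell because a global alignment must cover both sequences from end to end.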
Local alignment – Smith-Waterman
Scores: match = +1, mismatch = -1, gap = -1; negative cells are reset to 0

         C   O   E   L   A   C   A   N   T   H
     0   0   0   0   0   0   0   0   0   0   0
P    0   0   0   0   0   0   0   0   0   0   0
E    0   0   0   1   0   0   0   0   0   0   0
L    0   0   0   0   2   1   0   0   0   0   0
I    0   0   0   0   1   1   0   0   0   0   0
C    0   1   0   0   0   0   2   1   0   0   0
A    0   0   0   0   0   1   1   3   2   1   0
N    0   0   0   0   0   0   0   2   4   3   2
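The local-alignment matrix differs from the global one in two ways: no cell is allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. Below is a minimal sketch under the same assumed scores (match +1, mismatch -1, gap -1); the function name `smith_waterman` is hypothetical, and only the first highest-scoring cell is traced back.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment of a (columns) and b (rows).

    Returns (best_score, aligned_a, aligned_b) for one best local alignment.
    """
    rows, cols = len(b) + 1, len(a) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[j - 1] == b[i - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Traceback from the maximum cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[j - 1] == b[i - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[j - 1])
            out_b.append(b[i - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i][j - 1] + gap:
            out_a.append(a[j - 1])   # gap in b
            out_b.append("-")
            j -= 1
        else:
            out_a.append("-")        # gap in a
            out_b.append(b[i - 1])
            i -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("COELACANTH", "PELICAN"))  # (4, 'ELACAN', 'ELICAN')
```

Here the best local alignment pairs ELACAN with ELICAN (5 matches, 1 mismatch), which corresponds to the cell with value 4 in the matrix.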
Visualizing pair-wise alignments
http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html
Sequence similarity and scoring
Match-mismatch-gap penalties, e.g. match = 1, mismatch = -5, gap = -10
Scoring matrices
Multiple alignments
Anchored multiple alignment