Local Alignment
Tutorial 2
• When to use local alignment?
• How to solve a local alignment matrix
• Comparison to global alignment
• Cool story of the day
Local alignment
When to use local alignment?
When the aim is to find short regions of similarity inside a longer sequence.
"Short" here is relative to the length of the sequence that contains them.
Example: When looking for motifs in a sequence
Binding site: ATGGC
ATGGCATGGGTATGCTCGCTCGCTGATGGCATAGCTGATGCTGATCGGGCTCGCTCGCTCGCTC
ATGGCGCTGCTCGCTCGCTCGCATGTCTAGATAAGAGATAATAAGCTGATGCTAGCTGATGCTT
ATGGCTGCGTAGAGTATAGCGTGTGATGCTAGCTAGCTAGCTGGTAGCA-GGCTGATCGTAGCT
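As a minimal sketch of the simplest case, the exact occurrences of the binding site can be located with a plain substring search. The function name `find_motif` is hypothetical, and this only finds exact copies; local alignment is what you need once the motif may contain mismatches or gaps.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start positions of every exact occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Step one base forward so overlapping occurrences are also found.
        start = sequence.find(motif, start + 1)
    return positions


binding_site = "ATGGC"
first_sequence = "ATGGCATGGGTATGCTCGCTCGCTGATGGCATAGCTGATGCTGATCGGGCTCGCTCGCTCGCTC"
print(find_motif(first_sequence, binding_site))
```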
Dynamic programming – local alignment
[Figure: a matrix with N1..Nn across the top and M1..Mm down the side]
Cell [I, J] holds the best alignment of M1..I with N1..J.
All possible alignments are encoded as paths in the matrix.
The differences:
1. We can start a new match instead of extending a previous alignment.
2. Instead of looking only at the far corner, we look anywhere in the table for the best score.
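The two differences can be sketched in a few lines of Python. This assumes the +1 / -1 / -2 scoring system used in this tutorial; the function names are hypothetical and only illustrate the idea.

```python
MATCH, MISMATCH, INDEL = 1, -1, -2  # scoring system assumed from this tutorial


def global_cell(diag: int, up: int, left: int, same: bool) -> int:
    """Global rule: always extend one of the three neighbouring alignments."""
    pair = MATCH if same else MISMATCH
    return max(diag + pair, up + INDEL, left + INDEL)


def local_cell(diag: int, up: int, left: int, same: bool) -> int:
    """Local rule: same candidates, plus 0 for starting a new alignment here."""
    return max(0, global_cell(diag, up, left, same))


def global_score(table: list[list[int]]) -> int:
    """Global alignment reads its score from the far corner of the table."""
    return table[-1][-1]


def local_score(table: list[list[int]]) -> int:
    """Local alignment looks anywhere in the table for the best score."""
    return max(max(row) for row in table)
```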
Global vs Local
[Figure: two panels, "Global" and "Local"]
Local Alignment
Scoring System
– Match: +1 (Ni = Mj)
– Mismatch: -1 (Ni ≠ Mj)
– Indel: -2
[Figure: the empty matrix with N1..Nn across the top and M1..Mm down the side. Successive slides show a residue paired with a gap, for example N1 over "-" and "-" over M1 (indels).]
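Local Alignment
Fill:
1. We fill the table as in global alignment, but we don't allow negative numbers (every negative number becomes 0).
2. No arrows come out of cells with a 0.
Scoring System
– Match: +1
– Mismatch: -1
– Indel: -2
[Figure: cell [M2, N2] is filled from its three neighbours. The diagonal move scores +1 if M2 = N2 and -1 if M2 ≠ N2, and aligns N1N2 with M1M2. The other two moves score -2 and align a residue with a gap: "N1 -" over "M1M2", or "N1N2" over "M1 -".]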
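The fill rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `fill_local` and its parameters are assumptions, and the table is stored as a list of rows with Seq 1 across the columns.

```python
def fill_local(seq1: str, seq2: str, match: int = 1, mismatch: int = -1,
               indel: int = -2) -> list[list[int]]:
    """Fill a local alignment table; seq1 labels the columns, seq2 the rows."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    table = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            table[i][j] = max(
                0,                           # negative scores are reset to 0
                table[i - 1][j - 1] + pair,  # diagonal: match or mismatch
                table[i - 1][j] + indel,     # from above: indel
                table[i][j - 1] + indel,     # from the left: indel
            )
    return table


# The two sequences used in this tutorial's worked example.
for row in fill_local("TACTAA", "TAATA"):
    print(row)
```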
Local Alignment
Trace:
We trace back from the highest-scoring cells.
[Figure: the same cell diagram as in the fill step]
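A possible sketch of the trace step: fill the table, find every cell that holds the highest score, and walk back from each one until a 0 is reached. The name `local_alignments` is hypothetical, and for simplicity the walk follows one arrow per cell (diagonal first), so it reports a single path per top-scoring cell.

```python
def local_alignments(seq1, seq2, match=1, mismatch=-1, indel=-2):
    """Return (best score, [(aligned seq1, aligned seq2), ...])."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            table[i][j] = max(0, table[i - 1][j - 1] + pair,
                              table[i - 1][j] + indel, table[i][j - 1] + indel)

    best = max(max(row) for row in table)
    if best == 0:
        return 0, []  # nothing aligns with a positive score

    results = []
    for i0 in range(1, rows):
        for j0 in range(1, cols):
            if table[i0][j0] != best:
                continue
            i, j, top, bottom = i0, j0, [], []
            while table[i][j] > 0:  # a 0 cell has no outgoing arrow
                pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
                if table[i][j] == table[i - 1][j - 1] + pair:
                    top.append(seq1[j - 1])
                    bottom.append(seq2[i - 1])
                    i, j = i - 1, j - 1
                elif table[i][j] == table[i - 1][j] + indel:
                    top.append("-")
                    bottom.append(seq2[i - 1])
                    i -= 1
                else:
                    top.append(seq1[j - 1])
                    bottom.append("-")
                    j -= 1
            results.append(("".join(reversed(top)), "".join(reversed(bottom))))
    return best, results


score, alignments = local_alignments("TACTAA", "TAATA")
print(score, alignments)
```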
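If you like formulas…
$$Z = \max\left(S_{i,j} + w,\; S_{i+1,j} + w,\; S_{i,j+1} + w\right)$$
Here Z is the cell in column i+1, row j+1, and w is the score according to the scoring matrix:
w = +1 for a match
w = -2 for a mismatch/indel
For example:
$$Z = \max\left(S_{i,j} + (2 \text{ or } -1),\; S_{i+1,j} - 1,\; S_{i,j+1} - 1\right)$$
The first term is the match (+2) or mismatch (-1) case; the other two terms are indels (-1).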
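For the local case, the same recurrence can be written with the extra 0 that implements the "no negative numbers" rule from the fill step. The subscripted symbols w_pair and w_indel are notation introduced here for illustration, using the +1 / -1 / -2 scoring system of this tutorial:

```latex
Z = \max\left(0,\; S_{i,j} + w_{\text{pair}},\; S_{i+1,j} + w_{\text{indel}},\; S_{i,j+1} + w_{\text{indel}}\right),
\qquad
w_{\text{pair}} =
\begin{cases}
+1 & \text{match} \\
-1 & \text{mismatch}
\end{cases},
\quad
w_{\text{indel}} = -2
```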
Seq 1: TACTAA
Seq 2: TAATA
Local Alignment – let’s get this party started
[Slides: the table is filled step by step. Seq 1 (TACTAA) labels columns 1 to 6 and Seq 2 (TAATA) labels rows 1 to 5.]
1. Row 0 and column 0 are filled with 0.
2. Cell [1,1] (T against T) has three candidates: "-" over T gives 0 - 2 = -2, T over "-" gives 0 - 2 = -2, and T over T gives 0 + 1 = +1. The cell gets the maximum, 1.
3. The remaining cells are filled the same way, row by row, and every negative value is replaced by 0.
The completed table:

          T   A   C   T   A   A
      0   0   0   0   0   0   0
  T   0   1   0   0   1   0   0
  A   0   0   2   0   0   2   1
  A   0   0   1   1   0   1   3
  T   0   1   0   0   2   0   1
  A   0   0   2   0   0   3   1
Leave only paths from highest score
The highest score, 3, appears in two cells, so there are two local alignments.

Alignment 1:
TAA
TAA

Alignment 2:
TACTA
TAATA

Both have a score of 3
And Now… Global Alignment
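1. We keep negative numbers.
2. Arrows can come out of any cell.
3. We trace back from the bottom-right to the top-left of the table.
Scoring System
– Match: +1
– Mismatch: -1
– Indel: -2
[Figure: cell [M2, N2] is filled from its three neighbours. The diagonal move scores +1 if M2 = N2 and -1 if M2 ≠ N2, and aligns N1N2.. with M1M2.. The other two moves score -2 and align a residue with a gap.]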
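A minimal sketch of the global fill under the same scoring system. The name `fill_global` is hypothetical. Because negative numbers are kept, row 0 and column 0 are no longer all zeros: each step along them costs one indel.

```python
def fill_global(seq1: str, seq2: str, match: int = 1, mismatch: int = -1,
                indel: int = -2) -> list[list[int]]:
    """Fill a global alignment table; seq1 labels the columns, seq2 the rows."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    table = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        table[0][j] = j * indel  # leading gaps in seq2
    for i in range(1, rows):
        table[i][0] = i * indel  # leading gaps in seq1
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # diagonal: match or mismatch
                table[i - 1][j] + indel,     # from above: indel
                table[i][j - 1] + indel,     # from the left: indel
            )
    return table


table = fill_global("TACTAA", "TAATA")
print(table[-1][-1])  # the global score sits in the bottom-right cell
```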
Match: +1
Mismatch: -1
Indel: -2

[Slides: the global table for Seq 1 (TACTAA, columns) and Seq 2 (TAATA, rows) is filled step by step.]
The completed table:

           T    A    C    T    A    A
      0   -2   -4   -6   -8  -10  -12
  T  -2    1   -1   -3   -5   -7   -9
  A  -4   -1    2    0   -2   -4   -6
  A  -6   -3    0    1   -1   -1   -3
  T  -8   -5   -2   -1    2    0   -2
  A -10   -7   -4   -3    0    3    1

Tracing back from the bottom-right cell gives two alignments.

Alignment 1:
TACTAA
TAATA-

Alignment 2:
TACTAA
TAAT-A
Both have a score of 1
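To see where both alignments come from, here is one possible sketch of a global traceback that follows every arrow instead of just one. The name `global_alignments` is hypothetical. The recursion enumerates all co-optimal paths, which is fine for short sequences like these but can grow exponentially for long ones.

```python
def global_alignments(seq1, seq2, match=1, mismatch=-1, indel=-2):
    """Return (score, list of all optimal (aligned seq1, aligned seq2) pairs)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    table = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        table[0][j] = j * indel
    for i in range(1, rows):
        table[i][0] = i * indel
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + pair,
                              table[i - 1][j] + indel, table[i][j - 1] + indel)

    def walk(i, j):
        """Collect every alignment of seq1[:j] with seq2[:i] that scores table[i][j]."""
        if i == 0 and j == 0:
            return [("", "")]
        found = []
        if i > 0 and j > 0:
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            if table[i][j] == table[i - 1][j - 1] + pair:
                found += [(a + seq1[j - 1], b + seq2[i - 1])
                          for a, b in walk(i - 1, j - 1)]
        if i > 0 and table[i][j] == table[i - 1][j] + indel:
            found += [(a + "-", b + seq2[i - 1]) for a, b in walk(i - 1, j)]
        if j > 0 and table[i][j] == table[i][j - 1] + indel:
            found += [(a + seq1[j - 1], b + "-") for a, b in walk(i, j - 1)]
        return found

    # Global alignment always starts the traceback in the bottom-right cell.
    return table[-1][-1], walk(rows - 1, cols - 1)


score, alignments = global_alignments("TACTAA", "TAATA")
print(score, alignments)
```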
Local vs. Global

Local (score 3):
TAA
TAA

TACTA
TAATA

Global (score 1):
TACTAA
TAATA-

TACTAA
TAAT-A
Cool Story of the day
How Archaea was discovered
• Until the 20th century, most biologists considered all living things to be classifiable as either a plant or an animal.
• But in the 1950s and 1960s, most biologists came to the realization that this system failed to accommodate the fungi, protists, and bacteria.
• The scientific community was understandably shocked in the late 1970s by the discovery of an entirely new group of organisms - the Archaea.
http://www.ucmp.berkeley.edu/archaea/archaea.html
Carl Woese
In order to study and compare different creatures one needs to find a common trait.
Ribosomal RNA:
“…the component of all self-replicating systems…”
“…its sequence changes but slowly with time, permitting the detection of relatedness among very distant species…”
Woese and his colleagues compared the sequences of rRNAs from different creatures
Nicholas Barton et al (2007) 'Evolution' Backcover
Stay tuned…
More on phylogenetic trees, multiple sequence alignment and clustering in the next lessons…