ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES

Date posted: 11-Feb-2016
Uploaded by: emery
Description: PowerPoint presentation.
Transcript
Page 1: ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES


Page 2

An alignment is an evolutionarily meaningful comparison of two or more sequences (DNA, RNA, or proteins).

In the case of two DNA sequences, an alignment consists of a series of paired bases, one base from each sequence. There are three types of pairs:

(1) matches = the same nucleotide appears in both sequences.
(2) mismatches = different nucleotides are found in the two sequences.
(3) gaps = a base in one sequence and a null base (-) in the other.

    GCGGCCCATCAGGTAGTTGGTG-G
    GCGTTCCATC--CTGGTTGGTGTG
    ***..*****  .*.******* *

In the bottom row, * marks a match, . a mismatch, and a blank a gap.
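The three pair types can be read off mechanically. A minimal Python sketch, assuming `-` is the null base as in the alignment above; `annotate_columns` is a hypothetical helper written for this illustration, not part of any alignment package. It rebuilds the symbol row under the two sequences:

```python
def annotate_columns(top: str, bottom: str, gap: str = "-") -> str:
    """Return one symbol per column of a pairwise alignment.

    '*' = match, '.' = mismatch, ' ' (blank) = gap in either sequence.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    symbols = []
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            symbols.append(" ")  # gap: a base paired with a null base
        elif a == b:
            symbols.append("*")  # match: same nucleotide in both sequences
        else:
            symbols.append(".")  # mismatch: different nucleotides
    return "".join(symbols)


top = "GCGGCCCATCAGGTAGTTGGTG-G"
bottom = "GCGTTCCATC--CTGGTTGGTGTG"
line = annotate_columns(top, bottom)
for row in (top, bottom, line):
    print(row)
```

Running it prints the two sequences followed by `***..*****  .*.******* *`: 17 matches, 4 mismatches and 3 gap positions.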

Page 3

Alignment: A hypothesis concerning positional homology among the residues of two or more sequences.

Positional homology = A pair of nucleotides from two aligned sequences that have descended from one nucleotide in the ancestor of the two sequences.

    GCGGCCCATCAGGTAGTTGGTG-G
    GCGTTCCATC--CTGGTTGGTGTG
    ***..*****  .*.******* *

Page 4


    GCGGCCCATCAGGTAGTTGGTG-G
    GCGTTCCATC--CTGGTTGGTGTG
    ***..*****  .*.******* *

These two nucleotides (the pair highlighted on the original slide) descended from a single nucleotide in the common ancestor of cats and armadillos.

Page 5

Homology: The term was coined by Richard Owen in 1843.

Definition: Similarity resulting from common ancestry.

Page 6

Homology: A qualitative statement

• Homology designates a relationship of common descent between entities.

• Two genes are either homologs or not; it doesn’t make sense to say “two genes are 43% homologous,” just as it doesn’t make sense to say “Linda is 43% pregnant.”

Page 7

Homology

By comparing homologous characters, we can reconstruct the evolutionary events that have led to the formation of the extant sequences from their common ancestor.

Page 8

Homology

When dealing with sequences, we are interested in POSITIONAL HOMOLOGY.

We identify positional homology by ALIGNMENT.

Page 9

Ancestral sequence: ACTGGGCCCAAATC
Lineage 1 (1 insertion, 1 substitution): ACAGGGCCACAAATC
Lineage 2 (1 deletion, 1 substitution): ACTGGCCCAGATC

Correct alignment (the gaps fall where the deletion in lineage 2 and the insertion in lineage 1 actually occurred):

    ACT-GGCC-CAGATC
    ACAGGGCCACAAATC
    **.-****-**.***

Incorrect alignment:

    ACTGGCCCAGATC--
    ACAGGGCCACAAATC
    **.**.***.*..--

Page 10

Ancestral sequence: unknown
Descendant sequences (evolutionary events unknown): ACAGGGCCACAAATC and ACTGGCCCAGATC

Correct alignment?

    ACT-GGCC-CAGATC
    ACAGGGCCACAAATC
    **.-****-**.***

Incorrect alignment?

    ACTGGCCCAGATC--
    ACAGGGCCACAAATC
    **.**.***.*..--

Page 11

Sequence alignment = the identification of the locations of deletions or insertions that might have occurred in either of the two lineages since their divergence from a common ancestor.

Insertion + Deletion = Indel or Gap

Page 12

Sequence alignment

1. Pairwise alignment

2. Multiple alignment

Page 13

- Two DNA sequences: A and B.
- Lengths are m and n, respectively.
- The number of matched pairs is x.
- The number of mismatched pairs is y.
- Total number of bases in gaps is z.
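Each matched or mismatched pair uses one base from each sequence, and each gap position uses a base from only one of them, so the five quantities obey m + n = 2(x + y) + z. The Python sketch below (`count_pairs` and `AlignmentCounts` are hypothetical names; it assumes `-` marks gaps) counts them for the alignment on page 2 and checks that identity:

```python
from typing import NamedTuple


class AlignmentCounts(NamedTuple):
    m: int  # length of sequence A (gaps removed)
    n: int  # length of sequence B (gaps removed)
    x: int  # matched pairs
    y: int  # mismatched pairs
    z: int  # total bases in gaps (bases paired with '-')


def count_pairs(a: str, b: str, gap: str = "-") -> AlignmentCounts:
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    x = y = z = 0
    for p, q in zip(a, b):
        if p == gap and q == gap:
            continue  # column empty in both rows carries no base
        if p == gap or q == gap:
            z += 1
        elif p == q:
            x += 1
        else:
            y += 1
    m = len(a) - a.count(gap)
    n = len(b) - b.count(gap)
    return AlignmentCounts(m, n, x, y, z)


counts = count_pairs("GCGGCCCATCAGGTAGTTGGTG-G", "GCGTTCCATC--CTGGTTGGTGTG")
print(counts)
assert counts.m + counts.n == 2 * (counts.x + counts.y) + counts.z
```

For that alignment it reports m = 23, n = 22, x = 17, y = 4 and z = 3, and indeed 23 + 22 = 2(17 + 4) + 3.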

Page 14

A gap indicates that a deletion or an insertion has occurred in one of the two lineages.

    GCGG-CCATCAGGTAGTTGGTG--
    GCGTTCCATC--CTGGTTGGTGTG
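Because an alignment cannot by itself say which lineage changed, a gap in sequence A can be read either as a deletion in A's lineage or as an insertion in B's. As a rough Python sketch (hypothetical helper `find_gaps`; it assumes one run of consecutive `-` characters is a single indel event, which need not always hold), the gaps in the alignment above can be listed with their positions and lengths:

```python
import re


def find_gaps(a: str, b: str) -> list[tuple[str, int, int]]:
    """List gaps as (sequence, first column, length); columns are 1-based.

    Each run of consecutive '-' in one row is reported as one gap.
    """
    gaps = []
    for label, row in (("A", a), ("B", b)):
        for run in re.finditer(r"-+", row):
            gaps.append((label, run.start() + 1, len(run.group())))
    return gaps


a = "GCGG-CCATCAGGTAGTTGGTG--"
b = "GCGTTCCATC--CTGGTTGGTGTG"
for seq, start, length in find_gaps(a, b):
    print(f"gap in {seq}: columns {start}-{start + length - 1} ({length} nt)")
```

It finds a one-base gap (column 5) and a two-base gap (columns 23-24) in A, and a two-base gap (columns 11-12) in B.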

Page 15

The alignment is the first step in many evolutionary and functional studies.

Errors in alignment tend to amplify in later computational stages.

Page 16

Methods of alignment:

1. Manual
2. Dot matrix
3. Algorithmic (scoring matrices and gap penalties)

Page 17

Manual alignment. When there are few gaps and the two sequences are not too different from each other, a reasonable alignment can be obtained by visual inspection.

    GCG-TCCATCAGGTAGTTGGTGTG
    GCGTTCCATCAGGTGGTTGGTGTG
    *** **********.*********

Page 18

Advantages of manual alignment:
(1) use of a powerful and trainable tool (the brain, well…, some brains).
(2) ability to integrate additional data, e.g., domain structure, biological function, 3D structure.

Page 19

Disadvantages of manual alignment:
1. Subjectivity = the inability to formally specify the algorithm.
2. Irreproducibility = the inability of two researchers to reach the same result.
3. Unscalability = the inability to apply the method to long sequences.
4. Incommensurability = the inability to compare the results to those derived from other methods.

Page 20

The dot-matrix method: The two sequences are written out as column and row headings of a two-dimensional matrix. A dot is put in the dot-matrix plot at a position where the nucleotides in the two sequences are identical.
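A minimal Python sketch of the basic dot matrix described above, with one sequence as the column headings (`top`) and the other as the row headings (`left`); the function names are hypothetical and an asterisk stands in for the dot:

```python
def dot_matrix(top: str, left: str) -> list[list[bool]]:
    """matrix[row][col] is True (a dot) where left[row] == top[col]."""
    return [[l_base == t_base for t_base in top] for l_base in left]


def render(top: str, left: str, dot: str = "*") -> str:
    """Draw the matrix with `top` as column headings and `left` as row headings."""
    lines = ["  " + " ".join(top)]
    for base, row in zip(left, dot_matrix(top, left)):
        cells = " ".join(dot if hit else " " for hit in row)
        lines.append(f"{base} {cells}")
    return "\n".join(lines)


print(render("GCGTTCCATC", "GCGGCCCATC"))
```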

Page 21

The alignment is defined by a path from the upper-left element to the lower-right element.

Page 22

There are 4 possible steps in the path:

(1) a diagonal step through a dot = match.

(2) a diagonal step through an empty element of the matrix = mismatch.

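(3) a horizontal step = a nucleotide of the sequence on the top of the matrix is paired with a gap in the sequence on the left.

(4) a vertical step = a nucleotide of the sequence on the left of the matrix is paired with a gap in the sequence on the top.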
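The four step types translate directly into an alignment. In this Python sketch the step codes `D`, `H` and `V` are a notation invented here for illustration, and `path_to_alignment` is a hypothetical name; `top` labels the columns and `left` the rows, so a horizontal step advances along `top` only and pairs that nucleotide with a null base in the left sequence:

```python
def path_to_alignment(top: str, left: str, steps: str) -> tuple[str, str]:
    """Turn a path through the dot matrix into two aligned rows.

    'D' diagonal   -> pair top[i] with left[j] (match or mismatch)
    'H' horizontal -> top[i] faces a gap in `left`
    'V' vertical   -> left[j] faces a gap in `top`
    """
    i = j = 0
    top_row, left_row = [], []
    for step in steps:
        if step == "D":
            top_row.append(top[i])
            left_row.append(left[j])
            i, j = i + 1, j + 1
        elif step == "H":
            top_row.append(top[i])
            left_row.append("-")
            i += 1
        elif step == "V":
            top_row.append("-")
            left_row.append(left[j])
            j += 1
        else:
            raise ValueError(f"unknown step {step!r}")
    # The path must run from the upper-left to the lower-right corner.
    if i != len(top) or j != len(left):
        raise ValueError("path does not end at the lower-right corner")
    return "".join(top_row), "".join(left_row)


print(path_to_alignment("ACAGGGCCACAAATC", "ACTGGCCCAGATC", "DDDHDDDDHDDDDDD"))
```

Applied to the two descendant sequences of the page 9 example, this path yields the rows ACAGGGCCACAAATC and ACT-GGCC-CAGATC, i.e. the correct alignment shown there.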

Page 23

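Figure: allowed directions (rightward, downward, and diagonally down-right, so the path always proceeds toward the lower-right corner) vs. forbidden directions (any step leftward or upward).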

Page 24

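A dot matrix may become cluttered. With DNA sequences, ~25% of the elements will be occupied by dots by chance alone: if the four nucleotides occur at equal frequencies, two randomly chosen positions carry the same nucleotide with probability 4 × (1/4)² = 1/4.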
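The ~25% figure follows from base composition: if letters occur independently with frequencies p_i, two random positions match with probability Σ p_i². A small Python sketch (the function name is hypothetical; the independence assumption is the same one behind the 25% estimate):

```python
def expected_dot_fraction(frequencies: dict[str, float]) -> float:
    """Chance that two independent random positions carry the same letter.

    With letter frequencies p_i this is the sum of p_i squared; the
    frequencies are normalized first so they need not sum exactly to 1.
    """
    total = sum(frequencies.values())
    return sum((p / total) ** 2 for p in frequencies.values())


dna_equal = {base: 0.25 for base in "ACGT"}
dna_gc_rich = {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3}
protein_equal = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
for name, freqs in [("DNA, equal", dna_equal),
                    ("DNA, GC-rich", dna_gc_rich),
                    ("protein, equal", protein_equal)]:
    print(f"{name}: {expected_dot_fraction(freqs):.3f}")
```

Equal base frequencies give 0.25, a 20/20/30/30% composition gives 0.26, and twenty equally frequent amino acids give 0.05, which is why dot matrices with an alphabet size of 20 are much less cluttered.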

Page 25

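The number of spurious matches is determined by the window size (the number of consecutive positions compared at a time), the stringency (the minimum number of matches within a window required to place a dot), and the alphabet size (4 for nucleotides, 20 for amino acids).

Figure: window size = 1, stringency = 1, alphabet size = 4.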

Page 26

Figure, panel 1: window size = 1, stringency = 1, alphabet size = 4.

Figure, panel 2: window size = 3, stringency = 2, alphabet size = 4.
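Window size and stringency can be added to the dot matrix by comparing short stretches instead of single residues. The Python sketch below is one common way to do this, assuming the dot is placed at the first position of each window (some programs place it at the window's centre); `filtered_dot_matrix` and `dot_count` are hypothetical names, and the example uses the page 2 sequences with their gaps removed:

```python
def filtered_dot_matrix(top: str, left: str, window: int, stringency: int) -> list[list[bool]]:
    """Dot at (row j, column i) when the `window` consecutive positions starting
    at left[j] and top[i], compared along the diagonal, contain at least
    `stringency` matches."""
    if not 1 <= stringency <= window:
        raise ValueError("need 1 <= stringency <= window")
    n_rows = max(len(left) - window + 1, 0)
    n_cols = max(len(top) - window + 1, 0)
    matrix = []
    for j in range(n_rows):
        row = []
        for i in range(n_cols):
            matches = sum(left[j + k] == top[i + k] for k in range(window))
            row.append(matches >= stringency)
        matrix.append(row)
    return matrix


def dot_count(matrix: list[list[bool]]) -> int:
    return sum(sum(row) for row in matrix)


a = "GCGGCCCATCAGGTAGTTGGTGG"
b = "GCGTTCCATCCTGGTTGGTGTG"
for window, stringency in [(1, 1), (3, 2), (3, 3)]:
    dots = dot_count(filtered_dot_matrix(a, b, window, stringency))
    print(f"window={window} stringency={stringency}: {dots} dots")
```

With window = 1 and stringency = 1 it reduces to the plain dot matrix; for a fixed window, raising the stringency can only remove dots, never add them.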

Page 27

Figure: window size = 1, stringency = 1, alphabet size = 20.

Page 28

Dot-matrix methods:
Advantages: May reveal information on the evolution of sequences.

Page 29

Advantages: highlighting information

The vertical gap indicates that a coding region corresponding to ~75 amino acids has either been deleted from the human gene or inserted into the bacterial gene.

Figure: window size = 60 amino acids; stringency = 24 matches.

Page 30

Advantages: highlighting information

The two diagonally oriented parallel lines most probably indicate that a small internal duplication has occurred in the bacterial gene.

Figure: window size = 60 amino acids; stringency = 24 matches.

Page 31

Dot-matrix methods:
Disadvantage: May not identify the best alignment.

Page 32

Scoring Matrices & Gap Penalties

Page 33

The true alignment between two sequences is the one that accurately reflects the evolutionary relationships between the sequences.

Since the true alignment is unknown, in practice we look for the optimal alignment, which is the one in which the numbers of mismatches and gaps are minimized according to certain criteria.

Page 34

Unfortunately, reducing the number of mismatches usually requires introducing more gaps, and vice versa.
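The trade-off can be made concrete with the two alignments from the page 9 example. This Python sketch counts four quantities (matches, mismatches, nucleotides in gaps, gaps) for each, then ranks them with a hypothetical cost of 1 per mismatch plus a fixed cost per gap; `summarize`, `cost` and both weights are illustrative assumptions, not values from the slides:

```python
import re


def summarize(top: str, bottom: str) -> dict[str, int]:
    """Count matches, mismatches, nucleotides in gaps and gaps."""
    matches = mismatches = gap_nucleotides = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_nucleotides += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    # A gap = one run of consecutive '-' characters in either row.
    gaps = len(re.findall(r"-+", top)) + len(re.findall(r"-+", bottom))
    return {"matches": matches, "mismatches": mismatches,
            "nucleotides in gaps": gap_nucleotides, "gaps": gaps}


def cost(summary: dict[str, int], mismatch_cost: float, gap_cost: float) -> float:
    """Hypothetical criterion: weighted mismatches plus weighted gaps (lower is better)."""
    return mismatch_cost * summary["mismatches"] + gap_cost * summary["gaps"]


candidates = {
    "interior gaps": ("ACT-GGCC-CAGATC", "ACAGGGCCACAAATC"),
    "end gap": ("ACTGGCCCAGATC--", "ACAGGGCCACAAATC"),
}
for gap_cost in (2, 4):
    for name, (top, bottom) in candidates.items():
        s = summarize(top, bottom)
        print(f"gap cost {gap_cost}, {name}: {s} -> cost {cost(s, 1, gap_cost)}")
```

The alignment with interior gaps has fewer mismatches (2 vs. 5) but more gaps (2 vs. 1). With a gap cost of 2 it is preferred (cost 6 vs. 7); with a gap cost of 4 the end-gap alignment wins (9 vs. 10), even though it does not reflect the true history. The chosen criteria decide which alignment counts as optimal.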

Page 35

Figure legend: matches, mismatches, nucleotides in gaps, gaps.

Page 36

The scoring scheme comprises a gap penalty and a scoring matrix, M(a,b), that specifies the score for each type of match (a = b) or mismatch (a ≠ b).

The units in a scoring matrix may be the nucleotides in the DNA or RNA sequences, the codons in protein-coding regions, or the amino acids in protein sequences.
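The slides treat the search for the optimal alignment as a black box. As one illustration, not necessarily the method the course has in mind, the classic Needleman-Wunsch dynamic-programming algorithm finds a global alignment with the highest total score under a scoring matrix M(a,b) and a linear gap penalty; maximizing this score is the same as minimizing the penalties for mismatches and gaps. In this Python sketch the matrix (+1 match, -1 mismatch) and the gap penalty (-2 per base) are hypothetical values chosen for the example:

```python
from typing import Callable

Scorer = Callable[[str, str], int]


def simple_matrix(a: str, b: str) -> int:
    """Hypothetical M(a,b): +1 for a match (a = b), -1 for a mismatch."""
    return 1 if a == b else -1


def alignment_score(top: str, bottom: str, M: Scorer, gap: int) -> int:
    """Score an existing alignment: M(a,b) per pair, `gap` per base opposite '-'."""
    return sum(gap if "-" in (a, b) else M(a, b) for a, b in zip(top, bottom))


def needleman_wunsch(s: str, t: str, M: Scorer, gap: int) -> tuple[int, str, str]:
    """Global alignment with the highest total score (linear gap penalty)."""
    rows, cols = len(s) + 1, len(t) + 1
    # F[i][j] = best score for aligning s[:i] with t[:j]
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + M(s[i - 1], t[j - 1]),  # pair
                          F[i - 1][j] + gap,                        # s base vs gap
                          F[i][j - 1] + gap)                        # gap vs t base
    # Trace back from the lower-right cell to recover one optimal alignment.
    i, j = len(s), len(t)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + M(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return F[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ACTGGCCCAGATC", "ACAGGGCCACAAATC", simple_matrix, -2)
print(score)
print(top)
print(bottom)
```

For the page 9 sequences the best attainable score under these values is 5, the score of the correct alignment ACT-GGCC-CAGATC / ACAGGGCCACAAATC (the end-gap alignment scores -1). When several alignments tie, the traceback returns only one of them.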

Page 37

If you want to know the secrets behind the black box of sequence alignment, you will have to take a class in BIOINFORMATICS.

Page 38

Multiple sequence alignment is far more complicated than pairwise alignment: the cost of an exact dynamic-programming solution grows exponentially with the number of sequences.

Page 39

Multiple sequence alignment has no exact optimal solution that is practical to compute for more than a handful of sequences. It is solved heuristically.

Page 40

A Multiple Sequence Alignment

    Spinach   GCGGCTCA TCAGGTAGTT GGTG-G
    Rice      GCGGCCCA TCAGGTAGTT GGTG-G
    Mosquito  GCGTTCCA TC--CT-GTT GGTGTG
    Monkey    GCGTCCCA TCAGCTAGTT GTTG-G
    Human     GCGGCGCA TTAGCTAGTT GGTG-A
              ***...** *.--.*-*** *.**-.
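The symbol row of a multiple alignment can be generated the same way as for a pair. Judging from the figure, `*` marks a fully conserved column, `.` a column with differing nucleotides, and `-` a column containing a gap in any sequence; the Python sketch below assumes exactly that convention (hypothetical helper `msa_column_symbols`), with the spaces between blocks removed:

```python
def msa_column_symbols(rows: list[str], gap: str = "-") -> str:
    """One symbol per column: '*' conserved, '-' any gap, '.' otherwise."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    symbols = []
    for column in zip(*rows):
        if gap in column:
            symbols.append("-")  # at least one sequence has a null base here
        elif len(set(column)) == 1:
            symbols.append("*")  # identical nucleotide in every sequence
        else:
            symbols.append(".")  # at least two different nucleotides
    return "".join(symbols)


msa = {
    "Spinach": "GCGGCTCATCAGGTAGTTGGTG-G",
    "Rice": "GCGGCCCATCAGGTAGTTGGTG-G",
    "Mosquito": "GCGTTCCATC--CT-GTTGGTGTG",
    "Monkey": "GCGTCCCATCAGCTAGTTGTTG-G",
    "Human": "GCGGCGCATTAGCTAGTTGGTG-A",
}
for name, row in msa.items():
    print(f"{name:<9} {row}")
print(" " * 10 + msa_column_symbols(list(msa.values())))
```

It returns ***...***.--.*-****.**-., the figure's symbol row without the block spaces.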

