Date post: 19-Oct-2020

Sequence Alignment

Submitted by

Purnima

Department of Bioinformatics

Definition

• In bioinformatics, a sequence alignment is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.

• Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns.

Similarity versus Homology

• Similarity refers to the likeness, or % identity, between 2 sequences. Similarity means sharing a statistically significant number of bases or amino acids. Similarity does not imply homology.

• Homology refers to shared ancestry. Two sequences are homologous if they are derived from a common ancestral sequence. Homology usually implies similarity.

• Homolog: a protein or gene that shares a common ancestor with another and has good sequence and/or structural similarity to it.

• Homology: genes that derive from a common ancestor are called homologs.

• Paralog: a homolog that arose through gene duplication within the same genome (same species).

• Paralogous genes are homologous genes in one organism that derive from gene duplication.

• Gene duplication: one gene is duplicated into multiple copies, which are therefore free to evolve and assume new functions.

• Ortholog: a homolog that arose through speciation (found in different species). Orthologous genes are homologous genes in different organisms.

1. Pairwise Sequence Alignment

A pairwise sequence alignment is an alignment of 2 sequences obtained by inserting gaps (“-”) such that the resulting sequences have the same length and where each pair of residues represents a homologous position.

The overall goal of pairwise sequence alignment is to find the best pairing of two sequences, such that there is maximum correspondence among residues.

To achieve this goal, one sequence needs to be shifted relative to the other to find the position where maximum matches are found.
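
The shifting idea can be sketched as follows. This is a minimal illustration only, assuming an ungapped comparison that simply counts identical residues; the function name `best_shift` is hypothetical and does not come from the slides.

```python
def best_shift(seq_a, seq_b):
    """Slide seq_b along seq_a (no gaps) and return (offset, matches).

    offset is the index in seq_a where seq_b[0] is placed; it may be
    negative when seq_b overhangs the start of seq_a.
    """
    best_offset, best_matches = 0, -1
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = 0
        for j, residue in enumerate(seq_b):
            i = offset + j
            # Count only positions where both sequences have a residue.
            if 0 <= i < len(seq_a) and seq_a[i] == residue:
                matches += 1
        # Strict ">" keeps the first offset that reaches the maximum.
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


print(best_shift("GATTACA", "TTAC"))  # (2, 4)
```

Because no gaps are allowed here, this sketch cannot handle insertions or deletions; the methods in the following slides address that.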

Alignment Algorithms

1.1 Dot Matrix Method

In a dot matrix, the two sequences to be compared are written along the horizontal and vertical axes of the matrix.

The comparison is done by scanning each residue of one sequence for similarity with all residues in the other sequence.

If a residue match is found, a dot is placed within the graph. Otherwise, the matrix positions are left blank.

When the two sequences have substantial regions of similarity, many dots line up to form contiguous diagonal lines, which reveal the sequence alignment.

If there are interruptions in the middle of a diagonal line, they indicate insertions or deletions. Parallel diagonal lines within the matrix represent repetitive regions of the sequences.
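
The dot matrix procedure described above can be sketched in a few lines. This is an illustrative sketch, assuming exact single-residue matches with no window or threshold filtering; the names `dot_matrix` and `print_dot_matrix` are hypothetical.

```python
def dot_matrix(seq_x, seq_y):
    """Return one string per residue of seq_y (vertical axis).

    Each string has one character per residue of seq_x (horizontal
    axis): "*" marks a match, a space marks a blank position.
    """
    return [
        "".join("*" if x == y else " " for x in seq_x)
        for y in seq_y
    ]


def print_dot_matrix(seq_x, seq_y):
    """Print the matrix with both sequences as axis labels."""
    print("  " + seq_x)
    for y, row in zip(seq_y, dot_matrix(seq_x, seq_y)):
        print(y + " " + row)


print_dot_matrix("GATTACA", "GATCACA")
```

Runs of `*` along a diagonal correspond to matching stretches; a diagonal that jumps to a neighboring diagonal suggests an insertion or deletion.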

Figure: Dot matrix pairwise alignment.

B. Bayesian Method for sequence alignment

This method is rarely used for pairwise alignment; it is used mainly to measure the evolutionary distance between DNA sequences. It uses the probabilities of all aligned sequences in a profile, together with their gap scores and substitution matrix values, to assess the probability of the next alignment. Not all parameters need to be specified in advance: because the method assesses the probability of the alignment itself, it does not require a single fixed substitution matrix or gap-scoring scheme. It also quantifies the uncertainty of the alignment and derives a measure of significance.

1.2 Dynamic programming

Dynamic programming can be useful in aligning nucleotide to protein sequences, a task complicated by the need to take into account frameshift mutations (usually insertions or deletions).

The framesearch method produces a series of global or local pairwise alignments between a query nucleotide sequence and a search set of protein sequences, or vice versa.

Its ability to evaluate frameshifts offset by an arbitrary number of nucleotides makes the method useful for sequences containing large numbers of indels, which can be very difficult to align with more efficient heuristic methods.

In practice, the method requires large amounts of computing power or a system whose architecture is specialized for dynamic programming. The BLAST and EMBOSS suites provide basic tools for creating translated alignments.

Example of pairwise alignment of two sequences using dynamic programming. The score for the lower-right square (A) of a 2 × 2 submatrix is the maximum of the scores obtained from its three neighboring squares (X, Y, and Z): the diagonal neighbor's score plus the single-residue match score (a) for the lower-right corner, or the score of the square above or to the left minus the gap penalty (g.p.). A matrix is set up for the two short sequences. A simple scoring system is applied in which an identical match is assigned a score of 1, a mismatch a score of 0, and a gap a penalty of −1. The scores in the matrix are filled in one cell at a time, row by row from top to bottom. The best score is entered in the lower-right corner of each submatrix (grey boxes) according to this rule. When all the cells are filled, the best alignment is determined through a trace-back procedure that searches for the path with the best total score. When the path moves horizontally or vertically, a gap penalty is applied.
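
The fill and trace-back steps can be sketched as below, using the scoring from the example (match 1, mismatch 0, gap −1). This is a minimal sketch, not the original figure's exact procedure: it assumes that leading gaps are penalized like any other gap, and the function name `global_align` is hypothetical.

```python
def global_align(s, t, match=1, mismatch=0, gap=-1):
    """Fill the score matrix, then trace back one best global alignment."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    # Assumption: the first row and column accumulate gap penalties.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill one cell at a time, row by row from top to bottom.
    for i in range(1, rows):
        for j in range(1, cols):
            a = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + a,  # diagonal: pair two residues
                score[i - 1][j] + gap,    # vertical: gap in t
                score[i][j - 1] + gap,    # horizontal: gap in s
            )
    # Trace back from the lower-right corner to the upper-left corner.
    top, bottom = [], []
    i, j = len(s), len(t)
    while i > 0 or j > 0:
        a = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + a:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(global_align("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

When several paths tie, this sketch prefers the diagonal move, so it reports only one of the possibly many best alignments.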

Word or k-tuple method

2. Global Alignment

In global alignment, two sequences to be aligned are assumed to be generally similar over their entire length.

Alignment is carried out from beginning to end of both sequences to find the best possible alignment across the entire length between the two sequences.

This method is more applicable for aligning two closely related sequences of roughly the same length.

For divergent sequences and sequences of variable lengths, this method may not be able to generate optimal results because it fails to recognize highly similar local regions between the two sequences.

GLOBAL ALIGNMENT

2.1 Needleman–Wunsch Algorithm

The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences. It was one of the first applications of dynamic programming to compare biological sequences.

The algorithm essentially divides a large problem (e.g. the full sequence) into a series of smaller problems, and it uses the solutions to the smaller problems to find an optimal solution to the larger problem.

The Needleman–Wunsch algorithm is still widely used for optimal global alignment, particularly when the quality of the global alignment is of the utmost importance. The algorithm assigns a score to every possible alignment, and the purpose of the algorithm is to find all possible alignments having the highest score.
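
Because several alignments can share the highest score, it can help to see how many there are. The sketch below is an illustration under stated assumptions: a linear gap penalty, the simple scoring of match 1, mismatch 0, gap −1, and a hypothetical function name `count_optimal_alignments`. It counts distinct trace-back paths, so two paths that differ only in the order of adjacent gaps are counted separately.

```python
def count_optimal_alignments(s, t, match=1, mismatch=0, gap=-1):
    """Return (best global score, number of best-scoring trace-back paths)."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    ways = [[0] * cols for _ in range(rows)]
    ways[0][0] = 1
    # Border cells can only be reached by a run of gaps: one path each.
    for i in range(1, rows):
        score[i][0], ways[i][0] = i * gap, 1
    for j in range(1, cols):
        score[0][j], ways[0][j] = j * gap, 1
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, ways[i - 1][j - 1]),
                (score[i - 1][j] + gap, ways[i - 1][j]),
                (score[i][j - 1] + gap, ways[i][j - 1]),
            ]
            best = max(value for value, _ in candidates)
            score[i][j] = best
            # Every predecessor that achieves the best score contributes
            # all of its own best paths.
            ways[i][j] = sum(n for value, n in candidates if value == best)
    return score[-1][-1], ways[-1][-1]


print(count_optimal_alignments("AA", "A"))  # (0, 2)
```

In the usage line, `AA` can be aligned to `A-` or to `-A`, and both alignments score 1 − 1 = 0.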

Needleman–Wunsch Algorithm

3. Local Alignment

Local alignment only finds local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequence regions.

This approach can be used for aligning more divergent sequences with the goal of searching for conserved patterns in DNA or protein sequences.

The two sequences to be aligned can be of different lengths. This approach is more appropriate for aligning divergent biological sequences containing only modules that are similar, which are referred to as domains or motifs.

LOCAL ALIGNMENT

3.1 Smith–Waterman Algorithm

The Smith–Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two nucleic acid or protein sequences.

Instead of looking at the entire sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.

Like the Needleman–Wunsch algorithm, of which it is a variation, Smith–Waterman is a dynamic programming algorithm.

It has the desirable property that it is guaranteed to find the optimal local alignment with respect to the scoring system being used (which includes the substitution matrix and the gap-scoring scheme).
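
A compact sketch of the local alignment recurrence is given below. It is illustrative only: the scoring values (match 2, mismatch −1, gap −2) and the linear gap penalty are assumptions chosen for the example, not values from the slides, and the function name `smith_waterman` is hypothetical.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of s, aligned part of t)."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere; this is the
            # key difference from the global recurrence.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a 0 is reached.
    top, bottom = [], []
    i, j = best_pos
    while score[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```

Only the shared `ACGT` segment is reported; the dissimilar flanking regions are ignored, which is the behavior described above.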

Smith–Waterman Algorithm

4. Multiple sequence alignment

Multiple sequence alignment is an extension of pairwise alignment to incorporate more than two sequences at a time. Multiple alignment methods try to align all of the sequences in a given query set.

Multiple alignments are often used in identifying conserved sequence regions across a group of sequences hypothesized to be evolutionarily related.

Alignments are also used to aid in establishing evolutionary relationships by constructing phylogenetic trees.

Multiple sequence alignments are computationally difficult to produce and most formulations of the problem lead to NP-complete combinatorial optimization problems.

Multiple Sequence Alignment

4.1 Dynamic programming

The technique of dynamic programming is theoretically applicable to any number of sequences; however, in its most basic form it is rarely used for more than three or four sequences, because time and memory requirements grow exponentially with the number of sequences.

This method requires constructing the n-dimensional equivalent of the sequence matrix formed from two sequences, where n is the number of sequences in the query.

Standard dynamic programming is first used on all pairs of query sequences and then the "alignment space" is filled in by considering possible matches or gaps at intermediate positions, eventually constructing an alignment essentially between each two-sequence alignment.

Its guarantee of a global optimum solution is useful in cases where only a few sequences need to be aligned accurately. One method for reducing the computational demands of dynamic programming, which relies on the "sum of pairs" objective function, has been implemented in the MSA software package.
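
The "sum of pairs" objective can be illustrated with a short scoring function. This is a sketch under stated assumptions, not the MSA package's implementation: it uses a simple match/mismatch/gap scheme (1, 0, −1), scores a gap paired with a gap as 0, and the function name `sum_of_pairs` is hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=0, gap=-1):
    """Score a multiple alignment given as equal-length gapped strings."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        # Add up the pairwise score of every pair of rows in this column.
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # assumption: gap-gap pairs contribute 0
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["ACG", "A-G", "ACG"]))  # 5
```

In the usage line, the first and third columns each contribute 3 (three matching pairs) and the middle column contributes 1 − 1 − 1 = −1, giving 5.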

Dynamic programming

