Post on 21-Dec-2015
Sequence Alignments
Chi-Cheng Lin, Ph.D.
Associate Professor
Department of Computer Science
Winona State University – Rochester Center
clin@winona.edu
Intro to Bioinformatics – Sequence Alignment 2
Sequence Alignments
Cornerstone of bioinformatics
What is a sequence?
• Nucleotide sequence
• Amino acid sequence
Pairwise and multiple sequence alignments
What alignments can help
• Determine function of a newly discovered gene sequence
• Determine evolutionary relationships among genes, proteins, and species
• Predict structure and function of proteins
Acknowledgement: These notes are adapted from the lecture notes of Wright State University’s Bioinformatics Program.
DNA Replication
Prior to cell division, all the genetic instructions must be “copied” so that each new cell will have a complete set
Over time, genes accumulate mutations
Environmental factors
• Radiation
• Oxidation
Mistakes in replication or repair
• Deletions, Duplications
• Insertions, Inversions
• Translocations
• Point mutations
Deletions
Codon deletion:
ACG ATA GCG TAT GTA TAG CCG…
• Effect depends on the protein, position, etc.
• Almost always deleterious
• Sometimes lethal
Frame shift mutation:
ACG ATA GCG TAT GTA TAG CCG…
ACG ATA GCG ATG TAT AGC CG?…
• Almost always lethal
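The difference between the two kinds of deletion can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the helper name `codons` is made up here, and the choice of which codon to delete is arbitrary. The single-base deletion (the T at index 9) is picked so that it reproduces the frame-shifted sequence shown on the slide.

```python
def codons(seq):
    """Split a sequence into consecutive triplets (the last one may be short)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


original = "ACGATAGCGTATGTATAGCCG"

# Codon deletion: remove one whole triplet (here the fourth codon, TAT).
# Which codon is removed is an arbitrary choice for illustration.
codon_deleted = original[:9] + original[12:]

# Frame shift: remove a single base (the T at index 9), which moves every
# downstream codon boundary by one position.
frame_shifted = original[:9] + original[10:]

print(" ".join(codons(original)))
print(" ".join(codons(codon_deleted)))
print(" ".join(codons(frame_shifted)))
```

After the codon deletion, every remaining codon is unchanged; after the single-base deletion, every codon downstream of the deletion is different.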
Indels
Comparing two genes, it is generally impossible to tell if an indel is an insertion in one gene or a deletion in another, unless ancestry is known:
ACGTCTGATACGCCGTATCGTCTATCT
ACGTCTGAT---CCGTATCGTCTATCT
The Genetic Code
Substitutions
Substitutions are mutations accepted by natural selection.
Synonymous: CGC → CGA (both encode Arg)
Non-synonymous: GAU → GAA (Asp → Glu)
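Whether a substitution is synonymous can be checked mechanically against the standard genetic code. The sketch below is an illustration added to these notes: the names `CODON_TABLE` and `classify_substitution` are hypothetical, and the table is built from the usual compact encoding of the standard code (bases in U, C, A, G order, `*` for stop codons).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon. Codons are enumerated with the
# first base varying slowest and the third base fastest, each in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(before, after):
    """Return 'synonymous' if both RNA codons encode the same amino acid."""
    same = CODON_TABLE[before] == CODON_TABLE[after]
    return "synonymous" if same else "non-synonymous"


print(classify_substitution("CGC", "CGA"))
print(classify_substitution("GAU", "GAA"))
```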
Point Mutation Example: Sickle-cell Disease
Wild-type hemoglobin
DNA
3’----CTT----5’
mRNA
5’----GAA----3’
Normal hemoglobin
------[Glu]------
Mutant hemoglobin
DNA
3’----CAT----5’
mRNA
5’----GUA----3’
Mutant hemoglobin
------[Val]------
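The path from template DNA to amino acid in this example can be traced with a short sketch. It is an illustration only: `transcribe`, `PAIRING`, and `AMINO_ACID` are names invented for this example, and the amino acid lookup deliberately contains just the two codons that appear on the slide rather than a full codon table.

```python
# Template-strand DNA base -> mRNA base (A pairs with U in RNA).
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Only the two codons from the slide; a full codon table is omitted here.
AMINO_ACID = {"GAA": "Glu", "GUA": "Val"}


def transcribe(template_3_to_5):
    """Build the mRNA (read 5'->3') from a template strand written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5)


for label, template in [("wild-type", "CTT"), ("mutant", "CAT")]:
    mrna = transcribe(template)
    print(label, template, mrna, AMINO_ACID[mrna])
```

A single base change in the template (T to A in the middle position) is enough to switch the encoded residue from Glu to Val.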
Image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.
Comparing Two Sequences
Point mutations, easy:
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
Indels are difficult, must align sequences:
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
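When two sequences have the same length and differ only by point mutations, a position-by-position comparison is enough. A minimal sketch (the function name `point_differences` is hypothetical) that lists the differing positions for the first pair of sequences above:

```python
def point_differences(a, b):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length; align them first")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "ACGTCTGATACGCCGTATAGTCTATCT"
variant = "ACGTCTGATTCGCCCTATCGTCTATCT"
print(point_differences(reference, variant))  # [9, 14, 18]
```

The length check is the reason indels are harder: once the lengths differ, positions no longer correspond, and gaps have to be placed before any comparison makes sense.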
Why Align Sequences?
The draft human genome is available
Automated gene finding is possible
Gene: AGTACGTATCGTATAGCGTAA
• What does it do?
One approach: Is there a similar gene in another species?
• Align sequences with known genes
• Find the gene with the “best” match
Scoring a Sequence Alignment
Match score: +1
Mismatch score: +0
Gap penalty: –1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
Matches: 18 × (+1)
Mismatches: 2 × 0
Gaps: 7 × (–1)
Score = +11
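The scoring rule above translates directly into code. The sketch below assumes the alignment is given as two equal-length strings with `-` marking gaps; the function name `score_alignment` and its keyword parameters are illustrative choices, with defaults taken from the scores on this slide.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score a finished alignment column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # a gap in either row
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))  # 11
```

Changing the keyword arguments shows how sensitive the "best" alignment is to the scoring scheme; for example, a heavier gap penalty lowers the score of this alignment considerably.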
How can we find an optimal alignment?
Finding the alignment is computationally hard:
ACGTCTGATACGCCGTATAGTCTATCT
CTGAT---TCG-CATCGTC--T-ATCT
There are ~888,000 possibilities to align the two sequences given above.
Algorithms using a technique called “dynamic programming” are used – out of the scope of this workshop.
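Although dynamic programming is outside the scope of the workshop, a small sketch may help make both statements concrete. It rests on assumptions not stated in the slides: the count is taken to be the number of ways to choose which 7 of the 27 columns hold gaps in the shorter sequence (which gives 888,030, consistent with the figure quoted above), and the scoring function is a generic Needleman-Wunsch-style recurrence using the match, mismatch, and gap scores from the scoring slide. The name `global_alignment_score` is hypothetical, and the function returns only the best score, not the alignment itself.

```python
from math import comb


def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score, computed row by row."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against nothing: i gaps
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


long_seq = "ACGTCTGATACGCCGTATAGTCTATCT"
short_seq = "CTGATTCGCATCGTCTATCT"

# Ways to place 7 gap columns among 27 alignment columns.
print(comb(len(long_seq), len(long_seq) - len(short_seq)))  # 888030
print(global_alignment_score(long_seq, short_seq))
```

The table has only 28 × 21 cells, so the best score is found without enumerating the hundreds of thousands of candidate gap placements one by one.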
Global and Local alignments
Global alignment – score the entire alignment
Local alignment – find the best matching subsequence
Why local sequence alignment?
• Subsequence comparison between a DNA sequence and a genome
• Protein function domains
• Exon matching
Example
Compare the two sequences:
TTGACACCCTCCCAATT
ACCCCAGGCTTTACACAG
Global alignment (does it look good?)
TTGACACCCTCC-CAATT
    ||  ||   ||
ACCCCAGGCTTTACACAG
Local alignment (does it look good?)
---------TTGACACCCTCCCAATT
         || ||||
ACCCCAGGCTTTACACAG--------
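A local alignment score can be computed with a small variation of the dynamic programming idea: scores are never allowed to drop below zero, and the best cell anywhere in the table is reported. The sketch below is a Smith-Waterman-style illustration, not the method used to produce the slide. It assumes a mismatch score of –1 (the earlier scoring slide uses 0), because a local alignment needs mismatches to cost something in order to stop extending; the name `local_alignment_score` is hypothetical.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score; an empty alignment scores 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,                    # start a fresh alignment here
                       prev[j - 1] + pair,   # extend diagonally
                       prev[j] + gap,        # gap in b
                       curr[j - 1] + gap)    # gap in a
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


first = "TTGACACCCTCCCAATT"
second = "ACCCCAGGCTTTACACAG"
print(local_alignment_score(first, second))
```

Under these assumed scores, the region highlighted on the slide (TTGACAC against TTTACAC, six matches and one mismatch) scores 5, so the value printed is at least 5.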
Dot Plots
One of the simplest and oldest methods for sequence alignment
Visualization of regions of similarity
• Assign one sequence on the horizontal axis
• Assign the other on the vertical axis
• Place a dot wherever the two residues match
• Diagonal lines mean adjacent regions of identity
A Simple Example
Construct a simple dot plot for
TAGTCGATG
TGGTCATC
The alignment is
TAGTCGATG
TGGTC-ATC
  T A G T C G A T G
T *     *       *
G     *     *     *
G     *     *     *
T *     *       *
C         *
A   *         *
T *     *       *
C         *
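The dot plot for this example can be generated with a few lines of Python. This is a minimal sketch with a hypothetical function name, `dot_plot`; it marks every pair of identical residues with `*` and applies no windowing or filtering, so short chance matches appear alongside the meaningful diagonals.

```python
def dot_plot(horizontal, vertical):
    """Return the dot plot as text lines; '*' marks identical residues."""
    lines = ["  " + " ".join(horizontal)]  # header row: the horizontal sequence
    for v in vertical:
        cells = ["*" if v == h else " " for h in horizontal]
        lines.append((v + " " + " ".join(cells)).rstrip())
    return lines


for line in dot_plot("TAGTCGATG", "TGGTCATC"):
    print(line)
```

In the printed grid, the alignment appears as two diagonal runs of dots, and the gap shows up as the shift from one diagonal to the next.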
What else can it do (and how)?
Gaps
Inverse substring
Repeat
Palindrome
Gene conservation and order study