
Before we begin…

Date post: 02-Feb-2016
Upload: fell
Description: Before we begin… (PowerPoint presentation)
Transcript
Page 1: Before we begin…

MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFE…
|| ||  ||||| ||| || |   ||||||||||||||||||||
MVNLTSDEKTAVLALWNKVDVEDCGGEALGRLLVVYPWTQRFFE…

ATGGTGAACCTGACCTCTGACGAGAAGACTGCCGTCCTTGCCCTGTGGAACAAGGTGGACGTGGAAGACTGTGGTGGTGAGGCCCTGGGCAGGTTTGTATGGAGGTTACAAGGCTGCTTAAGGAGGGAGGATGGAAGCTGGGCATGTGGAGACAGACCACCTCCTGGATTTATGACAGGAACTGATTGCTGTCTCCTGTGCTGCTTTCACCCCTCAGGCTGCTGGTCGTGTATCCCTGGACCCAGAGGTTCTTTGAAAGCTTTGGGGACTTGTCCACTCCTGCTGCTGTGTTCGCAAATGCTAAGGTAAAAGCCCATGGCAAGAAGGTGCTAACTTCCTTTGGTGAAGGTATGAATCACCTGGACAACCTCAAGGGCACCTTTGCTAAACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAATTTCAAGGTGAGTCAATATTCTTCTTCTTCCTTCTTTCTATGGTCAAGCTCATGTCATGGGAAAAGGACATAAGAGTCAGTTTCCAGTTCTCAATAGAAAAAAAAATTCTGTTTGCATCACTGTGGACTCCTTGGGACCATTCATTTCTTTCACCTGCTTTGCTTATAGTTATTGTTTCCTCTTTTTCCTTTTTCTCTTCTTCTTCATAAGTTTTTCTCTCTGTATTTTTTTAACACAATCTTTTAATTTTGTGCCTTTAAATTATTTTTAAGCTTTCTTCTTTTAATTACTACTCGTTTCCTTTCATTTCTATACTTTCTATCTAATCTTCTCCTTTCAAGAGAAGGAGTGGTTCACTACTACTTTGCTTGGGTGTAAAGAATAACAGCAATAGCTTAAATTCTGGCATAATGTGAATAGGGAGGACAATTTCTCATATAAGTTGAGGCTGATATTGGAGGATTTGCATTAGTAGTAGAGGTTACATCCAGTTACCGTCTTGCTCATAATTTGTGGGCACAACACAGGGCATATCTTGGAACAAGGCTAGAATATTCTGAATGCAAACTGGGGACCTGTGTTAACTATGTTCATGCCTGTTGTCTCTTCCTCTTCAGCTCCTGGGCAATATGCTGGTGGTTGTGCTGGCTCGCCACTTTGGCAAGGAATTCGACTGGCACATGCACGCTTGTTTTCAGAAGGTGGTGGCTGGTGTGGCTAATGCCCTGGCTCACAAGTACCATTGA

Page 2

Pairwise Sequence Alignment

Lesson 2

Page 3

What is sequence alignment?

Alignment: comparing two (pairwise) or more (multiple) sequences, searching for a series of identical or similar characters in the sequences.

MVNLTSDEKTAVLALWNKVDVEDCGGE
|| ||  ||||| ||| || |   |||
MVHLTPEEKTAVNALWGKVNVDAVGGE

Page 4

Why sequence alignment?

Predict characteristics of a protein: use the structure or function information on known proteins with similar sequences, available in databases, in order to predict the structure or function of an unknown protein.

Assumption: similar sequences produce similar proteins.

Page 5

Local vs. Global

Global alignment - finds the best alignment across the entire length of both sequences.

Local alignment - finds regions of high similarity in parts of the sequences.

Global alignment: forces alignment in regions which differ

ADLGAVFALCDRYFQ
||||     |||| |
ADLGRTQN-CDRYYQ

Local alignment: concentrates on regions of high similarity

ADLG CDRYFQ
|||| |||| |
ADLG CDRYYQ
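The difference between the two modes can be sketched with the classic dynamic-programming recurrences: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. This is a minimal, score-only sketch that assumes a linear gap penalty and borrows the naïve scheme used later in this lesson (match +1, mismatch -2, indel -1); the function names are illustrative and no traceback is recovered.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -2, gap: int = -1) -> int:
    """Needleman-Wunsch: best score of an alignment spanning both sequences end to end."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Leading letters aligned against gaps pay the gap penalty.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diagonal, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -2, gap: int = -1) -> int:
    """Smith-Waterman: best score of an alignment between any two substrings."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]  # first row/column stay 0: free start
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start fresh at any cell.
            dp[i][j] = max(0, diagonal, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            best = max(best, dp[i][j])  # ...and end at any cell
    return best


a, b = "ADLGAVFALCDRYFQ", "ADLGRTQNCDRYYQ"
print(global_score(a, b))  # -2
print(local_score(a, b))   # 4
```

The global score is dragged down by the dissimilar middle region, while the local score reflects only the best-conserved stretch (ADLG or CDRY).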

Page 6

Sequence evolution

In the course of evolution, the sequences changed from the ancestral sequence by random mutations.

Three types of changes:
1. Insertion - an insertion of a letter or several letters into the sequence. AAGA → AAGTA

[Figure: insertion]

Page 7

Sequence evolution

In the course of evolution, the sequences changed from the ancestral sequence by random mutations.

Three types of changes:
1. Insertion - an insertion of a letter or several letters into the sequence. AAGA → AAGTA
2. Deletion - a deletion of a letter (or more) from the sequence. AAGA → AGA

[Figure: deletion]

Page 8

Evolutionary changes in sequences

In the course of evolution, the sequences changed from the ancestral sequence by random mutations.

Three types of mutations:
1. Insertion - an insertion of a letter or several letters into the sequence. AAGA → AAGTA
2. Deletion - deleting a letter (or more) from the sequence. AAGA → AGA
3. Substitution - a replacement of one (or more) sequence letters by another. AAGA → AACA

Insertion + Deletion → Indel

[Figure: substitution]
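The three kinds of change can be mimicked on a string. This is a toy sketch; the helpers insert, delete and substitute are hypothetical names, and positions are 0-based.

```python
def insert(seq: str, pos: int, letters: str) -> str:
    """Insertion: add one or more letters before position pos."""
    return seq[:pos] + letters + seq[pos:]


def delete(seq: str, pos: int, length: int = 1) -> str:
    """Deletion: remove `length` letters starting at position pos."""
    return seq[:pos] + seq[pos + length:]


def substitute(seq: str, pos: int, letter: str) -> str:
    """Substitution: replace the letter at position pos with another one."""
    return seq[:pos] + letter + seq[pos + 1:]


ancestor = "AAGA"
print(insert(ancestor, 3, "T"))      # AAGTA
print(delete(ancestor, 0))           # AGA
print(substitute(ancestor, 2, "C"))  # AACA
```

When only the two descendant sequences are compared, an insertion in one looks exactly like a deletion in the other, which is why an alignment records both as a single indel.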

Page 9

Sequence alignment

AAGCTGAATTCGAA
AGGCTCATTTCTGA

One possible alignment:

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

This alignment includes:
10 perfect matches
2 mismatches
4 indels (gaps)

Page 10

Choosing an alignment:

Many different alignments are possible:

AAGCTGAATTCGAA
AGGCTCATTTCTGA

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-

Which alignment is better?

Page 11

Scoring an alignment: example - naïve scoring system:
Match: +1
Mismatch: -2
Indel: -1

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-
Score = (+1)x10 + (-2)x2 + (-1)x4 = 2

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
Score = (+1)x9 + (-2)x2 + (-1)x6 = -1

Higher score → better alignment
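Both the counts and the scores above can be reproduced column by column. A minimal sketch, assuming the two rows are equal-length strings with '-' marking a gap; score_alignment is an illustrative name, not a library function.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -2,
                    indel: int = -1) -> tuple[int, dict[str, int]]:
    """Score a pairwise alignment column by column with a naive scoring system."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "indel": 0}
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            counts["indel"] += 1       # a letter aligned against a gap
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    score = (match * counts["match"] + mismatch * counts["mismatch"]
             + indel * counts["indel"])
    return score, counts


print(score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-"))
# (2, {'match': 10, 'mismatch': 2, 'indel': 4})
print(score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-"))
# (-1, {'match': 9, 'mismatch': 2, 'indel': 6})
```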

Page 12

Scoring system:

Different scoring systems can produce different optimal alignments.

Scoring systems implicitly represent a particular theory of similarity/dissimilarity between sequence characters: evolution based or physico-chemical properties based.

Some mismatches are more plausible than others:
• Transition vs. transversion
• Lys→Arg ≠ Lys→Cys

Gap extension vs. gap opening

Page 13

Substitution Matrices

Nucleic acids: transition-transversion

Amino acids:
• Evolution (empirical data) based: PAM, BLOSUM
• Physico-chemical properties based: Grantham, McLachlan
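For nucleic acids, the transition/transversion idea can be turned into a small substitution matrix. The sketch below uses hypothetical scores (match +1, transition -1, transversion -2) chosen only to show that a transition (purine↔purine or pyrimidine↔pyrimidine) is penalized less than a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(x: str, y: str) -> str:
    """Label a pair of nucleotides as match, transition or transversion."""
    if x == y:
        return "match"
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Hypothetical scores, chosen only to rank the three cases.
SCORES = {"match": 1, "transition": -1, "transversion": -2}


def nucleotide_score(x: str, y: str) -> int:
    return SCORES[classify(x, y)]


# Print the resulting 4 x 4 matrix.
bases = "ACGT"
print("   " + "  ".join(bases))
for x in bases:
    print(x + " " + " ".join(f"{nucleotide_score(x, y):2d}" for y in bases))
```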

Page 14

PAM Matrices

A family of matrices: PAM80, PAM120, PAM250

The number in a PAM matrix's name represents evolutionary distance.

Greater numbers denote greater distances.

Page 15

Which PAM matrix to use?

Low PAM numbers: strong similarities
High PAM numbers: weak similarities

PAM120 for general use (40% identity)
PAM60 for close relations (60% identity)
PAM250 for distant relations (20% identity)

If uncertain, try several different matrices: PAM40, PAM120, PAM250
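Applying this rule of thumb requires the percent identity of an alignment. A minimal sketch, assuming identity is counted over columns where neither row has a gap (other tools divide by the alignment length or the shorter sequence, so numbers can differ); suggest_pam is a hypothetical helper that just picks the slide's matrix whose typical identity is closest.

```python
def percent_identity(top: str, bottom: str) -> float:
    """Percent identical letters among gap-free columns of a pairwise alignment."""
    columns = [(x, y) for x, y in zip(top, bottom) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Typical identities quoted on the slide for each PAM matrix.
PAM_GUIDE = {"PAM250": 20.0, "PAM120": 40.0, "PAM60": 60.0}


def suggest_pam(identity: float) -> str:
    """Pick the PAM matrix whose typical identity is closest (a rough heuristic)."""
    return min(PAM_GUIDE, key=lambda name: abs(PAM_GUIDE[name] - identity))


identity = percent_identity("MVNLTSDEKTAVLALWNKVDVEDCGGE",
                            "MVHLTPEEKTAVNALWGKVNVDAVGGE")
print(f"{identity:.1f}% identity -> {suggest_pam(identity)}")  # 66.7% identity -> PAM60
```

As the slide says, when in doubt it is still worth trying several matrices rather than trusting a single suggestion.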

Page 16

PAM - limitations

Based on only one original dataset

Examines proteins with few differences (at least 85% identity)

Based mainly on small globular proteins, so the matrix is biased

Page 17

BLOSUM Matrices

Different BLOSUMn matrices are calculated independently from BLOCKS.

BLOSUMn is built from blocks in which sequences sharing at least n percent identity are clustered together, so it reflects substitutions among sequences that are less than n percent identical.

BLOSUM62 represents closer sequences than BLOSUM45.

Page 18

Example: BLOSUM62

Derived from blocks in which sequences sharing at least 62% identity are clustered together
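Scoring with BLOSUM62 is a table lookup per aligned pair. The sketch below hard-codes only six entries copied from the standard BLOSUM62 table, enough to show the Lys→Arg vs. Lys→Cys contrast from the scoring-system slide; BLOSUM62_EXCERPT and blosum62 are illustrative names, and for real work the full matrix should be loaded from a trusted source.

```python
# Six BLOSUM62 entries (the matrix is symmetric, so each pair is stored once).
BLOSUM62_EXCERPT = {
    ("K", "K"): 5, ("R", "R"): 5, ("C", "C"): 9,
    ("K", "R"): 2, ("K", "C"): -3, ("R", "C"): -3,
}


def blosum62(x: str, y: str) -> int:
    """Look up a substitution score, trying both orders of the pair."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    if (y, x) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(y, x)]
    raise KeyError(f"pair {x}/{y} is not in this excerpt")


print("Lys->Arg:", blosum62("K", "R"))  # 2: a frequently observed substitution
print("Lys->Cys:", blosum62("K", "C"))  # -3: a rarely observed substitution
```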

Page 19

Which BLOSUM matrix to use?

Low BLOSUM numbers for distant sequences

High BLOSUM numbers for similar sequences

BLOSUM62 for general use
BLOSUM80 for close relations
BLOSUM45 for distant relations

Page 20

PAM vs. BLOSUM

Approximately equivalent matrices:
PAM100 ≈ BLOSUM90
PAM120 ≈ BLOSUM80
PAM160 ≈ BLOSUM60
PAM200 ≈ BLOSUM52
PAM250 ≈ BLOSUM45

Further down the list: more distant sequences

Page 21

Gap penalty

We expect to penalize gaps.

A different score for gap opening and for gap extension:
• Insertions and deletions are rare in evolution
• But once they occur, they are easy to extend
• Gap-extension penalty < gap-opening penalty
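A common way to encode "gap-extension penalty < gap-opening penalty" is an affine gap cost. A minimal sketch, assuming the convention cost = open + extend × length with hypothetical values open = 10 and extend = 1; programs differ both in their defaults and in exactly how the first gap position is charged.

```python
def linear_gap_penalty(length: int, per_position: int = 1) -> int:
    """Every gap position costs the same."""
    return -per_position * length


def affine_gap_penalty(length: int, open_cost: int = 10, extend_cost: int = 1) -> int:
    """Opening a gap is expensive; each position inside it is cheap."""
    if length == 0:
        return 0
    return -(open_cost + extend_cost * length)


# One gap of length 4 vs. four separate gaps of length 1.
print(affine_gap_penalty(4), 4 * affine_gap_penalty(1))  # -14 -44
print(linear_gap_penalty(4), 4 * linear_gap_penalty(1))  # -4 -4
```

Under the linear penalty both layouts cost the same, whereas the affine penalty prefers the single long gap, matching the idea that one indel event, once it occurs, is easily extended.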

Page 22

Web servers for pairwise alignment

Page 23

BLAST 2 Sequences (bl2seq) at NCBI

Produces the local alignment of two given sequences using the BLAST (Basic Local Alignment Search Tool) engine.

Does not use an exact algorithm but a heuristic.
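One ingredient of the heuristic is seeding: instead of filling a full dynamic-programming table, BLAST looks for short words shared by the two sequences and extends only around those hits. The sketch below is deliberately simplified, and find_seeds is a hypothetical name: it finds exact k-letter word matches only, whereas protein BLAST also accepts similar "neighborhood" words scoring above a threshold and then extends the hits into alignments.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where the same k-letter word occurs."""
    index = defaultdict(list)           # word -> positions in the query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    seeds = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


hits = find_seeds("ADLGAVFALCDRYFQ", "ADLGRTQNCDRYYQ")
print(hits)  # [(0, 0), (1, 1), (9, 8), (10, 9)]
# Hits on the same diagonal (subject_pos - query_pos) point to one candidate region.
print(sorted({j - i for i, j in hits}))
```

With the local/global example sequences, the seeds fall on two diagonals, one for the ADLG block and one for the CDRY block; only those regions would be examined further.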

Page 24

Back to NCBI

Page 25

BLAST - bl2seq

Page 26

Bl2seq - query

blastn - nucleotide
blastp - protein

Page 27

Bl2seq results

Page 28

Bl2seq results

[Figure labels: match, dissimilarity, gaps, similarity, low complexity]

Page 29

Bl2seq results:

Bit score - a score for the alignment based on the number of identities, similarities, etc.

Expect value (E-value) - the number of alignments with the same score or better that one can "expect" to see by chance when searching a database of a particular size. The closer the E-value is to zero, the greater the confidence that the hit is real.
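The two numbers are linked. Under the Karlin-Altschul statistics used by BLAST, a raw score S is converted to a bit score S' = (λS - ln K) / ln 2, and the E-value for a query of length m searched against a database of total length n is approximately E = m·n·2^(-S'). A sketch under those assumptions: the λ and K values below (0.267 and 0.041, figures commonly reported for BLOSUM62 with gap costs 11/1) and the lengths are illustrative, and real BLAST uses effective lengths rather than raw m and n.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score using the Karlin-Altschul parameters."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bits` in an m x n search space."""
    return query_len * db_len * 2.0 ** (-bits)


LAMBDA, K = 0.267, 0.041          # illustrative values, see lead-in
m, n = 150, 50_000_000            # hypothetical query and database lengths
for raw in (40, 80, 120):
    bits = bit_score(raw, LAMBDA, K)
    print(f"raw={raw:4d}  bits={bits:6.1f}  E={e_value(bits, m, n):.2e}")
```

Because the bit score grows linearly with the raw score while the E-value falls exponentially, modest score gains push the E-value toward zero very quickly.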

Page 30

BLAST - programs

[Figure: BLAST programs arranged by query type (DNA or protein) and database type (DNA or protein)]
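Choosing a program comes down to the query type and the database type. A small lookup sketch, assuming the standard NCBI BLAST program names; choose_program is a hypothetical helper.

```python
BLAST_PROGRAMS = {
    ("DNA", "DNA"): "blastn",          # nucleotide query vs. nucleotide database
    ("protein", "protein"): "blastp",  # protein query vs. protein database
    ("DNA", "protein"): "blastx",      # translated nucleotide query vs. proteins
    ("protein", "DNA"): "tblastn",     # protein query vs. translated nucleotides
}


def choose_program(query_type: str, database_type: str, translate_both: bool = False) -> str:
    """Return the BLAST program for a query/database combination."""
    if translate_both:
        if (query_type, database_type) != ("DNA", "DNA"):
            raise ValueError("translating both sides only applies to DNA vs. DNA")
        return "tblastx"               # both sides translated before comparison
    return BLAST_PROGRAMS[(query_type, database_type)]


print(choose_program("protein", "protein"))               # blastp
print(choose_program("DNA", "DNA", translate_both=True))  # tblastx
```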

Page 31

BLAST - blastp

Page 32

Blastp - results

Page 33

Blastp - results (cont.)

Page 34

Blastp - acquiring sequences

Page 35

Blastp - acquiring sequences (cont.)

Page 36

FASTA format - multiple sequences

>gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVAN
ALAHKYH
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
>gi|4885393|ref|NP_005321.1| epsilon globin [Homo sapiens]
MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLT
SFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAI
ALAHKYH
>gi|6715607|ref|NP_000175.1| G-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLT
SLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTGVAS
ALSSRYH
>gi|28302131|ref|NP_000550.2| A-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLT
SLGDATKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTAVAS
ALSSRYH
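Because each record starts with a ">" header line followed by one or more sequence lines, a multi-sequence FASTA file is easy to read by hand. A minimal standard-library sketch; parse_fasta is a hypothetical helper (libraries such as Biopython's Bio.SeqIO provide full-featured readers), and it assumes headers are unique.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each header (without '>') to its sequence, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue                      # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>delta globin (first residues only)
MVHLTPEEKTAVNALWGKVN
VDAVGGE
>beta globin (first residues only)
MVHLTPEEKSAVTALWGKVN
VDEVGGE
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```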

Page 37

Searching for remote homologs

Sometimes BLAST isn't enough: in a large protein family, BLAST may find only the close members, while we also want the more distant members.

• PSI-BLAST
• Profile HMMs (not discussed in this exercise)

Page 38

PSI-BLAST

Position-Specific Iterated BLAST

1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST profile search (steps 2-3 are iterated)
4. Final results
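The "construct a profile" step turns the hits of one round into a position-specific scoring matrix (PSSM), which scores each residue differently at each position. A simplified sketch, assuming the hits are already aligned to the query without gaps, a uniform background frequency of 1/20 and a flat pseudocount; real PSI-BLAST also weights sequences, uses substitution-matrix-based pseudocounts and handles gaps. build_pssm, score_with_pssm and the example hits are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds score (bits) of every amino acid at every alignment column."""
    length = len(aligned_hits[0])
    if any(len(hit) != length for hit in aligned_hits):
        raise ValueError("all hits must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for hit in aligned_hits:
            if hit[col] in counts:         # skip gaps and unknown letters
                counts[hit[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pssm


def score_with_pssm(pssm: list[dict[str, float]], seq: str) -> float:
    """Ungapped score of a sequence against the profile, position by position."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Hypothetical round-1 hits, already aligned to a 5-residue query window.
hits = ["KVDAV", "KVDEV", "RVDAV", "KIEAV"]
profile = build_pssm(hits)
for candidate in ("KVDAV", "RIEEV", "CCCCC"):
    print(candidate, round(score_with_pssm(profile, candidate), 2))
```

In the iterated search this profile replaces the fixed substitution matrix for the next round, which is also how a single wrong hit can leak into every later round.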

Page 39

PSI-BLAST

Advantage: PSI-BLAST looks for sequences that are close to the query, and learns from them to extend the circle of friends.

Disadvantage: if we obtain a WRONG hit, we will reach unrelated sequences (contamination). This gets worse with each iteration.

Page 40

BLAST - PSI-BLAST

Page 41

PSI-BLAST - results

