Michael Schroeder, BioTechnological Center, TU Dresden, Biotec
Introduction
based on Chapter 1 of Lesk, Introduction to Bioinformatics
By Michael Schroeder, Biotec, 2
Contents
Molecular biology primer
The role of computer science
Phylogeny
Sequence searching
26 June 2000: Draft of human genome sequenced!
1953: Watson and Crick discover the structure of DNA
2000: Draft of human genome is announced (published February 2001)
“The most wondrous map ever produced by humankind”
“One of the most significant scientific landmarks of all time, comparable with the invention of the wheel or the splitting of the atom”
High-throughput biomedicine
Microarrays
  Measure activity of thousands of genes at the same time
  Example: Cancer
    Compare activity with and without drug treatment
    Result: Hundreds of candidate drug targets
RNAi (Nobel Prize 2006, Fire and Mello)
  Knock down genes and observe effect
  Example: Infectious diseases
    Which proteins orchestrate entry into the cell?
    Result: Hundreds of candidate proteins
Atomic force microscopes (co-invented by Nobel laureate Binnig)
  Pull protein out of membrane and measure force
  Example: Eye diseases resulting from misfolding
    Result: Hundreds of candidate residues
Drug Discovery
Challenge: Longer time to market, fewer drugs, exploding costs
Approach: Use of compound libraries and high-throughput screening
HTS and Bioinformatics
High-throughput technologies have completely changed the work of biomedical researchers
Challenge: Interpret (often large) results of screens
Approach: Before running secondary assays, use bioinformatics and IT to assemble all available information
Good News
Tens of thousands of 3D structures
Millions of sequences
Millions of articles
Hundreds of DBs/tools
Bad News: Data != Knowledge
How to analyse data, how to integrate data?
Computer science to the rescue…
Example: computer science is key for sequencing
Human genome is a string of length 3,200,000,000
Shotgun sequencing: Break multiple copies of the string into shorter substrings
Example:
  shotgunsequencing shotgunsequencing shotgunsequencing
  cing en encing equ gun ing ns otgu seq sequ sh sho shot tg uenc un
Computing problem: Assemble strings
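The assembly problem can be sketched in a few lines, assuming a simple greedy strategy: repeatedly merge the two fragments with the longest suffix/prefix overlap. The names `overlap` and `greedy_assemble` are illustrative and not from the lecture. Real assemblers are far more elaborate, and greedy merging is not guaranteed to recover the original string, especially with repeats or very short fragments.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments):
    """Greedy heuristic: merge the best-overlapping pair until one string is left."""
    # Drop duplicates and fragments fully contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags[0] if frags else ""


# Long, well-overlapping fragments are easy to put back together.
print(greedy_assemble(["shotgun", "gunseq", "sequencing"]))  # shotgunsequencing

# The very short fragments from the slide give an ambiguous problem: the result
# contains every fragment, but need not equal the original string.
reads = "cing en encing equ gun ing ns otgu seq sequ sh sho shot tg uenc un".split()
print(greedy_assemble(reads))
```

The second call illustrates why short fragments and repeats make assembly hard: many overlaps tie, so the heuristic has to guess.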
Computer science is key for sequencing
sh sho shot otgu tg gun un ns seq sequ equ uenc encing en cing ing
QUESTION: How can you handle long repetitive sequences?
Heeeeelllllllllllooooooo
QUESTION: Why was a draft announced? When was the final version ready?
Arabidopsis thaliana
mouse
rat
Caenorhabditis elegans
Drosophila melanogaster
Mycobacterium leprae
Vibrio cholerae
Plasmodium falciparum
Mycobacterium tuberculosis
Neisseria meningitidis Z2491
Helicobacter pylori
Xylella fastidiosa
Borrelia burgdorferi
Rickettsia prowazekii
Bacillus subtilis
Archaeoglobus fulgidus
Campylobacter jejuni
Aquifex aeolicus
Thermotoga maritima
Chlamydia pneumoniae
Pseudomonas aeruginosa
Ureaplasma urealyticum
Buchnera sp. APS
Escherichia coli
Saccharomyces cerevisiae
Yersinia pestis
Salmonella enterica
Thermoplasma acidophilum
DNA – the molecule of life
http://www.ornl.gov/hgmis
The genetic code
Protein Structure
DNA: Nucleotides are very similar, and hence the structure of DNA is very uniform
Proteins: Great variety in three-dimensional conformation to support diverse structures and functions
If heated, a protein “unfolds” to a biologically inactive structure; under normal conditions the protein folds
Paradox
Translation from DNA sequence to amino acid sequence is very simple to describe, but requires immensely complicated machinery (ribosome, tRNA)
The folding of the protein sequence into its three-dimensional structure is very difficult to describe, but occurs spontaneously
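To illustrate how simple translation is to describe, here is a minimal sketch that maps codons to amino acids using the standard genetic code. The names `CODON_TABLE` and `translate` are illustrative, the input DNA string is a made-up example, and the sketch assumes an uppercase or lowercase coding-strand sequence containing only A, C, G and T, read in frame from the first base.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA string codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical DNA fragment; it encodes the peptide K-E-S-P-A.
print(translate("AAAGAGTCTCCTGCT"))  # KESPA
```

By contrast, no comparably short program predicts the folded three-dimensional structure from the amino acid sequence.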
Central Dogma
DNA sequence determines protein sequence
Protein sequence determines protein structure
Protein structure determines protein function
Sequence vs. structure similarity
Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt
High sequence similarity = high structure similarity
Low sequence similarity usually means low structure similarity
Low sequence similarity can still mean high structure similarity
  Example: 11% sequence identity, yet the structures match perfectly
Sequence similarity is a key concept
Similar sequences are a hint for common ancestry and possibly similar function
Example: v-sis vs. PDGF
Example from the early 80s:
  v-sis in simian sarcoma virus leads to cancer in infected cells
  PDGF in humans is a normal growth factor for cells
  v-sis and PDGF are 85% similar
Alignment from: http://pdf.aminer.org/000/244/500/design_and_implementation_of_a_dna_sequence_processor.pdf
If an unknown sequence is found, deduce its function/structure indirectly by finding similar sequences, whose function/structure is known
Assumption: Evolution changes sequences “slowly” often maintaining main features of a sequence’s function/structure
Sequence similarity is a hint for an evolutionary relationship
How similar are sequences?
>sp|P00674|RNP_HORSE Ribonuclease pancreatic (EC 3.1.27.5) (RNase 1) (RNase A) - Equus caballus (Horse).
KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQKNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHFDASVEVST
>sp|P00673|RNP_BALAC Ribonuclease pancreatic (EC 3.1.27.5) (RNase 1) (RNase A) - Balaenoptera acutorostrata (Minke whale) (Lesser rorqual).
RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQKNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHFDNSV
>sp|P00686|RNP_MACRU Ribonuclease pancreatic (EC 3.1.27.5) (RNase 1) (RNase A) - Macropus rufus (Red kangaroo) (Megaleia rufa).
ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEGQYVPVHFDAYV
Multiple Alignment with ClustalW (www.ebi.ac.uk/clustalw)
CLUSTAL W (1.82) multiple sequence alignment

sp|P00674|RNP_HORSE      KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ 60
sp|P00673|RNP_BALAC      RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ 60
sp|P00686|RNP_MACRU      -ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ 59
                          *:** **:*****:  :......*** **  *.**.* ***:***:**.   *.*:* *

sp|P00674|RNP_HORSE      KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF 120
sp|P00673|RNP_BALAC      KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF 120
sp|P00686|RNP_MACRU      ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF 118
                         :*: ****::***:*.* : **:** *..****** *:**: :::*******  ******

sp|P00674|RNP_HORSE      DASVEVST 128
sp|P00673|RNP_BALAC      DNSV---- 124
sp|P00686|RNP_MACRU      DAYV---- 122
                         *  *
Example: Number of Aligned Residues
Horse and Minke whale: 95
Minke whale and Red kangaroo: 82
Horse and Red kangaroo: 75
Conclusion: Horse and whale share the most identical residues
Horse and whale are placental, kangaroo is marsupial
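Counts like these can be obtained by walking over the columns of the alignment. The sketch below assumes the rows are already aligned (equal length, `-` for gaps); the helper name `count_identical` is illustrative. To keep it short, only the first 60 alignment columns from the ClustalW output are used, so the printed counts are smaller than the full-length numbers quoted on the slide.

```python
from itertools import combinations


def count_identical(row_a, row_b, gap="-"):
    """Count alignment columns where both rows carry the same residue (gaps excluded)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)


# First 60 columns of the ribonuclease alignment (horse, minke whale, red kangaroo).
rows = {
    "horse":    "KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ",
    "whale":    "RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ",
    "kangaroo": "-ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ",
}

for (name_a, row_a), (name_b, row_b) in combinations(rows.items(), 2):
    print(name_a, name_b, count_identical(row_a, row_b))
```

Passing the complete aligned rows instead of the first block gives the pairwise counts for the whole protein.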
Example: Elephant and Mammoth
Mitochondrial cytochrome b from
  Siberian woolly mammoth (Mammuthus primigenius) preserved in arctic permafrost
  African elephant (Loxodonta africana)
  Indian elephant (Elephas maximus)
African elephant: sp|P24958|CYB_LOXAF
Mammoth: sp|P92658|CYB_MAMPR
Indian elephant: sp|O47885|CYB_ELEMA

sp|P24958|CYB_LOXAF      MTHIRKSHPLLKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
sp|P92658|CYB_MAMPR      MTHIRKSHPLLKILNKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
sp|O47885|CYB_ELEMA      MTHTRKFHPLFKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
                         *** ** ***:**:**********************************************

sp|P24958|CYB_LOXAF      TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
sp|P92658|CYB_MAMPR      TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
sp|O47885|CYB_ELEMA      TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
                         ************************************************************

sp|P24958|CYB_LOXAF      LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFA 180
sp|P92658|CYB_MAMPR      LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTDLVEWIWGGFSVDKATLNRFFA 180
sp|O47885|CYB_ELEMA      LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFA 180
                         **************************************:*********************

sp|P24958|CYB_LOXAF      LHFILPFTMIALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLLL 240
sp|P92658|CYB_MAMPR      LHFILPFTMIALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILFLL 240
sp|O47885|CYB_ELEMA      FHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLLL 240
                         :********:***********************************************:**

sp|P24958|CYB_LOXAF      LLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALLLSILI 300
sp|P92658|CYB_MAMPR      LLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALLLSILI 300
sp|O47885|CYB_ELEMA      LLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSILI 300
                         ******************************************************:*****

sp|P24958|CYB_LOXAF      LGLMPLLHTSKHRSMMLRPLSQVLFWTLTMDLLTLTWIGSQPVEYPYIIIGQMASILYFS 360
sp|P92658|CYB_MAMPR      LGIMPLLHTSKHRSMMLRPLSQVLFWTLATDLLMLTWIGSQPVEYPYIIIGQMASILYFS 360
sp|O47885|CYB_ELEMA      LGLMPLLHTSKHRSMMLRPLSQVLFWTLTMDLLTLTWIGSQPVEHPYIIIGQMASILYFS 360
                         **:*************************: *** **********:***************

sp|P24958|CYB_LOXAF      IILAFLPIAGVIENYLIK 378
sp|P92658|CYB_MAMPR      IILAFLPIAGMIENYLIK 378
sp|O47885|CYB_ELEMA      IILAFLPIAGMIENYLIK 378
                         **********:*******
Example: Elephant and Mammoth
Mammoth and African elephant have 10 mismatches, mammoth and Indian elephant 14.
Significant?
Similarity and Homology
Important difference:
  Similarity: the measurement of resemblance of sequences
  Homology: common ancestor
Similarity is gradual, homology is either true or false
Similarity = now, homology = past events
Homology is only very rarely directly observed (e.g. lab population, clinical study of viral infection)
Homology is inferred from sequence similarity
Homology = derived from common ancestor
Characteristics derived from a common ancestor are called homologous
  E.g. eagle’s wing and human’s arm
Other apparently similar characteristics may have arisen independently by convergent evolution
  E.g. eagle’s wing and bee’s wing. The most recent common ancestor of eagles and bees did not have wings
Homologous characters may diverge functionally
  E.g. bones in the human middle ear and jaws of primitive fish
Example: Homology/Similarity
The assertion that the cytochrome b sequences are homologues means that there is a common ancestor
BUT:
1. Maybe cytochrome b functionally requires so many conserved residues and will hence occur in many species (in fact, this is not the case here)
2. Maybe cytochrome b has to function this way in elephant-like species, but in fact started out from different ancestors (i.e. convergent evolution)
3. Maybe mammoth and African elephant have fewer mismatches only because the Indian elephant’s DNA mutated faster
4. Maybe all of them acquired cytochrome b through a virus (horizontal gene transfer)
Similarity vs. Homology
Any sequences can be similar
Sequences are homologues if they evolved from a common ancestor
Homologous sequences:
  Orthologs: separated by speciation, usually similar biological function
  Paralogs: different biological function (after gene duplication), e.g. lysozyme and α-lactalbumin, a mammalian regulatory protein
Assumption: Similarity is an indicator for homology
Note: the altered function of the expressed protein will determine whether the organism survives to reproduce, and hence passes on the altered gene
Sequence similarity is a key concept
How similar are two sequences?
How to align the sequences?
How to align multiple sequences?
How to find motifs?
Sequence alignment
Global match: align all of one with all of the other sequence (mismatches, insertions, deletions)

And.--so,.from.hour.to.hour.we.ripe.and.ripe
||||    ||||||||||||||||||||||||   ||||||
And.then,.from.hour.to.hour.we.rot-.and.rot-

Local match: find region in one sequence that matches the other (mismatches, insertions, deletions; ends can be ignored)

  My.care.is.loss.of.care,.by.old.care.done,
    |||||||||    |||||||||||||   |||||| ||
Your.care.is.gain.of.care,.by.new.care.won
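A global match of this kind can be computed by dynamic programming. The sketch below follows the Needleman-Wunsch scheme with an assumed, deliberately simple scoring (match +1, mismatch -1, gap -1); the function name `global_align` and the scoring values are illustrative choices, not taken from the lecture.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


total, top, bottom = global_align("And.so,.from.hour", "And.then,.from.hour")
print(total)
print(top)
print(bottom)
```

A local match uses the same table with two changes: cell values are not allowed to drop below zero, and the traceback starts at the highest-scoring cell instead of the corner (the Smith-Waterman scheme).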
Sequence alignment
Motif search: find matches of a short sequence in a long sequence
Option: perfect, 1 mismatch, mismatches+gaps+insertions+deletions

        match
         ||||
for the watch to babble and to talk is most tolerable
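The "perfect" and "1 mismatch" options can be sketched with a sliding window that counts mismatching positions. The function name `find_motif` and its `max_mismatches` parameter are illustrative; gaps, insertions and deletions are not handled here and would need an alignment-style method instead.

```python
def find_motif(text, motif, max_mismatches=0):
    """Return (start, mismatches) for every window of text within max_mismatches of motif."""
    hits = []
    for start in range(len(text) - len(motif) + 1):
        window = text[start:start + len(motif)]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


text = "for the watch to babble and to talk is most tolerable"
print(find_motif(text, "match"))                    # [] : no perfect match
print(find_motif(text, "match", max_mismatches=1))  # [(8, 1)] : "watch"
```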
Sequence alignment
Multiple sequence alignment
No.sooner.---met.--------.but.they.look’d
No.sooner.look’d.--------.but.they.lo-v’d
No.sooner.lo-v’d.--------.but.they.sigh’d
No.sooner.sigh’d.--------.but.they.--asked.one.another.the.reason
No.sooner.knew.the.reason.but.they.-------------sought.the.remedy
No.sooner.               .but.they.
Quick check
By now you should
  Know the main data sources (sequence and structure)
  Know the role that bioinformatics plays
  Understand the difference between homology and similarity
  Understand what sequence comparison and alignment are