Date post: 28-Dec-2015
Page 1: Michael Schroeder BioTechnological Center TU Dresden Biotec Introduction based on Chapter 1 Lesk, Introduction to Bioinformatics.

Michael Schroeder, BioTechnological Center, TU Dresden, Biotec

Introduction

based on Chapter 1

Lesk, Introduction to Bioinformatics


By Michael Schroeder, Biotec, 2

Contents

Molecular biology primer
The role of computer science
Phylogeny
Sequence searching


26 June 2000: Draft of the human genome sequenced!

1953: Watson and Crick discover the structure of DNA
2000: Draft of the human genome is announced

“The most wondrous map ever produced by human kind”

“One of the most significant scientific landmarks of all time, comparable with the invention of the wheel or the splitting of the atom”


High-throughput biomedicine

Microarrays
  Measure activity of thousands of genes at the same time
  Example: Cancer
    Compare activity with and without drug treatment
    Result: Hundreds of candidate drug targets

RNAi (Nobel Prize 2006, Fire and Mello)
  Knock down genes and observe effect
  Example: Infectious diseases
    Which proteins orchestrate entry into cell?
    Result: Hundreds of candidate proteins

Atomic force microscopes (co-invented by Nobel laureate Binnig)
  Pull protein out of membrane and measure force
  Example: Eye diseases resulting from misfolding
    Result: Hundreds of candidate residues


Drug Discovery

Challenge: Longer time to market, fewer drugs, exploding costs

Approach: Use of compound libraries and high-throughput screening


HTS and Bioinformatics

High-throughput technologies have completely changed the work of biomedical researchers

Challenge: Interpret (often large) results of screens

Approach: Before running secondary assays use bioinformatics and IT to assemble all possible information


Good News

Tens of thousands of 3D structures

Millions of sequences

Millions of articles

Hundreds of DBs/tools


Bad News: Data != Knowledge

How to analyse data, how to integrate data?

Computer science to the rescue…


Example: computer science is key for sequencing

Human genome is a string of length 3,200,000,000
Shotgun sequencing: Break multiple copies of the string into shorter substrings

Example:
  shotgunsequencing shotgunsequencing shotgunsequencing
  cing en encing equ gun ing ns otgu seq sequ sh sho shot tg uenc un

Computing problem: Assemble strings


Computer science key for sequencing

sh sho shot otgu tg gun un ns seq sequ equ uenc encing en cing ing
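The assembly problem can be sketched as a greedy merge of the two fragments with the longest suffix/prefix overlap. This is only an illustration, not how production assemblers work; the function names and the four fragments below are hypothetical (longer than the slide's fragments, so that the overlaps are unambiguous).

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments):
    """Greedy sketch: repeatedly merge the pair with the largest overlap."""
    # Drop duplicates and fragments fully contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Join the best pair, writing the shared part only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""


# Hypothetical fragments of the slide's example string, in scrambled order.
print(greedy_assemble(["encing", "gunseq", "shotgun", "sequenc"]))
```

With very short fragments such as "en" or "un", many overlaps tie and the greedy choice becomes ambiguous, which is one reason real reads need to be long and numerous.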

QUESTION: How can you handle long repetitive sequences?

Heeeeelllllllllllooooooo

QUESTION: Why was a draft announced? When was the final version ready?


Arabidopsis thaliana
mouse
rat
Caenorhabditis elegans
Drosophila melanogaster
Mycobacterium leprae
Vibrio cholerae
Plasmodium falciparum
Mycobacterium tuberculosis
Neisseria meningitidis Z2491
Helicobacter pylori
Xylella fastidiosa
Borrelia burgdorferi
Rickettsia prowazekii
Bacillus subtilis
Archaeoglobus fulgidus
Campylobacter jejuni
Aquifex aeolicus
Thermotoga maritima
Chlamydia pneumoniae
Pseudomonas aeruginosa
Ureaplasma urealyticum
Buchnera sp. APS
Escherichia coli
Saccharomyces cerevisiae
Yersinia pestis
Salmonella enterica
Thermoplasma acidophilum


DNA – the molecule of life

http://www.ornl.gov/hgmis


The genetic code


Protein Structure

DNA: Nucleotides are very similar, and hence the structure of DNA is very uniform

Proteins: Great variety in three-dimensional conformation to support diverse structures and functions

If heated, a protein “unfolds” to a biologically inactive structure; under normal conditions the protein folds


Paradox

Translation from DNA sequence to amino acid sequence is very simple to describe, but requires immensely complicated machinery (ribosome, tRNA)

The folding of the protein sequence into its three-dimensional structure is very difficult to describe, but occurs spontaneously
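To illustrate how simple translation is to describe, here is a minimal sketch that maps codons to one-letter amino acid codes with the standard genetic code. It assumes the coding strand is given 5' to 3' in reading frame 0; the names `CODON_TABLE` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest, bases in the order T, C, A, G; '*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # ATG -> M, GCC -> A, TAA -> stop
```

No comparably short program exists for the second half of the paradox: predicting the folded structure from the amino acid sequence.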


Central Dogma

DNA sequence determines protein sequence
Protein sequence determines protein structure
Protein structure determines protein function


Sequence vs. structure similarity

Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt


High sequence similarity = high structure similarity


Low sequence similarity, usually low structure similarity


Low sequence similarity, possibly still high structure similarity

11% sequence identity, yet the structures match perfectly


Sequence similarity is key concept

Similar sequences are a hint for common ancestry and possibly similar function



Sequence similarity is key concept
Example: v-sis vs. PDGF

Example from the early 1980s:
  v-sis in simian sarcoma virus leads to cancer in infected cells
  PDGF in humans is a normal growth factor for cells
  v-sis and PDGF are 85% similar

Alignment from: http://pdf.aminer.org/000/244/500/design_and_implementation_of_a_dna_sequence_processor.pdf


Sequence similarity is key concept

If an unknown sequence is found, deduce its function/structure indirectly by finding similar sequences whose function/structure is known

Assumption: Evolution changes sequences “slowly”, often maintaining the main features of a sequence’s function/structure



Sequence similarity is a hint for an evolutionary relationship


How similar are sequences?

>sp|P00674|RNP_HORSE Ribonuclease pancreatic (EC 3.1.27.5) (RNase 1) (RNase A) - Equus caballus (Horse).

KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQKNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHFDASVEVST

>sp|P00673|RNP_BALAC Ribonuclease pancreatic (EC 3.1.27.5) (RNase 1) (RNase A) - Balaenoptera acutorostrata (Minke whale) (Lesser rorqual).

RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQKNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHFDNSV

>sp|P00686|RNP_MACRU Ribonuclease pancreatic (EC 3.1.27.5) (RNase 1) (RNase A) - Macropus rufus (Red kangaroo) (Megaleia rufa).

ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEGQYVPVHFDAYV


Multiple Alignment with ClustalW (www.ebi.ac.uk/clustalw)

CLUSTAL W (1.82) multiple sequence alignment

sp|P00674|RNP_HORSE  KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ 60
sp|P00673|RNP_BALAC  RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ 60
sp|P00686|RNP_MACRU  -ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ 59
                      *:** **:*****:  :......*** **  *.**.* ***:***:**.   *.*:* *

sp|P00674|RNP_HORSE  KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF 120
sp|P00673|RNP_BALAC  KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF 120
sp|P00686|RNP_MACRU  ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF 118
                     :*: ****::***:*.* : **:** *..****** *:**: :::*******  ******

sp|P00674|RNP_HORSE  DASVEVST 128
sp|P00673|RNP_BALAC  DNSV---- 124
sp|P00686|RNP_MACRU  DAYV---- 122
                     *  *


Example: Number of Aligned Residues

Horse and Minke whale: 95
Minke whale and Red kangaroo: 82
Horse and Red kangaroo: 75

Conclusion: Horse and whale share the most identical residues

Horse and whale are placental, kangaroo is marsupial
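Counts like these can be read off an alignment by comparing two aligned rows column by column. A minimal sketch, assuming both rows come from the same alignment (equal length, gaps written as "-"); the function name is made up, and the short rows are the first eight columns of the ClustalW output.

```python
def count_identities(row_a: str, row_b: str, gap: str = "-") -> int:
    """Count columns in which two aligned rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)


# First eight alignment columns of the three ribonuclease sequences.
horse, whale, kangaroo = "KESPAMKF", "RESPAMKF", "-ETPAEKF"

print(count_identities(horse, whale))
print(count_identities(whale, kangaroo))
print(count_identities(horse, kangaroo))
```

Passing the full aligned rows instead of the eight-column excerpt gives the pairwise counts for the whole alignment.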


Example: Elephant and Mammoth

Mitochondrial cytochrome b from Siberian woolly mammoth (Mammuthus primigenius) preserved in arctic permafrost

African elephant (Loxodonta africana)
Indian elephant (Elephas maximus)


African elephant: sp|P24958|CYB_LOXAF
Mammoth: sp|P92658|CYB_MAMPR
Indian elephant: sp|O47885|CYB_ELEMA

CYB_LOXAF  MTHIRKSHPLLKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
CYB_MAMPR  MTHIRKSHPLLKILNKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
CYB_ELEMA  MTHTRKFHPLFKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
           *** ** ***:**:**********************************************

CYB_LOXAF  TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
CYB_MAMPR  TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
CYB_ELEMA  TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
           ************************************************************

CYB_LOXAF  LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFA 180
CYB_MAMPR  LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTDLVEWIWGGFSVDKATLNRFFA 180
CYB_ELEMA  LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFA 180
           **************************************:*********************

CYB_LOXAF  LHFILPFTMIALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLLL 240
CYB_MAMPR  LHFILPFTMIALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILFLL 240
CYB_ELEMA  FHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLLL 240
           :********:***********************************************:**

CYB_LOXAF  LLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALLLSILI 300
CYB_MAMPR  LLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALLLSILI 300
CYB_ELEMA  LLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSILI 300
           ******************************************************:*****

CYB_LOXAF  LGLMPLLHTSKHRSMMLRPLSQVLFWTLTMDLLTLTWIGSQPVEYPYIIIGQMASILYFS 360
CYB_MAMPR  LGIMPLLHTSKHRSMMLRPLSQVLFWTLATDLLMLTWIGSQPVEYPYIIIGQMASILYFS 360
CYB_ELEMA  LGLMPLLHTSKHRSMMLRPLSQVLFWTLTMDLLTLTWIGSQPVEHPYIIIGQMASILYFS 360
           **:*************************: *** **********:***************

CYB_LOXAF  IILAFLPIAGVIENYLIK 378
CYB_MAMPR  IILAFLPIAGMIENYLIK 378
CYB_ELEMA  IILAFLPIAGMIENYLIK 378
           **********:*******


Mammoth and African elephant have 10 mismatches, mammoth and Indian elephant 14.

Significant?


Similarity and Homology

Important difference:
  Similarity is the measurement of resemblance of sequences
  Homology: common ancestor

Similarity is gradual, homology is either true or false
Similarity = now, homology = past events
Homology is only very rarely directly observed (e.g. lab population, clinical study of viral infection)

Homology is inferred from sequence similarity


Homology = derived from common ancestor

Characteristics derived from a common ancestor are called homologous

E.g. eagle’s wing and human’s arm

Other apparently similar characteristics may have arisen independently by convergent evolution

E.g. eagle’s wing and bee’s wing. The most recent common ancestor of eagles and bees did not have wings

Homologous characters may diverge functionally
E.g. bones in the human middle ear and jaws of primitive fish


Example: Homology/Similarity

The assertion that the cytochrome b sequences are homologous means that there is a common ancestor

BUT:

1. Maybe cytochrome b functionally requires so many conserved residues that it will look similar in many species (in fact, this is not the case here)

2. Maybe cytochrome b has to function this way in elephant-like species, but in fact started out from different ancestors (i.e. convergent evolution)

3. Maybe mammoth and African elephant have fewer mismatches only because the Indian elephant’s DNA mutated faster

4. Maybe all of them acquired cytochrome b through a virus (horizontal gene transfer)


Similarity vs. Homology

Any sequences can be similar
Sequences are homologous if they evolved from a common ancestor

Homologous sequences:
  Orthologs: similar biological function
  Paralogs: different biological function (after gene duplication), e.g. lysozyme and α-lactalbumin, a mammalian regulatory protein

Assumption: Similarity is an indicator of homology

Note: the altered function of the expressed protein will determine whether the organism survives to reproduce, and hence passes on the altered gene


Sequence similarity is key concept

How similar are two sequences?
How to align the sequences?
How to align multiple sequences?
How to find motifs?


Sequence alignment

Global match: align all of one sequence with all of the other (mismatches, insertions, deletions)

And.--so,.from.hour.to.hour.we.ripe.and.ripe
||||    ||||||||||||||||||||||||   ||||||
And.then,.from.hour.to.hour.we.rot-.and.rot-

Local match: find a region in one sequence that matches a region of the other (mismatches, insertions, deletions; ends can be ignored)

  My.care.is.loss.of.care,.by.old.care.done,
    |||||||||    |||||||||||||   |||||| ||
Your.care.is.gain.of.care,.by.new.care.won
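Both kinds of match can be scored with dynamic programming. The sketch below computes the best global score (Needleman-Wunsch style) or the best local score (Smith-Waterman style) under a simple scheme of +1 per match and -1 per mismatch or gap position; the scoring values and the function name are assumptions made for this example, not taken from the slides.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, local: bool = False) -> int:
    """Best alignment score of a and b, global by default, local if requested."""
    # prev[j] holds the best score for the prefix of a seen so far vs b[:j].
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)      # a local alignment may restart anywhere
                best = max(best, score)    # and may end anywhere
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


print(alignment_score("we.ripe.and.ripe", "we.rot.and.rot"))
print(alignment_score("My.care.is.loss", "Your.care.is.gain", local=True))
```

The only differences between the two modes are the zero floor and the free ends, which is exactly what "ends can be ignored" means for a local match.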


Motif search: find matches of a short sequence in a long sequence

Options: perfect match, 1 mismatch, mismatches + gaps (insertions and deletions)

        match
         ||||
for the watch to babble and to talk is most tolerable
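The first two options (perfect match, or a fixed number of mismatches) can be sketched by sliding the motif along the text and counting differing positions. Gaps are not handled here, and the function name is made up for this example.

```python
def find_motif(text: str, motif: str, max_mismatches: int = 0):
    """Return start positions where motif matches text with few mismatches."""
    hits = []
    for start in range(len(text) - len(motif) + 1):
        window = text[start:start + len(motif)]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


line = "for the watch to babble and to talk is most tolerable"
print(find_motif(line, "match"))                    # no perfect match
print(find_motif(line, "match", max_mismatches=1))  # finds "watch"
```

Allowing insertions and deletions as well requires the dynamic programming approach used for alignment rather than this simple window scan.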


Multiple sequence alignment

No.sooner.---met.--------.but.they.look’d
No.sooner.look’d.--------.but.they.lo-v’d
No.sooner.lo-v’d.--------.but.they.sigh’d
No.sooner.sigh’d.--------.but.they.--asked.one.another.the.reason
No.sooner.knew.the.reason.but.they.-------------sought.the.remedy

No.sooner.               .but.they.


Quick check

By now you should
  Know the main data sources (sequence and structure)
  Know the role that bioinformatics plays
  Understand the difference between homology and similarity
  Understand what sequence comparison and alignment are

