Michael Schroeder BioTechnological Center TU Dresden Biotec Protein Structure Lesk, chapter 5...

Date post: 02-Apr-2015

Michael Schroeder, BioTechnological Center, TU Dresden (Biotec)

Protein Structure

Lesk, chapter 5. Details on SCOP and CATH can be found in

Structural Bioinformatics, Bourne/Weissig, chapters 12 and 13


By Michael Schroeder, Biotec, 2

Folding

Proteins are linear polymers: main chains carrying different amino acid side chains

Proteins fold spontaneously, reaching a state of minimal free energy. Side chains and main chains

interact with one another and with the solvent

Example movie

Jones, D.T. (1997) Successful ab initio prediction of the tertiary structure of NK-Lysin using multiple sequences and recognized supersecondary structural motifs. PROTEINS. Suppl. 1, 185-191


Examining Proteins

Specialised tools give different views of a structure:

Corey, Pauling, Koltun (CPK): sphere size proportional to atomic radius; hydrogen white, carbon grey, nitrogen blue, oxygen red, sulphur yellow

Cartoon, wire, balls


Examining Proteins


Protein Folding

Residue

Image taken from www.expasy.org/swissmod/course

Conformation of a residue:
Rotation around the N-Cα bond: φ (phi)
Rotation around the Cα-C bond: ψ (psi)
Rotation around the peptide bond: ω (omega)

The peptide bond tends to be planar and in one of two states:

trans, ω ≈ 180° (usually), and cis, ω ≈ 0° (rarely, and mostly involving proline)
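The three torsion angles are dihedral angles over four consecutive backbone atoms (φ: C(i-1), N, Cα, C; ψ: N, Cα, C, N(i+1); ω: Cα, C, N(i+1), Cα(i+1)), so they can be computed directly from coordinates. The Python sketch below uses the standard atan2 dihedral formula; the helper names and the 30°/150° cut-offs used to call a peptide bond cis or trans are illustrative assumptions, not values from the slides.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180, 180] about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)       # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def peptide_state(omega):
    """Hypothetical thresholds for classifying an omega angle."""
    if abs(omega) > 150:
        return "trans"
    if abs(omega) < 30:
        return "cis"
    return "distorted"

# Toy coordinates (not real atoms): first and last atoms on the same side = cis
omega_cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
omega_trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(omega_cis, peptide_state(omega_cis))
print(omega_trans, peptide_state(omega_trans))
```

For real proteins the same function is simply called with the four backbone atom positions listed above.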


Sasisekharan-Ramakrishnan-Ramachandran plot

Solid line = energetically preferred regions

Outside dotted line = disallowed

Most amino acids fall into the αR region (right-handed alpha helix) or the β region (beta strand)

Glycine can adopt additional conformations (e.g. the left-handed alpha helix, αL region); see the lower right panel

Image taken from www.expasy.org/swissmod/course
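As a rough illustration of these regions, the sketch below assigns a (φ, ψ) pair to the nearest of three hypothetical region centres, using the helix values from the alpha-helix slide for αR and assumed values for β and αL. Real allowed regions are irregular contours (and differ for glycine), so this is a toy classifier, not a validation tool.

```python
# Hypothetical region centres (phi, psi) in degrees; a rough illustration only.
REGION_CENTRES = {
    "alpha_R": (-60.0, -45.0),
    "beta": (-120.0, 130.0),
    "alpha_L": (60.0, 45.0),
}

def angle_diff(a, b):
    """Smallest signed difference between two angles, in degrees (periodic)."""
    return (a - b + 180.0) % 360.0 - 180.0

def nearest_region(phi, psi):
    """Name of the closest region centre, measuring distance on the torus."""
    def dist2(centre):
        return angle_diff(phi, centre[0]) ** 2 + angle_diff(psi, centre[1]) ** 2
    return min(REGION_CENTRES, key=lambda name: dist2(REGION_CENTRES[name]))

print(nearest_region(-57.0, -47.0))    # helical residue
print(nearest_region(-120.0, 120.0))   # strand residue
print(nearest_region(60.0, 40.0))      # e.g. a glycine in a left-handed conformation
```

Angles wrap around at ±180°, which is why differences go through `angle_diff` rather than plain subtraction.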


Ramachandran plot

Plot for a protein with mostly beta-sheets

Example for conformations

Image taken from www.expasy.org/swissmod/course


Helices and Strands

Consecutive residues in alpha or beta conformation generate alpha-helices and beta-strands, respectively

Such secondary structure elements are stabilised by weak hydrogen bonds

They are connected by turns or loops, regions in which the chain changes direction

Turns are often exposed on the surface and tend to contain charged or polar residues


Alpha Helix

The C=O of residue j is hydrogen-bonded to the N-H of residue j+4

3.6 residues per turn; rise of 1.5 Å per residue; repeat (pitch) every 3.6 × 1.5 Å = 5.4 Å; φ ≈ -60°, ψ ≈ -45°

Image taken from www.expasy.org/swissmod/course
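A quick check of the helix arithmetic: 360°/3.6 = 100° of rotation per residue, and 3.6 × 1.5 Å = 5.4 Å per turn. The sketch below builds an ideal Cα trace from these numbers; the 2.3 Å radius of the Cα helix is an assumed approximate value, not given on the slide. With it, consecutive Cα atoms come out about 3.8 Å apart, the usual Cα-Cα distance for trans peptides.

```python
import math

RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5   # Å along the helix axis
CA_RADIUS = 2.3          # Å, assumed approximate radius of the C-alpha helix

def ideal_helix_ca(n):
    """C-alpha coordinates of an ideal alpha helix along the z axis."""
    step = 2 * math.pi / RESIDUES_PER_TURN   # 100 degrees per residue
    return [(CA_RADIUS * math.cos(i * step),
             CA_RADIUS * math.sin(i * step),
             i * RISE_PER_RESIDUE) for i in range(n)]

pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE
ca = ideal_helix_ca(5)
print(f"pitch = {pitch:.1f} Å")
print(f"CA(i)-CA(i+1) = {math.dist(ca[0], ca[1]):.2f} Å")
```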


Beta strand

Image taken from www.expasy.org/swissmod/course


Beta Sheets

Image taken from www.expasy.org/swissmod/course


Turn: residue j is hydrogen-bonded to residue j+3

Turns often contain proline and glycine

Image taken from www.expasy.org/swissmod/course


How to Fold a Structure
All residues must have stereochemically allowed conformations

Buried polar atoms must be hydrogen-bonded

If a few are missed, it may be energetically preferable for these to bond to the solvent

Enough hydrophobic surface must be buried, and the interior must be sufficiently densely packed

There is evidence that folding occurs hierarchically: first secondary structure elements form, then super-secondary structures, …

This justifies a hierarchical approach when simulating folding


Structure Alignment

+

Slides from Hanekamp, University of Wyoming, www.uwyo.edu


Structure Alignment

+


Structure Alignment

In the same way that we align sequences, we wish to align structures

Let's start simple: how do we score an alignment?
Sequences: e.g. percentage of matching residues
Structures: RMSD (root mean square deviation)


Root Mean Square Deviation

What is the distance between two points a with coordinates (xa, ya, za) and b with coordinates (xb, yb, zb)? The Euclidean distance:

$d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2}$


Root Mean Square Deviation

In a structure alignment the score measures how far the aligned atoms are from each other on average

Given the distances di between n aligned atoms, the root mean square deviation is defined as

$\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$
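A direct transcription of the formula, assuming the two structures are already superposed and the i-th atoms correspond to each other (finding the optimal superposition, e.g. with the Kabsch algorithm, is a separate step not shown here):

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes atom i of A is aligned to atom i of B and both are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # every atom shifted by 1 Å
print(rmsd(a, a), rmsd(a, b))
```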


Quality of Alignment and Example
Unit of RMSD: e.g. ångströms (Å)

Identical structures: RMSD = 0
Similar structures: RMSD is small (1-3 Å)
Distant structures: RMSD > 3 Å

Structural superposition of gamma-chymotrypsin and Staphylococcus aureus epidermolytic toxin A


Pitfalls of RMSD

All atoms are treated equally (e.g. residues on the surface have a higher degree of freedom than those in the core)

The best alignment does not always have the minimal RMSD

The significance of an RMSD value depends on the size of the structures

From www.uwyo.edu/molecbio/LectureNotes/ MOLB5650


Alternative RMSDs

aRMSD = best root-mean-square deviation calculated over all aligned alpha-carbon atoms

bRMSD = the RMSD over the highest scoring residue pairs

wRMSD = weighted RMSD

Source: W. Taylor (1999), Protein Science, 8: 654-665. http://www.prosci.uci.edu/Articles/Vol8/issue3/8272/8272.html#relat

From www.uwyo.edu/molecbio/LectureNotes/ MOLB5650
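The sketch below illustrates the idea behind weighted and restricted RMSDs on a list of per-pair distances. The weighting formula and the "k closest pairs" rule are generic stand-ins chosen for illustration; Taylor's (1999) exact definitions may differ.

```python
import math

def rmsd_from_distances(d):
    """Plain RMSD from per-pair distances d_i."""
    return math.sqrt(sum(x * x for x in d) / len(d))

def weighted_rmsd(d, w):
    """Generic weighted RMSD: sqrt(sum w_i d_i^2 / sum w_i)."""
    return math.sqrt(sum(wi * di * di for di, wi in zip(d, w)) / sum(w))

def best_pairs_rmsd(d, k):
    """RMSD over the k closest aligned pairs (a stand-in for 'highest scoring')."""
    return rmsd_from_distances(sorted(d)[:k])

distances = [1.0, 1.0, 1.0, 5.0]   # one poorly fitting pair, e.g. a flexible surface loop
print(rmsd_from_distances(distances))
print(weighted_rmsd(distances, [1, 1, 1, 0]))
print(best_pairs_rmsd(distances, 3))
```

With one badly fitting pair out of four, the plain RMSD is √7 ≈ 2.65 Å, while down-weighting or excluding that pair gives 1.0 Å, which is the motivation for such variants.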


Computing Structural Alignments
DALI (Distance-matrix ALIgnment) is one of the first tools for structural alignment. How does it work?

Atoms: given the atomic coordinates of two structures

Compute two distance matrices: for each structure, compute all pairwise inter-atom distances

This is done because the distances are independent of the coordinate system

The two original coordinate sets cannot be compared directly, but the two distance matrices can

Align the two distance matrices: find small (e.g. 6x6) matching sub-matrices along the diagonal, then extend these matches to form the overall alignment

This is somewhat similar to how BLAST works (short matches are extended)

SSAP (double dynamic programming): see term 3.
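The property DALI relies on, that intra-structure distances do not change when a structure is rotated or translated, can be checked directly. In the sketch below, `submatrix_difference` is a hypothetical mismatch score (mean absolute difference of two 6x6 sub-matrices), not DALI's actual similarity score, and the coordinates are made up.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between atoms (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_z(coords, angle_deg, shift=(0.0, 0.0, 0.0)):
    """Rotate about the z axis and translate: a change of coordinate frame."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

def submatrix_difference(dm_a, dm_b, i, j, size=6):
    """Hypothetical mismatch score: mean |difference| between the size x size
    sub-matrices starting at residue i in A and residue j in B (0 = perfect)."""
    diffs = [abs(dm_a[i + p][i + q] - dm_b[j + p][j + q])
             for p in range(size) for q in range(size)]
    return sum(diffs) / len(diffs)

coords = [(float(i), (i % 3) * 1.5, (i % 2) * 0.7) for i in range(8)]
moved = rotate_z(coords, 37.0, (5.0, -2.0, 1.0))   # same structure, new frame
print(coords[0], moved[0])                          # raw coordinates differ
print(submatrix_difference(distance_matrix(coords), distance_matrix(moved), 0, 0))
```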


DALI Example

The regions of common fold, as determined by the program DALI by L. Holm and C. Sander, in the TIM-barrel proteins mouse adenosine deaminase [1fkx] (black) and Pseudomonas diminuta phosphotriesterase [1pta] (red):


Protein zinc finger (4znf)

Slides from Hanekamp, University of Wyoming, www.uwyo.edu


Superimposed 3znf and 4znf

30 CA atoms: RMS = 0.70 Å; 248 atoms: RMS = 1.42 Å

Slides from Hanekamp, University of Wyoming, www.uwyo.edu

Lys30


Superimposed 3znf and 4znf backbones

30 CA atoms: RMS = 0.70 Å

Slides from Hanekamp, University of Wyoming, www.uwyo.edu


RMSD vs. Sequence Similarity
Good structural alignments are possible even at low sequence identity

Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt


Structure Classification


Why classify structures?

Structural similarity is a good indicator of homology, so we classify structures

Classification at different levels:
Similar general folding patterns (structures not necessarily related)
Possibly low sequence similarity, but similar structure and function: homology very likely
High sequence similarity implies similar structures and homology

Classification can be used to investigate evolutionary relationships and possibly infer function


Structure Classification

SCOP: Structural Classification of Proteins
Hand-curated (Alexei Murzin, Cambridge) with some automation

CATH: Class, Architecture, Topology, Homology
Automated where possible, with some checks by hand

FSSP: Fold classification based on Structure-Structure alignment of Proteins
Fully automated

The three classifications correspond reasonably well (>80%)


Evolutionary Relation

Strong sequence similarity is assumed to be sufficient to infer homology

Close structural and functional similarity together are also considered sufficient to infer homology

Similar structure alone is not sufficient, as proteins may have converged on a structure due to physicochemical necessity

Similar function alone is not sufficient, as proteins may have developed it due to functional selection

In general, structure is more conserved than sequence

Beware: descendants of a common ancestor may differ in function, structure, and sequence! Such relationships are difficult to detect


What is a domain? Single and Multi-Domain Proteins


What is a domain?

Functional: a domain is an "independent" functional unit that occurs in more than one protein

Physicochemical: a domain has a hydrophobic core

Topological: intra-domain distances between atoms are minimal, inter-domain distances maximal

It is difficult to define a domain exactly, and difficult to agree on exact domain borders


Domains re-occur

A domain re-occurs in different structures and possibly in the context of different other domains

P-loop domain in 1goj: Structure of a Fast Kinesin: Implications for ATPase Mechanism and Interactions with Microtubules; motor protein (single domain)

1ii6: Crystal Structure of the Mitotic Kinesin Eg5 in Complex with Mg-ADP; cell cycle (two domains)


Domains re-occur

1in5: interaction of P-loop domain (green & orange) and winged helix DNA binding domain

1a5t: interaction of P-loop domain (green & orange) and DNA polymerase III domain


Domains have hydrophobic core

Kyte J., Doolittle R.F, J. Mol. Biol. 157:105-132(1982).

[Figure: Hydrophobicity plot for 1GOJ kinesin motor; x-axis: residue, y-axis: hydrophobicity]

Ala: 1.800 Arg: -4.500 Asn: -3.500 Asp: -3.500 Cys: 2.500 Gln: -3.500 Glu: -3.500 Gly: -0.400 His: -3.200 Ile: 4.500 Leu: 3.800 Lys: -3.900 Met: 1.900 Phe: 2.800 Pro: -1.600 Ser: -0.800 Thr: -0.700 Trp: -0.900 Tyr: -1.300 Val: 4.200
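A hydrophobicity plot like the one above can be produced by averaging the Kyte-Doolittle values listed on the slide over a sliding window. The window length of 9 residues is an assumed default; each output value is the mean of the window that starts at that position (i.e. centred at position + window // 2).

```python
# Kyte-Doolittle values from the slide, keyed by one-letter amino acid code
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Sliding-window mean hydropathy; returns [] if the sequence is too short."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical sequence: a hydrophobic stretch flanked by charged residues
profile = hydropathy_profile("KDEKRLLIVVAILLVFAKDERKE")
print([round(v, 2) for v in profile])
```

Peaks above zero mark hydrophobic stretches, which in a folded domain tend to be buried in the core.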


Intra-domain distances minimal

Distances between atoms within domain are minimal

Distances between atoms of two different domains are maximal
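The intra-/inter-domain distance criterion can be turned into a toy domain splitter: try every cut point along the chain and keep the one where points are, on average, close to their own domain and far from the other. This assumes exactly two contiguous domains and one point per residue (e.g. Cα); it illustrates the criterion and is not a published domain-assignment method.

```python
import math
from itertools import combinations

def mean_distance(points_a, points_b=None):
    """Mean pairwise distance within points_a, or between points_a and points_b."""
    if points_b is None:
        pairs = list(combinations(points_a, 2))
    else:
        pairs = [(a, b) for a in points_a for b in points_b]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def best_two_domain_cut(coords, min_size=3):
    """Split the chain into coords[:k] and coords[k:]; return the k with the
    smallest ratio of mean intra-domain to mean inter-domain distance."""
    best_k, best_ratio = None, math.inf
    for k in range(min_size, len(coords) - min_size + 1):
        a, b = coords[:k], coords[k:]
        intra = (mean_distance(a) + mean_distance(b)) / 2
        inter = mean_distance(a, b)
        ratio = intra / inter
        if ratio < best_ratio:
            best_k, best_ratio = k, ratio
    return best_k, best_ratio

# Made-up chain: residues 0-4 form one compact blob, residues 5-9 another one 50 Å away
domain1 = [(float(i), float(i % 2), 0.0) for i in range(5)]
domain2 = [(50.0 + i, float(i % 2), 0.0) for i in range(5)]
print(best_two_domain_cut(domain1 + domain2))
```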


PDB, Proteins, and Domains

Ca. 20,000 structures in the PDB: 50% single-domain, 50% multi-domain; 90% have fewer than 5 domains

[Figure: Distribution of number of domains; x-axis: number of domains, y-axis: frequency]

Number of domains | Frequency
1 | 8464
2 | 4358
3 | 926
4 | 1888
5 | 148
6 | 624
7 | 42
8 | 491
9 | 22
10 | 58
30 | 7
31 | 1
32 | 16
36 | 1
40 | 8
42 | 1
48 | 3
49 | 1
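The percentages quoted above can be recomputed from the frequency table. Only the rows listed on the slide are included (it shows no rows between 10 and 30 domains), so the totals are approximate:

```python
# Domain-count frequencies exactly as listed in the table above
DOMAIN_FREQ = {1: 8464, 2: 4358, 3: 926, 4: 1888, 5: 148, 6: 624, 7: 42,
               8: 491, 9: 22, 10: 58, 30: 7, 31: 1, 32: 16, 36: 1,
               40: 8, 42: 1, 48: 3, 49: 1}

total = sum(DOMAIN_FREQ.values())
single = DOMAIN_FREQ[1] / total
fewer_than_5 = sum(f for n, f in DOMAIN_FREQ.items() if n < 5) / total
print(f"{total} entries, {single:.0%} single-domain, "
      f"{fewer_than_5:.0%} with fewer than 5 domains")
```

This gives 17,059 entries, about 50% single-domain and about 92% with fewer than 5 domains, consistent with the figures on the slide.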


A structure with 49 domains: 1AON, Asymmetric Chaperonin Complex GroEL/GroES/(ADP)7


SCOP: Structural Classification of Proteins

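Hierarchy, from the top: Class, Fold, Superfamily, Family

Class: All alpha (218), All beta (144), Alpha/beta (136), Alpha+beta (279)

Fold (e.g. in All beta): Trypsin-like serine proteases (1), Immunoglobulin-like (23)

Superfamily (e.g. in Immunoglobulin-like): Transglutaminase (1), Immunoglobulin (6)

Family (e.g. in Immunoglobulin): V set domains (antibody variable), C1 set domains (antibody constant)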


Class

All alpha (possibly with small beta adornments)

All beta (possibly with small alpha adornments)


Class
Alpha/beta (alpha and beta): a single beta sheet with alpha helices joining the C-terminus of one strand to the N-terminus of the next
Subclass: beta sheet forming a barrel surrounded by alpha helices
Subclass: central planar beta sheet

Alpha+beta (alpha plus beta): alpha and beta units are largely separated; strands are joined by hairpins, leading to antiparallel sheets


Class

Multi-domain proteins: domains placed in different classes, where the domains have not been observed elsewhere

E.g. 1hle


Class

Membrane and cell surface proteins (few, and mostly unique). E.g. aquaporin, 1ih5


Class

Small proteins, e.g. insulin, 1pid


Class

Coiled coil proteins, e.g. 1i4d, Arfaptin-Rac binding fragment


Class

Low-resolution structures, peptides, designed proteins

E.g. 1cis, a designed protein: a hybrid of chymotrypsin inhibitor CI-2 from barley (Hordeum vulgare, hiproly strain) and helix E from subtilisin Carlsberg


Fold, Superfamily, Family

Fold: common core structure, i.e. the same secondary structure elements in the same arrangement with the same topology

Superfamily: very similar structure and function

Family: sequence identity above 30%, or extremely similar structure and function


Distribution (2007)

Class | Fold | Superfamily | Family
All alpha | 259 | 459 | 772
All beta | 165 | 331 | 679
Alpha/beta | 141 | 232 | 736
Alpha+beta | 334 | 488 | 897
Multidomain | 53 | 53 | 74
Membrane and cell surface | 50 | 92 | 104
Small proteins | 85 | 122 | 202
Total | 1086 | 1777 | 3464


Uses of SCOP

Automatic classification
Understanding of protein enzymatic function
Use superfamily and fold to study distantly related proteins
Study sequence and structure variability
Derive substitution matrices for sequence comparison
Extract structural principles for design
Study decomposition of multi-domain proteins
Estimate the total number of folds
Derived databases


PDB, Proteins, Domains revisited

80% of PDB entries contain only one SCOP superfamily

15% of PDB entries contain two different SCOP superfamilies

[Figure: Frequency of number of SCOP superfamilies; x-axis: number of superfamilies, y-axis: frequency]

Number of superfamilies | Frequency
1 | 13960
2 | 2721
3 | 495
4 | 178
5 | 33
6 | 25
7 | 1
9 | 4
20 | 9
21 | 1
22 | 1
23 | 6


A structure with 23 different superfamilies

1k9m: Co-Crystal Structure of Tylosin Bound to the 50S Ribosomal Subunit of Haloarcula marismortui (ribosome)


The 20 Most Frequently Occurring Superfamilies

Superfamily | SCOP ID | #PDB
Immunoglobulin | b.1.1 | 823
Lysozyme-like | d.2.1 | 777
Trypsin-like serine proteases | b.47.1 | 649
P-loop containing nucleotide triphosphate hydrolases | c.37.1 | 521
NAD(P)-binding Rossmann-fold domains | c.2.1 | 384
Globin-like | a.1.1 | 384
(Trans)glycosidases | c.1.8 | 332
Acid proteases | b.50.1 | 288
Concanavalin A-like lectins/glucanases | b.29.1 | 230
Thioredoxin-like | c.47.1 | 217
EF-hand | a.39.1 | 212
alpha/beta-Hydrolases | c.69.1 | 195
Cupredoxins | b.6.1 | 178
Ribonuclease H-like | c.55.3 | 178
PLP-dependent transferases | c.67.1 | 176
Periplasmic binding protein-like II | c.94.1 | 171
Carbonic anhydrase | b.74.1 | 169
Metalloproteases ("zincins"), catalytic domain | d.92.1 | 169
FAD/NAD(P)-binding domain | c.3.1 | 162
Cytochrome c | a.3.1 | 161


CATH

Class: secondary structure composition

Architecture: orientation in 3D

Topology: connectivity

Homology: grouped by evidence for homology (sequence, structure and function)


Generating CATH

1. Identify close relatives by pairwise sequence alignment

2. Detect more distant relatives using 2a. sequence profiles and 2b. structure alignment

3. Structures still unclassified after 1. and 2. are examined by hand to detect domain boundaries

4. Try 2. and 3. again

5. If still unclassified, assign manually


CATH step 1: Sequence-based Identification of Homologous Structures

Sequence similarity above 30% implies similar structure

Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage

Reminder…


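Hierarchical Clustering

Distance matrix:
       1   2   3   4   5
1      0   2   6  10   9
2          0   5   9   8
3              0   4   5
4                  0   3
5                      0

Merge the closest pair, 1 and 2 (distance 2). With single linkage, the distance from (1,2) to another item is the smaller of the two distances:
       (1,2)  3   4   5
(1,2)    0    5   9   8
3             0   4   5
4                 0   3
5                     0

Merge 4 and 5 (distance 3):
       (1,2)  3   (4,5)
(1,2)    0    5    8
3             0    4
(4,5)              0

Merge 3 and (4,5) (distance 4):
            (1,2)  (3,(4,5))
(1,2)         0        5
(3,(4,5))              0

Finally, merge (1,2) and (3,(4,5)) at distance 5.

[Figure: dendrogram over leaves 1 2 3 4 5; height axis 0 to 5]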
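The merge sequence above can be reproduced with a naive agglomerative clustering loop. This is a minimal, quadratic-time sketch for small matrices with single linkage hard-coded; `single_linkage` is an illustrative name, not a library function.

```python
def single_linkage(dist, labels):
    """Naive agglomerative clustering with single linkage.

    dist: full symmetric distance matrix; labels: item names.
    Returns the merges as (left_cluster, right_cluster, height) in order.
    """
    members = {i: [i] for i in range(len(labels))}    # cluster id -> item indices
    names = {i: str(labels[i]) for i in range(len(labels))}
    merges = []
    while len(members) > 1:
        best = None
        ids = sorted(members)
        for x, a in enumerate(ids):
            for b in ids[x + 1:]:
                # single linkage: minimum distance over all cross-cluster pairs
                d = min(dist[i][j] for i in members[a] for j in members[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((names[a], names[b], d))
        members[a].extend(members.pop(b))
        names[a] = f"({names[a]},{names.pop(b)})"
    return merges

# The distance matrix from the slide, filled in symmetrically
D = [
    [0, 2, 6, 10, 9],
    [2, 0, 5, 9, 8],
    [6, 5, 0, 4, 5],
    [10, 9, 4, 0, 3],
    [9, 8, 5, 3, 0],
]
for left, right, height in single_linkage(D, [1, 2, 3, 4, 5]):
    print(f"merge {left} + {right} at height {height}")
```

It reports the merges (1,2) at 2, (4,5) at 3, 3 with (4,5) at 4, and (1,2) with (3,(4,5)) at 5, matching the matrices above.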


Hierarchical Clustering: How to define the distance between clusters?

Single linkage: minimum. Example: distance of (A,B) to C is 1

Complete linkage: maximum. Example: distance of (A,B) to C is 2

Average linkage: average. Example: distance of (A,B) to C is 1.5

Are dendrograms always the same, independent of the linkage method?

Distance matrix:
   A  B  C
A  0  1  2
B     0  1
C        0

[Figure: dendrograms over A, B, C]
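The three linkage rules can be compared on the A, B, C example above; `cluster_distance` and the `DIST` table are illustrative names:

```python
from statistics import mean

# Distances from the slide: d(A,B) = 1, d(B,C) = 1, d(A,C) = 2
DIST = {frozenset("AB"): 1, frozenset("BC"): 1, frozenset("AC"): 2}

def cluster_distance(cluster1, cluster2, linkage):
    """Distance between two clusters (strings of item names) under a linkage rule."""
    pair_distances = [DIST[frozenset((x, y))] for x in cluster1 for y in cluster2]
    if linkage == "single":
        return min(pair_distances)
    if linkage == "complete":
        return max(pair_distances)
    if linkage == "average":
        return mean(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")

for method in ("single", "complete", "average"):
    print(method, cluster_distance("AB", "C", method))
```

On the five-item matrix of the previous slide, complete linkage merges at heights 2, 3, 5 and 10, whereas single linkage gives 2, 3, 4 and 5: the tree shape happens to agree there, but the merge heights (and in general the tree itself) depend on the linkage method.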


Hierarchical Clustering: Chaining
Beware of chaining when using single linkage

Because the nearest neighbour is selected at each step, it appears that all members of the cluster are very similar to each other, when in fact A and Z are very different

     A  B  C  D  …  Z
A    0  1  2  3  …  25
B       0  1  2  …  24
C          0  1  …  23
D             0  …  22
…                …
Z                   0

[Figure: dendrogram over A B C D … Z]
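The chaining effect is easy to reproduce. Single-linkage merge heights equal the edge weights of a minimum spanning tree, so the sketch below computes them with Kruskal's algorithm (union-find) on the A..Z matrix above; the function name is illustrative.

```python
def single_linkage_heights(dist):
    """Single-linkage merge heights via Kruskal's minimum spanning tree algorithm."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    heights = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # edge joins two clusters: a merge at height d
            parent[ri] = rj
            heights.append(d)
    return heights

# 26 items A..Z where d(X, Y) is the distance between their positions in the alphabet
n = 26
chain = [[abs(i - j) for j in range(n)] for i in range(n)]
heights = single_linkage_heights(chain)
print("largest merge height:", max(heights))
print("d(A, Z):", chain[0][n - 1])
```

Every merge happens at distance 1, so the final cluster looks tight even though A and Z are 25 apart.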


CATH and single linkage

It is argued that structural data are sparse, so it cannot be expected that all cluster members will be very similar (in terms of sequence) to each other;

the chaining effect is therefore even useful


CATH step 2a:

Profile-based methods such as PSI-BLAST are used to detect distant relatives

Build profiles using all available sequence data (rather than only sequences for which a structure exists)

This increases the quality of the profiles dramatically:
51% of distant relatives are retrieved using profiles based only on sequences with known structure
82% of distant relatives are retrieved using profiles based on all sequences
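To make "profile" concrete, the sketch below builds a simple position-specific log-odds table from aligned sequences and scores a new sequence against it. It assumes a uniform 1/20 background and a flat pseudocount; PSI-BLAST's own sequence weighting and pseudocount scheme are more elaborate and are not reproduced here. The alignment is made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned_seqs, pseudocount=1.0):
    """One dict of log2-odds scores per alignment column; gaps ('-') are skipped.
    Assumes a uniform background frequency of 1/20 for every amino acid."""
    profile = []
    for column in zip(*aligned_seqs):
        counts = Counter(aa for aa in column if aa != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: math.log2(((counts[aa] + pseudocount) / total) / (1 / 20))
                        for aa in AMINO_ACIDS})
    return profile

def score(profile, sequence):
    """Ungapped score of a sequence against the profile, column by column."""
    return sum(col[aa] for col, aa in zip(profile, sequence))

alignment = ["ACDE", "ACDE", "GCDE", "ACNE"]
p = build_profile(alignment)
print(round(score(p, "ACDE"), 2), round(score(p, "GCNE"), 2), round(score(p, "WWWW"), 2))
```

Each column records which residues are tolerated at that position; adding more (and more diverse) sequences improves these estimates, which is in line with the jump from 51% to 82% reported above.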


CATH step 2b: Structure-based methods to detect distant relatives

For ca. 15% of structures, sequence-based methods do not work. Example: for globins, sequence similarity can fall below 10%, yet structure and function (oxygen binding) are preserved

Use SSAP, the Sequential Structure Alignment Program


Clustering Result of Structure Alignment

Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage


Improving Efficiency: GRATH

Screening large structures (>300 residues) against the database can take days

Idea of GRATH (Graphical Representation of CATH): improve efficiency by filtering at a higher level before doing a detailed comparison. Represent each protein as a graph where:

Nodes are secondary structure elements, each represented by its midpoint, tilt, and rotation

Edges are the distances between midpoints of secondary structure elements

Use an algorithm for subgraph isomorphism (i.e. does one graph occur in another?). If yes, do a detailed comparison using SSAP
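A minimal sketch of the filtering idea, assuming each secondary structure element is reduced to a type (H or E) and a midpoint; tilt and rotation are ignored here. It searches by brute force over node mappings with an assumed 1 Å tolerance; GRATH itself uses a more efficient graph-matching algorithm, which is not reproduced here. All coordinates are made up.

```python
import math
from itertools import permutations

def sse_graph(elements):
    """elements: list of (type, midpoint) with type 'H' (helix) or 'E' (strand).
    Returns node types and the matrix of midpoint distances (edge labels)."""
    types = [t for t, _ in elements]
    dists = [[math.dist(a[1], b[1]) for b in elements] for a in elements]
    return types, dists

def find_embedding(query, target, tol=1.0):
    """Brute-force search for a mapping of query nodes onto distinct target nodes
    with matching SSE types and edge distances within tol (Å); None if absent."""
    q_types, q_d = query
    t_types, t_d = target
    for mapping in permutations(range(len(t_types)), len(q_types)):
        if any(q_types[i] != t_types[m] for i, m in enumerate(mapping)):
            continue
        if all(abs(q_d[i][j] - t_d[mapping[i]][mapping[j]]) <= tol
               for i in range(len(mapping)) for j in range(i + 1, len(mapping))):
            return mapping
    return None

target = sse_graph([("H", (0, 0, 0)), ("E", (10, 0, 0)), ("E", (10, 5, 0)),
                    ("H", (20, 20, 0)), ("E", (0, 30, 0))])
# Same geometry as target elements 1, 2, 3, but shifted to another position
query = sse_graph([("E", (100, 0, 0)), ("E", (100, 5, 0)), ("H", (110, 20, 0))])
print(find_embedding(query, target))   # a hit would then be passed to SSAP
```

The number of mappings grows factorially, which is why a cheap graph filter is only worthwhile with a smarter search than this one.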


Structure Prediction and Modelling


Structure Prediction: Four Main Problem Areas

Given a sequence with unknown structure, predict its structure:

Secondary structure prediction: predict regions of helices and strands

Homology modelling: predict structure from known structures of one or more related proteins

Fold recognition: given a library of structures, determine which one (if any) is the fold of the given sequence

Prediction of novel folds: a priori and knowledge-based methods


Structure Prediction of Novel Folds: Two Approaches

A priori: most approaches aim to reproduce inter-atomic interactions by defining an energy function and trying to find its global minimum

Problems: the energy function is inadequate, and algorithms get stuck in local minima

Knowledge-based: find similarities to known structures or substructures
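The local-minimum problem can be seen even in one dimension. The toy "energy" below is an assumption for illustration, not a protein force field: it has a global minimum near x = -1.30 and a local one near x = 1.13, and plain gradient descent ends in whichever basin it starts in.

```python
def energy(x):
    """Toy 1-D energy: global minimum near x = -1.30, local minimum near x = 1.13."""
    return x**4 - 3 * x**2 + x

def gradient(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, step=0.01, iterations=5000):
    """Plain gradient descent: always moves downhill, so it stops in the
    first minimum it reaches."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

for start in (-2.0, 2.0):
    x_min = descend(start)
    print(f"start {start:+.1f} -> x = {x_min:.2f}, E = {energy(x_min):.2f}")
```

Starting at +2.0 the search settles in the higher local minimum; methods such as simulated annealing or multiple restarts are common ways to mitigate this.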


Secondary Structure Prediction A successful tool for secondary structure prediction is PROF PROF uses a neural networks to learn secondary structure from

known structures ¾ of PROF’s prediction are correct At CASP 2000 it predicted e.g. the following

                     |10       |20       |30       |40       |50
Sequence    ALVEDPPLKVSEGGLIREGYDPDLDALRAAHREGVAYFLELEERERERTG
Prediction  HH------------EEE------HHHHHHHHHH-HHHHHHHHHHHHHHH-
Experiment  -E-------------E-----HHHHHHHHHHHHHHHHHHHHHHHHHHHH-

                     |60       |70       |80       |90       |100
Sequence    IPTLKVGYNAVFGYYLEVTRPYYERVPKEYRPVQTLKDRQRYTLPEMKEK
Prediction  --EEEEEEEEEEEEEEEE-----------EEEEEEEE--EEEE-HHHHHH
Experiment  ----EEEEE---EEEEEEEHHHHHH-----EEEEE---EEEEE-HHHHHH

                     |110      |120
Sequence    EREVYRLEALIRRREEEVFLEVRERAKRQ
Prediction  HHHHHHHHHHHHHHHHHHHHHHHHHHHH-
Experiment  HHHHHHHHHHHHHHHHHHHHHHHHHHH--
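The fraction of correct predictions can be computed by comparing the Prediction and Experiment rows position by position. This is a minimal sketch. It assumes that correctness is counted per residue over the three states H, E and - (the usual Q3 measure). The function name `q3` is our own.

```python
def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or -) matches the experimental state."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty state strings of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)

# Toy example: 5 of the 6 residues agree
print(q3("HHHE--", "HHEE--"))
```

The same function can be applied row by row to the Prediction and Experiment strings of the CASP example.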


PROF’s prediction

The regions predicted by the PROF server of Rost to be helical are shown as wider ribbons. The prediction missed only a short helix, at the top left of the picture.


Homology modelling

Define the model of an unknown structure by making minimal changes to a relative of known structure:

Align the amino acid sequences of the target and one or more known structures; insertions and deletions should fall in loop regions

Determine mainchain segments to represent the regions containing insertions and deletions, and stitch these into the known structure

Replace the side chains of the residues that have been mutated

Examine the model (by hand and computationally) to detect collisions between atoms

Refine the model by limited energy minimisation
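The rule that insertions and deletions should lie in loops can be checked automatically once the alignment and the template's secondary structure are known. This is a minimal sketch under assumed conventions:

* Alignment rows use '-' for gaps.
* The template secondary structure is given as one H/E/- character per template residue.
* An insertion is flagged when the nearest template residues on both sides lie in helices or strands.

The function name is hypothetical.

```python
def indels_outside_loops(target_aln: str, template_aln: str, template_ss: str) -> list[int]:
    """Return alignment columns where an insertion or deletion falls in a helix (H)
    or strand (E) of the template instead of a loop.

    target_aln, template_aln: gapped alignment rows of equal length ('-' marks a gap)
    template_ss: one state per ungapped template residue: 'H', 'E' or '-' (loop)
    """
    # Secondary structure of the template residue in each column (None where the template has a gap)
    states = []
    k = 0
    for p in template_aln:
        if p == "-":
            states.append(None)
        else:
            states.append(template_ss[k])
            k += 1

    flagged = []
    for col, (t, p) in enumerate(zip(target_aln, template_aln)):
        if t == "-" and p != "-":
            # Deletion: a template residue has no counterpart in the target
            if states[col] in ("H", "E"):
                flagged.append(col)
        elif p == "-" and t != "-":
            # Insertion: extra target residue; inspect the nearest template residues on both sides
            left = next((s for s in reversed(states[:col]) if s is not None), "-")
            right = next((s for s in states[col + 1:] if s is not None), "-")
            if left in ("H", "E") and right in ("H", "E"):
                flagged.append(col)
    return flagged

# Template ACDEFGHIK: helix on ACDE, loop on FGH, strand on IK
print(indels_outside_loops("AC-EFGHIK", "ACDEFGHIK", "HHHH---EE"))  # [2]: deletion inside the helix
print(indels_outside_loops("ACDEXFGHIK", "ACDE-FGHIK", "HHHH---EE"))  # []: insertion next to the loop
```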


Accuracy of Homology Modelling

Works for sequence similarity above about 40-50%. Example: SWISS-MODEL prediction of the neurotoxin of the red scorpion (1DQ7) from the neurotoxin of the yellow scorpion (1PTX)
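Whether a template is close enough can be checked from the alignment itself. This is a minimal sketch. It approximates similarity by percent identity over aligned, gap-free columns and uses 40% as the cut-off. Both choices and the function names are illustrative, not part of SWISS-MODEL.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)

def suitable_for_homology_modelling(aligned_target: str, aligned_template: str, threshold: float = 40.0) -> bool:
    """Hypothetical rule of thumb: accept the template if identity reaches the threshold (percent)."""
    return percent_identity(aligned_target, aligned_template) >= threshold

# 6 gap-free columns, 5 of them identical
print(percent_identity("ACDE-GH", "ACDFKGH"))
```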


Fold Recognition: 3D Profiles

Given a sequence, determine which fold (if any) it is most similar to. Can we build profiles that represent structures of similar fold (similar to sequence profiles)?

3D profiles: classify the environment of each residue by
Secondary structure: is the residue part of a helix, a sheet or other (determined by mainchain hydrogen-bonding interactions)?
Surface exposure: <40 Å², 40-114 Å², or >114 Å² accessible surface area
Polarity: polar or non-polar nature of the environment

This gives a total of 18 residue classes, and each residue belongs to exactly one of them. The sequence of these residue classes along the chain is the 3D profile.
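The classification step can be sketched by combining the three criteria above into one class index per residue. This is a simplified illustration. It assumes 3 secondary-structure states × 3 exposure levels × 2 polarity states = 18 classes. The published 3D-profile scheme combines exposure and polarity somewhat differently. The function names are our own.

```python
SS_STATES = ("helix", "sheet", "other")  # assigned from mainchain hydrogen bonding

def exposure_level(area: float) -> int:
    """0: <40 Å², 1: 40-114 Å², 2: >114 Å² accessible surface area."""
    if area < 40:
        return 0
    if area <= 114:
        return 1
    return 2

def environment_class(ss: str, area: float, polar: bool) -> int:
    """Combine the three criteria into one class index in 0..17 (3 x 3 x 2 combinations)."""
    return SS_STATES.index(ss) * 6 + exposure_level(area) * 2 + int(polar)

def profile_3d(residues):
    """3D profile: the sequence of environment classes along the chain.
    residues: (secondary_structure, accessible_area, polar_environment) per residue."""
    return [environment_class(ss, area, polar) for ss, area, polar in residues]

print(profile_3d([("helix", 12.0, False), ("sheet", 50.0, True), ("other", 150.0, True)]))  # [0, 9, 17]
```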


3D Profiles and Alignments

Structure-structure alignment: the 3D profiles of two known structures can be aligned against each other.

Sequence-structure alignment: from existing 3D profiles, the probability of each amino acid occurring in each residue class can be determined. Using these probabilities, a sequence can be assigned a 3D profile, and this sequence 3D profile can then be aligned to a structure 3D profile.

For correctly determined protein structures, the structure 3D profile fits the protein's own sequence 3D profile well. However, other proteins may score even better.

If a structure does not match its own 3D profile well, it is likely that there is an error in the structure determination.
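Sequence-structure alignment can be sketched in two steps. First, derive log-odds scores ln(P(aa | class) / P(aa)) from residues of known structures. Second, align a sequence against a profile with standard global dynamic programming. The following are assumptions, not a specific published implementation:

* the linear gap penalty and the fixed score for amino-acid/class pairs never observed;
* Needleman-Wunsch as the alignment algorithm;
* the function names.

```python
import math
from collections import Counter

def log_odds_table(observations):
    """observations: (amino_acid, residue_class) pairs collected from known structures.
    Returns score[(aa, cls)] = ln(P(aa | cls) / P(aa))."""
    pair_counts = Counter(observations)
    aa_counts = Counter(aa for aa, _ in observations)
    class_counts = Counter(cls for _, cls in observations)
    total = len(observations)
    return {
        (aa, cls): math.log((n / class_counts[cls]) / (aa_counts[aa] / total))
        for (aa, cls), n in pair_counts.items()
    }

def align_score(sequence, profile, table, gap=-1.0, unseen=-2.0):
    """Global (Needleman-Wunsch) alignment score of a sequence against a 3D profile.
    profile: list of residue classes; gap: linear gap penalty; unseen: score for pairs absent from the table."""
    n, m = len(sequence), len(profile)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = table.get((sequence[i - 1], profile[j - 1]), unseen)
            dp[i][j] = max(dp[i - 1][j - 1] + s,  # residue i placed in profile position j
                           dp[i - 1][j] + gap,    # residue i left unplaced
                           dp[i][j - 1] + gap)    # profile position j left empty
    return dp[n][m]

# Toy data: leucine seen only in class 0, lysine only in class 1
table = log_odds_table([("L", 0), ("L", 0), ("K", 1), ("K", 1)])
print(align_score("LK", [0, 1], table), align_score("KL", [0, 1], table))
```

The better-fitting sequence ("LK") receives the higher score. Comparing such scores across sequences is also the basis for spotting structures that fit their own sequence poorly.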


Threading

Pull the query sequence through a known structure and score the fit

Necessary:
A method to score the models, to select the best one
A method to calibrate the scores, to decide which (if any) of the best-scoring models is correct

Homology modelling: identify homologues; determine the optimal alignment; optimize one model
Threading: try all possible parents; try many alignments; evaluate many rough models
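The two requirements above, scoring many rough models and calibrating the best score, can be sketched with ungapped threading on contact-map templates. Everything here is an assumption for illustration:

* templates are reduced to a length and a list of residue contacts;
* the pair energy is a toy hydrophobic term;
* alignments are ungapped offsets;
* calibration uses a z-score against shuffled query sequences. This is one common way of judging significance, but not the only one.

```python
import random
import statistics

def thread_energy(query, contacts, offset, pair_energy):
    """Energy of an ungapped threading: query[offset + k] is placed on template position k.
    contacts: (i, j) pairs of template positions that are in spatial contact."""
    return sum(pair_energy(query[offset + i], query[offset + j]) for i, j in contacts)

def best_threading(query, templates, pair_energy):
    """Try every template ('parent') at every ungapped offset and keep the lowest-energy model.
    templates: name -> (template length, contact list). Returns (energy, name, offset)."""
    best = None
    for name, (length, contacts) in templates.items():
        for offset in range(len(query) - length + 1):
            e = thread_energy(query, contacts, offset, pair_energy)
            if best is None or e < best[0]:
                best = (e, name, offset)
    return best

def z_score(query, contacts, offset, pair_energy, trials=200, seed=0):
    """Calibrate a threading energy against shuffled versions of the query on the same template.
    Strongly negative values suggest the fit is better than expected by chance."""
    rng = random.Random(seed)
    energy = thread_energy(query, contacts, offset, pair_energy)
    letters = list(query)
    shuffled = []
    for _ in range(trials):
        rng.shuffle(letters)
        shuffled.append(thread_energy(letters, contacts, offset, pair_energy))
    return (energy - statistics.mean(shuffled)) / statistics.stdev(shuffled)

# Hypothetical toy contact energy: hydrophobic-hydrophobic contacts are favourable
HYDROPHOBIC = set("AILMFVW")

def toy_pair_energy(a, b):
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0

templates = {"t1": (4, [(0, 3)]), "t2": (3, [(0, 2)])}
energy, name, offset = best_threading("KLAKLK", templates, toy_pair_energy)
print(name, offset, energy)
print(z_score("KLAKLK", templates[name][1], offset, toy_pair_energy))
```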


Scoring for Threading

Empirical patterns of residue neighbours are derived from known structures:
Observe the distribution of inter-residue distances for all 20 × 20 residue pairs
Derive a probability distribution as a function of distance in space and separation along the sequence
The Boltzmann equation relates probability and energy; reversing it derives an energy function from the probability distribution
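Inverting the Boltzmann relation P ∝ exp(-E/kT) gives E = -kT ln(P_obs/P_ref). Here P_obs is the observed distance distribution for one residue pair and P_ref is the distribution pooled over all pairs. The sketch below derives such a potential from raw counts. The following are assumptions:

* the 1 Å distance bins and the pooled reference state;
* treating unseen bins as zero energy;
* the names `derive_potential` and `model_energy`.

Dependence on sequence separation is reduced to ignoring pairs closer than `min_sep` residues along the chain.

```python
import math
from collections import Counter

KT = 0.593  # kT in kcal/mol at about 298 K (assumed units)

def derive_potential(observations, bin_width=1.0):
    """Knowledge-based pair potential obtained by inverting the Boltzmann relation.

    observations: (residue_a, residue_b, distance_in_angstrom) for residue pairs seen in known structures.
    Returns energy(a, b, d) = -kT * ln(P(bin | a, b) / P(bin)).
    """
    pair_bin, pair_total, bin_total = Counter(), Counter(), Counter()
    for a, b, d in observations:
        pair = tuple(sorted((a, b)))  # residue pairs are unordered
        k = int(d // bin_width)
        pair_bin[pair, k] += 1
        pair_total[pair] += 1
        bin_total[k] += 1
    n = len(observations)

    def energy(a, b, d):
        pair = tuple(sorted((a, b)))
        k = int(d // bin_width)
        if pair_bin[pair, k] == 0:
            return 0.0  # no data for this pair and distance: treat as neutral
        p_obs = pair_bin[pair, k] / pair_total[pair]  # distribution for this residue pair
        p_ref = bin_total[k] / n                      # reference: all pairs pooled
        return -KT * math.log(p_obs / p_ref)

    return energy

def model_energy(sequence, coords, energy, min_sep=3):
    """Score a threaded model: sum pair energies over residues at least min_sep apart along the sequence."""
    total = 0.0
    for i in range(len(sequence)):
        for j in range(i + min_sep, len(sequence)):
            total += energy(sequence[i], sequence[j], math.dist(coords[i], coords[j]))
    return total

obs = [("L", "I", 4.5), ("L", "I", 4.2), ("K", "E", 8.0), ("I", "L", 8.3)]
energy = derive_potential(obs)
print(energy("L", "I", 4.0))  # negative: L-I pairs are over-represented at 4-5 Å in the toy data
```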


Threading the sequence

(Figure: target sequence threaded onto the template)

Slides from Hanekamp, University of Wyoming, www.uwyo.edu


“Threaded” sequence

Yellow = adrenergic receptor sequence; blue = rhodopsin template structure (PDB 1F88)



Modeled structure

(Figure: modeled structure with gaps marked)



Corrected Model



Ab initio Structure Prediction


Molecular dynamics

Structure prediction = placing the atoms so that the interactions between them create a unique state of maximum stability

Problems:
The model of inter-atomic interactions is incomplete
Computational scale: a large number of variables and a massive search space
Non-linearities
A rough energy surface with many local minima


Conformational energy calculations

Bond stretching
Bond angle bending
Torsion angles (e.g. φ, ψ, ω)
Van der Waals interactions: short-range repulsion ~R⁻¹² and long-range attraction ~R⁻⁶, where R is the inter-atomic distance
Hydrogen bonds: weak chemical/electrostatic interaction, ~R⁻¹² and ~R⁻¹⁰
Electrostatics: charges on atoms
Solvent: interactions with water, salt, sugar, etc.
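The terms above can be written as simple functions of geometry. This sketch uses typical force-field functional forms: harmonic bonds and angles, a cosine torsion, 12-6 and 12-10 non-bonded terms, and Coulomb electrostatics. Apart from the powers of R listed above, the functional forms are assumptions, as are the constant 332.06, the dielectric of 4 and all parameter names. Solvent effects are not modelled.

```python
import math

COULOMB_K = 332.06  # converts e^2/angstrom to kcal/mol

def bond_stretch(b, k_b, b0):
    """Harmonic bond stretching around the reference length b0."""
    return k_b * (b - b0) ** 2

def angle_bend(theta, k_theta, theta0):
    """Harmonic bond-angle bending around the reference angle theta0 (radians)."""
    return k_theta * (theta - theta0) ** 2

def torsion(phi, k_phi, n, delta):
    """Periodic torsion term with multiplicity n and phase delta (radians)."""
    return k_phi * (1 + math.cos(n * phi - delta))

def van_der_waals(r, epsilon, r_min):
    """12-6 Lennard-Jones: repulsion ~R^-12, attraction ~R^-6; minimum -epsilon at r = r_min."""
    x = r_min / r
    return epsilon * (x**12 - 2 * x**6)

def hydrogen_bond(r, epsilon, r0):
    """12-10 hydrogen-bond term (~R^-12 and ~R^-10); minimum -epsilon at r = r0."""
    x = r0 / r
    return epsilon * (5 * x**12 - 6 * x**10)

def electrostatic(q1, q2, r, dielectric=4.0):
    """Coulomb interaction between partial charges q1, q2 (units of e) at distance r (angstrom)."""
    return COULOMB_K * q1 * q2 / (dielectric * r)

# Too close: strong repulsion; at r_min: the attractive minimum
print(van_der_waals(3.0, 0.2, 3.8), van_der_waals(3.8, 0.2, 3.8))
```

The total conformational energy is the sum of such terms over all bonds, angles, torsions and non-bonded atom pairs.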


Rosetta

Predicts structure by first generating structures of short fragments (3-9 residues) from known structures

Combines the fragments by Monte Carlo simulation, using an energy function with terms for pairing of beta-strands into sheets and for burial of hydrophobic residues

Carries out 1000 simulations; the results are clustered, and the centre of the largest cluster is presented as the prediction
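The final clustering step can be sketched as picking the decoy with the most neighbours within a cutoff. That decoy is the centre of the largest cluster. This is a simplified illustration, not Rosetta's actual code. The function name and the cutoff are assumptions. So is the generic `distance` argument; in practice it would be something like the rmsd between two predicted structures.

```python
def largest_cluster_centre(decoys, distance, cutoff):
    """Return (index, neighbour_count) of the decoy with the most other decoys within `cutoff`,
    i.e. the centre of the largest cluster. Returns (None, -1) for an empty list."""
    best_index, best_count = None, -1
    for i, a in enumerate(decoys):
        count = sum(1 for j, b in enumerate(decoys) if j != i and distance(a, b) <= cutoff)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count

# Toy "structures" reduced to single numbers; distance is their absolute difference
decoys = [0.0, 0.1, 0.2, 5.0, 5.1]
print(largest_cluster_centre(decoys, lambda a, b: abs(a - b), cutoff=0.15))  # (1, 2)
```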



ROSETTA

The program ROSETTA, by D. Baker and colleagues, can predict the structures of proteins for which no complete domain of similar folding pattern appears in the database. Prediction by ROSETTA of a hypothetical protein from H. influenzae. Black lines: experimental structure; red lines: prediction.


Rosetta

Prediction by ROSETTA of the N-terminal half of domain 1 of the human DNA repair protein Xrcc4. This figure shows a selected substructure of Xrcc4 containing the N-terminal 55 of 116 residues. Black lines: experimental structure; red lines: prediction.


LINUS

Another programme based on a similar idea. Prediction by LINUS (program by G.D. Rose and R. Srinivasan) of the C-terminal domain of the rat endoplasmic reticulum protein ERp29. Black lines: experimental structure; red lines: prediction.


Monte Carlo Simulation

Objective: find the conformation with minimal energy
Problem: avoid getting stuck in local minima

Algorithm:
1. Generate a random initial conformation x
2. Perturb conformation x to generate a neighbouring conformation x’
3. Calculate the energies E(x) and E(x’) of the conformations x and x’
4. If E(x’) < E(x) (x’ is an improvement: we go downhill from x to x’), accept x’ as the new conformation and go to step 2
5. If E(x’) ≥ E(x) (x’ is no improvement: we go uphill from x to x’), accept x’ as the new conformation with probability p; otherwise keep x
6. Reduce the probability p of accepting uphill moves with every step
7. Go to step 2

Steps 1-4 make sure that we “walk” downhill towards a minimum. Steps 5-7 make sure that, if we are in a local minimum, there is a chance to get out of it by accepting an uphill move. It is important that this probability decreases, so that walking uphill becomes less and less likely as the search proceeds.
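The algorithm can be sketched in a few lines. The slide does not say how p is computed. This sketch assumes the Metropolis rule p = exp(-ΔE/T), where the temperature T is multiplied by a cooling factor after every step. As a result p decreases, as step 6 requires; this variant is usually called simulated annealing. The starting conformation is passed in rather than generated randomly. The one-dimensional `rough_energy` function is a toy stand-in for a real conformational energy.

```python
import math
import random

def anneal(energy, perturb, x0, steps=5000, t0=2.0, cooling=0.999, seed=0):
    """Monte Carlo minimisation with a decreasing chance of accepting uphill moves.

    energy:  conformation -> energy E(x)
    perturb: (conformation, rng) -> neighbouring conformation x'
    Uphill moves are accepted with p = exp(-(E(x') - E(x)) / T); T shrinks every step, so p shrinks too.
    Returns the lowest-energy conformation visited and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        x_new = perturb(x, rng)  # step 2: neighbouring conformation
        e_new = energy(x_new)    # step 3: its energy
        delta = e_new - e
        # step 4: always go downhill; step 5: go uphill only with probability p
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling             # step 6: lower p for future uphill moves
    return best_x, best_e

# Toy one-dimensional "energy surface" with several local minima (illustration only)
def rough_energy(x):
    return x * x + 10 * math.sin(3 * x)

def small_step(x, rng):
    return x + rng.uniform(-0.5, 0.5)

print(anneal(rough_energy, small_step, x0=5.0))
```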


Summary

You should now know:
What helices, strands and sheets are
What a Ramachandran plot is
How to score a structural alignment (rmsd)
How to compute a structural alignment
How a domain can be characterised
Why structure classification is useful
What the main structure classes are
How classifications can be generated automatically, and what the problems are
What secondary structure prediction, homology modelling, threading, and ab initio and knowledge-based structure prediction of novel folds are

Visit the PDB, SCOP and CATH websites, and read chapter 5

