Systems Biology: Applications in pharma research
20 September 2010, TU München
Andrea Schafferhans
Andrea Schafferhans @ TU München
Similar proteins have similar interaction partners (?)
20 January 2011 Introduction 2
Applications
• Function prediction
• Drug development
  – “Target Class” approach
  – Side effects
  – “Polypharmacology” / “Network pharmacology”
Hopkins, A.L. (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol, 4, 682-690.
Contents
1. Introduction
2. Protein comparison
  – Computational binding site identification
  – Binding site comparison
3. Application examples
Types of protein similarity
• Function
• Sequence
  – Paralogs: within species
  – Orthologs: across species
• Binding sites / interaction patterns
What is a binding site?
• Function
  – Binding other proteins (e.g. signal transduction)
  – Binding substrates (enzymes)
  – Binding co-factors (e.g. heme)
  – …
• Form
  – Cavity in the protein
  – Caution: induced fit / conformational selection more realistic
• Pragmatic
  – Around all HETATM records in PDB (Caution: e.g. metals…)
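The pragmatic definition can be sketched in a few lines of Python. This is only an illustration: the 4 Å cutoff, the `ignore` list (used here to skip water, and usable for metal ions) and the helper `pdb_line` that builds demo records are assumptions, not part of the lecture. Only the fixed-column layout of PDB ATOM/HETATM records is taken as given.

```python
import math

def pdb_line(record, serial, name, res, chain, seq, x, y, z):
    """Hypothetical helper: format one fixed-column PDB coordinate line for the demo."""
    return (f"{record:<6}{serial:>5} {name:<4} {res:>3} {chain}{seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def parse_atoms(pdb_text):
    """Return (record, res_name, chain, res_seq, (x, y, z)) for each ATOM/HETATM line."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((record, line[17:20].strip(), line[21], int(line[22:26]), xyz))
    return atoms

def binding_site_residues(pdb_text, cutoff=4.0, ignore=("HOH",)):
    """Residues with any ATOM within `cutoff` Å of a HETATM atom not listed in `ignore`."""
    atoms = parse_atoms(pdb_text)
    ligand = [a[4] for a in atoms if a[0] == "HETATM" and a[1] not in ignore]
    site = set()
    for record, res, chain, seq, xyz in atoms:
        if record == "ATOM" and any(math.dist(xyz, l) <= cutoff for l in ligand):
            site.add((chain, seq, res))
    return sorted(site)

demo = "\n".join([
    pdb_line("ATOM", 1, "CA", "GLY", "A", 1, 0.0, 0.0, 0.0),
    pdb_line("ATOM", 2, "CA", "ALA", "A", 2, 10.0, 0.0, 0.0),
    pdb_line("HETATM", 3, "C1", "LIG", "A", 100, 3.0, 0.0, 0.0),
    pdb_line("HETATM", 4, "O", "HOH", "A", 200, 10.0, 1.0, 0.0),
])
print(binding_site_residues(demo))
```

Adding metal residue names to `ignore` is one way to handle the caveat about metals. Whether that is appropriate depends on the question being asked.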
Binding site characteristics
• Usually a pocket or cleft in the protein
• Less hydrophobic than the interior of a protein
• Specific through complementarity of
  – Form
  – Electrostatic interactions
  – Hydrogen bonds
  – Hydrophobic interactions
Henrich S, Salo-Ahen OM, Huang B, et al.: Computational approaches to identifying and characterizing protein binding sites for ligand design. Journal of Molecular Recognition 2010, 23:209-219
Binding site analysis – Applications
• Automated drug target annotation
  – E.g. estimation of druggability (binding site size, hydrophobicity, etc.)
• Virtual screening
  – Restrict the search space for docking experiments
• Function prediction
• Prediction of drug side effects
Finding binding sites – geometrically
Observation: Binding sites usually are the largest pockets
e.g. 83% of enzyme active sites found in the largest pocket (Laskowski RA, et al. Protein clefts in molecular recognition and function. Protein Sci. 1996; 5:2438-2452.)
POCKET
• Fill the protein with a grid (3 Å spacing)
• Mark grid points as “protein” (within 3 Å of an atom) or “solvent”
• Go along the grid and mark “solvent” points that lie between “protein” points as potential pocket points
• Find the largest “clusters” of “pocket” points
Levitt D, Banaszak L. POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J. Mol. Graph 1992, 10:229-234.
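The four steps can be sketched in plain Python as follows. This is not the original POCKET code: the function names, the one-cell margin around the atoms and the 6-neighbour clustering rule are assumptions made for the illustration.

```python
import math
from itertools import product

def pocket_points(atoms, spacing=3.0, radius=3.0):
    """Grid indices of solvent points enclosed by protein points along x, y or z."""
    lo = [min(a[d] for a in atoms) - spacing for d in range(3)]
    hi = [max(a[d] for a in atoms) + spacing for d in range(3)]
    shape = [math.ceil((hi[d] - lo[d]) / spacing) + 1 for d in range(3)]

    def coords(idx):
        return [lo[d] + idx[d] * spacing for d in range(3)]

    # Steps 1 and 2: build the grid and mark points close to an atom as "protein".
    protein = {idx for idx in product(*(range(n) for n in shape))
               if any(math.dist(coords(idx), a) <= radius for a in atoms)}

    # Step 3: scan every grid line; solvent between two protein points is "pocket".
    pocket = set()
    for axis in range(3):
        u_axis, v_axis = [d for d in range(3) if d != axis]
        for u, v in product(range(shape[u_axis]), range(shape[v_axis])):
            line = []
            for t in range(shape[axis]):
                idx = [0, 0, 0]
                idx[axis], idx[u_axis], idx[v_axis] = t, u, v
                line.append(tuple(idx))
            hits = [t for t, idx in enumerate(line) if idx in protein]
            if hits:
                pocket.update(i for i in line[hits[0]:hits[-1]] if i not in protein)
    return pocket

def largest_clusters(points):
    """Step 4: connected components (6-neighbourhood), largest first."""
    remaining, clusters = set(points), []
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while remaining:
        seed = remaining.pop()
        stack, members = [seed], {seed}
        while stack:
            i, j, k = stack.pop()
            for di, dj, dk in steps:
                nb = (i + di, j + dj, k + dk)
                if nb in remaining:
                    remaining.remove(nb)
                    members.add(nb)
                    stack.append(nb)
        clusters.append(members)
    return sorted(clusters, key=len, reverse=True)

# Two "atoms" 12 Å apart leave a gap between them that is flagged as pocket.
pocket = pocket_points([(0.0, 0.0, 0.0), (12.0, 0.0, 0.0)])
print(len(pocket), len(largest_clusters(pocket)))
```

Because only the three grid axes are scanned, the result depends on how the protein is oriented in the grid.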
LIGSITE
Differences to POCKET
• More efficient searching for neighbour atoms
• Cubic diagonals also used for finding pockets → less dependent on orientation
• Grid points scored by the number of times they are found (between 0 and 7) → adjustable “buriedness”
• Smaller and adjustable grid spacing (best: 0.5 to 0.75 Å)
Hendlich M, et al.: LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graph. Mod. 1997, 15:359-363
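The buriedness score can be illustrated with a short Python sketch. It is not the LIGSITE implementation: it assumes the protein is already given as a set of occupied grid indices, and the names `DIRECTIONS`, `hits_protein` and `buriedness` are made up for this example.

```python
from itertools import product

# Three grid axes plus the four cube diagonals: seven scan directions in total.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]

def hits_protein(start, step, protein, shape):
    """Walk from `start` along `step`; True if a protein point is met inside the grid."""
    i, j, k = start
    while True:
        i, j, k = i + step[0], j + step[1], k + step[2]
        if not (0 <= i < shape[0] and 0 <= j < shape[1] and 0 <= k < shape[2]):
            return False
        if (i, j, k) in protein:
            return True

def buriedness(protein, shape):
    """Map each solvent point to the number of directions (0 to 7) enclosing it."""
    scores = {}
    for point in product(*(range(n) for n in shape)):
        if point in protein:
            continue
        scores[point] = sum(
            hits_protein(point, d, protein, shape)
            and hits_protein(point, tuple(-c for c in d), protein, shape)
            for d in DIRECTIONS
        )
    return scores

# A single empty point in the middle of a fully occupied 3x3x3 grid is maximally buried.
shape = (3, 3, 3)
protein = set(product(range(3), repeat=3)) - {(1, 1, 1)}
print(buriedness(protein, shape))
```

Keeping only points whose score reaches a chosen minimum gives the adjustable buriedness threshold mentioned above.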
Finding binding sites – energetically
Binding sites interact with the bound molecules
→ Find locations of favourable interaction energies
GRID
• Calculates interaction energies of probe molecules
• Uses three terms:
  – Lennard-Jones (attraction + repulsion)
  – electrostatic
  – directional hydrogen bond
Goodford, P.J. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 1985 28:849-857
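A minimal sketch of a three-term probe energy, to make the structure of such a function concrete. It does not reproduce GRID's functional forms or parameters. The 12-6 Lennard-Jones term, the Coulomb term with a constant dielectric, the 12-10 hydrogen-bond well scaled by cos² of the angular deviation, and all parameter names are assumptions chosen for illustration.

```python
import math

COULOMB = 332.06  # kcal·Å/(mol·e²), converts charge products to kcal/mol

def lennard_jones(r, epsilon, sigma):
    """12-6 potential: repulsive at short range, attractive well of depth epsilon."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def electrostatic(r, q_probe, q_atom, dielectric=4.0):
    """Coulomb interaction with a constant dielectric (a simplifying assumption)."""
    return COULOMB * q_probe * q_atom / (dielectric * r)

def hydrogen_bond(r, angle_deg, depth, r0):
    """12-10 well of depth `depth` at r0, damped by the deviation from linearity."""
    if angle_deg >= 90.0:
        return 0.0
    ratio = r0 / r
    radial = depth * (5.0 * ratio ** 12 - 6.0 * ratio ** 10)
    return radial * math.cos(math.radians(angle_deg)) ** 2

def probe_energy(probe_xyz, q_probe, atoms):
    """Sum the three terms over all atoms; each atom is a dict of hypothetical parameters."""
    total = 0.0
    for atom in atoms:
        r = math.dist(probe_xyz, atom["xyz"])
        total += lennard_jones(r, atom["epsilon"], atom["sigma"])
        total += electrostatic(r, q_probe, atom["charge"])
        if atom.get("hbond"):
            total += hydrogen_bond(r, atom["angle"], atom["hb_depth"], atom["hb_r0"])
    return total

atom = {"xyz": (0.0, 0.0, 0.0), "epsilon": 0.2, "sigma": 3.0, "charge": -0.5,
        "hbond": True, "angle": 0.0, "hb_depth": 2.0, "hb_r0": 2.9}
print(probe_energy((2.9, 0.0, 0.0), 0.3, [atom]))
```

Evaluating `probe_energy` at every grid point around the protein yields the energy map whose minima are then clustered.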
GRID application
• Cluster energy minima → binding site
• BUT:
  – Hard to cluster
  – Computationally intensive
• Good for binding site characterisation
Picture from: Henrich S, Salo-Ahen OM, Huang B, et al. JMR 2010, 23:209-19.
Q-SiteFinder
• GRID methyl probe (0.9 Å grid)
• Cluster: adjacent grid points that meet energy criterion
→ Success: ligand site found in the first predicted binding site in > 70% of cases, in the first three in > 90%
→ 68% average precision (precision: overlap between ligand and predicted binding site)
Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21:1908-16
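The clustering and the precision measure can be sketched as below. This is an illustration, not Q-SiteFinder itself. The 6-neighbour adjacency rule, ranking clusters by summed energy, the 1.6 Å overlap cutoff and the example threshold are assumptions, and the energies are made-up numbers.

```python
import math

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def cluster_favourable(energies, threshold):
    """Group adjacent grid points with energy <= threshold; most favourable cluster first."""
    keep = {p for p, e in energies.items() if e <= threshold}
    clusters = []
    while keep:
        seed = keep.pop()
        stack, members = [seed], [seed]
        while stack:
            i, j, k = stack.pop()
            for di, dj, dk in NEIGHBOURS:
                nb = (i + di, j + dj, k + dk)
                if nb in keep:
                    keep.remove(nb)
                    members.append(nb)
                    stack.append(nb)
        clusters.append(members)
    # Lower (more negative) total energy ranks first.
    return sorted(clusters, key=lambda c: sum(energies[p] for p in c))

def precision(site_coords, ligand_coords, cutoff=1.6):
    """Fraction of predicted-site points lying within `cutoff` Å of any ligand atom."""
    near = sum(1 for p in site_coords
               if any(math.dist(p, l) <= cutoff for l in ligand_coords))
    return near / len(site_coords)

energies = {(0, 0, 0): -2.0, (1, 0, 0): -1.8, (5, 5, 5): -1.5, (2, 0, 0): -0.2}
ranked = cluster_favourable(energies, threshold=-1.4)
print([sorted(c) for c in ranked])
```

A prediction then counts as a success when the ligand overlaps one of the top-ranked clusters.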
i-Site
Variation of Q-SiteFinder:
• Better probe distribution (denser grid)
• Two energy limits
  – low value for cluster seeds
  – higher value for extension
  → filtering out meaningful clusters
• AMBER force field
Morita M, Nakamura S, Shimizu K: Highly accurate method for ligand-binding site prediction in unbound state (apo) protein structures. Proteins 2008, 73:468-479
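The idea of two energy limits can be sketched as seeded region growing. This is one reading of the bullet points, not the published procedure: the function name, the 6-neighbour adjacency and the example limits are assumptions.

```python
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def grow_clusters(energies, seed_limit, extension_limit):
    """Start clusters at points <= seed_limit, extend through points <= extension_limit."""
    allowed = {p for p, e in energies.items() if e <= extension_limit}
    # Visit the most favourable seeds first.
    seeds = [p for p, e in sorted(energies.items(), key=lambda kv: kv[1])
             if e <= seed_limit]
    assigned, clusters = set(), []
    for seed in seeds:
        if seed in assigned:
            continue
        assigned.add(seed)
        stack, members = [seed], {seed}
        while stack:
            i, j, k = stack.pop()
            for di, dj, dk in NEIGHBOURS:
                nb = (i + di, j + dj, k + dk)
                if nb in allowed and nb not in assigned:
                    assigned.add(nb)
                    members.add(nb)
                    stack.append(nb)
        clusters.append(members)
    return clusters

energies = {(0, 0, 0): -3.0, (1, 0, 0): -1.2, (2, 0, 0): -1.1, (9, 9, 9): -1.3}
print(grow_clusters(energies, seed_limit=-2.0, extension_limit=-1.0))
```

Weakly favourable regions without a strong seed, such as the isolated point in the example, never form a cluster. That is the filtering effect described above.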
Challenges in binding site identification
• Protein flexibility can “hide” binding sites
  → Use multiple experimental conformations
  → Use molecular dynamics to generate conformations
• Dimerisation has to be considered
  → Carefully look at PDB unit cell
  → Carefully look at information about the protein
Characterising binding sites
Properties to characterise:
• Geometry
• Amino acid composition
• Solvation
• Hydrophobicity
• Electrostatics
• Interactions with functional groups
Hydrophobicity
Measured by logP (partitioning between water and octanol)
• Map atom / residue based contributions
• Calculate interaction energies of hydrophobic probes (e.g. GRID)
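For reference, the partition coefficient behind logP is usually written as follows, with concentrations measured at equilibrium for the neutral solute:

```latex
\log P = \log_{10} \frac{[\text{solute}]_{\text{octanol}}}{[\text{solute}]_{\text{water}}}
```

Positive values indicate a preference for octanol (hydrophobic), negative values a preference for water.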
Electrostatics
• Map electrostatic potential onto surface (e.g. using DelPhi, see http://structure.usc.edu/howto/delphi-surface-pymol.html)
• Caution: dependence on protonation!
Functional groups
• SuperStar
  – Analyse the spatial distribution of functional groups in CSD → density maps
  – Break the protein into fragments found in CSD
  – Map the observed distribution of interaction partners onto the protein
Verdonk ML, Cole JC, Taylor R: SuperStar: a knowledge-based approach for identifying interaction sites in proteins. Journal of molecular biology 1999, 289:1093-108.
Binding site comparison
• Align structures in 3D
• Analyse differences and similarities of
  – Amino acid composition
  – Local conformation
  – Pocket size
  – Presence of interaction partners
• Straightforward in case of
  – Sequence similarity or
  – Structural similarity
RELIBASE
• Stores binding sites from PDB structures
• Allows superposition of related binding sites
• Computes differences between binding sites
Hendlich M, Bergner A, Günther J, Klebe G: Relibase: Design and Development of a Database for Comprehensive Analysis of Protein-Ligand Interactions. Journal of Molecular Biology 2003, 326:607-620. http://relibase.ccdc.cam.ac
Similar but not homologous binding sites
• cAMP-dependent protein kinase (1cdk) with adenyl-imido-triphosphate
• trypanothione reductase (1aog) with flavin adenine dinucleotide (FAD)
Graphics from www.ebi.ac.uk/pdbsum/
Graphics from Schmitt S, Kuhn D, Klebe G. Journal of molecular biology 2002, 323:387-406
Problems in binding site comparison
• Automatically locate binding site
• Capture important features in efficient representation
• Search efficiently across all structures
  – Find best superimposition
  – Score the alignment
Binding site comparison methods
• Representation by
  – Coordinate set with physico-chemical or evolutionary properties
    • Atoms
    • Chemical groups
    • Surface points
  – 3D shape descriptors
• Superimposition by
  – Geometric hashing
  – Graph theory, clique search
• Similarity measurement by
  – RMSD
  – Residue conservation
  – Physico-chemical property similarity
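As a small illustration of the first similarity measure, the RMSD over matched points can be computed as below. The sketch assumes the two coordinate lists are already paired and superimposed. Finding that pairing and superimposition is the hard part handled by the methods listed above.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superimposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

print(rmsd([(0, 0, 0), (0, 0, 0)], [(3, 0, 0), (0, 4, 0)]))
```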
CavBase – Structure representation
• Cavity detection with LIGSITE (stored in Relibase)
• Cavity-flanking residues represented as pseudo-centers (several per residue if necessary):
  – Donor
  – Acceptor
  – Donor-Acceptor
  – Aliphatic
  – Pi
• Create graph:
  – Nodes: pseudo-centers
  – Edges: distances between the pseudo-centers
Graphics from Schmitt S, Kuhn D, Klebe G. Journal of molecular biology 2002, 323:387-406
CavBase – Alignment
Create associated graph:
• Node: node from protein A and node from protein B with similar interaction properties
• Edge: member nodes in protein A and B are connected
  – member node distance < 12 Å
  – distance difference < 2 Å
Find maximal common subgraph (Bron-Kerbosch) → similar arrangement of pseudo-centers in original graphs
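The associated-graph construction and the clique search can be sketched in Python as follows. This is not the CavBase code. Here “similar interaction properties” is simplified to identical pseudo-center type, and the function names and the toy coordinates are made up. The 12 Å and 2 Å limits are the ones quoted above.

```python
import math
from itertools import combinations

def associated_graph(site_a, site_b, max_dist=12.0, max_diff=2.0):
    """Nodes pair same-typed pseudo-centers; edges join geometrically compatible pairs."""
    nodes = [(i, j) for i, (type_a, _) in enumerate(site_a)
             for j, (type_b, _) in enumerate(site_b) if type_a == type_b]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a pseudo-center may be matched only once
        d_a = math.dist(site_a[i][1], site_a[k][1])
        d_b = math.dist(site_b[j][1], site_b[l][1])
        if d_a < max_dist and d_b < max_dist and abs(d_a - d_b) < max_diff:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def bron_kerbosch(r, p, x, adj, cliques):
    """Bron-Kerbosch with pivoting; appends every maximal clique to `cliques`."""
    if not p and not x:
        cliques.append(r)
        return
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in p - adj[pivot]:
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, cliques)
        p = p - {v}
        x = x | {v}

def best_match(site_a, site_b):
    """Largest clique of the associated graph = largest common arrangement."""
    adj = associated_graph(site_a, site_b)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return sorted(max(cliques, key=len))

site_a = [("donor", (0, 0, 0)), ("acceptor", (4, 0, 0)), ("aliphatic", (0, 3, 0))]
site_b = [("donor", (10, 10, 10)), ("acceptor", (14, 10, 10)),
          ("aliphatic", (10, 13, 10)), ("pi", (20, 20, 20))]
print(best_match(site_a, site_b))
```

Each pair in the result maps a pseudo-center index of site A to one of site B. The unmatched Pi center of site B is simply left out.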
CavBase – Scoring
• Scoring based on overlap of similarly typed surface patches
Kuhn D, Weskamp N, Schmitt S, Hüllermeier E, Klebe G: From the Similarity Analysis of Protein Cavities to the Functional Classification of Protein Families Using Cavbase. Journal of Molecular Biology 2006, 359:1023-1044
SOIPPA – Structure representation
• Delaunay tessellation of Cα atoms → 1 tetrahedron/Cα
• Environmental boundary (red) and protein boundary (blue)
Xie L, Bourne PE: A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics 2007, 8:S9.
Xie L, Bourne PE: A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics 2009, 25:i305-312.
SOIPPA – Structure representation (2)
• Each Cα characterized by
  – Vector with distance and direction of boundaries
  – Substitution matrix
• Graph:
  – Node: Cα
  – Edge: connection of tetrahedra
Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.
SOIPPA – Alignment
Create associated graph:
• Node: node(A) + node(B) with similar geometric potential
  – weight: amino acid frequency profile similarity
• Edge: member nodes in protein A and B are connected
  – distance difference < 2 Å
  – surface normal difference < 30°
Find maximum-weight common subgraph (MWCS)
Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.
SOIPPA – Scoring
• Sum over aligned residue pairs: residue similarity $M_{ij}$ weighted by normal vector angle ($p^{a}_{ij}$) and distance ($p^{d}_{ij}$)
  $$S = \sum_{i,j} \left( M_{ij} \times p^{a}_{ij} \times p^{d}_{ij} \right)$$
• Statistical significance of score
  Background score distribution:
  – compare unrelated structures with random sequences
  – fit resulting score distribution to extreme value distribution function → giving probability of randomness dependent on score
Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.
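Both parts of the scoring can be sketched briefly. The score is the sum of residue similarity times the angle and distance weights over all aligned pairs. For the significance, the sketch assumes a Gumbel-type extreme value distribution with location `mu` and scale `sigma` obtained from a background fit. The concrete weight functions and fitted parameters are not given here, so all numbers below are invented.

```python
import math

def alignment_score(pairs):
    """Sum of M_ij * p_a * p_d over aligned residue pairs given as (m, p_a, p_d) tuples."""
    return sum(m * p_a * p_d for m, p_a, p_d in pairs)

def gumbel_p_value(score, mu, sigma):
    """P(random score >= score) under a Gumbel distribution with location mu, scale sigma."""
    z = (score - mu) / sigma
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for large z
    return -math.expm1(-math.exp(-z))

score = alignment_score([(2.0, 0.5, 1.0), (4.0, 1.0, 0.25)])
print(score, gumbel_p_value(score, mu=1.0, sigma=0.5))
```

The higher the score relative to the background distribution, the smaller the probability that the match arose by chance.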
Isocleft
• Structure representation: Cα / atoms within 5 Å of ligand
• Alignment: Bron-Kerbosch of associated graph
• Scoring:
  $$S = \frac{N_C}{N_A + N_B - N_C}$$
Najmanovich R, Kurbatova N, Thornton J: Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites. Bioinformatics 2008, 24:i105
http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/icfdb/StartPage.pl
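The score has the form of a Tanimoto coefficient. A minimal sketch follows, assuming that N_A and N_B are the atom counts of the two binding sites and N_C the number of atoms matched between them. The function name is made up.

```python
def tanimoto(n_a, n_b, n_common):
    """S = N_C / (N_A + N_B - N_C); 1.0 for identical sites, 0.0 for no match."""
    if n_common > min(n_a, n_b):
        raise ValueError("common atoms cannot exceed the size of either site")
    return n_common / (n_a + n_b - n_common)

print(tanimoto(10, 20, 5))  # 5 / (10 + 20 - 5) = 0.2
```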
Isocleft – innovations
• Two iterations of alignment:
  1. Nodes: Cα atoms; edges: distance difference < 3.5 Å, minimal residue similarity → superimpose based on found graph
  2. Nodes: all heavy atoms; edges: distance < 4 Å, similar atom type (hydrophilic, acceptor, donor, hydrophobic, aromatic, neutral, neutral-donor and neutral-acceptor)
• Use first result of Bron-Kerbosch, then terminate
Najmanovich R, Kurbatova N, Thornton J: Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites. Bioinformatics 2008, 24:i105
Example 1: Explaining side effects
Problem: side effects of ERα modulators (SERMs)
Finding “off target” effects:
• Map sequences to structures (BLAST)
• Limit to “druggable” proteins (?)
• Search with SOIPPA ⇒ SERCA (Sarcoplasmic Reticulum Ca²⁺ channel ATPase)
Xie L, Wang J, Bourne PE (2007) In silico elucidation of the molecular mechanism defining the adverse effect of selective estrogen receptor modulators. PLoS Comput Biol 3(11)
Example 1: Validating results
• Inverse search
• Docking
  – SERM
  – similar compounds, correlate (?)
Graphics from Xie L, Wang J, Bourne PE (2007) PLoS Comput Biol 3(11)
Example 2: Repositioning known drug
Problem: new tuberculosis drugs needed, but many parameters to optimise
Finding compound to reuse against InhA:
• Search other structures binding adenine (ATP, ADP, NAD, FAD, ...)
• Compare binding sites with SOIPPA ⇒ SAM-dependent methyltransferases
Kinnings SL, Liu N, Buchmeier N, Tonge PJ, Xie L, et al. (2009) Drug Discovery Using Chemical Systems Biology: Repositioning the Safe Medicine Comtan to Treat Multi-Drug and Extensively Drug Resistant Tuberculosis. PLoS Comput Biol 5(7)
Example 2: Structure match
Graphics from Kinnings SL et al. (2009) PLoS Comput Biol 5(7)
Example 3: Analysing target relationships
Nodes: proteins
Edges: similar binding (within factor 10³)
Paolini,G.V. et al. (2006) Global mapping of pharmacological space. Nature biotechnology, 24, 805-15.
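One way to read this network definition is sketched below: two proteins are joined by an edge when at least one compound binds both with affinities that differ by no more than a factor of 10³. This reading, the input layout and the function name are assumptions for illustration, and the affinity values are invented.

```python
from itertools import combinations

def target_network(affinities, factor=1000.0):
    """Edges between proteins that share a compound with affinities within `factor`.

    `affinities` maps compound -> {protein: affinity}, e.g. IC50 values in nM.
    """
    edges = set()
    for per_protein in affinities.values():
        for (p, a), (q, b) in combinations(sorted(per_protein.items()), 2):
            if max(a, b) / min(a, b) <= factor:
                edges.add((p, q))
    return edges

affinities = {
    "compound1": {"A": 10.0, "B": 500.0, "C": 50000.0},
    "compound2": {"B": 1.0, "D": 2000.0},
}
print(sorted(target_network(affinities)))
```

In the example, A and C are not linked directly, because the shared compound binds them with a 5000-fold difference.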
Summary
Pharma research focus is moving from individual interactions alone to system-oriented research
Challenges:
• How to compare?
• Computational overhead