SP4: In silico methods
Partner 16A EMBL (Russell, Bork)
Partner 1 (CRG Serrano)
Partner 5 (NKI Perrakis)
Partner 10 (HU Margalit)
Partner 12 (CCNet)
Partner 17 IRB (Aloy)
Partner 3A (Paris-Sud, Janin)
SP4 In silico methods
• WP4.1: Target identification & annotation
  Partners: EMBL-Bork/Russell, HU, CCNet, IRB
• WP4.2: Complex modeling
  Partners: EMBL-Russell, IRB, Gif, CRG
• WP4.3: Interface to the scientific community & scientific data management
  Partners: NKI, EMBL-Russell, CCNet
WP4.1: Target identification & annotation
Partners EMBL-HD, HU, CCN, IRB
Activities in:
• Interaction prediction (HU, EMBL-Bork)
• Complex prediction & ranking, the ‘list of 20’ (IRB)
• Complex visualisation (CCNet/EMBL)
• Data gathering (CCNet) (e.g. protein-chemical interactions)
• Gel processing (EMBL)
WP4.2: Complex modeling
Partners EMBL-HD, IRB, Gif, CRG
Activities in:
• Complex modelling
– Automated procedures (EMBL-Russell/IRB)
– Interaction prediction via structure (EMBL-Russell/IRB)
• New methods for modelling
– FoldX (CRG/EMBL)
– High-throughput Docking (IRB)
• Analyses, individual models (Everybody)
Building a complex from pieces
Aloy et al, Curr. Opin. Struct. Biol, 2005.
For 636 complexes in yeast:
3505 : proteins modelable
419 : complexes single subunit models
224 : 2+ subunit models
122 : 3+ subunit models
eIF2 /eIF2B complex
Preliminary reconstructions (Bettina Boettcher)
[Figure: reconstructions with subunit labels GCD1, GCD2, GCD6, GCD7, GCN3, SUI2, SUI3, SUI4]
Sub-complexes
Intact complex MS (Carol Robinson)
Modelling (Damien Devos)
Defining new interfaces: 424 candidate interfaces to date
[Figure: superimposition of Complex 1 & 2, then of Complex 1, 2 & 3, giving a new complex]
A common shape denotes a similar fold
Example: Transcription factor SPX dimer
Domain 1, 1z3eA.c.47.1.12-1-trans3 (chain A) Transcriptional regulator SPX (B. subtilis)
Domain 2, 1z3eA.c.47.1.12-1-trans4 (chain B) Transcriptional regulator SPX (B. subtilis)
Domain 3, 1z3eB.a.60.3.1-1-trans1p (chain C) RNA polymerase alpha (B. subtilis)
Domain 4, 1z3eB.a.60.3.1-1-trans2p (chain D) RNA polymerase alpha (B. subtilis)
Domain 5, 1lb2E.a.60.3.1-1-trans1 (chain E) RNA polymerase alpha (E. coli)
Domain 6, 1lb2B.a.60.3.1-1-trans2 (chain F) RNA polymerase alpha (E. coli)
E.coli dimer in one protein, forms nice interface in B.subtilis – good evidence from other sources (Myco TAP)
[Figure labels: Complex 1, Complex 2, Complex 3, New interface]
Enabling/disabling loops can predict multimerization state
Homodimer: E. coli guanylate kinase
Monomer: S. cerevisiae guanylate kinase
[Figure: alignment of E. coli, V. cholerae, yeast, mouse, pig and bovine guanylate kinases, grouped into homodimers and monomers, with the enabling loop marked]
When modelling fails – docking?
• The Aloy group (IRB) is currently running many tens of thousands of docking experiments on MareNostrum, the largest supercomputer in Europe
• Aim is to identify promising docking candidates to help model key interactions of interest
Modelling versus docking
• We can model an interaction structure if there is a previously determined structure containing parts homologous to the two interacting proteins
• We can predict an interaction structure by docking if we have structures or models for parts of the interacting proteins
[Figure label: homology]
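The two conditions above can be read as a simple decision rule. The following is a minimal sketch, assuming that modelling is preferred whenever a homologous template complex exists and that docking is the fallback; the names `Strategy` and `choose_strategy` are hypothetical and not part of any project software.

```python
from enum import Enum


class Strategy(Enum):
    HOMOLOGY_MODELLING = "model on a homologous complex structure"
    DOCKING = "dock structures or models of the parts"
    NONE = "no structural route available"


def choose_strategy(has_homologous_complex: bool,
                    has_component_structures: bool) -> Strategy:
    """Pick a route to an interaction structure (illustrative only).

    has_homologous_complex: a previously determined structure contains
        parts homologous to both interacting proteins.
    has_component_structures: structures or models exist for parts of
        both interacting proteins.
    """
    if has_homologous_complex:
        # Assumption: modelling is tried first, docking only when it fails.
        return Strategy.HOMOLOGY_MODELLING
    if has_component_structures:
        return Strategy.DOCKING
    return Strategy.NONE


if __name__ == "__main__":
    print(choose_strategy(False, True).value)
```

Ordering the checks this way reflects the slide title "When modelling fails – docking?"; a real pipeline would likely also weigh template quality.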
Large-scale Docking
Set                 Proteins  Interactions  All vs all   100 CPUs  1000 CPUs   100 CPUs  1000 CPUs
pdb 90%                 8899          9645    39596100    24747.5     2474.7   197980.5    19798.0
pdb 30%                 5224          5836    13645088     8528.2      852.8    68225.4     6822.5
Interaction types       2782          3737     3869762     2418.6      241.8    19348.8     1934.8
yeast xray orfs           79            97      3120.5        1.9        0.2       15.6        1.6
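The "all vs all" figures in this table appear to equal half the square of the protein count, and the 1000-CPU figures appear to be the 100-CPU figures divided by ten. The sketch below reproduces that arithmetic; the n²/2 formula and the linear scaling are inferences from the numbers, the cost unit is not stated on the slide, and the function names are hypothetical.

```python
# Protein counts as listed in the Large-scale Docking table.
PROTEIN_COUNTS = {
    "pdb 90%": 8899,
    "pdb 30%": 5224,
    "Interaction types": 2782,
    "yeast xray orfs": 79,
}


def all_vs_all(n_proteins: int) -> float:
    """Number of all-vs-all docking runs, assumed here to be n^2 / 2."""
    return n_proteins ** 2 / 2


def rescale_cost(cost_on_100_cpus: float, cpus: int) -> float:
    """Rescale a cost quoted for 100 CPUs, assuming perfectly linear scaling."""
    return cost_on_100_cpus * 100 / cpus


if __name__ == "__main__":
    for name, n in PROTEIN_COUNTS.items():
        print(f"{name}: {all_vs_all(n):.1f} pairs")
```

Under these assumptions the second pair of CPU columns is eight times the first, which would be consistent with a more expensive docking protocol, although the slide does not say so.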
Unrefined Refined
Using FoldX to assess docked or modelled interactions
Good interaction, but many clashes. The model is not so good, but it could be rescued by backbone moves/further docking.
WP4.3: Interface to the scientific community and scientific data management
Partners: EMBL-HD, NKI, [CCN]
Activities in:
• Web site maintenance
– New data (copy number, structural annotation)
– Various optimisation
– YeastWiz
• Complex target DB
– Resting period for software development
– Needs data. Listen to Tassos.
Yeast Wiz (CCNet)
Interactions of known structure
www.3drepertoire.org/yeastwiz
Windows XP
Linux
Manual (rather beta)
Accounts enabled Monday for everybody
Suggestions for new data promising
Matthew Betts (EMBL), Tomasz Ignasiak (CCNet)
Current data contributions
[Figure labels: CCN-DB, EMBL, HU, IRB, 3DR]
• STRING (interactions), SMART (orthologues), 3D interaction predictions, orthology, models, etc.
• Protein-protein (>10 sources, manual), protein-chemical (manual, TM)
• Dom-dom profiles, context interactions, enabling loops
• Docking solutions, list of 20
Analysis of a “gold set” of 61 models of known interactions by FoldX
• Easy case: good interaction energy, few clashes. Good model.
• Bad interaction, loads of clashes, interpenetrating mainchains. Bad model.
• Bad interaction, many clashes, but the model could be rescued by some backbone moves/further docking.
• Good interaction, many clashes, interpenetrating mainchains, gaps in the structure. Bad model.
Analysis of a “gold set” of 61 models of known interactions by FoldX
• Easy case: good interaction energy, few clashes: good model
• Bad interaction, many clashes:
– interpenetrating mainchains, gaps in the structure: bad model
– mainchains too close over a large region, but this can be solved by backbone moves/further docking (could improve the model?)
• Good interaction, many clashes:
– interpenetrating mainchains, gaps in the structure: bad model
– mainchains too close over a large region, but this can be solved by backbone moves/further docking
The magnitude of the local clashes correlates with whether a model can be rescued (e.g. mild clashes spread over many residues), but there are still exceptions.
Could we really skip a step of visualization?
From protein-protein interactions to domain-domain interactions and back
Hanah Margalit
The Hebrew University of Jerusalem
domain pairs
protein-protein interactions
Modularity in protein-protein interactions
[Figure: fine tuners, with Yes/No outcomes]
positive dataset: reliable protein-protein interactions
negative dataset: reliable pairs of proteins that do not interact
What are the fine tuners of domain-domain recognition?
Homodimers and monomers provide an ideal dataset
Domains that mediate homodimerization are found also in monomers
homodimers: co-localized co-expressed interact
monomers: co-localized co-expressed do not interact
Database of 50 homodimers/monomers with the same domain for which structural data is available
Different fine-tuners determine the self-interaction potential of domains
[Figure: homodimers versus monomers; labelled fine-tuners are phosphorylations (P) and interface residue substitutions]
Enabling loops mediate homodimerization
Homodimer: E. coli guanylate kinase
Monomer: S. cerevisiae guanylate kinase
[Figure: alignment of E. coli, V. cholerae, yeast, mouse, pig and bovine guanylate kinases, grouped into homodimers and monomers, with the enabling loop marked]
Disabling loops prevent homodimerization
Monomer: Bovine inositol polyphosphate 1-phosphatase
Homodimer: Bovine inositol monophosphatase
[Figure: disabling loop (DL) marked in monomers and homodimers]
Loop profiles
A multiple-sequence alignment with locations of potential loops
Presence AND absence are informative
monomer: protein 2
homodimer: protein 1
enabling loop
disabling loop
Test set
experimental oligomeric state
loop profile
monomer
dimer
72/80 are consistent (90%, p-value ≤ 5×10⁻⁶)
80 proteins with documented oligomeric state based on experimental data
monomer
95
dimer
363
Large-scale prediction of domain-domain interaction
pfkB carbohydrate kinase domain proteins
EL
DL DL DL
108 homodimers
31 monomers
monomer
homodimer
core
Homodimer
Monomer
test
>1000 predictable
Metallo-beta-lactamase domain
Dominance of disabling over enabling loops
RNase Z (B. subtilis)
ccrA (B. fragilis)
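One way to read the loop-profile slides is as a small rule set: an enabling loop that is present points to a homodimer, one that is absent points to a monomer, and a disabling loop that is present overrides everything else. The sketch below encodes that reading. It is only an illustration under those assumptions: `predict_state` is a hypothetical name, and `None` is used here to mean that the domain family has no loop of that type.

```python
from typing import Optional


def predict_state(enabling_present: Optional[bool],
                  disabling_present: Optional[bool]) -> str:
    """Predict the oligomeric state from a loop profile (illustrative only).

    Each argument is True/False for presence/absence of the loop in the
    protein, or None when the family has no loop of that type.
    """
    if disabling_present:
        # Assumed dominance: a disabling loop blocks dimerization
        # even when an enabling loop is also present.
        return "monomer"
    if enabling_present is not None:
        # Presence AND absence of the enabling loop are informative.
        return "homodimer" if enabling_present else "monomer"
    if disabling_present is False:
        # The family's disabling loop is missing, so nothing blocks the interface.
        return "homodimer"
    return "undetermined"


if __name__ == "__main__":
    print(predict_state(enabling_present=True, disabling_present=True))
```

The real loop profiles are derived from multiple-sequence alignments; this sketch covers only the final decision step and makes no claim about how the ~90% accuracy was obtained.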
Summary
1. Enabling/disabling loops are newly discovered fine-tuners of domain-domain interaction
2. Their presence/absence is highly preserved in evolution, implying that prevention of unwanted interactions is an evolutionary constraint
3. Prediction of self-interaction potential of domains according to loop profiles is highly accurate (~90%)