Forum
Blind Evaluation of Hybrid Protein Structure Analysis Methods based on Cross-Linking
Adam Belsom,1 Michael Schneider,2 Oliver Brock,2 and Juri Rappsilber1,3,*
Hybrid methods combine experimental data and computational modeling to analyze protein structures that are elusive to structure determination. To spur the development of hybrid methods, we propose to test them in the context of the CASP experiment and would like to invite experimental groups to participate in this initiative.
Determination of protein structure is an important prerequisite for understanding protein function, yet it remains one of the great scientific challenges of our time. One question: what are the tools that we would like to use? Light microscopes have been used for centuries to look at cellular structures, but we have not yet been able to develop a microscope powerful enough to observe or film a protein structure. However, we have been able to observe protein structure by interpreting physical measurements from X-ray diffraction, nuclear magnetic resonance (NMR) spectroscopy, and electron microscopy. These methods have provided us with most of the more than 110 000 structures in the Protein Data Bank (www.pdb.org) [1].
As we aim to chart the protein structural universe more widely and in more detail, established methods face some rough seas and potentially crippling challenges. Many proteins and complexes seem out of reach for existing methods, because they cannot be purified, are unstable, or are intrinsically dynamic [2]. So-called ‘hybrid’ methods (methods that combine sparse and low-resolution experimental data, and also high-resolution yet sparse structures, with computational structure modeling methods) could have the potential to overcome some of these limitations. The sparse, low-resolution data used in hybrid methods are by themselves insufficient to determine protein structure. However, their combination with computational structure modeling methods has been shown to enable the determination of complex model structures [3].
For hybrid methods to realize their potential, we must advance both the experimental methods and the corresponding computational methods. This development must occur in tandem so as to be able to achieve the most effective synergies between the strengths of both sides: the nature of the experimental data must determine what the most appropriate computational methods are, and the challenges of computational methods can guide the development of experimental methods.
One promising type of low-resolution experimental data exploitable by hybrid methods is obtained by cross-linking/mass spectrometry. Cross-linking/mass spectrometry is so promising because it appears to complement existing computational approaches very well [4]. Also, cross-linking/mass spectrometry is well established in the structural biology lexicon. It has been accepted and proven (by numerous successes) to elucidate the architecture of large protein complexes [5]. This has involved exogenous, homobifunctional cross-linkers that predominantly link lysine residues. Nevertheless, this robust and popular application has been limited in terms of the extent of detail that it reveals, which is largely a consequence of using selective cross-linkers. Solving entire structures appears to be out of reach. However, using a promiscuous and photoactivatable cross-linker instead may provide a fundamental change, at least for individual proteins. We validated the combination of high-density cross-linking data with controlled false discovery rates (FDR) and a conformational space search, because it enabled the determination of the structure of human serum albumin (HSA) domains with an RMSD to the X-ray structure of up to 2.5 Å, or 3.4 Å in the context of blood serum [4]. The generation and conjunction of high-density cross-linking/mass spectrometry data with computational structure modeling for ab initio structure prediction is very new and, consequently, needs to be questioned, tested, and developed further.
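The RMSD values quoted above compare a modeled structure with the X-ray structure. As a rough illustration only, the sketch below computes a C-alpha RMSD in Python. It assumes the two structures have already been superimposed and that the coordinate lists are in the same residue order; the function name `ca_rmsd` and the input layout are hypothetical and are not taken from the cited study.

```python
import math


def ca_rmsd(model, reference):
    """Return the RMSD (in Å) between two lists of (x, y, z) C-alpha coordinates.

    Assumption: both structures are already superimposed and list the same
    residues in the same order. No optimal superposition is performed here.
    """
    if not model or len(model) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared coordinate differences over all matched C-alpha atoms.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
    print(f"RMSD: {ca_rmsd(model, reference):.2f} Å")
```

In practice an optimal superposition step (for example, the Kabsch algorithm) would normally precede this calculation; it is omitted here to keep the sketch short.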
If we are going to spark the rapid development of both hybrid and component methods within hybrid methods, we need to target two important goals. First, we need to bring the experimental and computational communities together. This is important to allow cross-fertilization of ideas and to ensure that the latest developments in both fields are used. Second, we need to establish evaluation standards for hybrid methods to test their ability for structure determination on a highly rigorous but even playing field. Many hybrid approaches (and component methods) have been developed in the context of specific proteins and complexes and it is often not clear whether an approach will work for other proteins. To reach our two goals, we are proposing to now bring the two communities (experimental groups and protein-modeling experts) together in the context of the community-wide experiment, Critical Assessment of protein Structure Prediction (CASP) [6–8]. We propose the use of CASP as a platform to facilitate progress in hybrid method development. To accomplish this goal, we are soliciting the participation of experimentalists to provide protein structure data for the upcoming CASP12, held in May–August 2016.
CASP has taken place every 2 years since 1994 and provides a stringent assessment platform of structure prediction
TIBS 1259 No. of Pages 4
Trends in Biochemical Sciences, Month Year, Vol. xx, No. yy 1
[Figure 1, panels (A)–(D). Axes: cross-link pair frequency versus C-alpha distance (Å). Series: observed distribution and random distribution. Panel titles: Tx781 - 305 links; Tx808 - 265 links; Tx767 - 381 links; Tx812 - 201 links.]
Figure 1. Cross-Linking/Mass Spectrometry Data Used in Critical Assessment of protein Structure Prediction 11 (CASP11). (A–D) Left panels show the C-alpha pair distance distribution of observed constraints at 5% false discovery rate (FDR) against the random constraint distribution. Middle panels show the cross-link networks for four CASP targets shown for estimated 5% FDR cut-off. Gray outer lines represent target sequences. Constraints missing from the crystal structure and, therefore, unverifiable are represented in black. Right panels show the observed constraints at 5% FDR against the X-ray structure. In all panels, constraints with Cα–Cα cross-linking distances less than 25 Å are shown in purple and constraints with distances 25 Å and over are shown in green. (A) Cross-linked residue pairs of Tx781 in Protein Data Bank (PDB)|4qan, N = 305. (B) Cross-linked residue pairs of Tx808 in PDB|4qhw, N = 265. (C) Cross-linked residue pairs of Tx767 in PDB|4qpv, N = 381. (D) Cross-linked residue pairs of Tx812 in crystal structure (structure not deposited in PDB), N = 201. Note, Tx781 was compromised during shipment and showed aggregates upon cross-linking that led to the pronounced presence of constraints not fitting the X-ray structure of that protein.
methods. The organizers release protein sequences with known but unpublished structures to modeling groups, who can then test their ability to predict structures. The predicted structures are then evaluated by independent evaluation groups, with the goal of determining the most promising approaches and research directions. Importantly, this experiment is double blind to prediction groups, who do not know the protein structures, and evaluation groups, who do not know the origin of the predictions. Thus, CASP has established a rigorous assessment standard in the field of structure prediction that is unmatched in many areas of science and is considered to be one of the hallmark accomplishments of structural bioinformatics [9]. CASP was the platform for demonstrating the effectiveness of modern structure prediction methods, such as assembly from structural fragments, the detection of remote homologs, and, most recently, the use of evolutionary contacts [8,10,11]. This rigorous testing of structure prediction methods spurred their development into a technology that is now routinely applied in protein engineering and drug design [12]. CASP also inspired similar efforts for docking of proteins into complexes (CAPRI [13]) and the automated testing of prediction servers (CAMEO [14]).
Following 20 years of purely computational work, in 2014 (CASP11) experimental data were made available to modeling groups to assist predictions for the first time [15]. Cross-linking/mass spectrometry succeeded in providing distance constraints for four proteins with a turnaround time of 2 weeks per protein. Here, CASP11 allowed us to test the readiness of the approach in a blind study and, at the same time, test the current value of cross-link data for structure prediction.
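One way to check such distance constraints against a known structure is to sort each cross-linked residue pair by its C-alpha distance, using the 25 Å threshold from the Figure 1 caption. The Python sketch below is illustrative only: the function name `classify_links`, the dictionary layout, and the toy coordinates are assumptions, not the procedure used for CASP11.

```python
import math

CUTOFF_ANGSTROM = 25.0  # C-alpha distance threshold quoted in the Figure 1 caption


def classify_links(ca_coords, links, cutoff=CUTOFF_ANGSTROM):
    """Sort cross-linked residue pairs by their C-alpha distance in a structure.

    ca_coords: dict mapping residue number -> (x, y, z) C-alpha coordinate (Å).
    links: iterable of (residue_i, residue_j) pairs.
    Pairs with a residue absent from the structure cannot be verified.
    """
    result = {"within": [], "over": [], "unverifiable": []}
    for i, j in links:
        if i not in ca_coords or j not in ca_coords:
            result["unverifiable"].append((i, j))
            continue
        distance = math.dist(ca_coords[i], ca_coords[j])
        # "less than 25 Å" versus "25 Å and over", as in the caption.
        key = "within" if distance < cutoff else "over"
        result[key].append((i, j))
    return result


if __name__ == "__main__":
    # Toy coordinates and links, invented purely for illustration.
    coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (30.0, 0.0, 0.0)}
    print(classify_links(coords, [(1, 2), (1, 3), (2, 99)]))
```

Keeping unverifiable pairs in their own group mirrors the caption, which reports constraints that are missing from the crystal structure separately.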
We identified between 201 and 381 unique residue pairs at an estimated 5% FDR, for the four proteins for which we provided data (Figure 1). This equates to between 0.63 and 1.20 cross-links per residue, which is comparable to that obtained in the HSA study (0.85 links per residue at 5% FDR). Initial results of CASP11 have suggested that improvements in ab initio structure prediction using cross-link data are slight [15]. Most significantly, however, CASP11 revealed some of the current limitations of cross-linking, defining areas in which the method must develop in the future. The observed cross-links were spread unevenly over the sequence. In addition, beta sheets had both a lack of links and weak definition of observed links over the structure. These cross-linking/mass spectrometry methodology limitations, identified during the course of CASP11, were not specific to this experiment; rather they are limitations that will be present for the whole field. By exposing these limitations, we hope that science is now better able to find the necessary solutions. Blind testing under the auspices of CASP, or a similar structure, allows method developers to clearly identify the most promising approaches as well as areas for future development and, perhaps more fundamentally, allows scientists at large to see the current maturity of the approaches as general methods.
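The cross-links-per-residue figure quoted above is the number of unique residue pairs divided by the sequence length. The sketch below shows that arithmetic in Python. The link counts are those given in Figure 1, but the sequence lengths are assumptions chosen for illustration, since the text does not state them; the names `TARGETS`, `links_per_residue`, and `density_range` are likewise hypothetical.

```python
# Link counts are from Figure 1. Sequence lengths are ASSUMED for illustration
# and are not stated in the text.
TARGETS = {
    "Tx781": {"links": 305, "residues": 420},
    "Tx808": {"links": 265, "residues": 418},
    "Tx767": {"links": 381, "residues": 318},
    "Tx812": {"links": 201, "residues": 204},
}


def links_per_residue(n_links, n_residues):
    """Return the cross-link density: unique residue pairs per residue."""
    if n_residues <= 0:
        raise ValueError("sequence length must be positive")
    return n_links / n_residues


def density_range(targets):
    """Return the (lowest, highest) cross-link density over all targets."""
    values = [
        links_per_residue(t["links"], t["residues"]) for t in targets.values()
    ]
    return min(values), max(values)


if __name__ == "__main__":
    low, high = density_range(TARGETS)
    print(f"{low:.2f} to {high:.2f} cross-links per residue")
```

With these assumed lengths the range reproduces the 0.63 to 1.20 cross-links per residue reported in the text.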
Call for Participation
We would like to open a call to all cross-linking/mass spectrometry groups who are interested in the further development of cross-linking/mass spectrometry technology and its ties to structure elucidation to consider participation in the next round of CASP. In addition, we would like to welcome all experimentalists who are able to produce low-resolution data for the development of hybrid structure determination. This could be heralded as a stepping-stone towards joining all experimental methods that provide some information on protein structures with the modeling community. Participation of experimental groups is a crucial element for successfully leveraging the full potential of this initiative. We should embrace this great opportunity to drive development of all aspects of protein structure modeling, whether it is the development of hybrid methods, modeling algorithms, or experimental data provision. We look forward to the development of novel tools in our toolbox, and the unprecedented discoveries in the protein universe that they will lead to.
For further details on how to participate in CASP12 as an experimentalist and to sign up, please visit: http://predictioncenter.org/casp12/registration.cgi.
Acknowledgments
We would like to thank the organizers of CASP, Krzysztof Fidelis, Andriy Kryshtafovych, and Bohdan Monastyrskyy, for opening up CASP to experimentalists and for their target/sample identification and acquisition efforts, and the groups that solved the respective X-ray structures for their generous provision of protein samples. A.B. and J.R. were supported by the Wellcome Trust (grant nos. 103139, 092076, and 108504). M.S. and O.B. were supported by the Alexander von Humboldt Foundation and the Federal Ministry of Education and Research (BMBF).
1Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, UK
2Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, 10587 Berlin, Germany
3Chair of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
*Correspondence: [email protected] (J. Rappsilber).
http://dx.doi.org/10.1016/j.tibs.2016.05.005
References
1. Berman, H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242
2. Sali, A. et al. (2015) Outcome of the First wwPDB Hybrid/Integrative Methods Task Force Workshop. Structure 23, 1156–1167
3. Ward, A.B. et al. (2013) Biochemistry. Integrative structural biology. Science 339, 913–915
4. Belsom, A. et al. (2016) Serum albumin domain structures in human blood serum by mass spectrometry and computational biology. Mol. Cell Proteomics 15, 1105–1116
5. Leitner, A. et al. (2016) Crosslinking and mass spectrometry: an integrated technology to understand the structure and function of molecular machines. Trends Biochem. Sci. 41, 20–32
6. Kryshtafovych, A. et al. (2016) CASP11 statistics and the prediction center evaluation system. Proteins Published online February 9, 2016. http://dx.doi.org/10.1002/prot.25005
7. Kryshtafovych, A. et al. (2015) Methods of model accuracy estimation can help selecting the best models from decoy sets: Assessment of model accuracy estimations in CASP11. Proteins Published online September 7, 2015. http://dx.doi.org/10.1002/prot.24919
8. Monastyrskyy, B. et al. (2015) New encouraging developments in contact prediction: assessment of the CASP11 results. Proteins Published online November 17, 2015. http://dx.doi.org/10.1002/prot.24943
9. Samish, I. et al. (2015) Achievements and challenges in structural bioinformatics and computational biophysics. Bioinformatics 31, 146–150
10. Simons, K.T. et al. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225
11. Söding, J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951–960
12. Khoury, G. et al. (2014) Protein folding and de novo protein design for biotechnological applications. Trends Biotechnol. 32, 99–109
13. Janin, J. et al. (2003) CAPRI: a Critical Assessment of PRedicted Interactions. Proteins 52, 2–9
14. Haas, J. et al. (2013) The Protein Model Portal: a comprehensive resource for protein structure and model information. Database 2013, bat031
15. Schneider, M. et al. (2016) Blind testing of cross-linking/mass spectrometry hybrid methods in CASP11. Proteins Published online March 4, 2016. http://dx.doi.org/10.1002/prot.25028