ProTSAV: A protein tertiary structure analysis and validation server


Biochimica et Biophysica Acta 1864 (2016) 11–19

Contents lists available at ScienceDirect

Biochimica et Biophysica Acta

Journal homepage: www.elsevier.com/locate/bbapap


Ankita Singh a,b,1, Rahul Kaushik a,c,1, Avinash Mishra a,c, Asheesh Shanker b, B. Jayaram a,c,d,⁎

a Supercomputing Facility for Bioinformatics & Computational Biology, IIT Delhi, India
b Department of Bioinformatics, Banasthali Vidyapith, Banasthali, 304022, India
c Kusuma School of Biological Sciences, IIT Delhi, India
d Department of Chemistry, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India

⁎ Corresponding author at: Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India.
E-mail address: [email protected] (B. Jayaram).
1 These authors have made equal contribution to the work.

http://dx.doi.org/10.1016/j.bbapap.2015.10.004
1570-9639/© 2015 Elsevier B.V. All rights reserved.

Article info

Article history:
Received 15 July 2015
Received in revised form 26 September 2015
Accepted 14 October 2015
Available online 22 October 2015

Keywords:
Protein structure quality assessment
Structure validation
Structure evaluation

Abstract

Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in case of individual protein structure along with a graphical representation and ranking in case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88% and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2 Å. The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Protein structure prediction is a primary challenge in structural biology and is essential for gaining better insights into biological function. An understanding of three-dimensional structures is crucial for rational drug design. Although X-ray crystallography and NMR are powerful methods for protein 3D structure determination, the time and cost involved restrict their application to selected proteins [1–9]. This led to the emergence of reliable and efficient computational methods to predict protein tertiary structures. To obtain protein structural information at a large scale in a time- and cost-effective manner, 3D protein structures are generated by high-throughput computational structure prediction techniques such as de novo, homology and hybrid methods [10–14], which have been found very capable for drug development [15]. In recent years, the protein structure prediction community has dedicated enormous efforts to predicting more accurate structural models of proteins, and its achievements and progress are chronicled through the biennial Critical Assessment of techniques for protein Structure Prediction (CASP) experiments [16–34].


The utility of a predicted model structure can be ensured by an accurate quality assessment methodology [35]. Methods for assessing the quality of a model structure can be broadly classified, based on the functions they employ, into four categories: physics-based potential functions, statistical potential functions, consensus-based functions and machine-learning-based functions. Consensus methods are founded on the hypothesis that folds of the experimental structures are likely to feature more commonly in a set of native-like structures, and these are among the best performing methods [36–45]. The quality of predicted model structures directly determines their utility for studies such as functional characterization, protein-protein interactions and ligand-protein interactions [46]. Knowledge of a binding pocket of a receptor for its ligand is very important for drug design, particularly for conducting mutagenesis studies [47]. The principle of defining a binding pocket [48] has proved to be quite useful in identifying functional domains, encouraging the appropriate truncation experiments. A similar approach has also been applied to describe the binding pockets for many other receptor-ligand interactions important for drug design [10–14].

The predicted model structures can be scanned for various features to assess their quality, such as main chain conformations in allowed regions of the Ramachandran map, planar peptide bonds, side chain conformations that correspond to those in the rotamer libraries, hydrogen bonding of polar atoms if they are buried, proper environments for hydrophobic and hydrophilic residues and the number of bad atom-atom contacts [49–51]. Presently, diverse quality assessment programs such as Naccess [52], Verify3D [53], Errat [54], Procheck [55], MolProbity [56], ProSA [57], dDFIRE [58], D2N [59], ProQ [60], PSN-QA [61], WhatCheck [62], QMEAN [63] and ANNOLEA [64] are available in the public domain; they scrutinize errors in predicted model structures based on various parameters derived from experimentally determined structures. Each of these methods, while successful in highlighting deficiencies in modeled structures, is not individually comprehensive; hence the need for a server that draws upon the collective strength of these methods. In view of the above, we propose here a novel technique to evaluate the reliability of any protein 3D structure predicted by various computational approaches, in the hope of accelerating function assignment and drug development.

Table 1. Data selection and curation for training dataset. (a) Details of experimental structures selected from RCSB. Only structures without missing residues, except at the N-terminus and C-terminus, were considered. (b) Details of predicted structures taken from CASP7–CASP10. Only full length model structures for respective target sequences were considered before classifying into classes.

Table 1 (a)

| Data type | Total protein structures | Monomers | Chain length 50–500 AA | X-ray structures within 2 Å | Final data without missing residues |
|---|---|---|---|---|---|
| RCSB | 98,770 | 42,078 | 36,159 | 19,233 | 8001 |

Table 1 (b)

| Data type | Number of targets | Total model structures submitted by server groups | Full length filtered structures | Structures within class 0–2 Å | Structures within class 2–5 Å | Structures within 5–8 Å |
|---|---|---|---|---|---|---|
| CASP7 | 95 | 12,750 | 10,150 | 105 | 1403 | 1441 |
| CASP8 | 120 | 21,673 | 16,043 | 671 | 3963 | 2180 |
| CASP9 | 116 | 22,577 | 16,271 | 213 | 2761 | 1914 |
| CASP10 | 97 | 15,684 | 11,516 | 274 | 2515 | 1310 |
| Total | | | | 1263 | 10,642 | 6845 |

2. Material and methods

2.1. Input and output

Input for the server is a single model structure or multiple models in pdb file format. For single model structure quality assessment, ProTSAV furnishes the user with one cumulative global score depicted through a plot, and for multiple model structures ProTSAV generates the scores for the respective model structures and ranks them.

2.2. Data selection and curation

The dataset selected for building the server is divided into two parts. The first part consists of 8001 monomeric protein structures (the list of PDB IDs along with SCOP IDs is provided in the supplementary material as Table S1) solved through X-ray crystallography, with resolutions in the range of 0–2 Å and chain lengths in the range of 50–500 amino acid residues, from the RCSB database [65]. The second part consists of 18,750 decoy structures submitted by all server groups for CASP7 to CASP10 targets (full length models with respect to their target length), spanning a range of 0–8 Å root mean square deviations (rmsd) from their respective native structures with the same chain length. The selected datasets are checked for missing residues. Hydrogen atoms are added to all pdb structures using tleap [66]. The selected dataset consists of a total of 26,751 protein structures, with the first 9264 structures (8001 RCSB structures and 1263 decoy structures) in the 0–2 Å class, the next 10,642 decoy structures in the 2–5 Å class and the last 6845 decoy structures in the 5–8 Å class. Table 1 gives the numbers of structures at various stages of data selection and curation of the training dataset.

Table 2. List of various quality assessment structural features used in individual modules comprising ProTSAV. All the modules are mutually exclusive in terms of combination of features used for quality assessment.

Modules (columns): DFIRE, Errat, Naccess, ProSA, Procheck, Verify3D, MolProbity, D2N, ProQ, PSN-QA.
Features (rows): Non-covalent interactions; Residue based contact potential; Burial preference of residue contacts; Accessible surface area; Residue packing preference; Contact order; Globularity; Secondary structure information; Φ and ψ distribution in Ramachandran plot; Energy based scorings; Side chain packing.
(The feature-to-module check marks are not preserved in this transcript.)
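The selection criteria for the experimental part of the training set (monomers, 50–500 residues, X-ray resolution within 2 Å, no internal missing residues) can be read as a simple filter. The sketch below is illustrative only: the `PdbEntry` record, its field names and the placeholder IDs are hypothetical and are not part of ProTSAV or of any RCSB interface, and treating the 2 Å and 50/500 limits as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PdbEntry:
    """Hypothetical summary record for one PDB entry (illustration only)."""
    pdb_id: str
    n_chains: int
    chain_length: int
    method: str          # e.g. "X-RAY" or "NMR"
    resolution: float    # in angstroms
    internal_gaps: int   # missing residues, not counting N- and C-termini


def passes_curation(e: PdbEntry) -> bool:
    """Apply the filters described in Section 2.2 and Table 1(a)."""
    return (
        e.n_chains == 1                      # monomers only
        and 50 <= e.chain_length <= 500      # chain length window (assumed inclusive)
        and e.method == "X-RAY"              # X-ray structures only
        and e.resolution <= 2.0              # within 2 angstroms (assumed inclusive)
        and e.internal_gaps == 0             # no missing residues except termini
    )


# Placeholder entries with made-up IDs and values.
entries = [
    PdbEntry("AAAA", 1, 120, "X-RAY", 1.8, 0),  # kept
    PdbEntry("BBBB", 2, 120, "X-RAY", 1.8, 0),  # rejected: not a monomer
    PdbEntry("CCCC", 1, 620, "X-RAY", 1.5, 0),  # rejected: chain too long
    PdbEntry("DDDD", 1, 200, "X-RAY", 2.6, 0),  # rejected: resolution
    PdbEntry("EEEE", 1, 200, "X-RAY", 1.9, 3),  # rejected: internal gaps
]
kept = [e.pdb_id for e in entries if passes_curation(e)]
```

Applying the filters in sequence mirrors the column order of Table 1(a), where each stage reduces the number of retained structures.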

3. Module selection

ProTSAV server is developed based on ten previously reported, well-known and thoroughly tested methods.

3.1. Naccess

Globular proteins exist in their native conformation in the presence of water. Thus solvent accessible surface area is expected to play a crucial role in the stability of protein structures and in their evaluation. Naccess calculates the solvent accessibility of all atoms and residues with a user-defined probe size. Relative accessible surface area furnishes better insights on structure by comparing residue-wise surface area with standard values derived from experimental structures. The higher the exposed surface area of a protein structure, the lower the chance that the structure has a near-native conformation.

3.2. Verify3D

This tool assesses a protein tertiary structure by checking its compatibility with its amino acid sequence, using a 3D-1D profile score for each residue. The algorithm implements a residue-level score for evaluating the suitability of a residue to its structural environment, defined by the secondary structure, burial position and polarity of positions in a structure. Residues with a 3D-1D profile score ≥ 0.2 are considered to be suitable to their structural environment. Thus the percentage of residues scoring at or above the threshold represents the overall quality of the structure.

Fig. 1. (a and b) A comparison of raw scores (left column) and normalized scores (right column) for each module is shown. The abscissa reports protein index of the 26,751 protein structures considered. The first 9264 structures (8001 RCSB structures and 1263 decoy structures) in 0–2 Å class are expected to be in green color, the next 10,642 decoy structures in 2–5 Å class are expected to be in yellow and the last 6845 decoy structures in 5–8 Å class are expected to be in orange color. Occurrence of these colors exclusively in the defined classes implies correct annotation by the selected modules.
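The Verify3D criterion described in Section 3.2 reduces to counting the residues whose 3D-1D profile score reaches the 0.2 threshold. The following is a minimal sketch of that arithmetic only; the function name and the example scores are made up for illustration, and it does not reproduce how Verify3D itself derives the per-residue scores.

```python
def verify3d_pass_percentage(profile_scores, threshold=0.2):
    """Percentage of residues whose 3D-1D profile score is >= threshold."""
    if not profile_scores:
        raise ValueError("no residue scores supplied")
    passing = sum(1 for s in profile_scores if s >= threshold)
    return 100.0 * passing / len(profile_scores)


# Made-up per-residue scores for an 8-residue toy example.
scores = [0.45, 0.31, 0.20, 0.05, -0.12, 0.62, 0.19, 0.27]
pct = verify3d_pass_percentage(scores)  # 5 of 8 residues reach the threshold
```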

3.3. Errat

This tool distinguishes between correctly and incorrectly determined regions based on characteristic atomic interactions. The error values are plotted as a function of position using a 9-residue sliding window. An error function is defined from the information on non-bonded atom-atom interactions in experimental structures. It provides an overall quality factor for the given structure, expressed as the percentage of the protein with error values falling under the 95% limit. Based on the information derived from experimental structures, the threshold overall quality factor is considered to be 91% for medium resolution structures and 95% or above for high resolution (good) structures.

3.4. Procheck

This tool evaluates the stereo-chemical quality of a protein structure by considering bond lengths, bond angles, main chain and side chain parameters, residue contacts, geometry, and distribution of backbone torsion angles (Φ and Ψ) in the Ramachandran plot. An overall G-factor, which measures how normal or unusual the parameters are, is calculated from the observed distributions of stereo-chemical properties. The quality of a protein structure deteriorates with a decline in overall G-factor.

3.5. MolProbity

MolProbity uses a variety of physics-based and knowledge-based algorithms to assess a structure. All-atom contacts, side chain clashes, and Ramachandran distribution of backbone torsion angles are the major parameters for MolProbity structure validation. An overall MolProbity score combines these parameters; lower values reflect good quality of the model structure and vice versa.

3.6. ProSA

ProSA is a protein tertiary structure diagnostic tool which relies upon the statistical analysis derived from experimental protein structures. It uses knowledge-based Cα potentials of mean force to evaluate model correctness. It delivers a Z-score along with a plot of its residue energies. Z-scores falling within a certain range distinguish native-like protein structures from erroneous structures.

3.7. dDFIRE

dDFIRE is an energy function that accounts for pair-wise atomic and dipolar interactions. It furnishes a free energy score for each protein conformation. Structures with lower free energy scores are considered to be of better quality than structures with higher free energy scores.

3.8. D2N

D2N is a random forest machine learning based tool which predicts the quality of a structure from six different physicochemical features derived from native protein structures. It considers all-atom non-bonded energy, total accessible surface area, Cβ geometrical constraint, secondary structure penalty, and solvent accessible surface area of polar residues and of charged residues. It delivers an estimated rmsd, TM score and GDT score for a given structure without using its native information.

3.9. ProQ

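ProQ evaluates residue-wise local quality of a model structure by employing a neural network approach which integrates contacts among atoms and residues, solvent accessible surfaces, and secondary structure statistics. It predicts the quality of a model structure from the extracted structural features by measuring LG Score and MaxSub Score. LG Score > 3 and MaxSub Score > 0.5 represent good quality structures.

Table 3. Values of calculated mean (μ) and standard deviation (σ) for different modules in respective classes.

| Class | Statistic | DFIRE | ERRAT | NACCESS | ProSA | ProCheck | Verify3D | MolProbity | D2N | ProQ | PSN-QA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0–2 Å | μ | 0.42 | 0.05 | 0.27 | 0.57 | 0.27 | 0.17 | 0.23 | 0.27 | 0.48 | 0.57 |
| 0–2 Å | σ | 0.037 | 0.048 | 0.054 | 0.082 | 0.085 | 0.060 | 0.117 | 0.069 | 0.066 | 0.071 |
| 2–5 Å | μ | 0.53 | 0.44 | 0.35 | 0.67 | 0.39 | 0.36 | 0.66 | 0.38 | 0.74 | 0.91 |
| 2–5 Å | σ | 0.048 | 0.092 | 0.063 | 0.086 | 0.107 | 0.072 | 0.119 | 0.074 | 0.076 | 0.029 |
| 5–8 Å | μ | 0.60 | 0.53 | 0.41 | 0.72 | 0.45 | 0.53 | 0.70 | 0.46 | 0.78 | 0.92 |
| 5–8 Å | σ | 0.082 | 0.162 | 0.093 | 0.102 | 0.130 | 0.089 | 0.109 | 0.108 | 0.071 | 0.035 |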

3.10. PSN-QA

This tool uses protein structure networks of native and modeled proteins in combination with Support Vector Machines to estimate the quality of a protein structure. These networks are constructed on the basis of non-covalent interactions between side chains of polypeptides. It ranks the predicted model structure based on its closeness to the native protein structure. A PSN-QA rank beyond 16 represents native-like conformation and a rank under 10 represents non-native like conformation.

The Z-scores within respective classes are averaged for all the structures in the selected dataset and normalized between 0 and 1 based on the observed minimum and maximum. The normalized average Z-score is used to represent the ProTSAV quality assessment score.

$$\mathrm{PNQS} = \mathrm{Norm}\left(\frac{Z_{DFR} + Z_{ERR} + Z_{ASA} + Z_{PRS} + Z_{PRC} + Z_{V3D} + Z_{MPB} + Z_{D2N} + Z_{PRQ} + Z_{PSN}}{10}\right)$$

where,

PNQS is the ProTSAV normalized quality score,
$Z_{DFR}$ is the Z-score for DFIRE, calculated as $Z_{DFR} = (X_{DFR} - \mu_{DFR})/\sigma_{DFR}$,
$Z_{ERR}$ is the Z-score for ERRAT, calculated as $Z_{ERR} = (X_{ERR} - \mu_{ERR})/\sigma_{ERR}$,
$Z_{ASA}$ is the Z-score for Naccess, calculated as $Z_{ASA} = (X_{ASA} - \mu_{ASA})/\sigma_{ASA}$,
$Z_{PRS}$ is the Z-score for ProSA, calculated as $Z_{PRS} = (X_{PRS} - \mu_{PRS})/\sigma_{PRS}$,
$Z_{PRC}$ is the Z-score for Procheck, calculated as $Z_{PRC} = (X_{PRC} - \mu_{PRC})/\sigma_{PRC}$,
$Z_{V3D}$ is the Z-score for Verify3D, calculated as $Z_{V3D} = (X_{V3D} - \mu_{V3D})/\sigma_{V3D}$,
$Z_{MPB}$ is the Z-score for MolProbity, calculated as $Z_{MPB} = (X_{MPB} - \mu_{MPB})/\sigma_{MPB}$,
$Z_{D2N}$ is the Z-score for D2N, calculated as $Z_{D2N} = (X_{D2N} - \mu_{D2N})/\sigma_{D2N}$,
$Z_{PRQ}$ is the Z-score for ProQ, calculated as $Z_{PRQ} = (X_{PRQ} - \mu_{PRQ})/\sigma_{PRQ}$, and
$Z_{PSN}$ is the Z-score for PSN-QA, calculated as $Z_{PSN} = (X_{PSN} - \mu_{PSN})/\sigma_{PSN}$.

X, μ and σ above are the score generated, the average and the standard deviation, respectively, of the module (described in detail below under Score generation).
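As a rough illustration of the PNQS formula, the sketch below computes the ten module Z-scores, averages them and rescales the average between 0 and 1. The (μ, σ) pairs are the 0–2 Å class values listed in Table 3; everything else is an assumption made for the example: the helper names, the two toy input structures, the choice of which class statistics to apply to a new structure, and the minimum and maximum used for the final rescaling (the paper uses the observed extremes of its dataset, which are not reported here).

```python
MODULES = ["DFR", "ERR", "ASA", "PRS", "PRC", "V3D", "MPB", "D2N", "PRQ", "PSN"]

# (mu, sigma) per module for the 0-2 angstrom class, as listed in Table 3.
CLASS_0_2 = {
    "DFR": (0.42, 0.037), "ERR": (0.05, 0.048), "ASA": (0.27, 0.054),
    "PRS": (0.57, 0.082), "PRC": (0.27, 0.085), "V3D": (0.17, 0.060),
    "MPB": (0.23, 0.117), "D2N": (0.27, 0.069), "PRQ": (0.48, 0.066),
    "PSN": (0.57, 0.071),
}


def z_scores(x, stats):
    """Z = (X - mu) / sigma for every module."""
    return {m: (x[m] - stats[m][0]) / stats[m][1] for m in MODULES}


def mean_z(x, stats):
    """Average of the ten module Z-scores."""
    z = z_scores(x, stats)
    return sum(z.values()) / len(MODULES)


def min_max(value, lo, hi):
    """Rescale value to the 0-1 range given the observed extremes."""
    return (value - lo) / (hi - lo)


# Toy structure whose normalized module scores equal the class means.
x_mean = {m: CLASS_0_2[m][0] for m in MODULES}
# Toy structure sitting one standard deviation above the mean in every module.
x_plus = {m: mu + sd for m, (mu, sd) in CLASS_0_2.items()}

avg_mean = mean_z(x_mean, CLASS_0_2)   # every Z-score is zero
avg_plus = mean_z(x_plus, CLASS_0_2)   # every Z-score is about one

# Hypothetical dataset extremes of the averaged Z-score (not from the paper).
pnqs = min_max(avg_plus, lo=-3.0, hi=3.0)
```

Dividing by each module's σ puts modules with very different score spreads on a common scale before averaging, so that no single module dominates the combined score.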

Table 2 provides the highlights of the various quality assessment features used by the selected modules in ProTSAV server. The selected modules are nearly mutually exclusive in terms of combination of structural features used in their individual quality assessment. Source codes of these modules are available in public domain/on request.

4. Class definition

The selected dataset is classified into three classes based on rmsd values from the corresponding native structures. The first class has structures with rmsds in the range 0–2 Å. It comprises all the experimentally solved structures and some predicted model structures with rmsds < 2 Å (9264 structures). The second class represents structures with rmsds ranging from 2 to 5 Å and comprises 10,642 decoy structures. The third class comprises structures belonging to the rmsd range of 5 to 8 Å, with 6845 decoy structures. The structures contained in the second and third classes are mainly derived from model structures submitted by various servers in CASP7 to CASP10 datasets (Table 1). The classes are defined based on the importance of the structures for their further applicability in research. For instance, the first class structures can be used for drug designing purposes directly. The second class structures can be helpful in understanding the overall topology/function and in ligand binding studies as well as for further refinement. The third class of structures needs to be revisited by prediction servers. Structures beyond the third class can be discarded as of no further use.

Fig. 2. Workflow of ProTSAV server for quality assessment of a given protein structure.

Fig. 3. ProTSAV scores for the dataset of 26,751 protein structures. The abscissa reports protein index of the 26,751 protein structures considered. The first 9264 structures (8001 RCSB structures and 1263 decoy structures) in 0–2 Å class are expected to be in green color, the next 10,642 decoy structures in 2–5 Å class are expected to be in yellow and the last 6845 decoy structures in 5–8 Å class are expected to be in orange color. Occurrence of these colors exclusively in the defined classes implies correct annotation by the selected modules.
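The class boundaries and the color coding used in the figures can be summarized as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, and treating each upper boundary (2, 5 and 8 Å) as inclusive is a choice made here, since the text does not say which class a structure exactly on a boundary belongs to. The example rmsd values are the ones quoted for the CASP11 models in the Fig. 4 caption.

```python
def rmsd_class(rmsd):
    """Map an rmsd (in angstroms) to its quality class and plot color."""
    if rmsd < 0:
        raise ValueError("rmsd cannot be negative")
    if rmsd <= 2.0:
        return ("0-2 Å", "green")    # directly usable, e.g. for drug design
    if rmsd <= 5.0:
        return ("2-5 Å", "yellow")   # topology/function studies, refinement
    if rmsd <= 8.0:
        return ("5-8 Å", "orange")   # to be revisited by prediction servers
    return ("> 8 Å", "red")          # considered of no further use


examples = [2.95, 6.95, 13.65]
colors = [rmsd_class(r)[1] for r in examples]
```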

Table 5. Comparative sensitivity and specificity of ProTSAV quality assessment and MetaMQAP on CASP11 dataset in all the classes.

| Class (Å) | MetaMQAP specificity (%) | MetaMQAP sensitivity (%) | ProTSAV specificity (%) | ProTSAV sensitivity (%) |
|---|---|---|---|---|
| 0–2 | 49 | 53 | 88 | 91 |
| 2–5 | 71 | 43 | 74 | 88 |
| 5–8 | 27 | 38 | 58 | 74 |
| Above 8 | 53 | 34 | 83 | 49 |

5. Score generation

All the selected modules are run for the quality assessment of the selected dataset of protein structures and raw scores are generated for individual modules. These raw scores are normalized between 0 and 1 based on the observed minimum and maximum values for each module. A comparison of raw scores and the computed normalized scores is shown in Fig. 1a and 1b. For modules whose quality scores are functions of sequence length, ratios of raw score to sequence length are plotted, to negate the effect of protein size. In Fig. 1a and 1b, the abscissa reports protein index of the 26,751 protein structures considered. The first 9264 structures (8001 RCSB structures and 1263 decoy structures) in the 0–2 Å class are expected to be in green color, the next 10,642 decoy structures in the 2–5 Å class are expected to be in yellow and the last 6845 decoy structures in the 5–8 Å class are expected to be in orange color. Occurrence of these colors exclusively in the defined classes implies correct annotation by the selected modules. Fig. 1a and 1b also reflect a neutral impact (left column versus right column) of rescaling on efficiency of the individual modules. All the modules are able to discriminate the structures of varied quality to different extents. This difference is more prominent when switching from the 0–2 Å class to the 2–5 Å class. However, the same is not true for the 2–5 Å class to the 5–8 Å class for most of the modules. Mean (μ) and standard deviation (σ) are calculated within each defined class for all the modules (Table 3) and Z-scores are calculated from normalized raw scores for all the modules. The Z-scores within respective classes are averaged for all the structures in the selected dataset and normalized between 0 and 1 based on the observed minimum and maximum. The normalized average Z-score is used to represent the ProTSAV quality assessment score.
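The rescaling step described above can be sketched in a few lines: length-dependent raw scores are first divided by sequence length and then mapped to the 0–1 range using the observed minimum and maximum. The function names and the raw values below are made up for illustration; which modules need the per-residue correction is not specified here.

```python
def per_residue(raw_scores, lengths):
    """Divide each raw score by its sequence length to remove size effects."""
    return [s / n for s, n in zip(raw_scores, lengths)]


def min_max_normalize(values):
    """Rescale values to 0-1 using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; range is zero")
    return [(v - lo) / (hi - lo) for v in values]


# Made-up energy-like raw scores for three structures of different sizes.
raw = [-250.0, -120.0, -400.0]
lengths = [100, 60, 200]

ratios = per_residue(raw, lengths)       # [-2.5, -2.0, -2.0]
normalized = min_max_normalize(ratios)   # [0.0, 1.0, 1.0]
```

In this toy case the largest protein has the most negative raw score, yet its per-residue value matches that of the smallest one, which is the size effect the ratio is meant to cancel.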

Fig. 2 depicts the flowchart for the calculation of overall quality of asubmitted structure. Fig. 3 illustrates the ProTSAV scores for the entiredataset considered.

6. Results

A comprehensive, reliable single score with a graphical result is a major need in protein structure prediction research. Here we have developed a server, ProTSAV, for quality assessment of protein structures, pooling together ten different existing methods, and validated it on ~25,000 additional experimental structures and 17,680 decoy structures from the CASP11 experiment. ProTSAV quality assessment is performed on the top 15 structures for each target of CASP11 (http://www.predictioncenter.org) and prediction results are compared with actual rmsd values from the corresponding native structures. A prediction accuracy of 98.8% for experimental structures, and prediction accuracies of 91.0%, 88.9% and 74.8% for modeled structures in the 0–2 Å, 2–5 Å and 5–8 Å classes respectively, are achieved, which is higher than for any individual module. Table 4 gives a comparison of the results from individual modules and ProTSAV. The server has delivered a specificity of 100% and a sensitivity of 98% for experimental structures. Also, specificity and sensitivity vary from 88% to 58% and 91% to 74% respectively for the three classes (Table 5). The accuracy of the results predicted by ProTSAV is compared to that of a previously reported metaserver, MetaMQAP [67], and ProTSAV is found to be more reliable (Table 5). The proposed methodology was further validated on independent datasets of public decoys (http://zhanglab.ccmb.med.umich.edu/decoys/refined_decoys/decoy_set.tar.gz, http://ram.org/compbio/dd/ddownload.cgi?4state_reduced.tgz). The decoy sets selected for validation consist of 21,766 decoy structures of varying rmsds from their respective natives (details of PDB IDs and their decoys are provided in the supplementary material as Table S2). ProTSAV performs well on the independent datasets in terms of sensitivity, specificity and accuracy, as shown in Table 6.

Table 4. A comparison of percentage accuracy of different quality assessment tools on experimental and predicted protein structures in assigned classes.

| Modules | Crystal structures | 0–2 Å | 2–5 Å | 5–8 Å | > 8 Å |
|---|---|---|---|---|---|
| Naccess | 90.1 | 79.7 | 34.3 | 51.4 | 36.9 |
| Verify3D | 92.6 | 72.5 | 37.0 | 57.1 | 39.3 |
| Errat | 80.4 | 50.2 | 51.4 | 46.5 | 58.9 |
| Procheck | 65.2 | 60.4 | 42.4 | 58.5 | 58.1 |
| MolProbity | 64.3 | 62.7 | 40.8 | 53.4 | 59.1 |
| ProSA | 25.3 | 57.4 | 41.9 | 67.4 | 50.0 |
| dDFIRE | 66.5 | 68.5 | 42.8 | 53.6 | 53.7 |
| D2N | 84.2 | 79.8 | 30.1 | 51.4 | 41.9 |
| ProQ | 86.5 | 73.6 | 54.7 | 70.7 | 48.1 |
| PSN-QA | 86.7 | 68.6 | 43.7 | 63.5 | 43.1 |
| ProTSAV | 98.8 | 91.0 | 88.9 | 74.8 | 60.7 |
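Per-class sensitivity, specificity and accuracy such as those reported above are commonly obtained by treating one class as "positive" and all others as "negative". The paper does not spell out its exact procedure, so the one-vs-rest scheme below is an assumption, and the function name, labels and toy predictions are made up for illustration.

```python
def one_vs_rest_metrics(true_labels, predicted_labels, target):
    """Sensitivity, specificity and accuracy for one class against the rest."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == target and p == target:
            tp += 1          # correctly assigned to the target class
        elif t != target and p == target:
            fp += 1          # wrongly assigned to the target class
        elif t == target and p != target:
            fn += 1          # target-class structure that was missed
        else:
            tn += 1          # correctly kept out of the target class
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Toy data: true rmsd classes and the classes a scoring method assigned.
true = ["0-2", "0-2", "0-2", "0-2", "2-5", "2-5", "2-5", "5-8", "5-8", ">8"]
pred = ["0-2", "0-2", "0-2", "2-5", "2-5", "2-5", "0-2", "5-8", "2-5", ">8"]

sens, spec, acc = one_vs_rest_metrics(true, pred, target="0-2")
```

For the "0-2" class in this toy set there are 3 true positives, 1 false negative, 1 false positive and 5 true negatives.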

Results of ProTSAV assessment of the quality of an input/modeled structure are provided in the form of a color illustration (Fig. 4). This figure includes quality assessment by individual modules as well as the overall ProTSAV unified score. The green region reflects quality of a model structure in 0–2 Å rmsd, the yellow region in 2–5 Å rmsd, the orange region in 5–8 Å rmsd and the red region a model structure beyond 8 Å rmsd. The blue colored asterisk symbol represents the quality assessment score by individual modules and the blue colored round symbol represents the overall ProTSAV score. A quick glance at the right extreme column indicates the overall quality of the input structure.

The individual modules often fail to discriminate the structures belonging to the 2–5 Å class and the 5–8 Å class, as revealed in Table 4 and Fig. 1a and 1b. ProTSAV restores the lost accuracies in individual modules via its collective score and performs better in discriminating these structures (Table 4, Fig. 3).

6.1. Discussion and conclusion

Development of a highly reliable and accurate strategy for quality assessment of protein structures is necessitated by the availability of diverse model structures through different computational approaches for a given protein sequence. The various modules available in the server presented here compute a variety of parameters for quality assessment of protein structures and allow users to obtain detailed insights into the input structure. For instance, from Fig. 4a, it can be interpreted that three modules predicted the input structure as belonging to the second class (2–5 Å, yellow region) while the remaining modules supported the first class (0–2 Å, green region). Similarly in Fig. 4b, c and d, different modules predicted the input structures as belonging to different classes. This disagreement in quality assessment necessitates a more accurate unified score such as the ProTSAV score, which reflects the actual quality of the input structure to a higher level of accuracy.

Table 6. Sensitivity, specificity and accuracy of ProTSAV quality assessment performed on public decoys in all the classes.

Class        Specificity (%)   Sensitivity (%)   Accuracy (%)
0–2 Å        86.8              90.2              89.7
2–5 Å        73.3              84.0              86.4
5–8 Å        62.7              76.5              78.6
Above 8 Å    79.0              56.9              64.0
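For readers who wish to compute measures of the kind reported in Table 6 on their own decoy sets, the sketch below evaluates one class as a one-vs-rest problem. It assumes the standard confusion-matrix definitions of sensitivity, specificity and accuracy, which this section does not restate; the function name `class_metrics` and the counts in the usage line are hypothetical and are not taken from Table 6.

```python
def class_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity and accuracy (as percentages).

    Assumed standard definitions for a single class versus the rest:
      sensitivity = TP / (TP + FN)
      specificity = TN / (TN + FP)
      accuracy    = (TP + TN) / (TP + FP + TN + FN)
    """
    total = tp + fp + tn + fn
    if tp + fn == 0 or tn + fp == 0 or total == 0:
        raise ValueError("counts must include both positives and negatives")
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / total,
    }


# Hypothetical counts, for illustration only.
metrics = class_metrics(tp=90, fp=10, tn=80, fn=20)
print({name: round(value, 1) for name, value in metrics.items()})
```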

Fig. 4. ProTSAV quality assessment of input protein structures. The green region indicates the input structure to be within 0–2 Å rmsd, the yellow region 2–5 Å rmsd, the orange region 5–8 Å rmsd, and the red region indicates structures beyond 8 Å rmsd. The blue asterisk symbol represents the quality assessment score by an individual module and the blue round symbol represents the overall score by ProTSAV. (a) Quality assessment for a crystal structure (PDB id: 4LAZ) scores in the green region. (b) Quality assessment for a model structure (T0760, Zhang-Server_TS1) scores in the yellow region (actual rmsd 2.95 Å). (c) Quality assessment for a model structure (T0772, Pcons-net_TS1) scores in the orange region (actual rmsd 6.95 Å). (d) Quality assessment for a model structure (T0763, TASSER-VMT_TS1) scores in the red region (actual rmsd 13.65 Å).

18 A. Singh et al. / Biochimica et Biophysica Acta 1864 (2016) 11–19

The structures predicted in the green region can be directly used for reviewing catalytic mechanisms, designing and refining ligands, docking of macromolecules, predicting protein partners and defining antibody epitopes. The structures predicted in the yellow region can be helpful in understanding functional interactions from structural similarity, ligand binding studies, scrutinizing site-directed mutagenesis, recognizing spots of conserved surface residues and exploring functional sites by 3D motif searching; they can also be pushed to the green region after further refinement. The structures predicted in the orange region need to be revisited by prediction servers, and the structures falling in the red region can be discarded [68].
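The guidance above amounts to a mapping from predicted region to a follow-up action. A minimal sketch of that mapping is given here; the dictionary `SUGGESTED_USE`, the function `suggested_use` and the shortened wording of each entry are illustrative assumptions that paraphrase the paragraph, not output produced by the server.

```python
# Condensed paraphrase of the recommendations in the text (illustrative only).
SUGGESTED_USE = {
    "green": "use directly (e.g. catalytic mechanisms, ligand design, docking)",
    "yellow": "use for functional inference; candidate for further refinement",
    "orange": "resubmit to prediction servers",
    "red": "discard",
}


def suggested_use(region: str) -> str:
    """Return the suggested follow-up for a predicted color region."""
    try:
        return SUGGESTED_USE[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None


print(suggested_use("yellow"))
```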

High sensitivity and specificity support the applicability of ProTSAV for accurate quality assessment of a protein structure. Implementing graphical representation to study biological problems can provide an intuitive picture and valuable insights for analyzing complex relations in these systems [69], as indicated by many earlier studies on a series of important biological topics, such as enzyme-catalyzed reactions [70–72], inhibition of HIV-1 reverse transcriptase [73,74], and the use of wenxiang diagrams or graphs [75] to study protein-protein interactions [76,77].

Recent studies [78–80] and reviews [81,82] have emphasized the development of user-friendly and publicly accessible web-servers for computational tools to significantly enhance their practical applications, value and impact.

Here, we have established a web-server, ProTSAV, which delivers a clean web interface that unites a set of modules, producing a graphical illustration of results that may be easily interpreted by a novice user. The server is freely accessible at: http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp along with a tutorial, sample files and instructions for download.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.bbapap.2015.10.004.

Funding

This work is supported by the Indian Council of Medical Research, India.

Transparency document

The Transparency document associated with this article can be found in the online version.

Acknowledgements

The authors are thankful to the developers of the individual modules for providing the source codes and allowing their usage in the server. The authors are also thankful to the anonymous reviewers for their valuable inputs. Support to the Supercomputing Facility for Bioinformatics & Computational Biology (SCFBio), IIT Delhi, from the Department of Biotechnology, Govt. of India, is gratefully acknowledged. RK is a recipient of a senior research fellowship from CSIR.

References

[1] J. Cheng, A multi-template combination algorithm for protein comparative modeling, BMC Struct. Biol. 8 (2008) 18.

[2] B. Jayaram, K. Bhushan, S.R. Shenoy, et al., Bhageerath: an energy based web enabled computer software suite for limiting the search space of tertiary structures of small globular proteins, Nucleic Acids Res. 34 (2006) 6195–6204.

[3] B. Jayaram, P. Dhingra, A. Mishra, et al., A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins, BMC Biochem. 15 (2014) S7.

[4] S.R. Shenoy, B. Jayaram, Proteins: sequence to structure and function - current status, Curr. Protein Pept. Sci. 11 (2010) 498–514.

[5] P. Dhingra, B. Jayaram, A homology/ab initio hybrid algorithm for sampling near-native protein conformations, J. Comput. Chem. 34 (2013) 1925–1936.

[6] D. DasGupta, R. Kaushik, B. Jayaram, From Ramachandran maps to tertiary structures of proteins, J. Phys. Chem. B 119 (34) (2015) 11136–11145.

[7] M.J. Berardi, W.M. Shih, S.C. Harrison, J.J. Chou, Mitochondrial uncoupling protein 2 structure determined by NMR molecular fragment searching, Nature 476 (2011) 109–113.

[8] B. OuYang, S. Xie, M.J. Berardi, X.M. Zhao, J. Dev, W. Yu, Unusual architecture of the p7 channel from hepatitis C virus, Nature 498 (2013) 521–525.

[9] S. Bruschweiler, Q. Yang, C. Run, Substrate-Modulated ADP/ATP-Transporter Dynamics Revealed by NMR Relaxation Dispersion, Nature Structural & Molecular Biology, 2015 http://dx.doi.org/10.1038/nsmb.3059.

[10] D. Jones, R.L. Heinrikson, Prediction of the tertiary structure and substrate binding site of caspase-8, FEBS Lett. 419 (1997) 49–54.


[11] K.C. Chou, Coupling interaction between thromboxane A2 receptor and alpha-13 subunit of guanine nucleotide-binding protein, J. Proteome Res. 4 (2005) 1681–1686.

[12] S.Q. Wang, Q.S. Du, R.B. Huang, Insights from investigating the interaction of oseltamivir (Tamiflu) with neuraminidase of the 2009 H1N1 swine flu virus, Biochem. Biophys. Res. Commun. 386 (2009) 432–436.

[13] Y. Ma, S.Q. Wang, W.R. Xu, R.L. Wang, Design novel dual agonists for treating type-2 diabetes by targeting peroxisome proliferator-activated receptors with core hopping approach, PLoS One 7 (2012), e38546.

[14] T. Singh, D. Biswas, B. Jayaram, AADS - an automated active site identification, docking, and scoring protocol for protein targets based on physicochemical descriptors, J. Chem. Inf. Model. 51 (2011) 2515–2525.

[15] K.C. Chou, Review: structural bioinformatics and its impact to biomedical science, Curr. Med. Chem. 11 (2004) 2105–2134.

[16] M.A. Marti-Renom, A.C. Stuart, A. Fiser, et al., Comparative protein structure modeling of genes and genomes, Annu. Rev. Biophys. Biomol. Struct. 29 (2000) 291–325.

[17] A. Kryshtafovych, C. Venclovas, K. Fidelis, et al., Progress over the first decade of CASP experiments, Proteins 61 (2005) 225–236.

[18] D. Cozzetto, A. Kryshtafovych, M. Ceriani, et al., Assessment of predictions in the model quality assessment category, Proteins 69 (2007) 175–183.

[19] J. Moult, K. Fidelis, A. Kryshtafovych, et al., Critical assessment of methods of protein structure prediction - round VIII, Proteins 77 (2009) 1–4.

[20] A. Ray, E. Lindahl, B. Wallner, Improved model quality assessment using ProQ2, BMC Bioinforma. 13 (2012) 224.

[21] B. Wallner, ProQM-resample: Improved model quality assessment for membrane proteins by limited conformational sampling, Bioinformatics 30 (2014) 2221–2223.

[22] Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins Struct. Funct. Bioinforma. 57 (2004) 702–710.

[23] J. Cheng, et al., The MULTICOM toolbox for protein structure prediction, BMC Biochem. 13 (2012) 65.

[24] R. Cao, D. Bhattacharya, B. Adhikari, J. Li, J. Cheng, Large-scale model quality assessment for improving protein tertiary structure prediction, Bioinformatics 31 (12) (2015) i116–i123.

[25] Z. Wang, J. Eickholt, J. Cheng, MULTICOM: a multi-level combination approach to protein structure prediction and its assessment in CASP8, Bioinformatics 26 (2010) 882–888.

[26] R. Cao, Z. Wang, J. Cheng, Designing and evaluating the MULTICOM protein local and global model quality prediction methods in CASP10 experiment, BMC Struct. Biol. 14 (2014) 14-13.

[27] J.L. McGuffin, The ModFOLD server for the quality assessment of protein structural models, Bioinformatics 24 (2008) 586–587.

[28] J.L. McGuffin, Prediction of global and local model quality in CASP8 using the ModFOLD server, Proteins 77 (2009) 185–190.

[29] J.L. McGuffin, T.M. Buenavista, B.D. Roche, The ModFOLD4 server for the quality assessment of 3D protein models, Nucleic Acids Res. 41 (2013) 368–372.

[30] J.L. McGuffin, B.D. Roche, Rapid model quality assessment for protein structure predictions using the comparison of multiple models without structural alignments, Bioinformatics 26 (2010) 182–188.

[31] Z. Wang, J. Eickholt, J. Cheng, APOLLO: a quality assessment service for single and multiple protein models, Bioinformatics 27 (2011) 1715–1716.

[32] B. Manavalan, J. Lee, J. Lee, Random forest-based protein model quality assessment (RFMQA) using structural features and potential energy terms, Public Libr. Sci. One 9 (2014), e106542 15.

[33] A. Kryshtafovych, A. Barbato, K. Fidelis, B. Monastyrskyy, T. Schwede, A. Tramontano, Assessment of the assessment: evaluation of the model quality estimates in CASP10, Proteins S2 (2014) 112–126.

[34] B. Wallner, A. Elofsson, Identification of correct regions in protein models using structural, alignment and consensus information, Protein Sci. 15 (2006) 900–913.

[35] R.W. Hooft, G. Vriend, C. Sander, et al., Errors in protein structures, Nature 381 (1996) 272.

[36] D. Petrey, B. Honig, Free energy determinants of tertiary structure and the evaluation of protein models, Protein Sci. 9 (2000) 2181–2191.

[37] A. Mishra, S. Rao, A. Mittal, et al., Capturing native/native like structures with a physico-chemical metric (pcSM) in protein folding, Acta Protein Proteomics 1834 (2013) 1520–1531.

[38] P. Narang, K. Bhushan, S. Bose, et al., Protein structure evaluation using an all-atom energy based empirical scoring function, J. Biomol. Struct. Dyn. 23 (2006) 385–406.

[39] P. Narang, K. Bhushan, S. Bose, et al., A computational pathway for bracketing native-like structures for small alpha helical globular proteins, Phys. Chem. Chem. Phys. 7 (2005) 2364–2375.

[40] M. Lu, A.D. Dousis, J. Ma, OPUS-PSP: an orientation-dependent statistical all-atom potential derived from side-chain packing, J. Mol. Biol. 376 (2008) 288–301.

[41] J. Cheng, Z. Wang, A.N. Tegge, et al., Prediction of global and local quality of CASP8 models by MULTICOM series, Proteins 77 (2009) 9181–9184.

[42] P. Larsson, M.J. Skwark, B. Wallner, et al., Assessment of global and local model quality in CASP8 using Pcons and ProQ, Proteins 77 (2009) 9167–9172.

[43] B. Wallner, A. Elofsson, Prediction of global and local model quality in CASP7 using Pcons and ProQ, Proteins 69 (2007) 8184–8193.

[44] K. Ginalski, A. Elofsson, D. Fischer, et al., 3D-Jury: a simple approach to improve protein structure predictions, Bioinformatics 19 (2003) 1015–1018.

[45] J. Qiu, W. Sheffler, D. Baker, et al., Ranking predicted protein structures with support vector regression, Proteins 71 (2008) 1175–1182.

[46] Y. Zhang, Protein structure prediction: when is it useful, Current Opinion in Structural Biology 19 (2009) 145–155.

[47] K.C. Chou, Review: structural bioinformatics and its impact to biomedical science, Curr. Med. Chem. 11 (2004) 2105–2134.

[48] K.D. Watenpaugh, R.L. Heinrikson, A model of the complex between cyclin-dependent kinase 5 (Cdk5) and the activation domain of neuronal Cdk5 activator, Biochem. Biophys. Res. Commun. 259 (1999) 420–428.

[49] N. Koga, Principles for designing ideal protein structures, Nat. Biotechnol. 491 (2012) 222–229.

[50] G. Vriend, WHAT-IF - a molecular modeling and drug design program, J. Mol. Graph. 8 (1990) 52.

[51] A. Giorgetti, D. Raimondo, A.E. Miele, et al., Evaluating the usefulness of protein structure models for molecular replacement, Bioinformatics 21 (2005) ii72–ii76.

[52] B. Lee, F.M. Richards, The interpretation of protein structures: estimation of static accessibility, J. Mol. Biol. 55 (1971) 379–400.

[53] R. Luthy, J.U. Bowie, D. Eisenberg, Assessment of protein models with three-dimensional profiles, Nature 356 (1992) 83–85.

[54] C. Colovos, T.O. Yeates, Verification of protein structures, patterns of non-bonded atomic interactions, Protein Sci. 2 (1993) 1511–1519.

[55] R.A. Laskowski, PROCHECK: a program to check the stereo chemical quality of protein structures, J. Appl. Crystallogr. 26 (1993) 283–291.

[56] I.W. Davis, L.A. Fay, B.V. Chen, et al., MolProbity: all-atom contacts and structure validation for proteins and nucleic acids, Nucleic Acids Res. 35 (2007) W375–W383.

[57] M. Wiederstein, M.J. Sippl, ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins, Nucleic Acids Res. 35 (2007) W407–W410.

[58] Y. Yang, Y. Zhou, Specific interactions for ab initio folding of protein terminal regions with secondary structures, Proteins 72 (2008) 793–803.

[59] A. Mishra, P.S. Rana, A. Mittal, et al., D2N: distance to the native, Biochim. Biophys. Acta Protein Proteomics 10 (2014) 1798–1807.

[60] B. Wallner, A. Elofsson, Can correct protein models be identified? Protein Sci. 12 (2003) 1073–1086.

[61] S. Ghosh, S. Vishveshwaraa, Ranking the quality of protein structure models using side chain based network properties, F1000Res 3 (2014) 17.

[62] R.W. Hooft, G. Vriend, C. Sander, et al., Errors in protein structures, Nature 381 (1996) 272.

[63] P. Benkert, M. Kuenzli, T. Schwede, QMEAN server for protein model quality estimation, Nucleic Acids Res. 37 (2009) W510–W514.

[64] F. Melo, E. Feytmans, Assessing protein structures with a non-local atomic interaction energy, J. Mol. Biol. 277 (1998) 1141–1152.

[65] H.M. Berman, J. Westbrook, Z. Feng, et al., The Protein Data Bank, Nucleic Acids Res. 28 (2000) 1235–1242.

[66] D.A. Case, Amber 10, University of California, San Francisco, 2008.

[67] M. Pawlowski, M.J. Gajda, R. Matlak, et al., MetaMQAP: a meta-server for the quality assessment of protein models, BMC Biochem. 9 (2008) 403.

[68] D. Baker, A. Sali, Protein structure prediction and structural genomics, Science 294 (2001) 93–96.

[69] S.X. Lin, J. Lapointe, Theoretical and experimental biology in one, J. Biomed. Sci. Eng. 6 (2013) 435–442.

[70] S. Forsen, Graphical rules for enzyme-catalyzed rate laws, Biochem. J. 187 (1980) 829–835.

[71] G.P. Zhou, M.H. Deng, An extension of Chou's graphic rules for deriving enzyme kinetic equations to systems involving parallel reaction pathways, Biochem. J. 222 (1984) 169–176.

[72] K.C. Chou, Graphic rules in steady and non-steady enzyme kinetics, J. Biol. Chem. 264 (1989) 12074–12079.

[73] I.W. Althaus, A.J. Gonzales, F.J. Kezdy, et al., The quinoline U-78036 is a potent inhibitor of HIV-1 reverse transcriptase, J. Biol. Chem. 268 (1993) 14875–14880.

[74] I.W. Althaus, M.R. Diebel, F.J. Kezdy, et al., Kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-88204E, Biochemistry 32 (1993) 6548–6554.

[75] C.T. Zhang, G.M. Maggiora, Disposition of amphiphilic helices in heteropolar environments, Proteins: Structure, Function, and Genetics 28 (1997) 99–108.

[76] G.P. Zhou, The disposition of the LZCC protein residues in wenxiang diagram provides new insights into the protein-protein interaction mechanism, J. Theor. Biol. 284 (2011) 142–148.

[77] G.P. Zhou, R.B. Huang, The pH-triggered conversion of the PrP(c) to PrP(sc), Curr. Top. Med. Chem. 13 (2013) 1152–1163.

[78] W. Chen, P.M. Feng, H. Lin, iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition, Nucleic Acids Res. 41 (2013), e68.

[79] H. Lin, E.Z. Deng, H. Ding, iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition, Nucleic Acids Res. 42 (2014) 12961–12972.

[80] B. Liu, F. Liu, X. Wang, J. Chen, Pse-in-one: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences, Nucleic Acids Res. 43 (2015) W65–W71.

[81] K.C. Chou, Impacts of bioinformatics to medicinal chemistry, Med. Chem. 11 (2015) 218–234.

[82] W. Chen, H. Lin, Pseudo Nucleotide Composition or PseKNC: An Effective Formulation for Analyzing Genomic Sequences, Molecular BioSystems, 2015 http://dx.doi.org/10.1039/c5mb00155b.

