The Protein Data Bank (PDB)
Page 287
• PDB is the principal repository for protein structures
• Established in 1971
• Accessed at http://www.rcsb.org/pdb or simply http://www.pdb.org
• Currently contains over 32,000 structure entries
Updated 9/05
PDB content growth (www.pdb.org): number of structures by year
Fig. 9.6 Page 281
PDB holdings (September, 2005)
29,876  proteins, peptides
 1,338  protein/nucl. complexes
 1,500  nucleic acids
    13  carbohydrates
32,727  total
Table 9-2 Page 281
Protein Data Bank
Swiss-Prot, NCBI, EMBL: gateways to access PDB files
CATH, Dali, SCOP, FSSP: databases that interpret PDB files
Fig. 9.10 Page 285
Access to PDB through NCBI
Page 289
You can access PDB data at NCBI in several ways:
• Go to the Structure site from the NCBI homepage
• Use Entrez
• Perform a BLAST search, restricting the output to the PDB database
Access to PDB through NCBI
Page 291
Molecular Modeling DataBase (MMDB)
Cn3D (“see in 3D” or three dimensions): structure visualization software
Vector Alignment Search Tool (VAST): view multiple structures
Fig. 9.15 Page 290
Fig. 9.16 Page 291
Fig. 9.17 Page 292
Access to structure data at NCBI: VAST
Page 294
Vector Alignment Search Tool (VAST) offers a variety of data on protein structures, including:
-- PDB identifiers
-- root-mean-square deviation (RMSD) values to describe structural similarities
-- NRES: the number of equivalent pairs of alpha carbon atoms superimposed
-- percent identity
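To make the RMSD and NRES values concrete, here is a minimal Python sketch of how an RMSD can be computed over equivalent alpha-carbon pairs. It assumes the two structures have already been superimposed and that residues are paired by list position; the coordinates are hypothetical, and the sketch does not reproduce how VAST itself finds the superposition.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms.
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical alpha-carbon coordinates (in Å) for three equivalent residue pairs.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

nres = len(structure_a)  # number of superimposed alpha-carbon pairs
print(nres, rmsd(structure_a, structure_b))
```

In this toy case every pair is offset by exactly 1 Å, so the RMSD is 1 Å over three pairs. A lower RMSD over a larger number of pairs indicates closer structural similarity.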
Many databases explore protein structures
Page 293
SCOP
CATH
Dali Domain Dictionary
FSSP
Structural Classification of Proteins (SCOP)
Page 293
SCOP describes protein structures using a hierarchical classification scheme:
Classes
Folds
Superfamilies (likely evolutionary relationship)
Families
Domains
Individual PDB entries
http://scop.mrc-lmb.cam.ac.uk/scop/
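One way to picture the hierarchy is as an ordered list of levels, with each protein carrying one label per level. The Python sketch below is only an illustration: the level names follow the SCOP list above, but the labels are hypothetical placeholders rather than real SCOP identifiers, and the helper function is not part of any SCOP tool.

```python
# Levels from the SCOP hierarchy, broadest first.
SCOP_LEVELS = ["class", "fold", "superfamily", "family", "domain", "pdb_entry"]

def deepest_shared_level(a, b):
    """Return the most specific level at which two classifications still agree,
    or None if they already differ at the class level."""
    shared = None
    for level in SCOP_LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared

# Hypothetical placeholder labels, not real SCOP identifiers.
protein_x = {"class": "c1", "fold": "f1", "superfamily": "s1",
             "family": "fam1", "domain": "d1", "pdb_entry": "entry1"}
protein_y = {"class": "c1", "fold": "f1", "superfamily": "s1",
             "family": "fam2", "domain": "d7", "pdb_entry": "entry9"}

print(deepest_shared_level(protein_x, protein_y))  # superfamily
```

Two proteins that agree down to the superfamily level, as here, are the ones SCOP treats as having a likely evolutionary relationship.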
Class, Architecture, Topology, and Homologous Superfamily (CATH) database
Page 293
CATH clusters proteins at four levels:
C Class (α, β, α&β folds)
A Architecture (shape of domain, e.g. jelly roll)
T Topology (fold families; not necessarily homologous)
H Homologous superfamily
http://www.biochem.ucl.ac.uk/basm/cath_new
SCOP statistics (September, 2005)
Class    # folds    # superfamilies    # families
All α      218          376               608
All β      144          290               560
α/β        136          222               629
α+β        279          409               717
…
Total      945         1539              2845
Table 9-4 Page 298
α/β = parallel β sheets
α+β = antiparallel β sheets
Fig. 9.23 Page 298
Fig. 9.24 Page 299
Fig. 9.25 Page 300
Fig. 9.26 Page 301
Fig. 9.27 Page 302
Fig. 9.28 Page 303
Dali Domain Dictionary
Page 302
Dali contains a numerical taxonomy of all known structures in PDB. Dali integrates additional data for entries within a domain class, such as secondary structure predictions and solvent accessibility.
Fig. 9.29 Page 303
Fig. 9.30 Page 304
Fold classification based on structure-structure alignment of proteins (FSSP)
Page 293
FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length). Representative sets exclude sequence homologs sharing > 25% amino acid identity.
The output includes a “fold tree.”
http://www.ebi.ac.uk/dali/fssp
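The idea of a representative set that excludes homologs above 25% identity can be sketched with a simple greedy filter. This is an illustrative Python sketch, not FSSP's actual procedure: the protein names and pairwise identities are hypothetical, the result depends on input order, and real identities would come from sequence alignments.

```python
def pick_representatives(names, identity, cutoff=25.0):
    """Greedily keep each protein unless it shares more than `cutoff` percent
    identity with a protein that has already been kept."""
    kept = []
    for name in names:
        if all(identity(name, rep) <= cutoff for rep in kept):
            kept.append(name)
    return kept

# Hypothetical pairwise percent identities; unlisted pairs are treated as 0.
PAIR_IDENTITY = {
    frozenset(["protA", "protB"]): 90.0,
    frozenset(["protA", "protC"]): 12.0,
    frozenset(["protB", "protC"]): 15.0,
    frozenset(["protC", "protD"]): 40.0,
}

def lookup_identity(a, b):
    """Look up the (symmetric) identity for a pair of hypothetical proteins."""
    return PAIR_IDENTITY.get(frozenset([a, b]), 0.0)

representatives = pick_representatives(
    ["protA", "protB", "protC", "protD"], lookup_identity)
print(representatives)  # ['protA', 'protC']
```

Here protB is dropped because it is 90% identical to protA, and protD is dropped because it is 40% identical to protC, leaving two representatives.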
Fig. 9.31 Page 305
FSSP: fold tree
Fig. 9.32 Page 306
Fig. 9.33 Page 307
Fig. 9.34 Page 307
Page 303-305
There are more than 20,000 structures in PDB, and about 1 million protein sequences in SwissProt/TrEMBL. For most proteins, structural models derive from computational biology approaches, rather than experimental methods.
The most reliable method of modeling and evaluating new structures is by comparison to previously known structures. This is comparative modeling.
An alternative is ab initio modeling.
Approaches to predicting protein structures
obtain sequence (target) → fold assignment → comparative modeling or ab initio modeling → build, assess model
Fig. 9.35 Page 308
Approaches to predicting protein structures
Page 305
[1] Perform fold assignment (e.g. BLAST, CATH, SCOP); identify structurally conserved regions
[2] Align the target (unknown protein) with the template. This is done when the target and template share >30% amino acid identity over a sufficient length
[3] Build a model
[4] Evaluate the model
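The identity check in step [2] can be sketched as follows. This Python example assumes the target and template have already been aligned (gaps shown as "-") and computes percent identity over gap-free columns only, which is one of several conventions in use. The `min_columns` parameter is a hypothetical stand-in for "sufficient length", which the steps above do not quantify, and the sequences are toy data.

```python
def identity_stats(aligned_target, aligned_template):
    """Return (percent identity, number of gap-free columns compared)."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for t, m in zip(aligned_target, aligned_template):
        if t == "-" or m == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if t == m:
            identical += 1
    pct = 100.0 * identical / compared if compared else 0.0
    return pct, compared

def usable_template(aligned_target, aligned_template,
                    min_identity=30.0, min_columns=8):
    """True if identity exceeds min_identity over at least min_columns columns.
    min_columns is an assumed, illustrative threshold."""
    pct, compared = identity_stats(aligned_target, aligned_template)
    return pct > min_identity and compared >= min_columns

# Hypothetical pre-aligned toy sequences.
target = "MKT-AYIAKQR"
template = "MKSLAYLAK-R"

pct, compared = identity_stats(target, template)
print(round(pct, 1), compared, usable_template(target, template))
```

For these toy sequences, 7 of the 9 gap-free columns match, so the template passes the 30% threshold. A real pipeline would obtain the alignment from a tool such as BLAST before applying a check like this.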
Comparative modeling of protein structures
Page 306
Errors may occur for many reasons:
[1] Errors in side-chain packing
[2] Distortions within correctly aligned regions
[3] Errors in regions of target that do not match template
[4] Errors in sequence alignment
[5] Use of incorrect templates
Errors in comparative modeling
Page 306
In general, accuracy of structure prediction depends on the percent amino acid identity shared between target and template.
For >50% identity, RMSD is often only 1 Å.
Comparative modeling
Baker and Sali (2000)
Fig. 9.36 Page 308
Page 309
Many web servers offer comparative modeling services.
Examples are:
• SWISS-MODEL (ExPASy)
• Predict Protein server (Columbia)
• WHAT IF (CMBI, Netherlands)
Comparative modeling