The Protein Data Bank (PDB)

Page 287

• PDB is the principal repository for protein structures
• Established in 1971
• Accessed at http://www.rcsb.org/pdb or simply http://www.pdb.org
• Currently contains over 32,000 structure entries

Updated 9/05

PDB content growth (www.pdb.org): number of structures per year

Fig. 9.6 Page 281

PDB holdings (September, 2005)

Molecule type                     Structures
Proteins, peptides                    29,876
Protein/nucleic acid complexes         1,338
Nucleic acids                          1,500
Carbohydrates                             13
Total                                 32,727

Table 9-2 Page 281

Protein Data Bank
-- gateways to access PDB files: Swiss-Prot, NCBI, EMBL
-- databases that interpret PDB files: CATH, Dali, SCOP, FSSP

Fig. 9.10 Page 285

Access to PDB through NCBI

Page 289

You can access PDB data at the NCBI in several ways:

• Go to the Structure site from the NCBI homepage
• Use Entrez
• Perform a BLAST search, restricting the output to the PDB database

Access to PDB through NCBI

Page 291

Molecular Modeling Database (MMDB)

Cn3D ("see in 3D," i.e., in three dimensions): structure visualization software

Vector Alignment Search Tool (VAST): compare and view multiple structures

Fig. 9.15 Page 290

Fig. 9.16 Page 291

Fig. 9.17 Page 292

Access to structure data at NCBI: VAST

Page 294

Vector Alignment Search Tool (VAST) offers a variety of data on protein structures, including

-- PDB identifiers
-- root-mean-square deviation (RMSD) values to describe structural similarities
-- NRES: the number of equivalent pairs of alpha carbon atoms superimposed
-- percent identity
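RMSD and NRES are easiest to read with the arithmetic in view: for NRES equivalent Cα pairs, RMSD is the square root of the mean squared distance between paired atoms, RMSD = sqrt((1/N) Σ d_i²). The sketch below assumes the two structures have already been superimposed (VAST does its own superposition; this code skips that fitting step), and the function names `rmsd` and `percent_identity` are hypothetical, not part of any NCBI tool.

```python
import math

# A C-alpha position as (x, y, z) in angstroms.
Coord = tuple[float, float, float]


def rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """RMSD over equivalent C-alpha pairs that are ALREADY superimposed.

    len(coords_a) plays the role of NRES (number of equivalent pairs).
    This sketch does not perform the superposition itself.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need the same, non-zero number of equivalent pairs")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def percent_identity(residues_a: str, residues_b: str) -> float:
    """Percent of equivalent positions carrying the same amino acid."""
    if len(residues_a) != len(residues_b) or not residues_a:
        raise ValueError("need the same, non-zero number of equivalent positions")
    same = sum(a == b for a, b in zip(residues_a, residues_b))
    return 100.0 * same / len(residues_a)


# Hypothetical example: three equivalent pairs, each offset by 1 angstrom in z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(len(a), rmsd(a, b), round(percent_identity("GAV", "GAL"), 1))  # 3 1.0 66.7
```

A lower RMSD over a larger NRES indicates a closer structural match; a small RMSD computed over only a few pairs says much less.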

Many databases explore protein structures

Page 293

SCOP

CATH

Dali Domain Dictionary

FSSP

Structural Classification of Proteins (SCOP)

Page 293

SCOP describes protein structures using a hierarchical classification scheme:

• Classes
• Folds
• Superfamilies (likely evolutionary relationship)
• Families
• Domains
• Individual PDB entries

http://scop.mrc-lmb.cam.ac.uk/scop/

Class, Architecture, Topology, and Homologous Superfamily (CATH) database

Page 293

CATH clusters proteins at four levels:

C  Class (mainly α, mainly β, and mixed α-β folds)
A  Architecture (shape of domain, e.g., jelly roll)
T  Topology (fold families; not necessarily homologous)
H  Homologous superfamily

http://www.biochem.ucl.ac.uk/basm/cath_new
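CATH writes a domain's classification as four dot-separated numbers, one per level (C.A.T.H). Because the levels are nested, two domains can be compared by walking the code from left to right until the first mismatch. The sketch below assumes that dotted format; the helper names and the example codes are illustrative only, not output from the CATH database.

```python
from typing import Optional

# The four CATH levels, in the order they appear in a dotted CATH code.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath(code: str) -> tuple[int, ...]:
    """Split a code such as '3.40.50.300' into its four numeric levels."""
    parts = tuple(int(p) for p in code.split("."))
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return parts


def deepest_shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Name of the deepest CATH level two domains share, or None.

    Levels are nested, so comparison stops at the first mismatch: a shared
    topology number means nothing if the architectures already differ.
    """
    shared = None
    for level, x, y in zip(LEVELS, parse_cath(code_a), parse_cath(code_b)):
        if x != y:
            break
        shared = level
    return shared


print(deepest_shared_level("3.40.50.300", "3.40.50.720"))  # Topology
print(deepest_shared_level("3.40.50.300", "2.60.40.10"))   # None
```

Sharing the Topology level but not the H level corresponds to the slide's note: a similar fold that is not necessarily homologous.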

SCOP statistics (September, 2005)

Class     # folds   # superfamilies   # families
All α         218               376          608
All β         144               290          560
α/β           136               222          629
α+β           279               409          717
…
Total         945              1539         2845

Table 9-4 Page 298

α/β = parallel β sheets; α+β = antiparallel β sheets
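Counts like those in Table 9-4 come from tallying distinct nodes at each level of the hierarchy. SCOP labels each domain with a dotted code of the form class.fold.superfamily.family (for example a.1.1.2, where class a is all-α and b is all-β). The sketch below assumes that format; the function name and the domain list are hypothetical, not real SCOP data. Since fold, superfamily, and family numbers are only unique within their parent, each node is identified by its full prefix.

```python
from collections import defaultdict


def tally_scop(sccs_codes: list[str]) -> dict[str, tuple[int, int, int]]:
    """Count distinct folds, superfamilies and families per SCOP class.

    Each code has the form class.fold.superfamily.family, e.g. 'a.1.1.2'.
    Lower-level numbers are only unique within their parent, so each
    level is identified by its full prefix.
    """
    folds = defaultdict(set)
    superfamilies = defaultdict(set)
    families = defaultdict(set)
    for code in sccs_codes:
        parts = tuple(code.split("."))
        if len(parts) != 4:
            raise ValueError(f"expected class.fold.superfamily.family, got {code!r}")
        cls = parts[0]
        folds[cls].add(parts[:2])
        superfamilies[cls].add(parts[:3])
        families[cls].add(parts)
    return {
        cls: (len(folds[cls]), len(superfamilies[cls]), len(families[cls]))
        for cls in sorted(folds)
    }


# Hypothetical domain assignments (not real SCOP data).
domains = ["a.1.1.1", "a.1.1.2", "a.1.1.2", "a.3.1.1", "b.1.1.1", "b.1.2.1"]
print(tally_scop(domains))  # {'a': (2, 2, 3), 'b': (1, 2, 2)}
```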

Fig. 9.23 Page 298

Fig. 9.24 Page 299

Fig. 9.25 Page 300

Fig. 9.26 Page 301

Fig. 9.27 Page 302

Fig. 9.28 Page 303

Dali Domain Dictionary

Page 302

Dali contains a numerical taxonomy of all known structures in PDB. For entries within a domain class, Dali integrates additional data such as secondary structure predictions and solvent accessibility.

Fig. 9.29 Page 303

Fig. 9.30 Page 304

Fold classification based on structure-structure alignment of proteins (FSSP)

Page 293

FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length). Representative sets exclude sequence homologs sharing >25% amino acid identity.

The output includes a “fold tree.”

http://www.ebi.ac.uk/dali/fssp
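The representative-set rule described for FSSP (chains longer than 30 residues, no two members sharing >25% identity) can be pictured as a greedy redundancy filter. This is a generic sketch under that assumption, not FSSP's own procedure: the function name, the order in which chains are considered, and the identity lookup are all hypothetical, and real pipelines typically choose the processing order by some quality criterion.

```python
from typing import Callable


def representative_set(
    chains: list[tuple[str, int]],
    identity: Callable[[str, str], float],
    min_length: int = 30,
    max_identity: float = 25.0,
) -> list[str]:
    """Greedy redundancy filter (a generic sketch, not FSSP's own code).

    chains: (chain_id, length) pairs, in the order they should be considered.
    identity: function returning percent amino acid identity of two chains.
    Keeps chains longer than min_length residues and skips any chain sharing
    more than max_identity percent identity with a chain already kept.
    """
    kept: list[str] = []
    for chain_id, length in chains:
        if length <= min_length:
            continue  # too short: 30 residues or fewer
        if any(identity(chain_id, rep) > max_identity for rep in kept):
            continue  # a homolog is already represented
        kept.append(chain_id)
    return kept


# Hypothetical chains and pairwise identities (percent); missing pairs count as 0.
pid = {frozenset({"A", "B"}): 90.0, frozenset({"A", "D"}): 12.0, frozenset({"B", "D"}): 15.0}
chains = [("A", 120), ("B", 118), ("C", 25), ("D", 200)]
print(representative_set(chains, lambda x, y: pid.get(frozenset({x, y}), 0.0)))  # ['A', 'D']
```

Chain B is dropped as a close homolog of A, and chain C is dropped for being too short.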

Fig. 9.31 Page 305

FSSP: fold tree
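A fold tree is a dendrogram that groups structures by pairwise structural similarity. One common way to build such a tree is average-linkage (UPGMA) clustering of a distance matrix; the sketch below uses that technique and assumes the similarity scores have already been converted into distances (smaller = more similar). FSSP's own scores and clustering details are not given in this section and may differ; the function name and example data are hypothetical.

```python
from itertools import combinations


def upgma(names: list[str], dist: dict[frozenset, float]) -> str:
    """Average-linkage (UPGMA) clustering; returns tree topology in Newick form.

    dist maps frozenset({a, b}) to the distance between leaves a and b
    (smaller = more similar). Branch lengths are omitted.
    """
    if not names:
        raise ValueError("need at least one leaf")
    size = {n: 1 for n in names}  # cluster label -> number of leaves
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"  # the Newick string doubles as the cluster label
        sa, sb = size.pop(a), size.pop(b)
        del d[pair]
        for other in size:
            # Average linkage: size-weighted mean of the two old distances.
            da = d.pop(frozenset({a, other}))
            db = d.pop(frozenset({b, other}))
            d[frozenset({merged, other})] = (da * sa + db * sb) / (sa + sb)
        size[merged] = sa + sb
    return next(iter(size)) + ";"


# Hypothetical structural distances between four domains.
leaves = ["A", "B", "C", "D"]
dist = {
    frozenset({"A", "B"}): 2.0, frozenset({"A", "C"}): 6.0, frozenset({"A", "D"}): 10.0,
    frozenset({"B", "C"}): 6.0, frozenset({"B", "D"}): 10.0, frozenset({"C", "D"}): 8.0,
}
print(upgma(leaves, dist))  # (((A,B),C),D);
```

Closely related structures (A and B) join first, and increasingly distant folds attach higher up the tree.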

Fig. 9.32 Page 306

Fig. 9.33 Page 307

Fig. 9.34 Page 307

Pages 303-305

PDB holds over 32,000 structures, while SwissProt/TrEMBL holds about 1 million protein sequences. For most proteins, structural models are therefore derived from computational biology approaches rather than experimental methods.

The most reliable way to model and evaluate new structures is by comparison to previously known structures. This is comparative modeling.

An alternative is ab initio modeling.

Approaches to predicting protein structures

Flowchart: obtain sequence (target) → fold assignment → comparative modeling or ab initio modeling → build, assess model

Fig. 9.35 Page 308

Approaches to predicting protein structures

Page 305

[1] Perform fold assignment (e.g. BLAST, CATH, SCOP); identify structurally conserved regions

[2] Align the target (unknown protein) with the template. This works best when target and template share >30% amino acid identity over a sufficient length

[3] Build a model

[4] Evaluate the model

Comparative modeling of protein structures
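Step [2] above hinges on percent identity over an aligned region. The sketch below computes identity from a gapped pairwise alignment and checks the >30% criterion. Two assumptions are labeled: percent identity is computed over columns where both sequences have a residue (one of several conventions), and `min_aligned` is a hypothetical cutoff standing in for the unspecified "sufficient length". Function names and sequences are illustrative.

```python
def aligned_identity(target_aln: str, template_aln: str) -> tuple[float, int]:
    """Percent identity and aligned length of a gapped pairwise alignment.

    '-' marks a gap. Only columns where both sequences have a residue are
    counted, which is one of several conventions for percent identity.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(t, m) for t, m in zip(target_aln, template_aln) if t != "-" and m != "-"]
    if not pairs:
        return 0.0, 0
    same = sum(t == m for t, m in pairs)
    return 100.0 * same / len(pairs), len(pairs)


def template_is_usable(target_aln: str, template_aln: str, min_aligned: int) -> bool:
    """True if identity exceeds 30% over at least min_aligned aligned columns.

    min_aligned is a hypothetical cutoff: the text only asks for a
    "sufficient length" without giving a number.
    """
    identity, length = aligned_identity(target_aln, template_aln)
    return identity > 30.0 and length >= min_aligned


# Hypothetical target/template alignment.
target, template = "MKT-LLV", "MRTALLV"
identity, length = aligned_identity(target, template)
print(round(identity, 1), length)  # 83.3 6
print(template_is_usable(target, template, min_aligned=5))  # True
```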

Page 306

Errors may occur for many reasons

[1] Errors in side-chain packing

[2] Distortions within correctly aligned regions

[3] Errors in regions of target that do not match template

[4] Errors in sequence alignment

[5] Use of incorrect templates

Errors in comparative modeling

Page 306

In general, the accuracy of structure prediction depends on the percent amino acid identity shared between target and template.

For >50% identity, RMSD is often only 1 Å.

Comparative modeling

Baker and Sali (2000), Fig. 9.36 Page 308

Page 309

Many web servers offer comparative modeling services.

Examples are:

• SWISS-MODEL (ExPASy)
• Predict Protein server (Columbia)
• WHAT IF (CMBI, Netherlands)

Comparative modeling