Macromolecular structure
Bioinformatics
Contents
Determination of protein structure
Structure databases
Secondary structure elements (SSE)
Tertiary structure
Structure analysis
  Structure alignment
  Domain recognition
Structure prediction
  Homology modelling
  Threading/fold recognition
  Secondary structure prediction
  Ab initio prediction
Crystallisation
Hanging drop (vapour diffusion) method: a drop of dilute protein solution (1) hangs from a microscope slide over a reservoir of concentrated salt solution (2), and the crystal grows in the drop (observed under a microscope). Many different conditions of 1 and 2 must be tried.
Slide courtesy of Shoshana Wodak
Determination of protein structure
Figure: diffraction pattern and the resulting atomic model
Slide courtesy of Shoshana Wodak
The resolution problem
A high-resolution protein structure: 1.5-2.0 Å resolution
Slide courtesy of Shoshana Wodak
Nuclear Magnetic Resonance (NMR)
Source: Branden & Tooze (1991)
Interatomic forces
Covalent interactions
Hydrogen bonds
Hydrophobic/hydrophilic interactions
Ionic interactions
van der Waals forces
Repulsive forces
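Two of the forces listed above, van der Waals attraction with short-range repulsion and ionic interactions, are often approximated in molecular-mechanics force fields by a Lennard-Jones term and a Coulomb term. The sketch below assumes those standard functional forms; the parameter values (`epsilon`, `sigma`, charges, dielectric constant) are illustrative placeholders, not values taken from any particular force field.

```python
import math

# Approximate Coulomb constant in kcal·Å/(mol·e²), a common force-field unit convention
COULOMB_CONSTANT = 332.06


def lennard_jones(r, epsilon, sigma):
    """Van der Waals attraction (r^-6 term) plus short-range repulsion (r^-12 term).

    r and sigma in Å, epsilon (well depth) in kcal/mol.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic (ionic) energy in kcal/mol for charges q1, q2 (units of e) at distance r (Å)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


if __name__ == "__main__":
    # Illustrative parameters only
    eps, sig = 0.2, 3.4
    r_min = 2 ** (1 / 6) * sig  # distance of the energy minimum
    print(f"LJ at r = sigma: {lennard_jones(sig, eps, sig):.3f} kcal/mol")
    print(f"LJ at r = {r_min:.2f} Å: {lennard_jones(r_min, eps, sig):.3f} kcal/mol")
    print(f"+1/-1 pair at 4 Å: {coulomb(4.0, 1.0, -1.0, dielectric=4.0):.1f} kcal/mol")
```

The Lennard-Jones energy is zero at r = sigma, reaches its minimum of -epsilon at r = 2^(1/6) sigma and rises steeply at shorter distances, which is the repulsive part; the Coulomb term is negative for opposite charges and positive for like charges.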
Structure databases
PDB (Protein Data Bank): the official structure repository
SCOP (Structural Classification Of Proteins): structure classification. The top level reflects structural classes; the second level, called Fold, adds topological and similarity criteria.
CATH (Class, Architecture, Topology and Homologous superfamily)
PDB entry header
HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66 1D66 2
COMPND GAL4 (RESIDUES 1 - 65) COMPLEX WITH 19MER DNA 1D66 3
SOURCE (SACCHAROMYCES $CEREVISIAE) OVEREXPRESSED IN (ESCHERICHIA 1D66 4
SOURCE 2 $COLI) 1D66 5
AUTHOR R.MARMORSTEIN,S.HARRISON 1D66 6
REVDAT 1 15-APR-93 1D66 0 1D66 7
JRNL AUTH R.MARMORSTEIN,M.CAREY,M.PTASHNE,S.C.HARRISON 1D66 8
JRNL TITL /DNA$ RECOGNITION BY /GAL4$: STRUCTURE OF A 1D66 9
JRNL TITL 2 PROTEIN(SLASH)/DNA$ COMPLEX 1D66 10
JRNL REF NATURE V. 356 408 1992 1D66 11
JRNL REFN ASTM NATUAS UK ISSN 0028-0836 006 1D66 12
REMARK 1 1D66 13
REMARK 2 1D66 14
REMARK 2 RESOLUTION. 2.7 ANGSTROMS. 1D66 15
REMARK 3 1D66 16
REMARK 3 REFINEMENT. 1D66 17
REMARK 3 PROGRAM CORELS;TNT;XPLOR 1D66 18
REMARK 3 AUTHORS J.SUSSMAN;D.TRONRUD;A.BRUNGER 1D66 19
REMARK 3 R VALUE 0.230 1D66 20
REMARK 3 RMSD BOND DISTANCES 0.015 ANGSTROMS 1D66 21
REMARK 3 RMSD BOND ANGLES 2.9 DEGREES 1D66 22
REMARK 4 1D66 23
REMARK 4 THERE ARE TWO DNA CHAINS WHICH HAVE BEEN ASSIGNED CHAIN 1D66 24
REMARK 4 INDICATORS *D* AND *E*. THERE ARE TWO PROTEIN CHAINS 1D66 25
REMARK 4 WHICH HAVE BEEN ASSIGNED CHAIN INDICATORS *A* AND *B*. 1D66 26
REMARK 4 EACH PROTEIN - DNA COMPLEX CONTAINS FOUR BOUND CD IONS. 1D66 27
...
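The header records above follow the PDB's line-oriented format, in which the first field of each line names the record type. The sketch below pulls a few fields out of lines laid out like the 1D66 example; it matches on the record name and on the wording visible in this entry (for example `RESOLUTION. 2.7 ANGSTROMS.` and `R VALUE 0.230`), so it is an assumption that other entries, especially those in later versions of the format, use the same wording. The function name is hypothetical; for real work a maintained parser such as Biopython's `Bio.PDB` module is a safer choice.

```python
import re

# Lines copied from the 1D66 header above (whitespace as shown there)
SAMPLE_HEADER = [
    "HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66 1D66 2",
    "REMARK 2 RESOLUTION. 2.7 ANGSTROMS. 1D66 15",
    "REMARK 3 R VALUE 0.230 1D66 20",
]


def parse_pdb_header(lines):
    """Extract ID code, deposition date, classification, resolution and R value."""
    info = {}
    for raw in lines:
        line = raw.strip()
        fields = line.split()
        if not fields:
            continue
        record = fields[0]
        remark_no = fields[1] if record == "REMARK" and len(fields) > 1 else None
        if record == "HEADER":
            # Deposition date (DD-MMM-YY) followed by the 4-character ID code
            m = re.search(r"(\d{2}-[A-Z]{3}-\d{2})\s+(\w{4})", line)
            if m:
                info["classification"] = line[len("HEADER"):m.start()].strip()
                info["deposition_date"] = m.group(1)
                info["id_code"] = m.group(2)
        elif remark_no == "2":
            m = re.search(r"RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", line)
            if m:
                info["resolution_A"] = float(m.group(1))
        elif remark_no == "3":
            m = re.search(r"R VALUE\s+([\d.]+)", line)
            if m:
                info["r_value"] = float(m.group(1))
    return info


if __name__ == "__main__":
    print(parse_pdb_header(SAMPLE_HEADER))
```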
CATH - A protein domain classification
In CATH, protein domains are classified according to a hierarchical tree with four levels: Class, Architecture, Topology and Homologous superfamily.
Figure (from Shoshana Wodak): example domains at the Class, Architecture and Topology levels
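Because each CATH node is identified by a dotted code whose four numbers correspond to the four levels, grouping domains at any level amounts to comparing code prefixes. The sketch assumes codes of the form `C.A.T.H` (for example `3.40.50.300`); the domain names in the example are hypothetical, the codes are illustrative, and the class names follow CATH's four main classes.

```python
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")

CATH_CLASS_NAMES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}


def cath_levels(code):
    """Split a C.A.T.H code into the prefix that identifies each hierarchical level."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a code like '3.40.50.300', got {code!r}")
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


def group_by_level(domains, level):
    """Group domain names by their CATH node at the chosen level."""
    groups = {}
    for name, code in domains.items():
        groups.setdefault(cath_levels(code)[level], []).append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical domain names with illustrative CATH codes
    domains = {"domA": "3.40.50.300", "domB": "3.40.50.720", "domC": "1.10.10.10"}
    print(group_by_level(domains, "topology"))
    print(CATH_CLASS_NAMES[cath_levels(domains["domC"])["class"]])
```

Two domains share a node at a given level exactly when their codes share the corresponding prefix, so domains that agree at the Topology level automatically agree at the Architecture and Class levels too.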
Classifications of protein structures (domains)
CATH: structural classification of proteins [http://www.biochem.ucl.ac.uk/bsm/cath/]
SCOP: Structural classification of proteins [http://scop.mrc-lmb.cam.ac.uk/scop/]
FSSP: Fold classification based on structure alignments [http://www.sander.ebi.ac.uk/fssp/]
HSSP: Homology-derived secondary structure assignments [http://www.sander.ebi.ac.uk/hssp/]
DALI: Classification of protein domains [http://www.ebi.ac.uk/dali/domain/]
VAST: Structural neighbours by direct 3D structure comparison [http://www.ncbi.nlm.nih.gov:80/Structure/VAST/vast.shtml]
CE: Structure comparisons by Combinatorial Extension [http://cl.sdsc.edu/ce.html]
Slide courtesy of Shoshana Wodak
Books
Branden, C. & Tooze, J. (1991). Introduction to Protein Structure. 1st edn. Garland Publishing Inc., New York and London.
Westhead, D.R., Parish, J.H. & Twyman, R.M. (2002). Bioinformatics. BIOS Scientific Publishers, Oxford.
Mount, D.W. (2001). Bioinformatics: Sequence and Genome Analysis. 1st edn. Cold Spring Harbor Laboratory Press, New York.
Gibas, C. & Jambeck, P. (2001). Developing Bioinformatics Computer Skills. O'Reilly.
Secondary structure - α-helix
Source: Branden & Tooze (1991)
Figure: α-helix with 3.6 residues per turn; hydrogen bonds shown; atoms coloured as carbon, nitrogen and oxygen
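The 3.6 residues per turn quoted above fix the angle between successive residues at 360°/3.6 = 100°. Combined with the commonly quoted rise of about 1.5 Å per residue (a pitch of 3.6 × 1.5 = 5.4 Å per turn) and an assumed Cα radius of about 2.3 Å, this is enough to generate idealised Cα positions. The sketch below also reports the Cα distance between residues i and i+4, the pair linked by the α-helix hydrogen bond (C=O of residue i to N-H of residue i+4). The geometric constants are approximate textbook values, not fitted to any structure, and the function name is hypothetical.

```python
import math

RESIDUES_PER_TURN = 3.6
TWIST_DEG = 360.0 / RESIDUES_PER_TURN  # about 100 degrees per residue
RISE = 1.5        # Å along the helix axis per residue (approximate)
CA_RADIUS = 2.3   # Å from the helix axis to Cα (approximate)


def ideal_helix_ca(n_residues):
    """Cα coordinates of an idealised α-helix whose axis is the z axis."""
    coords = []
    for i in range(n_residues):
        theta = math.radians(TWIST_DEG * i)
        coords.append((CA_RADIUS * math.cos(theta), CA_RADIUS * math.sin(theta), RISE * i))
    return coords


if __name__ == "__main__":
    ca = ideal_helix_ca(8)
    print(f"pitch: {RESIDUES_PER_TURN * RISE:.1f} Å per turn")
    print(f"Cα(i)-Cα(i+1): {math.dist(ca[0], ca[1]):.2f} Å")
    # Residue i is hydrogen-bonded to residue i+4
    print(f"Cα(i)-Cα(i+4): {math.dist(ca[0], ca[4]):.2f} Å")
```

With these constants, consecutive Cα atoms come out close to the usual 3.8 Å spacing, which is a quick sanity check on the assumed radius and rise.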
Hydrophobicity of side-chain residues in helices
Source: Branden & Tooze (1999). Colour key: blue, polar; red, basic or acidic.
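Because successive residues in an α-helix are about 100° apart around the axis, a helical wheel projection shows whether hydrophobic side chains cluster on one face of the helix. The sketch uses a simple binary hydrophobic/non-hydrophobic classification (an assumption; published analyses use numeric hydrophobicity scales) and computes a hydrophobic moment, the length of the vector sum of the hydrophobic residues placed around the wheel. The residue groupings loosely mirror the figure's colour key but are an assumed, simplified assignment, and the function names are hypothetical.

```python
import math

# Assumed, simplified residue classes (His counted as basic here)
HYDROPHOBIC = set("AVILMFWC")
BASIC_OR_ACIDIC = set("DEKRH")
TWIST_DEG = 100.0  # angle between successive residues in an α-helix


def helical_wheel(seq):
    """Return (residue, angle in degrees, class) for each residue."""
    wheel = []
    for i, aa in enumerate(seq.upper()):
        if aa in HYDROPHOBIC:
            kind = "hydrophobic"
        elif aa in BASIC_OR_ACIDIC:
            kind = "basic/acidic"
        else:
            kind = "polar"
        wheel.append((aa, (TWIST_DEG * i) % 360.0, kind))
    return wheel


def hydrophobic_moment(seq):
    """Length of the vector sum of hydrophobic residues around the wheel (binary scale)."""
    x = y = 0.0
    for i, aa in enumerate(seq.upper()):
        if aa in HYDROPHOBIC:
            angle = math.radians(TWIST_DEG * i)
            x += math.cos(angle)
            y += math.sin(angle)
    return math.hypot(x, y)


if __name__ == "__main__":
    seq = "LKKLLKLLKK"  # made-up sequence
    for residue, angle, kind in helical_wheel(seq):
        print(f"{residue} {angle:5.0f} deg {kind}")
    print(f"hydrophobic moment: {hydrophobic_moment(seq):.2f}")
```

A helix with its hydrophobic residues gathered on one face gives a large moment; a uniformly hydrophobic helix gives a small one, because the vectors largely cancel around the wheel.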
Secondary structure - β sheets
Panels: antiparallel and parallel β sheets
Source: Branden & Tooze (1991)
Secondary structure - twist of β sheets
Mixed β sheet
Source: Branden & Tooze (1991)
Angles of rotation
Each residue of the main chain (the Cα joining two peptide units) is characterised by two angles of rotation:
phi (φ), around the N-Cα bond
psi (ψ), around the Cα-C bond
Image from Branden & Tooze (1999)
Dipeptide unit
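Phi and psi are dihedral (torsion) angles, each defined by four consecutive backbone atoms: phi by C(i-1), N(i), Cα(i), C(i) and psi by N(i), Cα(i), C(i), N(i+1). The sketch below computes a signed dihedral angle between -180° and 180° from Cartesian coordinates using a standard atan2 formula; the helper names are hypothetical and the coordinates in the example are made up.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Rotation about the N-Cα bond: C(i-1), N(i), Cα(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Rotation about the Cα-C bond: N(i), Cα(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Made-up coordinates chosen so the answers are easy to check by eye
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # cis arrangement
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans arrangement
```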
The Ramachandran map
Figure: combinations of the phi and psi angles of the dipeptide unit
Slide courtesy of Shoshana Wodak
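On the Ramachandran map, residues in α-helices and β-strands fall into distinct (phi, psi) regions. The sketch below assigns a pair of angles to a coarse region using rectangular boundaries that are an illustrative assumption, far cruder than the contoured allowed regions in the figure; positive phi values are uncommon for residues other than glycine. The function name is hypothetical.

```python
def ramachandran_region(phi, psi):
    """Coarse region label for backbone angles in degrees (-180..180).

    The rectangular boundaries below are an illustrative assumption.
    """
    if not (-180.0 <= phi <= 180.0 and -180.0 <= psi <= 180.0):
        raise ValueError("angles must lie in [-180, 180]")
    if phi > 0:
        return "positive phi (left-handed region; mostly glycine)"
    if -100.0 <= psi <= 50.0:
        return "alpha (right-handed helix region)"
    return "beta (extended strand region)"


if __name__ == "__main__":
    for angles in [(-57, -47), (-119, 113), (-139, 135), (60, 40)]:
        print(angles, ramachandran_region(*angles))
```

The commonly quoted values (-57°, -47°) for the α-helix and (-119°, 113°) and (-139°, 135°) for parallel and antiparallel β strands land in the alpha and beta boxes respectively.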
Combinations of secondary structures
Figure legend: loop, α-helix, β-sheet
Example: retinol binding protein (PDB: 1rpb)
Structure-structure alignment and comparison
Question: Is structure A similar to structure B?
Figure panels: structure A, structure B
Approach: structure alignments
Slide courtesy of Shoshana Wodak
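One way to make "similar" quantitative, once the residues of A and B have been put into correspondence by a structure alignment, is to compare the two structures' internal distance matrices, the idea underlying distance-matrix methods such as DALI. The sketch computes a distance-matrix RMSD (dRMSD) over matched Cα positions. It assumes the residue correspondence is already known, and it is a swapped-in illustration, not the superposition RMSD or the scoring used by any specific tool listed earlier; the function name is hypothetical.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equally long lists of matched 3D points.

    Compares every intra-structure distance d_ij in A with the same d_ij in B,
    so no rotation or translation (superposition) is needed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two matched lists with at least two points each")
    squared = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
    # Same shape after a 90° rotation about z and a translation
    b = [(-y + 10.0, x, z + 5.0) for x, y, z in a]
    c = a[:3] + [(0.0, 6.0, 3.8)]  # last point displaced
    print(f"A vs B: {drmsd(a, b):.3f} Å")
    print(f"A vs C: {drmsd(a, c):.3f} Å")
```

Because only internal distances are compared, a rigidly moved copy scores zero, while any change in shape raises the value.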
Analyzing conformational changes
Figure: citrate synthase, open form and closed form
Ligand-induced conformational changes: domain motion and small structural distortions
Slide courtesy of Shoshana Wodak
Defining Domains: What for?
Link between domain structure and function
Different structural domains can be associated with different functions.
Enzyme active sites are often at domain interfaces; domain movements play a functional role.
Examples: cathepsin D, DNA methyltransferase
Slide courtesy of Shoshana Wodak
Figure: domains that require 1, 2 or 4 cuts of the polypeptide chain to be separated (N and C termini marked)
Slide courtesy of Shoshana Wodak
Methods for Identifying Domains
Underlying principle: domain limits are defined by identifying groups of residues such that the number of contacts between groups is minimized.
Domains from contact map (example: lactate dehydrogenase)
Slide courtesy of Shoshana Wodak
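The principle above can be sketched for the simplest case, a single cut that splits the chain into an N-terminal and a C-terminal part. The sketch builds a contact map from Cα coordinates and tries every cut position, keeping the one with the fewest contacts across the cut. The 8 Å contact cutoff, the minimum domain size and the function names are assumptions, and the sketch does not handle the 2- and 4-cut cases shown in the figure above.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Boolean matrix: True where two different residues are closer than cutoff (Å)."""
    n = len(coords)
    return [
        [i != j and math.dist(coords[i], coords[j]) < cutoff for j in range(n)]
        for i in range(n)
    ]


def best_single_cut(contacts, min_size=3):
    """Return (cut, n_contacts): residues [0, cut) vs [cut, n) with fewest contacts between them.

    Returns None if the chain is too short for two domains of min_size residues.
    """
    n = len(contacts)
    best = None
    for cut in range(min_size, n - min_size + 1):
        between = sum(contacts[i][j] for i in range(cut) for j in range(cut, n))
        if best is None or between < best[1]:
            best = (cut, between)
    return best


if __name__ == "__main__":
    # Two made-up groups of 10 residues, 20 Å apart
    coords = [(3.8 * i, 0.0, 0.0) for i in range(10)] + [(3.8 * i, 20.0, 0.0) for i in range(10)]
    print(best_single_cut(contact_map(coords)))
```

The minimum domain size stops the search from trivially cutting off a single terminal residue, which would otherwise have very few contacts with the rest of the chain.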
Methods for structure prediction
Homology modelling: building a 3D model on the basis of similar sequences
Threading: threading the sequence onto all known protein structures and testing the consistency of each fit
Secondary structure prediction
Ab initio prediction of tertiary structure
For proteins of normal size, it is almost impossible to predict structures ab initio.
Some results have been obtained in the prediction of oligopeptide structures.
Homology modelling - steps
Similarity search
Modelling of backbone
  Secondary structure elements
  Loops
Modelling of side chains
Refinement of the model
Verification
  Steric compatibility of the residues
Homology modelling - similarity search
Starting from a query sequence, search for similar sequences with known structure:
Search for similar sequences in a database of protein structures.
Build a multiple alignment.
A weight can be assigned to each matching protein (higher score to more similar proteins).
The higher the sequence similarity, the more accurate the predicted structure.
When structures are available for proteins with >70% similarity to the query, a good model can be expected.
When the similarity is <40%, homology modelling gives poor results.
The lack of available structures is one of the main limitations of homology modelling.
• In 2004, PDB contains
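The similarity thresholds above can be applied once the query has been aligned to each candidate template. The sketch computes percent identity over aligned, ungapped positions (one of several conventions; the slide speaks of similarity, which some tools measure differently), maps it onto the >70% and <40% thresholds quoted in the text, and turns identities into normalised template weights as one simple way of giving more similar proteins a higher score. The function names, the example alignments and the weighting scheme are assumptions.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identical residues over positions where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template) if q != gap and t != gap]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)


def expected_model_quality(identity):
    """Rule of thumb from the text: >70% good, <40% poor."""
    if identity > 70.0:
        return "good model expected"
    if identity < 40.0:
        return "poor model likely"
    return "intermediate"


def template_weights(identities):
    """Weight each template in proportion to its identity to the query (weights sum to 1)."""
    total = sum(identities.values())
    if total == 0:
        raise ValueError("at least one template must share identity with the query")
    return {name: value / total for name, value in identities.items()}


if __name__ == "__main__":
    # Hypothetical alignments of one query against two templates
    alignments = {
        "templateA": ("AC-DEFGHIK", "ACKDEFGHIR"),
        "templateB": ("AC-DEFGHIK", "SCKDQWGYLR"),
    }
    identities = {name: percent_identity(q, t) for name, (q, t) in alignments.items()}
    for name, ident in identities.items():
        print(f"{name}: {ident:.0f}% -> {expected_model_quality(ident)}")
    print(template_weights(identities))
```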
Homology modelling - Backbone modelling
Modelling of secondary structure elements (α-helices, β-sheets): for each secondary structure element of the template, align the backbone of query and template.
Loop modelling: use databases of loop regions. The loop main chain depends on the number of amino acids and on the neighbouring elements (α-α, α-β, β-α, β-β).
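Because the loop main chain depends on the loop length and on the types of the flanking elements, a loop database can be indexed by exactly those three keys. The sketch below is a minimal in-memory index under that assumption; the class name and fragment identifiers are hypothetical, and a real loop library would typically store backbone coordinates and rank candidates by how well their ends fit the modelled flanking elements.

```python
from collections import defaultdict


class LoopDatabase:
    """Index loop fragments by (length, preceding element, following element).

    Element types use the shorthand "a" for α-helix and "b" for β-strand.
    """

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, length, before, after, fragment_id):
        self._entries[(length, before, after)].append(fragment_id)

    def candidates(self, length, before, after):
        # .get avoids creating empty entries for keys that were never added
        return list(self._entries.get((length, before, after), []))


if __name__ == "__main__":
    db = LoopDatabase()
    db.add(4, "a", "b", "loop_0001")  # hypothetical fragment IDs
    db.add(4, "a", "b", "loop_0002")
    db.add(6, "b", "b", "loop_0003")
    print(db.candidates(4, "a", "b"))
    print(db.candidates(4, "b", "a"))
```

The order of the flanking elements matters: an α-β loop and a β-α loop of the same length are looked up under different keys.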
Homology modelling - Side chain modelling
Side-chain conformation (model building and energy refinement):
Conserved side chains take the same coordinates as in the template.
For non-conserved side chains, use rotamer libraries to determine the most favourable conformation.
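The rule above (copy conserved side chains, choose rotamers for the rest) can be sketched as follows. The sketch assumes an ungapped query-template alignment, represents each side-chain conformation only by its chi torsion angles, and takes the energy function as a caller-supplied parameter. The function name, the rotamer library and the energy in the example are hypothetical placeholders, with the three chi1 values 60°, 180° and -60° standing in for the staggered positions.

```python
def model_side_chains(query, template, template_chis, rotamers, energy):
    """Assign a chi-angle tuple to every position of an ungapped query/template alignment.

    query, template: aligned one-letter sequences of equal length
    template_chis:   {position: chi tuple} taken from the template structure
    rotamers:        {residue type: [chi tuple, ...]} rotamer library
    energy:          callable(position, residue, chis) -> float, lower is better
    """
    if len(query) != len(template):
        raise ValueError("query and template must be aligned position by position")
    model = {}
    for pos, (q_res, t_res) in enumerate(zip(query, template)):
        if q_res == t_res and pos in template_chis:
            # Conserved residue: keep the template's side-chain conformation
            model[pos] = template_chis[pos]
        else:
            # Non-conserved residue: pick the lowest-energy rotamer
            # (residue types missing from the library get an empty tuple)
            options = rotamers.get(q_res, [()])
            model[pos] = min(options, key=lambda chis: energy(pos, q_res, chis))
    return model


if __name__ == "__main__":
    library = {"K": [(60.0,), (180.0,), (-60.0,)]}  # hypothetical, chi1 only

    def toy_energy(pos, residue, chis):
        # Hypothetical energy that simply prefers chi1 near 180 degrees
        return abs(chis[0] - 180.0) if chis else 0.0

    print(model_side_chains("VK", "VR", {0: (175.0,), 1: (-65.0,)}, library, toy_energy))
```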
Homology modelling - refinement
After the steps above have been completed, the model can be refined by modifying the positions of some atoms in order to reduce the energy.
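Energy minimisation can be illustrated with a toy energy: harmonic restraints that pull each pair of consecutive Cα atoms towards the usual 3.8 Å spacing, minimised by steepest descent (repeatedly moving every atom a small step against the energy gradient). The energy function, force constant, step size and function names are assumptions for illustration; practical refinement typically uses a full force field with angle, torsion and non-bonded terms.

```python
import math


def restraint_energy(coords, ideal=3.8, k=1.0):
    """Sum of k * (d - ideal)^2 over consecutive atom pairs."""
    return sum(k * (math.dist(a, b) - ideal) ** 2 for a, b in zip(coords, coords[1:]))


def _gradient(coords, ideal, k):
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords) - 1):
        a, b = coords[i], coords[i + 1]
        d = math.dist(a, b)
        if d == 0.0:
            continue  # direction undefined for coincident atoms
        factor = 2.0 * k * (d - ideal) / d
        for c in range(3):
            g = factor * (a[c] - b[c])  # dE/da_c; dE/db_c is its negative
            grad[i][c] += g
            grad[i + 1][c] -= g
    return grad


def steepest_descent(coords, ideal=3.8, k=1.0, step=0.1, n_steps=200):
    """Move every atom against the energy gradient for a fixed number of steps."""
    x = [list(p) for p in coords]
    for _ in range(n_steps):
        for p, g in zip(x, _gradient(x, ideal, k)):
            for c in range(3):
                p[c] -= step * g[c]
    return [tuple(p) for p in x]


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (8.0, 1.0, 0.0)]  # made-up, distorted chain
    refined = steepest_descent(start)
    print(f"energy before: {restraint_energy(start):.3f}")
    print(f"energy after:  {restraint_energy(refined):.6f}")
```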