PHAR 201 Lecture 3 2012 1
Know the Limitations of your Data – X-ray, NMR, EM
PHAR 201/Bioinformatics I
Philip E. Bourne
SSPPS, UCSD
Prerequisite Reading: Structural Bioinformatics Chapters 4-6
When You Grab a PDB File, What Are You Starting With?
[Diagram: PDB data processing pipeline, Steps 1-4. Labels: Depositor, Deposit / Annotate / Validate, PDB ID, Validation Report, Corrections, Depositor Approval, PDB Entry, Core DB, Archival Data, Distribution Site]
Data Views
• Depositor/Annotator
• Type of experiment: X-ray, NMR, EM
• Type of molecule: protein, nucleic acid, or protein-nucleic acid complex
Annotation
• Resolve nomenclature and format problems
• Add missing required data items
• Add higher level classifications
• Review validation report and summary letter to the
depositor
• Produce and check final mmCIF and PDB files
• Update status and load database
• Check data consistency across archive
Annotation – More Specifics
• Make sure entry is complete (mandatory items from mmCIF
dictionary)
• Format exchange
– Converts between PDB and mmCIF formats
– Recognizes most variants of PDB format
• Check nomenclature
– Residue
– Polymer atoms
– Hydrogen atoms
– Ligand atoms
Validation
• Covalent geometry
– Comparison with standard values (Engh and Huber [1]; Clowney et al. [2]; Gelbin et al. [3])
– Identify outliers
• Stereochemistry – check chiral centers
• Close contacts in asymmetric unit and unit cell
• Occupancy
• Sequence agreement between SEQRES and coordinates
• Distant waters
• Experimental (SFCHECK [4])
[1] R.A. Engh and R. Huber. Acta Cryst. A47 (1991): 392-400.
[2] L. Clowney et al. J. Am. Chem. Soc. 118 (1996): 509-518.
[3] A. Gelbin et al. J. Am. Chem. Soc. 118 (1996): 519-529.
[4] A.A. Vaguine, J. Richelle, and S.J. Wodak. Acta Cryst. D55 (1999): 191-205.
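The covalent geometry check can be sketched in a few lines of Python: measure each bond, compare it with a target mean and standard deviation, and flag large deviations. This is a minimal sketch and not the PDB's validation code. The function name, the Z-score cutoff, and the target values (backbone bond lengths in the spirit of Engh and Huber) are assumptions for illustration; real work should use the published tables.

```python
import math

# Illustrative backbone bond-length targets as (mean, sigma) in angstroms.
# These numbers are assumptions for the sketch; take real targets from the
# published tables.
STANDARD_BONDS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def bond_outliers(atoms, z_cutoff=4.0):
    """Return (atom1, atom2, length, z) for bonds that deviate strongly.

    atoms maps an atom name to its (x, y, z) coordinates for one residue.
    z_cutoff is a hypothetical threshold chosen for this example.
    """
    outliers = []
    for (a, b), (mean, sigma) in STANDARD_BONDS.items():
        if a in atoms and b in atoms:
            length = math.dist(atoms[a], atoms[b])
            z = (length - mean) / sigma
            if abs(z) > z_cutoff:
                outliers.append((a, b, round(length, 3), round(z, 1)))
    return outliers


# A made-up residue whose C=O bond has been stretched to 1.5 angstroms.
residue = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.458, 0.0, 0.0),
    "C": (1.458, 1.525, 0.0),
    "O": (1.458, 1.525, 1.5),
}
print(bond_outliers(residue))
```

Only the stretched C=O bond is reported; the other two bonds sit exactly on their target values.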
The process by which biological data in a database are annotated and validated
changes over time – this introduces a temporal
inconsistency
Summary Thus Far
• The biocurators (annotators) are the unsung heroes of modern biology
– International Society for Biocuration
• As a resource developer - start right and the need for data remediation in years to come will be less likely
• As a resource user - be aware of the process used to provide the data and hence the limitations of the data you are using
P.E. Bourne and J. McEntyre 2006. Biocurators: Contributors to the World of Science. PLoS Comp. Biol. (Editorial) 2(10): e142
The quality of the data you use in a bioinformatics experiment is a function of the method used to collect these data – understand
the method
[Chart: PDB holdings by experimental method, as of Oct 5, 2011. Surviving label: EM, 254]
X-ray Crystallography
• Oldest technique
• Majority of the depositions
• A number of Nobel prizes
• International Union of Crystallography (IUCr), publisher of the Acta Crystallographica journals
• Method based on scattering from electrons – hydrogen atoms usually not seen (sometimes modeled in)
• In fact modeling in is an issue
• Atoms of similar atomic number (similar electron count) are not distinguishable, eg O, N, C
• Influence of crystal packing, eg malate dehydrogenase (4MDH)
• Environment in crystal highly aqueous
• Produces similar structures to NMR, eg thioredoxin (3TRX vs 1SRX)
The X-ray Crystallography Pipeline
Basic Steps
• Target Selection
• Crystallomics: isolation, expression, purification, crystallization
• Data Collection
• Structure Solution
• Structure Refinement
• Functional Annotation
• Publish
Limitations - Crystallization
• Crystallization:
– Non-soluble
– Twinning
– Micro heterogeneity
– Disorder
Limitations – Data Collection
Limitations - Refinement
Limitations – Map Fitting
• In an intricate study the only way to be sure that the work is correct is to make your own judgment from the electron density – this is never done.
• It can be done at http://eds.bmc.uu.se/eds/
• It requires that the experimental data (the structure factors) be available
100d
Limitations – Non-crystallographic Symmetry (NCS)
Limitations – Refinement
• Introduces restraints/constraints that may or may not be realistic
• Water molecules have sometimes been added unnecessarily
• Resolution quoted wrongly
• Standards have helped
• See for example: H. Weissig and P.E. Bourne 1999 Bioinformatics 15(10) 807-831. An Analysis of the Protein Data Bank in Search of Temporal and Global Trends
Limitations – Interpretation of the Biologically Active Molecule
http://www.pdb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/bioassembly_tutorial.html
1QQP
Limitations – Functional Annotation
• Functional annotation is ONLY in the publication NOT PDB
• Attempt to address this with GO assignments
• Attempt to address this with literature integration
• Structural genomics – function unknown
• One structure – one to many functions (power law)
– functions may be unrecognized since the PDB is relatively static
• Many efforts at functional annotation
Why Is Understanding Limitations Important?
• Later we will study reductionism – a key process in the use of biological data
• As a result of reductionism you will need to choose a representative structure for the task at hand
• Understanding the limitations of the experiment will help us do this
Summary of Important Features in using Structure Data Determined by X-ray
Crystallography
• Resolution is a key indicator – think about it relative to atomic resolution ie 1.54 Å for a C-C single bond
• Disorder (ie undetermined or alternative atomic coordinates) is a natural part of many structures
• R factor (all) describes the agreement of the model with the experimental data; lower is better. It should be below about 0.20 (Rfree below about 0.26)
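As a rough illustration of what the R factor measures, the sketch below computes R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs| over a list of reflections. The function name and the sample amplitudes are made up for illustration, and the sketch assumes the observed and calculated amplitudes are already on the same scale, which real refinement programs handle for you. Rfree is the same quantity computed over a test set of reflections held out of refinement.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor from observed and calculated amplitudes.

    Assumes both lists are on a common scale and in the same reflection order.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up structure factor amplitudes for four reflections.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]
print(r_factor(f_obs, f_calc))  # 25 / 200 = 0.125
```

By the rule of thumb above, 0.125 would count as good agreement, although a real structure has many thousands of reflections.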
Summary of Important Features in using Structure Data Determined by X-ray
Crystallography Cont.
• B (aka temperature) factors offer indicators of both the accuracy of a structure and its most mobile regions
• At right is 5EBX drawn with QuickPDB
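To make the B factor concrete: for an isotropic B factor, B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom. The sketch below reads B factors from fixed-column PDB ATOM records (temperature factor in columns 61-66), averages them per residue, and converts the average to an approximate RMS displacement. The helper names and the sample records are invented for illustration. The sketch ignores alternate locations and anisotropic records.

```python
import math
from collections import defaultdict


def mean_b_by_residue(pdb_lines):
    """Average the B factor over the ATOM records of each residue.

    Returns a dict keyed by (chain ID, residue number). HETATM records
    (waters, ligands) are skipped.
    """
    b_values = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("ATOM  "):
            chain_id = line[21]            # column 22
            res_seq = int(line[22:26])     # columns 23-26
            b_factor = float(line[60:66])  # columns 61-66
            b_values[(chain_id, res_seq)].append(b_factor)
    return {key: sum(vals) / len(vals) for key, vals in b_values.items()}


def rms_displacement(b_factor):
    """RMS displacement in angstroms implied by an isotropic B factor."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# Invented records in PDB fixed-column layout, not taken from a real entry.
SAMPLE = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00 10.00",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 20.00",
    "ATOM      3  N   GLN A   2      26.850  29.021   3.898  1.00 30.00",
    "HETATM    4  O   HOH A 101      10.000  10.000  10.000  1.00 50.00",
]

for residue, mean_b in sorted(mean_b_by_residue(SAMPLE).items()):
    print(residue, round(mean_b, 2), round(rms_displacement(mean_b), 2))
```

Residues whose average B factor is well above the rest of the chain are candidates for mobile or poorly determined regions.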
NMR
Features of NMR
• Limited in size (25-100 kDa) – provided labeled samples are obtainable
• Selected information on proteins to ~150 kDa
• Solution study – small sample needed for soluble proteins
• Only a few solid state studies
• Reveals hydrogen positions
• Leads to an ensemble of dynamical structures – these are rarely used in bioinformatics studies
• Useful in high throughput screens to determine protein-ligand interactions
• Used for phasing of X-ray structures ie the methods are synergistic
• Until recently not applicable to membrane proteins
NMR - Methodology
• Molecules are tumbling and vibrating with thermal motion
• Nuclei usually observed are 1H, 13C, 15N and 31P (13C and 15N typically through isotope labeling); in an external magnetic field these have two spin states, one aligned with and one opposed to the field
• Detects and assigns chemical shifts of atomic nuclei with non-zero spin
• The shifts depend on their electronic environments ie identities and distances of nearby atoms
• The system can be tuned to look at specific features of the characteristic spin moments
• 1H-1H interactions provide NOE constraints
• Better resolution is obtained when the molecule is tumbling fast – size slows this – offset by higher magnetic field strengths
• Protein must be soluble at high concentration and stable without aggregation – high throughput can show this and folded vs unfolded very quickly
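How an NOE becomes a distance constraint can be sketched as follows. In the simplest (isolated spin pair) approximation the NOE intensity falls off as r⁻⁶ with the proton-proton distance r, so a proton pair of known distance can calibrate the others: r = r_ref · (I_ref / I)^(1/6). The function name and all numbers below are illustrative assumptions. In practice such estimates are usually turned into loose upper bounds, not exact distances.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a proton-proton distance from an NOE cross-peak intensity.

    Uses the r**-6 dependence of the NOE, calibrated against a reference
    pair whose distance is known. Intensities share arbitrary units.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Made-up example: the reference pair is 2.0 angstroms apart, and the peak of
# interest is 64 times weaker than the reference peak.
print(noe_distance(intensity=1.0, ref_intensity=64.0, ref_distance=2.0))
```

A 64-fold drop in intensity corresponds to only a doubling of distance, which is why NOEs are informative only for protons within a few angstroms of each other.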
NMR – Methodology cont.
• Result is a set of distance constraints between pairs of atoms either bonded or non-bonded
• If there are sufficient constraints then an ensemble of possibilities results
• Often this ensemble is averaged and constraints adjusted to conform to normal bond lengths and distances
• Usually left with 15-30 members of the ensemble
• Ideally less than 1 Å RMSD between models (backbone only)
• Portions of the molecule with high motion have tell-tale signals eg apo calmodulin
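The RMSD criterion for an ensemble can be checked with a short calculation. The sketch below computes the pairwise RMSD between models from matched backbone coordinates. It assumes the models are already superimposed, as deposited NMR ensembles usually are, and it performs no fitting. The function names and the toy coordinates are made up for illustration.

```python
import math
from itertools import combinations


def rmsd(model_a, model_b):
    """RMSD in angstroms between two equally ordered lists of (x, y, z)."""
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must be non-empty and the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(model_a, model_b)
    )
    return math.sqrt(squared / len(model_a))


def pairwise_rmsd(models):
    """Map each pair of model indices (i, j), i < j, to its RMSD."""
    return {
        (i, j): rmsd(models[i], models[j])
        for i, j in combinations(range(len(models)), 2)
    }


# Toy ensemble: three models of a two-atom backbone fragment.
ensemble = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0)],
    [(0.0, 0.4, 0.0), (1.5, 0.4, 0.0)],
]
for pair, value in pairwise_rmsd(ensemble).items():
    print(pair, round(value, 2))
```

If the largest pairwise value is well above 1 Å for the backbone, the ensemble is loosely defined and choosing a single representative model needs more care.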
BMRB - http://www.bmrb.wisc.edu/
NMR Terms
• COSY/NOESY spectra: Allow through-bond (COSY) and through-space (NOESY) interactions between atoms to be measured, from which a 3D structure of the protein is generated (what we have discussed).
• TROSY (Transverse Relaxation Optimized Spectroscopy): Invented about 1997. First described by Professor Kurt Wüthrich. A method for getting sharper peaks on large proteins, so it is useful for analyzing larger protein systems. TROSY is best at higher fields. If the aim is to study a large complex or a chemical shift perturbation when a protein binds to a receptor using NMR, it is better to use a 900 MHz machine than a more standard lower-field machine.
• Solid state NMR: Requires wider-bore (63 or even 89 mm diameter) magnets than solution state NMR. The higher stored energy of these wide bore magnets means that they are significantly more difficult to build, and as a result high-field solid state NMR lags behind liquid state in terms of available field strength.
• Multidimensional (three- and four-dimensional) NMR: Introduced about 12-15 years ago. This technology has the advantage of resolving the severe overlap in 2D spectra.
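Spectrometers are named by their 1H resonance frequency, which is proportional to the magnetic field: ν = (γ/2π)·B0, with γ/2π ≈ 42.577 MHz per tesla for 1H. The small sketch below converts a spectrometer's nominal frequency to its field strength. The function name and the rounded constant are choices made for this example.

```python
# Reduced gyromagnetic ratio of 1H in MHz per tesla (rounded).
GAMMA_BAR_1H_MHZ_PER_T = 42.577


def field_strength_tesla(proton_frequency_mhz):
    """Magnetic field in tesla for a given 1H resonance frequency in MHz."""
    return proton_frequency_mhz / GAMMA_BAR_1H_MHZ_PER_T


for mhz in (500, 600, 900):
    print(mhz, "MHz ->", round(field_strength_tesla(mhz), 1), "T")
```

A 900 MHz machine therefore runs at roughly 21 T, compared with about 14 T for a 600 MHz machine.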
In both X-ray crystallography and NMR there is the danger that the
final structure reflects the model it was computed against
Additional Validation Checks
• Stereochemical quality
– Ramachandran plot outliers
– Dihedrals, bond lengths and angles
– Fold Deviation Score (FDS)
– Validation Server http://deposit.rcsb.org/validate/
Use the PDB Geometry Data
Electron Microscopy
• Able to look at large molecular assemblies
• Resolution now 30 Å to below 4 Å
• Cryo-EM preserves aqueous environment (no staining)
• Experimentally more tractable
• Can resolve images (direct measurement of phases) or diffraction patterns
• Can provide a 3D volumetric reconstruction
• Suitable for the study of membrane proteins eg bacteriorhodopsin (1990)
1KVP STRUCTURAL ANALYSIS OF THE SPIROPLASMA VIRUS, SPV4, IMPLICATIONS FOR EVOLUTIONARY VARIATION TO OBTAIN HOST DIVERSITY AMONG THE MICROVIRIDAE,
1P85 Real space refined coordinates of the 50S subunit fitted into the low resolution cryo-EM map of the EF-G.GTP state of E. coli 70S ribosome
• Single particle reconstruction – multiple orientations of the same particle found in the specimen (viruses, ribosome…)
• Electron tomography – 3D reconstruction of a single particle (organelles, whole cells)
Example EM Result
• Example for a hybrid study that combines elements of electron crystallography and helical reconstruction with homology modeling and molecular docking approaches in order to elucidate the structure of an actin-fimbrin crosslink (Volkmann et al., 2001b). Fimbrin is a member of a large superfamily of actin-binding proteins and is responsible for crosslinking of actin filaments into ordered, tightly packed networks such as actin bundles in microvilli or stereocilia of the inner ear. The diffraction patterns of ordered paracrystalline actin-fimbrin arrays (background) were used to deduce the spatial relationship between the actin filaments (white surface representation) and the various domains of the crosslinker (the two actin-binding domains of fimbrin are pink and blue, the regulatory domain cyan). Combination of this data with homology modeling and data from docking the crystal structure of fimbrin’s N-terminal actin-binding domain into helical reconstructions (Hanein et al., 1998), allowed us to build a complete atomic model of the crosslinking molecule (foreground, color scheme as in surface representation of the array).
• From Structural Bioinformatics 2005 p124
Example EM Result
• Example for a combination of high-resolution structural information from X-ray crystallography and medium-resolution information from electron cryomicroscopy (here 2.1 nm). Actin and myosin were docked into helical reconstructions of actin decorated with smooth-muscle myosin (Volkmann et al., 2000). Interaction of myosin with filamentous actin has been implicated in a variety of biological activities including muscle contraction, cytokinesis, cell movement, membrane transport, and certain signal transduction pathways. Attempts to crystallize actomyosin failed due to the tendency of actin to polymerize. Docking was performed using a global search with a density correlation measure (Volkmann and Hanein, 1999). The estimated accuracy of the fit is 0.22 nm in the myosin portion and 0.18 nm in the actin portion. One actin molecule is shown on the left as a molecular surface representation. The yellow area denotes the largest hydrophobic patch on the exposed surface of the filament, a region expected to participate in actomyosin interactions. The fitted atomic model of myosin is shown on the right. The transparent envelope represents the density corresponding to myosin in the 3D reconstruction. The solution set concept (see text) was used to evaluate the results and to assign probabilities for residues to take part in the interaction. The tone of red on the myosin model is proportional to this statistically evaluated probability (the more red, the higher the probability).
• From Structural Bioinformatics 2005 p127
Small-angle X-ray Scattering (SAXS)
• Reveals shape and size of macromolecules in the range 5-25 nm
• Handles partially ordered systems
• No need for crystalline sample; larger molecules than NMR, but at lower resolution
• Leading to hybrid techniques
http://en.wikipedia.org/wiki/Small-angle_X-ray_scattering
Summary Regarding Data Limitations
• Pay attention to the method, its pluses and minuses
• Be aware of models
• Be aware of the general limitations of each method
• For NMR be aware of an ensemble of structures
• Be aware of hybrid models
• For all methods be aware of the parameters that govern the accuracy
• You will need to know these limitations for just about any bioinformatics study since it will be necessary to choose a non-redundant set (NR) – we will visit Astral and Pisces which are tools in defining an NR set
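One simple way to see how data quality feeds into a non-redundant set is a greedy cull: rank chains by resolution and keep a chain only if it is not too similar to any chain already kept. The sketch below is a generic illustration and not the actual Astral or Pisces procedure. The function name, the 25% identity cutoff, the chain labels, and the precomputed identity table are all hypothetical.

```python
def cull_non_redundant(chains, identity, max_identity=0.25):
    """Greedy selection of a non-redundant set, best resolution first.

    chains: list of (chain_id, resolution_in_angstroms).
    identity: dict mapping frozenset({id_a, id_b}) to the pairwise sequence
        identity as a fraction; missing pairs are treated as unrelated.
    """
    kept = []
    for chain_id, _resolution in sorted(chains, key=lambda c: c[1]):
        similar = any(
            identity.get(frozenset((chain_id, other)), 0.0) > max_identity
            for other in kept
        )
        if not similar:
            kept.append(chain_id)
    return kept


# Hypothetical chains and identities, not real PDB entries.
chains = [("chainA", 2.5), ("chainB", 1.5), ("chainC", 2.0)]
identity = {
    frozenset(("chainA", "chainB")): 0.90,
    frozenset(("chainB", "chainC")): 0.10,
    frozenset(("chainA", "chainC")): 0.20,
}
print(cull_non_redundant(chains, identity))  # ['chainB', 'chainC']
```

Here chainA is dropped because it is 90% identical to the better-resolved chainB. A fuller version would also weigh R factor, experimental method, and completeness when ranking.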