Date post: 05-Feb-2020
Template Detection for 3D Structure Prediction. Michael Tress, CNIO
Page 1: Template Detection for 3D Structure Predictionubio.bioinfo.cnio.es/Cursos/ImmunoInformatics/StrPred/TempDetectIg.pdf · Similar PDB Template Exists If a similar PDB structure (or

   

Template Detection for 3D Structure Prediction

Michael Tress

CNIO


The growth of known sequences and structures


Why are structures so far behind?

Sequencing millions of DNA sequences is relatively easy, while the experimental determination of a single protein structure takes 1-3 years.

Small targets


Closing the Gap

In its simplest form all that is necessary to build a 3D structural model for a sequence with unknown structure (the query or target sequence) is a similar protein with known structure (a template).

Since experimental techniques for determining protein structure are relatively slow and expensive, modelling is a way of extending the set of known 3D structures.


Comparative modelling steps

Template Detection: Find one or more sufficiently similar structural templates in the PDB, by sequence search (BLAST, HMM, PSI-BLAST) or by structure-based techniques (hybrid, fold recognition etc.).

Alignment: All template detection methods generate alignments, and alignments can be edited.

Model Building: Can be the simple transfer of PDB coordinates, but most programs carry out more complex all-atom modelling.

Evaluation: Most methods assess model quality, i.e. whether the model 3D structure is protein-like.

MRGKVVVE
: :::.:.
MVGKVTVN


Template Detection


MREYKLVVLGSGGVGKSALTVQFVQGIFVDEYDPTIEDSYRKQVEVDCQQCMLEILDTAGTEQFTAMRDLYMKNGQGFALVYSITAQSTFNDLQDLREQILRVKDTEDVPMILVGNKCDLEDERVVGKEQGQNLARQWCNCAFLESSAKSKINVNEIFYDLVRQINR


Given a new sequence one fundamental question is "into which structure will it fold?"


In Fact Protein Structure is Highly Conserved

Early comparisons of the 3D structures of homologous proteins showed that structure is rarely affected by small changes in protein sequence.

For example, these two structures have only 20% sequence identity.

Even large sequence shifts may have little effect on the backbone structure.


The fact that structure is highly conserved means we can reliably predict structure based on homology

Structural space (more limited)

Sequence space (more diverse)

Comparative Modelling Target

Remotely Homologous Target

So we can make predictions for unknown sequences simply by searching databases of known 3D structures.


IWLKDFIAVGQILPESQWND-SSIDKIEDRDFLVRYACEPTAEKFVPIDI

IWLKDFIDIAEVYNAKKWAQ-VRQSVNKERNFLVRYVCEPVAENFVPVDI

LQLFNFIRVANVMDGSKWEV-LKGNVDPERDFTVRYICEPTGEKFVDINI

IYLRDLQFVANIKNEKEYLDSVNEGKMDSNMFLCRSACLPSGTNLADLDI

IKLEQIIDKAKVLNQTEWNLLSTTDKGYETTYFLRYACEPSSSNFVAIDI

VKFIDFIRPINVLSESQFADVVIDESNSHSTFLVKRATDNEGNFSDIFDY

All these different sequences fold the same way

Even remotely related sequences can have similar folds

The development of template detection methods that use structural information (fold recognition etc.) came from the observation that even proteins with apparently unrelated sequences had very similar 3-dimensional structures.


Template Identification

Template Detection: A database search program such as FASTA or BLAST is usually sufficient to detect structural templates from a search of the sequences in the PDB database.

This should always be the first step because the results of this search will condition the approach.

Domains: Many proteins are made up of several structural domains. This complicates searching for templates. A domain search should be carried out at the same time as the sequence search.

Domain orientation is often not conserved between homologous proteins.

Multiple Templates: Using more than one template can be advantageous.


Detecting Remote Templates


Similar PDB Template Exists: If a similar PDB structure (or structures) is found from the search with BLAST, you can move on to the comparative modelling step (after checking the alignment).

No Similar PDB Template Exists: If no template can be found for a domain, more work will be needed, particularly with the alignment.

The predictor may also have to use more complex sequence search methods (PSI-BLAST, FFAS, HMMs) or fold recognition techniques.

If the sequence identity between the target sequence and the nearest template falls below 25-30%, homology models are less likely to be successful.

Pairwise sequence search methods detect folds when sequence similarity is high, but are very poor at detecting relationships that have less than 20% identity.

No Template Found by BLAST?


Structural space (more limited)

Sequence space (more diverse)

Comparative Modelling Target

Remotely Homologous Target

The relationship between sequence and structure space applies to more distant folds too

Hybrid and fold recognition techniques can find folds that sequence-based methods cannot, because they use structural information as well as sequence similarity to evaluate templates.


Advanced Sequence-Based and Hybrid Techniques

PSI-BLAST: The more sequences there are in an alignment, the more likely the alignment is to be reliable. PSI-BLAST uses multiple alignments (profiles) to search for templates.

Profile methods can be as accurate as many fold recognition techniques at detecting remote homology, and expert users can usually spot biologically meaningful templates from careful analysis of low-scoring hits.

Hidden Markov Models: Hidden Markov models regard the sequence as a series of nodes, each node corresponding to a column in a multiple alignment.

HMMs are very similar to profiles.
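
As an illustration of what "a profile built from a multiple alignment" means, here is a minimal sketch. It is not how PSI-BLAST or any HMM package is implemented: the function names, the uniform background frequency and the pseudocount of 1 are all assumptions made for the example. The input is a ten-column fragment taken from the "All these different sequences fold the same way" alignment.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Turn aligned sequences into one log-odds score table per column.

    Assumptions for this sketch: uniform background (1/20 per residue)
    and a flat pseudocount; real profile methods use weighted counts.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        # Gap characters are simply ignored when counting residues.
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score_sequence(profile, sequence):
    """Ungapped score of a sequence of the same length as the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(column[aa] for column, aa in zip(profile, sequence))

# Columns 32-41 of five rows of the slide alignment.
FRAGMENTS = ["FLVRYACEPT", "FLVRYVCEPV", "FTVRYICEPT", "FLCRSACLPS", "YFLRYACEPS"]

profile = build_profile(FRAGMENTS)
print(score_sequence(profile, "FLVRYACEPT"))  # a family member
print(score_sequence(profile, "GGGGGGGGGG"))  # an unrelated sequence
```

A family member scores well above an unrelated sequence because every column rewards the residues that were actually observed there, which is the information a single query sequence cannot provide.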


META-PROFILES (Hybrid methods): Coincidence of real and predicted secondary structure and accessibility also generally improves alignments. Hybrid methods are profile-based methods that include predicted secondary structure.

Adding structural information to the profiles helps find distant but structurally similar homologues.

Profile-Profile Searching Methods: Profile-profile alignment methods use evolutionary information in both query and template sequences. As a result, they are able to detect remote homologies beyond the reach of other sequence comparison methods.
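
The idea of comparing two profiles column by column can be sketched as follows. This is a toy version under stated assumptions, not the scoring scheme of FFAS, HHpred or any other server: the dot-product column score, the linear gap penalty of -0.1 and all function names are chosen only for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(alignment, pseudocount=0.1):
    """Residue frequencies per alignment column (gaps ignored)."""
    length = len(alignment[0])
    columns = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        columns.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return columns

def column_score(p, q):
    """Dot product of two frequency vectors, shifted so that a pair of
    uniform (uninformative) columns scores zero. Assumed scoring scheme."""
    return sum(p[aa] * q[aa] for aa in AMINO_ACIDS) - 1.0 / len(AMINO_ACIDS)

def align_profiles(query, template, gap=-0.1):
    """Global (Needleman-Wunsch style) alignment score of two profiles."""
    n, m = len(query), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,   # gap in the template profile
                dp[i][j - 1] + gap,   # gap in the query profile
            )
    return dp[n][m]

# Hypothetical mini-families used only to exercise the functions.
query_family = column_frequencies(["VRYACEP", "VRYVCEP"])
related_family = column_frequencies(["VRYICEP", "CRSACLP"])
unrelated_family = column_frequencies(["GGGGGGG", "GGGSGGG"])

print(align_profiles(query_family, related_family))
print(align_profiles(query_family, unrelated_family))
```

Because both sides contribute column frequencies, a position that varies in the query family can still match a template column that tolerates the same residues, which is where the extra sensitivity over sequence-profile searches comes from.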

More Advanced Sequence-Based and Hybrid Techniques


Hybrid and Sequence-Based Servers

SAM T02 - www.cse.ucsc.edu/research/compbio/HMM-apps/T02-query.html
The query is checked against a library of hidden Markov models. This is a sequence-based technique, but it does use secondary structure information.

Meta-BASIC - at bioinfo.pl
Meta-BASIC is based on consensus alignments of profiles. It combines sequence profiles with predicted secondary structure and uses several scoring systems and alignment algorithms.

FFAS - ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl
FFAS03 is a profile-profile alignment method, which takes advantage of the evolutionary information in both query and template sequences.

HHPRED - toolkit.tuebingen.mpg.de/hhpred
HHpred is based on the pairwise comparison of profile hidden Markov models. It searches domain databases, like Pfam or SMART, instead of single sequences and is meant to be as easy to use as BLAST, but more sensitive at finding remote homologues.


MREYKLVVLGSGGVGKSALTVQFVQGIFVDEYDPTIEDSYRKQVEVDCQQCMLEILDTAGTEQFTAMRDLYMKNGQGFALVYSITAQSTFNDLQDLREQILRVKDTEDVPMILVGNKCDLEDERVVGKEQGQNLARQWCNCAFLESSAKSKINVNEIFYDLVRQINR


Fold Recognition

When fold recognition methods were first developed it was thought that they could detect analogous proteins, those that were structurally similar but had no evolutionary relationship.

In fact most of these predictions were later shown to be homologous (have an evolutionary relationship) once advanced sequence comparison methods, such as PSI-BLAST, were developed.

Fold recognition methods are used because they allow you to find more distant templates.

There is also research to show that no single method can always identify the correct fold, so it is often advantageous to use more than one method.

In fold recognition we are asking the opposite of the question we asked earlier: “given a known protein structure, does the target sequence fit?”


Fold recognition methods have built-in fold libraries

Fold recognition methods work by superimposing the target onto a database of known 3D structures (folds) and evaluating the sequence-fold alignments.

Each method has its own non-redundant database of folds to save calculation time.
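
The loop "fit the target to every fold in the library, then rank the fits" can be sketched in a few lines. Everything here is a simplifying assumption for illustration: each fold is reduced to a string of buried (B) and exposed (E) positions, the only score is whether hydrophobic residues land on buried positions, and the sequence is slid along the fold without gaps. Real fold recognition programs use much richer scoring and gapped alignment.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed residue classification

def fit_score(sequence, burial):
    """+1 when a hydrophobic residue sits on a buried position or a polar
    residue on an exposed one, -1 otherwise."""
    score = 0
    for residue, environment in zip(sequence, burial):
        hydrophobic = residue in HYDROPHOBIC
        buried = environment == "B"
        score += 1 if hydrophobic == buried else -1
    return score

def best_threading(sequence, burial):
    """Slide the sequence along the fold (no gaps) and keep the best fit.
    Returns (score, offset), or None if the fold is shorter than the sequence."""
    best = None
    for offset in range(len(burial) - len(sequence) + 1):
        window = burial[offset:offset + len(sequence)]
        score = fit_score(sequence, window)
        if best is None or score > best[0]:
            best = (score, offset)
    return best

def rank_folds(sequence, library):
    """Evaluate the sequence against every fold and sort best-first."""
    results = []
    for name, burial in library.items():
        hit = best_threading(sequence, burial)
        if hit is not None:
            results.append((hit[0], name, hit[1]))
    return sorted(results, reverse=True)

# Hypothetical two-fold library and query, for illustration only.
FOLD_LIBRARY = {"fold_a": "EEBBEBBEEE", "fold_b": "BEBEBEBEBE"}
QUERY = "DKVLSAIE"

for score, name, offset in rank_folds(QUERY, FOLD_LIBRARY):
    print(name, score, offset)
```

Keeping the library non-redundant matters for the same reason it does in this loop: every extra fold costs one more full evaluation of the target.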


Fold recognition - a summary

Here you start with the query sequence …

(Fold recognition programs use a library of folds …)

Align the sequence with all folds from the library.

Evaluate the fit between the target and all the structures.


Evaluation Sequence-Structure Fit

Instead of aligning two sequences (LKRPMT ... and FRIPLN ...):

Take the template structure …

Align the target sequence with the template structure …

Evaluate whether this would be a good structure (secondary structure, accessibility, contacts).

Align again …


Scoring Functions

Scoring functions for evaluating the sequence-structure fit can include any or all of the following:

● The similarity between the usual and observed residue structural environment

● Pair potentials

● Solvation energy

● Coincidence of real and predicted secondary structure or accessibility

● Evolutionary information (from aligned structures and sequences)
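
One of the terms listed above, the coincidence of predicted and observed secondary structure, is simple enough to sketch, along with the idea of combining several terms into one score. The three-state strings (H, E, C), the term names and the weights below are hypothetical; each server chooses and calibrates its own terms.

```python
def ss_agreement(predicted, observed):
    """Fraction of aligned positions where the predicted secondary structure
    of the target equals the observed state of the template.
    Positions with a gap ('-') on either side are skipped."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if p != "-" and o != "-"]
    if not pairs:
        return 0.0
    return sum(p == o for p, o in pairs) / len(pairs)

def combined_score(terms, weights):
    """Weighted sum of individual scoring terms (weights are assumptions)."""
    return sum(weights[name] * value for name, value in terms.items())

# Illustrative values only; none of these numbers come from a real server.
terms = {
    "secondary_structure": ss_agreement("HHHHCCEEEE", "HHHCCCEEEE"),
    "profile": 0.6,
    "pair_potential": 0.4,
}
weights = {"secondary_structure": 0.5, "profile": 0.3, "pair_potential": 0.2}
print(combined_score(terms, weights))
```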


Fold Recognition Servers

3D-PSSM - www.sbg.bio.ic.ac.uk/~3dpssm/
Based on sequence profiles, solvation potentials and secondary structure.

mGenTHREADER - www.psipred.net/
Combines profiles and sequence-structure alignments. A neural network-based jury system calculates the final score based on solvation and pair potentials.

RAPTOR - software.bioinformatics.uwaterloo.ca/~raptor/
Best-scoring server in the CAFASP3 competition in 2002. The ACE server (based on Raptor) was the best FR server in CASP6. You have to ask to use it first ...

SPARKS2,3,4 - http://sparks.informatics.iupui.edu/
Top servers in CASP6. Sequence, secondary structure Profiles And Residue-level Knowledge-based Score for fold recognition.


Consensus Fold Recognition

No one method can hope to correctly identify every fold. In fact, the best predictions are often those on which several servers agree.

Human experts have recognised this, and generally they are better at fold prediction than machines.

Human experts usually use several different fold recognition methods and predict folds after evaluating all the results (not just the top hits) from a range of methods.

So why not produce an algorithm that mimics the human experts?

The first consensus server, Pcons, sent the target sequence to six publicly available fold recognition web servers.

Predictions were structurally superimposed and evaluated for their similarity. The best model was predicted from similarity to other predicted models.
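
The selection rule "pick the model most similar to the other models" can be sketched as below. This is a deliberately simplified stand-in, not the Pcons algorithm: instead of superimposing 3D models, each prediction is represented as a set of (target position, template id, template position) pairs and similarity is measured as the Jaccard overlap of those sets. The server names and data are hypothetical.

```python
def similarity(model_a, model_b):
    """Jaccard overlap of two sets of aligned pairs (assumed similarity measure)."""
    union = model_a | model_b
    if not union:
        return 0.0
    return len(model_a & model_b) / len(union)

def consensus_pick(predictions):
    """Return (name, mean similarity) of the prediction that agrees best,
    on average, with all the other predictions."""
    best_name, best_mean = None, -1.0
    for name, model in predictions.items():
        others = [similarity(model, other)
                  for other_name, other in predictions.items()
                  if other_name != name]
        mean = sum(others) / len(others) if others else 0.0
        if mean > best_mean:
            best_name, best_mean = name, mean
    return best_name, best_mean

# Hypothetical output of four servers for the same target.
predictions = {
    "server_a": {(1, "X", 1), (2, "X", 2), (3, "X", 3)},
    "server_b": {(1, "X", 1), (2, "X", 2), (3, "X", 4)},
    "server_c": {(1, "Y", 10), (2, "Y", 11)},
    "server_d": {(1, "X", 1), (2, "X", 2), (3, "X", 3), (4, "X", 4)},
}
print(consensus_pick(predictions))
```

The outlier (server_c) is never selected regardless of how confident its own score was, which is the point of a consensus: agreement between independent methods is treated as the evidence.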


Consensus Fold Recognition Servers

3D Jury - http://bioinfo.pl/meta/
3D Jury is a consensus predictor that utilizes the results of fold recognition servers, such as FFAS, 3D-PSSM, FUGUE and mGenTHREADER, and uses a jury system to select alignments and templates. Models are built with Modeller.

GeneSilico - http://genesilico.pl/meta/
A gateway to various methods for protein structure prediction. Domains are identified by HmmPfam, and there are several methods for secondary and tertiary structure (FR) prediction. Consensus predictions are made with the Pcons consensus server and you can also send a subset of alignments to the FRankenstein server.

Pcons - www.sbc.su.se/~arne/pcons/
Pcons was the first consensus server for fold recognition. It is built into both the main consensus servers, but now can be used on its own once again.


Alignments


Alignments

TARGET    FLGAAQIMPDDQFEKSSLDKIRDN--IVRIAMDPSSDKFLAMDIFNV--RGRELEEFLKKLS
          |....||.|..|...||.|||.|.  .||.|..|...||...|||..  |..|..|.||..|
TEMPLATE  FIAVGQILPESQWNDSSIDKIEDRDFLVRYACEPTAEKFVPIDIFQIIRRVKEMDEYLKRVS

The quality of any comparative model depends mostly on the alignment between the target and template sequences.

TARGET    FIALGQILSESQYQDSSIDKIEDRDIIVRYGCEPTADKFVPIEIFQILRRVKEMDEFLKRVS
          |||.||||.|||..|||||||||||..|||.|||||.|||||.||||.||||||||.|||||
TEMPLATE  FIAVGQILPESQWNDSSIDKIEDRDFLVRYACEPTAEKFVPIDIFQIIRRVKEMDEYLKRVS

Generally, the alignment is more likely to be correct if there is high sequence similarity and few gaps between the two sequences ...

… aligning two sequences becomes more complicated and less reliable when % identity falls and gaps are introduced.
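
The two quantities mentioned here, percentage identity and the number of gaps, are easy to compute from an alignment. The sketch below does so for the two target-template pairs shown on this slide; the function name and the convention of computing identity over non-gap columns only are assumptions for the example.

```python
def alignment_stats(target, template):
    """Count aligned columns, identical columns and gap columns
    in a pairwise alignment given as two equal-length strings."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = gaps = 0
    for a, b in zip(target, template):
        if a == "-" or b == "-":
            gaps += 1
        else:
            aligned += 1
            if a == b:
                identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return {"aligned": aligned, "identical": identical,
            "gaps": gaps, "percent_identity": percent}

# The template row is the same in both alignments on the slide.
TEMPLATE = "FIAVGQILPESQWNDSSIDKIEDRDFLVRYACEPTAEKFVPIDIFQIIRRVKEMDEYLKRVS"
GAPPED_TARGET = "FLGAAQIMPDDQFEKSSLDKIRDN--IVRIAMDPSSDKFLAMDIFNV--RGRELEEFLKKLS"
CLOSE_TARGET = "FIALGQILSESQYQDSSIDKIEDRDIIVRYGCEPTADKFVPIEIFQILRRVKEMDEFLKRVS"

for name, target in (("gapped", GAPPED_TARGET), ("close", CLOSE_TARGET)):
    print(name, alignment_stats(target, TEMPLATE))
```

For these two examples the close target has 51 identical positions out of 62 with no gaps, while the gapped target has 26 identical positions out of 58 aligned columns and four gap columns.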


Alignments come free, but are not always reliable

1aac DKATIPSEPFAAAEVADGAIVVDIAKMKYETPELHVKVGDTVTWINREAMPHNVHFVAGV
                           : ... .   :. .. :. ... :  ..:::. :
1plc                IDVLLGADDGSLAFVPSEFSISPGEKIVFKNNAGFPHNIVFDEDS

1aac L--GEAALKGPMMKKE------QAYSLTFTEAGTYDYHCTPHPF--MRGKVVVE
     .  :  : :  : ...      ... ..... : :...:.::    : :::.:.
1plc IPSGVDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQGAGMVGKVTVN

All methods of template detection, whether sequence-based, fold recognition or hybrid, generate alignments between the query sequence and the PDB template sequence as a side effect of the search.

If an accurate 3D model is to be built, it is vital that the target-template alignments are correct. Particularly at lower percentage identity, the biggest errors stem from the alignments.

Alignments from sequence-based methods are not always reliable: they are usually biased towards sequence evolution, not structure. Fold recognition alignments are not any more reliable.


Multiple Alignments

Generally, the more evolutionarily related sequences there are in an alignment, the more likely the alignment is to be reliable.

For that reason most people generate alignments with multiple sequence alignment programs or profile-based methods.

Since the alignments are for the purposes of generating structures, it is important to take into account structural information such as the coincidence of real and predicted secondary structure and accessibility.

Alignments should be edited manually using actual and predicted secondary structure and accessibility information, and careful placement of gaps.

MREYKIVVLGSGGVGKSALTVQFVQCIFVEKYDPTIEDSYRKQVEVDGQQCMLEILDTAGTEQFTAMRDLYMKNGQGFVLQINKK-KSFKVVLLGEGCVGKTSIVFRYIDNIFNDKHLMTQHAGFFKHINIGGKRICLTIWDTAGQERFHALGPIYYRGSQGALLTRRM----LKIVVVGDGAVGKTCLLISYVQGTFPTDYIPTIFENYVTNIEPNGQIIELALWDTAGQEEYSRLRPLSYTNADVLMVIHTLLS---IKFLALGDSGVGKTSVLYQYTDGKFNSKFITTVGIDFREGTVGRGQRIHLQLWDTAGQERFRSLTTAFFRDAMGFLLLDLIMK--TFKVLLVGDSGVGKSCILTRFTSGIFEESTTSTIGVDFKKYLTADGKRCKLTIWDTAGQERFRTLTSSYYRGAQGIIFVLKILQEKYRLVVVGGGGVGKSALTIQFIQSYFVTDYDPTIEDSYTKQCVIDDRAARLDILDTAGQEEFGAMREQYMRTGEGFLVRVIRKMQSIKLVVVGDGAVGKTCLLISYTSNSFPTEYVPTVFDNYSANVMVDNKTVSLGLWDTAGQEDYDRLRPLSYPQTDVFLICAV------FKVVLIGDSGVGKSNLLSRFTRNEFNLESKSTIGVEFARSIQVDGKTIKAQIWDTAGQERYRAITSAYYRGAVGALLVEIYR----KLVLLGDVGAGKSSLVLRFVKDQFVEFQESTIGAAFFSQLAVNDATVKFEIWDTAGQERYHSLAPMYYRGAAAAIIAKRLP---YKIILVGESGVGKSSILVRFTDNTFSQHFAPTLGVDFNVKTIETGQTVKLQLWDTAGQERFKSITQTFYRGSHGVIVVYIEA


Sequence Alignment vs. Structural Alignments

PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS   TEMPLATE
 |   .   .   |   |       |               |   |   .   |
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS   TARGET (ALIGNMENT 1)
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS   TARGET (ALIGNMENT 2)

"Alignment 1" may be chosen because of the PRO at position 7. But the 10 Angstrom gap is too big to close with a single peptide bond.

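The geometric argument against "Alignment 1" can be turned into a small check. In that alignment the target residues placed on template positions 7 and 11 are consecutive in the target chain, so the template coordinates at those two positions must be bridged by a single peptide bond. The sketch below assumes an approximate distance of 3.8 Angstrom between consecutive C-alpha atoms; the function name and the example coordinates are hypothetical.

```python
import math

CA_CA_DISTANCE = 3.8  # approximate C-alpha to C-alpha distance in Angstrom

def can_close_gap(ca_before, ca_after, residues_between=0):
    """Check whether a deletion can be closed without distorting the template.

    ca_before, ca_after: (x, y, z) template C-alpha coordinates on either
    side of the deleted segment. residues_between: number of target
    residues available to span the gap (0 means one peptide bond joins
    the two flanking residues directly).
    """
    distance = math.dist(ca_before, ca_after)
    max_span = CA_CA_DISTANCE * (residues_between + 1)
    return distance <= max_span

# Hypothetical coordinates 10 Angstrom apart, as in the slide's example.
print(can_close_gap((0.0, 0.0, 0.0), (10.0, 0.0, 0.0)))      # one bond
print(can_close_gap((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 2))   # two extra residues
```

A check like this is why the deletion is better placed where the flanking template residues are close in space, as in "Alignment 2", even if the sequence match looks slightly worse.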

Simplified Protein Structure Prediction Flow Chart


Simplified Protein Structure Prediction Flow Chart


Acknowledgments

Ana Rojas

Florencio Pazos

Lawrence Kelley

Arne Elofsson

and anyone else whose figures I used ...

CASP6: Osvaldo Graña, Iakes Ezkurdia, Gonzalo Lopez, Alfonso Valencia

CASP7: Osvaldo Graña, Iakes Ezkurdia, Gonzalo Lopez, Alfonso Valencia, Ana Rojas, Txema G. Izarzugaza
