
Protein structure prediction with a focus on Rosetta

Date post: 22-Jan-2018
Upload: bcbbslides
Page 1: Protein structure prediction with a focus on Rosetta

PROTEIN STRUCTURE PREDICTION WITH A FOCUS ON ROSETTA

This presentation was prepared by: Xavier Ambroggio, [email protected]

OFFICE OF CYBER INFRASTRUCTURE AND COMPUTATIONAL BIOLOGY

NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES

Page 2: Protein structure prediction with a focus on Rosetta

Fall 2011 Computational Structural Biology Seminar Series

9 – 11 AM, T/Th in 12A/B51 http://training.cit.nih.gov

Week | Day | Date | Course | Instructor | CIT Course #
1 | Tues | Aug. 23 | Fundamentals, Data Sources, and Visualization of Macromolecular Structure | Darrell Hurt | SS260-11001
1 | Thurs | Aug. 25 | Generating Protein Structures from Homology | Darrell Hurt | SS270-11001
2 | Tues | Aug. 30 | Predicting Protein Structures from Amino Acid Sequences | Xavier Ambroggio | SS660-11001
2 | Thurs | Sept. 1 | Predicting Macromolecular Complexes from Uncomplexed Structures | Xavier Ambroggio | SS670-11001
3 | Tues | Sept. 6 | Design and Analysis of Macromolecular Interfaces | Xavier Ambroggio | SS770-11001
3 | Thurs | Sept. 8 | Analysis and Advanced Visualization of Macromolecular Structure | Darrell Hurt | SS330-11001
4 | Tues | Sept. 13 | Computational Drug Design | Mike Dolan | SS340-11001
4 | Thurs | Sept. 15 | Introduction to Molecular Dynamics | Mike Dolan | TBA
5 | Thurs | Sept. 22 | Advanced Molecular Dynamics | Mike Dolan | TBA

Page 3: Protein structure prediction with a focus on Rosetta

Bioinformatics and Computational Biosciences Branch

Scientific Collaboration

Scientific Training

Custom Scientific Software & Infrastructure

•  Structural Biology
•  Phylogenetics
•  Statistics
•  Sequence Analysis
•  Microarray Analysis
•  NGS Analysis
•  Bioinformatics
•  Biological Networks
•  Function Prediction
•  …

Page 4: Protein structure prediction with a focus on Rosetta

Ab Initio Structure Prediction: Given an amino acid sequence, find the tertiary structure

“Protein folding problem”

Page 5: Protein structure prediction with a focus on Rosetta

CASP: Critical Assessment of protein Structure Prediction

http://predictioncenter.org

•  Double-blind experiment (…competition)
•  World-wide scientific community
•  Unbiased assessment of techniques in structure prediction
•  Biennial (every even year)
•  “Pulse” of the prediction community
•  What can be predicted?
•  Which servers/algorithms perform best?

Page 6: Protein structure prediction with a focus on Rosetta

CASP Overview

Blutsbrüder Design

Page 7: Protein structure prediction with a focus on Rosetta

CASP Top Free-Modeling Servers

Why Rosetta focus?
•  Standalone
•  Versatile
    - RNA
    - design
    - dock
    - …
•  Open Source
•  Substantial Literature
•  Shared methodology

Use any and all available servers!!!

Page 8: Protein structure prediction with a focus on Rosetta

Das & Baker Annu. Rev. Biochem 2008

prediction

design

Rosetta: multipurpose macromolecular modeling suite

CIT Course # SS660-11001

CIT Course # SS670-11001

CIT Course # SS770-11001

Page 9: Protein structure prediction with a focus on Rosetta

Brief Description of Select Rosetta Functions

ab initio: predict the structure from sequence
relax: refine the structure using Rosetta energy functions
idealize: replace bond geometries with ideal values
loop modeling: build and refine local structurally variable regions in context of a structural template
design: optimize sequence given a structure with a fixed backbone
docking: structure prediction for a protein-protein complex given subunits
ligand: ligand docking
ddG prediction: ddG calculations for mutations, covering protein-protein interfaces and protein stability
scoring: score input conformations with Rosetta energy functions
RNA: predict RNA structures from sequences and design sequences from fixed structures
clustering: group input structures by RMSD to each other for structure prediction analysis
backrub: generate alternate backbone conformations based on sets of rotations
membrane ab initio: predict the structures of helical membrane proteins
enzyme design: redesign a protein around a ligand
domain assembly: fixed domains connected by variable regions
antibody: automated antibody homology modeling
XML parsing: parse XML scripts into protocols

Page 10: Protein structure prediction with a focus on Rosetta

What types of protein domains can Rosetta fold?

Small, globular, soluble protein domains…

Small, simple membrane protein domains…

…but not complex domains or multi-domain proteins.

[Figure panels A, B, C. Labels: T4-lysozyme C-terminal domain; V-type Na+ ATP synthase subunit; rhodopsin]

Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop

Page 11: Protein structure prediction with a focus on Rosetta

What are the success rates?

High resolution predictions are achievable

•  targets ≤100 residues
•  success rate ~30%
•  success rate with accurate secondary structure ~50%
•  a hallmark of accuracy: convergence

Slide content courtesy Rhiju Das, Baker Lab

Page 12: Protein structure prediction with a focus on Rosetta

What types of protein domains can no one fold?

CASP9: domains with no good FM predictions

•  Non-globular
•  Trimeric
•  Fe stabilized
•  High contact order: many residues close in 3D, far in 1D
•  + elongated sheet?

Targets: T0591d1 (3MWT), T0550d2 (3NQK), T0629d2 (2XGF)

Slide content adapted from talk given by Lisa Kinch of the Grishin lab at CASP9 meeting: http://predictioncenter.org/casp9/
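To make the "high contact order" bullet concrete, here is a minimal sketch of relative contact order under one commonly used definition: the mean sequence separation of contacting residue pairs, divided by chain length. The contact lists and the chain length of 40 are invented for illustration; they are not taken from the CASP9 targets.

```python
def relative_contact_order(contacts, n_res):
    """Mean sequence separation of contacting residue pairs, divided by chain length."""
    if not contacts:
        return 0.0
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (len(contacts) * n_res)

# Hypothetical contact lists for a 40-residue chain (residue index pairs).
short_range = [(1, 5), (2, 6), (10, 14), (11, 15)]    # helix-like: close in 1D
long_range = [(1, 40), (2, 39), (5, 36), (10, 30)]    # sheet-like: far in 1D

rco_short = relative_contact_order(short_range, 40)
rco_long = relative_contact_order(long_range, 40)
print(rco_short, rco_long)
```

A higher value means more contacts between residues that are far apart in sequence, which is the situation the slide flags as hard for free modeling.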

Page 13: Protein structure prediction with a focus on Rosetta

Basic Ab Initio Rosetta protocol

1.  Select fragments consistent with local sequence preferences
2.  Assemble fragments into models with native-like global properties
3.  Identify the best model from the population of decoys

Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”
Figures adapted from Charlie Strauss; Protein structure prediction using ROSETTA, Rohl et al (2004) Methods in Enzymology, 383:66

Page 14: Protein structure prediction with a focus on Rosetta

[Figure: Fragment-Based Structure Prediction (Rosetta, Quark, …). Labels: Fragment, Assembly, Decoy]

[Figure: Homology modeling. Labels: Template(s), Alignment, Model]

Page 15: Protein structure prediction with a focus on Rosetta

First atomic-resolution model

Target 0281 CASP6
•  Topology sampled by ab initio trajectory of homolog sequence (rmsd=2.2Å)
•  Full atom refinement reduces rmsd to 1.5Å
•  Side chain packing accurately recovered

Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology” Figures adapted from Bradley P, Malmström L, Qian B, Schonbrun J, Chivian D, Kim DE, Meiler J, Misura KM, Baker D. Free modeling with Rosetta in CASP6. Proteins.

Page 16: Protein structure prediction with a focus on Rosetta

Folding Theory: Sequence-Structure Relationships

•  Secondary structure formation is the earliest part of the folding process
•  Local sequence codes for local structures… i.e. fragments
    - helical sequences in a folded protein tend to be helical in isolation
•  Secondary structure prediction algorithms have ~70-80% accuracy
    - Partial failure due to tertiary interactions stabilizing secondary structure elements

Page 17: Protein structure prediction with a focus on Rosetta

Rosetta fragments

•  3 and 9 residue fragments matched to query sequence
•  database created from crystal structures
    - < 2.5Å resolution
    - < 50% sequence identity
•  low resolution modeling
    - centroid representation of side chains
•  ranked by:
    - alignment
    - secondary structure predictions: PSI-PRED, SAM-T02, Jufo, PhD

Page 18: Protein structure prediction with a focus on Rosetta

Sliding fragment windows

query:    KVFGRCELAAAMKRHGLDNYRGYSLGNWVC...
3-mers:   KVF  VFG  FGR  GRC
9-mers:   KVFGRCELA  VFGRCELAA  FGRCELAAA  GRCELAAAM
sec str:  EEEE TT S EEEEEEE TT HH...

Slide content courtesy David Hoover, CIT, NIH
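The sliding windows on this slide can be reproduced with a few lines of Python. This is only a sketch of the windowing step; it does not pick fragments from a structure database, and the function name is made up for illustration.

```python
def sliding_windows(sequence, size):
    """Return every contiguous window of `size` residues, in sequence order."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

# Visible part of the query sequence shown on the slide.
query = "KVFGRCELAAAMKRHGLDNYRGYSLGNWVC"

three_mers = sliding_windows(query, 3)
nine_mers = sliding_windows(query, 9)

print(three_mers[:4])  # ['KVF', 'VFG', 'FGR', 'GRC']
print(nine_mers[:4])   # ['KVFGRCELA', 'VFGRCELAA', 'FGRCELAAA', 'GRCELAAAM']
```

Each window position then gets its own ranked list of candidate fragments, one list per 3-mer window and one per 9-mer window.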

Page 19: Protein structure prediction with a focus on Rosetta

Example 3-mer fragment library

Query: G K L M Q E R A

#     Rank    Fragment
13    1000    G K L
25    821     G R L
46    1000    K L M
21    635     R L M
43    923     K V M
26    523     R V M
15    970     M Q E
26    934     E R A

Separate 3-mer and 9-mer libraries generated

Slide content courtesy David Hoover, CIT, NIH

Page 20: Protein structure prediction with a focus on Rosetta

Making Fragment Libraries with Robetta

http://robetta.bakerlab.org/

Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop

Page 21: Protein structure prediction with a focus on Rosetta

Making Fragment Libraries on Biowulf

Slide content by David Hoover from: http://biowulf.nih.gov/apps/Rosetta23.html#RosettaFragments

Page 22: Protein structure prediction with a focus on Rosetta

Folding Theory: The Folding Landscape

•  Levinthal paradox:
    - With three possible states per residue (alpha, beta, or loop), a protein of nres residues has 3^nres possible conformations.
    - If nres = 100, sampling one conformation every 10^-13 seconds would take ~10^27 years to fold.
    - The universe is ~10^10 years old.
    - Folding is non-random and cooperative.
•  Many different combinations of secondary structure elements have similar stabilities
    - Tertiary (side-chain level) interactions drive folding towards the native topology
    - Phase transition results in a substantial energy gap between native and non-native structures
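The Levinthal estimate is simple arithmetic and can be checked directly. The sketch below uses the slide's numbers (3 states per residue, 100 residues, one conformation per 10^-13 s); the year length of 365.25 days is an assumption of this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length for the conversion

def levinthal_years(nres, states_per_residue=3, seconds_per_sample=1e-13):
    """Years needed to enumerate every conformation once at the given sampling rate."""
    conformations = states_per_residue ** nres
    return conformations * seconds_per_sample / SECONDS_PER_YEAR

years = levinthal_years(100)
print(f"{years:.1e} years")  # on the order of 10^27 years
```

Compared with a universe age of roughly 10^10 years, exhaustive search is off by about 17 orders of magnitude, which is why the slide concludes that folding cannot be a random search.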

•  Cyrus Levinthal, J. Chim. Phys. 65, 44; 1968
•  Hue Sun Chan and Ken A. Dill, Protein Folding in the Landscape Perspective: Chevron Plots and Non-Arrhenius Kinetics, Proteins: Structure, Function, and Genetics, Volume 30, No. 1, January 1998, pp 2-33.

Implications and requirements for folding algorithm:

•  Fast conformational sampling algorithm

•  Accurate scoring function

•  Full-atom modeling

Page 23: Protein structure prediction with a focus on Rosetta

[Figure: three stages of models]
•  early centroid models: Assembly
•  centroid models: Coarse funnel to native-like decoys
•  final full-atom models: Fine-grained funnel to near-native decoys

Page 24: Protein structure prediction with a focus on Rosetta

Major Classes of Energy Functions in Rosetta

Low resolution: reduced atom representation (centroid)
    - simplified energy function
    - used for aggressive search of state space

High resolution: full-atom representation
    - detailed energy function
    - local search of state space
    - refinement and minimization

General
    - weighted sum of linear terms: Energy = w1*term1 + w2*term2 + …
    - pairwise decomposable (speed)
    - weighted for task, e.g. ligand docking
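The weighted-sum form Energy = w1*term1 + w2*term2 + … can be sketched as follows. The term names, term values, and both weight sets are hypothetical placeholders chosen for illustration; they are not Rosetta's actual score terms or weights.

```python
def total_energy(terms, weights):
    """Energy = sum over terms of weight * term value."""
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical raw term values for one conformation.
terms = {"vdw": -10.0, "hbond": -4.0, "solvation": 6.0}

# Two hypothetical weight sets, standing in for "weighted for task".
general_weights = {"vdw": 1.0, "hbond": 0.5, "solvation": 0.5}
docking_weights = {"vdw": 0.5, "hbond": 1.0, "solvation": 1.0}

print(total_energy(terms, general_weights))  # -9.0
print(total_energy(terms, docking_weights))  # -3.0
```

The same conformation gets a different total under each weight set, which is the practical meaning of re-weighting the function for a task.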

Page 25: Protein structure prediction with a focus on Rosetta

Low resolution (centroid) folding

Fragment insertion
    - conformation modification occurs in torsion space
    - initial insertions result in large changes in dihedrals
    - 9-mers inserted first, followed by 3-mers later in the process
    - later insertions purposefully result in small changes in dihedrals

[Figure label: random insertion]
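A toy version of fragment insertion in torsion space is sketched below. It assumes a Metropolis acceptance rule, two made-up (phi, psi) states, randomly generated fragment libraries, and a hypothetical score that simply counts non-helical residues. None of this is Rosetta's code or energy function; it only illustrates the 9-mers-then-3-mers insertion scheme.

```python
import math
import random

HELIX, STRAND = (-60.0, -45.0), (-120.0, 130.0)  # hypothetical (phi, psi) states

def toy_score(torsions):
    """Hypothetical score: number of residues that are not helical (lower is better)."""
    return sum(1 for t in torsions if t != HELIX)

def random_library(n_res, size, per_window, rng):
    """Toy fragment library: for each window start, a few random torsion fragments."""
    return {
        start: [[rng.choice((HELIX, STRAND)) for _ in range(size)]
                for _ in range(per_window)]
        for start in range(n_res - size + 1)
    }

def insert_fragments(torsions, library, n_moves, temperature, rng):
    """Metropolis Monte Carlo over fragment insertions; returns the best model seen."""
    current, current_score = list(torsions), toy_score(torsions)
    best, best_score = list(current), current_score
    for _ in range(n_moves):
        start = rng.choice(list(library))       # random insertion position
        fragment = rng.choice(library[start])   # random fragment for that window
        trial = list(current)
        trial[start:start + len(fragment)] = fragment  # overwrite torsions only
        delta = toy_score(trial) - current_score
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, current_score + delta
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best

rng = random.Random(0)
n_res = 12
start_model = [STRAND] * n_res                     # extended starting chain
nine_mers = random_library(n_res, 9, 5, rng)
three_mers = random_library(n_res, 3, 5, rng)

model = insert_fragments(start_model, nine_mers, 200, 2.0, rng)  # large moves first
model = insert_fragments(model, three_mers, 200, 0.5, rng)       # small moves later
print(toy_score(start_model), "->", toy_score(model))
```

A 9-mer insertion rewrites nine residues at once and so moves the chain a long way; the later 3-mer stage makes smaller, more local changes, matching the large-then-small progression on the slide.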

Page 26: Protein structure prediction with a focus on Rosetta

Driving assembly towards native-like decoys

•  S_ss + S_HS: sheet and helix-sheet geometries
•  S_cβ: density/compactness of structure
•  S_vdw: no clashes
•  S_Rgyr: radius of gyration (Rgyr), globular structure

Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”
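Of the terms above, the radius of gyration is the easiest to compute by hand. The sketch below assumes equal masses and uses invented coordinates; it shows why a compact arrangement scores a smaller Rgyr than an extended one, and is not Rosetta's scoring code.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (equal masses assumed)."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    squared = sum(sum((c[i] - center[i]) ** 2 for i in range(3)) for c in coords)
    return math.sqrt(squared / n)

# Hypothetical centroid positions, in arbitrary length units.
compact = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
extended = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (9, 0, 0)]

rg_compact = radius_of_gyration(compact)
rg_extended = radius_of_gyration(extended)
print(rg_compact, rg_extended)
```

Favoring low Rgyr pushes the fragment assembly toward globular shapes, which is the role the slide assigns to this term.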

Page 27: Protein structure prediction with a focus on Rosetta

Low-resolution homolog folding improves prediction

•  Collect homologs
•  Create low-resolution models
    - cluster
•  Thread query sequence onto models
•  Proceed to full-atom refinement

Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”

Page 28: Protein structure prediction with a focus on Rosetta

Low resolution (centroid) folding example

Page 29: Protein structure prediction with a focus on Rosetta

Clustering: Graphical representation

Page 30: Protein structure prediction with a focus on Rosetta

High resolution (full-atom) refinement

Chen Y et al. Nucl. Acids Res. 2004;32:5147-5162

evaluating/optimizing specific atom-atom interactions e.g. hydrogen bonding:

Page 31: Protein structure prediction with a focus on Rosetta

Comparison of low resolution, relax, and abrelax folding example

Page 32: Protein structure prediction with a focus on Rosetta

Examples from the Rosetta@home archive of top predictions Note: massively parallel computation

rosetta prediction crystal structure

Page 33: Protein structure prediction with a focus on Rosetta

Detailed ab initio Rosetta Workflow

INPUT
•  amino acid sequence
•  secondary structure prediction(s)
•  fragment library
•  constraints from experimental data
    - NMR
    - biochemical/biophysical studies
    - ...

LOW RESOLUTION FOLDING
•  fragment insertions
•  scoring
•  filters

CLUSTERING
•  groups of decoys with low RMSD to each other
•  lowest energy decoy of clusters selected for further refinement or prediction

HIGH RESOLUTION REFINEMENT
•  backbone minimization
•  rotamer optimization

ADDITIONAL MODELING
•  identifying variable regions
•  rebuilding

[Figure labels: >10^3-10^6 trajectories; automated; manual]
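The CLUSTERING step of the workflow can be illustrated with a simple greedy scheme. This is not Rosetta's clustering application: it assumes the decoys are already superimposed (no fitting is done before the RMSD), and the decoy names, energies, coordinates, and 1.0 radius are invented for the example.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length coordinate lists (no superposition performed)."""
    squared = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(squared / len(a))

def cluster_decoys(decoys, radius):
    """Greedy clustering: visit decoys by increasing energy; join the first cluster
    whose center is within `radius`, otherwise start a new cluster."""
    coords = {name: xyz for name, _, xyz in decoys}
    clusters = []  # each cluster is a list of names; element 0 is its center
    for name, _, xyz in sorted(decoys, key=lambda d: d[1]):
        for members in clusters:
            if rmsd(coords[members[0]], xyz) <= radius:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical decoys: (name, energy, coordinates of two atoms).
decoys = [
    ("d1", -50.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("d2", -48.0, [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]),
    ("d3", -45.0, [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]),
    ("d4", -52.0, [(5.1, 5.0, 5.0), (6.1, 5.0, 5.0)]),
]

clusters = cluster_decoys(decoys, radius=1.0)
print(clusters)  # [['d4', 'd3'], ['d1', 'd2']]
```

Because decoys are visited in order of increasing energy, the first member of each cluster is also its lowest-energy decoy, which is the one the workflow forwards to refinement.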

Page 34: Protein structure prediction with a focus on Rosetta

Computational Considerations

Protocol: Centroid
    Utility:
    •  fast
    •  widely sample conformational space
    Caveats:
    •  possibility of no near-native models after low resolution folding
    •  no discrimination by energy

Protocol: Full-atom refinement
    Utility:
    •  near-native decoys separated by energy
    Caveats:
    •  more computationally demanding
    •  must have near-native in starting decoy pool

Protocol: Combined
    Utility:
    •  streamlined
    •  for powerful and massively parallel computing
    Caveats:
    •  most computationally demanding
    •  improvement only with sufficient sampling

Page 35: Protein structure prediction with a focus on Rosetta

Native (CheY)

A ~1000-fold increase in computational power

Slide content courtesy Rhiju Das, Baker Lab

Page 36: Protein structure prediction with a focus on Rosetta

Architect of Rosetta@home: David Kim

A ~1000-fold increase in computational power

Native (CheY)

Lowest energy Rosetta structure

“brute force” approach

Page 37: Protein structure prediction with a focus on Rosetta

Computational power vs. accuracy in ab initio structure prediction

[Figure: Cα RMSD of lowest energy model to the native structure vs. sample size. Axes: Sample Size; RMSD to native]

Category 1: Successful high-resolution predictions

Category 2: Successful high-resolution predictions with additional sampling

Category 3: Unsuccessful predictions (with any amount of sampling)

Kim DE, Blum B, Bradley P, Baker D. Sampling bottlenecks in de novo protein structure prediction. J Mol Biol. 2009 Oct 16;393(1):249-60.

Page 38: Protein structure prediction with a focus on Rosetta

“De novo” phasing: large-scale tests

Tests on 30 data sets (covering 16 proteins)

TF Z-score    Have I solved it?
< 5           no
5 - 6         unlikely
6 - 7         possibly
7 - 8         probably
> 8           definitely

Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.
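The TF Z-score table can be written as a small lookup. The table leaves the values at exactly 5, 6, 7, and 8 ambiguous; the sketch assumes 5, 6, and 7 fall in the upper bin and 8 counts as "probably", and the function name is made up for illustration.

```python
def tfz_verdict(tfz):
    """Map a TF Z-score to the slide's 'Have I solved it?' verdict."""
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz <= 8:   # assumption: exactly 8 is still "probably"
        return "probably"
    return "definitely"

for score in (4.2, 5.5, 6.5, 7.5, 9.1):
    print(score, tfz_verdict(score))
```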

Page 39: Protein structure prediction with a focus on Rosetta

“De novo” phasing: large-scale tests

Tests on 30 data sets (covering 16 proteins)

[Figure labels: 1hz5-sf.cif; Å; Rosetta-refined native (positive controls); Rosetta-refined de novo models]

Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.

Page 40: Protein structure prediction with a focus on Rosetta

“De novo” phasing: large-scale tests

Tests on 30 data sets (covering 16 proteins)

Success in 14/30 data sets

[Figure labels: 1hz5-sf.cif; Å; Rosetta-refined native (positive controls); Rosetta-refined de novo models]

Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.

Page 41: Protein structure prediction with a focus on Rosetta

“De novo” phasing: large-scale tests

Tests on 30 data sets (covering 16 proteins)

[Figure labels: 1hz5-sf.cif; Å; Rosetta-refined native (positive controls); Rosetta-refined de novo models; Rosetta-refined de novo models, fragments with correct native 2° structure]

Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.

Page 42: Protein structure prediction with a focus on Rosetta

Preparation for folding simulations

•  proper secondary structure assignment
•  constraints
    - limit search space
    - increase sampling efficiency
    - decrease CPU time

Page 43: Protein structure prediction with a focus on Rosetta

Constraints

•  There are constraint types and function types
    - Constraint types: AtomPair, Angle, Dihedral, etc.
    - Function types: Bounded, Spline, Harmonic, Gaussian, etc.
•  Each constraint is scored individually, and the total constraint score is the sum of all individual scores
•  Each constraint can have its own constraint type and function type.
    - In some cases, such as with the Spline function, each constraint can have its own weight
•  How a constraint is defined and scored depends on its constraint type; the same is true of its function type.

Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop

Page 44: Protein structure prediction with a focus on Rosetta

Constraint file example: EPR data

<cst type> <atom1> <res1> <atom2> <res2> <cst_func> <RosettaEPR> <Dcb> <weight> <bin>
AtomPair CB 32 CB 36 SPLINE EPR_DISTANCE 16.0 1.0 0.5
AtomPair CB 59 CB 74 SPLINE EPR_DISTANCE 19.0 1.0 0.5
AtomPair CB 62 CB 71 SPLINE EPR_DISTANCE 19.0 1.0 0.5
AtomPair CB 62 CB 74 SPLINE EPR_DISTANCE 25.0 1.0 0.5
AtomPair CB 63 CB 74 SPLINE EPR_DISTANCE 14.0 1.0 0.5
AtomPair CB 66 CB 74 SPLINE EPR_DISTANCE 23.0 1.0 0.5
AtomPair CB 83 CB 90 SPLINE EPR_DISTANCE 13.0 1.0 0.5

[Figure labels: Constraint info (first fields); Constraint Function info (remaining fields)]

Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
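The ten whitespace-separated fields in the example constraint lines can be read into a small record type. The field names below follow the header line on the slide; the class name, the parser, and the choice of numeric types are assumptions of this sketch, and it is not part of Rosetta.

```python
from typing import NamedTuple

class EprConstraint(NamedTuple):
    cst_type: str      # e.g. AtomPair
    atom1: str
    res1: int
    atom2: str
    res2: int
    cst_func: str      # e.g. SPLINE
    rosetta_epr: str   # e.g. EPR_DISTANCE
    dcb: float         # the <Dcb> column from the slide header
    weight: float
    bin: float

def parse_constraint(line):
    """Parse one constraint line in the 10-field layout shown on the slide."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}: {line!r}")
    return EprConstraint(f[0], f[1], int(f[2]), f[3], int(f[4]),
                         f[5], f[6], float(f[7]), float(f[8]), float(f[9]))

cst = parse_constraint("AtomPair CB 32 CB 36 SPLINE EPR_DISTANCE 16.0 1.0 0.5")
print(cst.res1, cst.res2, cst.dcb)  # 32 36 16.0
```

Rejecting lines that do not have exactly ten fields catches the most common formatting slip, a missing or extra column, before a long folding run starts.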

Page 45: Protein structure prediction with a focus on Rosetta

Membrane protein ab initio

•  RosettaMembrane divides the protein into hydrophobic, hydrophilic, and soluble layers
•  Specific scoring function for each layer

Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop Figure from Yarov-Yarovoy, Schonbrun, and Baker 2006.

Page 46: Protein structure prediction with a focus on Rosetta

Input Files

Spanfile - *.span
    -- transmembrane topology prediction file generated using octopus2span.pl script
    -- Input OCTOPUS topology file is generated at http://octopus.cbr.su.se using protein sequence as input.

Lipophilicity prediction file - *.lips4
    -- Generate using run_lips.pl script
    -- Need input FASTA file, spanfile, blastpgp and nr (NCBI) database to run

Fragment generation
    -- Advised to use SAM but not JUFO or PSIPRED, which predict TMH regions poorly

Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop

Page 47: Protein structure prediction with a focus on Rosetta

Folding and studying folding with molecular dynamics

Specialized hardware: ANTON, capable of continuous ms-length trajectories

Standard simulations: 1 - 3 µs simulations ~ months of HPC

Approximate Rates of Folding:
•  1 µs: helix
•  10 µs: sheet
•  100 µs: fast folding protein
•  1+ ms: typical protein

Page 48: Protein structure prediction with a focus on Rosetta

D E Shaw et al. Science 2010;330:341-346

simulation of villin at 300 K 2-8 µs folder

simulation of FiP35 at 337 K 20-80 µs folder

Blue: x-ray structures Red: last frame of MD simulation

Folding proteins at x-ray resolution

Page 49: Protein structure prediction with a focus on Rosetta

Published by AAAS

tip of hairpin 1 (12-18, blue) hairpin 1 (8-22, green) hairpin 2 (19-30, orange) full protein (2-33, red)

D E Shaw et al. Science 2010;330:341-346

Reversible folding simulation of FiP35.

Page 50: Protein structure prediction with a focus on Rosetta

Thank You

For questions or comments please contact:

[email protected]

301.496.4455
