Protein structure prediction
May 26, 2011
HW #8 due today
Quiz #3 on Tuesday, May 31

Learning objectives: Understand the biochemical basis of secondary structure prediction programs. Become familiar with the databases that hold secondary structure information. Understand neural networks and how they help to predict secondary structure.

Workshop: Predict secondary structure of p53.

Homework #9: Due June 2

What is secondary structure?

Three major types:

Alpha Helical Regions

Beta Strand Regions

Coils, Turns, Extended (anything else)

Can we predict the final structure?

http://en.wikipedia.org/wiki/Protein_folding

Some Prediction Methods

ab initio methods: based on physical properties of aa’s and bonding patterns

Statistics of amino acid distributions in known structures: Chou-Fasman

Sequence similarity to sequences with known structure: PSIPRED

Chou-Fasman

First widely used procedure

Output: helix, strand or turn

Percent accuracy: 60-65%

PSI-BLAST Predict Secondary Structure (PSIPRED)

Three steps:

1) Generation of position specific scoring matrix.

2) Prediction of initial secondary structure.

3) Filtering of predicted structure.

AA   P(α)    AA   P(β)    AA   P(T)    AA   f(i)   f(i+1)  f(i+2)  f(i+3)
Glu  1.51    Val  1.70    Asn  1.56    Ala  0.060  0.076   0.035   0.058
Met  1.45    Ile  1.60    Gly  1.56    Arg  0.070  0.106   0.099   0.085
Ala  1.42    Tyr  1.47    Pro  1.52    Asp  0.147  0.110   0.179   0.081
Leu  1.21    Phe  1.38    Asp  1.46    Asn  0.161  0.083   0.191   0.091
Lys  1.14    Trp  1.37    Ser  1.43    Cys  0.149  0.050   0.117   0.128
Phe  1.13    Leu  1.30    Cys  1.19    Glu  0.056  0.060   0.077   0.064
Gln  1.11    Cys  1.19    Tyr  1.14    Gln  0.074  0.098   0.037   0.098
Ile  1.08    Thr  1.19    Lys  1.01    Gly  0.102  0.085   0.190   0.152
Trp  1.08    Gln  1.10    Gln  0.98    His  0.140  0.047   0.093   0.054
Val  1.06    Met  1.05    Thr  0.96    Ile  0.043  0.034   0.013   0.056
Asp  1.01    Arg  0.93    Trp  0.96    Leu  0.061  0.025   0.036   0.070
His  1.00    Asn  0.89    Arg  0.95    Lys  0.055  0.115   0.072   0.095
Arg  0.98    His  0.87    His  0.95    Met  0.068  0.082   0.014   0.055
Thr  0.83    Ala  0.83    Glu  0.74    Phe  0.059  0.041   0.065   0.065
Ser  0.77    Gly  0.75    Ala  0.66    Pro  0.102  0.301   0.034   0.068
Cys  0.70    Ser  0.75    Met  0.60    Ser  0.120  0.139   0.125   0.106
Tyr  0.69    Lys  0.74    Phe  0.60    Thr  0.086  0.108   0.065   0.079
Asn  0.67    Pro  0.55    Leu  0.59    Trp  0.077  0.013   0.064   0.167
Gly  0.57    Asp  0.54    Val  0.50    Tyr  0.082  0.065   0.114   0.125
Pro  0.57    Glu  0.37    Ile  0.47    Val  0.062  0.048   0.028   0.053

Conformational parameters for α-helical, β-strand, and turn amino acids (from Chou and Fasman, 1978)
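
As a rough illustration of how the P(α) column can be used, the sketch below averages helix propensities over a sliding window. It is a simplification: the full Chou-Fasman procedure also has nucleation and extension rules that are not shown here, and the function name and the window width of 6 are assumptions made for this example.

```python
# P(alpha) values copied from the table above, keyed by one-letter code.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.14,
    "F": 1.13, "Q": 1.11, "I": 1.08, "W": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "G": 0.57, "P": 0.57,
}


def mean_helix_propensity(seq, window=6):
    """Return the average P(alpha) for every window of the sequence.

    The window width is an assumption for this sketch, not a value
    taken from the slides.
    """
    scores = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        scores.append(sum(P_ALPHA[aa] for aa in segment) / window)
    return scores


# Windows rich in Glu, Met, Ala or Leu average well above 1.0;
# windows rich in Gly or Pro average well below 1.0.
helix_like = mean_helix_propensity("EMALEK")
breaker_like = mean_helix_propensity("GPGPGP")
```

An average above 1.0 suggests the window is more helix-prone than a typical residue; the same approach works with the P(β) or P(T) columns.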

PSIPRED

Uses multiple aligned sequences for prediction.

Uses training set of folds with known structure.

Uses a two-stage neural network to predict structure based on position specific scoring matrices generated by PSI-BLAST (Jones, 1999).

First network converts a window of 15 aa’s into a raw score of h (helix), e (sheet), c (coil) or terminus.

Second network filters the first output. For example, an output of hhhhehhhh might be converted to hhhhhhhhh.

Can obtain a Q3 value of 70-78% (may be the highest achievable)
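
To make the filtering step concrete, here is a minimal rule-based stand-in. PSIPRED uses a second neural network for this, not a fixed rule; the function below is a hypothetical illustration that only reproduces the hhhhehhhh example by replacing a single residue whose two neighbours agree with each other.

```python
def smooth_isolated(pred):
    """Replace any single state flanked by two identical neighbours.

    This is a hand-written rule used for illustration only; it is not
    the filtering network described in the slides.
    """
    states = list(pred)
    for i in range(1, len(pred) - 1):
        # Compare against the unmodified input so one fix cannot
        # cascade into the next position.
        if pred[i - 1] == pred[i + 1] != pred[i]:
            states[i] = pred[i - 1]
    return "".join(states)


filtered = smooth_isolated("hhhhehhhh")
```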

Neural networks

• Computer neural networks are based on simulation of adaptive learning in networks of real neurons.

• Neurons connect to each other via synaptic junctions which are either stimulatory or inhibitory.

• Adaptive learning involves the formation or suppression of the right combinations of stimulatory and inhibitory synapses so that a set of inputs produces an appropriate output.

Neural Networks (cont. 1)

• The computer version of the neural network involves identification of a set of inputs (amino acids in the sequence), which transmit through a network of connections.

• At each layer, inputs are numerically weighted and the combined result is passed to the next layer.

• Ultimately a final output (a decision: helix, sheet or coil) is produced.

Neural Networks (cont. 2)

90% of the set of known structures was used for training.

10% was held back to evaluate the performance of the neural network after the training session.
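
A 90/10 split of this kind can be sketched as follows. The function name, the fixed random seed and the placeholder protein identifiers are assumptions for the example; the slides do not say how the proteins were assigned to each group.

```python
import random


def split_train_test(items, test_fraction=0.10, seed=0):
    """Shuffle the items and hold back a fraction for evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers standing in for proteins of known structure.
proteins = [f"protein_{i}" for i in range(100)]
train, test = split_train_test(proteins)
```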

Neural Networks (cont. 3)

•During the training phase, selected sets of proteins of known structure were scanned, and if the decisions were incorrect, the input weightings were adjusted by the software to produce the desired result.

•Training runs were repeated until the success rate was maximized.

•Careful selection of the training set is an important aspect of this technique. The set must contain as wide a range of different fold types as possible without duplications of structural types that may bias the decisions.
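
The idea of adjusting weights only when a decision is wrong can be shown with a single-unit perceptron on toy data. This is a deliberately small stand-in: a multi-layer network such as the one in PSIPRED is trained with a different update procedure, and the learning rate, epoch limit and toy inputs here are assumptions for illustration.

```python
def predict(weights, bias, inputs):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, rate=0.1):
    """Repeat passes over the samples, nudging weights after each error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            delta = target - predict(weights, bias, inputs)
            if delta != 0:  # decision was incorrect: adjust the weights
                errors += 1
                bias += rate * delta
                weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
        if errors == 0:  # every training example is now classified correctly
            break
    return weights, bias


# Toy training set (logical OR), not protein data.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_perceptron(samples)
```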

Neural Networks (cont. 4)

•An additional component of the PSIPRED procedures involves sequence alignment with similar proteins.

•The rationale is that some amino acid positions in a sequence contribute more to the final structure than others. (This has been demonstrated by systematic mutation experiments in which each consecutive position in a sequence is substituted by a spectrum of amino acids. Some positions are remarkably tolerant of substitution, while others have unique requirements.)

•To predict secondary structure accurately, one should place less weight on the tolerant positions, which clearly contribute little to the structure.

•One must also put more weight on the intolerant positions.

15 groups of 21 units (1 unit for each aa plus one specifying the end)

Row specifies aa position

Three outputs are helix, strand or coil

Filtering network

Provides info on tolerant or intolerant positions

(Jones, 1999)
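
The input layout of 15 groups of 21 units can be sketched as below. For simplicity each group is filled with a one-hot code for the residue; that is an assumption of this example, since the network described above is fed position-specific scores from PSI-BLAST. The names `UNITS` and `encode_window` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNITS = AMINO_ACIDS + "-"  # 21st unit marks a position past the sequence end


def encode_window(seq, center, width=15):
    """Return width * 21 input values for the window centred on `center`."""
    half = width // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        symbol = seq[pos] if 0 <= pos < len(seq) else "-"
        group = [0] * len(UNITS)
        group[UNITS.index(symbol)] = 1  # switch on one unit per group
        vector.extend(group)
    return vector


# Window centred on the first residue: the seven positions before the
# start of the sequence all activate the end-marker unit.
inputs = encode_window("KDIQLLNVSYDPTRELYEQ", center=0)
```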

Example of Output from PSIPRED

PSIPRED PREDICTION RESULTS

Key

Conf: Confidence (0=low, 9=high)

Pred: Predicted secondary structure (H=helix, E=strand, C=coil)

AA: Target sequence

Conf: 923788850068899998538983213555268822788714786424388875156215
Pred: CCEEEEEEEHHHHHHHHHHCCCCCCHHHHHHCCCCCEEEEECCCCCCHHHHHHHCCCCCC
  AA: KDIQLLNVSYDPTRELYEQYNKAFSAHWKQETGDNVVIDQSHGSQGKQATSSVINGIEAD
              10        20        30        40        50        60
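
A Pred line like the one above can be turned into a list of segments with a few lines of code. This is a sketch with a hypothetical helper name; it uses 1-based positions to match the ruler under the sequence and is shown on a short made-up string.

```python
from itertools import groupby


def segments(pred):
    """Return (state, first_position, last_position) for each run in pred."""
    result = []
    start = 1  # 1-based, as in the PSIPRED position ruler
    for state, run in groupby(pred):
        length = len(list(run))
        result.append((state, start, start + length - 1))
        start += length
    return result


# Short illustrative prediction string, not real PSIPRED output.
example = segments("CCEEEHH")
```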

How to calculate Q3?

Sequence:           MEETHAPYRGVCNNM
Actual Structure:   CCCCCHHHHHHEEEE
PSIPRED Prediction: CCCCCHHHHHHEEEH

Q3 = 14/15 x 100 = 93%
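
The same calculation in code, as a small sketch (the function name is chosen for this example): Q3 is the percentage of positions where the predicted state matches the actual state.

```python
def q3(actual, predicted):
    """Percentage of residues whose predicted state equals the actual state."""
    if len(actual) != len(predicted):
        raise ValueError("structure strings must have the same length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * matches / len(actual)


# The example from the slide: 14 of 15 positions agree.
score = q3("CCCCCHHHHHHEEEE", "CCCCCHHHHHHEEEH")
```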

Recognizing motifs in proteins.

PROSITE is a database of protein families and domains.

Most proteins can be grouped, on the basis of similarities in their sequences, into a limited number of families.

Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.

PROSITE Database

Contains 1612 documentation entries.

Signatures are produced by scanning the PROSITE database with your query. A “signature” of a protein allows one to place a protein within a specific function class based on structure and/or function.

An example of a documentation entry in PROSITE is:

http://ca.expasy.org/cgi-bin/nicedoc.pl?PDOC50020

Signatures are produced from profiles and patterns.

Profile: “a table of position-specific amino acid weights and gap costs. These numbers (also referred to as scores) are used to calculate a similarity score for any alignment between a profile and a sequence, or parts of a profile and a sequence. An alignment with a similarity score higher than or equal to a given cut-off value constitutes a motif occurrence.”

Sequences in one profile and the PSSM associated with the profile

Sequences (one per row):

F K L L S H C L L V
F K A F G Q T M F Q
Y P I V G Q E L L G
F P V V K E A I L K
F K V L A A V I A D
L E F I S E C I I Q
F K L L G N V L V C

PSSM (rows: amino acids; columns: alignment positions 1-10):

      1    2    3    4    5    6    7    8    9   10
A   -18  -10   -1   -8    8   -3    3  -10   -2   -8
C   -22  -33  -18  -18  -22  -26   22  -24  -19   -7
D   -35    0  -32  -33   -7    6  -17  -34  -31    0
E   -27   15  -25  -26   -9   23   -9  -24  -23   -1
F    60  -30   12   14  -26  -29  -15    4   12  -29
G   -30  -20  -28  -32   28  -14  -23  -33  -27   -5
H   -13  -12  -25  -25  -16   14  -22  -22  -23  -10
I     3  -27   21   25  -29  -23   -8   33   19  -23
K   -26   25  -25  -27   -6    4  -15  -27  -26    0
L    14  -28   19   27  -27  -20   -9   33   26  -21
M     3  -15   10   14  -17  -10   -9   25   12  -11
N   -22   -6  -24  -27    1    8  -15  -24  -24   -4
P   -30   24  -26  -28  -14  -10  -22  -24  -26  -18
Q   -32    5  -25  -26   -9   24  -16  -17  -23    7
R   -18    9  -22  -22  -10    0  -18  -23  -22   -4
S   -22   -8  -16  -21   11    2   -1  -24  -19   -4
T   -10  -10   -6   -7   -5   -8    2  -10   -7  -11
V     0  -25   22   25  -19  -26    6   19   16  -16
W     9  -25  -18  -19  -25  -27  -34  -20  -17  -28
Y    34  -18   -1    1  -23  -12  -19    0    0  -18
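
To show how such a matrix scores a sequence, the sketch below sums one PSSM entry per position for an ungapped segment. Only the rows needed for the first sequence are copied from the table above; gap costs and the cut-off value mentioned in the profile definition are left out, and the function name is an assumption.

```python
# Rows copied from the PSSM above (one value per alignment position 1-10).
PSSM = {
    "C": [-22, -33, -18, -18, -22, -26, 22, -24, -19, -7],
    "F": [60, -30, 12, 14, -26, -29, -15, 4, 12, -29],
    "H": [-13, -12, -25, -25, -16, 14, -22, -22, -23, -10],
    "K": [-26, 25, -25, -27, -6, 4, -15, -27, -26, 0],
    "L": [14, -28, 19, 27, -27, -20, -9, 33, 26, -21],
    "S": [-22, -8, -16, -21, 11, 2, -1, -24, -19, -4],
    "V": [0, -25, 22, 25, -19, -26, 6, 19, 16, -16],
}


def pssm_score(segment, pssm):
    """Sum the score of each residue at its position (no gaps allowed)."""
    return sum(pssm[aa][pos] for pos, aa in enumerate(segment))


# First sequence of the profile: 60+25+19+27+11+14+22+33+26-16
first = pssm_score("FKLLSHCLLV", PSSM)
```

A segment built from residues that are common at each position scores high; the same residues in a shuffled order would pick up the negative entries and score much lower.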

How are the patterns constructed?

ALRDFATHDDVCGK..
SMTAEATHDSVACY..
ECDQAATHEAVTHR..

Sequences necessary for structure or function are aligned manually by experts in the field. Then a pattern is created.

A-T-H-[DE]-X-V-X(4)-{ED}

This pattern is translated as: Ala, Thr, His, [Asp or Glu], any, Val, any, any, any, any, any but Glu or Asp
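
One way to search for such a pattern is to translate it into a regular expression. The converter below is a sketch covering only the elements used in these slides (single residues, [..] sets, {..} exclusions, X and repeat counts); the function name and the test sequence are hypothetical, and the full PROSITE syntax has further features that are not handled.

```python
import re

ELEMENT = re.compile(r"([A-Za-z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core.upper() == "X":
            token = "."                        # any amino acid
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"    # any amino acid except these
        else:
            token = core.upper()               # single residue or [..] set
        if low and high:
            token += "{" + low + "," + high + "}"
        elif low:
            token += "{" + low + "}"
        parts.append(token)
    return "".join(parts)


regex = prosite_to_regex("A-T-H-[DE]-X-V-X(4)-{ED}")
# Hypothetical sequence that extends the first aligned example.
hit = re.search(regex, "ALRDFATHDDVCGKLMNA")
```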

Example of a pattern in a PROSITE record

ID ZINC_FINGER_C3HC4; PATTERN.

PA C-X-H-X-[LIVMFY]-C-X(2)-C-[LIVMYA]

Scanning the PROSITE database

“Scan a sequence against PROSITE patterns and profiles” allows the user to scan a query sequence against the PROSITE database to search for patterns and profiles. It uses dynamic programming to determine optimal alignments. If the alignment produces a high score (a hit), then the hit is shown to the user.

http://www.expasy.ch/prosite/

If a “hit” is generated, the program gives an output that shows the region of the query that contains the pattern and a reference to the 3-D structure database if available.

Example of output from Prosite Scan

RPSBlast

Reverse psi-blast, or rpsblast, is a program that searches a query protein sequence or protein sequences against a database of position specific scoring matrices. The PSSMs are from conserved protein sequences that have known functions/structure.

3D structure data

The largest 3D structure database is the Protein Data Bank (PDB).

It contains over 20,000 records.

Each record contains 3D coordinates for macromolecules.

80% of the records were obtained from X-ray diffraction studies, 20% from NMR.

ATOM      1  N   ARG A  14      22.451  98.825  31.990  1.00 88.84           N
ATOM      2  CA  ARG A  14      21.713 100.102  31.828  1.00 90.39           C
ATOM      3  C   ARG A  14      22.583 101.018  30.979  1.00 89.86           C
ATOM      4  O   ARG A  14      22.105 101.989  30.391  1.00 89.82           O
ATOM      5  CB  ARG A  14      21.424 100.704  33.208  1.00 93.23           C
ATOM      6  CG  ARG A  14      20.465 101.880  33.215  1.00 95.72           C
ATOM      7  CD  ARG A  14      20.008 102.147  34.637  1.00 98.10           C
ATOM      8  NE  ARG A  14      18.999 103.196  34.718  1.00100.30           N
ATOM      9  CZ  ARG A  14      18.344 103.507  35.833  1.00100.29           C
ATOM     10  NH1 ARG A  14      18.580 102.835  36.952  1.00 99.51           N
ATOM     11  NH2 ARG A  14      17.441 104.479  35.827  1.00100.79           N

Part of a record from the PDB
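
PDB ATOM records are fixed-width, which is why the occupancy and temperature-factor values can run together as in 1.00100.30. The sketch below reads the fields by column position instead of splitting on spaces. The field names and the two helper functions are assumptions for this example; the sample line is rebuilt at fixed widths from the values of atom 8 above so that the column positions are unambiguous.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, b, element):
    """Build an ATOM line at the standard fixed column widths."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}          {element:>2}")


def parse_atom_line(line):
    """Read the main ATOM fields by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


# Atom 8 (NE of Arg 14); occupancy and B-factor touch in the text form.
line = format_atom_line(8, " NE ", "ARG", "A", 14,
                        18.999, 103.196, 34.718, 1.00, 100.30, "N")
atom = parse_atom_line(line)
```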

Quiz #3 prep

BLAST
  Three steps
  Gapped BLAST
  Heuristic program
  Uses S-W algorithm for final scoring

CLUSTAL W
  Pairwise alignments
  Difference matrix
  Guide tree
  Importance of having highly similar sequences

Secondary Structure prediction
  Chou-Fasman
  PSIPRED
  Good for secondary str

Protein analysis
  ProScan
  RPSBlast

