
BMI 731 Protein Structures and Related Database Searches.

Date post: 21-Dec-2015


Biology … Protein…

DNA (Genotype)

Protein

A single amino acid substitution in a protein (hemoglobin) causes sickle-cell disease…

What the.....!?

Why do we care about structure?

• In the factory of living cells, proteins are the workers, performing a variety of biological tasks.

• Each protein has a particular 3-D structure that determines its function.

• Protein structure is more conserved than protein sequence, and more closely related to function.

• Sequence -> Structure -> Function

Structural Information

• Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics (RCSB)
– http://www.rcsb.org/pdb/
– > 15,000 structures of proteins
– Also contains structures of protein/nucleic acid complexes, nucleic acids and carbohydrates

• Most structures are determined by X-ray crystallography. Other methods are NMR spectroscopy and electron microscopy (EM). Some structures are also theoretically predicted.
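PDB entries are distributed as plain-text files in which each atom is one fixed-column ATOM (or HETATM) record. The Python function below is a minimal sketch that assumes the classic PDB file format (not mmCIF). It extracts the C-α coordinates that the structure comparison methods later in the deck operate on. The function name `parse_ca_atoms` and the layout of the returned dictionaries are illustrative choices, not part of any library.

```python
def parse_ca_atoms(pdb_lines):
    """Return C-alpha atoms from PDB-format ATOM records.

    Slices follow the fixed PDB columns (1-based): atom name 13-16,
    residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54.
    """
    ca_atoms = []
    for line in pdb_lines:
        # Only ATOM records: calcium ions in HETATM records are also named "CA"
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            ca_atoms.append({
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return ca_atoms
```

Because it accepts any iterable of lines, an open file handle can be passed directly.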

PDB Content Growth

Protein?

• Proteins are linear heteropolymers: one or more polypeptide chains

• Building blocks: 20(?) amino acid residues.

• Lengths range from a few tens to thousands of residues

• The three-dimensional shapes (“folds”) they adopt vary enormously.

Structure…

Structure cont…

Basic measurements on structures…

• Bond lengths

• Bond angles

• Dihedral (torsion) angles

Bond Length

• The distance between bonded atoms is (nearly) constant

• Depends on the “type” of the bond

• Varies from about 1.0 Å (C-H) to 1.5 Å (C-C)

• BOND LENGTH IS A FUNCTION OF THE POSITIONS OF TWO ATOMS.
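A bond length depends only on the positions of two atoms, so it can be computed directly as the Euclidean distance between their coordinates. The sketch below assumes Cartesian coordinates in ångströms, as stored in PDB files, and requires Python 3.8+ for `math.dist`. The atom positions in the example are made up for illustration.

```python
import math

def bond_length(a, b):
    """Euclidean distance between two atoms given as (x, y, z) in Å."""
    return math.dist(a, b)

# Hypothetical pair of carbon atoms placed 1.54 Å apart along x
print(round(bond_length((0.0, 0.0, 0.0), (1.54, 0.0, 0.0)), 2))  # prints 1.54
```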

Bond Angle…

• Bond angles are determined by the chemical makeup of the atoms involved, and are (nearly) constant.

• They depend on the type of atom and the number of electrons available for bonding.

• They range from 100° to 180°

• BOND ANGLE IS A FUNCTION OF THE POSITIONS OF THREE ATOMS.
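The bond angle at the central atom B follows from three positions via the dot product: cos θ = (BA · BC) / (|BA| |BC|). The Python sketch below uses the same coordinate assumptions as the bond-length example and requires Python 3.8+ for multi-argument `math.hypot`.

```python
import math

def bond_angle(a, b, c):
    """Angle A-B-C in degrees, with B as the central atom."""
    ba = [p - q for p, q in zip(a, b)]  # vector B -> A
    bc = [p - q for p, q in zip(c, b)]  # vector B -> C
    dot = sum(p * q for p, q in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    # Clamp to [-1, 1] to guard against rounding error before acos
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))
```

For example, `bond_angle((1, 1, 1), (0, 0, 0), (1, -1, -1))` returns about 109.47, the ideal tetrahedral angle.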

Dihedral Angles

• These are usually variable

• They range from 0° to 360° in molecules

• The most famous are φ, ψ and ω

• DIHEDRAL ANGLES ARE A FUNCTION OF THE POSITIONS OF FOUR ATOMS.

http://www.colby.edu/chemistry/OChem/DEMOS/dihedral.html

Dihedral Angles

A torsion angle is defined by four atoms, A, B, C and D. It is the angle between the plane containing A, B and C and the plane containing B, C and D, measured about the B-C bond.

When atoms A, B, C and D are main-chain atoms (i.e. the carbonyl carbon, C1; the alpha carbon, C2 or C-alpha; and the amide nitrogen, N), there are THREE repeating torsion angles along the backbone chain, called phi, psi and omega.

http://bmbiris.bmb.uga.edu/wampler/tutorial/prot2.html
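The torsion angle can be computed from the normals of the planes (A, B, C) and (B, C, D), using `atan2` to recover its sign. The sketch below assumes the IUPAC sign convention and returns values in [-180°, 180°], the range used on Ramachandran plots. Applying `angle % 360` converts a result to the 0°-360° range quoted on the earlier slide. The helper `backbone_torsions` is a hypothetical name; it applies the standard backbone atom quadruples for phi, psi and omega.

```python
import math

def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(a, b, c, d):
    """Torsion angle A-B-C-D in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of planes ABC and BCD
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def backbone_torsions(prev_c, n, ca, c, next_n, next_ca):
    """phi, psi and omega of residue i from its neighbouring backbone atoms."""
    phi = dihedral(prev_c, n, ca, c)          # C(i-1)-N(i)-CA(i)-C(i)
    psi = dihedral(n, ca, c, next_n)          # N(i)-CA(i)-C(i)-N(i+1)
    omega = dihedral(ca, c, next_n, next_ca)  # CA(i)-C(i)-N(i+1)-CA(i+1)
    return phi, psi, omega
```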

Ramachandran / phi-psi plot

http://www.biochem.ucl.ac.uk/~roman/procheck/manual/examples/plot_01.html
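Once phi and psi are known, each residue becomes one point on the Ramachandran plot. The function below is a rough illustration that assigns a point to a coarse region. Its rectangular boundaries are hypothetical choices made for this sketch. They are not the empirically derived regions that PROCHECK draws.

```python
def ramachandran_region(phi, psi):
    """Coarse region for a (phi, psi) pair in degrees, each in [-180, 180].

    The boxes below are illustrative assumptions only; real Ramachandran
    plots use empirically derived contours.
    """
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"        # right-handed helical region
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"         # extended strand region
    if 30 <= phi <= 90 and 0 <= psi <= 90:
        return "left-handed"  # left-handed helical region
    return "other"
```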

Levels of Structure…

1 - Primary structure

2 - Secondary structure

3 - Tertiary structure

4 - Quaternary structure

Primary structure…

• This is simply the amino acid sequence of the polypeptide chains

Secondary structure

• Local organization of the protein backbone: α-helix, β-strand (which assemble into β-sheets), turn and interconnecting loop.

The α-helix

• One of the most closely packed arrangements of residues.

• 3.6 residues per turn

• Pitch: 5.4 Å per turn
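The two numbers on the slide are enough to derive the rest of the ideal α-helix geometry. Dividing the pitch by the residues per turn gives the rise along the helix axis per residue. Dividing 360° by 3.6 gives the rotation per residue. The Python sketch below does this arithmetic; the 18-residue helix is an arbitrary example.

```python
RESIDUES_PER_TURN = 3.6  # from the slide
PITCH_ANGSTROM = 5.4     # from the slide (Å per turn)

rise_per_residue = PITCH_ANGSTROM / RESIDUES_PER_TURN  # Å along the axis per residue
twist_per_residue = 360.0 / RESIDUES_PER_TURN          # degrees of rotation per residue

def helix_length(n_residues):
    """Approximate axial length (Å) of an ideal alpha-helix of n residues."""
    return n_residues * rise_per_residue

print(round(rise_per_residue, 2), round(twist_per_residue, 1), round(helix_length(18), 1))
# prints 1.5 100.0 27.0
```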

The β-sheet

• Backbone almost fully extended; loosely packed arrangement of residues.

Ramachandran/phi-psi plot

Tertiary structure…

• Packing of the secondary structure elements into a compact spatial unit

• “Fold” or domain: this is the level to which structure prediction is currently possible.

Quaternary structure…

• Assembly of homo- or heteromeric protein chains.

• Usually the functional unit of a protein, especially for enzymes

Classification…

• Class

• Fold/Architecture

• Superfamily

Databases of structural classification

• SCOP
– Murzin AG, Brenner SE, Hubbard T, Chothia C
– Structural classification of protein structures
– Manual assembly by inspection
– All nodes are annotated (e.g., all-α, α/β)
– Structural similarity search using 3dSearch (Singh and Brutlag)

• CATH
– Dr. C.A. Orengo, Dr. A.D. Michie, etc.
– Class-Architecture-Topology-Homologous superfamily
– Manual classification at the Architecture level
– Automated topology classification using the SSAP algorithm
– No structural similarity search

Databases of structural classification

• FSSP
– L.L. Holm and C. Sander
– Fully automated using the DALI algorithm (Holm and Sander)
– No internal node annotations
– Structural similarity search using DALI

• Pclass
– A. Singh, X. Liu, J. Chang, D. Brutlag
– Fully automated using the LOCK and 3dSearch algorithms
– All internal nodes automatically annotated with common terms
– Java-based classification browser
– Structural similarity search using 3dSearch

Why Structure Alignment?

• For homologous proteins (similar ancestry), it provides the “gold standard” for sequence alignment: it elucidates the common ancestry of the proteins.

• For nonhomologous proteins, it allows us to identify common substructures of interest.

• It allows us to classify proteins into clusters based on structural similarity.

How do we recognize structural similarities?

• By eye (Alexei Murzin)

SCOP: the gold standard for structure classification!

• Algorithmically

Growth of the PDB demands automated techniques for classification and fold detection

Algorithms for Structure Alignment

• Distance based methods
– DALI (Holm and Sander): Aligning scalar distance plots
– STRUCTAL (Gerstein and Levitt): Dynamic programming using pairwise inter-molecular distances
– SSAP (Orengo and Taylor): Dynamic programming using intra-molecular vector distances

• Vector based methods
– VAST (Bryant): Graph-theory-based secondary structure alignment
– 3dSearch (Singh and Brutlag): Fast secondary structure index lookup

• Both vector and distance based
– LOCK (Singh and Brutlag): Hierarchically uses both secondary structure vectors and atomic distances
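The idea behind the STRUCTAL entry can be illustrated with dynamic programming over inter-molecular distances. Below is a minimal global-alignment (Needleman-Wunsch style) sketch that assumes the two C-α traces are already superposed. The pair score m / (1 + d²/d0²) with m = 20 and d0² = 5 Å², and the linear gap penalty of 10, are assumptions chosen for illustration; they should not be read as STRUCTAL's published parameters. Close atom pairs score high, distant pairs score near zero, and gaps cost a fixed penalty.

```python
import math

def pair_score(p, q, m=20.0, d0_sq=5.0):
    """Hypothetical distance-based score: high for close atoms, decays with distance."""
    d = math.dist(p, q)
    return m / (1.0 + d * d / d0_sq)

def align_superposed(a, b, gap=10.0):
    """Global DP alignment of two superposed C-alpha traces.

    Returns (score, list of aligned (i, j) index pairs).
    """
    n, k = len(a), len(b)
    S = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -gap * i
    for j in range(1, k + 1):
        S[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            S[i][j] = max(S[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match
                          S[i - 1][j] - gap,                                  # gap in b
                          S[i][j - 1] - gap)                                  # gap in a
    # Trace back from the bottom-right cell to recover the aligned pairs
    pairs, i, j = [], n, k
    while i > 0 and j > 0:
        diag = S[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
        if math.isclose(S[i][j], diag, abs_tol=1e-9):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(S[i][j], S[i - 1][j] - gap, abs_tol=1e-9):
            i -= 1
        else:
            j -= 1
    return S[n][k], pairs[::-1]
```

For example, if trace `b` contains one extra atom far from `a`, the DP opens a gap opposite that atom instead of forcing a poor match.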

DALI

• Based on aligning 2-D intra-molecular distance matrices

• Computes the best subset of corresponding residues from the two proteins such that the similarity between the 2-D distance matrices is maximized.

• Searches the space of possible residue alignments using a Monte Carlo algorithm
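The distance-matrix comparison can be sketched in a few lines of Python. Given a proposed set of corresponding residues, the function below compares the two intra-molecular distance matrices with an elastic score in the spirit of DALI. Each combination of two aligned residue pairs contributes a positive amount when the corresponding distances agree, and less (possibly a negative amount) when they differ. Long distances are down-weighted. The constants `theta = 0.2` and `alpha = 20.0` Å are assumptions modeled on Holm and Sander's elastic score and should be checked against the original paper. The Monte Carlo search over candidate alignments is not shown.

```python
import math

def distance_matrix(coords):
    """Intra-molecular C-alpha distance matrix (Å)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def elastic_score(coords_a, coords_b, pairs, theta=0.2, alpha=20.0):
    """DALI-style elastic similarity for a set of aligned residue pairs.

    `pairs` lists (i, j): residue i of A corresponds to residue j of B.
    """
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    score = 0.0
    for i1, j1 in pairs:
        for i2, j2 in pairs:
            if i1 == i2:
                score += theta  # diagonal term
                continue
            d_a, d_b = da[i1][i2], db[j1][j2]
            mean = (d_a + d_b) / 2.0
            # Relative deviation, down-weighted for long (less informative) distances
            score += (theta - abs(d_a - d_b) / mean) * math.exp(-(mean / alpha) ** 2)
    return score
```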

VAST - Vector Alignment Search Tool

• Aligns only secondary structure elements (SSEs)

• Represents each SSE as a vector

• Finds all possible pairs of vectors from the two structures that are similar

• Uses a graph theory algorithm to find the maximal subset of similar vectors

• The overall alignment score is based on the number of similar pairs of vectors between the two structures.
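The graph step can be illustrated with a toy version. Every same-type pairing of an SSE from structure A with an SSE from structure B becomes a node. Two nodes are joined when their SSE vectors make similar angles in both structures. The largest fully connected set of nodes (a maximum clique, found here with the Bron-Kerbosch algorithm) is taken as the alignment, and its size plays the role of the score. This is a sketch under stated assumptions. Using only inter-vector angles with a 20° tolerance as the similarity test is a simplification, not VAST's actual criteria, and `vast_like_match` is a hypothetical name.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two 3-D vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    cos_t = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def vast_like_match(sses_a, sses_b, tol_deg=20.0):
    """Largest set of mutually consistent SSE pairings (a maximum clique).

    Each SSE is (type, vector), e.g. ("H", (x, y, z)) for a helix axis.
    """
    # Nodes: candidate pairings of same-type SSEs from A and B
    nodes = [(i, j) for i, (ta, _) in enumerate(sses_a)
             for j, (tb, _) in enumerate(sses_b) if ta == tb]

    def compatible(p, q):
        (i1, j1), (i2, j2) = p, q
        if i1 == i2 or j1 == j2:  # each SSE may be used only once
            return False
        ang_a = angle_between(sses_a[i1][1], sses_a[i2][1])
        ang_b = angle_between(sses_b[j1][1], sses_b[j2][1])
        return abs(ang_a - ang_b) <= tol_deg

    # Edges join pairings whose relative geometry agrees in both structures
    adj = {n: {m for m in nodes if m != n and compatible(n, m)} for n in nodes}
    best = []

    def bron_kerbosch(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = r
            return
        for v in list(p):
            bron_kerbosch(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch([], set(nodes), set())
    return sorted(best)
```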

LOCK

• Define local secondary structures

• Find an initial superposition by using DP to align secondary structure vectors.

• Use greedy algorithms to find nearest neighbors and minimize the RMSD between the C-α atoms from query and target.

• Find the core of aligned C-α atoms and minimize the RMSD between them.
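The last two LOCK steps can be sketched as follows. The sketch assumes the query and target C-α coordinates have already been superposed; the DP-based initial superposition and the rotation refinement are not shown. RMSD is taken as sqrt((1/N) Σ |x_i - y_i|²) over the N matched atom pairs. Greedy matching accepts the closest unused atom pairs first. The core is then found by repeatedly discarding the worst-fitting pair until the RMSD drops below a threshold. The 3.0 Å cutoff, the 1.0 Å RMSD target and the function names are hypothetical.

```python
import math

def rmsd(coord_pairs):
    """Root-mean-square deviation over (query_xyz, target_xyz) pairs."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in coord_pairs) / len(coord_pairs))

def greedy_pairs(query, target, cutoff=3.0):
    """Greedily match closest unused query/target atoms within `cutoff` Å."""
    candidates = sorted(
        (math.dist(q, t), i, j)
        for i, q in enumerate(query)
        for j, t in enumerate(target)
        if math.dist(q, t) <= cutoff
    )
    used_q, used_t, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_q and j not in used_t:  # each atom matched at most once
            used_q.add(i)
            used_t.add(j)
            pairs.append((i, j))
    return sorted(pairs)

def find_core(query, target, pairs, max_rmsd=1.0):
    """Drop the worst-fitting pair until the remaining core's RMSD <= max_rmsd."""
    core = list(pairs)
    while len(core) > 1 and rmsd([(query[i], target[j]) for i, j in core]) > max_rmsd:
        worst = max(core, key=lambda ij: math.dist(query[ij[0]], target[ij[1]]))
        core.remove(worst)
    return core
```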

Where is the data?

• GenBank (DB are equivalent)

• NCBI Reference Sequences (RefSeq): http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html

• GenPept Database: http://inn.weizmann.ac.il/databanks/genpept.html

• Swiss-Prot: http://www.expasy.org/sprot/ (STATS: http://www.expasy.org/sprot/relnotes/relstat.html)

• PIR International Protein Sequence Database: http://pir.georgetown.edu/pirwww/search/textpsd.shtml

• PDB: http://www.rcsb.org/pdb/

• NCBI Entrez Protein: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

MMDB by NCBI…

A Flow chart for structure prediction

[Flow chart not reproduced. Its boxes read: Protein sequence; Database similarity search; Does sequence align with protein of known 3D structure?; Protein family, domain, cluster analysis; Relationship to known structure?; Structural analysis; 3D comparative modeling; Is there a predicted structure?; Predicted three-dimensional structure; 3D analysis in laboratory. The branches are labeled yes/no.]

Images..

• 3-dimensional model showing the electron density in a molecule of buckminsterfullerene, an allotrope of carbon (C60).

Images…

Computer-generated image showing the 3-D structure of uteroglobin, a protein secreted in the uterus of mammals.

Images… (NMR… EPR…)

A computer image of the charge density over the molecule chymosin, an important enzyme in cheese making. Overall negative charge is depicted in red; overall positive charge is shown in blue.

X-ray crystallography.

Thanks

Thanks to Selnur Erdal for preparing initial versions of these slides.

