Protein Sequence Analysis - Overview -

Date post: 10-Feb-2016
Protein Sequence Analysis - Overview. NIH Proteomics Workshop 2006. Darren Natale, Team Lead – Protein Science, PIR; Research Assistant Professor, Georgetown University Medical Center.
Page 1: Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview -

Darren Natale

Team Lead – Protein Science, PIR

Research Assistant Professor, Georgetown University Medical Center

NIH Proteomics Workshop 2006

Page 2: Protein Sequence Analysis - Overview -

Major Topics

Proteomics and protein bioinformatics (protein sequence analysis)

Why do protein sequence analysis?

Searching sequence databases

Post-processing search results

Detecting remote homologs

Page 3: Protein Sequence Analysis - Overview -

Clinical Proteomics

From Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695

Page 4: Protein Sequence Analysis - Overview -

Single protein and shotgun analysis

Adapted from: McDonald et al. (2002). Disease Markers 18:99-105

[Figure: two workflows starting from a mixture of proteins]

Single protein analysis: gel-based separation, spot excision and digestion, peptides from a single protein

Shotgun analysis: digestion of protein mixture, LC or LC/LC separation, peptides from many proteins

Followed by: MS analysis, MS/MS analysis, protein bioinformatics

Page 5: Protein Sequence Analysis - Overview -

Protein Bioinformatics: Protein Sequence Analysis

Helps characterize protein sequences in silico and allows prediction of protein structure and function

Statistically significant BLAST hits usually signify sequence homology

Homologous sequences may or may not have the same function, but (with very few exceptions) they share the same structural fold

Protein sequence analysis allows protein classification
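
What "statistically significant" means for a BLAST hit can be sketched with the standard relation between a bit score S' and its E-value, E = m * n * 2^(-S'), where m and n are the query and database lengths. The function names and the 1e-3 cutoff below are illustrative assumptions, and real BLAST uses edge-corrected "effective" lengths, so reported E-values will differ somewhat.

```python
def evalue_from_bit_score(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits with at least this bit score.

    Uses E = m * n * 2**(-S'). Raw lengths are used here for simplicity;
    BLAST itself applies effective-length corrections (not modeled).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def is_significant(evalue: float, threshold: float = 1e-3) -> bool:
    """Hypothetical cutoff; the appropriate threshold depends on the search."""
    return evalue < threshold


if __name__ == "__main__":
    # Each additional bit of score halves the expected number of chance hits.
    for bits in (30, 40, 50):
        e = evalue_from_bit_score(bits, query_len=300, db_len=1_000_000_000)
        print(bits, e, is_significant(e))
```

A small E-value says the match is unlikely to arise by chance, which is why it is taken as evidence of homology; it says nothing by itself about shared function.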

Page 6: Protein Sequence Analysis - Overview -

Development of protein sequence databases

Atlas of Protein Sequence and Structure – Dayhoff (1966): the first sequence database (pre-bioinformatics), the forerunner of what is now the Protein Information Resource (PIR)

Protein Data Bank (PDB) – structural database (1972); remains the most widely used database of structures

UniProt – The Universal Protein Resource (2003) is a central database of protein sequence and function created by joining the forces of the Swiss-Prot, TrEMBL and PIR protein database activities

Page 7: Protein Sequence Analysis - Overview -

Comparative protein sequence analysis and evolution

Patterns of conservation in sequences allow us to determine which residues are under selective constraint (and thus likely important for protein function)

Comparative analysis of proteins is more sensitive than comparing DNA

Homologous proteins have a common ancestor

Different proteins evolve at different rates

Protein classification systems based on evolution: PIRSF and COG
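
The first point on this slide, reading selective constraint from patterns of conservation, can be illustrated with a per-column score over an alignment. This is a minimal sketch: the score (fraction of sequences carrying the most common residue) is one simple choice among many, and the three short sequences are made up for illustration.

```python
from collections import Counter


def column_conservation(alignment: list[str], gap: str = "-") -> list[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gaps never count as the conserved residue, so a gappy column scores low.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the slides.
    toy = ["MKVLA", "MKILA", "MR-LA"]
    for position, score in enumerate(column_conservation(toy), start=1):
        print(position, round(score, 2))
```

Columns scoring near 1.0 are candidates for residues under constraint; with only a few closely related sequences, high scores are expected everywhere and carry little information.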

Page 8: Protein Sequence Analysis - Overview -

PIRSF and large-scale annotation of proteins

PIRSF is a protein classification system based on the evolutionary relationships of whole proteins

As part of the UniProt project, PIR has developed this classification strategy to assist in the propagation and standardization of protein annotation

Page 9: Protein Sequence Analysis - Overview -

Comparing proteins

Amino acid sequence of a protein generated from a proteomics experiment, e.g. a protein fragment:

DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT

The amino acids of two sequences can be aligned, and the number of identical residues (or an index of similarity) can then be counted as a measure of relatedness.

Protein structures can be compared by superimposition
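
Counting identical residues in an alignment, as described on this slide, can be sketched as follows. The function name is an assumption, and so is the choice of denominator: this version divides by the number of positions where neither sequence has a gap, while other tools divide by the full alignment length or the shorter sequence. The second sequence in the example is a made-up variant of the first eight residues of the fragment shown above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned positions holding the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # positions with a gap are skipped, not counted as mismatches
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # "DTIKDLLP" starts the fragment on this slide; "DTVKD-LP" is hypothetical.
    print(round(percent_identity("DTIKDLLP", "DTVKD-LP"), 1))
```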

Page 10: Protein Sequence Analysis - Overview -

Protein sequence alignment

Pairwise alignment

a b a c d
a b _ c d

Multiple sequence alignment provides more information

a b a c d
a b _ c d
x b a c e

MSA is difficult to do for distantly related proteins
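
The pairwise example on this slide can be reproduced with a small global alignment routine in the style of Needleman-Wunsch dynamic programming. This is a sketch under simplifying assumptions: match +1, mismatch -1 and a linear gap penalty of -1 are arbitrary choices, whereas real protein aligners use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("_")  # gap in b, written as on the slide
            i -= 1
        else:
            out_a.append("_")  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("abacd", "abcd")
    print(top)
    print(bottom)
    print(total)
```

With these scores, aligning "abacd" against "abcd" places the single gap opposite the second "a", matching the slide. When several alignments tie for the best score, the traceback order decides which one is returned.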

Page 11: Protein Sequence Analysis - Overview -

Protein sequence analysis overview

Protein databases: PIR and UniProt

Searching databases: peptide search, BLAST search, text search

Information retrieval and analysis: protein records at UniProt and PIR, multiple sequence alignment, secondary structure prediction, homology modeling

Page 12: Protein Sequence Analysis - Overview -

Universal Protein Resource

http://www.uniprot.org/

[Diagram: UniProt databases, shown in two versions with older and newer names]

Source data: Swiss-Prot, TrEMBL, PIR-PSD, RefSeq, GenBank/EMBL/DDBJ, EnsEMBL, PDB, Patent Data, Other Data

UniProt Archive (UniParc)

UniProt Knowledgebase (UniProtKB): Literature-Based Annotation, Automated Annotation

UniProt NREF (UniRef): automated merging of sequences; clustering at 100, 90, 50% (UniRef100, UniRef90, UniRef50)

Page 13: Protein Sequence Analysis - Overview -

Peptide Search

Page 14: Protein Sequence Analysis - Overview -

ID mapping

Page 15: Protein Sequence Analysis - Overview -

Query Sequence

Unknown sequence is Q9I7I7

BLAST Q9I7I7 against the UniProt Knowledgebase (http://www.uniprot.org/search/blast.shtml)

Analyze results
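
The "analyze results" step can be sketched as filtering and ranking hits by E-value. This assumes the results are available in the 12-column tab-separated layout that NCBI BLAST+ writes with `-outfmt 6`, which is not necessarily what the web form on this slide returns. The `Hit` type, the 1e-5 cutoff, and every row in `EXAMPLE` are made up for illustration; they are not real results for Q9I7I7.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    percent_identity: float
    alignment_length: int
    evalue: float
    bit_score: float


def parse_tabular(text: str) -> list[Hit]:
    """Parse BLAST tabular rows: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        hits.append(
            Hit(fields[1], float(fields[2]), int(fields[3]),
                float(fields[10]), float(fields[11]))
        )
    return hits


def significant_hits(hits: list[Hit], max_evalue: float = 1e-5) -> list[Hit]:
    """Keep hits at or below the cutoff, best (smallest E-value) first."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)


# Made-up rows with placeholder subject names, for illustration only.
EXAMPLE = (
    "Q9I7I7\tEXAMPLE_C\t27.0\t60\t40\t2\t5\t64\t11\t70\t2.5\t28.0\n"
    "Q9I7I7\tEXAMPLE_B\t31.5\t120\t78\t3\t20\t139\t44\t160\t1e-8\t55.1\n"
    "Q9I7I7\tEXAMPLE_A\t45.2\t250\t130\t4\t10\t255\t30\t278\t3e-60\t210\n"
)

if __name__ == "__main__":
    for hit in significant_hits(parse_tabular(EXAMPLE)):
        print(hit.subject, hit.evalue, hit.percent_identity)
```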

Page 16: Protein Sequence Analysis - Overview -

BLAST results

Page 17: Protein Sequence Analysis - Overview -

Text Search

Page 18: Protein Sequence Analysis - Overview -

Text search results: display options

Moving PubMed ID and PDB ID into “Columns in Display”

Page 19: Protein Sequence Analysis - Overview -

Text search results: add input box

Page 20: Protein Sequence Analysis - Overview -

Text Search Result with NULL/NOT NULL

Page 21: Protein Sequence Analysis - Overview -

UniProtKB Protein Record

Page 22: Protein Sequence Analysis - Overview -

SIR2_HUMAN Protein Record

Page 23: Protein Sequence Analysis - Overview -

Are Q9I7I7 and SIR2_HUMAN homologs?

Check BLAST results

Check pairwise alignment

Page 24: Protein Sequence Analysis - Overview -

Protein structure prediction

Programs can predict secondary structure with about 70% accuracy

Homology modeling - prediction of ‘target’ structure from closely related ‘template’ structure
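
An accuracy figure like the one quoted above is commonly reported as a three-state per-residue score (often called Q3): the fraction of residues whose predicted state, helix (H), strand (E) or coil (C), matches the state observed in a known structure. The slide does not say which measure it refers to, so treating it as Q3 is an assumption, and the two state strings below are made up.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    allowed = set("HEC")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("states must be H, E or C")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(predicted)


if __name__ == "__main__":
    # Hypothetical ten-residue example, not a real prediction.
    print(q3_accuracy("CCHHHHEEEC", "CHHHHHEECC"))
```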

Page 25: Protein Sequence Analysis - Overview -

Secondary structure prediction

http://bioinf.cs.ucl.ac.uk/psipred/

Page 26: Protein Sequence Analysis - Overview -

Secondary structure prediction results

Page 27: Protein Sequence Analysis - Overview -

Sir2 structure

Page 28: Protein Sequence Analysis - Overview -

Homology modeling

http://www.expasy.org/swissmod/SWISS-MODEL.html

Page 29: Protein Sequence Analysis - Overview -

Homology model of Q9I7I7

Blue - excellent
Green - so-so
Red - not good

Yellow - beta sheet
Red - alpha helix
Grey - loop

Page 30: Protein Sequence Analysis - Overview -

Sequence features: SIR2_HUMAN

Page 31: Protein Sequence Analysis - Overview -

Multiple sequence alignment

Page 32: Protein Sequence Analysis - Overview -

Multiple sequence alignment

Q9I7I7, Q82QG9, SIR2_HUMAN

Page 33: Protein Sequence Analysis - Overview -

Sequence features: CRAA_RABIT

Page 34: Protein Sequence Analysis - Overview -

Identifying Remote Homologs

