Swiss Institute of Bioinformatics / Institut Suisse de Bioinformatique
LF-2004.08
Protein function prediction
Introduction
Signal peptides
Transmembrane regions and topology
PTM (post-translational modifications)
Low complexity and biased regions
Repeats
Coils
Secondary structure
Antigenic peptides
Domain/Motifs
Tools
The EMBOSS package
Different techniques
Algorithms
  Sliding window, nearest neighbor
  Patterns, regular expressions
  Weight matrices
  HMMs, profiles
  Neural networks
  Rules
Sliding window
THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW
Score 1, Score 2, ..., Score n
Width (or size) = 11, step = 5
Results are usually displayed as a graph.
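A minimal sketch of the sliding-window idea, assuming the score of each window is the mean Kyte-Doolittle hydropathy of its residues. The slide does not say which scale is used, so the scale and the function name `sliding_window_scores` are illustrative choices only.

```python
# Kyte-Doolittle hydropathy values (an assumed scoring scale, not named on the slide).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_scores(seq, width=11, step=5, scale=KYTE_DOOLITTLE):
    """Return (center_position, mean_score) for each window.

    Positions are 1-based; only full-width windows are scored.
    """
    scores = []
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        mean = sum(scale[aa] for aa in window) / width
        scores.append((start + width // 2 + 1, mean))
    return scores


if __name__ == "__main__":
    seq = "THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW"
    for center, score in sliding_window_scores(seq, width=11, step=5):
        print(f"{center:3d}  {score:6.2f}")
```

Plotting the score against the center position gives the kind of graph the slide mentions. A step of 1 gives a smoother curve at the cost of more windows.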
Patterns / regular expression
Pattern: <A-x-[ST](2)-x(0,1)-{V}
Regexp: ^A.[ST]{2}.?[^V]
Text: The sequence must start with an alanine, followed by any amino acid, followed by a serine or a threonine twice, followed by any amino acid or nothing, followed by any amino acid except valine.
Only the syntax differs…
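The correspondence between the two syntaxes can be sketched as a small converter. This is an illustration under simplifying assumptions: the function name `prosite_to_regex` is hypothetical, and only the elements shown on the slide are handled (single residues, `x`, `[..]`, `{..}`, repeat counts, and `<` / `>` anchors outside brackets).

```python
import re

# One PROSITE element: optional '<', a residue spec, optional (n) or (n,m), optional '>'.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, residue, repeat, end = m.groups()
        if residue == "x":
            residue = "."                            # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"     # any residue except these
        regex += "^" if start else ""
        regex += residue
        regex += "{" + repeat + "}" if repeat else ""
        regex += "$" if end else ""
    return regex


if __name__ == "__main__":
    rx = prosite_to_regex("<A-x-[ST](2)-x(0,1)-{V}")
    print(rx)
    print(bool(re.search(rx, "AGSTKL")))
```

The converter writes `x(0,1)` as `.{0,1}`, which is equivalent to the `.?` used on the slide.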
Weight matrices (PSSM)
HMM / profiles
Neural Networks
General principle: Example:
Signals found in proteins
N-terminal
  export, secretion
  mitochondria
  chloroplast
Internal
  NLS (nuclear localization signal)
C-terminal
  GPI-anchor (Glycosyl Phosphatidyl Inositol)
Other
  membrane anchors (see PTM)
  other, unknown?
Signals detection tools
SignalP
MitoProt
ChloroP
Predotar
PSort
TargetP
Sigcleave (EMBOSS)
Phobius
Big-PI
DGPI
Transmembrane regions
Detection (signal peptide, hydropathy, helices)
Organisation (topology)
Transmembrane detection tools
TMHMM
TMPred
TopPred2
DAS
HMMTop
Tmap (EMBOSS)
Mixture of tools
  Phobius
  ConPred II
Post-translational modifications
Phosphorylation: S - T - Y
N-glycosylation: N
O-glycosylation: S - T - (HO)K
Acetylation, methylation: D - E - K
Sulfation: Y
Farnesylation, myristylation, palmitoylation, geranylgeranylation, GPI-anchor: C - Nter - Cter
Ubiquitination and family: K - Nter
Inteins (protein splicing)
Pre-translational
  Selenoprotein: C
PTM detection
Pattern prediction (PROSITE)
  Short or weak signal
  Frequent hit producer
Best method is experimental
  MS/MS detection
Most methods use "rules" that combine pattern detection and knowledge to predict sites.
  NetOGlyc - Prediction of type O-glycosylation sites in mammalian proteins
  DictyOGlyc - Prediction of GlcNAc O-glycosylation sites in Dictyostelium
  YinOYang - O-beta-GlcNAc attachment sites in eukaryotic protein sequences
  NetPhos - Prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins
  NMT - Prediction of N-terminal N-myristoylation
  Sulfinator - Prediction of tyrosine sulfation sites
Low complexity regions
repeats
compositional bias
PEST
Low complexity / Repeats
DUST (DNA) / SEG: de novo detection
RepeatMasker (DNA): search collection
REP: search collection
REPRO, Radar: de novo detection
PEST, PESTFind: de novo detection
EMBOSS (DNA): einverted, equicktandem, etandem, palindrome
EMBOSS (protein): oddcomp
Coils
Helix of helices
coiled-coil
Leu-zipper
Coils detection
COILS: weight matrices
Paircoil, Multicoil: pairwise correlation
Marcoil: HMM
Pepcoil (EMBOSS): weight matrices
Secondary structure
Structure to predict
  Alpha-helices
  Beta-sheets
  Turns
  Random coil
Tools
  Garnier (EMBOSS)
  PHD
  DSC
  PREDATOR
  NNSSP
  Jpred
  Jnet
  Many others
Antigenic peptide
Peptides binding to MHC
  Class I: 8, 9, 10-mers
  Class II: 15-mers (3+9+3)
  Depends highly on MHC type
Use of experimental knowledge
  Databases of known peptides
Tools
  SYFPEITHI
  HLA_Bind (BIMAS)
  MAPPP (combined expert)
  Antigenic (EMBOSS)
  Many more
Prediction of proteasome cleavage sites
  NetChop
  PaProc
Domain / Motif
All the protein domain descriptors
  PROSITE
  PFAM
  SMART
  PRODOM
  BLOCKS
  PRINTS
  TIGRfam…
Federation: InterPro
Many techniques
  Patterns, Regexp
  PSSM (PSI-BLAST)
  Profiles
  HMM
Other Tools
You can find some of them on our servers: www.ch.embnet.org
Or on the ExPASy server: www.expasy.org/tools
Or ask Google!! www.google.com
European Molecular Biology Open Software Suite
How to use EMBOSS/Jemboss at SIB
Free, open source (for most Unix platforms)
GCG successor (compatible with GCG file format)
More than 150 programs (ver. 2.9.0)
Easy to install locally
  but no interface, requires local databases
  Unix command-line only
Interfaces
  Jemboss, www2gcg, w2h, wemboss… (with account)
  Pise, EMBOSS-GUI, SRSWWW (no account)
  Staden, Kaptain, CoLiMate, Jemboss (local)
Access: www.emboss.org or emboss.sourceforge.net
Format USA
  'asis'::Sequence[start:end:reverse]
  Format::'@'ListFile[start:end:reverse]
  Format::'list':ListFile[start:end:reverse]
  Format::Database:Entry[start:end:reverse]
  Format::Database-SearchField:Word[start:end:reverse]
  Format::File:Entry[start:end:reverse]
  Format::File:SearchField:Word[start:end:reverse]
  Format::Program Program-parameters '|'[start:end:reverse]
Example: fasta::Swissprot:UBP5_HUMAN[200:300]
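To see how the example address decomposes, here is a small sketch that splits the `Format::Database:Entry[start:end:reverse]` form into its parts. It is not EMBOSS's own parser: the `parse_usa` helper and the `Usa` tuple are hypothetical names, only this one USA form is handled, and the literal `r` as the reverse flag is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Usa(NamedTuple):
    fmt: Optional[str]      # sequence format, e.g. "fasta" (None if omitted)
    database: str           # database name, e.g. "Swissprot"
    entry: str              # entry identifier, e.g. "UBP5_HUMAN"
    start: Optional[int]    # first position of the subsequence, if given
    end: Optional[int]      # last position of the subsequence, if given
    reverse: bool           # True when the range ends with ":r" (assumed flag)


_USA_RE = re.compile(
    r"^(?:(?P<fmt>\w+)::)?"
    r"(?P<database>\w+):(?P<entry>\w+)"
    r"(?:\[(?P<start>-?\d+):(?P<end>-?\d+)(?::(?P<rev>r))?\])?$"
)


def parse_usa(usa):
    """Parse the Format::Database:Entry[start:end:r] form of a USA."""
    m = _USA_RE.match(usa)
    if m is None:
        raise ValueError(f"not a Database:Entry style USA: {usa!r}")
    start, end = m.group("start"), m.group("end")
    return Usa(
        fmt=m.group("fmt"),
        database=m.group("database"),
        entry=m.group("entry"),
        start=int(start) if start is not None else None,
        end=int(end) if end is not None else None,
        reverse=m.group("rev") is not None,
    )


if __name__ == "__main__":
    print(parse_usa("fasta::Swissprot:UBP5_HUMAN[200:300]"))
```

For the slide's example this separates the format (`fasta`), the database (`Swissprot`), the entry (`UBP5_HUMAN`) and the range 200 to 300.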
Databases
  Any can be added; use showdb to display the available databases
Some details
showdb
Displays information on the currently available databases
# Name          Type ID  Qry All Comment
# ====          ==== ==  === === =======
ipr_fetch       P    OK  OK  OK  InterPro current by fetch
ipi_fetch       P    OK  OK  OK  IPI current by fetch
refseq_fetch    P    OK  OK  OK  refseq current by fetch
repbase_fetch   P    OK  OK  OK  repbase current by fetch
swiss_fetch     P    OK  OK  OK  SwissProt current by fetch
swissprot       P    OK  OK  OK  SWISSPROT sequences
trembl          P    OK  OK  OK  TREMBL sequences
trembl_fetch    P    OK  OK  OK  trembl current by fetch
tremblnew       P    OK  OK  OK  TREMBL New sequences
ug_fetch        P    OK  OK  OK  Unigene by fetch
embl            N    OK  OK  OK  EMBL release
emhum           N    OK  OK  OK  EMBL release, Human section by emboss index
emrod           N    OK  OK  OK  EMBL release, Rodent section by emboss index
emvrt           N    OK  OK  OK  EMBL release, Vertebrate (nonhuman, nonrodent)
seqret (seqretall, seqretset, seqretsplit)
entret (for complete untouched entry, e.g., for unigene, interpro, swissprot…)
Possible to define your own ".embossrc" file
databases
Some tools for DNA
redata    Search REBASE for enzyme name, references, suppliers etc
remap     Display a sequence with restriction cut sites, translation etc
restover  Finds restriction enzymes that produce a specific overhang
restrict  Finds restriction enzyme cleavage sites
showseq   Display a sequence with features, translation etc
silent    Silent mutation restriction enzyme scan
cirdna    Draws circular maps of DNA constructs
lindna    Draws linear maps of DNA constructs
revseq    Reverse and complement a sequence
…
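As an illustration of the operation revseq performs, reversing and complementing a DNA sequence can be sketched in a few lines. This is a stand-alone sketch, not EMBOSS code; the function name `reverse_complement` is hypothetical, and only A, C, G and T (upper or lower case) are complemented, with other symbols left unchanged.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("GACACCATCGAATGG"))
```

Applying the function twice returns the original sequence, which is a quick sanity check.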
Example: remap
ECLAC E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

[Restriction map: cut sites for TaqI, Bsc4I, Hin6I, HhaI, Bsu6I, AciI, BssKI and BsiSI are marked above and below the sequence]
GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT
        10        20        30        40        50        60
----:----|----:----|----:----|----:----|----:----|----:----|
CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA

# Enzymes that cut  Frequency  Isoschizomers
AciI                1
Bsc4I               1
BsiSI               1
BssKI               1
Bsu6I               1
HhaI                2
Hin6I               2          HinP1I,HspAI
TaqI                1

# Enzymes that do not cut
AclI BamHI BceAI Bse1I BshI ClaI EcoRI EcoRII
Hin4I HindII HindIII HpyCH4IV KpnI NotI
Example: cirdna
File: ../../data/data.cirp

Start 1001
End 4270
group
label
Block 1011 1362 3
ex1
endlabel
label
Tick 1610 8
EcoR1
endlabel
label
Block 1647 1815 1
endlabel
label
Tick 2459 8
BamH1
endlabel
label
Block 4139 4258 3
ex2
endlabel
endgroup
group
label
Range 2541 2812 [ ] 5
Alu
endlabel
label
Range 3322 3497 > < 5
MER13
endlabel
endgroup
Example: plotorf
EMBOSS format input/output
UFO (Universal Feature Object)
  gff, swissprot, embl, pir, nbrf (with or without sequence)
Alignments
  Multiple and pairwise, many flavors (FASTA, MSF, SRS…)
Reports
  Feature (UFO), SRS, motif, seqtable, excel, diffseq, listfile (USA), etc…
Sequences (compatible with USA)
  Many!!! E.g., fasta, clustal, gcg, paup, gff, embl, swissprot, acedb, abi, etc…
Web interfaces
PISE (Pasteur Institute Software Environment)
  http://www-alt.pasteur.fr/~letondal/Pise/
wEMBOSS (Belgium & Argentina) (not yet at SIB)
  http://www.wemboss.org
Pise: a tool to generate Web interfaces for Molecular Biology programs
http://emboss.ch.embnet.org/Pise
http://www.wemboss.org
Launch Jemboss http://emboss.ch.embnet.org/Jemboss
Launch Jemboss
First time only…
Each time…
Jemboss windows
Jemboss windows other systems
Summary
Anonymous web access through Pise
Registered access through Jemboss
Registered access through command-line (requires UNIX skills)
Please report problems!
Exercises
DEA Exercises: web-based sequence analysis
The goal of this exercise is to use web-based tools for protein sequence analysis.
a) Take this TrEMBL sequence (Q9X252) and run a BLAST against swissprot with the complete protein or with the first 70 residues. Explain the difference. Use TMPred, SignalP, and COILS to help you.
b) Pass this sequence through PFSCAN and search all databases. Compare with this command on ludwig-sun1/2: hits -b "prf pat pfam" tr:Q9X252
c) Use the different profile, motif and pattern databases to get more information about the domain(s) you found.
d) How do you evaluate the PRINTS tropomyosin annotation in this TrEMBL entry (Q9WZH0)?
List of useful links:
  basic BLAST or advanced BLAST or PSI-BLAST
  TMPred: prediction tool for transmembrane regions (or TMHMM)
  COILS: prediction tool for coiled-coil regions
  SignalP: prediction tool for signal-peptide cleavage sites
Profile, domain, motif databases and search sites:
  PFSCAN
  InterPro (Pfam, PRINTS, PROSITE, SMART)
  HITS