Wellcome Trust graduate course. Computational Methods series. Sequence-based bioinformatics.
Dr. Hyunji Kim
Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
Email: [email protected]
1) BLAST/WUBLAST
A search engine for finding sequences of interest. A BLAST search against a specified database can be refined by varying the substitution matrix and the filtering options. http://www.ncbi.nlm.nih.gov/BLAST/, http://www.ebi.ac.uk/blast2/
2) ClustalW/T-Coffee/Muscle
Helps make sense of a set of unaligned sequences by generating pairwise or multiple sequence alignments. Uses a progressive-alignment method. http://www.ebi.ac.uk/clustalw/
3) HMMer/PSI-BLAST
Builds a profile hidden Markov model (pHMM) from a set of aligned sequences. Aligns sequences using a pHMM, searches a sequence database, and can assign functions to a given sequence. http://hmmer.wustl.edu/
4) Phylip/TreeDyn
Calculates a distance matrix from a set of sequences. Derives phylogenetic trees, taking such a matrix as input, based on theories of minimum evolution, parsimony and more. http://evolution.genetics.washington.edu/phylip.html
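The distance-matrix step described for Phylip can be sketched in a few lines. This is a minimal illustration only: it uses the uncorrected p-distance (fraction of differing residues), whereas Phylip's own programs apply model-based corrections. The function names and the toy sequences are hypothetical and do not come from the course material.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances, as a dict of dicts."""
    names = list(aligned)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(aligned[a], aligned[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Toy aligned sequences (hypothetical, for illustration only).
toy = {"seq1": "ACGTAC-T", "seq2": "ACGTTCGT", "seq3": "AC-TTCGA"}
dm = distance_matrix(toy)
for name in toy:
    print(name, [round(dm[name][other], 3) for other in toy])
```

A matrix of this shape is the kind of input that distance-based tree methods take.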
Basic Tools
5) Databases
• Nucleotide databases: EMBL, GenBank & DDBJ
• Protein databases: fully annotated, e.g. Swiss-Prot v52.3 (264,492 entries as of 17 Apr. 2007); computer-annotated, e.g. TrEMBL v35.3
• Genomics databases: Ensembl (v44; 20+14 genomes) and Eukaryota, Bacteria and Archaea genomes (51, 445 and 40 respectively), as of 20 Apr. 2007.
http://www.ebi.ac.uk/uniprot/index.html, http://www.ensembl.org/, http://www.ebi.ac.uk/genomes/index.html
6) Major Bioinformatics Centres, around the globe.
http://www.ebi.ac.uk/, http://www.ncbi.nlm.nih.gov/, http://www.ddbj.nig.ac.jp/, http://us.expasy.org/, http://www.sanger.ac.uk/, http://geneontology.org/
Searching for sequences by homology
- BLAST
[Figure: diagram with labels x, y, i, j]
Reference: Gish, W. (1996-2006) http://blast.wustl.edu
Query=  KcsA
        (160 letters)

>Filtered+0
MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI
TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
RRGHFVRHSEKXXXXXXXXXXXXLHERFDRLERMLDDNRR

Database:  swissprot
           223,100 sequences; 81,965,973 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.          615  3.0e-60   1
SW:KCSA_STRLI P0A334 Voltage-gated potassium channel.          615  3.0e-60   1

>SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.
        Length = 160

 Score = 615 (221.5 bits), Expect = 3.0e-60, P = 3.0e-60, Group = 1
 Identities = 120/160 (75%), Positives = 120/160 (75%)

Query:     1 MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI 60
             MPPM             GRHGSALHWR               GSYLAVLAERGAPGAQLI
Sbjct:     1 MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI 60

Query:    61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
             TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
Sbjct:    61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
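The output above reports only 75% identity even though KcsA is hitting its own database entry. A short sketch suggests why, assuming that X marks residues masked by the low-complexity filter and that masked positions are never counted as identities (an assumption, but one consistent with the figures shown). The helper name `count_identities` is hypothetical.

```python
MASK = "X"


def count_identities(query, sbjct, mask=MASK):
    """Count aligned positions that are identical and not masked in the query."""
    if len(query) != len(sbjct):
        raise ValueError("ungapped alignment rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != mask)


# First alignment block (positions 1-60), rebuilt from the output above.
query_block = "MPPM" + MASK * 13 + "GRHGSALHWR" + MASK * 15 + "GSYLAVLAERGAPGAQLI"
sbjct_block = ("MPPM" + "LSGLLARLVKLLL" + "GRHGSALHWR"
               + "AAGAATVLLVIVLLA" + "GSYLAVLAERGAPGAQLI")
block_identities = count_identities(query_block, sbjct_block)

# Whole filtered query (160 letters), as listed under ">Filtered+0".
full_query = (
    query_block
    + "TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE"
    + "RRGHFVRHSEK" + MASK * 12 + "LHERFDRLERMLDDNRR"
)
unmasked = sum(1 for residue in full_query if residue != MASK)
percent = 100.0 * unmasked / len(full_query)

print(block_identities, "identities in the first 60 positions")
print(unmasked, "of", len(full_query), "query residues are unmasked:", percent, "%")
```

Under these assumptions the 40 masked residues account for the whole gap between 120/160 and a perfect self-match.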
Multiple sequence alignment
– ClustalW
 *****************************************************
 CLUSTAL W (1.83) Multiple Sequence Alignments
 *****************************************************

     1. Sequence Input From Disc
     2. Multiple Alignments
     3. Profile / Structure Alignments
     4. Phylogenetic trees

     S. Execute a system command
     H. HELP
     X. EXIT (leave program)

Your choice: 2

 ****** MULTIPLE ALIGNMENT MENU ******

     1. Do complete multiple alignment now (Slow/Accurate)
     2. Produce guide tree file only
     3. Do alignment using old guide tree file
     4. Toggle Slow/Fast pairwise alignments = SLOW
     5. Pairwise alignment parameters
     6. Multiple alignment parameters
     7. Reset gaps before alignment? = OFF
     8. Toggle screen display = ON
     9. Output format options

     S. Execute a system command
     H. HELP

     or press [RETURN] to go back to main menu

Your choice:
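The menu distinguishes slow from fast pairwise alignments. The slow route rests on full dynamic programming, which can be sketched as a global (Needleman-Wunsch style) alignment score. This is a simplified illustration, not ClustalW's implementation: it assumes a flat match/mismatch score and a linear gap penalty instead of a substitution matrix with gap-opening and gap-extension penalties. The function name and the toy sequences are hypothetical.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b under a linear gap penalty."""
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(nw_score("GCATGCG", "GATTACA"))
print(nw_score("ACGT", "ACGT"))
```

Only two rows of the table are kept because the score alone is needed; recovering the alignment itself would require the full table and a traceback.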
CLUSTAL W (1.82) multiple sequence alignment

KVAP_AERPE      FDALW-WAVVTATTVGYGDVVP-ATPIGKVIGIAVMLTGISALTLLIGTVSNMF------ 79
MVP_METJA       FDAFY-FTTISITTVGYGDITP-KTDAGKLI---IIFS---VLFFISGLITS-------- 70
O28600          FDSLY-MTVITITTTGYGEVKP-MGPGGRVISMLLMFVGVGTF----------------- 64
Q8TXQ4          LTCLY-FTAATITTVGYGDVVP-TTEAGRLLSVIVMFSGIGVASYAL------------- 73
Q6L2S2          FTSLW-WTMQTITTVGYGDTPV-YGFYGRINGMLIMVFGIGTIGYVTASLAT-------- 79
Q979Z2          FTAIW-FTMETVTTVGYGDVVP-VSNLGRVVAMLIMVSGIGLLGTLTATISAYLF----Q 80
O26605          EDSLW-YVLQTITTVGYGDIVP-VTSLGRFTGMVIMFSAIASTSLITASATSTLLERGEQ 114
Q9HIA8          GNAFY-YTGEVITTLGFGDILP-VTMDAKIFTISLAFLGVAIFFSSITALILPSVERRLG 94
Q97CK5          GTALY-YTGETVTTLGFGDILP-VDLESRLFTISLAFLGVAIFFSAMTALITPTIERRVG 84
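An alignment like this is the starting point for profile methods. As a rough illustration, the sketch below computes per-column residue frequencies for the seven conserved columns starting at TT in each row above (TTVGYGD in KVAP_AERPE). A frequency profile is far simpler than a profile HMM: it has no insert or delete states and no pseudocounts. The function names are hypothetical.

```python
from collections import Counter

# The seven-residue conserved stretch from each of the nine aligned rows above.
segments = [
    "TTVGYGD",  # KVAP_AERPE
    "TTVGYGD",  # MVP_METJA
    "TTTGYGE",  # O28600
    "TTVGYGD",  # Q8TXQ4
    "TTVGYGD",  # Q6L2S2
    "TTVGYGD",  # Q979Z2
    "TTVGYGD",  # O26605
    "TTLGFGD",  # Q9HIA8
    "TTLGFGD",  # Q97CK5
]


def column_frequencies(rows):
    """Per-column residue frequencies for equal-length aligned rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    profile = []
    for column in zip(*rows):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile


def consensus(profile):
    """Most frequent residue in each column."""
    return "".join(max(col, key=col.get) for col in profile)


profile = column_frequencies(segments)
print(consensus(profile))
for position, col in enumerate(profile, start=1):
    print(position, {res: round(freq, 2) for res, freq in sorted(col.items())})
```

Columns where one residue dominates carry most of the signal that a profile search relies on.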
Residues    Colour    Property
AVFPMILW    Red       Small (small + hydrophobic (incl. aromatic -Y))
DE          Blue      Acidic
RHK         Magenta   Basic
STYHCNGQ    Green     Hydroxyl, Amine
Others      Gray
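The colour legend amounts to a lookup table from residue to colour. A minimal sketch follows, with hypothetical names. Note that H is listed in both the green and the magenta group in the legend; this sketch assumes the first group listed in the dictionary wins, so H comes out green.

```python
# Groups taken from the legend; dictionary order decides ties (H is listed twice).
COLOUR_GROUPS = {
    "STYHCNGQ": "green",   # hydroxyl, amine
    "RHK": "magenta",      # basic
    "DE": "blue",          # acidic
    "AVFPMILW": "red",     # small and hydrophobic
}

RESIDUE_COLOUR = {}
for group, colour in COLOUR_GROUPS.items():
    for residue in group:
        # setdefault keeps the first colour assigned to a residue.
        RESIDUE_COLOUR.setdefault(residue, colour)


def colour_of(residue):
    """Colour for one residue; anything not in the legend is gray."""
    return RESIDUE_COLOUR.get(residue.upper(), "gray")


def colour_row(row):
    """Colours for every column of an aligned row, gaps included."""
    return [colour_of(residue) for residue in row]


print(colour_row("TTVGYGD"))
```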
Profile alignment & Pattern recognition: HMMer
More sensitive homology-search: PSI-BLAST & HMMer
DNA sequence
Amino acid sequence
PSI-BLAST
Phylogeny: Phylip & TreeDyn
Saitou N and Nei M, The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4):406-425, 1987
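One step of neighbour joining can be sketched as follows, using the commonly quoted Q-matrix form of the selection criterion. The five-taxon distance matrix is a hypothetical toy example, not data from the course, and the function name is an assumption. A full implementation would repeat this step, replacing the joined pair with a new node until the tree is resolved.

```python
def nj_pick_pair(names, dist):
    """Pick the pair to join and the branch lengths to their new parent node."""
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    totals = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: the pair with the smallest value is joined.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from each member of the pair to the new internal node.
    length_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    length_j = dist[i][j] - length_i
    return names[i], names[j], q, length_i, length_j


# Toy distances between five taxa (hypothetical values).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
first, second, q_value, len_first, len_second = nj_pick_pair(names, dist)
print(first, second, q_value, len_first, len_second)
```

Here d and e are the closest pair by raw distance (3), yet a and b are joined first, which shows that the criterion corrects for how far each taxon is from all the others.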
TreeDyn
Protein secondary structure prediction:
two consensus methods
http://sbcb.bioch.ox.ac.uk/TM_noj/TM_noj.html
640 650 660 670 680 690 700
| | | | | | |
MFAKGYGKNNEPLRGYILTFLIALGFILIAELNVIAPIISNFFLASYALINFSVFHASLAKSPGWRPAFK
ALOM2      *****************
DAS        ****************************************
HMMTOP2    ****************** *************************
MEMSAT1.5  *************************
PHD        *************************
SPLIT4     **************** ***************************
TMAP       *****************************
TMFINDER   ****************************************
TMHMM2     *********************** ******************
TMPRED     *************************
TOPPRED2   ********************* *********************
Consensus  ------------???hhhhHHHHHHHHHHHHHHHHHhHHhhhhhhhhh???????????-----------
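A consensus line of this kind can be produced by per-residue voting across the individual predictors. The sketch below is an assumption about how such a line might be built, not the server's actual rule: the thresholds that map the vote fraction to H, h, ? and - are hypothetical, as are the toy prediction tracks and the function name.

```python
def consensus_line(tracks, strong=0.75, weak=0.5, unsure=0.25):
    """Per-position vote across predictors; '*' marks a predicted TM residue."""
    rows = list(tracks.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all prediction tracks must have the same length")
    symbols = []
    for pos in range(length):
        votes = sum(1 for row in rows if row[pos] == "*")
        fraction = votes / len(rows)
        if fraction >= strong:
            symbols.append("H")   # most methods agree
        elif fraction >= weak:
            symbols.append("h")   # at least half agree
        elif fraction >= unsure:
            symbols.append("?")   # a minority predicts TM
        else:
            symbols.append("-")   # little or no support
    return "".join(symbols)


# Toy tracks for four hypothetical methods over eight residues.
toy_tracks = {
    "method1": ".*****..",
    "method2": "..****..",
    "method3": "..***...",
    "method4": "...**...",
}
toy_consensus = consensus_line(toy_tracks)
print(toy_consensus)
```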
Dr. Jonathan Cuthbertson developed the Transmembrane Prediction Server.
Example Output
http://pongo.biocomp.unibo.it/pongo
Pongo
Example Output by Pongo
Background for practical sessions
Ion channels; Potassium channels; Voltage-gated potassium channels
• Ion channels are a diverse class of transmembrane proteins that are responsible for the diffusion of ions across cell membranes.
• There are several major families of ion channels, for instance K+, Na+, Ca2+ and Cl- channels as well as ligand-gated ion channels (LGICs).
• Many human neurological and muscular disorders have been traced to defects in voltage-gated and ligand-gated ion channels.
Fig 2. A. Long et al., Science, Vol. 309, p897, 2005
TM
T1
Introduction to your input sequence
K+ channels, blastp
Homologues are visualised in BLIXEM.
Your expected blastp-output
Kv
BK
SK
Erg
Kir
CNG
AKT
Kv1.x, Shab, Kv2.x, Shal, Kv4.x, Kv5.6.8.9., Shaw, Kv3.x
Kir2.x, Kir6.2, Kir3.x, Kir4.x, Kir1.1, Kir6.1, Kir2.3
Fig 4. Shealy et al., Biophysical Journal, Vol 84, p2929, 2003
The alignment you are about to build, though not necessarily as large.
hmmsearch - search a sequence database with a profile HMM
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:           Kv.hmm [Kv_homologues]
Sequence database:  infile_comb
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query HMM:   Kv_homologues
  [HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence          Description    Score    E-value   N
--------          -----------    -----    -------  ---
CIKS_DROME                       241.2    3.2e-71    1
Q9VX00_DROME                     234.3    3.9e-69    1
CIKB_DROME                       159.3    1.5e-46    1
O62350_Celegans                  156.7    8.8e-46    1
Q9VLC6_DROME                     156.6    9.6e-46    1
CIKW_DROME                       156.5      1e-45    1
Q8SYL2_DROME                     156.5      1e-45    1
Q22012_Celegans                  155.3    2.4e-45    1
Filtered_5DROME                  140.5    6.6e-41    1
Filtered_6DROME                  140.5    6.6e-41    1
Q9XXD1_Celegans                  125.0    3.1e-36    1
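A per-sequence score table like the one above is easy to post-process. The sketch below parses the eleven hit rows and keeps those above a score cutoff. The parser name, the cutoff of 150 and the choice to read the last three fields of each row are assumptions made for illustration; this is not part of HMMER.

```python
HITS_TEXT = """\
CIKS_DROME 241.2 3.2e-71 1
Q9VX00_DROME 234.3 3.9e-69 1
CIKB_DROME 159.3 1.5e-46 1
O62350_Celegans 156.7 8.8e-46 1
Q9VLC6_DROME 156.6 9.6e-46 1
CIKW_DROME 156.5 1e-45 1
Q8SYL2_DROME 156.5 1e-45 1
Q22012_Celegans 155.3 2.4e-45 1
Filtered_5DROME 140.5 6.6e-41 1
Filtered_6DROME 140.5 6.6e-41 1
Q9XXD1_Celegans 125.0 3.1e-36 1
"""


def parse_hits(text):
    """Return (name, score, e_value, n_domains) tuples from hit rows."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        # Read from the right so an optional description column is tolerated.
        score, e_value, n_domains = float(fields[-3]), float(fields[-2]), int(fields[-1])
        hits.append((fields[0], score, e_value, n_domains))
    return hits


hits = parse_hits(HITS_TEXT)
strong_hits = [hit for hit in hits if hit[1] >= 150.0]  # hypothetical cutoff
print(len(hits), "hits parsed;", len(strong_hits), "score at least 150")
```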
Example of pHMM-related output
Kir
Kv
BK
SK
AKT
CNG/HErg
KcsA
MthK
Kv1.2
KvAP
Raw tree-files produced by PHYLIP
Phylogenetic trees modified in TreeDyn