
I.U. School of Informatics Motif Discovery from Large Number of Sequences: A Case Study with Disease...



I.U. School of Informatics
Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana
by Irfan Gunduz
04/25/04
Capstone Presentation

INTRODUCTION
  Motifs: highly conserved regions across a subset of proteins that share the same function
  Motifs can be used to predict:
    A molecule's function
    A structural feature
    Family membership
  Example sequences (Seq A to Seq D): YNEDSKH, YDDDSNH, YDNDSNH, YENDSKH

INTRODUCTION
  Current motif finding software: MEME, PROSITE, PRATT, etc.
  Do they work with a large number of sequences?
  Pattern discovery relies on statistical or combinatorial techniques, looking for signals
  Signal-to-noise ratio becomes less clear as the number of sequences increases
  What to do?

Objective
  Develop a computational procedure to find functional motifs from a large number of sequences

COMPUTATIONAL PROCEDURE: Tools
  BLAST (sequence alignment tool)
  BAG (sequence clustering package)
  CLUSTAL W (multiple sequence alignment)
  HMMERII (HMM-based software)
  BLOCK MAKER (block/motif finder)
  LAMA (block comparison tool)
  PERL

COMPUTATIONAL PROCEDURE
  1 - Collecting and Clustering Sequences

COMPUTATIONAL PROCEDURE
  2 - Enrichment

COMPUTATIONAL PROCEDURE
  3 - Refinement
  4 - Motif Finding

A Case Study with Disease Resistance Genes in Arabidopsis thaliana

Why Disease Resistance Genes?

Background, Disease Resistance Genes
  Domain    Probable Function
  TIR
  CC
  KIN
  LRR       Recognition of specificity
  NB        ATP and GTP binding

Case Study, Arabidopsis thaliana
  116 sequences annotated as disease resistance protein or disease resistance protein-like were extracted from the Arabidopsis thaliana genome
  Clustered into 32 groups
  After refinement, four clusters were formed for further analysis
  Cluster      # of Sequences
  Cluster 1    96
  Cluster 2    45
  Cluster 3    641
  Cluster 4
  [...] to 640 sequences were added in each cluster after HMM iterations

Case Study, Arabidopsis thaliana: PFAM Search
  Cluster      Domains
  Cluster 1    NB-ARC, TIR, Kin, LRR
  Cluster 2    NB-ARC, Kin, LRR
  Cluster 3    Ser/Thr Kin
  Cluster 4    Kin

Case Study, Arabidopsis thaliana: Results, Block Maker
  Panels: Cluster1, Cluster2
  Example block sequences:
    YDVFLSFRGVDTRQTIVSHL
    YDVFLSFRGEDTRKNIVSHL
    YDVFLSFRGEDTRKTIVSHL

Case Study, Arabidopsis thaliana: Results, Lama and BAG
  Panels: Clusters at the whole gene level; Clusters at the block level
  Cluster labels: Cluster1, Cluster2, Cluster3
  Block labels: TIR-I, TIR-II, Kin1a, Kin2, NBS-B; Kin1a, Kin2, NBS-B, NBS-C, NBS-A, GLPL; LRR
  Gene labels: RPP8, RPM1, RPS4, RPP1, RPP5

Case Study, Arabidopsis thaliana: Number of Disease Resistance Gene Candidates on each Chromosome
  Rows: Cluster, Cluster
  Columns: CHR-I, CHR-II, CHR-III, CHR-IV, CHR-V

Case Study, Arabidopsis thaliana: New Disease Resistance Gene Candidates
  Cluster 1: GI GI GI
  Cluster 2: GI GI GI GI

Case Study, Arabidopsis thaliana
  To test the effectiveness of the computational procedure, 792 unique sequences were merged and submitted to MEME and PRATT to detect functional motifs.
  Time: more than 9000 minutes on a Pentium IV 1.7 GHz machine running Linux
  Result: no known disease resistance gene motifs were detected

CONCLUSIONS
  A sensible combination of tools provides an excellent mechanism for motif detection
  Clustering helps to improve the performance of other well-known tools

ACKNOWLEDGEMENT
  "Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana", by Irfan Gunduz, Sihui Zhao, Mehmet Dalkilic and Sun Kim, will be presented at The 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences

Disease Resistance Mechanism

COMPUTATIONAL PROCEDURE: Refinement
  Figure labels: A, B, C, D
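As an illustration of the motif idea from the Introduction, the four aligned example sequences (YNEDSKH, YDDDSNH, YDNDSNH, YENDSKH) can be collapsed column by column into a PROSITE-style pattern. This is only a minimal sketch in Python and is not the procedure used in the presentation, which relied on BLOCK MAKER, LAMA and the other listed tools. The function name `prosite_pattern` is hypothetical, and the sketch assumes the sequences are already aligned, gap-free and of equal length.

```python
def prosite_pattern(aligned):
    """Collapse aligned sequences into a PROSITE-style pattern.

    Assumption (not from the slides): sequences are already aligned,
    contain no gaps, and all have the same length.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")

    parts = []
    # zip(*aligned) yields one tuple of residues per alignment column.
    for column in zip(*aligned):
        residues = sorted(set(column))
        if len(residues) == 1:
            # Fully conserved column: emit the single residue.
            parts.append(residues[0])
        else:
            # Variable column: emit the set of observed residues.
            parts.append("[" + "".join(residues) + "]")
    return "-".join(parts)


# The four example sequences from the Introduction slide.
examples = ["YNEDSKH", "YDDDSNH", "YDNDSNH", "YENDSKH"]
pattern = prosite_pattern(examples)
print(pattern)  # Y-[DEN]-[DEN]-D-S-[KN]-H
```

Conserved columns (Y, D, S, H) appear as single residues, while variable columns list every residue observed there. A pattern like this becomes less specific as more divergent sequences are added, which is one way to picture the signal-to-noise concern raised in the Introduction.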