Page 1: I.U. School of Informatics

I.U. School of Informatics

Motif Discovery from Large Number of Sequences:

A Case Study with Disease Resistance Genes

in Arabidopsis thaliana

by Irfan Gunduz

04/25/04

Capstone Presentation

Page 2: I.U. School of Informatics

INTRODUCTION

Motifs

• Highly conserved regions across a subset of proteins that share the same function

• Motifs can be used to predict

  A molecule’s function
  A structural feature
  Family membership

>Seq A  YNEDSKH
>Seq B  YDDDSNH
>Seq C  YDNDSNH
>Seq D  YENDSKH
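
As a minimal sketch of what "highly conserved regions" means in practice, the example residues on this slide can be read as four aligned 7-residue sequences (YNEDSKH, YDDDSNH, YDNDSNH, YENDSKH; this split is an assumption based on the four sequence labels). The hypothetical helper `column_pattern` below collapses each alignment column into a PROSITE-style pattern element. It is an illustration only, not the method used by MEME, PRATT, or Block Maker.

```python
def column_pattern(aligned):
    """Build a PROSITE-style pattern from equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if len(residues) == 1:
            # Fully conserved position: emit the single residue.
            parts.append(residues[0])
        else:
            # Variable position: list the observed alternatives.
            parts.append("[" + "".join(residues) + "]")
    return "-".join(parts)


# Sequences A-D as read from the slide (assumed split into 7-residue rows).
sequences = ["YNEDSKH", "YDDDSNH", "YDNDSNH", "YENDSKH"]
pattern = column_pattern(sequences)
print(pattern)  # Y-[DEN]-[DEN]-D-S-[KN]-H
```

Positions 1, 4, 5, and 7 are identical in all four sequences, which is the kind of signal a motif finder looks for.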

Page 3: I.U. School of Informatics

INTRODUCTION

Current motif-finding software:

• MEME
• PROSITE
• PRATT, etc.

Do they work with large number of sequences?

• Pattern discovery relies on statistical or combinatorial techniques, looking for signals

• Signal-to-noise ratio becomes less clear as the number of sequences increases

What to do?

Page 4: I.U. School of Informatics

Objective

Develop a computational procedure to find functional motifs from a large number of sequences

Page 5: I.U. School of Informatics

COMPUTATIONAL PROCEDURE

Tools

• BLAST (sequence alignment tool)
• BAG (sequence clustering package)
• CLUSTAL W (multiple sequence alignment)
• HMMERII (HMM-based software)
• BLOCK MAKER (block/motif finder)
• LAMA (block comparison tool)
• PERL

Page 6: I.U. School of Informatics

COMPUTATIONAL PROCEDURE

1- Collecting and Clustering Sequences

Extract well-annotated sequences of interest from the genome of interest

All-against-all pairwise comparison using BLAST

Estimate the best bit score for clustering

Cluster sequences using BAG
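
The clustering step can be sketched roughly as follows, assuming the all-against-all BLAST results have already been parsed into `(query, subject, bit_score)` tuples. The function name `cluster_by_bit_score`, the sequence IDs, and the threshold of 100 are all hypothetical. The sketch uses plain single-linkage grouping (connected components) above a bit-score cutoff; BAG's own algorithm is not described in the slides and is not reproduced here.

```python
from collections import defaultdict


def cluster_by_bit_score(hits, threshold):
    """Group sequence IDs linked by at least one hit scoring >= threshold."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, bit_score in hits:
        root_q, root_s = find(query), find(subject)  # registers both IDs
        if bit_score >= threshold and root_q != root_s:
            parent[root_q] = root_s

    groups = defaultdict(set)
    for seq_id in list(parent):
        groups[find(seq_id)].add(seq_id)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical parsed BLAST hits: (query, subject, bit score).
hits = [
    ("s1", "s2", 250.0),
    ("s2", "s3", 180.0),
    ("s3", "s4", 40.0),   # below the cutoff, so it does not link the groups
    ("s4", "s5", 300.0),
]
clusters = cluster_by_bit_score(hits, threshold=100.0)
print(clusters)  # [['s1', 's2', 's3'], ['s4', 's5']]
```

Raising the cutoff splits clusters apart and lowering it merges them, which is why the procedure estimates a suitable bit score before clustering.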

Page 7: I.U. School of Informatics

COMPUTATIONAL PROCEDURE

2 - ENRICHMENT

Align multiple sequences in each cluster

Use an HMM-based program to build a profile for each cluster

Search the genome of interest with the new profile and extract more sequences if available
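
The enrichment step reads as a loop: build a profile, search the genome, add any new hits, and repeat until nothing new is found. Below is a minimal sketch of that loop. The `build_profile` and `search` callables are hypothetical placeholders for the alignment and HMM tools, and the toy versions here (shared 3-mers on made-up strings) only exist to make the example runnable; they are not how an HMM profile works.

```python
def enrich_cluster(seed_ids, genome, build_profile, search, max_rounds=10):
    """Grow a cluster by repeated profile building and genome searching."""
    members = set(seed_ids)
    for _ in range(max_rounds):
        profile = build_profile([genome[i] for i in sorted(members)])
        new_ids = set(search(profile, genome)) - members
        if not new_ids:
            break  # converged: the profile finds no additional sequences
        members |= new_ids
    return members


# Toy stand-ins (assumptions for illustration only).
def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_build_profile(seqs):
    # "Profile" = every 3-mer seen in the current cluster members.
    return set().union(*(kmers(s) for s in seqs))


def toy_search(profile, genome):
    # A sequence "hits" if it shares at least one 3-mer with the profile.
    return [seq_id for seq_id, seq in genome.items() if kmers(seq) & profile]


# Made-up strings, not real protein sequences.
genome = {"a": "ABCDE", "b": "CDEFG", "c": "EFGHI", "d": "XYZ"}
enriched = enrich_cluster({"a"}, genome, toy_build_profile, toy_search)
print(sorted(enriched))  # ['a', 'b', 'c']
```

Sequence "c" is only reachable after "b" has joined the cluster, which shows why more than one iteration can be needed.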

Page 8: I.U. School of Informatics

COMPUTATIONAL PROCEDURE

3 – REFINEMENT

Refine clusters by regrouping

4 – MOTIF FINDING

Submit sequences in each cluster to Block Maker

Compare blocks using LAMA

Cluster blocks using BAG
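
The slides do not say how clusters are regrouped during refinement. One plausible reading, assumed here purely for illustration, is that clusters which come to share sequences after enrichment are merged. The hypothetical function `regroup` below sketches that idea; the actual criterion used in the study may differ.

```python
def regroup(clusters):
    """Merge clusters until no two clusters share a sequence ID."""
    merged = []  # invariant: sets in `merged` are pairwise disjoint
    for cluster in clusters:
        current = set(cluster)
        remaining = []
        for existing in merged:
            if existing & current:
                current |= existing      # absorb the overlapping cluster
            else:
                remaining.append(existing)
        merged = remaining + [current]
    return merged


# Hypothetical clusters after enrichment; the second and fourth share "s4".
before = [{"s1", "s2"}, {"s3", "s4"}, {"s5"}, {"s4", "s6"}]
after = regroup(before)
print(sorted(sorted(c) for c in after))
# [['s1', 's2'], ['s3', 's4', 's6'], ['s5']]
```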

Page 9: I.U. School of Informatics

A Case Study with Disease Resistance Genes in Arabidopsis thaliana

Page 10: I.U. School of Informatics

Why Disease Resistance Genes?

Page 11: I.U. School of Informatics

Background, Disease Resistance Genes

Domain    Probable Function
TIR
CC
KIN
LRR       Recognition of specificity
NB        ATP and GTP binding

Page 12: I.U. School of Informatics

Case Study, Arabidopsis thaliana

• 116 sequences annotated as disease resistance protein or disease resistance protein-like were extracted from the Arabidopsis thaliana genome

• Clustered into 32 groups

• After refinement, four clusters were formed for further analysis

             # of Sequences
Cluster 1    96
Cluster 2    45
Cluster 3    641
Cluster 4    11

• 20 to 640 sequences were added to each cluster after HMM iterations

Page 13: I.U. School of Informatics

Case Study, Arabidopsis thaliana

PFAM Search

             Domains
Cluster 1    NB-ARC, TIR, Kin, LRR
Cluster 2    NB-ARC, Kin, LRR
Cluster 3    Ser/Thr Kin
Cluster 4    Kin

Page 17: I.U. School of Informatics

Case Study, Arabidopsis thaliana

Number of Disease Resistance Gene Candidates on each Chromosome

             CHR-I   CHR-II   CHR-III   CHR-IV   CHR-V
Cluster 1    16      2        6         16       35
Cluster 2    20      0        6         4        9

Page 18: I.U. School of Informatics

Case Study, Arabidopsis thaliana

New Disease Resistance Gene Candidates

Cluster 1
GI 15236505
GI 15242136
GI 15233862

Cluster 2
GI 15221277
GI 15221280
GI 15217940
GI 15221744

Page 19: I.U. School of Informatics

Case Study, Arabidopsis thaliana

To test the effectiveness of the computational procedure:

792 unique sequences were merged and submitted to MEME and PRATT to detect functional motifs.

• Time: took more than 9000 minutes (about 150 hours) on a 1.7 GHz Pentium IV machine running Linux

• Result: no known disease resistance gene motifs were detected

Page 20: I.U. School of Informatics

CONCLUSIONS:

A sensible combination of tools provides an excellent mechanism for motif detection

Clustering helps to improve the performance of other well-known tools

Page 21: I.U. School of Informatics

ACKNOWLEDGEMENT

Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes

in Arabidopsis thaliana

Irfan Gunduz, Sihui Zhao, Mehmet Dalkilic and Sun Kim

will be presented at

The 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences

Page 22: I.U. School of Informatics

Case Study, Arabidopsis thaliana

Page 23: I.U. School of Informatics

Disease Resistance Mechanism

Page 24: I.U. School of Informatics

COMPUTATIONAL PROCEDURE

Refinement

[Figure: refinement diagram with cluster labels A, B, C, D and regrouped labels BD and C]

