I.U. School of Informatics

Post on 03-Jan-2016

Motif Discovery from Large Number of Sequences:

A Case Study with Disease Resistance Genes

in Arabidopsis thaliana

by Irfan Gunduz

04/25/04

Capstone Presentation

INTRODUCTION

Motifs

• Highly conserved regions across a subset of proteins that share the same function

• Motifs can be used to predict a molecule’s function, a structural feature, or family membership

>Seq A YNEDSKH
>Seq B YDDDSNH
>Seq C YDNDSNH
>Seq D YENDSKH
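As a rough illustration of how a conserved region can be summarized, the sketch below collapses the slide's example sequences into a PROSITE-style pattern, column by column. It assumes the 28-residue string on the slide is four aligned 7-residue sequences (Seq A to Seq D); the function name `column_pattern` is hypothetical, and real motif finders rely on far more elaborate statistics than this.

```python
def column_pattern(aligned):
    """Build a PROSITE-style pattern from equal-length, ungapped aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must all have the same length")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if len(residues) == 1:
            # Fully conserved position: emit the single residue.
            parts.append(residues[0])
        else:
            # Variable position: list the observed alternatives.
            parts.append("[" + "".join(residues) + "]")
    return "-".join(parts)


# Assumed split of the slide's example into four aligned sequences.
example = {
    "Seq A": "YNEDSKH",
    "Seq B": "YDDDSNH",
    "Seq C": "YDNDSNH",
    "Seq D": "YENDSKH",
}
pattern = column_pattern(list(example.values()))
print(pattern)  # Y-[DEN]-[DEN]-D-S-[KN]-H
```

Positions 1, 4, 5 and 7 are identical in all four sequences, which is what makes the region recognizable as a motif.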

INTRODUCTION

Current motif-finding software:
• MEME
• PROSITE
• PRATT, etc.

Do they work with a large number of sequences?

• Pattern discovery relies on statistical or combinatorial techniques, looking for signals

• The signal becomes harder to distinguish from noise as the number of sequences increases

What to do?

Objective

Develop a computational procedure to find functional motifs from a large number of sequences

Tools

• BLAST (sequence alignment tool)
• BAG (sequence clustering package)
• CLUSTAL W (multiple sequence alignment)
• HMMER II (HMM-based software)
• BLOCK MAKER (block/motif finder)
• LAMA (block comparison tool)
• Perl

COMPUTATIONAL PROCEDURE

1 - COLLECTING AND CLUSTERING SEQUENCES

Extract well-annotated sequences of interest from the genome of interest

All-against-all pairwise comparison using BLAST

Estimate the best bit-score cutoff for clustering

Cluster sequences using BAG
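The collect-and-cluster step can be sketched as follows. This is a simplified stand-in, not BAG's own graph-based algorithm: it links any two sequences whose pairwise hit meets a bit-score cutoff and reports the resulting connected groups (single linkage). It assumes the all-against-all results are in 12-column tab-separated BLAST output, where column 1 is the query, column 2 the subject and column 12 the bit score. The names `cluster_by_bitscore` and `hit` are hypothetical.

```python
from collections import defaultdict


def cluster_by_bitscore(blast_lines, min_bitscore):
    """Group sequences linked by hits with bit score >= min_bitscore (single linkage)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        root_q, root_s = find(query), find(subject)  # registers both sequences
        if query != subject and bitscore >= min_bitscore:
            parent[root_q] = root_s

    groups = defaultdict(set)
    for seq in list(parent):
        groups[find(seq)].add(seq)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


def hit(query, subject, bitscore):
    """Format one made-up tabular BLAST line; only columns 1, 2 and 12 matter here."""
    return f"{query}\t{subject}\t90.0\t100\t0\t0\t1\t100\t1\t100\t1e-30\t{bitscore}"


hits = [hit("a", "b", 200), hit("b", "c", 150), hit("c", "d", 40), hit("d", "e", 180)]
print(cluster_by_bitscore(hits, min_bitscore=100))  # [['a', 'b', 'c'], ['d', 'e']]
print(cluster_by_bitscore(hits, min_bitscore=30))   # [['a', 'b', 'c', 'd', 'e']]
```

Running the function over a range of cutoffs and watching how the number and size of clusters change is one simple way to explore which bit score gives a sensible grouping.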

COMPUTATIONAL PROCEDURE

2 - ENRICHMENT

Align multiple sequences in each cluster

Use HMM-based programs to build a profile for each cluster

Search the genome of interest with the new profile and extract more sequences if available
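The enrichment step is an iterate-until-stable loop, which the sketch below makes explicit. The callables `build_profile` and `search_genome` are hypothetical placeholders for the real alignment, profile-building and genome-search tools, which are not invoked here; the toy versions only mimic their behavior on made-up data so the loop can be run.

```python
def enrich_cluster(seed_ids, build_profile, search_genome, max_rounds=10):
    """Grow a cluster by repeatedly building a profile and searching with it.

    Stops when a search returns no new sequences or after max_rounds.
    """
    members = set(seed_ids)
    for _ in range(max_rounds):
        profile = build_profile(sorted(members))
        new_hits = set(search_genome(profile)) - members
        if not new_hits:
            break  # the cluster is stable: nothing new was found
        members |= new_hits
    return members


# Toy stand-ins (assumptions for illustration only): each "gene" carries a set
# of domain labels, a "profile" is the union of its members' labels, and a
# "search" returns every gene sharing at least one label with the profile.
toy_genome = {
    "g1": {"NB"},
    "g2": {"NB", "LRR"},
    "g3": {"LRR"},
    "g4": {"Kin"},
}


def toy_build_profile(member_ids):
    return set().union(*(toy_genome[m] for m in member_ids))


def toy_search_genome(profile):
    return [gene for gene, labels in toy_genome.items() if labels & profile]


enriched = enrich_cluster({"g1"}, toy_build_profile, toy_search_genome)
print(sorted(enriched))  # ['g1', 'g2', 'g3']
```

The `max_rounds` cap is a design choice of this sketch: it guards against a profile that keeps drifting and pulling in ever more distant sequences.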

COMPUTATIONAL PROCEDURE

3 – REFINEMENT

4 – MOTIF FINDING

Refine clusters by regrouping

Submit sequences in each cluster to Block Maker

Compare blocks using LAMA

Cluster blocks using BAG
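To give a feel for what comparing blocks involves, the sketch below scores two ungapped blocks of equal width by the mean Pearson correlation of their per-column residue frequencies. This is an illustrative scoring scheme chosen for simplicity, not LAMA's actual method; the function names and the third example block are hypothetical.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(block):
    """Per-column residue frequencies of an ungapped block (equal-length strings)."""
    return [
        [column.count(aa) / len(column) for aa in AMINO_ACIDS]
        for column in zip(*block)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def block_similarity(block_a, block_b):
    """Mean column-wise correlation between two blocks of the same width."""
    freqs_a, freqs_b = column_frequencies(block_a), column_frequencies(block_b)
    if len(freqs_a) != len(freqs_b):
        raise ValueError("this sketch only compares blocks of equal width")
    return sum(pearson(a, b) for a, b in zip(freqs_a, freqs_b)) / len(freqs_a)


block_1 = ["YNEDSKH", "YDDDSNH"]
block_2 = ["YDNDSNH", "YENDSKH"]
block_3 = ["GKTTLAR", "GKSTLAK"]  # made-up, unrelated block

print(block_similarity(block_1, block_2) > block_similarity(block_1, block_3))  # True
```

Pairwise scores of this kind can then be treated like the pairwise sequence scores in step 1: blocks whose similarity exceeds a cutoff are linked, and linked blocks are grouped.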

A Case Study with Disease Resistance Genes in Arabidopsis thaliana

Why Disease Resistance Genes?

Background, Disease Resistance Genes

Domain   Probable Function
TIR
CC
KIN
LRR      Recognition of specificity
NB       ATP and GTP binding

Case Study, Arabidopsis thaliana

• 116 sequences annotated as disease resistance protein or disease resistance protein-like were extracted from the Arabidopsis thaliana genome

• Clustered into 32 groups

• After refinement, four clusters were formed for further analysis

            # of Sequences
Cluster 1   96
Cluster 2   45
Cluster 3   641
Cluster 4   11

• 20 to 640 sequences were added to each cluster after HMM iterations

Case Study, Arabidopsis thaliana

PFAM Search

            Domains
Cluster 1   NB-ARC, TIR, Kin, LRR
Cluster 2   NB-ARC, Kin, LRR
Cluster 3   Ser/Thr Kin
Cluster 4   Kin

Case Study, Arabidopsis thaliana

Number of Disease Resistance Gene Candidates on Each Chromosome

            CHR-I   CHR-II   CHR-III   CHR-IV   CHR-V
Cluster 1   16      2        6         16       35
Cluster 2   20      0        6         4        9

Case Study, Arabidopsis thaliana

New Disease Resistance Gene Candidates

Cluster 1
• GI 15236505
• GI 15242136
• GI 15233862

Cluster 2
• GI 15221277
• GI 15221280
• GI 15217940
• GI 15221744

Case Study, Arabidopsis thaliana

To test the effectiveness of the computational procedure

792 unique sequences were merged and submitted to MEME and PRATT to detect functional motifs.

• Time: more than 9000 minutes on a Pentium IV 1.7 GHz machine running Linux

• Result: no known disease resistance gene motifs were detected

Case Study, Arabidopsis thaliana

CONCLUSIONS:

A sensible combination of tools provides an excellent mechanism for motif detection

Clustering helps to improve the performance of other well-known tools

ACKNOWLEDGEMENT

Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana

Irfan Gunduz, Sihui Zhao, Mehmet Dalkilic and Sun Kim

will be presented at

The 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences

Case Study, Arabidopsis thaliana

Disease Resistance Mechanism

COMPUTATIONAL PROCEDURE

Refinement

[Figure: refinement diagram; labels A, B, C, D, BD, C]