My Research Work and Clustering

Dr. Bernard Chen, Ph.D.

University of Central Arkansas

Fall 2010

Outline

Introduction
Experimental Setup
Clustering
Future Work

Central Dogma of Molecular Biology

Amino Acids: the Subunits of Proteins

Protein Primary, Secondary, and Tertiary Structure

Protein 3D Structure

Protein Sequence Motif

Although there are 20 amino acids, protein primary structures are not built by choosing among them at random.

Sequence motifs: a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.

Protein Sequence Motif

These biologically significant regions or residues are usually:

Enzyme catalytic sites
Prosthetic group attachment sites (heme, pyridoxal phosphate, biotin…)
Amino acids involved in binding a metal ion
Cysteines involved in disulfide bonds
Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)

Goal of Our Group

The main purpose is to obtain and extract protein sequence motifs that are universally conserved across protein family boundaries.

Discuss the relation between protein primary structure and tertiary structure.

Outline

Introduction
Experimental Setup
Clustering
Future Work

Experimental Setup: HSSP Matrix for 1b25

HSSP matrix: 1b25

Representation of Segments

Sliding window size: 9. Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus the nine corresponding secondary-structure labels obtained from DSSP.

More than 560,000 segments (413MB) are generated by this method.

DSSP: obtaining secondary structure information
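The windowing step can be sketched as follows. This is a minimal sketch under assumptions: the HSSP profile is taken as already parsed into one row of 20 frequencies per residue, the DSSP assignment as one character per residue, and the 8-to-3 state reduction (H, G, I to helix; E, B to strand; everything else to coil) is a common convention that the slides do not confirm. `make_segments` and `reduce_dssp` are hypothetical names.

```python
from typing import List, Tuple

WINDOW = 9     # sliding-window size given on the slide
N_AMINO = 20   # one HSSP profile column per standard amino acid

# Assumed reduction of DSSP's eight states to three; the slides do not state one.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(state: str) -> str:
    """Map one DSSP character to H (helix), E (strand) or C (everything else)."""
    return DSSP_TO_3.get(state, "C")


def make_segments(profile: List[List[float]], dssp: str,
                  window: int = WINDOW) -> List[Tuple[List[List[float]], str]]:
    """Return one (window x 20 matrix, window-length label string) pair per window.

    profile: one row of 20 HSSP frequencies per residue (hypothetical parsed input).
    dssp:    one DSSP state character per residue, aligned with profile.
    """
    if len(profile) != len(dssp):
        raise ValueError("profile and DSSP string must have the same length")
    segments = []
    for start in range(len(profile) - window + 1):
        matrix = [list(row) for row in profile[start:start + window]]
        labels = "".join(reduce_dssp(s) for s in dssp[start:start + window])
        segments.append((matrix, labels))
    return segments


# Toy usage: a 12-residue protein yields 12 - 9 + 1 = 4 windows.
toy_profile = [[1.0 / N_AMINO] * N_AMINO for _ in range(12)]
toy_dssp = "  HHHHGGEEB "
segs = make_segments(toy_profile, toy_dssp)
print(len(segs), segs[0][1])  # prints: 4 CCHHHHHHE
```

A protein of L residues gives L − 8 overlapping windows, which is how a set of proteins expands into hundreds of thousands of segments.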

Outline

Introduction
Experimental Setup
Clustering
Future Work

Clustering Algorithms

We use two clustering algorithms in our approach:

K-means clustering
Fuzzy C-means clustering

K-means Clustering
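The K-means slides survive only as titles. For reference, here is a minimal sketch of standard Lloyd-style K-means in plain Python. The random initialisation, fixed seed, convergence test, and Euclidean distance are our assumptions, not the settings used in this work; in the setup above each point would be a flattened 9 × 20 segment.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Lloyd-style K-means. Returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    # Initialise with k distinct input points chosen at random (assumed strategy).
    centroids = [list(p) for p in rng.sample(points, k)]
    assign: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


# Toy usage: two well-separated groups of 2-D points.
pts = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
centers, labels = kmeans(pts, k=2)
print(labels)
```

Each iteration costs O(n · k · d) distance work, which is why running it once over all segments with hundreds of clusters is expensive.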

Fuzzy C-means Clustering
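Likewise for Fuzzy C-means: a minimal sketch of the standard (Bezdek) algorithm, in which every point receives a membership degree in every cluster instead of a single hard label. The fuzzifier m = 2, the random initial memberships, the tolerance, and the name `fcm` are assumptions; the slides do not give these settings.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def fcm(points: Matrix, c: int, m: float = 2.0, max_iter: int = 300,
        tol: float = 1e-6, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fuzzy C-means. Returns (centers, u); u[i][j] is the membership of point j in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each point's memberships sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col_sum = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col_sum
    power = 2.0 / (m - 1.0)
    centers: Matrix = []
    for _ in range(max_iter):
        # Center update: mean of all points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total_w = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / total_w
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        max_change = 0.0
        for j in range(n):
            # Floor distances at a tiny value so a point sitting on a center
            # does not cause a division by zero.
            dist = [max(sum((points[j][d] - centers[i][d]) ** 2 for d in range(dim)) ** 0.5,
                        1e-12) for i in range(c)]
            for i in range(c):
                new_u = 1.0 / sum((dist[i] / dist[k]) ** power for k in range(c))
                max_change = max(max_change, abs(new_u - u[i][j]))
                u[i][j] = new_u
        if max_change < tol:
            break  # memberships have stabilised
    return centers, u


# Toy usage: two well-separated groups of 2-D points.
pts = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
centers, u = fcm(pts, c=2)
hard = [max(range(2), key=lambda i: u[i][j]) for j in range(len(pts))]
print(hard)
```

Because memberships are graded, a point lying between clusters keeps a non-trivial degree in each of them rather than being forced into one.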

Granular Computing Model

Workflow (diagram): original dataset → Fuzzy C-means clustering → information granules 1 … M → K-means clustering within each granule → join information → final sequence motif information.
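One way to read the diagram as code, under assumptions the slides do not spell out: a segment joins every granule in which its FCM membership reaches a chosen threshold, each granule is then clustered on its own (for example with K-means), and "join information" is taken to mean simply collecting the per-granule results. The threshold, `k_for_size`, `cluster_fn`, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def build_granules(u: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """For each FCM cluster i, list the points j with membership u[i][j] >= threshold.

    A point may fall into several granules, or into none if the threshold is high."""
    return [[j for j, mu in enumerate(row) if mu >= threshold] for row in u]


def granular_model(points: Sequence[Point],
                   u: Sequence[Sequence[float]],
                   threshold: float,
                   k_for_size: Callable[[int], int],
                   cluster_fn: Callable[[List[Point], int], List[int]],
                   ) -> List[Tuple[int, int, int]]:
    """Cluster each granule separately, then join the results.

    Returns (point index, granule index, local cluster index) triples."""
    joined: List[Tuple[int, int, int]] = []
    for g, members in enumerate(build_granules(u, threshold)):
        if not members:
            continue  # an empty granule contributes nothing
        subset = [points[j] for j in members]
        k = min(max(1, k_for_size(len(members))), len(members))
        local_labels = cluster_fn(subset, k)  # e.g. K-means inside the granule
        joined.extend((j, g, lab) for j, lab in zip(members, local_labels))
    return joined


def one_cluster(pts: List[Point], k: int) -> List[int]:
    """Placeholder clusterer for the demo: puts every point in cluster 0."""
    return [0] * len(pts)


# Toy usage with a hand-written 2 x 4 membership matrix (2 granules, 4 points).
u = [[0.90, 0.80, 0.45, 0.10],
     [0.10, 0.20, 0.55, 0.90]]
points = [[0.0], [0.1], [2.5], [5.0]]
print(build_granules(u, threshold=0.4))  # prints [[0, 1, 2], [2, 3]]
result = granular_model(points, u, 0.4, lambda n: 1, one_cluster)
print(len(result))  # prints 5
```

In the toy run point 2 lands in both granules, so the granules hold five members in total for four points. Each K-means run only sees its own granule, which is where the space and time savings would come from.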

Motivation

Reducing Space Complexity

                  Number of members   Number of clusters   Data size
Granule 0         136112              151                  99.9 MB
Granule 1         68792               76                   50.5 MB
Granule 2         86094               95                   63.2 MB
Granule 3         65361               72                   47.9 MB
Granule 4         63159               70                   46.3 MB
Granule 5         120130              133                  88.2 MB
Granule 6         128874              143                  94.6 MB
Granule 7         4583                5                    3.3 MB
Granule 8         43254               48                   31.7 MB
Granule 9         5032                6                    3.7 MB
Total             721390              799                  529 MB
Original dataset  562745              800                  413 MB

Table 1: Summary of results obtained by FCM.
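The cluster counts in Table 1 match a simple proportional rule: give each granule a share of the 800 clusters used on the original dataset in proportion to its number of members, k_i = round(800 · n_i / Σ n_j). The slides do not state this rule; the check below only shows that it reproduces the reported counts.

```python
# Granule sizes (number of members) and cluster counts copied from Table 1.
members = [136112, 68792, 86094, 65361, 63159, 120130, 128874, 4583, 43254, 5032]
reported_k = [151, 76, 95, 72, 70, 133, 143, 5, 48, 6]
TOTAL_CLUSTERS = 800  # number of clusters used on the original dataset

total = sum(members)
# Proportional allocation (assumed rule): each granule's share of the 800 clusters.
k = [round(TOTAL_CLUSTERS * n / total) for n in members]
print(k == reported_k, sum(k))  # prints: True 799
```

Rounding each share independently explains why the granules add up to 799 clusters rather than exactly 800.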

Reducing Time Complexity

Wei’s method: 1285968 s (about 15 days) × 6 = 7715568 s (about 90 days)

Granular model: 154899 s (FCM execution time) + 231720 s (about 2.7 days) × 6 = 1545219 s (about 18 days)
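The granular-model figures on this slide can be checked directly. The sketch below recomputes the total and converts seconds to days; the variable names are ours, and the last line compares the two reported totals, which by these numbers gives a speed-up of roughly five times.

```python
SECONDS_PER_DAY = 86_400

fcm_time = 154_899      # FCM execution time in seconds (from the slide)
per_run = 231_720       # the term multiplied by 6 on the slide, in seconds
wei_total = 7_715_568   # total reported for Wei's method, in seconds

granular_total = fcm_time + 6 * per_run
print(granular_total)                              # prints 1545219
print(round(granular_total / SECONDS_PER_DAY, 1))  # prints 17.9 (about 18 days)
print(round(per_run / SECONDS_PER_DAY, 1))         # prints 2.7
print(round(wei_total / granular_total, 1))        # prints 5.0
```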

HSSP-BLOSUM62 Measure

Outline

Introduction
Experimental Setup
Clustering
Future Work

Part 1: Bioinformatics Knowledge and Dataset Collection

Part 2: Discovering Protein Sequence Motifs

Part 3: Motif Information Extraction

Part 4: Mining the Relations among Motifs

Part 5: Protein Local Tertiary Structure Prediction

Future Work

Part 3: Protein information extraction by decision tree

Part 4: Clustering with association rules and graph theory

Part 4: Super-rule generation by DBSCAN

Apply DBSCAN to build super-rules among all motifs.
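The slides do not say how motifs or rules would be encoded as points for DBSCAN. For orientation only, here is a minimal sketch of the standard DBSCAN algorithm on generic numeric vectors; `eps`, `min_pts`, and the Euclidean distance are assumptions, and how super-rules would be read off the resulting clusters is left open, as on the slide.

```python
from typing import List, Optional, Sequence


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Standard DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""

    def neighbours(i: int) -> List[int]:
        # All points within eps of point i (Euclidean distance), including i itself.
        return [j for j in range(len(points))
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps * eps]

    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:  # j is a core point: keep expanding through it
                queue.extend(j_seeds)
    return [lab if lab is not None else -1 for lab in labels]


# Toy usage: two dense groups and one isolated point.
pts = [[0.0, 0.0], [0.0, 0.5], [0.5, 0.0],
       [10.0, 10.0], [10.0, 10.5], [10.5, 10.0],
       [50.0, 50.0]]
print(dbscan(pts, eps=1.0, min_pts=3))  # prints [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-means and FCM, DBSCAN does not need the number of clusters in advance and can leave outliers unassigned, which may suit grouping rules whose count is not known beforehand.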

Part 5: Protein local tertiary structure prediction

By decision trees, naïve Bayesian classifiers, association rule algorithms, and more…

