Intelligent Database Systems Lab
Advisor : Dr. Hsu
Graduate : Keng-Wei Chang
Authors : Balaji Rajagopalan, Mark W. Isken
國立雲林科技大學 National Yunlin University of Science and Technology
Exploiting data preparation to enhance
mining and knowledge discovery
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS-PART C: APPLICATIONS AND REVIEWS, VOL. 31, NO. 4, NOVEMBER 2001
Outline
Motivation
Objective
Introduction
Data Preparation
Research Method
Results
N.Y.U.S.T.
I.M.
Motivation
Organizational data is used for mining and knowledge discovery
In its natural form, this data is not amenable to mining
Objective
Show that data enhancement, through the introduction of new attributes along with judicious aggregation of existing attributes, results in higher-quality knowledge discovery
Examine the differential impact of data enhancement on the performance of different mining algorithms
Introduction
Exponential growth of information has resulted in a tremendous volume of data for knowledge workers
Knowledge management solution
  Knowledge repository
  Knowledge sharing
  Knowledge discovery
Data Preparation
Present a framework based on prior research in knowledge discovery
  Data quality
  Data characteristics
  Data preparation
Research Method
A data set from a large tertiary care hospital in the United States was used
Topics covered
A. Problem Domain
B. Data
C. Clustering Algorithms for Knowledge Discovery
D. Entropy-Based Metrics for Cluster Quality Assessment
E. Rule Extraction Metrics
Problem Domain
Allocation of inpatient beds
The difficult part is applying quantitative resource allocation to a manageable set of patient types
Quantitative resource use: the sequence of hospital units visited and the corresponding length of stay
Patient type: a group of patients consuming a similar level of hospital resources
Problem Domain
We refer to this as the patient classification problem
Too few vs. too many patient types
The key is to identify the set of patient types
Data
Inpatient obstetrical and gynecological (OB/GYN) patient flow
There are numerous fields
  demographics
  physician information
  ICD-9-CM diagnostic and procedure codes
  diagnosis-related groups (DRGs)
Data
Almost 500 DRGs are defined; those in the range 353-384 are related to OB/GYN
These DRGs are grouped into five DRG types
Clustering Algorithms for Knowledge Discovery
K-means and Kohonen self-organizing maps
Similarity: Euclidean distance function
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
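As a minimal sketch of the similarity measure on this slide, the Python below computes the Euclidean distance between two numeric vectors and uses it to assign a case to its nearest cluster center, which is the assignment step that K-means repeats. The function names and sample values are illustrative assumptions and are not taken from the paper.

```python
import math


def euclidean_distance(x, y):
    """Return sqrt(sum_i (x_i - y_i)^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def nearest_center(case, centers):
    """Return the index of the center closest to `case` (ties go to the lowest index)."""
    distances = [euclidean_distance(case, c) for c in centers]
    return distances.index(min(distances))


# Hypothetical data: two cluster centers and one case with two numeric attributes.
centers = [(0.0, 0.0), (3.0, 4.0)]
case = (2.5, 3.5)
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(nearest_center(case, centers))               # 1
```

Because the distance sums raw squared differences, attributes on larger scales dominate it; whether and how the study rescaled its variables is not stated on the slide.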
Entropy-Based Metrics for Cluster Quality Assessment
Let n_{ij} be the number of cases having a DRG type of i in cluster j, and let
  p_{ij} = n_{ij} / \sum_{l} n_{lj}
Entropy of cluster j
  E_j = -\sum_{i} p_{ij} \log_2 p_{ij}
Weighted Entropy
  using cluster size as the weight, calculate a weighted average entropy measure for a cluster solution
Purity of cluster j
  P_j = \max_{i} p_{ij}
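The metrics on this slide can be sketched in Python as follows. The code takes a table of counts n_ij (DRG type i by cluster j) and returns the entropy and purity of each cluster plus a size-weighted average entropy. The function name, the sample counts, and the choice of cluster size divided by the total number of cases as the weight are assumptions made for illustration; the slide only says that cluster size is used for weighting.

```python
import math


def cluster_metrics(counts):
    """counts[i][j] = number of cases with DRG type i in cluster j.

    Returns (entropies, purities, weighted_entropy), with one entropy
    and one purity value per cluster.
    """
    n_types = len(counts)
    n_clusters = len(counts[0])
    total = sum(sum(row) for row in counts)
    entropies, purities = [], []
    weighted_entropy = 0.0
    for j in range(n_clusters):
        size_j = sum(counts[i][j] for i in range(n_types))
        if size_j == 0:
            # Assumption: an empty cluster contributes entropy 0 and purity 0.
            entropies.append(0.0)
            purities.append(0.0)
            continue
        p = [counts[i][j] / size_j for i in range(n_types)]
        # sum p*log2(1/p) equals -sum p*log2(p); zero proportions are skipped.
        e_j = sum(p_ij * math.log2(1.0 / p_ij) for p_ij in p if p_ij > 0)
        entropies.append(e_j)
        purities.append(max(p))
        weighted_entropy += (size_j / total) * e_j  # weight = relative cluster size
    return entropies, purities, weighted_entropy


# Hypothetical counts: 2 DRG types (rows) by 2 clusters (columns).
counts = [[8, 2],
          [0, 2]]
entropies, purities, weighted = cluster_metrics(counts)
print(entropies)  # [0.0, 1.0]
print(purities)   # [1.0, 0.5]
```

A cluster containing a single DRG type has entropy 0 and purity 1, so lower weighted entropy and higher purity both indicate clusters that line up more closely with the DRG types.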
Rule Extraction Metrics
We expect most of the extracted rules to show a high degree of resonance with our domain knowledge
Results
Detail the data enhancements relevant to this study
A. Data Preparation : Basics
B. Mining and Knowledge Discovery
C. Differential Impact Based on Clustering Method
D. Usefulness of Knowledge Discovered
E. Limitations
F. Implications for Research and Practice
Data Preparation : Basics
The data set included fields that represent each patient's path through the hospital units and the associated lengths of stay along that path
Data Preparation : Basics
Three data sets are considered in order to illustrate the impact of data preparation
ED1
  Eight numeric variables
Data Preparation : Basics
ED2
  In addition to the ED1 variables, ED2 adds five nominal variables
  Both DRG and CCS were designed to serve as aggregate measures of hospital resource consumption
Data Preparation : Basics
ED3
  In addition to the ED2 variables, ED3 contains two binary variables
    whether or not the patient gave birth during the visit
    whether or not the patient gave birth via C-section
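The progression from ED1 to ED3 can be illustrated with a short Python sketch that builds three nested feature matrices from the same records: numeric variables only, then numeric plus nominal variables, then both plus binary flags. All field names and values here are hypothetical, only two numeric fields and one nominal field are shown instead of the eight and five in the study, and the one-hot encoding of nominal variables is an assumption, since the slides do not say how nominal variables were presented to the clustering algorithms.

```python
def one_hot_columns(records, nominal_fields):
    """Collect a sorted list of (field, value) pairs seen in the records."""
    return sorted({(f, r[f]) for r in records for f in nominal_fields})


def build_data_sets(records, numeric_fields, nominal_fields, binary_fields):
    """Return (ed1, ed2, ed3) as lists of rows, each set extending the previous one."""
    columns = one_hot_columns(records, nominal_fields)
    ed1, ed2, ed3 = [], [], []
    for r in records:
        numeric = [float(r[f]) for f in numeric_fields]
        # One indicator column per observed (field, value) pair (assumed encoding).
        nominal = [1.0 if r[f] == v else 0.0 for f, v in columns]
        binary = [1.0 if r[f] else 0.0 for f in binary_fields]
        ed1.append(numeric)                     # numeric variables only
        ed2.append(numeric + nominal)           # plus nominal variables
        ed3.append(numeric + nominal + binary)  # plus binary variables
    return ed1, ed2, ed3


# Hypothetical records; field names are placeholders, not the study's fields.
records = [
    {"los_unit_a": 1.5, "los_unit_b": 0.0, "drg_type": "A",
     "gave_birth": True, "c_section": False},
    {"los_unit_a": 2.0, "los_unit_b": 1.0, "drg_type": "B",
     "gave_birth": False, "c_section": False},
]
ed1, ed2, ed3 = build_data_sets(
    records,
    numeric_fields=["los_unit_a", "los_unit_b"],
    nominal_fields=["drg_type"],
    binary_fields=["gave_birth", "c_section"],
)
print(ed1[0])  # [1.5, 0.0]
print(ed2[0])  # [1.5, 0.0, 1.0, 0.0]
print(ed3[0])  # [1.5, 0.0, 1.0, 0.0, 1.0, 0.0]
```

Keeping the three sets nested makes it possible to run the same clustering method on each and attribute any change in cluster quality to the added variables.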
Mining and Knowledge Discovery
Differential Impact Based on Clustering Method
Usefulness of Knowledge Discovered
Limitations
The results may not be exactly applicable in every case
Only two data mining algorithms were examined
  K-means and Kohonen self-organizing maps
  illustrative, not exhaustive
Domain knowledge played a critical role in the data preparation process
Implications for Research and Practice
Provides empirical evidence demonstrating the impact of data preparation on mining and knowledge discovery
Engage in a comparative investigation of multiple algorithms
Personal opinion …