April 13, 2023 Data Mining: Concepts and Techniques
1
Data Mining: Concepts and Techniques
— Slides for Textbook —
— Appendix A —
©Jiawei Han and Micheline Kamber
Slides contributed by Jian Pei
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
Appendix A: An Introduction to Microsoft’s OLE DB for Data Mining
Introduction
Overview and design philosophy
Basic components
Data set components
Data mining models
Operations on data model
Concluding remarks
Why OLE DB for Data Mining?
Industry standards are critical for data mining development, usage, interoperability, and exchange
OLE DB for DM is a natural evolution of OLE DB and OLE DB for OLAP
Building mining applications over relational databases is nontrivial
  Different customized data mining algorithms and methods are needed
  Significant work is required on the part of application builders
Goal: ease the burden of developing mining applications in large relational databases
Motivation of OLE DB for DM
Facilitate deployment of data mining models
  Generating data mining models
  Storing, maintaining, and refreshing models as data is updated
  Programmatically using the model on other data sets
  Browsing models
Enable enterprise application developers to participate in building data mining solutions
Features of OLE DB for DM
Independent of provider or software
Not specialized to any specific mining model
Structured to cater to all well-known mining models
Part of the upcoming release of Microsoft SQL Server 2000
Overview
Core relational engine exposes OLE DB in a language-based API
Analysis Server exposes OLE DB for OLAP and OLE DB for DM
Maintain the SQL metaphor
Reuse existing notions
[Figure: architecture stack, top to bottom: data mining applications; OLE DB OLAP/DM; Analysis Server; OLE DB; RDB engine]
Key Operations to Support Data Mining Models
Define a mining model
  Attributes to be predicted
  Attributes to be used for prediction
  Algorithm used to build the model
Populate a mining model from training data
Predict attributes for new data
Browse a mining model for reporting and visualization
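The four operations above can be sketched as a tiny Python class. This is a hypothetical toy: the class name `ToyMiningModel`, its methods, the `age_group` column, and its majority-vote "algorithm" are illustrative assumptions and not part of OLE DB for DM. It only shows a model being defined, populated from cases, used to predict, and browsed.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical stand-in for a DMM with one input and one predicted column."""

    def __init__(self, name, input_col, predict_col):
        # Define: name the model, the attribute used for prediction,
        # and the attribute to be predicted.
        self.name = name
        self.input_col = input_col
        self.predict_col = predict_col
        # Learned content: counts per input value, not the raw cases.
        self.counts = defaultdict(Counter)

    def populate(self, cases):
        # Populate: update counts of the predicted attribute per input value.
        for case in cases:
            self.counts[case[self.input_col]][case[self.predict_col]] += 1

    def predict(self, case):
        # Predict: most frequent value seen with this input value, else None.
        seen = self.counts.get(case[self.input_col])
        return seen.most_common(1)[0][0] if seen else None

    def browse(self):
        # Browse: expose the learned content for reporting/visualization.
        return {value: dict(c) for value, c in self.counts.items()}


model = ToyMiningModel("Age Group Prediction", "gender", "age_group")
model.populate([
    {"gender": "Male", "age_group": "30-40"},
    {"gender": "Male", "age_group": "30-40"},
    {"gender": "Female", "age_group": "20-30"},
])
print(model.predict({"gender": "Male"}))  # 30-40
print(model.browse())
```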
DMM as Analogous to a Table in SQL
Create a data mining model object: CREATE MINING MODEL [model_name]
Insert training data into the model and train it: INSERT INTO [model_name]
Use the data mining model: SELECT relation_name.[id], [model_name].[predict_attr]
  Consult the DMM content in order to make predictions and browse statistics obtained by the model
Use DELETE to empty/reset the model
Predictions on data sets: a prediction join between a model and a data set (tables)
Deploy a DMM by just writing SQL queries!
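The SQL analogy can be mimicked with a hypothetical in-memory catalog. `ModelCatalog` and its method names are assumptions that mirror the statements above, and the "training" here only counts values. The point of the sketch is the difference between DELETE (reset the learned content, keep the definition) and DROP (remove the model entirely).

```python
from collections import Counter


class ModelCatalog:
    """Hypothetical catalog that treats mining models like SQL tables."""

    def __init__(self):
        self.models = {}

    def create(self, name, predict_col):
        # CREATE MINING MODEL: store the definition with empty content.
        self.models[name] = {"predict_col": predict_col, "content": Counter()}

    def insert_into(self, name, cases):
        # INSERT INTO: train, keeping only a learned summary (here, counts).
        model = self.models[name]
        for case in cases:
            model["content"][case[model["predict_col"]]] += 1

    def select_prediction(self, name):
        # SELECT [model_name].[predict_attr]: consult the learned content.
        content = self.models[name]["content"]
        return content.most_common(1)[0][0] if content else None

    def delete_from(self, name):
        # DELETE FROM: empty/reset the content but keep the definition.
        self.models[name]["content"].clear()

    def drop(self, name):
        # DROP: remove the model definition entirely.
        del self.models[name]


catalog = ModelCatalog()
catalog.create("Age Prediction", "age")
catalog.insert_into("Age Prediction", [{"age": 35}, {"age": 35}, {"age": 42}])
print(catalog.select_prediction("Age Prediction"))  # 35
catalog.delete_from("Age Prediction")
print(catalog.select_prediction("Age Prediction"))  # None
catalog.drop("Age Prediction")
print("Age Prediction" in catalog.models)  # False
```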
Two Basic Components
Cases/caseset: input data
  A table or nested tables (for hierarchical data)
Data mining model (DMM): a special type of table
  A caseset and meta-information are associated with a DMM when it is created
  Saves the mining algorithm and the resulting abstraction instead of the data itself
  Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP
Flattened Representation of Caseset
Source tables:
  Customers: Customer ID, Gender, Hair Color, Age, Age Prob
  Car Ownership: Customer ID, Car, Car Prob
  Product Purchases: Customer ID, Product Name, Quantity, Product Type
Flattened caseset:
CID  Gender  Hair   Age  Age Prob  Product  Quantity  Type  Car  Car Prob
1    Male    Black  35   100%      TV       1         Elec  Car  100%
1    Male    Black  35   100%      VCR      1         Elec  Car  100%
1    Male    Black  35   100%      Ham      6         Food  Car  100%
1    Male    Black  35   100%      TV       1         Elec  Van  50%
1    Male    Black  35   100%      VCR      1         Elec  Van  50%
1    Male    Black  35   100%      Ham      6         Food  Van  50%
Problem: lots of replication!
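A short Python sketch, using the sample rows from the flattened table, shows where the replication comes from. Flattening joins the customer with both child tables at once, so every car is paired with every purchase (2 × 3 = 6 rows), and the customer attributes are copied into each row. The variable names are illustrative only.

```python
from itertools import product

# Sample data from the slide: one customer, three purchases, two cars.
customer = {"cid": 1, "gender": "Male", "hair": "Black", "age": 35, "age_prob": "100%"}
purchases = [("TV", 1, "Elec"), ("VCR", 1, "Elec"), ("Ham", 6, "Food")]
cars = [("Car", "100%"), ("Van", "50%")]

# Cross product of the two independent child tables: one row per (car, purchase).
flat = [
    (customer["cid"], customer["gender"], customer["hair"], customer["age"],
     customer["age_prob"], prod, qty, ptype, car, car_prob)
    for (car, car_prob), (prod, qty, ptype) in product(cars, purchases)
]
for row in flat:
    print(row)
print(len(flat))  # 6
```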
Logical Nested Table Representation of Caseset
Use the Data Shaping Service to generate a hierarchical rowset
  Part of the Microsoft Data Access Components (MDAC) products
CID  Gender  Hair   Age  Age Prob  Product Purchases            Car Ownership
                                   Product  Quantity  Type      Car  Car Prob
1    Male    Black  35   100%      TV       1         Elec      Car  100%
                                   VCR      1         Elec      Van  50%
                                   Ham      6         Food
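A minimal sketch of what the nested representation holds follows. It assumes the three source tables are available as Python lists, and the helpers `group_by_cid` and `build_cases` are hypothetical. This is not the Data Shaping Service API. Child rows are grouped by customer ID, so the customer attributes are stored once per case instead of once per row.

```python
from collections import defaultdict

# Hypothetical normalized source tables holding the sample rows from the slide.
customers = [(1, "Male", "Black", 35, "100%")]  # CID, Gender, Hair, Age, Age Prob
product_purchases = [(1, "TV", 1, "Elec"), (1, "VCR", 1, "Elec"), (1, "Ham", 6, "Food")]
car_ownership = [(1, "Car", "100%"), (1, "Van", "50%")]


def group_by_cid(rows):
    """Map each customer ID to its child rows, dropping the repeated ID column."""
    groups = defaultdict(list)
    for cid, *rest in rows:
        groups[cid].append(tuple(rest))
    return groups


def build_cases(customers, purchases, cars):
    """Build one nested case per customer; parent attributes appear only once."""
    purchases_by_cid = group_by_cid(purchases)
    cars_by_cid = group_by_cid(cars)
    return [
        {
            "CID": cid, "Gender": gender, "Hair": hair, "Age": age, "Age Prob": age_prob,
            "Product Purchases": purchases_by_cid.get(cid, []),
            "Car Ownership": cars_by_cid.get(cid, []),
        }
        for cid, gender, hair, age, age_prob in customers
    ]


for case in build_cases(customers, product_purchases, car_ownership):
    print(case)
```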
More About Nested Table
Not necessary for the storage subsystem to support nested records
Cases are only instantiated as nested rowsets prior to training/predicting data mining models
Same physical data may be used to generate different casesets
Defining a Data Mining Model
The name of the model
The algorithm and its parameters
The columns of the caseset and the relationships among columns
“Source columns” and “prediction columns”
Example
CREATE MINING MODEL [Age Prediction]                          -- name of the model
(
    [Customer ID]        LONG KEY,                            -- source column
    [Gender]             TEXT DISCRETE,                       -- source column
    [Age]                DOUBLE DISCRETIZED() PREDICT,        -- prediction column
    [Product Purchases]  TABLE                                -- source column
    (
        [Product Name]   TEXT KEY,                            -- source column
        [Quantity]       DOUBLE NORMAL CONTINUOUS,            -- source column
        [Product Type]   TEXT DISCRETE RELATED TO [Product Name]  -- source column
    )
)
USING [Decision_Trees_101]                                    -- mining algorithm used
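To make the split between source and prediction columns explicit, the same definition can be mirrored as plain Python data. The `Column` class and `split_roles` helper are hypothetical and belong to no OLE DB for DM API. The NORMAL distribution hint on Quantity is omitted. The helper descends into the nested TABLE column and returns dotted column paths.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    """Hypothetical in-memory form of one column in a CREATE MINING MODEL list."""
    name: str
    content: str                      # e.g. KEY, DISCRETE, CONTINUOUS, TABLE
    predict: bool = False             # True for PREDICT columns
    related_to: Optional[str] = None  # target of a RELATED TO clause
    nested: List["Column"] = field(default_factory=list)  # columns of a TABLE


age_prediction = [
    Column("Customer ID", "KEY"),
    Column("Gender", "DISCRETE"),
    Column("Age", "DISCRETIZED", predict=True),
    Column("Product Purchases", "TABLE", nested=[
        Column("Product Name", "KEY"),
        Column("Quantity", "CONTINUOUS"),
        Column("Product Type", "DISCRETE", related_to="Product Name"),
    ]),
]


def split_roles(columns, prefix=""):
    """Return (source, prediction) column paths, descending into nested tables."""
    source, prediction = [], []
    for col in columns:
        path = prefix + col.name
        if col.nested:
            nested_source, nested_prediction = split_roles(col.nested, path + ".")
            source += nested_source
            prediction += nested_prediction
        elif col.predict:
            prediction.append(path)
        else:
            source.append(path)
    return source, prediction


print(split_roles(age_prediction))
```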
Column Specifiers
KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
  PROBABILITY: [0, 1]
  VARIANCE
  SUPPORT
  PROBABILITY-VARIANCE
  ORDER
TABLE
Attribute Types
DISCRETE
ORDERED
CYCLICAL
CONTINUOUS
DISCRETIZED
SEQUENCE_TIME
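A DISCRETIZED attribute, such as Age in the example model, is a continuous value that is mapped into a finite set of ranges before mining. The slide does not say which method `DISCRETIZED()` applies. The sketch below uses equal-width binning only as an assumed, simple illustration. It is not necessarily what a provider does, and the `discretize` helper is hypothetical.

```python
def discretize(values, bucket_count):
    """Equal-width binning: map each value to a bucket index in [0, bucket_count)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bucket_count
    if width == 0:  # all values identical: everything falls in bucket 0
        return [0] * len(values), [(lo, hi)]
    # The maximum value would land at index bucket_count, so clamp it to the last bucket.
    codes = [min(int((v - lo) / width), bucket_count - 1) for v in values]
    bounds = [(lo + i * width, lo + (i + 1) * width) for i in range(bucket_count)]
    return codes, bounds


ages = [22, 35, 41, 58, 63, 70]
codes, bounds = discretize(ages, 3)
print(codes)   # [0, 0, 1, 2, 2, 2]
print(bounds)  # [(22.0, 38.0), (38.0, 54.0), (54.0, 70.0)]
```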
Populating a DMM
Use the INSERT INTO statement
  Each case is consumed by the data mining model
Use the SHAPE statement to create the nested table from the input data
Example: Populating a DMM
INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])  -- SKIP: CustID is not a model column
)
SHAPE
    {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
(
    {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
    RELATE [Customer ID] TO [CustID]
) AS [Product Purchases]
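The ORDER BY clauses in the SHAPE statement suggest that the parent and child rowsets are consumed in key order. The sketch below assumes exactly that. It reads SHAPE ... APPEND ... RELATE as a single pass over two key-sorted rowsets, and it drops CustID when binding to model columns, which corresponds to the SKIP in the INSERT column list. The helpers `shape_append` and `to_model_case` and the second customer's rows are hypothetical.

```python
def shape_append(parents, children, parent_key, child_key, alias):
    """One-pass SHAPE ... APPEND ... RELATE over two rowsets sorted on their keys."""
    cases, j = [], 0
    for parent in parents:
        key = parent[parent_key]
        # Skip orphan child rows whose key sorts before the current parent.
        while j < len(children) and children[j][child_key] < key:
            j += 1
        nested = []
        # Collect the contiguous run of child rows that share this key.
        while j < len(children) and children[j][child_key] == key:
            nested.append(children[j])
            j += 1
        cases.append({**parent, alias: nested})
    return cases


def to_model_case(case):
    """Bind a shaped case to the model columns; CustID is SKIPped."""
    return {
        "Customer ID": case["Customer ID"],
        "Gender": case["Gender"],
        "Age": case["Age"],
        "Product Purchases": [
            {k: row[k] for k in ("Product Name", "Quantity", "Product Type")}
            for row in case["Product Purchases"]
        ],
    }


# Hypothetical rows, already sorted as the ORDER BY clauses require.
customers = [
    {"Customer ID": 1, "Gender": "Male", "Age": 35},
    {"Customer ID": 2, "Gender": "Female", "Age": 28},
]
sales = [
    {"CustID": 1, "Product Name": "TV", "Quantity": 1, "Product Type": "Elec"},
    {"CustID": 1, "Product Name": "Ham", "Quantity": 6, "Product Type": "Food"},
    {"CustID": 2, "Product Name": "VCR", "Quantity": 1, "Product Type": "Elec"},
]

for case in shape_append(customers, sales, "Customer ID", "CustID", "Product Purchases"):
    print(to_model_case(case))
```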
Using a Data Mining Model to Predict
Prediction join
  Prediction on data set D using DMM M
  Different from an equi-join
DMM: a “truth table”
The SELECT statement associated with PREDICTION JOIN specifies the values extracted from the DMM
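The contrast with an equi-join can be sketched in Python. An equi-join against stored training rows returns only exact matches. A prediction join instead treats the model as a “truth table” with a row for every possible input combination, so every input case receives a prediction. The data is hypothetical, and so is the one-line rule that stands in for a trained model.

```python
# Hypothetical training cases and new cases to score.
training = [
    {"Gender": "Male", "Hair": "Black", "Age": "30-40"},
    {"Gender": "Female", "Hair": "Brown", "Age": "20-30"},
]
# The second combination (Female, Black) never appeared in training.
new_cases = [
    {"id": 1, "Gender": "Male", "Hair": "Black"},
    {"id": 2, "Gender": "Female", "Hair": "Black"},
]


def equi_join(left, right, cols):
    """Ordinary equi-join: only rows matching exactly on every column survive."""
    return [
        {**left_row, "Age": right_row["Age"]}
        for left_row in left
        for right_row in right
        if all(left_row[c] == right_row[c] for c in cols)
    ]


def model_truth_table(gender, hair):
    """Stand-in for a trained DMM: defined for every input combination.
    This toy rule looks at gender only; hair is accepted but ignored."""
    return "30-40" if gender == "Male" else "20-30"


def prediction_join(cases):
    """Prediction join: every input case gets a value derived from the model."""
    return [{**c, "Age": model_truth_table(c["Gender"], c["Hair"])} for c in cases]


print(equi_join(new_cases, training, ["Gender", "Hair"]))  # only id 1 survives
print(prediction_join(new_cases))                          # both ids get an Age
```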
Example: Using a DMM in Prediction
SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
(
    SHAPE
        {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
    APPEND
    (
        {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
        RELATE [Customer ID] TO [CustID]
    ) AS [Product Purchases]
) AS t
ON  [Age Prediction].[Gender] = t.[Gender]
AND [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name]
AND [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]
Browsing DMM
What is in a DMM?
Rules, formulas, trees, etc.
Browsing DMM
Visualization
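As one example of browsing, decision-tree content can be rendered as IF-THEN rules, with one rule per root-to-leaf path. The nested-dict tree below and the `rules` helper are hypothetical. A real provider exposes model content in its own format, which this sketch does not model.

```python
# Hypothetical decision-tree content for an Age model: inner nodes split on one
# attribute, leaves hold a predicted value.
tree = {
    "split": "Gender",
    "branches": {
        "Male": {"split": "Product Type", "branches": {
            "Elec": {"leaf": "30-40"},
            "Food": {"leaf": "40-50"},
        }},
        "Female": {"leaf": "20-30"},
    },
}


def rules(node, conditions=()):
    """Yield one IF ... THEN ... rule per root-to-leaf path."""
    if "leaf" in node:
        cond = " AND ".join(f"{attr} = {val}" for attr, val in conditions) or "TRUE"
        yield f"IF {cond} THEN Age = {node['leaf']}"
        return
    for value, child in node["branches"].items():
        yield from rules(child, conditions + ((node["split"], value),))


for rule in rules(tree):
    print(rule)
```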
Concluding Remarks
OLE DB for DM integrates data mining and database systems
  A good standard for mining application builders
How can we be involved?
  Provide association/sequential pattern mining modules for OLE DB for DM?
  Design more concrete language primitives?
References
  http://www.microsoft.com/data.oledb/dm.html
Thank you!