8/9/2019 Dwm Course
1/67
Geethanjali College of Engineering and Technology
DEPARTMENT OF INFORMATION TECHNOLOGY
(Name of the Subject/Lab Course): Operating Systems
(JNTU CODE: ++,-..)
Programme: UG/PG
Branch: IT    Version No: 1
Year: III
Document Number: GCET/IT/-,2 11
Semester: I    No. of Pages: 4,
Classification status (Unrestricted/Restricted):
Distribution List:
Prepared by:
1) Name: Y.RAJU
2) Sign:
3) Design: ASSOC. PROF.
4) Date: 8
Updated by:
1) Name:
2) Sign:
3) Design:
4) Date:
Verified by:                    *For Q.C. only
1) Name:                        1) Name:
2) Sign:                        2) Sign:
3) Design:                      3) Design:
4) Date:                        4) Date:
Approved by (HOD):
1) Name:
2) Sign:
3) Date:
*If it is prepared for the first time, 1; if it is updated, 2 or 3.
**GCET/Dept./3 indicates 3rd year; 4 indicates fourth in the list of JNTU syllabus book.
SYLLABUS
UNIT-I
INTRODUCTION: Fundamentals of data mining, Data Mining Functionalities, Classification of Data Mining systems, Major issues in Data Mining.
Data Preprocessing: Needs Preprocessing the Data, Data Cleaning, Data Integration and Transformation, Data Reduction, Discretization and Concept Hierarchy Generation.
UNIT-II
Data Warehouse and OLAP Technology for Data Mining: Data Warehouse, Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, Further Development of Data Cube Technology, From Data Warehousing to Data Mining.
UNIT-III
DATA MINING PRIMITIVES, LANGUAGES AND SYSTEM ARCHITECTURES: Data Mining Primitives, Data Mining Query Languages, Designing Graphical User Interfaces Based on a Data Mining Query Language, Architectures of Data Mining Systems.
UNIT-IV
CONCEPT DESCRIPTION: Characterization and Comparison: Data Generalization and Summarization Based Characterization, Analytical Characterization: Analysis of Attribute Relevance, Mining Class Comparisons: Discriminating between Different Classes, Mining Descriptive Statistical Measures in Large Databases.
UNIT-V
MINING ASSOCIATION RULES IN LARGE DATABASES: Association Rule Mining, Mining Single-Dimensional Boolean Association Rules from Transactional Databases, Mining Multilevel Association Rules from Transaction Databases, Mining Multidimensional Association Rules from Relational Databases and Data Warehouses, From Association Mining to Correlation Analysis, Constraint-Based Association Mining.
UNIT-VI
CLASSIFICATION AND PREDICTION: Issues Regarding Classification and Prediction, Classification by Decision Tree Induction, Bayesian Classification, Classification by Backpropagation, Classification Based on Concepts from Association Rule Mining, Other Classification Methods, Prediction, Classifier Accuracy.
UNIT-VII
CLUSTER ANALYSIS INTRODUCTION: Types of Data in Cluster Analysis, A Categorization of Major Clustering Methods, Partitioning Methods, Density-Based Methods, Grid-Based Methods, Model-Based Clustering Methods, Outlier Analysis.
UNIT-VIII
MINING COMPLEX TYPES OF DATA: Multidimensional Analysis and Descriptive Mining of Complex Data Objects, Mining Spatial Databases, Mining Multimedia Databases, Mining Time-Series and Sequence Data, Mining Text Databases, Mining the World Wide Web.
TEXT BOOKS:
1. Data Mining - Concepts and Techniques - JIAWEI HAN & MICHELINE KAMBER, Harcourt India.
REFERENCES:
1. Data Mining Introductory and Advanced Topics - MARGARET H DUNHAM, PEARSON EDUCATION.
2. Data Mining Techniques - ARUN K PUJARI, University Press.
3. Data Warehousing in the Real World - SAM ANAHORY & DENNIS MURRAY, Pearson Edn Asia.
4. Data Warehousing Fundamentals - PAULRAJ PONNIAH, WILEY STUDENT EDITION.
5. The Data Warehouse Life Cycle Tool Kit - RALPH KIMBALL, WILEY STUDENT EDITION.
For more details, visit http://www.jntu.
GEETHANJALI COLLEGE OF ENGINEERING & TECHNOLOGY
CHEERYAL (V), KEESARA (M), RR Dist.
Department of: IT
Year and Semester to whom Subject is Offered: III B.Tech, II Sem
Name of the Subject: Data Warehousing and Data Mining
Name of the Faculty: Y.RAJU    Designation: Asso. Professor
Department: IT
1.1 Introduction to the subject:
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
This white paper provides an introduction to the basic technologies of data mining. Examples of profitable applications illustrate its relevance to today's business environment, and a basic description shows how data warehouse architectures can evolve to deliver the value of data mining to end users.
1.2 Objectives of the subject
Improve Quality of Data
Since a common DSS deficiency is "dirty data," it is almost guaranteed that you will have to address the quality of your data during every data warehouse iteration. Data cleansing is a sticky problem in data warehousing. On one hand, a data warehouse is supposed to provide clean, integrated, consistent and reconciled data from multiple sources. On the other hand, we are faced with a development schedule of 6-12 months. It is almost impossible to achieve both without making some compromises. The difficulty lies in determining what compromises to make. Here are some guidelines for determining your specific goal to cleanse your source data:
Never try to cleanse ALL the data. Everyone would like to have all the data perfectly clean, but nobody is willing to pay for the cleansing or to wait for it to get done. To clean it all would simply take too long. The time and cost involved often exceed the benefit.
Never cleanse NOTHING. In other words, always plan to clean something. After all, one of the reasons for building the data warehouse is to provide cleaner and more reliable data than you have in your existing OLTP or DSS systems.
Determine the benefits of having clean data. Examine the reasons for building the data warehouse:
Do you have inconsistent reports?
What is the cause of these inconsistencies?
Is the cause dirty data or is it programming errors?
What dollars are lost due to dirty data?
Which data is dirty?
Determine the cost for cleansing the data. Before you make cleansing all the dirty data your goal, you must determine the cleansing cost for each dirty data element. Examine how long it would take to perform the following tasks:
Analyze the data
Determine the correct data values and correction algorithms
Write the data cleansing programs
Correct the old files and databases if appropriate
Compare the cost for cleansing to the dollars lost by leaving it dirty. Everything in business must be cost-justified. This applies to data cleansing as well. For each data element, compare the cost for cleansing it to the business loss being incurred by leaving it dirty, and decide whether to include it in your data cleansing goal. If the dollars lost exceed the cost of cleansing, put the data on the "to be cleansed" list. If the cost for cleansing exceeds the dollars lost, do not put the data on the "to be cleansed" list.
Prioritize the dirty data you considered for your data cleansing goal. A difficult part of compromising is balancing the time you have for the project with the goals you are trying to achieve. Even though you may have been cautious in selecting dirty data for your cleansing goal, you may still have too much dirty data on your "to be cleansed" list. Prioritize your list.
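The compare-and-prioritize guideline above amounts to a simple decision rule. The Python sketch below is only an illustration under stated assumptions. The element names and dollar figures are hypothetical. Ties (loss equal to cost) are excluded, because the text does not say how to treat them. Ranking by net benefit (loss minus cost) and the optional budget cut-off are our own choices, since the text asks you to prioritize without fixing a criterion.

```python
from dataclasses import dataclass

@dataclass
class DirtyElement:
    name: str
    cleansing_cost: float   # estimated cost to analyze, correct and reload the element
    dollars_lost: float     # estimated business loss from leaving it dirty

def to_be_cleansed(elements, budget=None):
    """Keep elements whose loss exceeds their cleansing cost, highest net benefit first.

    If a budget is given (hypothetical parameter), take elements in priority
    order while the cumulative cleansing cost stays within the budget.
    """
    selected = [e for e in elements if e.dollars_lost > e.cleansing_cost]
    ranked = sorted(selected, key=lambda e: e.dollars_lost - e.cleansing_cost, reverse=True)
    if budget is None:
        return ranked
    chosen, spent = [], 0.0
    for e in ranked:
        if spent + e.cleansing_cost <= budget:
            chosen.append(e)
            spent += e.cleansing_cost
    return chosen

candidates = [
    DirtyElement("customer_address", 20_000, 55_000),
    DirtyElement("product_code", 8_000, 30_000),
    DirtyElement("fax_number", 5_000, 1_000),   # costs more to clean than it loses
]
print([e.name for e in to_be_cleansed(candidates)])
print([e.name for e in to_be_cleansed(candidates, budget=25_000)])
```

The budget variant is a greedy cut-off. It reflects the "time you have for the project" constraint only roughly and makes no claim to choose the optimal subset.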
For each prioritized dirty data item as
Minimize Inconsistent Reports
Addressing another common complaint about current DSS environments, namely inconsistent reports, will most likely become one of your data warehouse goals. Inconsistent reports are mainly caused by misuse of data, and the primary reason for misuse of data is disagreement about, or misunderstanding of, the meaning or the content of data. Correcting this problem is another predicament in data warehousing, because it requires the interested business units to resolve their disagreements or misunderstandings. This type of effort has more than once torpedoed a data warehouse project because it took too long to resolve the disputes. Ignoring the issue is not a solution either. We suggest the following guidelines:
1.3 JNTU Syllabus with Additional Topics
S.No | Unit No | Topic | Additional Topics
1 | I | Introduction: Fundamentals of data mining, Data Mining Functionalities, Classification of Data Mining systems, Major issues in Data Mining. Data Preprocessing: Needs Preprocessing the Data, Data Cleaning, Data Integration and Transformation, Data Reduction, Discretization and Concept Hierarchy Generation. |
2 | II | Data Warehouse and OLAP Technology for Data Mining: Data Warehouse, Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, Further Development of Data Cube Technology, From Data Warehousing to Data Mining. |
3 | III | Data Mining Primitives, Languages, System Architectures: Data Mining Primitives, Data Mining Query Languages, Designing Graphical User Interfaces Based on a Data Mining Query Language, Architectures of Data Mining Systems. | Testing methods, Black box testing
4 | IV | Concept Description: Characterization and Comparison, Data Generalization and Summarization Based Characterization, Analytical Characterization, Analysis of Attribute Relevance, Mining Class Comparisons, Discriminating between Different Classes, Mining Descriptive Statistical Measures in Large Databases. |
5 | V | Mining Association Rules in Large Databases: Association Rule Mining, Mining Single-Dimensional Boolean Association Rules from Transactional Databases, Mining Multilevel Association Rules from Transaction Databases, Mining Multidimensional Association Rules from Relational Databases and Data Warehouses, From Association Mining to Correlation Analysis, Constraint-Based Association Mining. |
6 | VI | Classification and Prediction: Issues Regarding Classification and Prediction, Classification by Decision Tree Induction, Bayesian Classification, Classification by Backpropagation, Classification Based on Concepts from Association Rule Mining, Other Classification Methods, Prediction, Classifier Accuracy. |
7 | VII | Cluster Analysis Introduction: Types of Data in Cluster Analysis, A Categorization of Major Clustering Methods, Partitioning Methods, Grid-Based Methods, Model-Based Clustering Methods, Density-Based Methods, Outlier Analysis. |
8 | VIII | Mining Complex Types of Data: Multidimensional Analysis and Descriptive Mining of Complex Data Objects, Mining Spatial Databases, Mining Multimedia Databases, Mining Time-Series and Sequence Data, Mining Text Databases, Mining the World Wide Web. |
1.4 Sources of Information
1.4.1 Text books
3. Data Warehousing in the Real World - SAM ANAHORY & DENNIS MURRAY, Pearson Edn Asia.
4. Data Warehousing Fundamentals - PAULRAJ PONNIAH, WILEY STUDENT EDITION.
5. The Data Warehouse Life Cycle Tool Kit - RALPH KIMBALL, WILEY STUDENT EDITION.
1.4.3 Websites: http://www.jntu.ac.in/
1.4.4 Journals:
1.5 Unit wise Summary
S.No | Unit No | Topic | Additional Topics
1 | I | Introduction: Fundamentals of data mining, Data Mining Functionalities, Classification of Data Mining systems, Major issues in Data Mining. Data Preprocessing: Needs Preprocessing the Data, Data Cleaning, Data Integration and Transformation, Data Reduction, Discretization and Concept Hierarchy Generation. | QTP
2 | II | Data Warehouse and OLAP Technology for Data Mining: Data Warehouse, Data Warehouse Architecture, Data Warehouse Implementation, Further Development of Data Cube Technology, From Data Warehousing to Data Mining, Multidimensional Data Model. |
3 | III | Data Mining Primitives, Languages, and System Architectures: Data Mining Primitives, Data Mining Query Languages, Designing Graphical User Interfaces Based on a Data Mining Query Language, Architectures of Data Mining Systems. |
4 | IV | Concept Description: Characterization and Comparison: Data Generalization and Summarization Based Characterization, Analytical Characterization: Analysis of Attribute Relevance, Mining Class Comparisons: Discriminating between Different Classes, Mining Descriptive Statistical Measures in Large Databases. | Silk Testing
5 | V | Mining Association Rules in Large Databases: Association Rule Mining, Mining Single-Dimensional Boolean Association Rules from Transactional Databases, Mining Multilevel Association Rules from Transaction Databases, Mining Multidimensional Association Rules from Relational Databases and Data Warehouses, From Association Mining to Correlation Analysis, Constraint-Based Association Mining. |
6 | VI | Classification and Prediction: Issues Regarding Classification and Prediction, Classification by Decision Tree Induction, Bayesian Classification, Classification by Backpropagation, Classification Based on Concepts from Association Rule Mining, Other Classification Methods, Prediction, Classifier Accuracy. | KVCHART APPLICATION
7 | VII | Cluster Analysis Introduction: Types of Data in Cluster Analysis, A Categorization of Major Clustering Methods, Partitioning Methods, Density-Based Methods, Grid-Based Methods, Model-Based Clustering Methods, Outlier Analysis. | Automation Techniques
8 | VIII | Mining Complex Types of Data: Multidimensional Analysis and Descriptive Mining of Complex Data Objects, Mining Time-Series and Sequence Data, Mining Text Databases, Mining the World Wide Web, Mining Spatial Databases, Mining Multimedia Databases. | Agile model
1.6 Micro Plan
S.No | Unit No | Total no of Periods | Topics to be covered | Reg/Additional | Teaching aids used (LCD/OHP/BB) | Remarks
1 | 1 | 1 | Introduction: Fundamentals of data mining | Regular | OHP |
 | | 2 | Data Mining Functionalities | Regular | OHP |
 | | 3 | Classification of Data Mining systems | Regular | OHP |
 | | 4 | Major issues in Data Mining | Regular | OHP |
 | | 5 | Data Preprocessing: Needs Preprocessing the Data | Regular | |
 | | | Data Cleaning, Data Integration and Transformation | | |
2 | 2 | 6 | Data Warehouse and | Regular | OHP |
 | | 7 | OLAP Technology for Data Mining: Data Warehouse | Regular | |
 | | 8 | Multidimensional Data Model | Regular | OHP |
 | | 9 | Data Warehouse Architecture | Regular | |
 | | 10 | Data Warehouse Implementation | Regular | |
 | | 11 | Further Development of Data Cube Technology | Regular | OHP |
 | | | From Data Warehousing to Data Mining | Regular | |
3 | 3 | 12 | Data Mining Primitives | Regular | OHP |
 | | 13 | Data Mining Primitives | Regular | |
 | | 14 | Data Mining Query Languages | Regular | |
 | | 15 | Designing Graphical User Interfaces Based on a Data Mining Query | Regular | OHP |
 | | 16 | Language, Architectures of Data Mining Systems | Regular | |
 | | | Languages, and System Architectures | Regular | OHP |
4 | 4 | 17 | Concept Description | Regular | |
 | | 18 | Characterization and Comparison | Regular | |
 | | 19 | Data Generalization and Summarization Based Characterization | Regular | |
 | | 20 | Analytical Characterization: Analysis of Attribute Relevance | Regular | |
 | | 21 | Mining Class Comparisons: Discriminating between Different Classes | Regular | |
 | | 22 | Mining Descriptive Statistical Measures in Large Databases | | |
5 | 5 | 23 | Mining Association Rules in Large Databases | Regular | |
 | | 24 | Association Rule Mining | Regular | |
 | | 25 | Mining Single-Dimensional Boolean Association Rules from Transactional Databases | Regular | |
 | | 26 | Mining Multilevel Association Rules from Transaction Databases | Regular | |
 | | 27 | Mining Multidimensional Association Rules from Relational Databases and Data Warehouses | Regular | |
 | | | From Association Mining to Correlation Analysis, Constraint-Based Association Mining | | |
6 | 6 | 28 | Classification and Prediction | Regular | OHP |
 | | 29 | Issues Regarding Classification and Prediction | Regular | |
 | | 30 | Classification by Decision Tree Induction | Regular | |
 | | 31 | Bayesian Classification | Regular | OHP |
 | | 32 | Classification by Backpropagation, Classification Based on Concepts from Association Rule Mining | Regular | OHP |
 | | | Other Classification Methods, Prediction, Classifier Accuracy | Regular | OHP |
7 | 7 | 33 | Cluster Analysis Introduction | Regular | |
 | | 34 | Types of Data in Cluster Analysis | Regular | |
 | | 35 | A Categorization of Major Clustering Methods | Regular | OHP |
 | | 36 | Partitioning Methods | Regular | |
8 | 8 | 37 | Mining Complex Types of Data | Regular | OHP |
 | | 38 | Multidimensional Analysis and Descriptive Mining of Complex | Regular | |
 | | 39 | Data Objects, Mining Spatial Databases | Regular | LCD, OHP |
1. ppts
2. ohp slides
3. subjective type questions, approximately 5 to 10 in number
UNIT-I
DEFINITIONS:
DATA MINING: Data mining refers to extracting or "mining" knowledge from large amounts of data.
DATA MINING FUNCTIONALITIES: Characterization and discrimination; Mining frequent patterns, associations, and correlations (association analysis); Classification and prediction; Cluster analysis; Outlier analysis; Trend and evolution analysis.
CLASSIFICATION OF DATA MINING SYSTEMS:
General functionality
Descriptive data mining
Predictive data mining
Data mining classification criteria:
Kinds of databases to be mined
Kinds of knowledge to be discovered
Kinds of techniques utilized
Kinds of applications adapted
Databases to be mined
Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc.
Knowledge to be mined
Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
Multiple/integrated functions and mining at multiple levels
Techniques utilized
Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
Applications adapted
Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.
MAJOR ISSUES IN DATA MINING
Mining methodology and user interaction issues
Performance issues
Issues relating to the diversity of data types
DATA PREPROCESSING
Integrating multiple, heterogeneous data sources
DATA CLEANSING
Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
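As a rough illustration of "ensure consistency in naming conventions, encoding structures, attribute measures", the sketch below harmonizes records from two sources into one convention. Everything in it is hypothetical: the field names, the gender codes, and the currency conversion rate. In practice these mappings come from the sources' metadata.

```python
# Hypothetical mapping tables; real ones come from the source systems' metadata.
FIELD_NAMES = {"cust_id": "customer_id", "custno": "customer_id",
               "sex": "gender", "gender_code": "gender",
               "price_usd": "price", "price_inr": "price"}
GENDER_CODES = {"M": "male", "F": "female", "1": "male", "0": "female"}
TO_USD = {"price_usd": 1.0, "price_inr": 0.012}   # assumed conversion rate, not real data

def harmonize(record):
    """Rename fields, unify encodings and convert measures to one unit (USD)."""
    out = {}
    for field, value in record.items():
        target = FIELD_NAMES.get(field, field)        # naming convention
        if target == "gender":
            value = GENDER_CODES[str(value)]           # encoding structure
        elif target == "price":
            value = round(value * TO_USD[field], 2)    # attribute measure (currency)
        out[target] = value
    return out

source_a = {"cust_id": 17, "sex": "F", "price_usd": 12.0}
source_b = {"custno": 17, "gender_code": 0, "price_inr": 1000}
print(harmonize(source_a))
print(harmonize(source_b))
```

After harmonization both records describe the same customer in the same form, so the integration step can detect them as duplicates.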
IT-
1. Regression is the oldest and most well-known statistical technique that the data mining community utilizes.
2. Data mining is the use of automated data analysis techniques to uncover previously undetected relationships among data items.
3. Three of the major data mining techniques are regression, classification and clustering.
7. Explain the preprocessing techniques.
UNIT-II
DATA WAREHOUSING
A decision support database that is maintained separately from the organization's operational database.
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.
DEFINITIONS:
OLAP (on-line analytical processing)
Major task of a data warehouse system
Data analysis and decision making
MULTIDIMENSIONAL DATA MODEL
Star schema
Snowflake schema
Fact constellations
CUBE DEFINITION (Fact Table)
define cube ⟨cube_name⟩ [⟨dimension_list⟩]: ⟨measure_list⟩
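A `define cube` statement names the dimensions and the measures. The cube it describes holds an aggregate for every subset of those dimensions, giving 2^n cuboids for n dimensions, from the base cuboid down to the apex cuboid. The Python sketch below materializes every cuboid by brute force. It assumes a made-up sales fact table with three dimensions and SUM as the only measure. Real data cube engines use much more efficient computation methods.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table rows: (time, item, location, dollars_sold)
DIMENSIONS = ("time", "item", "location")
facts = [
    ("Q1", "TV", "Hyderabad", 400),
    ("Q1", "PC", "Hyderabad", 300),
    ("Q2", "TV", "Chennai", 250),
    ("Q2", "TV", "Hyderabad", 150),
]

def compute_cube(facts, dims):
    """Return {cuboid_dimensions: {group_key: sum_of_measure}} for all 2**n cuboids."""
    cube = {}
    for k in range(len(dims) + 1):
        for cuboid in combinations(range(len(dims)), k):
            groups = defaultdict(int)
            for row in facts:
                key = tuple(row[i] for i in cuboid)
                groups[key] += row[-1]          # the last column is the measure
            cube[tuple(dims[i] for i in cuboid)] = dict(groups)
    return cube

cube = compute_cube(facts, DIMENSIONS)
print(len(cube))          # number of cuboids
print(cube[()])           # apex cuboid: grand total
print(cube[("item",)])    # one-dimensional cuboid on item
```

The empty tuple key is the apex cuboid (all dimensions aggregated away), and the key `("time", "item", "location")` is the base cuboid.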
DATA WAREHOUSE APPLICATIONS
Supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
Multidimensional analysis of data warehouse data
Supports basic OLAP operations: slice-dice, drilling, pivoting
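Roll-up, slice and dice can be illustrated on a handful of fact rows. This is a minimal sketch over Python dictionaries, not an OLAP server. The month-to-quarter time hierarchy and all the values are assumptions made for the example. Drill-down is simply the reverse move, from the quarter totals back to the month totals.

```python
from collections import defaultdict

# Hypothetical base facts at month level; "quarter" is the next level of an
# assumed time hierarchy (month -> quarter).
facts = [
    {"month": "Jan", "quarter": "Q1", "city": "Hyderabad", "item": "TV", "sales": 100},
    {"month": "Feb", "quarter": "Q1", "city": "Hyderabad", "item": "PC", "sales": 200},
    {"month": "Apr", "quarter": "Q2", "city": "Chennai", "item": "TV", "sales": 300},
    {"month": "May", "quarter": "Q2", "city": "Hyderabad", "item": "TV", "sales": 50},
]

def aggregate(rows, dims, measure="sales"):
    """Group rows by the given dimensions and sum the measure."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]

def dice_cube(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

by_month = aggregate(facts, ["month"])        # detailed level (drill-down target)
by_quarter = aggregate(facts, ["quarter"])    # roll-up: month -> quarter
tv_by_city = aggregate(slice_cube(facts, "item", "TV"), ["city"])
sub = dice_cube(facts, city={"Hyderabad"}, quarter={"Q1", "Q2"})
print(by_quarter)
print(tv_by_city)
print(len(sub))
```

Pivoting only changes how a result such as `aggregate(facts, ["city", "item"])` is laid out (which dimension goes on rows and which on columns), so it is not shown separately.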
Major Tasks
Combines data from multiple sources into a coherent store
Redundant data often occur when multiple databases are integrated
The same attribute may have different names in different databases
One attribute may be a "derived" attribute in another table, e.g., annual revenue
Redundant data may be detected by correlation analysis
Careful integration of the data from multiple sources may help reduce/avoid redundancies and inconsistencies and improve mining speed and quality
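Correlation analysis for redundancy can be sketched with the Pearson correlation coefficient, as below. An attribute derived from another one, such as annual revenue computed as twelve times monthly revenue, has |r| = 1. The 0.9 cut-off and the sample values are hypothetical. The helper also assumes that neither attribute is constant, because a zero variance would make the division fail.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)    # assumes neither attribute is constant

def redundant(xs, ys, threshold=0.9):   # threshold is an assumed cut-off
    return abs(pearson(xs, ys)) >= threshold

monthly_revenue = [10.0, 12.5, 9.0, 15.0, 11.0]
annual_revenue = [m * 12 for m in monthly_revenue]   # a derived attribute
region_code = [3, 1, 4, 1, 5]

print(round(pearson(monthly_revenue, annual_revenue), 6))
print(redundant(monthly_revenue, annual_revenue), redundant(monthly_revenue, region_code))
```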
Data reduction strategies
Data cube aggregation
Attribute subset selection
Dimensionality reduction
Numerosity reduction
Discretization and concept hierarchy generation
What is the lowest level of a data cube?
The aggregated data for an individual entity of interest
e.g., a customer in a phone calling data warehouse.
Parametric methods
Assume the data fits some model, estimate the model parameters, store only the parameters, and discard the data (except possible outliers)
Log-linear models: obtain the value at a point in m-D space as the product on appropriate marginal subspaces
Non-parametric methods
Do not assume models
Major families: histograms, clustering, sampling
Discretization
Reduce the number of values for a given continuous attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace actual data values.
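One simple way to divide an attribute's range into intervals is equal-width binning. The sketch below shows that method; the text does not prescribe a particular discretization method. The price values are illustrative. The sketch assumes at least two distinct values, so that the bin width is nonzero. The maximum value is placed in the last, right-closed interval.

```python
def equal_width_bins(values, k):
    """Replace each value by the label of one of k equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k                      # assumes at least two distinct values
    labels = []
    for v in values:
        idx = min(int((v - lo) / width), k - 1)    # the maximum falls into the last bin
        left, right = lo + idx * width, lo + (idx + 1) * width
        closing = "]" if idx == k - 1 else ")"
        labels.append(f"[{left:g}, {right:g}{closing}")
    return labels

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_width_bins(prices, 3))
```

Nine distinct raw values collapse to three interval labels, which is the reduction the text describes. Equal-frequency (equal-depth) binning is a common alternative when the data are skewed.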
Concept hierarchies
Reduce the data by collecting and replacing low-level concepts (such as numeric values for the attribute age) by higher-level concepts (such as young, middle-aged, or senior).
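The age example above can be written as a one-level concept hierarchy. The cut points 30 and 60 are assumptions made for the sketch, because the text names the higher-level concepts but does not give their boundaries.

```python
from bisect import bisect_right
from collections import Counter

# Assumed boundaries: age < 30 -> young, 30 <= age < 60 -> middle-aged, age >= 60 -> senior
CUTS = [30, 60]
LABELS = ["young", "middle-aged", "senior"]

def generalize_age(age):
    """Map a numeric age to its higher-level concept."""
    return LABELS[bisect_right(CUTS, age)]

ages = [23, 35, 41, 67, 29, 60, 52]
print([generalize_age(a) for a in ages])
print(Counter(generalize_age(a) for a in ages))
```

`bisect_right` places a boundary value such as 60 in the higher group, which matches the half-open ranges stated in the comment.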
IT-
1. A data warehouse is a structured repository of historic data.
2. A data warehouse integrates data from multiple data sources.
3. A data warehouse is a copy of transaction data specifically structured for query and analysis.
4. OLAP stands for On-Line Analytical Processing.
5. OLAP can be broadly divided into two different ways, that is: MOLAP and ROLAP.
6. A data warehouse maintains its functions in three layers: staging, integration, and access.
7. The data accessed for reporting and analyzing, and the tools for reporting and analyzing data, are also called the data mart.
8. The data access layer is the interface between the operational and informational access layer.
9. The data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments.
10. The integration layer is used to integrate data and to provide a level of abstraction from users.
Easy Questions
1. Explain the preprocessing procedure.
2. Explain data transformation.
3. Explain data integration.
UNIT-III
DEFINITIONS
DATA MINING PRIMITIVES
More flexible user interaction
Foundation for design of graphical user interface
Standardization of data mining industry and practice
DATA MINING QUERY LANGUAGES
A DMQL can provide the ability to support ad-hoc and interactive data mining.
By providing a standardized language like SQL, it aims to achieve an effect similar to the one SQL has had on relational databases:
Foundation for system development and evolution
Facilitate information exchange, technology transfer, commercialization and wide acceptance
What tasks:
Data collection and data mining query composition
Presentation of discovered patterns
Hierarchy specification and manipulation
Manipulation of data mining primitives
Interactive multilevel mining
Other miscellaneous information
What Defines a Data Mining Task?
Task-relevant data
Type of knowledge to be mined
Background knowledge
Pattern interestingness measurements
Visualization of discovered patterns
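The five primitives above can be pictured as one task specification that a mining system receives, in the same spirit as a DMQL query. The Python dataclass below is a hypothetical illustration of that idea. It is not DMQL and not any real system's API. The field names, the allowed knowledge types and the [0, 1] check on thresholds are our own assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    """Hypothetical container for the five data mining task primitives."""
    relevant_data: str                                       # task-relevant data
    knowledge_type: str                                      # kind of knowledge to be mined
    background_knowledge: dict = field(default_factory=dict) # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)      # e.g. support/confidence thresholds
    presentation: str = "rules"                              # how discovered patterns are shown

    def validate(self):
        allowed = {"characterization", "discrimination", "association",
                   "classification", "clustering"}
        if self.knowledge_type not in allowed:
            raise ValueError(f"unknown knowledge type: {self.knowledge_type}")
        for name, value in self.interestingness.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1]")
        return self

task = MiningTask(
    relevant_data="sales transactions, year 2018",
    knowledge_type="association",
    background_knowledge={"item": ["TV", "electronics", "all"]},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="table",
).validate()
print(task)
```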
⟨Mine_Knowledge_Specification⟩ ::= mine associations [as ⟨pattern_name⟩]
What tasks:
⟨Mine_Knowledge_Specification⟩ ::= mine comparison [as ⟨pattern_name⟩]
    for ⟨target_class⟩ where ⟨target_condition⟩
    {versus ⟨contrast_class_i⟩ where ⟨contrast_condition_i⟩}
    analyze ⟨measure(s)⟩
What is the syntax for tasks?
DB/DW systems, efficient implementation of a few DM primitives.
Easy Questions
1. Explain data mining primitives.
2. Explain a data mining query language.
3. Explain the architecture of data mining systems.
Visualization techniques:
Pie charts, bar charts, curves, cubes, and other visual forms.
Quantitative characteristic rules
Mapping a generalized result into characteristic rules with quantitative information associated with it
Decision tree
Each internal node tests an attribute
Each branch corresponds to an attribute value
Each leaf node assigns a classification
ID3 algorithm
Build a decision tree based on training objects with known class labels to classify testing objects
Rank attributes with the information gain measure
Minimal height
The least number of tests to classify an object
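ID3 ranks attributes by information gain: the entropy of the class labels minus the weighted entropy that remains after splitting on the attribute. The sketch below covers only this attribute-selection step, which ID3 repeats recursively at each node. It uses a small made-up training set.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target="class"):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows, attrs, target="class"):
    """The attribute ID3 would test at this node: the one with the highest gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))

# Hypothetical training objects with known class labels
rows = [
    {"outlook": "sunny", "windy": False, "class": "no"},
    {"outlook": "sunny", "windy": True, "class": "no"},
    {"outlook": "overcast", "windy": False, "class": "yes"},
    {"outlook": "rain", "windy": False, "class": "yes"},
    {"outlook": "rain", "windy": True, "class": "no"},
    {"outlook": "overcast", "windy": True, "class": "yes"},
]
print(round(information_gain(rows, "outlook"), 3), round(information_gain(rows, "windy"), 3))
print(best_attribute(rows, ["outlook", "windy"]))
```

Splitting on `outlook` leaves two pure branches, so it removes most of the uncertainty. This favors short trees, which is how ID3 works toward the "minimal height" goal.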
Data dispersion characteristics
Median, max, min, quantiles, outliers, variance, etc.
Numerical dimensions correspond to sorted intervals
Data dispersion: analyzed with multiple granularities of precision
Boxplot or quantile analysis on sorted intervals
Dispersion analysis on computed measures
Folding measures into numerical dimensions
Boxplot or quantile analysis on the transformed cube
Quartiles, outliers and boxplots
Quartiles: Q1 (25th percentile), Q3 (75th percentile)
Inter-quartile range: IQR = Q3 - Q1
Five-number summary: min, Q1, M, Q3, max
Boxplot: the ends of the box are the quartiles, the median is marked, whiskers are added, and outliers are plotted individually
Outlier: usually, a value more than 1.5 x IQR above Q3 or below Q1
Five-number summary of a distribution:
Minimum, Q1, M, Q3, Maximum
Boxplot
Data is represented with a box
The ends of the box are at the first and third quartiles, i.e., the height of the box is the IQR
The median is marked by a line within the box
Whiskers: two lines outside the box extend to Minimum and Maximum
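The five-number summary and the 1.5 x IQR outlier rule can be computed with Python's standard `statistics` module. Quartiles have several competing definitions. This sketch uses the "inclusive" interpolation method of `statistics.quantiles`, so other tools may report slightly different Q1 and Q3 for the same data. The price list is illustrative.

```python
from statistics import quantiles

def five_number_summary(values):
    """(min, Q1, median, Q3, max), with quartiles by linear 'inclusive' interpolation."""
    data = sorted(values)
    q1, m, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, m, q3, data[-1]

def iqr_outliers(values, factor=1.5):
    """Values more than factor * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(prices))
print(iqr_outliers(prices))
```

The single extreme value barely moves the quartiles. It inflates the maximum, and that is why boxplots plot such points individually.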
Standard deviation:
The square root of the variance
Measures spread about the mean
It is zero if and only if all the values are equal
Both the deviation and the variance are algebraic
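"Algebraic" means the measure can be computed from a fixed number of distributive aggregates. For the standard deviation those aggregates are the count, the sum and the sum of squares. Each partition of a large database can therefore be summarized separately, and the summaries can then be combined. The sketch below shows this for the population variance. It is a didactic version: for very large values the E[x^2] - (E[x])^2 formula can lose precision, and production code often uses more stable update formulas.

```python
from math import sqrt
from statistics import pstdev

def partial_aggregates(values):
    """Distributive aggregates of one partition: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)

def combine(parts):
    """Merge partition aggregates by simple addition."""
    return tuple(sum(p[i] for p in parts) for i in range(3))

def std_from_aggregates(n, s, ss):
    """Population standard deviation: sqrt(E[x^2] - (E[x])^2)."""
    variance = ss / n - (s / n) ** 2
    return sqrt(max(variance, 0.0))    # guard against tiny negative rounding error

part_a, part_b = [2, 4, 4, 4], [5, 5, 7, 9]
n, s, ss = combine([partial_aggregates(part_a), partial_aggregates(part_b)])
print(std_from_aggregates(n, s, ss), pstdev(part_a + part_b))
print(std_from_aggregates(*partial_aggregates([3, 3, 3])))   # all values equal -> 0.0
```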
Differences in philosophies and basic assumptions
Positive and negative samples in learning-from-example: positive samples are used for generalization, negative ones for specialization
Positive samples only in data mining: hence generalization-based; to drill down, backtrack the generalization to a previous state
Difference in methods of generalization
Machine learning generalizes on a tuple-by-tuple basis
Data mining generalizes on an attribute-by-attribute basis
IT-
1. Characterization of the composition of the postsynaptic proteome (PSP) provides a framework for understanding the overall organization and function.
2. Clustering using representatives is called CURE.
3. The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure.
4. A decision tree is a technique used for the classification of a dataset.
5. Classification is the process of dividing a dataset into mutually exclusive groups.
6. Data cleansing is the process of ensuring that all values in a dataset are consistent and correctly recorded.
7. A data warehouse is a system for storing and delivering massive quantities of data.
8. An analytical model is a structure and process for analyzing a dataset.
9. Data navigation is the process of viewing different dimensions, slices, and levels of detail of a multidimensional database.
10. Logistic regression is a linear regression that predicts the proportions of a categorical target variable, such as type of customer, in a population.
Easy Questions
1. Explain what concept description is.
2. Explain data generalization and summarization-based characterization.
3. Explain analytical characterization: analysis of attribute relevance.
UNIT-V
Association rule mining
Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
Basic Concepts of Association Rule
Given a database of transactions, where each transaction is a list of items purchased by a customer in a visit:
Find all rules that correlate the presence of one set of items with that of another set of items
Find frequent patterns
An example of frequent itemset mining is market basket analysis.
Association rule performance measures
Confidence
Support
Minimum support threshold
Minimum confidence threshold
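Support and confidence for a rule A => B are conventionally defined as follows. Support is the fraction of transactions that contain all of A ∪ B. Confidence is the number of transactions containing A ∪ B divided by the number containing A. A rule is "strong" when it meets both minimum thresholds. The basket data and the thresholds in this sketch are hypothetical. Confidence is computed from counts, so the result is an exact ratio rather than a quotient of two rounded fractions.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset, transactions):
    return count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    return count(set(lhs) | set(rhs), transactions) / count(lhs, transactions)

def is_strong(lhs, rhs, transactions, min_sup, min_conf):
    return (support(set(lhs) | set(rhs), transactions) >= min_sup
            and confidence(lhs, rhs, transactions) >= min_conf)

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
]
print(support({"milk", "bread"}, transactions))        # milk and bread together
print(confidence({"milk"}, {"bread"}, transactions))   # rule milk => bread
print(is_strong({"milk"}, {"bread"}, transactions, min_sup=0.5, min_conf=0.7))
```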
Market basket analysis
Identify patterns from Boolean vectors
Patterns can be represented by association rules.
Apriori Algorithm
Single-dimensional, single-level, Boolean frequent itemsets
Finding frequent itemsets using candidate generation
Generating association rules from frequent itemsets
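The two Apriori steps listed above can be sketched compactly in Python. The first function finds frequent itemsets level by level with candidate generation and pruning. The second function derives rules from the frequent itemsets. Two simplifications are assumptions of this sketch. The minimum support is given as an absolute count (min_sup x number of transactions). The join step unites any two frequent (k-1)-itemsets whose union has k items, instead of the textbook prefix-based join; after pruning, the two produce the same candidates. The nine-transaction database is a toy example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:                       # first scan: count 1-itemsets
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join: unite pairs of frequent (k-1)-itemsets whose union has k items
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # One scan of the database counts all surviving candidates
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result

def generate_rules(frequent, min_conf):
    """Rules lhs => rhs with confidence = count(itemset) / count(lhs)."""
    rules = []
    for itemset, cnt in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = cnt / frequent[lhs]       # lhs is frequent by downward closure
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules

db = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
frequent = apriori(db, min_count=2)
for lhs, rhs, conf in generate_rules(frequent, min_conf=0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

The 4-itemset {I1, I2, I3, I5} is generated as a candidate but pruned without a database scan, because its subset {I1, I3, I5} is not frequent. This pruning is what the Apriori property buys.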
Single-dimensional rules
buys(X, "milk") ⇒ buys(X, "bread")
Multi-dimensional rules
Inter-dimension association rules (no repeated predicates)
age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
Hybrid-dimension association rules (repeated predicates)
age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
Categorical Attributes
Finite number of possible values, no ordering among values
Quantitative Attributes
Numeric, implicit ordering among values
Static Discretization of Quantitative Attributes
Discretized prior to mining using a concept hierarchy.
Numeric values are replaced by ranges.
In a relational database, finding all frequent k-predicate sets will require k or k+1 table scans.
A data cube is well suited for mining.
The cells of an n-dimensional cuboid correspond to the predicate sets.
Mining from data cubes can be much faster.
Objective measures
Two popular measurements:
Support
Confidence
Subjective measures
A rule (pattern) is interesting if
it is unexpected (surprising to the user) and/or
actionable (the user can do something with it)
Level constraints
Rule constraints
Interestingness constraints
A constraint Ca is anti-monotone iff for any pattern S not satisfying Ca, none of the super-patterns of S can satisfy Ca
A constraint Cm is monotone iff for any pattern S satisfying Cm, every super-pattern of S also satisfies Cm
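The two definitions above can be checked by brute force on a tiny item set. With nonnegative prices, sum(S.price) <= v is anti-monotone: once an itemset violates it, adding items only raises the sum. Apriori-style mining can therefore drop such an itemset and never extend it. Conversely, sum(S.price) >= v is monotone. The prices and the bound v = 50 are hypothetical, and the exhaustive check is only feasible for very small item sets.

```python
from itertools import combinations

PRICES = {"pen": 2, "book": 15, "bag": 40, "lamp": 25}   # hypothetical, all nonnegative

def total_price(itemset):
    return sum(PRICES[i] for i in itemset)

def c_anti(itemset, v=50):      # Ca: sum(S.price) <= v
    return total_price(itemset) <= v

def c_mono(itemset, v=50):      # Cm: sum(S.price) >= v
    return total_price(itemset) >= v

def all_itemsets():
    for r in range(1, len(PRICES) + 1):
        yield from (set(s) for s in combinations(PRICES, r))

def supersets(itemset):
    rest = [i for i in PRICES if i not in itemset]
    for r in range(1, len(rest) + 1):
        for extra in combinations(rest, r):
            yield itemset | set(extra)

def is_anti_monotone(constraint):
    """No itemset that violates the constraint has a superset that satisfies it."""
    return all(constraint(s) or not any(constraint(t) for t in supersets(s))
               for s in all_itemsets())

def is_monotone(constraint):
    """Every superset of a satisfying itemset also satisfies the constraint."""
    return all(not constraint(s) or all(constraint(t) for t in supersets(s))
               for s in all_itemsets())

print(is_anti_monotone(c_anti), is_monotone(c_mono), is_anti_monotone(c_mono))
```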
Succinctness Property of Constraints
For any sets S1 and S2 satisfying C, S1 ∪ S2 satisfies C
Given that A1 is the set of itemsets of size 1 satisfying C, any set S satisfying C is based on A1, i.e., it contains a subset belonging to A1
Example:
sum(S.Price) ≥ v is not succinct
min(S.Price) ≤ v is succinct
IT-
1. An association rule is a pattern that states that when X occurs, Y occurs with a certain probability.
2. Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
3. Table data need to be converted to transaction form for association mining.
Easy Questions
1. Explain association rule mining.
2. Explain mining single-dimensional Boolean association rules from transactional databases.
3. Explain mining multilevel association rules from transactional databases.
UNIT-VI
C#!$$"2"1!"o+:predicts categorical class la)els
classifies data constructs a model )ased on the training set and the (alues class
la)els in a classifying attri)ute and uses it in classifying ne/ data
Pred"1"o+:
models continuous(alued functions
predicts un0no/n or missing (alues
S&er%"$ed #e!r+"+6 /1#!$$"2"1!"o+0
-uper(ision: The training data o)ser(ations, measurements, etc. are accompanied
)y la)els indicating the class of the o)ser(ations
Ne/ data is classified )ased on the training set
U+$&er%"$ed #e!r+"+6 /1#$er"+60
The class la)els of training data is un0no/n
#i(en a set of measurements, o)ser(ations, etc. /ith the aim of esta)lishing the
e1istence of classes or clusters in the data
I$$e$ re6!rd"+6 1#!$$"2"1!"o+ !+d &red"1"o+ Com&!r"+6 C#!$$"2"1!"o+
Me4od$
'ccuracy
-peed and scala)ility
o)ustness
-cala)ility
Interpreta)ility:
Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Decision tree generation consists of two phases
Tree construction
At start, all the training examples are at the root
Partition examples recursively based on selected attributes
Tree pruning
Identify and remove branches that reflect noise or outliers
Use of decision tree: Classifying an unknown sample
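The two phases above (recursive partitioning, then using the tree on an unknown sample) can be sketched in a few lines. The course text leaves the attribute-selection measure open, so this sketch assumes ID3-style information gain, uses a tiny hypothetical data set, stores the tree as nested dicts, and skips pruning. It also assumes every attribute value seen at classification time appeared in training; otherwise the branch lookup raises KeyError.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attr (ID3's measure)."""
    n = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    # Leaf: all examples share one class, or no attribute is left to test.
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {"attr": best, "branches": {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best])
    return node

def classify(tree, sample):
    # Walk from the root, following the branch that matches each test outcome.
    while isinstance(tree, dict):
        tree = tree["branches"][sample[tree["attr"]]]
    return tree

# Hypothetical training data: does someone play outside?
ROWS = [{"outlook": o, "windy": w} for o in ("sunny", "overcast", "rain")
        for w in ("no", "yes")]
LABELS = ["no", "no", "yes", "yes", "yes", "no"]
TREE = build_tree(ROWS, LABELS, ["outlook", "windy"])
print(classify(TREE, {"outlook": "rain", "windy": "yes"}))  # no
```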
Extracting Classification Rules from Trees
Represent the knowledge in the form of IF-THEN rules
One rule is created for each path from the root to a leaf
Each attribute-value pair along a path forms a conjunction
The leaf node holds the class prediction
Rules are easier for humans to understand
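The rule-extraction steps map directly onto a depth-first walk of the tree. This sketch assumes a hypothetical tree stored as nested dicts (an internal node has an "attr" and a "branches" map; a leaf is a class label) and emits one IF-THEN rule per root-to-leaf path.

```python
# Hypothetical decision tree: internal nodes test "attr"; leaves are class labels.
TREE = {
    "attr": "age",
    "branches": {
        "<=30": {"attr": "student", "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {"attr": "credit_rating",
                "branches": {"excellent": "no", "fair": "yes"}},
    },
}

def extract_rules(tree, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path."""
    if not isinstance(tree, dict):  # leaf: holds the class prediction
        lhs = " AND ".join(f'{a} = "{v}"' for a, v in conditions)
        return [f'IF {lhs} THEN class = "{tree}"']
    rules = []
    for value, subtree in tree["branches"].items():
        # Each attribute-value pair on the path adds one conjunct.
        rules.extend(extract_rules(subtree, conditions + ((tree["attr"], value),)))
    return rules

for rule in extract_rules(TREE):
    print(rule)
```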
Two approaches to avoid overfitting
Prepruning: Halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
Difficult to choose an appropriate threshold
Postpruning: Remove branches from a fully grown tree to get a sequence of progressively pruned trees
Use a set of data different from the training data to decide which is the best pruned tree
Approaches to Determine the Final Tree Size
Separate training and testing sets
Use cross-validation, e.g., 10-fold cross-validation
Use all the data for training
Use the minimum description length (MDL) principle
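The 10-fold cross-validation option can be sketched as a small harness: shuffle the indices, split them into k folds, and let each fold serve once as the test set. To choose a tree size, one would run it once per candidate pruning level and keep the most accurate. The majority-class learner below is a placeholder assumption standing in for a real tree learner, and the sketch assumes n ≥ k so no fold is empty.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, train_fn, predict_fn, k=10, seed=0):
    """Mean accuracy over k rounds; each fold is the test set exactly once."""
    folds = k_fold_indices(len(xs), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, f in enumerate(folds) if m != i for j in f]
        model = train_fn([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        correct = sum(predict_fn(model, xs[j]) == ys[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Placeholder learner (hypothetical): always predicts the majority training class.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]

def predict_majority(model, x):
    return model
```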
Enhancements to basic decision tree induction
Allow for continuous-valued attributes
Handle missing attribute values
Attribute construction
Classification: a classical problem extensively studied by statisticians and machine learning researchers
Scalability: Classifying data sets with millions of examples and hundreds of attributes with reasonable speed
Why decision tree induction in data mining?
Relatively faster learning speed than other classification methods
Convertible to simple and easy-to-understand classification rules
Can use SQL queries for accessing databases
Comparable classification accuracy with other methods
Bayesian Classification
Statistical classifiers
Based on Bayes' theorem
Naïve Bayesian classification
Class conditional independence
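Naïve Bayesian classification picks the class C that maximises P(C) times the product of P(x_k | C) over the attributes, which is exactly where the class conditional independence assumption enters. This is a minimal sketch for categorical attributes on hypothetical data; the add-one (Laplace) smoothing is an assumption added here to avoid zero probabilities and is not part of the course text.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    class_counts = Counter(labels)
    # value_counts[c][attr][v]: how often attr == v among examples of class c
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen per attribute
    for row, c in zip(rows, labels):
        for attr, v in row.items():
            value_counts[c][attr][v] += 1
            values[attr].add(v)
    return class_counts, value_counts, values, len(labels)

def predict_naive_bayes(model, sample):
    class_counts, value_counts, values, n = model
    best, best_score = None, -1.0
    for c, nc in class_counts.items():
        score = nc / n  # prior P(C)
        for attr, v in sample.items():
            # Class conditional independence: multiply P(x_k | C) per attribute,
            # with add-one smoothing (assumption).
            score *= (value_counts[c][attr][v] + 1) / (nc + len(values[attr]))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical training data.
ROWS = [{"outlook": o, "windy": w} for o in ("sunny", "overcast", "rain")
        for w in ("no", "yes")]
LABELS = ["no", "no", "yes", "yes", "yes", "no"]
MODEL = train_naive_bayes(ROWS, LABELS)
print(predict_naive_bayes(MODEL, {"outlook": "sunny", "windy": "yes"}))  # no
```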
Bayesian belief networks
Extracting rules from a trained network
Major method for prediction is regression
Linear and multiple regression
Nonlinear regression
Linear regression: $Y = \alpha + \beta X$
Two parameters, $\alpha$ and $\beta$, specify the line and are to be estimated by using the data at hand,
using the least squares criterion on the known values $Y_1, Y_2, \ldots$ and $X_1, X_2, \ldots$
Multiple regression: $Y = b_0 + b_1 X_1 + b_2 X_2$
Many nonlinear functions can be transformed into the above.
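For the simple linear case, the least squares criterion has a closed form: $\beta = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\alpha = \bar{y} - \beta \bar{x}$. The sketch below implements just that; the data values are hypothetical. As an example of transforming a nonlinear function, $Y = \alpha + \beta X^2$ is fitted by feeding $X^2$ in as the predictor.

```python
def least_squares(xs, ys):
    """Estimate alpha and beta in Y = alpha + beta * X by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

print(least_squares([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)

# Nonlinear model Y = alpha + beta * X^2, fitted by transforming the predictor.
xs = [1, 2, 3]
ys = [1 + 3 * x * x for x in xs]
print(least_squares([x * x for x in xs], ys))
```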
Log-linear models:
The multi-way table of joint probabilities is approximated by a product of lower-order tables.
Probability: $p(a, b, c, d) = \alpha_{ab}\,\beta_{ac}\,\chi_{ad}\,\delta_{bcd}$
1. Model construction: describing a set of predetermined classes.
2. Scalability: Classifying data sets with millions of examples and hundreds of attributes with reasonable speed.
3. Classification predicts categorical class labels.
2. Explain issues regarding classification and prediction.
3. Explain classification by decision tree induction.
Interval-scaled variables
Binary variables
Categorical, Ordinal, and Ratio-Scaled variables
Variables of mixed types
Major Clustering Approaches
Partitioning algorithms
Hierarchy algorithms
Density-based
Grid-based
Model-based
Outlier Analysis
CLARA (Clustering LARge Applications) (1990)
Kaufmann and Rousseeuw in 1990
Built into statistical analysis packages, such as S+
It draws multiple samples of the data set, applies PAM on each sample, and gives the best clustering as the output
Strength: deals with larger data sets than PAM
Weakness:
Efficiency depends on the sample size
A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased
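CLARA's loop (sample, run PAM, score the resulting medoids on the whole data set, keep the best) can be sketched as follows. The PAM here is a simplified first-improvement swap search rather than the full textbook procedure, distances are Euclidean on 2-D points, and the sample count, sample size, and data are hypothetical choices.

```python
import random

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Simplified PAM: start from the first k points and keep applying any
    medoid/non-medoid swap that lowers the total cost (first improvement)."""
    medoids = list(points[:k])
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, num_samples=5, sample_size=40, seed=0):
    """Run PAM on several random samples and keep the medoid set whose cost
    is lowest on the whole data set, not just on its own sample."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids

# Two well-separated 5 x 5 grids of points (hypothetical data).
POINTS = ([(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
          + [(10 + x * 0.1, 10 + y * 0.1) for x in range(5) for y in range(5)])
print(sorted(clara(POINTS, k=2, sample_size=20)))
```

Scoring each candidate on the full data set is what lets CLARA reject medoids that only look good on a biased sample; the weakness noted above remains if every sample is biased.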
BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies, by Zhang, Ramakrishnan, Livny (SIGMOD'96)
CHAMELEON: hierarchical clustering using dynamic modeling, by G. Karypis, E. H. Han and V. Kumar ('99)
DBSCAN: The Algorithm
Arbitrarily select a point p.
Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
If p is a core point, a cluster is formed.
If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database.
Continue the process until all of the points have been processed.
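The DBSCAN steps translate into a short loop: a point with at least MinPts neighbours within Eps is a core point and seeds a cluster that is grown through other core points; points reached only from a core point become border points; anything else is noise. This sketch uses Euclidean distance on 2-D points and a brute-force neighbourhood query (no spatial index), with hypothetical data.

```python
def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps * eps]

def dbscan(points, eps, min_pts):
    """Return a label per point: cluster id 0, 1, ... or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for p in range(len(points)):
        if labels[p] is not UNVISITED:
            continue
        neighbours = region_query(points, p, eps)
        if len(neighbours) < min_pts:
            labels[p] = NOISE  # not core; may later be claimed as a border point
            continue
        cluster += 1           # p is a core point: a new cluster is formed
        labels[p] = cluster
        queue = [q for q in neighbours if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point reached from a core point
            if labels[q] is not UNVISITED:
                continue
            labels[q] = cluster
            q_neighbours = region_query(points, q, eps)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # q is core too: keep expanding
    return labels

# Hypothetical data: two tight squares and one isolated point.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
print(dbscan(POINTS, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```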
Limitations of COBWEB
The assumption that the attributes are independent of each other is often too strong because correlation may exist
Not suitable for clustering large database data: skewed tree and expensive probability distributions
Neural network approaches
Represent each cluster as an exemplar, acting as a prototype of the cluster
New objects are distributed to the cluster whose exemplar is the most similar according to some distance measure
Outliers
A set of objects that are considerably dissimilar from the remainder of the data
Example: Sports: Michael Jordan, Wayne Gretzky, ...
Distance-based outlier: A DB(p, D)-outlier is an object O in a dataset T such that at least a fraction p of the objects in T lies at a distance greater than D from O
Algorithms for mining distance-based outliers
Index-based algorithm
Nested-loop algorithm
Cell-based algorithm
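The DB(p, D)-outlier definition can be checked directly with the nested-loop approach: the outer loop takes each candidate object O, the inner loop counts how many objects in T lie farther than D from it, and O is reported if that count is at least p times the size of T. This is a minimal O(n²) sketch on hypothetical 2-D points with Euclidean distance; it omits the block-wise I/O optimisations a real nested-loop miner would use.

```python
import math

def db_outliers(points, p, d):
    """Return every DB(p, D)-outlier: objects for which at least a fraction p
    of all objects in the data set lie at a distance greater than d."""
    n = len(points)
    outliers = []
    for o in points:                 # outer loop: candidate object O
        far = 0
        for q in points:             # inner loop: scan the whole data set T
            if math.dist(o, q) > d:
                far += 1
        if far >= p * n:
            outliers.append(o)
    return outliers

# Hypothetical data: a 3 x 3 grid plus one distant point.
POINTS = [(x, y) for x in range(3) for y in range(3)] + [(50, 50)]
print(db_outliers(POINTS, p=0.8, d=5))  # [(50, 50)]
```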
Sequential exception technique
Simulates the way in which humans can distinguish unusual objects from among a series of supposedly like objects
OLAP data cube technique
Uses data cubes to identify regions of anomalies in large multidimensional data
IT-
1. Clustering is the assignment of a set of observations into subsets.
2. Subspace clustering methods look for clusters that can only be seen in a particular projection of the data.
3. Many clustering algorithms require the specification of the number of clusters to produce in the input data set, prior to execution of the algorithm.
4. A distance measure determines how the similarity of two elements is calculated.
5. Hierarchical clustering creates a hierarchy of clusters which may be represented in a tree structure called a dendrogram.
6. QT clustering is an alternative method of partitioning data, invented for gene clustering.
7. In QT clustering, QT stands for Quality Threshold.
8. Formal concept analysis is a technique for generating clusters of objects and attributes, called formal concepts.
9. Evaluation of clustering is sometimes referred to as cluster validation.
10. Several different clustering systems based on mutual information have been proposed.
Easy Questions
1. What is Cluster Analysis?
2. Explain Types of Data in Cluster Analysis.
3. Explain A Categorization of Major Clustering Methods.
4. Explain Grid-Based Methods.
5. Explain Model-Based Clustering Methods.
6. Explain Outlier Analysis.
UNIT-VIII
Set-valued attribute
Generalization of each value in the set into its corresponding higher-level concepts
Derivation of the general behavior of the set, such as the number of elements in the set, the types or value ranges in the set, or the weighted average for numerical data
hobby = {tennis, hockey, chess, violin, nintendo_games} generalizes to {sports, music, video_games}
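Generalizing a set-valued attribute amounts to mapping each member through a concept hierarchy and then summarizing the result. The hierarchy below is hypothetical (in particular, mapping chess to "sports" is an assumption chosen to reproduce the example above); the summary shows two of the general behaviours mentioned: the number of elements and how many fall under each higher-level concept.

```python
from collections import Counter

# Hypothetical concept hierarchy (chess is assumed to roll up to "sports").
HOBBY_HIERARCHY = {
    "tennis": "sports", "hockey": "sports", "chess": "sports",
    "violin": "music", "nintendo_games": "video_games",
}

def generalize_set(values, hierarchy):
    """Replace each value by its higher-level concept; duplicates collapse."""
    return {hierarchy[v] for v in values}

def summarize_set(values, hierarchy):
    """General behaviour of the set: its size and the count per concept."""
    return {"size": len(values),
            "per_concept": Counter(hierarchy[v] for v in values)}

HOBBIES = {"tennis", "hockey", "chess", "violin", "nintendo_games"}
print(sorted(generalize_set(HOBBIES, HOBBY_HIERARCHY)))  # ['music', 'sports', 'video_games']
```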
List-valued or a sequence-valued attribute
Same as set-valued attributes except that the order of the elements in the sequence should be observed in the generalization
Spatial data:
Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage
Requires the merge of a set of geographic areas by spatial operations
Image data:
Extracted by aggregation and/or approximation
Size, color, shape, texture, orientation, and relative positions and structures of the contained objects or regions in the image
Music data:
Summarize its melody: based on the approximate patterns that repeatedly occur in the segment
Summarize its style: based on its tone, tempo, or the major musical instruments played
Object identifier: generalize to the lowest level of class in the class/subclass hierarchies
Class composition hierarchies
Generalize nested structured data
Generalize only objects closely related in semantics to the current one
Plan: a variable sequence of actions
E.g., Travel (flight): <traveler, departure, arrival, d-time, a-time, airline, price, seat>
Plan mining: extraction of important or significant generalized (sequential) patterns from a planbase (a large collection of plans)
E.g., discover travel patterns in an air flight database, or
find significant patterns from the sequences of actions in the repair of automobiles
Spatial data warehouse: Integrated, subject-oriented, time-variant, and nonvolatile spatial data repository for data analysis and decision making
Spatial data integration: a big issue
Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing)
Vendor-specific formats (ESRI, MapInfo, Intergraph)
Spatial data cube: multidimensional spatial database
Both dimensions and measures may contain spatial components
Spatial association rule: A ⇒ B [s%, c%]
A and B are sets of spatial or nonspatial predicates
Topological relations: intersects, overlaps, disjoint, etc.
Spatial orientations: left_of, west_of, under, etc.
Distance information: close_to, within_distance, etc.
Hierarchy of spatial relationship:
g_close_to: near_by, touch, intersect, contain, etc.
First search for rough relationship and then refine it
Spatial classification
Analyze spatial objects to derive classification schemes, such as decision trees, in relevance to certain spatial properties (district, highway, river, etc.)
Example: Classify regions in a province into rich vs. poor according to the average family income
Description-based retrieval systems
Build indices and perform object retrieval based on image descriptions, such as keywords, captions, size, and time of creation
Labor-intensive if performed manually
Results are typically of poor quality if automated
Content-based retrieval systems
Support retrieval based on the image content, such as color histogram, texture, shape, objects, and wavelet transforms
Image sample-based queries:
Find all of the images that are similar to the given image sample
Compare the feature vector (signature) extracted from the sample with the feature vectors of images that have already been extracted and indexed in the image database
Image feature specification queries:
Specify or sketch image features like color, texture, or shape, which are translated into a feature vector
Match the feature vector with the feature vectors of the images in the database
Time-series database
Consists of sequences of values or events changing with time
Data is recorded at regular intervals
Characteristic time-series components
Trend, cycle, seasonal, irregular
Estimation of cyclic variations
If (approximate) periodicity of cycles occurs, a cyclic index can be constructed in much the same manner as seasonal indexes
Estimation of irregular variations
By adjusting the data for trend, seasonal and cyclic variations
Steps for performing a similarity search
Atomic matching
Find all pairs of gap-free windows of a small length that are similar
Window stitching
Stitch similar windows to form pairs of large similar subsequences allowing gaps between atomic matches
Subsequence Ordering
Linearly order the subsequence matches to determine whether enough similar pieces exist
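The three steps can be sketched end to end for two numeric sequences. The course names the steps but not their details, so several choices here are assumptions: two windows match if, after subtracting each window's mean (an offset shift), no pair of values differs by more than eps; stitching greedily chains non-overlapping matches that move forward in both sequences with gaps of at most max_gap; and "enough similar pieces" means the longest chain covers at least min_fraction of the shorter sequence.

```python
def windows(seq, w):
    """All gap-free windows of length w, as (start, values) pairs."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]

def similar(a, b, eps):
    # Assumption: offset-normalised windows must agree point by point within eps.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return all(abs((x - ma) - (y - mb)) <= eps for x, y in zip(a, b))

def atomic_matches(s, t, w, eps):
    """Step 1: all pairs (i, j) of similar windows from s and t."""
    return [(i, j) for i, a in windows(s, w) for j, b in windows(t, w)
            if similar(a, b, eps)]

def stitch(matches, w, max_gap):
    """Step 2: greedily chain matches that move forward in BOTH sequences,
    allowing a gap of at most max_gap positions between consecutive windows."""
    chains = []
    for i, j in sorted(matches):
        for chain in chains:
            pi, pj = chain[-1]
            if pi + w <= i <= pi + w + max_gap and pj + w <= j <= pj + w + max_gap:
                chain.append((i, j))
                break
        else:
            chains.append([(i, j)])
    return chains

def enough_similar(s, t, w, eps, max_gap, min_fraction):
    """Step 3: check whether the longest ordered chain covers enough of s and t."""
    chains = stitch(atomic_matches(s, t, w, eps), w, max_gap)
    best = max((len(c) * w for c in chains), default=0)
    return best >= min_fraction * min(len(s), len(t))

S = [1, 3, 2, 8, 5, 9]
T = [11, 13, 12, 18, 15, 19]  # S shifted up by 10
print(enough_similar(S, T, w=2, eps=0.1, max_gap=0, min_fraction=0.9))  # True
```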
Problems with the Web linkage structure
Not every hyperlink represents an endorsement
Other purposes are for navigation or for paid advertisements
If the majority of hyperlinks are for endorsement, the collective opinion will still dominate
One authority will seldom have its Web page point to its rival authorities in the same field
Authoritative pages are seldom particularly descriptive
Hub
Set of Web pages that provides collections of links to authorities
HITS (Hyperlink-Induced Topic Search)
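HITS captures the hub/authority relationship by mutual reinforcement: a page's authority score is the sum of the hub scores of pages linking to it, a page's hub score is the sum of the authority scores of pages it links to, and both are renormalised each round. This is a minimal sketch of that standard iteration over a hypothetical link graph; the fixed iteration count is an assumption in place of a convergence test.

```python
import math

def hits(links, iterations=50):
    """links maps each page to the pages it points to.
    Returns (hub, authority) score dicts after the given number of rounds."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs ...
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        # ... and a good hub points to good authorities.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalise both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical link graph: three hub-like pages pointing at two authorities.
LINKS = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"], "a1": [], "a2": []}
hub, auth = hits(LINKS)
print(max(auth, key=auth.get))  # a1
```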
Design of a Web Log Miner
Web log is filtered to generate a relational database
A data cube is generated from the database
OLAP is used to drill down and roll up in the cube
OLAM is used for mining interesting knowledge
Benefits of Multi-Layer Meta-Web
Multi-dimensional Web info summary analysis
Approximate and intelligent query answering
Web high-level query answering (WebSQL, WebML)
Web content and structure mining
Observing the dynamics/evolution of the Web
IT-
1. A time-series database consists of sequences of values or events obtained over repeated measurements of time.
2. Sequential pattern mining is the discovery of frequently occurring ordered events as patterns.
3. DSMS stands for Data Stream Management System.
1. Explain multidimensional analysis and descriptive mining of complex data objects.
2. Explain mining spatial databases.
3. Explain multidimensional analysis and descriptive mining of complex data objects.