
data mining slides

Date post: 01-Jun-2018
Upload: abdul-samad
  • 8/9/2019 data mining slides


    Data Mining: Practical Machine Learning Tools and Techniques

    Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall


    Data Mining: Practical Machine Learning Tools and Techniques (Chapter 3)

    Output: Knowledge representation

    Tables

    Linear models

    Trees

    Rules

    Classification rules

    Association rules

    Rules with exceptions

    More expressive rules

    Instance-based representation

    Clusters


    Output: representing structural patterns

    Many different ways of representing patterns

    Decision trees, rules, instance-based, …

    Also called "knowledge" representation

    Representation determines inference method

    Understanding the output is the key to understanding the underlying learning methods

    Different types of output for different learning problems

    (e.g. classification, regression, …)


    Tables

    Simplest way of representing output:

    Use the same format as input!

    Decision table for the weather problem:

    Main problem: selecting the right attributes
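A decision table classifies an instance by looking up its values for a chosen subset of attributes. The minimal Python sketch below assumes the table is keyed on outlook and humidity; both that choice and the rows are hypothetical illustrations, not the slide's figure. Any attribute outside the chosen subset (here, windy) is ignored, which is why picking the subset is the main problem.

```python
from typing import Dict, Optional, Tuple

# Attributes the table is keyed on (hypothetical choice for illustration).
SELECTED: Tuple[str, ...] = ("outlook", "humidity")

# Hypothetical rows: (outlook, humidity) -> play.
TABLE: Dict[Tuple[str, ...], str] = {
    ("sunny", "high"): "no",
    ("sunny", "normal"): "yes",
    ("overcast", "high"): "yes",
    ("overcast", "normal"): "yes",
    ("rainy", "high"): "no",
    ("rainy", "normal"): "no",
}


def classify(instance: Dict[str, str]) -> Optional[str]:
    """Look up the class for the instance's values of the selected attributes.

    Attributes not in SELECTED are ignored; an unseen combination returns None.
    """
    key = tuple(instance[a] for a in SELECTED)
    return TABLE.get(key)


print(classify({"outlook": "sunny", "humidity": "normal", "windy": "true"}))  # yes
```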


    Linear models

    Another simple representation

    Regression model: inputs (attribute values) and output are all numeric

    Output is the sum of weighted attribute values

    The trick is to find good values for the weights
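A linear model predicts w0 + w1*a1 + ... + wk*ak. The slide does not say how the weights are found; as one common choice, this sketch fits a single-attribute model by ordinary least squares in closed form. The data points are hypothetical and lie exactly on y = 1 + 2x, so the fit should recover those weights.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], attrs: Sequence[float]) -> float:
    """weights[0] is the intercept; weights[i] multiplies attrs[i - 1]."""
    return weights[0] + sum(w * a for w, a in zip(weights[1:], attrs))


def fit_one_attribute(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares intercept and slope for y ~ w0 + w1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    return [mean_y - w1 * mean_x, w1]


w = fit_one_attribute([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w)                   # [1.0, 2.0]
print(predict(w, [10.0]))  # 21.0
```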


    A linear regression function for the CPU performance data

    PRP = 37.06 + 2.47 CACH


    Linear models for classification

    Binary classification

    Line separates the two classes

    Decision boundary: defines where the decision changes from one class value to the other

    Prediction is made by plugging in observed values of the attributes into the expression

    Predict one class if output ≥ 0 and the other class if output < 0

    Boundary becomes a high-dimensional plane (hyperplane) when there are multiple attributes


    Separating setosas from versicolors

    2.0 - 0.5 PETAL-LENGTH - 0.8 PETAL-WIDTH = 0
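The boundary is used by plugging petal measurements into the linear expression and checking the sign of the result. In this sketch, non-negative output is mapped to Iris-setosa. That mapping is inferred from the signs of the weights, since small petals give a positive value, and is not stated on the slide. The two flowers are hypothetical.

```python
def boundary(petal_length: float, petal_width: float) -> float:
    """Value of 2.0 - 0.5 * PETAL-LENGTH - 0.8 * PETAL-WIDTH."""
    return 2.0 - 0.5 * petal_length - 0.8 * petal_width


def classify(petal_length: float, petal_width: float) -> str:
    # Small petals make the expression positive: that side is taken as setosa.
    if boundary(petal_length, petal_width) >= 0:
        return "Iris-setosa"
    return "Iris-versicolor"


print(classify(1.4, 0.2))  # Iris-setosa      (boundary value about 1.14)
print(classify(4.7, 1.4))  # Iris-versicolor  (boundary value about -1.47)
```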


    Nominal and numeric attributes

    Nominal: number of children usually equal to number of values ⇒ attribute won't get tested more than once

    Other possibility: division into two subsets

    Numeric: test whether value is greater or less than constant ⇒ attribute may get tested several times

    Other possibility: three-way split (or multi-way split)

    Integer: less than, equal to, greater than

    Real: below, within, above


    Missing values

    Does absence of value have some significance?

    Yes ⇒ "missing" is a separate value

    No ⇒ "missing" must be treated in a special way

    Assign instance to most popular branch
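Sending an instance with a missing value down the most popular branch only needs the number of training instances that went down each branch of the node. This sketch assumes those counts are stored on the node. The counts are hypothetical, and `None` stands for a missing value.

```python
from typing import Dict, Optional


def choose_branch(value: Optional[str], branch_counts: Dict[str, int]) -> str:
    """Pick the branch for an instance at a node testing a nominal attribute.

    branch_counts maps each branch (attribute value) to the number of
    training instances that reached it during learning.
    """
    if value is None:  # missing value: follow the most popular branch
        return max(branch_counts, key=lambda b: branch_counts[b])
    return value


counts = {"sunny": 3, "overcast": 4, "rainy": 7}
print(choose_branch(None, counts))     # rainy
print(choose_branch("sunny", counts))  # sunny
```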


    Trees for numeric prediction

    Regression: the process of computing an expression that predicts a numeric quantity

    Regression tree: "decision tree" where each leaf predicts a numeric quantity

    Predicted value is average value of training instances that reach the leaf

    Model tree: "regression tree" with linear regression models at the leaf nodes

    Linear patches approximate continuous function
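The only difference between the two kinds of tree is what a leaf computes. This sketch builds a one-split tree on a single numeric attribute. A regression-tree leaf returns the mean target of its training instances. A model-tree leaf returns a least-squares line fitted to them, so each leaf is one linear patch. The split threshold and the data are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Tuple

Instance = Tuple[float, float]  # (attribute value x, numeric target y)
Predictor = Callable[[float], float]


def regression_leaf(train: List[Instance]) -> Predictor:
    avg = mean(y for _, y in train)  # constant prediction
    return lambda x: avg


def model_leaf(train: List[Instance]) -> Predictor:
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)  # linear patch


def build(train: List[Instance], threshold: float,
          make_leaf: Callable[[List[Instance]], Predictor]) -> Predictor:
    """One-split tree: test x <= threshold, then predict with the leaf."""
    left = make_leaf([p for p in train if p[0] <= threshold])
    right = make_leaf([p for p in train if p[0] > threshold])
    return lambda x: left(x) if x <= threshold else right(x)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 20.0), (5.0, 25.0), (6.0, 30.0)]
reg_tree = build(data, 3.0, regression_leaf)
model_tree = build(data, 3.0, model_leaf)
print(reg_tree(2.5), model_tree(2.5))  # 4.0 5.0
print(reg_tree(5.5), model_tree(5.5))  # 25.0 27.5
```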


    Linear regression for the CPU data

    PRP =
      - 56.1
      + 0.049 MYCT
      + 0.015 MMIN
      + 0.006 MMAX
      + 0.630 CACH
      - 0.270 CHMIN
      + 1.46 CHMAX
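Evaluating the regression equation is just a weighted sum plus the intercept. The coefficients below are copied from the slide. The machine configuration is hypothetical and is not a row from the CPU dataset.

```python
INTERCEPT = -56.1
COEFFS = {
    "MYCT": 0.049, "MMIN": 0.015, "MMAX": 0.006,
    "CACH": 0.630, "CHMIN": -0.270, "CHMAX": 1.46,
}


def predict_prp(machine: dict) -> float:
    """PRP = intercept + sum of coefficient * attribute value."""
    return INTERCEPT + sum(w * machine[name] for name, w in COEFFS.items())


example = {"MYCT": 100, "MMIN": 1000, "MMAX": 8000,
           "CACH": 32, "CHMIN": 1, "CHMAX": 8}
print(round(predict_prp(example), 2))  # 43.37
```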


    Regression tree for the CPU data


    Model tree for the CPU data


    Classification rules

    Popular alternative to decision trees

    Antecedent (pre-condition): a series of tests (just like the tests at the nodes of a decision tree)

    Tests are usually logically ANDed together (but may also be general logical expressions)

    Consequent (conclusion): classes, set of classes, or probability distribution assigned by rule

    Individual rules are often logically ORed together

    Conflicts arise if different conclusions apply
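The structure described above can be sketched directly. In this sketch a rule's antecedent is a list of tests that must all hold (AND), and a rule set is ORed by collecting the conclusion of every rule that fires. The `Rule` class and the example rules are hypothetical and are not taken from any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Instance = Dict[str, str]
Test = Callable[[Instance], bool]


@dataclass
class Rule:
    tests: List[Test]  # antecedent: every test must hold (logical AND)
    conclusion: str    # consequent: the class assigned by the rule

    def covers(self, inst: Instance) -> bool:
        return all(t(inst) for t in self.tests)


def conclusions(rule_set: List[Rule], inst: Instance) -> List[str]:
    """Rules are ORed: collect the conclusion of every rule whose antecedent holds."""
    return [r.conclusion for r in rule_set if r.covers(inst)]


rules = [
    Rule([lambda i: i["outlook"] == "sunny", lambda i: i["humidity"] == "high"], "no"),
    Rule([lambda i: i["outlook"] == "overcast"], "yes"),
]
print(conclusions(rules, {"outlook": "sunny", "humidity": "high"}))  # ['no']
print(conclusions(rules, {"outlook": "rainy", "humidity": "high"}))  # []
```

If two rules with different conclusions both fire, the returned list holds both, which is exactly the conflict the slide mentions.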


    From trees to rules

    Easy: converting a tree into a set of rules

    One rule for each leaf:

    Antecedent contains a condition for every node on the path from the root to the leaf

    Consequent is class assigned by the leaf

    Produces rules that are unambiguous

    Doesn't matter in which order they are executed

    But: resulting rules are unnecessarily complex

    Pruning to remove redundant tests
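The conversion can be done with a depth-first walk that records the test on every node along the current path and emits one rule at each leaf. The nested-tuple tree encoding and the example tree are hypothetical illustrations. Pruning is not shown.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: internal node = (attribute, {value: subtree}),
# leaf = class label string.
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Condition = Tuple[str, str]
RuleT = Tuple[Tuple[Condition, ...], str]


def tree_to_rules(tree: Tree, path: Tuple[Condition, ...] = ()) -> List[RuleT]:
    if isinstance(tree, str):  # leaf: its class is the rule's consequent
        return [(path, tree)]
    attribute, branches = tree
    rules: List[RuleT] = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules


tree: Tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})

for antecedent, cls in tree_to_rules(tree):
    tests = " and ".join(f"{a} = {v}" for a, v in antecedent)
    print(f"If {tests} then play = {cls}")
```

Each leaf is reached by exactly one path, so the resulting rules never conflict and can be applied in any order.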


    From rules to trees

    More difficult: transforming a rule set into a tree

    Tree cannot easily express disjunction between rules

    Example: rules which test different attributes

    If a and b then x

    If c and d then x

    Symmetry needs to be broken

    Corresponding tree contains identical subtrees (⇒ "replicated subtree problem")


    A tree for a simple disjunction


    The exclusive-or problem

    If x = 1 and y = 0 then class = a

    If x = 0 and y = 1 then class = a

    If x = 0 and y = 0 then class = b

    If x = 1 and y = 1 then class = b


    A tree with a replicated subtree

    If x = 1 and y = 1 then class = a

    If z = 1 and w = 1 then class = a

    Otherwise class = b
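The rule set above is short, but an equivalent decision tree has to repeat the subtree that tests z and w under two different branches. This sketch assumes all four attributes are binary (0 or 1), which is a simplification for illustration. It writes the repeated subtree once as a helper called from both places and checks that the rules and the tree agree on every input.

```python
from itertools import product


def rules(x: int, y: int, z: int, w: int) -> str:
    if x == 1 and y == 1:
        return "a"
    if z == 1 and w == 1:
        return "a"
    return "b"  # otherwise


def zw_subtree(z: int, w: int) -> str:
    # In a drawn tree, this whole subtree appears twice.
    if z == 1:
        return "a" if w == 1 else "b"
    return "b"


def tree(x: int, y: int, z: int, w: int) -> str:
    if x == 1:
        if y == 1:
            return "a"
        return zw_subtree(z, w)  # copy 1 of the replicated subtree
    return zw_subtree(z, w)      # copy 2 of the replicated subtree


# The two representations agree on all 16 combinations of binary values.
assert all(rules(*v) == tree(*v) for v in product((0, 1), repeat=4))
```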


    "Nuggets" of knowledge

    Are rules independent pieces of knowledge? (It seems easy to add a rule to an existing rule base.)

    Problem: ignores how rules are executed

    Two ways of executing a rule set:

    Ordered set of rules ("decision list")

    Order is important for interpretation

    Unordered set of rules

    Rules may overlap and lead to different conclusions for the same instance
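Executing the same rules in the two ways can give different answers. In this sketch a decision list returns the first rule that fires, so reordering the list changes the result. The unordered interpretation collects every conclusion that fires, so overlapping rules can produce two classes for one instance. The rules and the instance are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Instance = Dict[str, str]
Rule = Tuple[Callable[[Instance], bool], str]  # (antecedent, conclusion)

RULES: List[Rule] = [
    (lambda i: i["outlook"] == "sunny", "no"),
    (lambda i: i["humidity"] == "normal", "yes"),
]


def decision_list(rules: List[Rule], inst: Instance) -> Optional[str]:
    """Ordered: the first rule that fires decides."""
    for antecedent, conclusion in rules:
        if antecedent(inst):
            return conclusion
    return None


def unordered(rules: List[Rule], inst: Instance) -> Set[str]:
    """Unordered: every rule that fires contributes its conclusion."""
    return {conclusion for antecedent, conclusion in rules if antecedent(inst)}


day = {"outlook": "sunny", "humidity": "normal"}
print(decision_list(RULES, day))                  # no
print(decision_list(list(reversed(RULES)), day))  # yes
print(sorted(unordered(RULES, day)))              # ['no', 'yes']
```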


    Interpreting rules

    What if two or more rules conflict?

    Give no conclusion at all?

    Go with rule that is most popular on training data?

    …

    What if no rule applies to a test instance?

    Give no conclusion at all?

    Go with class that is most frequent in training data?

    …
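The slide lists options rather than a single answer. This sketch picks one combination of them, as an assumption. On a conflict, the rule that covers the most training instances wins. When no rule applies, the classifier falls back to the most frequent class in the training data. The training data and rules are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, str]
Rule = Tuple[Callable[[Instance], bool], str]


def make_classifier(rules: List[Rule],
                    train: List[Tuple[Instance, str]]) -> Callable[[Instance], str]:
    # Popularity of a rule = number of training instances it covers.
    popularity = [sum(1 for inst, _ in train if cond(inst)) for cond, _ in rules]
    default = Counter(cls for _, cls in train).most_common(1)[0][0]

    def classify(inst: Instance) -> str:
        fired = [(popularity[k], concl)
                 for k, (cond, concl) in enumerate(rules) if cond(inst)]
        if not fired:
            return default                       # no rule applies
        return max(fired, key=lambda t: t[0])[1]  # conflict: most popular rule
    return classify


train = [
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
    ({"outlook": "overcast", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
]
rules = [
    (lambda i: i["outlook"] == "sunny", "no"),   # covers 2 training instances
    (lambda i: i["windy"] == "false", "yes"),    # covers 3 training instances
]
classify = make_classifier(rules, train)
print(classify({"outlook": "sunny", "windy": "false"}))  # yes (conflict resolved)
print(classify({"outlook": "rainy", "windy": "true"}))   # no (default class)
```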


    Special case: boolean class

    Assumption: if instance does not belong to class "yes", it belongs to class "no"

    Trick: only learn rules for class "yes" and use default rule for "no"

    If x = 1 and y = 1 then class = a

    If z = 1 and w = 1 then class = a

    Otherwise class = b


    Association rules

    Association rules…

    … can predict any attribute and combinations of attributes

    … are not intended to be used together as a set

    Problem: immense number of possible associations

    Output needs to be restricted to show only the most predictive associations

    Only those with high support and high confidence


    Support and confidence of a rule

    Support (coverage): number of instances predicted correctly

    Confidence (accuracy): number of correct predictions, as proportion of all instances that rule applies to

    Example: 4 cool days with normal humidity

    If temperature = cool then humidity = normal

    Support = 4, confidence = 100%

    Normally: minimum support and confidence pre-specified (e.g. 58 rules with support ≥ 2 and confidence ≥ 95% for weather data)
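Both measures come from two counts: how many instances the rule applies to, and how many of those it predicts correctly. This sketch uses a hypothetical toy table, not the actual weather data. It is constructed so that four cool days all have normal humidity, which reproduces the slide's support of 4 and confidence of 100%.

```python
from typing import Dict, List, Tuple

Instance = Dict[str, str]


def matches(inst: Instance, conds: Dict[str, str]) -> bool:
    return all(inst.get(a) == v for a, v in conds.items())


def support_confidence(data: List[Instance], antecedent: Dict[str, str],
                       consequent: Dict[str, str]) -> Tuple[int, float]:
    applies = [i for i in data if matches(i, antecedent)]    # rule applies
    correct = [i for i in applies if matches(i, consequent)]  # and is right
    confidence = len(correct) / len(applies) if applies else 0.0
    return len(correct), confidence


days = [
    {"temperature": "cool", "humidity": "normal"},
    {"temperature": "cool", "humidity": "normal"},
    {"temperature": "cool", "humidity": "normal"},
    {"temperature": "cool", "humidity": "normal"},
    {"temperature": "hot", "humidity": "high"},
    {"temperature": "mild", "humidity": "normal"},
]
print(support_confidence(days, {"temperature": "cool"}, {"humidity": "normal"}))  # (4, 1.0)
print(support_confidence(days, {"humidity": "normal"}, {"temperature": "cool"}))  # (4, 0.8)
```

The second call shows that reversing a rule keeps the support but can lower the confidence.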


    Interpreting association rules

    Interpretation is not obvious:

    If windy = false and play = no then outlook = sunny and humidity = high

    is not the same as

    If windy = false and play = no then outlook = sunny

    If windy = false and play = no then humidity = high

    It means that the following also holds:

    If humidity = high and windy = false and play = no then outlook = sunny
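The reason the last rule "also holds" can be checked numerically. Moving humidity = high from the consequent into the antecedent leaves the set of correctly predicted instances unchanged, so the support is the same. The rule now applies to fewer instances, so its confidence can only stay equal or rise. The rows below are hypothetical.

```python
from typing import Dict, List, Tuple

Instance = Dict[str, str]


def matches(inst: Instance, conds: Dict[str, str]) -> bool:
    return all(inst.get(a) == v for a, v in conds.items())


def support_confidence(data: List[Instance], antecedent: Dict[str, str],
                       consequent: Dict[str, str]) -> Tuple[int, float]:
    applies = [i for i in data if matches(i, antecedent)]
    correct = [i for i in applies if matches(i, consequent)]
    return len(correct), (len(correct) / len(applies) if applies else 0.0)


def row(windy: str, play: str, outlook: str, humidity: str) -> Instance:
    return {"windy": windy, "play": play, "outlook": outlook, "humidity": humidity}


data = [
    row("false", "no", "sunny", "high"),
    row("false", "no", "sunny", "high"),
    row("false", "no", "rainy", "normal"),
    row("false", "no", "rainy", "high"),
    row("true", "yes", "overcast", "high"),
]

# If windy = false and play = no then outlook = sunny and humidity = high
s1, c1 = support_confidence(data, {"windy": "false", "play": "no"},
                            {"outlook": "sunny", "humidity": "high"})
# If humidity = high and windy = false and play = no then outlook = sunny
s2, c2 = support_confidence(data, {"humidity": "high", "windy": "false", "play": "no"},
                            {"outlook": "sunny"})
print(s1, c1)            # 2 0.5
print(s2, round(c2, 3))  # 2 0.667
assert s2 == s1 and c2 >= c1
```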


    Rules with exceptions

    Idea: allow rules to have exceptions

    Example: rule for iris data

    If petal-length ≥ 2.45 and petal-length < 4.45 then Iris-versicolor

    New instance:

    Sepal length   Sepal width   Petal length   Petal width   Type
    5.1            3.5           2.6            0.2           Iris-setosa

    Modified rule:

    If petal-length ≥ 2.45 and petal-length < 4.45 then Iris-versicolor
      EXCEPT if petal-width < 1.0 then Iris-setosa
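The exception becomes a nested test that overrides the rule's conclusion. This sketch reads the comparison operators as ≥ 2.45, < 4.45 and < 1.0 for petal-width. It shows that the original rule misclassifies the new instance, with petal length 2.6 and petal width 0.2, while the modified rule gets it right.

```python
from typing import Optional


def original_rule(petal_length: float) -> Optional[str]:
    """If petal-length >= 2.45 and petal-length < 4.45 then Iris-versicolor."""
    if 2.45 <= petal_length < 4.45:
        return "Iris-versicolor"
    return None  # rule does not fire


def modified_rule(petal_length: float, petal_width: float) -> Optional[str]:
    """Same rule, EXCEPT if petal-width < 1.0 then Iris-setosa."""
    if 2.45 <= petal_length < 4.45:
        if petal_width < 1.0:  # the exception overrides the rule's conclusion
            return "Iris-setosa"
        return "Iris-versicolor"
    return None


print(original_rule(2.6))       # Iris-versicolor (wrong for the new instance)
print(modified_rule(2.6, 0.2))  # Iris-setosa
```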


    A more complex example

    Exceptions to exceptions to exceptions …

    default: Iris-setosa
    except if petal-length ≥ 2.45 and petal-length < 5.355
              and petal-width < 1.75
           then Iris-versicolor
                except if petal-length ≥ 4.95 and petal-width < 1.55
                       then Iris-virginica
                       else if sepal-length < 4.95 and sepal-width ≥ 2.45
                            then Iris-virginica
           else if petal-length ≥ 3.35
                then Iris-virginica
                     except if petal-length < 4.85 and sepal-length < 5.95
                            then Iris-versicolor
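The nested default/except/else structure maps directly onto nested conditionals. Each "except" is checked inside the branch whose conclusion it overrides. This sketch uses the thresholds and comparison operators as read from the rule set above, and the test flowers are hypothetical.

```python
def classify(sepal_length: float, sepal_width: float,
             petal_length: float, petal_width: float) -> str:
    # default: Iris-setosa, except ...
    if 2.45 <= petal_length < 5.355 and petal_width < 1.75:
        # ... then Iris-versicolor, except ...
        if petal_length >= 4.95 and petal_width < 1.55:
            return "Iris-virginica"
        if sepal_length < 4.95 and sepal_width >= 2.45:  # else if
            return "Iris-virginica"
        return "Iris-versicolor"
    if petal_length >= 3.35:
        # ... else Iris-virginica, except ...
        if petal_length < 4.85 and sepal_length < 5.95:
            return "Iris-versicolor"
        return "Iris-virginica"
    return "Iris-setosa"  # the default


print(classify(5.1, 3.5, 1.4, 0.2))  # Iris-setosa
print(classify(7.0, 3.2, 4.7, 1.4))  # Iris-versicolor
print(classify(6.3, 3.3, 6.0, 2.5))  # Iris-virginica
```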


    Advantages of using exceptions

    Rules can be updated incrementally

    Easy to incorporate new data

    Easy to incorporate domain knowledge

    People often think in terms of exceptions

    Each conclusion can be considered just in the context of rules and exceptions that lead to it

    Locality property is important for understanding large rule sets

    "Normal" rule sets don't offer this advantage


    More on exceptions

    Default...except if...then...

    is logically equivalent to

    if...then...else

    (where the else specifies what the default did)

    But: exceptions offer a psychological advantage

    Assumption: defaults and tests early on apply more widely than exceptions further down

    Exceptions reflect special cases


    Rules involving relations

    So far: all rules involved comparing an attribute-value to a constant (e.g. temperature < 45)

    These rules are called "propositional" because they have the same expressive power as propositional logic

    What if problem involves relationships between examples (e.g. family tree problem from above)?

    Can't be expressed with propositional rules

    More expressive representation required


    The shapes problem

    Target concept: standing up

    Shaded: standing

    Unshaded: lying


    A propositional solution

    If width ≥ 3.5 and height < 7.0 then lying

    If height ≥ 3.5 then standing


    A relational solution

    Comparing attributes with each other

    Generalizes better to new data

    Standard relations: =, <, >

    But: learning relational rules is costly

    Simple solution: add extra attributes

    (e.g. a binary attribute: is width < height?)

    If width > height then lying

    If height > width then standing
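The claim that comparing attributes with each other generalizes better can be seen on a shape whose size falls outside the training range. This sketch assumes the propositional rules use ≥ 3.5 and < 7.0 as thresholds. The test shape, 10 wide and 8 high, is hypothetical: it is lying, but the constant thresholds misclassify it, while the relational rule does not.

```python
def propositional(width: float, height: float) -> str:
    # Attributes compared with constants.
    if width >= 3.5 and height < 7.0:
        return "lying"
    if height >= 3.5:
        return "standing"
    return "unknown"  # no rule fires


def relational(width: float, height: float) -> str:
    # Attributes compared with each other.
    if width > height:
        return "lying"
    if height > width:
        return "standing"
    return "unknown"  # width == height: neither rule fires


print(propositional(10, 8))  # standing (wrong)
print(relational(10, 8))     # lying
```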


    Rules with variables

    Using variables and multiple relations:

    If height_and_width_of(x,h,w) and h > w then standing(x)

    The top of a tower of blocks is standing:

    The whole tower is standing:

    Recursive definition!


    Inductive logic programming

    Recursive definition can be seen as logic program

    Techniques for learning logic programs stem from the area of "inductive logic programming" (ILP)

    But: recursive definitions are hard to learn

    Also: few practical problems require recursion

    Thus: many ILP techniques are restricted to non-recursive definitions to make learning easier


    Instance-based representation

    Simplest form of learning: rote learning

    Training instances are searched for instance that most closely resembles new instance

    The instances themselves represent the knowledge

    Also called instance-based learning

    Similarity function defines what's "learned"

    Instance-based learning is lazy learning

    Methods: nearest-neighbor, k-nearest-neighbor, …
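Lazy learning means that nothing is done at training time beyond storing the instances. All the work happens when a new instance must be classified. This k-nearest-neighbor sketch assumes Euclidean distance as the similarity function and a majority vote among the k closest instances. The slide fixes neither choice, and the data points are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (attribute values, class)


def knn_predict(train: List[Example], query: Sequence[float], k: int = 1) -> str:
    """Majority class among the k stored instances closest to the query."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((5.0, 5.0), "b"), ((5.5, 4.5), "b"), ((3.0, 3.2), "b")]
print(knn_predict(train, (2.6, 2.6), k=1))  # b
print(knn_predict(train, (2.6, 2.6), k=3))  # a
```

The query point's single nearest neighbor is a "b", but two of its three nearest are "a". This shows how the choice of k changes what is effectively learned.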


    Learning prototypes


    Only those instances involved in a decision need to be stored

    Noisy instances should be filtered out

    Idea: only use prototypical examples

    Rectangular generalizations


    Nearest-neighbor rule is used outside rectangles

    Rectangles are rules! (But they can be more conservative than "normal" rules.)

    Nested rectangles are rules with exceptions

    Representing clusters I


    Simple 2-D representation

    Venn diagram

    Overlapping clusters

    Representing clusters II


    Probabilistic assignment:

         1    2    3
    a   0.4  0.1  0.5
    b   0.1  0.8  0.1
    c   0.3  0.3  0.4
    d   0.1  0.1  0.8
    e   0.4  0.2  0.4
    f   0.1  0.4  0.5
    g   0.7  0.2  0.1
    h   0.5  0.4  0.1
    …

    Dendrogram

    NB: dendron is the Greek word for tree
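In a probabilistic assignment, each instance has a membership probability for every cluster, and each row sums to 1. One way to read such a table is to collapse it into an ordinary hard clustering. The sketch below does this by taking the most probable cluster per instance, with ties going to the lower-numbered cluster. That collapsing step is an illustration, not something the slide prescribes. The probabilities are copied from the slide's table.

```python
from typing import Dict, Sequence, Tuple

# Membership probabilities for clusters 1, 2, 3 (rows from the slide).
MEMBERSHIP: Dict[str, Tuple[float, float, float]] = {
    "a": (0.4, 0.1, 0.5), "b": (0.1, 0.8, 0.1), "c": (0.3, 0.3, 0.4),
    "d": (0.1, 0.1, 0.8), "e": (0.4, 0.2, 0.4), "f": (0.1, 0.4, 0.5),
    "g": (0.7, 0.2, 0.1), "h": (0.5, 0.4, 0.1),
}


def hard_assignment(probs: Sequence[float]) -> int:
    """Most probable cluster, numbered from 1; max() keeps the first of any ties."""
    return max(range(len(probs)), key=lambda k: probs[k]) + 1


for name, probs in MEMBERSHIP.items():
    assert abs(sum(probs) - 1.0) < 1e-9  # each row is a probability distribution
    print(name, hard_assignment(probs))
```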

