Data Mining: 4. Methods and Algorithms
Romi Satria Wahono
[email protected]
http://romisatriawahono.net/dm
WA/SMS: +6281586220090
Romi Satria Wahono
• SD Sompok Semarang (1987)
• SMPN 8 Semarang (1990)
• SMA Taruna Nusantara Magelang (1993)
• B.Eng, M.Eng and Ph.D in Software Engineering from Saitama University Japan (1994-2004) and Universiti Teknikal Malaysia Melaka (2014)
• Research Interests: Software Engineering, Machine Learning
• Founder and Coordinator of IlmuKomputer.Com
• Researcher at LIPI (2004-2007)
• Founder and CEO of PT Brainmatics Cipta Informatika
Course Outline
1. Introduction to Data Mining
2. The Data Mining Process
3. Evaluation and Validation in Data Mining
4. Data Mining Methods and Algorithms
5. Data Mining Research
4. Methods and Algorithms
1. Inferring rudimentary rules
2. Statistical modeling
3. Constructing decision trees
4. Constructing rules
5. Association rule learning
6. Linear models
7. Instance-based learning
8. Clustering
Simplicity first
• Simple algorithms often work very well!
• There are many kinds of simple structure, e.g.:
  • One attribute does all the work
  • All attributes contribute equally & independently
  • A weighted linear combination might do
  • Instance-based: use a few prototypes
  • Use simple logical rules
• Success of method depends on the domain
Inferring rudimentary rules
• 1R: learns a 1-level decision tree
  • I.e., rules that all test one particular attribute
• Basic version:
  • One branch for each value
  • Each branch assigns most frequent class
  • Error rate: proportion of instances that don't belong to the majority class of their corresponding branch
  • Choose attribute with lowest error rate (assumes nominal attributes)
Pseudo-code for 1R
• Note: “missing” is treated as a separate attribute value

For each attribute,
    For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate
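To make the pseudo-code concrete, here is a minimal Java sketch of the 1R attribute-selection step for nominal attributes (class and method names are ours, not from any particular library):

```java
import java.util.*;

public class OneR {
    // data[i] = nominal attribute values of instance i; labels[i] = its class.
    // Returns the index of the attribute whose 1-level rules make the fewest errors.
    public static int bestAttribute(String[][] data, String[] labels) {
        int best = -1;
        int bestErrors = Integer.MAX_VALUE;
        for (int a = 0; a < data[0].length; a++) {
            // Count how often each class appears for each value of attribute a.
            Map<String, Map<String, Integer>> counts = new HashMap<>();
            for (int i = 0; i < data.length; i++) {
                counts.computeIfAbsent(data[i][a], k -> new HashMap<>())
                      .merge(labels[i], 1, Integer::sum);
            }
            // Errors = instances that are not in the majority class of their branch.
            int errors = 0;
            for (Map<String, Integer> byClass : counts.values()) {
                int total = 0, majority = 0;
                for (int c : byClass.values()) {
                    total += c;
                    majority = Math.max(majority, c);
                }
                errors += total - majority;
            }
            if (errors < bestErrors) {
                bestErrors = errors;
                best = a;
            }
        }
        return best; // the rules then assign each branch its majority class
    }
}
```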
Evaluating the weather attributes

Attribute    Rules             Errors   Total errors
Outlook      Sunny → No        2/5      4/14
             Overcast → Yes    0/4
             Rainy → Yes       2/5
Temp         Hot → No*         2/4      5/14
             Mild → Yes        2/6
             Cool → Yes        1/4
Humidity     High → No         3/7      4/14
             Normal → Yes      1/7
Windy        False → Yes       2/8      5/14
             True → No*        3/6

(* indicates a tie: both classes are equally frequent, so one is chosen arbitrarily)
Outlook    Temp   Humidity   Windy   Play
Sunny      Hot    High       False   No
Sunny      Hot    High       True    No
Overcast   Hot    High       False   Yes
Rainy      Mild   High       False   Yes
Rainy      Cool   Normal     False   Yes
Rainy      Cool   Normal     True    No
Overcast   Cool   Normal     True    Yes
Sunny      Mild   High       False   No
Sunny      Cool   Normal     False   Yes
Rainy      Mild   Normal     False   Yes
Sunny      Mild   Normal     True    Yes
Overcast   Mild   High       True    Yes
Overcast   Hot    Normal     False   Yes
Rainy      Mild   High       True    No
Dealing with numeric attributes

• Discretize numeric attributes
• Divide each attribute's range into intervals
  • Sort instances according to attribute's values
  • Place breakpoints where class changes (majority class)
  • This minimizes the total error
• Example: temperature from weather data

64   65   68   69   70   71   72   72   75   75   80   81   83   85
Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
Outlook    Temperature   Humidity   Windy   Play
Sunny      85            85         False   No
Sunny      80            90         True    No
Overcast   83            86         False   Yes
Rainy      75            80         False   Yes
…          …             …          …       …
The problem of overfitting

• This procedure is very sensitive to noise
  • One instance with an incorrect class label will probably produce a separate interval
• Also: a time stamp attribute will have zero errors
• Simple solution: enforce a minimum number of instances in the majority class per interval
• Example (with min = 3):

64   65   68   69   70   71   72   72   75   75   80   81   83   85
Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No

64   65   68   69   70   71   72   72   75   75   80   81   83   85
Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No
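As a sketch of this procedure in Java (names are ours; an interval is closed only once its majority class has at least min instances, and the final merging of adjacent intervals with the same majority class is left out):

```java
import java.util.*;

public class Discretize1R {
    // values must be sorted ascending, labels aligned with values.
    // An interval may only be closed once its majority class has at least
    // 'min' instances, and only at a boundary where both class and value change.
    public static List<Double> breakpoints(double[] values, String[] labels, int min) {
        List<Double> cuts = new ArrayList<>();
        Map<String, Integer> interval = new HashMap<>(); // class counts in current interval
        interval.merge(labels[0], 1, Integer::sum);
        for (int i = 1; i < values.length; i++) {
            boolean classChanged = !labels[i].equals(labels[i - 1]);
            boolean valueChanged = values[i] != values[i - 1]; // never split equal values
            int majority = Collections.max(interval.values());
            if (classChanged && valueChanged && majority >= min) {
                cuts.add((values[i - 1] + values[i]) / 2.0); // midpoint breakpoint
                interval.clear();                            // start a new interval
            }
            interval.merge(labels[i], 1, Integer::sum);
        }
        return cuts;
    }

    public static void main(String[] args) {
        double[] t = {64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85};
        String[] c = {"Yes","No","Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes","Yes","No"};
        // Prints [70.5, 77.5]; merging the first two Yes-majority intervals
        // leaves 77.5 as the only boundary, matching the rule set below.
        System.out.println(breakpoints(t, c, 3));
    }
}
```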
With overfitting avoidance

• Resulting rule set:

Attribute     Rules                    Errors   Total errors
Outlook       Sunny → No               2/5      4/14
              Overcast → Yes           0/4
              Rainy → Yes              2/5
Temperature   ≤ 77.5 → Yes             3/10     5/14
              > 77.5 → No*             2/4
Humidity      ≤ 82.5 → Yes             1/7      3/14
              > 82.5 and ≤ 95.5 → No   2/6
              > 95.5 → Yes             0/1
Windy         False → Yes              2/8      5/14
              True → No*               3/6
Discussion of 1R
• 1R was described in a paper by Holte (1993)
  • Contains an experimental evaluation on 16 datasets (using cross-validation so that results were representative of performance on future data)
  • Minimum number of instances was set to 6 after some experimentation
  • 1R's simple rules performed not much worse than much more complex decision trees
• Simplicity first pays off!

Robert C. Holte, “Very Simple Classification Rules Perform Well on Most Commonly Used Datasets”, Computer Science Department, University of Ottawa
Discussion of 1R: Hyperpipes

• Another simple technique: build one rule for each class
  • Each rule is a conjunction of tests, one for each attribute
• For numeric attributes: test checks whether instance's value is inside an interval
  • Interval given by minimum and maximum observed in training data
• For nominal attributes: test checks whether value is one of a subset of attribute values
  • Subset given by all possible values observed in training data
• Class with most matching tests is predicted
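A minimal sketch of the hyperpipes idea, assuming numeric attributes only (an illustration under those assumptions, not Weka's HyperPipes implementation):

```java
import java.util.*;

public class HyperpipesSketch {
    // class label -> per-attribute [min, max] observed in the training data
    private final Map<String, double[][]> pipes = new HashMap<>();

    public void train(double[][] data, String[] labels) {
        for (int i = 0; i < data.length; i++) {
            final int n = data[i].length;
            double[][] pipe = pipes.computeIfAbsent(labels[i], k -> {
                double[][] p = new double[n][2];
                for (double[] r : p) { r[0] = Double.POSITIVE_INFINITY; r[1] = Double.NEGATIVE_INFINITY; }
                return p;
            });
            for (int a = 0; a < n; a++) {
                pipe[a][0] = Math.min(pipe[a][0], data[i][a]); // widen interval minimum
                pipe[a][1] = Math.max(pipe[a][1], data[i][a]); // widen interval maximum
            }
        }
    }

    // Predict the class whose intervals contain the most attribute values.
    public String predict(double[] x) {
        String best = null;
        int bestMatches = -1;
        for (Map.Entry<String, double[][]> e : pipes.entrySet()) {
            int matches = 0;
            for (int a = 0; a < x.length; a++) {
                double[] r = e.getValue()[a];
                if (x[a] >= r[0] && x[a] <= r[1]) matches++;
            }
            if (matches > bestMatches) { bestMatches = matches; best = e.getKey(); }
        }
        return best;
    }
}
```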
Statistical modeling
• “Opposite” of 1R: use all the attributes
• Two assumptions: attributes are
  • equally important
  • statistically independent (given the class value)
    • I.e., knowing the value of one attribute says nothing about the value of another (if the class is known)
• Independence assumption is never correct!
• But … this scheme works well in practice
Probabilities for weather data

Attribute     Value      Yes (count, fraction)   No (count, fraction)
Outlook       Sunny      2    2/9                3    3/5
              Overcast   4    4/9                0    0/5
              Rainy      3    3/9                2    2/5
Temperature   Hot        2    2/9                2    2/5
              Mild       4    4/9                2    2/5
              Cool       3    3/9                1    1/5
Humidity      High       3    3/9                4    4/5
              Normal     6    6/9                1    1/5
Windy         False      6    6/9                2    2/5
              True       3    3/9                3    3/5
Play          (prior)    9    9/14               5    5/14
(Counts and fractions are taken from the 14-instance weather dataset shown earlier.)
A new day:

Outlook   Temp.   Humidity   Windy   Play
Sunny     Cool    High       True    ?
Likelihood of the two classes:

For “yes” = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For “no” = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into a probability by normalization:

P(“yes”) = 0.0053 / (0.0053 + 0.0206) = 0.205
P(“no”) = 0.0206 / (0.0053 + 0.0206) = 0.795
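The same calculation can be reproduced in a few lines of Java (a hard-coded illustration of this one instance, not a general implementation):

```java
public class NaiveBayesDemo {
    public static void main(String[] args) {
        // Likelihoods for Outlook=Sunny, Temp=Cool, Humidity=High, Windy=True,
        // using the fractions from the probability table above:
        double yes = (2.0/9) * (3.0/9) * (3.0/9) * (3.0/9) * (9.0/14); // ~0.0053
        double no  = (3.0/5) * (1.0/5) * (4.0/5) * (3.0/5) * (5.0/14); // ~0.0206
        // Normalize so the two posteriors sum to 1 (Pr[E] cancels out).
        System.out.printf("P(yes) = %.3f%n", yes / (yes + no)); // ~0.205
        System.out.printf("P(no)  = %.3f%n", no / (yes + no));  // ~0.795
    }
}
```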
Bayes’s rule
• Probability of event H given evidence E:

  $\Pr[H \mid E] = \dfrac{\Pr[E \mid H]\,\Pr[H]}{\Pr[E]}$

• A priori probability of H: $\Pr[H]$
  • Probability of event before evidence is seen
• A posteriori probability of H: $\Pr[H \mid E]$
  • Probability of event after evidence is seen

Thomas Bayes. Born: 1702 in London, England. Died: 1761 in Tunbridge Wells, Kent, England.
Naïve Bayes for classification

• Classification learning: what's the probability of the class given an instance?
  • Evidence E = instance
  • Event H = class value for instance
• Naïve assumption: evidence splits into parts (i.e. attributes) that are independent:

  $\Pr[H \mid E] = \dfrac{\Pr[E_1 \mid H]\,\Pr[E_2 \mid H] \cdots \Pr[E_n \mid H]\,\Pr[H]}{\Pr[E]}$
Weather data example

Evidence E (a new day):

Outlook   Temp.   Humidity   Windy   Play
Sunny     Cool    High       True    ?

Probability of class “yes”:

  $\Pr[\text{yes} \mid E] = \Pr[\text{Outlook}=\text{Sunny} \mid \text{yes}] \times \Pr[\text{Temperature}=\text{Cool} \mid \text{yes}] \times \Pr[\text{Humidity}=\text{High} \mid \text{yes}] \times \Pr[\text{Windy}=\text{True} \mid \text{yes}] \times \dfrac{\Pr[\text{yes}]}{\Pr[E]} = \dfrac{\frac{2}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{9}{14}}{\Pr[E]}$
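To connect the counts to the formula, here is a minimal Java sketch of Naïve Bayes training and scoring for nominal attributes (class and method names are ours; no Laplace correction, so unseen value/class combinations get probability zero):

```java
import java.util.*;

public class NaiveBayesNominal {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // key: attrIndex + "=" + value + "|" + class -> count
    private final Map<String, Integer> condCounts = new HashMap<>();
    private int total = 0;

    public void train(String[][] data, String[] labels) {
        for (int i = 0; i < data.length; i++) {
            total++;
            classCounts.merge(labels[i], 1, Integer::sum);
            for (int a = 0; a < data[i].length; a++)
                condCounts.merge(a + "=" + data[i][a] + "|" + labels[i], 1, Integer::sum);
        }
    }

    // Unnormalized score: Pr[E1|H] * ... * Pr[En|H] * Pr[H].
    // Dividing scores by their sum over all classes gives the posteriors,
    // exactly as in the normalization step above (Pr[E] cancels out).
    public double score(String[] instance, String cls) {
        double s = classCounts.getOrDefault(cls, 0) / (double) total;
        for (int a = 0; a < instance.length; a++)
            s *= condCounts.getOrDefault(a + "=" + instance[a] + "|" + cls, 0)
                 / (double) classCounts.getOrDefault(cls, 1);
        return s;
    }
}
```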
Cognitive Assignment II
1. Understand and master one data mining algorithm from the literature
2. Summarize it in detail in slide form, using the following format:
   1. Definition
   2. Steps of the algorithm
   3. Application of the algorithm's steps to the Main Golf or Iris dataset
   4. Java code for the algorithm
3. Present it at the next meeting

Algorithm Choices
1. Neural Network
2. Logistic Regression
3. Support Vector Machine
4. K-Means
5. K-Nearest Neighbor
6. Self-Organizing Map
7. Linear Regression
8. Naïve Bayes
9. FP-Growth
10. C4.5
References
1. Ian H. Witten, Eibe Frank, and Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Elsevier, 2011
2. Daniel T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005
3. Florin Gorunescu, Data Mining: Concepts, Models and Techniques, Springer, 2011
4. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd Edition, Elsevier, 2012
5. Oded Maimon and Lior Rokach, Data Mining and Knowledge Discovery Handbook, 2nd Edition, Springer, 2010
6. Warren Liao and Evangelos Triantaphyllou (eds.), Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications, World Scientific, 2007