Classification and Bayesian Learning
Presented by: Abdu Hassan AL-Gomai
Supervisor: Prof. Dr. Mohamed Batouche
Contents
- Classification vs. Prediction
- Classification—A Two-Step Process
- Supervised vs. Unsupervised Learning
- Major Classification Models
- Evaluating Classification Methods
- Bayesian Classification
Classification vs. Prediction

What is the difference between classification and prediction?
A decision tree is a classification model, applied to existing data. If you apply it to new data, for which the class is unknown, you also get a prediction of the class. [From http://www.kdnuggets.com/faq/classification-vs-prediction.html]

Classification: constructs a model based on the training set and the values (class labels) of a classifying attribute, and uses it to classify new data.

Typical applications:
- Text classification
- Target marketing
- Medical diagnosis
- Treatment effectiveness analysis
Classification—A Two-Step Process

1. Model construction: describing a set of predetermined classes.
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
- The set of tuples used for model construction is the training set.
- The model is represented as classification rules, decision trees, or mathematical formulae.

2. Model usage: classifying future or unknown objects.
- Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model; the accuracy rate is the percentage of test-set samples that are correctly classified by the model.
- The test set is independent of the training set.
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
Classification Process (1): Model Construction

Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

A classification algorithm produces the classifier (model), here a rule:

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction

The classifier is first evaluated on testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

It is then applied to unseen data, e.g. (Jeff, Professor, 4): Tenured?
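The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the rule is the classifier learned in step 1, and the testing data come from the table above.

```python
# Classifier (model) learned in step 1:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def predict(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, actual tenured)
test_set = [
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]

correct = sum(predict(r, y) == t for _, r, y, t in test_set)
accuracy = correct / len(test_set)  # Merlisa is misclassified, so 3/4
print(accuracy)                     # 0.75

# Unseen data: (Jeff, Professor, 4)
print(predict("Professor", 4))      # "yes"
```

Note how the accuracy estimate (75% here) uses only the test set, which is independent of the training set.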
Supervised vs. Unsupervised Learning

Supervised learning (classification):
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations (a teacher presents input-output pairs).
- New data are classified based on the training set.

Unsupervised learning (clustering):
- The class labels of the training data are unknown.
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
Major Classification Models
- Bayesian classification
- Decision tree induction
- Neural networks
- Support vector machines (SVM)
- Classification based on associations
- Other classification methods: k-nearest neighbors (KNN), boosting, bagging, …
Evaluating Classification Methods
- Predictive accuracy
- Speed: time to construct the model; time to use the model
- Robustness: handling noise and missing values
- Scalability: efficiency with respect to large data
- Interpretability: understanding and insight provided by the model
- Goodness of rules: compactness of classification rules
Bayesian Classification

Here we learn Bayesian classification, e.g. how to decide whether a patient is ill or healthy, based on:
- A probabilistic model of the observed data
- Prior knowledge
Classification Problem

Training data: examples of the form (d, h(d)), where d are the data objects to classify (inputs) and h(d) ∈ {1, …, K} is the correct class for d.

Goal: given d_new, provide h(d_new).
Why Bayesian?
- Provides practical learning algorithms, e.g. Naïve Bayes.
- Prior knowledge and observed data can be combined.
- It is a generative (model-based) approach, which offers a useful conceptual framework: any kind of object (e.g. sequences) can be classified, based on a probabilistic model specification.
Bayes' Rule

P(h|d) = P(d|h) P(h) / P(d)

Who is who in Bayes' rule, for data d and hypothesis (model) h:
- P(h): prior belief (probability of hypothesis h before seeing any data)
- P(d|h): likelihood (probability of the data if the hypothesis h is true)
- P(d) = Σ_h P(d|h) P(h): data evidence (marginal probability of the data)
- P(h|d): posterior (probability of hypothesis h after having seen the data d)

Understanding Bayes' rule: rearranging gives

P(h|d) P(d) = P(d|h) P(h) = P(h, d)

i.e. the same joint probability on both sides.
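A numeric sketch of the rule in Python, for the ill/healthy patient example mentioned earlier. The prior and likelihood values below are made-up illustration numbers, not from the slides:

```python
# Hypothetical numbers (illustration only):
p_ill = 0.01                 # P(h): prior probability of being ill
p_pos_given_ill = 0.90       # P(d|h): likelihood of a positive test if ill
p_pos_given_healthy = 0.10   # P(d|not h): false-positive rate

# Evidence P(d): marginal probability of a positive test,
# summing P(d|h) P(h) over both hypotheses
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)

# Posterior P(h|d) by Bayes' rule
p_ill_given_pos = p_pos_given_ill * p_ill / p_pos
print(round(p_ill_given_pos, 3))  # 0.083
```

Even with a fairly accurate test, the posterior stays low because the prior P(ill) is small; this is exactly the prior/likelihood combination the rule formalizes.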
Naïve Bayes Classifier

What can we do if our data d has several attributes a_1, …, a_T?

Naïve Bayes assumption: the attributes that describe data instances are conditionally independent given the classification hypothesis:

P(d|h) = P(a_1, …, a_T | h) = Π_t P(a_t | h)

- It is a simplifying assumption; obviously it may be violated in reality.
- In spite of that, it works well in practice.

The Bayesian classifier that uses the Naïve Bayes assumption and computes the maximum-a-posteriori hypothesis is called the Naïve Bayes classifier. It is one of the most practical learning methods. Successful applications: medical diagnosis, text classification.
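The decision rule amounts to picking the hypothesis h that maximizes P(h) Π_t P(a_t|h). A generic sketch in Python; the function and its argument layout are hypothetical helpers, not code from the slides:

```python
def naive_bayes_classify(attributes, priors, cond):
    """Return the hypothesis h maximizing P(h) * prod_t P(a_t | h).

    priors: dict mapping hypothesis h -> P(h)
    cond:   dict mapping h -> {attribute value -> P(a_t = value | h)}
    """
    def score(h):
        p = priors[h]
        for a in attributes:
            p *= cond[h][a]
        return p
    return max(priors, key=score)

# Toy check with two hypotheses and a single observed attribute value "x":
priors = {"yes": 0.5, "no": 0.5}
cond = {"yes": {"x": 0.9}, "no": {"x": 0.2}}
print(naive_bayes_classify(["x"], priors, cond))  # "yes"
```

The two worked examples on the following slides are instances of exactly this computation, with the probabilities read off frequency tables.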
Naïve Bayesian Classifier: Example 1

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?      ← evidence E

Probability of class "yes":

Pr[yes|E] = Pr[Outlook=Sunny|yes] × Pr[Temperature=Cool|yes] × Pr[Humidity=High|yes] × Pr[Windy=True|yes] × Pr[yes] / Pr[E]
          = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]

The evidence Pr[E] relates to all attributes, without exceptions.
Counts per attribute value and class:

          Outlook          Temperature       Humidity          Windy           Play
          Yes  No          Yes  No           Yes  No           Yes  No         Yes  No
Sunny     2    3    Hot    2    2    High    3    4    False   6    2          9    5
Overcast  4    0    Mild   4    2    Normal  6    1    True    3    3
Rainy     3    2    Cool   3    1

The 14 training examples:

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Relative frequencies:

          Outlook          Temperature        Humidity           Windy            Play
          Yes   No         Yes   No           Yes   No           Yes   No        Yes   No
Sunny     2/9   3/5  Hot   2/9   2/5  High    3/9   4/5  False   6/9   2/5       9/14  5/14
Overcast  4/9   0/5  Mild  4/9   2/5  Normal  6/9   1/5  True    3/9   3/5
Rainy     3/9   2/5  Cool  3/9   1/5
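The fractions in the tables are plain counts over the 14 examples. A short Python sketch that recomputes one of them, Pr[Outlook=Sunny | yes] = 2/9 (the data literal mirrors the Outlook and Play columns above):

```python
# (outlook, play) pairs from the 14 training examples above
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

n_yes = sum(1 for _, play in data if play == "Yes")                   # 9
n_sunny_yes = sum(1 for o, p in data if o == "Sunny" and p == "Yes")  # 2
print(n_sunny_yes, n_yes)  # 2 9, i.e. Pr[Outlook=Sunny | yes] = 2/9
```

Every other entry in the frequency table is obtained the same way, conditioning on Play = Yes or Play = No.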
Compute Prediction for a New Day

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Using the relative frequencies from the previous slide, the likelihoods of the two classes are:
For "yes": 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For "no":  3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into a probability by normalization:
P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no")  = 0.0206 / (0.0053 + 0.0206) = 0.795
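The arithmetic above can be reproduced directly in Python (a sketch; the fractions are taken from the frequency table):

```python
# Likelihoods for the new day (Sunny, Cool, High, True)
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ~0.0053
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ~0.0206

# Normalize so the two class probabilities sum to 1
p_yes = like_yes / (like_yes + like_no)
p_no  = like_no  / (like_yes + like_no)
print(round(p_yes, 3), round(p_no, 3))  # 0.205 0.795
```

The prediction for the new day is therefore "no": don't play.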
Naïve Bayesian Classifier: Example 2

Training dataset:

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'.

Data sample X = (age<=30, income=medium, student=yes, credit_rating=fair)
Compute P(X|Ci) for each class:

P(age="<=30" | buys_computer="yes") = 2/9 = 0.222
P(age="<=30" | buys_computer="no") = 3/5 = 0.6
P(income="medium" | buys_computer="yes") = 4/9 = 0.444
P(income="medium" | buys_computer="no") = 2/5 = 0.4
P(student="yes" | buys_computer="yes") = 6/9 = 0.667
P(student="yes" | buys_computer="no") = 1/5 = 0.2
P(credit_rating="fair" | buys_computer="yes") = 6/9 = 0.667
P(credit_rating="fair" | buys_computer="no") = 2/5 = 0.4

X = (age<=30, income=medium, student=yes, credit_rating=fair)

P(X|Ci):
P(X|buys_computer="yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|buys_computer="no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

With priors P(buys_computer="yes") = 9/14 = 0.643 and P(buys_computer="no") = 5/14 = 0.357:

P(X|Ci) P(Ci):
P(X|buys_computer="yes") P(buys_computer="yes") = 0.028
P(X|buys_computer="no") P(buys_computer="no") = 0.007

X belongs to class buys_computer = "yes".
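The same computation in Python (a sketch using exact fractions; rounding reproduces the slide's numbers):

```python
# Conditional probabilities for X = (age<=30, medium, student=yes, fair)
p_x_yes = (2/9) * (4/9) * (6/9) * (6/9)   # P(X | buys_computer = "yes")
p_x_no  = (3/5) * (2/5) * (1/5) * (2/5)   # P(X | buys_computer = "no")

# Priors from the 14-row training set: 9 "yes", 5 "no"
p_yes, p_no = 9/14, 5/14

score_yes = p_x_yes * p_yes
score_no  = p_x_no * p_no
print(round(p_x_yes, 3), round(p_x_no, 3))      # 0.044 0.019
print(round(score_yes, 3), round(score_no, 3))  # 0.028 0.007
print("yes" if score_yes > score_no else "no")  # X -> buys_computer = "yes"
```

Working with exact fractions and rounding only at the end avoids the small errors that accumulate when the pre-rounded values (0.222, 0.444, …) are multiplied.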
Naïve Bayesian Classifier: Advantages and Disadvantages

Advantages:
- Easy to implement.
- Good results obtained in most of the cases.

Disadvantages:
- The class-conditional independence assumption causes loss of accuracy, because in practice dependencies exist among variables. E.g., in hospitals, a patient's profile (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.) are interdependent, and such dependencies cannot be modeled by the Naïve Bayesian classifier.

How to deal with these dependencies? Bayesian belief networks.
References
- Software: NB for classifying text: http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
- Useful reading for those interested in learning more about NB classification, beyond the scope of this module: http://www-2.cs.cmu.edu/~tom/NewChapters.html
- http://www.cs.unc.edu/Courses/comp790-090 s08/Lecturenotes
- Introduction to Bayesian Learning, School of Computer Science, University of Birmingham, [email protected]