Classification and Bayesian Learning
Presented by: Abdu Hassan AL-Gomai
Supervisor: Prof. Dr. Mohamed Batouche
Contents
- Classification vs. Prediction
- Classification—A Two-Step Process
- Supervised vs. Unsupervised Learning
- Major Classification Models
- Evaluating Classification Methods
- Bayesian Classification
Classification vs. Prediction

What is the difference between classification and prediction?
A decision tree is a classification model, applied to existing data. If you apply it to new data, for which the class is unknown, you also get a prediction of the class. [From http://www.kdnuggets.com/faq/classification-vs-prediction.html]

Classification: constructs a model based on the training set and the values (class labels) of a classifying attribute, and uses it to classify new data.

Typical applications:
- Text classification
- Target marketing
- Medical diagnosis
- Treatment effectiveness analysis
Classification—A Two-Step Process

1. Model construction: describing a set of predetermined classes.
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
- The set of tuples used for model construction is the training set.
- The model is represented as classification rules, decision trees, or mathematical formulae.

2. Model usage: classifying future or unknown objects.
- Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model; the accuracy rate is the percentage of test-set samples that are correctly classified by the model.
- The test set is independent of the training set.
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
Classification Process (1): Model Construction

Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

A classification algorithm produces the classifier (model), here a rule:

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction

The classifier is first evaluated on testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

It is then applied to unseen data, e.g. (Jeff, Professor, 4): Tenured?
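The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the rule is the classifier learned in step 1, and the testing data come from the table above.

```python
# Classifier (model) learned in step 1:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def predict(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, actual tenured)
test_set = [
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]

correct = sum(predict(r, y) == t for _, r, y, t in test_set)
accuracy = correct / len(test_set)  # Merlisa is misclassified, so 3/4
print(accuracy)                     # 0.75

# Unseen data: (Jeff, Professor, 4)
print(predict("Professor", 4))      # "yes"
```

Note how the accuracy estimate (75% here) uses only the test set, which is independent of the training set.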
Supervised vs. Unsupervised Learning

Supervised learning (classification):
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations (a teacher presents input-output pairs).
- New data are classified based on the training set.

Unsupervised learning (clustering):
- The class labels of the training data are unknown.
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
Major Classification Models
- Bayesian classification
- Decision tree induction
- Neural networks
- Support vector machines (SVM)
- Classification based on associations
- Other classification methods: k-nearest neighbors (KNN), boosting, bagging, …
Evaluating Classification Methods
- Predictive accuracy
- Speed: time to construct the model; time to use the model
- Robustness: handling noise and missing values
- Scalability: efficiency with respect to large data
- Interpretability: understanding and insight provided by the model
- Goodness of rules: compactness of classification rules
Bayesian Classification

Here we learn Bayesian classification, e.g. how to decide whether a patient is ill or healthy, based on:
- A probabilistic model of the observed data
- Prior knowledge
Classification Problem

Training data: examples of the form (d, h(d)), where d are the data objects to classify (inputs) and h(d) ∈ {1, …, K} is the correct class for d.

Goal: given d_new, provide h(d_new).
Why Bayesian?
- Provides practical learning algorithms, e.g. Naïve Bayes.
- Prior knowledge and observed data can be combined.
- It is a generative (model-based) approach, which offers a useful conceptual framework: any kind of object (e.g. sequences) can be classified, based on a probabilistic model specification.
Bayes' Rule

P(h|d) = P(d|h) P(h) / P(d)

Who is who in Bayes' rule, for data d and hypothesis (model) h:
- P(h): prior belief (probability of hypothesis h before seeing any data)
- P(d|h): likelihood (probability of the data if the hypothesis h is true)
- P(d) = Σ_h P(d|h) P(h): data evidence (marginal probability of the data)
- P(h|d): posterior (probability of hypothesis h after having seen the data d)

Understanding Bayes' rule: rearranging gives

P(h|d) P(d) = P(d|h) P(h) = P(h, d)

i.e. the same joint probability on both sides.
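A numeric sketch of the rule in Python, for the ill/healthy patient example mentioned earlier. The prior and likelihood values below are made-up illustration numbers, not from the slides:

```python
# Hypothetical numbers (illustration only):
p_ill = 0.01                 # P(h): prior probability of being ill
p_pos_given_ill = 0.90       # P(d|h): likelihood of a positive test if ill
p_pos_given_healthy = 0.10   # P(d|not h): false-positive rate

# Evidence P(d): marginal probability of a positive test,
# summing P(d|h) P(h) over both hypotheses
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)

# Posterior P(h|d) by Bayes' rule
p_ill_given_pos = p_pos_given_ill * p_ill / p_pos
print(round(p_ill_given_pos, 3))  # 0.083
```

Even with a fairly accurate test, the posterior stays low because the prior P(ill) is small; this is exactly the prior/likelihood combination the rule formalizes.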
Naïve Bayes Classifier

What can we do if our data d has several attributes a_1, …, a_T?

Naïve Bayes assumption: the attributes that describe data instances are conditionally independent given the classification hypothesis:

P(d|h) = P(a_1, …, a_T | h) = Π_t P(a_t | h)

- It is a simplifying assumption; obviously it may be violated in reality.
- In spite of that, it works well in practice.

The Bayesian classifier that uses the Naïve Bayes assumption and computes the maximum-a-posteriori hypothesis is called the Naïve Bayes classifier. It is one of the most practical learning methods. Successful applications: medical diagnosis, text classification.
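The decision rule amounts to picking the hypothesis h that maximizes P(h) Π_t P(a_t|h). A generic sketch in Python; the function and its argument layout are hypothetical helpers, not code from the slides:

```python
def naive_bayes_classify(attributes, priors, cond):
    """Return the hypothesis h maximizing P(h) * prod_t P(a_t | h).

    priors: dict mapping hypothesis h -> P(h)
    cond:   dict mapping h -> {attribute value -> P(a_t = value | h)}
    """
    def score(h):
        p = priors[h]
        for a in attributes:
            p *= cond[h][a]
        return p
    return max(priors, key=score)

# Toy check with two hypotheses and a single observed attribute value "x":
priors = {"yes": 0.5, "no": 0.5}
cond = {"yes": {"x": 0.9}, "no": {"x": 0.2}}
print(naive_bayes_classify(["x"], priors, cond))  # "yes"
```

The two worked examples on the following slides are instances of exactly this computation, with the probabilities read off frequency tables.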
Naïve Bayesian Classifier: Example 1

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?      ← evidence E

Probability of class "yes":

Pr[yes|E] = Pr[Outlook=Sunny|yes] × Pr[Temperature=Cool|yes] × Pr[Humidity=High|yes] × Pr[Windy=True|yes] × Pr[yes] / Pr[E]
          = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]

The evidence Pr[E] relates to all attributes, without exceptions.
Counts per attribute value and class:

          Outlook          Temperature       Humidity          Windy           Play
          Yes  No          Yes  No           Yes  No           Yes  No         Yes  No
Sunny     2    3    Hot    2    2    High    3    4    False   6    2          9    5
Overcast  4    0    Mild   4    2    Normal  6    1    True    3    3
Rainy     3    2    Cool   3    1

The 14 training examples:

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Relative frequencies:

          Outlook          Temperature        Humidity           Windy            Play
          Yes   No         Yes   No           Yes   No           Yes   No        Yes   No
Sunny     2/9   3/5  Hot   2/9   2/5  High    3/9   4/5  False   6/9   2/5       9/14  5/14
Overcast  4/9   0/5  Mild  4/9   2/5  Normal  6/9   1/5  True    3/9   3/5
Rainy     3/9   2/5  Cool  3/9   1/5
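The fractions in the tables are plain counts over the 14 examples. A short Python sketch that recomputes one of them, Pr[Outlook=Sunny | yes] = 2/9 (the data literal mirrors the Outlook and Play columns above):

```python
# (outlook, play) pairs from the 14 training examples above
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

n_yes = sum(1 for _, play in data if play == "Yes")                   # 9
n_sunny_yes = sum(1 for o, p in data if o == "Sunny" and p == "Yes")  # 2
print(n_sunny_yes, n_yes)  # 2 9, i.e. Pr[Outlook=Sunny | yes] = 2/9
```

Every other entry in the frequency table is obtained the same way, conditioning on Play = Yes or Play = No.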
Compute Prediction for a New Day

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Using the relative frequencies from the previous slide, the likelihoods of the two classes are:
For "yes": 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For "no":  3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into a probability by normalization:
P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no")  = 0.0206 / (0.0053 + 0.0206) = 0.795
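The arithmetic above can be reproduced directly in Python (a sketch; the fractions are taken from the frequency table):

```python
# Likelihoods for the new day (Sunny, Cool, High, True)
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ~0.0053
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ~0.0206

# Normalize so the two class probabilities sum to 1
p_yes = like_yes / (like_yes + like_no)
p_no  = like_no  / (like_yes + like_no)
print(round(p_yes, 3), round(p_no, 3))  # 0.205 0.795
```

The prediction for the new day is therefore "no": don't play.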
Naïve Bayesian Classifier: Example 2

Training dataset:

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'.

Data sample X = (age<=30, income=medium, student=yes, credit_rating=fair)
Compute P(X|Ci) for each class:

P(age="<=30" | buys_computer="yes") = 2/9 = 0.222
P(age="<=30" | buys_computer="no") = 3/5 = 0.6
P(income="medium" | buys_computer="yes") = 4/9 = 0.444
P(income="medium" | buys_computer="no") = 2/5 = 0.4
P(student="yes" | buys_computer="yes") = 6/9 = 0.667
P(student="yes" | buys_computer="no") = 1/5 = 0.2
P(credit_rating="fair" | buys_computer="yes") = 6/9 = 0.667
P(credit_rating="fair" | buys_computer="no") = 2/5 = 0.4

X = (age<=30, income=medium, student=yes, credit_rating=fair)

P(X|Ci):
P(X|buys_computer="yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|buys_computer="no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

With priors P(buys_computer="yes") = 9/14 = 0.643 and P(buys_computer="no") = 5/14 = 0.357:

P(X|Ci) P(Ci):
P(X|buys_computer="yes") P(buys_computer="yes") = 0.028
P(X|buys_computer="no") P(buys_computer="no") = 0.007

X belongs to class buys_computer = "yes".
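The same computation in Python (a sketch using exact fractions; rounding reproduces the slide's numbers):

```python
# Conditional probabilities for X = (age<=30, medium, student=yes, fair)
p_x_yes = (2/9) * (4/9) * (6/9) * (6/9)   # P(X | buys_computer = "yes")
p_x_no  = (3/5) * (2/5) * (1/5) * (2/5)   # P(X | buys_computer = "no")

# Priors from the 14-row training set: 9 "yes", 5 "no"
p_yes, p_no = 9/14, 5/14

score_yes = p_x_yes * p_yes
score_no  = p_x_no * p_no
print(round(p_x_yes, 3), round(p_x_no, 3))      # 0.044 0.019
print(round(score_yes, 3), round(score_no, 3))  # 0.028 0.007
print("yes" if score_yes > score_no else "no")  # X -> buys_computer = "yes"
```

Working with exact fractions and rounding only at the end avoids the small errors that accumulate when the pre-rounded values (0.222, 0.444, …) are multiplied.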
Naïve Bayesian Classifier: Advantages and Disadvantages

Advantages:
- Easy to implement.
- Good results obtained in most of the cases.

Disadvantages:
- The class-conditional independence assumption causes loss of accuracy, because in practice dependencies exist among variables. E.g., in hospitals, a patient's profile (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.) are interdependent, and such dependencies cannot be modeled by the Naïve Bayesian classifier.

How to deal with these dependencies? Bayesian belief networks.
References
- Software: NB for classifying text: http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
- Useful reading for those interested in learning more about NB classification, beyond the scope of this module: http://www-2.cs.cmu.edu/~tom/NewChapters.html
- http://www.cs.unc.edu/Courses/comp790-090 s08/Lecturenotes
- Introduction to Bayesian Learning, School of Computer Science, University of Birmingham, [email protected]