Final Exam Review
Final Exam: Thursday, May 10
If event E occurs, then the probability that event H will occur is p(H|E):

IF E (evidence) is true
THEN H (hypothesis) is true with probability p
Bayesian reasoning
p(H|E) = p(E|H) × p(H) / [ p(E|H) × p(H) + p(E|¬H) × p(¬H) ]
Bayesian reasoning Example: Cancer and Test

P(C) = 0.01      P(¬C) = 0.99
P(+|C) = 0.9     P(−|C) = 0.1
P(+|¬C) = 0.2    P(−|¬C) = 0.8
P(C|+) = ?
P(C|+) = P(+|C) × P(C) / [ P(+|C) × P(C) + P(+|¬C) × P(¬C) ]
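Plugging the given numbers into Bayes' rule:

P(C|+) = (0.9 × 0.01) / (0.9 × 0.01 + 0.2 × 0.99) = 0.009 / 0.207 ≈ 0.043

So even after a positive test, the probability of cancer is only about 4.3%, because the disease is rare and the test has a sizable false-positive rate.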
Bayesian reasoning with multiple hypotheses and evidences

Expand the Bayes rule to work with multiple hypotheses (H1 ... Hm) and evidences (E1 ... En). Assuming conditional independence among evidences E1 ... En:

p(Hi | E1 E2 ... En) = p(E1|Hi) × p(E2|Hi) × ... × p(En|Hi) × p(Hi) / Σk=1..m [ p(E1|Hk) × p(E2|Hk) × ... × p(En|Hk) × p(Hk) ]
Bayesian reasoning Example

Expert data:

Probability    H1      H2      H3
p(Hi)          0.40    0.35    0.25
p(E1|Hi)       0.3     0.8     0.5
p(E2|Hi)       0.9     0.0     0.7
p(E3|Hi)       0.6     0.7     0.9
Suppose the user observes evidences E3, E1, and E2 (in that order). The expert system then computes the posterior probabilities p(Hi | E1 E2 E3):
p(Hi | E1 E2 E3) = p(E1|Hi) × p(E2|Hi) × p(E3|Hi) × p(Hi) / Σk=1..3 [ p(E1|Hk) × p(E2|Hk) × p(E3|Hk) × p(Hk) ],  i = 1, 2, 3

p(H1 | E1 E2 E3) = (0.3 × 0.9 × 0.6 × 0.40) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0.45

p(H2 | E1 E2 E3) = (0.8 × 0.0 × 0.7 × 0.35) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0

p(H3 | E1 E2 E3) = (0.5 × 0.7 × 0.9 × 0.25) / (0.3 × 0.9 × 0.6 × 0.40 + 0.8 × 0.0 × 0.7 × 0.35 + 0.5 × 0.7 × 0.9 × 0.25) = 0.55
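A minimal Python sketch of this computation, with the expert table hard-coded (variable names are illustrative):

# Priors p(Hi) and conditionals p(Ej|Hi) from the expert table
priors = [0.40, 0.35, 0.25]
likelihoods = [
    [0.3, 0.8, 0.5],   # p(E1|Hi) for i = 1, 2, 3
    [0.9, 0.0, 0.7],   # p(E2|Hi)
    [0.6, 0.7, 0.9],   # p(E3|Hi)
]

# Numerator for each hypothesis: prior times the product of its likelihoods
numerators = []
for i, prior in enumerate(priors):
    product = prior
    for row in likelihoods:
        product *= row[i]
    numerators.append(product)

# Normalize so the posteriors sum to 1
total = sum(numerators)
posteriors = [num / total for num in numerators]
print([round(p, 2) for p in posteriors])   # [0.45, 0.0, 0.55]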
Propagation of CFs

For a single antecedent rule:

cf(H, E) = cf(E) × cf(R)

where cf(E) is the certainty factor of the evidence and cf(R) is the certainty factor of the rule.
Single antecedent rule example

IF patient has toothache THEN problem is cavity {cf 0.3}
Patient has toothache {cf 0.9}
What is cf(cavity, toothache)?
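Applying the single-antecedent formula above:

cf(cavity, toothache) = cf(E) × cf(R) = 0.9 × 0.3 = 0.27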
Propagation of CFs (multiple antecedents)

For conjunctive rules:

IF <evidence E1> AND <evidence E2> ... AND <evidence En> THEN <hypothesis H> {cf}

For two evidences E1 and E2:
cf(E1 AND E2) = min(cf(E1), cf(E2))
Propagation of CFs (multiple antecedents)

For disjunctive rules:

IF <evidence E1> OR <evidence E2> ... OR <evidence En> THEN <hypothesis H> {cf}

For two evidences E1 and E2:
cf(E1 OR E2) = max(cf(E1), cf(E2))
Exercise

IF (P1 AND P2) OR P3 THEN C1 (0.7) AND C2 (0.3)
Assume cf(P1) = 0.6, cf(P2) = 0.4, cf(P3) = 0.2.
What is cf(C1)? What is cf(C2)?
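A worked sketch of this exercise in Python, combining the min/max rules with the single-antecedent formula (treating the rule as two conclusions C1 and C2 with rule CFs 0.7 and 0.3 is an assumption about how the compound conclusion is read):

# Evidence certainty factors given in the exercise
cf_p1, cf_p2, cf_p3 = 0.6, 0.4, 0.2

# AND uses min, OR uses max
cf_antecedent = max(min(cf_p1, cf_p2), cf_p3)   # max(0.4, 0.2) = 0.4

# Each conclusion gets cf(E) x cf(R)
cf_c1 = cf_antecedent * 0.7   # 0.28
cf_c2 = cf_antecedent * 0.3   # 0.12
print(round(cf_c1, 2), round(cf_c2, 2))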
Defining fuzzy sets with fit-vectors

A fuzzy set A can be defined by a fit-vector of membership/element pairs. So, for example:

Tall men = (0/180, 1/190)
Short men = (1/160, 0/170)
Average men = (0/165, 1/175, 0/185)
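Between the listed points, membership is usually taken to vary linearly. A minimal Python sketch under that assumption (the function name is illustrative):

def membership(fit_vector, x):
    # fit_vector: list of (element, membership) pairs sorted by element;
    # values are linearly interpolated between the listed points.
    points = sorted(fit_vector)
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, m0), (x1, m1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

tall_men = [(180, 0.0), (190, 1.0)]    # Tall men = (0/180, 1/190)
print(membership(tall_men, 185))       # 0.5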
What about linguistic values with qualifiers? e.g. very tall, extremely short, etc.

Hedges are qualifying terms that modify the shape of fuzzy sets, e.g. very, somewhat, quite, slightly, extremely, etc.
Qualifiers & Hedges
Representing Hedges

Hedge        Mathematical Expression
A little     [μA(x)]^1.3
Slightly     [μA(x)]^1.7
Very         [μA(x)]^2
Extremely    [μA(x)]^3

(Graphical representations of each hedge omitted.)
Representing Hedges (contd.)

Hedge          Mathematical Expression
Very very      [μA(x)]^4
More or less   √μA(x)
Somewhat       √μA(x)
Indeed         2[μA(x)]^2          if 0 ≤ μA ≤ 0.5
               1 − 2[1 − μA(x)]^2  if 0.5 < μA ≤ 1

(Graphical representations omitted.)
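A small Python sketch of these hedge operators applied to a membership value mu in [0, 1] (function names are illustrative):

import math

def a_little(mu):     return mu ** 1.3
def slightly(mu):     return mu ** 1.7
def very(mu):         return mu ** 2
def extremely(mu):    return mu ** 3
def very_very(mu):    return mu ** 4
def more_or_less(mu): return math.sqrt(mu)

def indeed(mu):
    # Contrast intensification: push memberships away from 0.5
    if mu <= 0.5:
        return 2 * mu ** 2
    return 1 - 2 * (1 - mu) ** 2

print(very(0.5))     # 0.25
print(indeed(0.7))   # ~0.82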
Crisp Set Operations

(Diagrams: complement (NOT A), containment (A ⊆ B), intersection (A ∩ B), and union (A ∪ B) for crisp sets.)
Fuzzy Set Operations

Complement

To what degree do elements not belong to this set?

μ¬A(x) = 1 − μA(x)

tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190};
NOT tall men = {1/180, 0.75/182, 0.5/185, 0.25/187, 0/190};
Containment

Which sets belong to other sets?

tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190};
very tall men = {0/180, 0.06/182, 0.25/185, 0.56/187, 1/190};
Each element of the fuzzy subset has smaller membership than in the containing set: A ⊆ B if μA(x) ≤ μB(x) for all x.
Intersection

To what degree is the element in both sets?
μA∩B(x) = min[ μA(x), μB(x) ]

tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190};
average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190};

tall men ∩ average men = {0/165, 0/175, 0/180, 0.25/182, 0/185, 0/190};

or, listing only the region of interest:

tall men ∩ average men = {0/180, 0.25/182, 0/185};
Union

To what degree is the element in either or both sets?
μA∪B(x) = max[ μA(x), μB(x) ]

tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190};
average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190};

tall men ∪ average men = {0/165, 1/175, 0.5/180, 0.25/182, 0.5/185, 1/190};
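A minimal Python sketch of the three operations over the same discrete universe, representing each fuzzy set as a dict from element to membership (names are illustrative):

tall    = {165: 0.0, 175: 0.0, 180: 0.0, 182: 0.25, 185: 0.5, 190: 1.0}
average = {165: 0.0, 175: 1.0, 180: 0.5, 182: 0.25, 185: 0.0, 190: 0.0}

def complement(a):
    return {x: 1 - m for x, m in a.items()}

def intersection(a, b):
    return {x: min(a[x], b[x]) for x in a}

def union(a, b):
    return {x: max(a[x], b[x]) for x in a}

print(intersection(tall, average))   # 0.25 at 182, 0 elsewhere
print(union(tall, average))          # matches the union fit-vector above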
Choosing the Best Attribute: Binary Classification

We want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum value when it makes no distinction.

Information theory (Shannon and Weaver, 1949)

Entropy: a measure of the uncertainty of a random variable
A coin that always comes up heads --> 0 bits
A flip of a fair coin (heads or tails) --> 1 bit
The roll of a fair four-sided die --> 2 bits

Information gain: the expected reduction in entropy caused by partitioning the examples according to an attribute.
Formula for Entropy

H(p1, ..., pn) = −Σi pi log2 pi

Examples:

Suppose we have a collection of 10 examples, 5 positive, 5 negative:
H(1/2, 1/2) = −1/2 log2(1/2) − 1/2 log2(1/2) = 1 bit

Suppose we have a collection of 100 examples, 1 positive and 99 negative:
H(1/100, 99/100) = −0.01 log2(0.01) − 0.99 log2(0.99) ≈ 0.08 bits
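A quick Python check of both examples:

import math

def entropy(probs):
    # Shannon entropy in bits; the p = 0 terms contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # 1.0 bit
print(entropy([0.01, 0.99]))   # ~0.08 bits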
Information gain

Information gain (from an attribute test) = the difference between the original information requirement and the new requirement after the split.

Information Gain (IG), or reduction in entropy, from testing attribute A:

IG(A) = I(p/(p+n), n/(p+n)) − remainder(A)
remainder(A) = Σv (pv + nv)/(p + n) × I(pv/(pv+nv), nv/(pv+nv))

Choose the attribute with the largest IG.
Information gain

For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit.

Consider the attributes Patrons and Type (and the others too): Patrons has the highest IG of all the attributes, and so it is chosen by the DTL algorithm as the root.
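A sketch of the Patrons computation, assuming the standard restaurant dataset split (Patrons = None: 0 positive / 2 negative, Some: 4/0, Full: 2/4; these counts come from the textbook example and should be checked against the actual training set):

import math

def I(p, n):
    # Binary entropy (in bits) of a collection with p positive, n negative
    total = p + n
    return -sum(q * math.log2(q) for q in (p / total, n / total) if q > 0)

splits = [(0, 2), (4, 0), (2, 4)]   # Patrons = None, Some, Full
remainder = sum((p + n) / 12 * I(p, n) for p, n in splits)
print(round(I(6, 6) - remainder, 3))   # ~0.541 bits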
Example contd.

The decision tree learned from the 12 examples is substantially simpler than the "true" tree that generated the data.
Perceptrons

(Diagram: inputs x1 and x2, with weights w1 and w2, feed a linear combiner that computes X = x1w1 + x2w2; a hard limiter (step function) with threshold θ then produces the output Y = Ystep(X − θ).)
Perceptrons

How does a perceptron learn?

A perceptron starts with initial (often random) weights, typically in the range [−0.5, 0.5].
Apply an established training dataset.
Calculate the error as expected output minus actual output:

error e = Yexpected − Yactual

Adjust the weights to reduce the error.
Perceptrons

How do we adjust a perceptron's weights to produce Yexpected?

If e is positive, we need to increase Yactual (and vice versa).

Use the perceptron learning rule:

wi = wi + Δwi,  where Δwi = α × xi × e

α is the learning rate (between 0 and 1); e is the calculated error.
Perceptron Example – AND

Train a perceptron to recognize logical AND.

Use threshold Θ = 0.2 and learning rate α = 0.1.
Perceptron Example – AND

Repeat the training epochs until convergence, i.e. the final weights do not change and there is no error.

Use threshold Θ = 0.2 and learning rate α = 0.1.
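A minimal sketch of this training loop in Python, using the learning rule above with θ = 0.2 and α = 0.1 (the initial weights here are arbitrary; the slides may start from different values):

# Truth table for logical AND
training_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

theta, alpha = 0.2, 0.1
w = [0.3, -0.1]   # initial weights in [-0.5, 0.5]

epoch, converged = 0, False
while not converged:
    epoch += 1
    converged = True
    for (x1, x2), expected in training_data:
        X = x1 * w[0] + x2 * w[1]           # linear combiner
        actual = 1 if X >= theta else 0     # hard limiter (step function)
        e = expected - actual               # error
        if e != 0:
            converged = False
            w[0] += alpha * x1 * e          # perceptron learning rule
            w[1] += alpha * x2 * e

print(epoch, w)   # with these initial weights: converges in 5 epochs to approximately [0.1, 0.1]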