1 Bits of probability. 2 Why? We need a concept of probability to make judgements about our...

Date post: 22-Dec-2015
Transcript
Page 1: Bits of probability

Page 2: Why?

We need a concept of probability to make judgements about our hypotheses in the scientific method. Is the data consistent with our hypotheses?

Page 3: What is probability?

• Relative frequency: if a process or an experiment is repeated a large number of times, n, and the characteristic E occurs m times, then the relative frequency m/n of E will be approximately equal to the probability of E: P(E) ≈ m / n

• Personal probability: a degree of belief in an event that cannot be repeated, e.g. what is the probability of life on Mars?
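The relative-frequency definition can be tried out with a short simulation. The sketch below is only an illustration: the fair six-sided die, the event "the face is even" and the helper name `relative_frequency` are assumptions, not part of the slides.

```python
import random


def relative_frequency(event, n_trials, seed=0):
    """Estimate P(E) as m/n: m = number of trials in which the event occurs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    m = sum(1 for _ in range(n_trials) if event(rng.randint(1, 6)))
    return m / n_trials


def is_even(face):
    """Hypothetical event E: the die shows an even face (true P(E) = 1/2)."""
    return face % 2 == 0


# The estimate m/n approaches 1/2 as n grows.
for n in (10, 1000, 100000):
    print(n, relative_frequency(is_even, n))
```

With few trials the estimate can be far from 1/2; it stabilises only for large n, which is what the "≈" in P(E) ≈ m / n expresses.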

Page 4: Event space

[Figure: the event space U drawn as a box containing the event A; the rest of U is "Not A".]

P(A) = (Area of A) / (Area of U), which is implicitly P(A|U)

Page 5: Operations on event sets (.OR.)

Union of 2 events: probability(union) = P(E1 or E2) = P(E1 ∪ E2)

[Figure: event space U containing the events E1 and E2, with the union E1 ∪ E2 highlighted.]

Page 6: Operations on event sets (.AND.)

Intersection of 2 events: probability(intersection) = P(E1 and E2) = P(E1 ∩ E2)

[Figure: event space U containing the events E1 and E2, with the intersection E1 ∩ E2 highlighted.]

Page 7: Probability properties

1. 0 ≤ P(Ei) ≤ 1. The probability of Ei is always a number between 0 and 1.

2. Σi P(Ei) = 1. The sum over all the outcomes Ei ∈ U (the event space) is 1.

3. Additivity: P(E1 ∪ E2) = ?

Page 8: Probability properties (continued)

Properties 1 and 2 are as on page 7.

3. Additivity: P(E1 ∪ E2) = P(E1) + P(E2) - P(E1 ∩ E2)

Page 9: 1st experiment = toss 2 dice; result = sum of the outcomes

x = sum of the 2 dice

x     possible outcomes                p(x)
2     1,1                              1/36
3     1,2  2,1                         2/36
4     1,3  3,1  2,2                    3/36
5     1,4  4,1  3,2  2,3               4/36
6     2,4  4,2  1,5  5,1  3,3          5/36
7     6,1  1,6  2,5  5,2  4,3  3,4     6/36
8     6,2  2,6  3,5  5,3  4,4          5/36
9     4,5  5,4  3,6  6,3               4/36
10    4,6  6,4  5,5                    3/36
11    6,5  5,6                         2/36
12    6,6                              1/36

P(sum is even or ≥ 7) = P(sum even) + P(sum ≥ 7) - P(sum = 8, 10, 12) = 18/36 + 21/36 - 9/36 = 30/36
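The additivity computation for the two dice can be checked by enumerating all 36 outcomes. This is a minimal sketch; the helper names (`prob`, `is_even_sum`, `is_at_least_7`) are illustrative and not from the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of tossing two dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event as (favourable outcomes) / 36."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_even_sum(o):
    return (o[0] + o[1]) % 2 == 0


def is_at_least_7(o):
    return o[0] + o[1] >= 7


p_even = prob(is_even_sum)
p_ge7 = prob(is_at_least_7)
p_both = prob(lambda o: is_even_sum(o) and is_at_least_7(o))  # sums 8, 10, 12
p_union = prob(lambda o: is_even_sum(o) or is_at_least_7(o))

# Additivity: P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)
print(p_even, p_ge7, p_both, p_union)
print(p_union == p_even + p_ge7 - p_both)
```

Fractions are printed in lowest terms, so 30/36 appears as 5/6.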

Page 10: Probability additivity

If E1 and E2 are mutually exclusive, then

P(E1 ∪ E2) = P(E1) + P(E2)

For instance

P(sum = 2 or 3) = 1/36 + 2/36 = 3/36

Page 11: 2nd experiment = joint probability of parents-children

Event = a pair of values: one for each variable

                     Parent title
Children title       primary   high school   degree
primary              0.04      0.01          0.00
high school          0.06      0.24          0.05
degree               0.05      0.30          0.25

Marginal probability:
P(Pd) = P(parent title = degree) = ?
P(Cd) = P(child title = degree) = ?

Page 12: 2nd experiment = joint probability of parents-children

Event = a pair of values: one for each variable

                     Parent title
Children title       primary   high school   degree   total
primary              0.04      0.01          0.00     0.05
high school          0.06      0.24          0.05     0.35
degree               0.05      0.30          0.25     0.60
total                0.15      0.55          0.30     1.00

Marginal probability:
P(Pd) = P(parent title = degree) = 0.30
P(Cd) = P(child title = degree) = 0.60

Page 13: 2nd experiment = joint probability of parents-children

(Joint table, totals and marginal probabilities as on page 12.)

Union probabilities:
P(Pd ∪ Cd) = P[(parent = degree) or (child = degree)] = ?
P(Pp ∪ Cp) = P[(parent = primary) or (child = primary)] = ?
P(Pd ∪ Cp) = P[(parent = degree) or (child = primary)] = ?

Page 14: 2nd experiment = joint probability of parents-children

(Joint table, totals and marginal probabilities as on page 12.)

Union probabilities:
P(Pd ∪ Cd) = P[(parent = degree) or (child = degree)] = 0.30 + 0.60 - 0.25 = 0.65
P(Pp ∪ Cp) = P[(parent = primary) or (child = primary)] = 0.15 + 0.05 - 0.04 = 0.16
P(Pd ∪ Cp) = P[(parent = degree) or (child = primary)] = 0.30 + 0.05 - 0.00 = 0.35
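The marginal and union probabilities of the 2nd experiment can be recomputed directly from the joint table. The sketch below assumes a nested-dictionary layout `joint[child][parent]` and helper names chosen for illustration only.

```python
from fractions import Fraction

LEVELS = ("primary", "high school", "degree")

# joint[child][parent] = P(child title, parent title), taken from the table.
joint = {
    "primary":     {"primary": Fraction("0.04"), "high school": Fraction("0.01"), "degree": Fraction("0.00")},
    "high school": {"primary": Fraction("0.06"), "high school": Fraction("0.24"), "degree": Fraction("0.05")},
    "degree":      {"primary": Fraction("0.05"), "high school": Fraction("0.30"), "degree": Fraction("0.25")},
}


def p_parent(level):
    """Marginal of the parent title: sum of one column."""
    return sum(joint[child][level] for child in LEVELS)


def p_child(level):
    """Marginal of the child title: sum of one row."""
    return sum(joint[level][parent] for parent in LEVELS)


def p_union(parent, child):
    """P(parent event or child event) by additivity."""
    return p_parent(parent) + p_child(child) - joint[child][parent]


print(float(p_parent("degree")), float(p_child("degree")))
print(float(p_union("degree", "degree")))
print(float(p_union("primary", "primary")))
print(float(p_union("degree", "primary")))
```

Exact fractions are used so that the sums match the table totals without floating-point rounding.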

Page 15: Conditional probability

P(E1|E2) = P(E1 ∩ E2) / P(E2)

P(E1 ∩ E2) = P(E1|E2) P(E2)

                     Parent level of study
Children level       primary   high school   degree   total
primary              0.04      0.01          0.00     0.05
high school          0.06      0.24          0.05     0.35
degree               0.05      0.30          0.25     0.60
total                0.15      0.55          0.30     1.00

P(Cd | Pp) = P[(child = degree) given (parent = primary)] = ?

P(Cd | Phs) = P[(child = degree) given (parent = high school)] = ?

P(Cd | Pd) = P[(child = degree) given (parent = degree)] = ?

Page 16: Conditional probability

P(E1|E2) = P(E1 ∩ E2) / P(E2)

P(E1 ∩ E2) = P(E1|E2) P(E2)

(Joint table of parent and children level of study as on page 15.)

P(Cd | Pp) = P[(child = degree) given (parent = primary)] = 0.05/0.15 ≈ 0.33

P(Cd | Phs) = P[(child = degree) given (parent = high school)] = 0.30/0.55 ≈ 0.55

P(Cd | Pd) = P[(child = degree) given (parent = degree)] = 0.25/0.30 ≈ 0.83
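The three conditional probabilities follow from dividing a joint entry by the marginal of the conditioning event. A minimal sketch, with a dictionary keyed by (child, parent) and function names that are assumptions made for this example:

```python
from fractions import Fraction

# P(child level, parent level) for child = degree, taken from the joint table.
joint_child_degree = {
    "primary": Fraction("0.05"),
    "high school": Fraction("0.30"),
    "degree": Fraction("0.25"),
}

# Marginal P(parent level): the "total" row of the table.
p_parent = {
    "primary": Fraction("0.15"),
    "high school": Fraction("0.55"),
    "degree": Fraction("0.30"),
}


def p_child_degree_given(parent_level):
    """P(Cd | parent level) = P(Cd and parent level) / P(parent level)."""
    return joint_child_degree[parent_level] / p_parent[parent_level]


for level in p_parent:
    value = p_child_degree_given(level)
    print(f"P(Cd | parent = {level}) = {value} = {float(value):.3f}")
```

The exact values are 1/3, 6/11 and 5/6, so the chance of a child holding a degree grows with the parent's level of study.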

Page 17: Conditional probability and independent events

P(E1|E2) = P(E1 ∩ E2) / P(E2)

P(E1 ∩ E2) = P(E1|E2) P(E2)

“Conditioning on an event” implies that the new total event space is reduced to that event. This is why we divide by its probability.

Independent events: 2 outcomes E1 and E2 are independent when

P(E1|E2) = P(E1) and P(E2|E1) = P(E2)

both hold.

Page 18: Conditional probability (continued)

“Conditioning on an event” implies that the new total event space is reduced to that event.

Page 19: Independent events? Two dice case

P(first is even) = 18/36      =?   P(first is even | dice are equal) = 3/6         True
P(dice are equal) = 6/36      =?   P(dice are equal | first is even) = 3/18        True
P(sum is even) = 18/36        =?   P(sum is even | dice are different) = 12/30     False
P(sum = 10) = 3/36            =?   P(sum = 10 | dice are equal) = 1/6              False
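The four comparisons in the two dice case can be verified by enumeration: an event is independent of the condition exactly when its probability is unchanged by conditioning. The helper names below are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda o: True):
    """P(event | given): count inside the reduced event space."""
    reduced = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))


def first_even(o):
    return o[0] % 2 == 0


def equal(o):
    return o[0] == o[1]


def different(o):
    return o[0] != o[1]


def sum_even(o):
    return (o[0] + o[1]) % 2 == 0


def sum_10(o):
    return o[0] + o[1] == 10


pairs = [
    ("first even vs equal dice", first_even, equal),
    ("equal dice vs first even", equal, first_even),
    ("sum even vs different dice", sum_even, different),
    ("sum = 10 vs equal dice", sum_10, equal),
]
for name, event, condition in pairs:
    unconditional = prob(event)
    conditional = prob(event, given=condition)
    print(name, unconditional, conditional, unconditional == conditional)
```

The last column prints True, True, False, False, matching the table.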

Page 20: Computing the joint probability

Hint: assuming E2 is a certain event, we can compute P(E1|E2).

Then we can relax this assumption by multiplying the result by P(E2).

The product is the joint probability (intersection) of the 2 events:

P(E1 ∩ E2) = P(E1|E2) P(E2)

If E1 and E2 are independent, then P(E1|E2) = P(E1), and this implies

P(E1 ∩ E2) = P(E1) P(E2)

Note (for events with non-zero probability):
1) 2 mutually exclusive events cannot be independent
2) 2 independent events are not mutually exclusive

Page 21: Partition

If U = ∪i Bi and Bi ∩ Bj = ∅ for all i ≠ j, then {Bi} is a partition of U.

[Figure: the event space U divided into the disjoint regions B1, B2, B3, B4, B5, B6.]

Page 22: Partition

If {Bi} is a partition of U, then

P(A) = Σi P(A,Bi) = Σi P(A|Bi) P(Bi)

[Figure: the event space U divided into B1, ..., B6, with the event A overlapping several of the Bi.]
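The sum over a partition can be illustrated with the two dice again. In this sketch the partition and the event are assumptions chosen for the example: Bk = "the first die shows k" (k = 1..6) and A = "the sum is 8".

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda o: True):
    """P(event | given) by counting inside the reduced event space."""
    reduced = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))


def sum_is_8(o):
    """Hypothetical event A."""
    return o[0] + o[1] == 8


total = Fraction(0)
for k in range(1, 7):
    def first_is_k(o, k=k):
        """Partition element Bk: the first die shows k."""
        return o[0] == k

    # Accumulate P(A | Bk) * P(Bk) over the partition.
    total += prob(sum_is_8, given=first_is_k) * prob(first_is_k)

print(total, prob(sum_is_8))
```

Both numbers are 5/36: B1 contributes nothing (no second die gives a sum of 8), and each of the other five Bk contributes 1/6 × 1/6.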

Page 23: Bayes' Rule

• Suppose that B1, B2, … Bk form a partition of S:

Bi ∩ Bj = ∅ for i ≠ j;  ∪i Bi = S

Suppose that Pr(Bi) > 0 and Pr(A) > 0. Then

Pr(Bi | A) = Pr(Bi ∩ A) / Pr(A) = Pr(A | Bi) Pr(Bi) / Pr(A)

[Figure: the space S divided into B1, ..., B6, with the event A overlapping several of the Bi.]

Page 24: Bayes Theorem

P(X,Y) = P(X | Y) P(Y) = P(Y | X) P(X)   (joint probability)

So:

P(Y | X) = P(X | Y) P(Y) / P(X)

P(M | s) = P(s | M) P(M) / P(s)

P(M) and P(s) are the a priori probabilities.

[Figure: two diagrams linking the evidence s and the conclusion M, one labelled P(M | s) and one labelled P(s | M).]

Page 25: Bayes' rule: Example

• A rare disease affects 1 out of 100,000 people.

• A test shows positive
  with probability 0.99 when applied to an ill person, and
  with probability 0.01 when applied to a healthy person.

• You test positive. ARE YOU ILL?

Page 26: Bayes' rule: Example

P(+|ill) = 0.99    P(+|healthy) = 0.01    P(ill) = 10^-5

P(ill|+) = P(+|ill) P(ill) / [P(+|ill) P(ill) + P(+|healthy) P(healthy)]

         = 0.99 × 10^-5 / [0.99 × 10^-5 + 0.01 × (1 - 10^-5)] ≈ 9.89 × 10^-4

Happy end: more likely the test is incorrect!!
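The numbers of the disease example can be reproduced in a few lines. This is a minimal sketch; the function name `posterior` and its parameter names are assumptions made for the illustration.

```python
def posterior(prior, p_pos_given_ill, p_pos_given_healthy):
    """P(ill | +) from Bayes' rule, with {ill, healthy} as the partition."""
    evidence = p_pos_given_ill * prior + p_pos_given_healthy * (1.0 - prior)
    return p_pos_given_ill * prior / evidence


p_ill_given_pos = posterior(prior=1e-5, p_pos_given_ill=0.99, p_pos_given_healthy=0.01)
print(f"P(ill | +) = {p_ill_given_pos:.3e}")
```

Even with a positive result, the probability of being ill is below 0.1%, because healthy people are so much more numerous that false positives dominate.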

Page 27: Is the pope an alien?

Since the probability

P(Pope|Human) = 1/(6,000,000,000)

does this imply that the Pope is not a human being?

Beck-Bornholdt HP, Dubben HH, Nature 381, 730 (1996)

THAT IS:

if being Pope given Human is RARE, is being Human given Pope RARE?

(Human ~ Not Pope) ? (Pope ~ Not Human)

Page 28: The pope is (probably) not an alien

S Eddy and D MacKay’s answer:

P(Pope|Human) is not the same as P(Human|Pope)

P(Human|Pope) = P(Pope|Human) P(Human) / [P(Pope|Human) P(Human) + P(Pope|Alien) P(Alien)]

but P(Alien) ~ 0

So

P(Human|Pope) ~ 1.0

Page 29: More examples of fallacious inference

Since most sports accidents occur when playing soccer, Stern ran the headline “SOCCER IS THE MOST DANGEROUS SPORT” (without considering that soccer is probably the most common sport).

Since a third of all fatal accidents in Germany occur in private homes, Die Welt ran the headline “PRIVATE HOMES AS DANGER SPOTS” (without considering that home is the place where people spend most of their time).

Since most of the cars entering one-way streets in the wrong direction are driven by women, Bild ran the headline “WOMEN MORE DISORIENTED DRIVERS” (without considering whether the samples of men and women drivers had the same size).

From: Kramer W, Gigerenzer G, Statistical Science 20:223-230 (2005)

Page 30: 33 Pirates (Zecchino d’Oro)

• 11 pirates wear a patch over one eye (sight problem)
• 11 pirates are lame in one leg (leg problem)
• 11 pirates cannot hear the trumpet (hearing problem)

What is the probability of:
• Having all three injuries
• Having 2 injuries
• Having 1 injury
• No injury

Page 31: 33 Pirates (Zecchino d’Oro)

Suppose that the problems are independent, with P(I_i) = 1/3 (probability of injury i) and P(NI_i) = 2/3 (probability of not having injury i).

• Having all three injuries
  P(S)*P(L)*P(H) = 1/3*1/3*1/3 = 1/27

• Having 2 injuries
  P(i,j,not k) + P(j,k,not i) + P(k,i,not j) = 3*1/3*1/3*2/3 = 6/27

• Having 1 injury
  P(i,not (j,k)) + P(j,not (i,k)) + P(k,not (i,j)) = 3*1/3*2/3*2/3 = 12/27

• No injury
  P(not S)*P(not L)*P(not H) = 2/3*2/3*2/3 = 8/27

Check: 1/27 + 6/27 + 12/27 + 8/27 = 1
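The four pirate probabilities can be checked by enumerating the 8 combinations of injured/not injured. The sketch assumes, as the slide does, three independent injuries each with probability 1/3; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

P_INJURY = Fraction(1, 3)  # assumed probability of each single injury

# distribution[k] = probability of having exactly k of the three injuries
distribution = {k: Fraction(0) for k in range(4)}

for combo in product((True, False), repeat=3):  # (sight, leg, hearing)
    p = Fraction(1)
    for injured in combo:
        p *= P_INJURY if injured else 1 - P_INJURY
    distribution[sum(combo)] += p

for k in (3, 2, 1, 0):
    print(f"exactly {k} injuries: {distribution[k]}")
```

Fractions are printed in lowest terms (6/27 appears as 2/9 and 12/27 as 4/9), and the four values sum to 1.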

Page 32: Game: 1 car and 2 sheep

From “The Curious Incident of the Dog in the Night-Time” by Mark Haddon

• Two sheep and a car are hidden behind three different doors

Page 33: Game: 1 car and 2 sheep

[Figure: three doors numbered 1, 2, 3.]

• The game: you select one door (e.g. 1)
• Of the remaining two doors, one hiding a sheep is shown to you (e.g. 2)
• You may change your door (selecting 3) or you can keep your first choice (1)

Page 34: Game: 1 car and 2 sheep

[Figure: three doors numbered 1, 2, 3.]

Question: are the 2 choices
1. Equivalent
2. Better to change
3. Better to keep the first choice

Page 35: Game: 1 car and 2 sheep

Suppose you select x (y and z are the alternatives).
P(x) = P(y) = P(z) = 1/3 is the probability that the car is behind each door.
P(Sz) = probability of showing z

Probability of winning with the first choice:

P(first) = 1/3

Probability of winning with the second choice (changing door):

P(second) = P(y,Sz) + P(z,Sy) = P(Sz|y)P(y) + P(Sy|z)P(z) = 1*1/3 + 1*1/3 = 2/3

Page 36: Game: 1 car and 2 sheep

Write a program to test it:

firstOK = 0
secondOK = 0
for i = 1 to MaxIter
    doors = [0,0,0]
    put a 1 at a random position in doors
    first = one door selected at random
    shown = a position in doors != first which is 0
    second = the remaining position != first and != shown
    if doors[first] == 1 then
        firstOK = firstOK + 1
    else if doors[second] == 1 then
        secondOK = secondOK + 1
    end if
end for
probfirst = firstOK / MaxIter
probsecond = secondOK / MaxIter
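One possible runnable version of the pseudocode, in Python. It follows the same steps; the function name `simulate`, the seeded random generator and the choice of 100000 iterations are assumptions made for this sketch.

```python
import random


def simulate(max_iter, seed=0):
    """Estimate the winning probability of keeping vs. changing the door."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    first_ok = 0
    second_ok = 0
    for _ in range(max_iter):
        doors = [0, 0, 0]
        doors[rng.randrange(3)] = 1  # put the car (1) behind a random door
        first = rng.randrange(3)     # the player's first choice
        # The host shows a door that is not the first choice and hides a sheep.
        shown = rng.choice([d for d in range(3) if d != first and doors[d] == 0])
        # The second choice is the only door left.
        second = next(d for d in range(3) if d != first and d != shown)
        if doors[first] == 1:
            first_ok += 1
        elif doors[second] == 1:
            second_ok += 1
    return first_ok / max_iter, second_ok / max_iter


prob_first, prob_second = simulate(100000)
print(f"keep the first choice: {prob_first:.3f}")
print(f"change door:           {prob_second:.3f}")
```

The estimates should come out close to 1/3 and 2/3, in agreement with the computation of P(first) and P(second).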

Page 37: Some useful measures: odds ratio and log-odds score

A measure of the relative influence of A and B is

odd(A,B) = P(A,B) / (P(A) P(B))

If A and B are independent, odd(A,B) ~ 1, i.e. log(odd(A,B)) ~ 0.

Alternatively, log(odd(A,B)) >> 0 or << 0 indicates strong correlation.

Ex: Substitution matrices
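As a small illustration of the measure, the sketch below applies it to two entries of the parent/child joint table from the 2nd experiment. The function names `odd` and `log_odd` mirror the slide's notation; the use of the natural logarithm is an assumption, since the slide does not fix the base.

```python
import math


def odd(p_ab, p_a, p_b):
    """odd(A,B) = P(A,B) / (P(A) P(B)); equals 1 for independent events."""
    return p_ab / (p_a * p_b)


def log_odd(p_ab, p_a, p_b):
    """Log-odds score (natural logarithm assumed here)."""
    return math.log(odd(p_ab, p_a, p_b))


# A = child has a degree (0.60), B = parent has a degree (0.30), P(A,B) = 0.25
print(odd(0.25, 0.60, 0.30), log_odd(0.25, 0.60, 0.30))

# A = child has a degree (0.60), B = parent has primary title (0.15), P(A,B) = 0.05
print(odd(0.05, 0.60, 0.15), log_odd(0.05, 0.60, 0.15))
```

The first pair has odd > 1 (positive log-odds, the events occur together more often than by chance), the second has odd < 1 (negative log-odds).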

Page 38: Probabilistic training of a parametric method

Generally speaking, a parametric model M aims to reproduce a set of known data.

[Figure: the model M with parameters T produces the modelled data, which is set against the real data (D).]

How to compare them?

Page 39: Maximum likelihood

D = data, M = model, T = model parameters

T* = argmax_T P(D|T,M) = argmax_T log(P(D|T,M))

The logarithm is monotonic, so both expressions are maximized by the same T.

Page 40: Example (coin-tossing)

Given N tosses of a coin (our data D), the outcomes are h heads and t tails (N = t + h).

ASSUME the model

P(D|M) = p^h (1 - p)^t

To maximize the likelihood P(D|M), set its derivative with respect to p to zero:

d P(D|M) / d p = p^(h-1) (1 - p)^(t-1) (h(1 - p) - t p) = 0

We obtain that our estimate of p is

p = h / (h+t) = h / N
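The closed-form estimate can be compared with a brute-force maximization of the log-likelihood. This is only a sketch: the counts h = 7 and t = 3, the grid resolution and the function names are assumptions made for the example.

```python
import math


def log_likelihood(p, h, t):
    """log P(D|M) = h log(p) + t log(1 - p) for h heads and t tails."""
    return h * math.log(p) + t * math.log(1.0 - p)


def grid_mle(h, t, steps=10000):
    """Maximize the log-likelihood over a regular grid of p in (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, h, t))


h, t = 7, 3  # hypothetical data: 7 heads and 3 tails
p_grid = grid_mle(h, t)
p_closed = h / (h + t)
print(p_grid, p_closed)
```

The grid search and the formula p = h / N agree up to the grid resolution.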

Page 41: Example (Error measure)

Suppose you think that your data are affected by a Gaussian error, so that they are distributed according to

F(xi) = A * exp[-(xi - μ)^2 / (2σ^2)]

with A = 1/sqrt(2πσ^2).

If your measures are independent, the data likelihood is

P(Data|model) = Πi F(xi)

Find μ and σ that maximize P(Data|model).
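For reference, setting the derivatives of the log-likelihood to zero gives the sample mean for μ and the root of the mean squared deviation (dividing by N) for σ. The sketch below computes these and checks numerically that nearby parameter values give a lower likelihood; the data values and function names are hypothetical.

```python
import math


def log_likelihood(xs, mu, sigma):
    """log of the product of Gaussian densities F(xi) over independent data."""
    norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(norm - (x - mu) ** 2 / (2.0 * sigma ** 2) for x in xs)


def gaussian_mle(xs):
    """Closed-form maximum likelihood estimates of mu and sigma."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # divides by n, not n - 1
    return mu, sigma


data = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical measurements
mu_hat, sigma_hat = gaussian_mle(data)
best = log_likelihood(data, mu_hat, sigma_hat)
print(mu_hat, sigma_hat, best)

# Perturbing either parameter lowers the log-likelihood.
for d_mu, d_sigma in ((0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)):
    print(log_likelihood(data, mu_hat + d_mu, sigma_hat + d_sigma) < best)
```

Note that the maximum likelihood σ divides by N; the familiar unbiased estimator divides by N - 1 instead.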

