Stat 491 Chapter 3--Probability


Probability and Inference

Definitions and Properties

Event Relations

Laws of Probability

Conditional Probability

Bayes Rule and Screening Tests

Stat 491: Biostatistics

Chapter 3: Probability

Solomon W. Harrar

The University of Montana

Fall 2012





Example: Verifying a Claim

A manufacturer of a pregnancy test kit claims that the accuracy (true positive rate) of its kit is over 75%.

We conducted a clinical trial and, out of 100 pregnant women, 77 tested positive.

Can we have faith in the sample evidence?

There is enough evidence to back the claim if

Probability(Evidence given that the Claim is FALSE) = Small,

say less than 0.05.

This is one instance where probability and probability models come in handy.
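One way to make "Probability(Evidence given that the Claim is FALSE)" concrete is sketched below. It rests on assumptions that are not stated in the slides: the 100 women are treated as independent trials, the number of positives is modeled as binomial, and "the claim is false" is taken at its boundary value, a true positive rate of exactly 0.75. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Assumption: boundary case of "claim is false", i.e. true positive rate = 0.75.
p_value = binomial_upper_tail(n=100, k=77, p=0.75)
print(f"P(at least 77 positives out of 100 | rate = 0.75) = {p_value:.3f}")
print("Below the 0.05 threshold?", p_value < 0.05)
```

The printed probability can be compared directly with the 0.05 threshold mentioned above.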







Definitions

Sample Space: the set of all possible outcomes of a trial.

Event (A,B,C,. . .): any set of outcomes of interest.

The probability of an event is the relative frequency of occurrence of this set of outcomes over an indefinitely large (or infinite) number of trials.

That is,

$P(E) = \frac{n_E}{n}$

where $n_E$ is the number of outcomes in favor of E in n (large) repetitions of the trial.
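The relative-frequency definition can be illustrated with a small simulation. This is only a sketch: the trial (rolling a fair die) and the event (an even face, true probability 0.5) are chosen here for illustration and do not come from the slides.

```python
import random


def relative_frequency(n_trials: int, seed: int = 0) -> float:
    """Fraction of n_trials simulated fair-die rolls that show an even face."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n_event = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
    return n_event / n_trials


# n_E / n should settle near 0.5 as the number of trials grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```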







Interpretation of Probability

Suppose it is known that

The probability of developing breast cancer over 30 years in 40-year-old women who have never had cancer is 1/11.

This probability means

over a large sample of 40-year-old women who have never had breast cancer, approximately 1 in 11 will develop the disease by age 70.

This proportion becomes increasingly close to 1 in 11 as the number of women sampled increases.






Properties of Probability

Prop. 1 For any event A

$0 \le P(A) \le 1$.

Prop. 2 If A and B are two events that cannot happen at the same time, then

P(A or B occurs) = P(A) + P(B).

Definition Two events A and B are said to be mutually exclusive if they cannot happen at the same time.






Example: Properties of Probability

Let X be the diastolic blood pressure of a person. Let $A = \{X < 90\}$ and $B = \{90 \le X < 95\}$. Suppose P(A) = 0.7 and P(B) = 0.1. Let $C = \{X < 95\}$. Since A and B cannot happen at the same time,

P(C) = P(A or B) = P(A) + P(B) = 0.7 + 0.1 = 0.8.






The Complement of an Event

The complement of an event A, written as $\bar{A}$, is defined as

$\bar{A}$ = the event that A does not occur

= the collection of all outcomes in the sample space that are not in A

$P(\bar{A}) = 1 - P(A)$.

Using a Venn diagram.

Example: For the diastolic blood pressure example where $A = \{X < 90\}$,

$P(\bar{A}) = 1 - P(A) = 1 - 0.7 = 0.3$.








The Union of Events

The union of events A and B, written as $A \cup B$, is defined as

$A \cup B$ = the event that either A or B or both occur

= the collection of all outcomes in the sample space that are either in A or B or both

Using a Venn diagram.

Example: For the hypertension example, let $A = \{X \ge 90\}$ and $B = \{75 \le X \le 100\}$. Then

$A \cup B = \{X \ge 75\}$.








The Intersection of Events

The intersection of events A and B, written as $A \cap B$, is defined as

$A \cap B$ = the event that both A and B occur

= the collection of all outcomes in the sample space that are both in A and B

Using a Venn diagram.

For the hypertension example, let $A = \{X \ge 90\}$ and $B = \{75 \le X \le 100\}$. Then

$A \cap B = \{90 \le X \le 100\}$.

If A and B are mutually exclusive, then $A \cap B = \emptyset$ and $P(A \cap B) = 0$.








The Intersection of Events Contd ...

Two events A and B are said to be independent if

$P(A \cap B) = P(A) \times P(B)$.

Otherwise, A and B are said to be dependent.

Example: Let A = {Wife's DBP > 95}, B = {Husband's DBP > 95} and C = {first-born child's DBP > 80}, where DBP stands for diastolic blood pressure. Assume that P(A) = 0.1 and P(B) = 0.2. What can we say about $P(A \cap B)$?

If we are willing to assume that the wife's hypertensive status does not depend on the husband's hypertensive status, then

$P(A \cap B) = 0.1 \times 0.2 = 0.02$.

If P(C) = 0.2 and $P(B \cap C) = 0.05$, are B and C independent? Is the result unexpected? Explain.






The Intersection of Events: Example

Suppose two doctors, A and B, test all patients coming into a clinic for syphilis.

$A^+$ = {Dr. A makes a positive diagnosis}

$B^+$ = {Dr. B makes a positive diagnosis}

Suppose $P(A^+) = 0.1$, $P(B^+) = 0.17$ and $P(A^+ \cap B^+) = 0.08$. Are the events $A^+$ and $B^+$ independent?

Now

$P(A^+ \cap B^+) = 0.08 > P(A^+) \times P(B^+) = 0.017$.

Thus the two events are dependent.

This result is NOT unexpected. Why?
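The independence check used in this example amounts to comparing the joint probability with the product of the marginals. A minimal sketch, with a hypothetical helper name and a floating-point tolerance that is an assumption of this illustration:

```python
from math import isclose


def are_independent(p_a: float, p_b: float, p_a_and_b: float, tol: float = 1e-9) -> bool:
    """True if P(A and B) equals P(A) * P(B) up to a small numerical tolerance."""
    return isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


# Two-doctor example: P(A+) = 0.1, P(B+) = 0.17, P(A+ and B+) = 0.08.
print("Product of marginals:", round(0.10 * 0.17, 3))
print("Independent?", are_independent(0.10, 0.17, 0.08))
```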







Multiplication and Addition Laws

Multiplication Law: For mutually independent events $A_1, A_2, \ldots, A_m$,

$P(A_1 \cap A_2 \cap \cdots \cap A_m) = P(A_1) \times P(A_2) \times \cdots \times P(A_m)$.

Addition Law: For any two events A and B,

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Use a Venn diagram.

For the STD example, suppose a patient will be referred for further lab tests if at least one of the doctors makes a positive diagnosis. Then

$P(A^+ \cup B^+) = P(A^+) + P(B^+) - P(A^+ \cap B^+) = 0.1 + 0.17 - 0.08 = 0.19$.

Thus 19% of all patients will be referred for further lab tests.




Addition Law for Independent Events

If A and B are independent events, then

$P(A \cup B) = P(A) + P(B) \times [1 - P(A)]$.

This follows from the addition law by substituting $P(A \cap B) = P(A) \times P(B)$.

Example: Let A = {Wife's DBP > 95} and B = {Husband's DBP > 95}, where DBP stands for diastolic blood pressure. Assume that P(A) = 0.1 and P(B) = 0.2. Then the probability of a hypertensive household is

$P(A \cup B) = P(A) + P(B) \times [1 - P(A)] = 0.1 + 0.2 \times [1 - 0.1] = 0.28$.

Therefore, 28% of all households will be hypertensive.
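Both forms of the addition law can be checked numerically against the two worked examples (the doctors' referrals and the hypertensive household). The function names below are hypothetical and the code is only a sketch of the arithmetic.

```python
def prob_union(p_a: float, p_b: float, p_a_and_b: float) -> float:
    """General addition law: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


def prob_union_independent(p_a: float, p_b: float) -> float:
    """Addition law when A and B are independent: P(A) + P(B) * (1 - P(A))."""
    return p_a + p_b * (1 - p_a)


# Doctors example (dependent events): expect about 0.19.
print(round(prob_union(0.10, 0.17, 0.08), 4))
# Household example (independence assumed): expect about 0.28.
print(round(prob_union_independent(0.1, 0.2), 4))
```

When independence holds, the two functions agree, since the joint probability is then the product of the marginals.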






Addition Law for More Than Two Events

For any events A, B and C,

$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$

The addition law also generalizes to an arbitrary number of events, although that is beyond the scope of this course.






Conditional Probability

Let A and B be two events with P(B) > 0. Then the conditional probability of event A given event B, written as P(A|B), is defined as

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

and the conditional probability of event B given event A is defined in a similar way, assuming that P(A) > 0.

Using a Venn diagram.

If A and B are independent, then

$P(B|A) = P(B) = P(B|\bar{A})$.
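As a sketch of the definition, the code below reuses the figures from the two-doctor example earlier in the chapter (P(A+) = 0.1, P(B+) = 0.17, P(A+ and B+) = 0.08). The helper name is hypothetical. The two conditional probabilities differ, which is another way of seeing that the diagnoses are dependent.

```python
def conditional(p_joint: float, p_given: float) -> float:
    """P(X | Y) = P(X and Y) / P(Y); requires P(Y) > 0."""
    if p_given <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_given


p_a, p_b, p_ab = 0.10, 0.17, 0.08

# P(B+ | A+): Dr. B positive given Dr. A positive.
p_b_given_a = conditional(p_ab, p_a)
# P(B+ | not A+): uses P(B+ and not A+) = P(B+) - P(A+ and B+).
p_b_given_not_a = conditional(p_b - p_ab, 1 - p_a)

print(round(p_b_given_a, 4), round(p_b_given_not_a, 4))
```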






Relative Risk

The relative risk (RR) of B given A is

$RR = \frac{P(B|A)}{P(B|\bar{A})}$.

Obviously, $RR \ge 0$.






Relative Risk: Example

Suppose that 1500 smokers in 10,000 develop lung cancer in 20 years and 50 non-smokers in 5000 develop lung cancer in 20 years. What is the relative risk of lung cancer in 20 years given a person smokes?

Let A = {Smoker} and B = {Develop lung cancer}. Then P(B|A) = 1500/10,000 = 0.15 and $P(B|\bar{A})$ = 50/5000 = 0.01, so

$RR = \frac{P(B|A)}{P(B|\bar{A})} = \frac{0.15}{0.01} = 15$.

Therefore, smokers are 15 times more likely to develop lung cancer in 20 years than non-smokers.
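The same calculation can be written directly from the counts. This is a minimal sketch; the function and argument names are hypothetical.

```python
def relative_risk(cases_exposed: int, n_exposed: int,
                  cases_unexposed: int, n_unexposed: int) -> float:
    """RR = P(B | A) / P(B | not A), estimated from group counts."""
    risk_exposed = cases_exposed / n_exposed        # P(B | A)
    risk_unexposed = cases_unexposed / n_unexposed  # P(B | not A)
    return risk_exposed / risk_unexposed


# Smokers: 1500 of 10,000; non-smokers: 50 of 5000.
rr = relative_risk(1500, 10_000, 50, 5_000)
print(round(rr, 2))
```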








Total Probability Rule

Let A and B be any two events.

Clearly, $P(B) = P(B \cap A) + P(B \cap \bar{A})$.

The above relation implies

$P(B) = P(B|A) \times P(A) + P(B|\bar{A}) \times P(\bar{A})$

which is known as the Total Probability Rule.

Generalization of the total probability rule: Let $A_1, A_2, \ldots, A_m$ be a set of mutually exclusive and exhaustive events. Then,

$P(B) = \sum_{i=1}^{m} P(B|A_i) \times P(A_i)$.








Total Probability Rule: Example

Suppose the rate of type II diabetes mellitus (DM) in 40- to 59-year-olds is 7% among Caucasians, 10% among African Americans, 12% among Hispanics and 5% among Asian Americans. Suppose the ethnic distribution in Houston, TX is 30% Caucasian, 25% African American, 40% Hispanic and 5% Asian American. What is the overall probability of type II DM among 40- to 59-year-olds in Houston?








Total Probability Rule: Example contd...

Let $A_1$ = {Caucasian}, $A_2$ = {African American}, $A_3$ = {Hispanic}, $A_4$ = {Asian American} and B = {Having Type II DM}. Then,

$P(B) = P(B|A_1) \times P(A_1) + P(B|A_2) \times P(A_2) + P(B|A_3) \times P(A_3) + P(B|A_4) \times P(A_4)$

$= (0.07)(0.30) + (0.10)(0.25) + (0.12)(0.40) + (0.05)(0.05) = 0.0965$.

Therefore, the overall probability of type II DM among 40- to 59-year-olds in Houston is 0.0965.
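The weighted sum in this example is easy to reproduce in code. The sketch below uses a hypothetical function name and adds a check, not present in the slides, that the group shares sum to 1 (the "exhaustive" requirement).

```python
def total_probability(cond_probs: list[float], group_probs: list[float]) -> float:
    """P(B) = sum over i of P(B | A_i) * P(A_i) for a partition A_1..A_m."""
    if abs(sum(group_probs) - 1.0) > 1e-9:
        raise ValueError("group probabilities must sum to 1")
    return sum(c * g for c, g in zip(cond_probs, group_probs))


# Order: Caucasian, African American, Hispanic, Asian American.
dm_rates = [0.07, 0.10, 0.12, 0.05]      # P(B | A_i)
group_shares = [0.30, 0.25, 0.40, 0.05]  # P(A_i)

p_dm = total_probability(dm_rates, group_shares)
print(round(p_dm, 4))
```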







Generalized Multiplication Rule

Let $A_1, A_2, \ldots, A_m$ be an arbitrary set of events, not necessarily mutually independent.

$P(A_1 \cap A_2 \cap \cdots \cap A_m) = P(A_1) \times P(A_2|A_1) \times P(A_3|A_2 \cap A_1) \times \cdots \times P(A_m|A_1 \cap A_2 \cap \cdots \cap A_{m-1})$

This is a direct consequence of the definition of conditional probability.

The rule reduces to the multiplication law discussed before if the events $A_1, A_2, \ldots, A_m$ are mutually independent.
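To see why the rule follows from the definition of conditional probability, it may help to write out the case m = 3. The derivation below is a worked illustration, not taken from the slides; it applies the definition $P(X|Y) = P(X \cap Y)/P(Y)$ twice and assumes the conditioning events have positive probability.

```latex
\begin{align*}
P(A_1 \cap A_2 \cap A_3)
  &= P(A_3 \mid A_1 \cap A_2) \times P(A_1 \cap A_2)
     && \text{(definition applied to } A_3 \text{ given } A_1 \cap A_2\text{)} \\
  &= P(A_3 \mid A_1 \cap A_2) \times P(A_2 \mid A_1) \times P(A_1)
     && \text{(definition applied to } A_2 \text{ given } A_1\text{)}
\end{align*}
```

Repeating the same step for each additional event gives the general formula.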







Predictive Value, Sensitivity and Specificity

Predictive Value Positive of a screening test

PV+ = P(disease|test+)

Predictive Value Negative of a screening test

PV- = P(no disease|test-)

Sensitivity of a screening test

Sensitivity = P(test+|disease)

Specificity of a screening test

Specificity = P(test-|no disease)

Question: Which of the above quantities do you think can be directly measured by clinicians?







Symptoms as a Screening Test

A symptom or set of symptoms can be used as a screening test.

Ideally, we would like to find a set of symptoms such that both PV+ and PV- are 1.

For a symptom to be effective in predicting disease, it is important that both sensitivity and specificity be high.

Note, though, that high sensitivity and specificity are not the whole story.






False Positive and False Negative

When do you think results from a screening test are classified as False Positive and when are they classified as False Negative?

The probability of False Positive

P(False Positive) = 1 - Specificity

The probability of False Negative

P(False Negative) = 1 - Sensitivity

Example: Suppose 5% of women with breast cancer have a family history of breast cancer but only 2% of women without breast cancer have such a history. Then what are the rates of False Positive and False Negative of family history as a screening test?






Example: Association Between PSA and Prostate Cancer

The PSA+ or PSA- status of each participant in a study was evaluated and the following data were obtained.

PSA Test Result    Prostate Cancer    Frequency
+                  +                  92
+                  -                  27
-                  +                  46
-                  -                  72

Calculate the PV+, PV-, Sensitivity and Specificity.

Why is this type of sample, in general, unrealistic?

Typically, case-control studies are used, which allow estimation of sensitivity and specificity but not of the predictive values.
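The four quantities requested for the PSA table can be computed from its frequencies as sketched below. The variable names are hypothetical. The predictive values obtained this way treat the sample as if it represented the reference population, which is precisely the assumption the question above asks you to examine.

```python
# Frequencies keyed by (PSA result, prostate cancer status), from the table.
counts = {("+", "+"): 92, ("+", "-"): 27, ("-", "+"): 46, ("-", "-"): 72}

true_pos = counts[("+", "+")]
false_pos = counts[("+", "-")]
false_neg = counts[("-", "+")]
true_neg = counts[("-", "-")]

pv_pos = true_pos / (true_pos + false_pos)       # P(cancer | PSA+)
pv_neg = true_neg / (true_neg + false_neg)       # P(no cancer | PSA-)
sensitivity = true_pos / (true_pos + false_neg)  # P(PSA+ | cancer)
specificity = true_neg / (true_neg + false_pos)  # P(PSA- | no cancer)

for name, value in [("PV+", pv_pos), ("PV-", pv_neg),
                    ("Sensitivity", sensitivity), ("Specificity", specificity)]:
    print(f"{name}: {value:.4f}")
```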






Bayes Rule

Let A = Symptom and B = Disease.

P(B) is the probability of the disease in the reference population.

P(B) is also known as the prevalence of the disease in the reference population.

The prevalence information can be combined with the specificity and sensitivity information to get PV+ and PV-.






Bayes Rule Contd...

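Bayes Rule says that the Predictive Value Positive is given by

$PV^+ = \frac{P(A|B) \times P(B)}{P(A|B) \times P(B) + P(A|\bar{B}) \times P(\bar{B})}$

$= \frac{\text{Sensitivity} \times \text{Prevalence}}{\text{Sensitivity} \times \text{Prevalence} + (1 - \text{Specificity}) \times (1 - \text{Prevalence})}$

Similarly, the Predictive Value Negative is given by

$PV^- = \frac{P(\bar{A}|\bar{B}) \times P(\bar{B})}{P(\bar{A}|\bar{B}) \times P(\bar{B}) + P(\bar{A}|B) \times P(B)}$

$= \frac{\text{Specificity} \times (1 - \text{Prevalence})}{\text{Specificity} \times (1 - \text{Prevalence}) + (1 - \text{Sensitivity}) \times \text{Prevalence}}$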






Bayes Rule: Example

It is known that the Enzyme-Linked Immunosorbent Assay (ELISA) test for HIV infection has 99.7% sensitivity and 98.5% specificity.

What is the likelihood that a person has been infected with HIV if he or she had a positive test readout, given that

1. the prevalence of HIV infection in the region where the patient is from is 0.2%?

2. the patient's decision to come for a test resulted from his or her concern about high-risk behavior, and the prevalence of HIV infection among high-risk groups in the person's region is 20%?
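Both scenarios can be evaluated with the Bayes rule expressions for PV+ and PV- in terms of sensitivity, specificity and prevalence. The code below is a minimal sketch with hypothetical function names, not a clinical tool.

```python
def predictive_value_positive(sens: float, spec: float, prev: float) -> float:
    """PV+ = sens*prev / (sens*prev + (1 - spec)*(1 - prev))."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def predictive_value_negative(sens: float, spec: float, prev: float) -> float:
    """PV- = spec*(1 - prev) / (spec*(1 - prev) + (1 - sens)*prev)."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


SENSITIVITY, SPECIFICITY = 0.997, 0.985  # ELISA figures from the example

for prevalence in (0.002, 0.20):  # scenario 1 and scenario 2
    ppv = predictive_value_positive(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence = {prevalence:.3f}: PV+ = {ppv:.4f}")
```

The only input that changes between the two scenarios is the prevalence, which makes its effect on PV+ easy to isolate.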






Bayes Rule: Example Contd...

Why would the probability be so low in (1) while the test is very accurate?

This is easy to explain. The test has a high false positive rate (1.5%) compared to the prevalence of HIV (0.2%).

Of 1000 people in the region, roughly 2 people [0.002 × 0.997 × 1000 = 1.994] would have HIV and receive a positive test result.

In total, the test gives positive results for roughly 17 [(0.002 × 0.997 + 0.015 × 0.998) × 1000 = 16.964] people.

The test gives positive results for about 15 more people than are actually infected.

Hence, the probability that the person has HIV given a positive test result is 1.994/16.964 = 11.7543%.






Bayes Rule: Example contd...

The two lessons to be learnt from this example are

1. The patient must be told his or her likelihood of infection in addition to the specificity and sensitivity of the test, NOT just the readout.

2. The prevalence information is very important in determining the person's likelihood.

The best course of action may be to do a more accurate test (Western Blot Procedure) if a positive readout is obtained. The combination of these two procedures is highly accurate.

In the previous calculation, we updated the person's prior probability of infection in light of the test result to get the person's posterior probability of infection.

This is a special case of the general class of inference known as Bayesian Inference.






Generalization of Bayes Rule

Let $B_1, B_2, \ldots, B_m$ be a set of mutually exclusive and exhaustive disease states. Let A be the presence of a symptom or set of symptoms. Then

$P(B_i|A) = \frac{P(A|B_i) \times P(B_i)}{\sum_{j=1}^{m} P(A|B_j) \times P(B_j)}$.

Read Example 3.27 in the textbook.
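The generalized rule can be sketched in a few lines. Because the slides give no numbers for a multi-state disease example, the illustration below reuses the Houston diabetes figures from the Total Probability Rule example, with the roles relabeled: the ethnic groups play the part of the mutually exclusive and exhaustive states, and having type II DM plays the part of the observed evidence. That relabeling and the function name are assumptions made purely for illustration.

```python
def posterior(likelihoods: list[float], priors: list[float]) -> list[float]:
    """P(state_i | evidence) for each state i, by the generalized Bayes rule."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # denominator: total probability of the evidence
    return [j / evidence for j in joint]


groups = ["Caucasian", "African American", "Hispanic", "Asian American"]
dm_rates = [0.07, 0.10, 0.12, 0.05]      # P(DM | group)
group_shares = [0.30, 0.25, 0.40, 0.05]  # P(group)

for group, prob in zip(groups, posterior(dm_rates, group_shares)):
    print(f"P({group} | DM) = {prob:.4f}")
```

The denominator is exactly the overall probability given by the Total Probability Rule, so the resulting posterior probabilities sum to 1.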






Summary

Defined Probability

Addition and multiplication laws

Independent and dependent events

Conditional Probability and Relative Risks to quantify the dependence between events.

Accuracy of screening tests defined as an application of conditional probabilities.

Application of Bayes rule for computing PV+ and PV- when only sensitivity, specificity and prevalence are known.

This is a special case of Bayesian Inference.






Homework

Problems 3.49, 3.50, 3.75, 3.83, 3.84

