Statistical inference: Probability and Distribution

Date post: 14-Jul-2015
Upload: eugene-yan
Statistical Inference Weeks 1 & 2: Probability and Distribution
Transcript
Page 1: Statistical inference: Probability and Distribution

Statistical Inference Weeks 1 & 2: Probability and Distribution

Page 2: Statistical inference: Probability and Distribution

Types of Variables

All Variables
- Categorical: May be represented by numbers, but it does not make sense to add, subtract, average, etc.
  - Categorical: Have no inherent ordering (e.g., single, married, divorced)
  - Ordinal: Have ordered levels (e.g., primary, secondary, JC, university)
- Numerical: It makes sense to add, subtract, average, etc. (i.e., perform math operations)
  - Discrete: Are counted and can only take on non-negative whole numbers
  - Continuous: Are measured and can take on any real number (i.e., can have decimal places)

Page 3: Statistical inference: Probability and Distribution

Probability

P(A) = Probability of event A happening
0 ≤ P(A) ≤ 1

Disjoint (mutually exclusive) events: Cannot happen at the same time
- A card drawn from a deck cannot be both a spade and a heart
- P(Spade & Heart) = 0

Non-disjoint events: Can happen at the same time
- A card drawn from a deck can be both a spade and an ace
- P(Spade & Ace) = 1/52

[Figure: two diagrams, one labeled Spade and Heart, the other Spade and Ace]

Page 4: Statistical inference: Probability and Distribution

Disjoint and non-disjoint events

Union of disjoint events
- Probability of drawing a Spade or a Heart from a deck of cards

P(Spade or Heart)

= P(Spade) + P(Heart)

= 13/52 + 13/52

= 26/52

Union of non-disjoint events
- Probability of drawing a Spade or an Ace from a deck of cards

P(Spade or Ace)

= P(Spade) + P(Ace) – P(Spade and Ace)

= 13/52 + 4/52 – 1/52

= 16/52

General Addition Rule: P(A or B) = P(A) + P(B) – P(A and B)
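The two card examples can be checked by brute force. The sketch below is only an illustration: it builds a 52-card deck, counts matching cards, and compares the direct count with P(A) + P(B) – P(A and B). The helper names (`prob`, `is_spade`, and so on) are made up for this example.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))


def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_spade(card):
    return card[1] == "spades"


def is_heart(card):
    return card[1] == "hearts"


def is_ace(card):
    return card[0] == "A"


# Disjoint events: the "and" term is zero, so probabilities simply add.
p_spade_or_heart = prob(lambda c: is_spade(c) or is_heart(c))

# Non-disjoint events: subtract the overlap (the ace of spades).
p_spade_or_ace = prob(lambda c: is_spade(c) or is_ace(c))
p_by_rule = prob(is_spade) + prob(is_ace) - prob(lambda c: is_spade(c) and is_ace(c))

print(p_spade_or_heart)  # 1/2, i.e. 26/52
print(p_spade_or_ace)    # 4/13, i.e. 16/52
```

Exact fractions are used so the counts match the slide's 26/52 and 16/52 without rounding.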

Page 5: Statistical inference: Probability and Distribution

Marginal, Joint, and Conditional Probability

Marginal probability
- Probability based on a single variable

P(Student = uses)
= 219/445 = 0.49

Joint probability
- Probability based on two or more variables

P(Student = uses and Parents = used)
= 125/445 = 0.28

Conditional probability
- Probability of one event conditional upon another event

P(Student = uses | Parents = used)
= 125/210 = 0.60

                         Parents
                         Used   Did not use   Total
Student   Uses            125        94         219
          Does not use     85       141         226
          Total           210       235         445
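As a rough sketch, the three probabilities can be computed straight from the table's counts. The dictionary layout and variable names below are assumptions made for illustration; only the four cell counts come from the slide.

```python
# Cell counts from the table: counts[student][parents]
counts = {
    "uses": {"used": 125, "did_not_use": 94},
    "does_not_use": {"used": 85, "did_not_use": 141},
}

total = sum(sum(row.values()) for row in counts.values())      # 445
student_uses = sum(counts["uses"].values())                    # 219 (row total)
parents_used = sum(row["used"] for row in counts.values())     # 210 (column total)

# Marginal: one variable only, so divide a row total by the grand total.
p_marginal = student_uses / total

# Joint: both conditions at once, so divide a single cell by the grand total.
p_joint = counts["uses"]["used"] / total

# Conditional: restrict attention to the "parents used" column.
p_conditional = counts["uses"]["used"] / parents_used

print(round(p_marginal, 2), round(p_joint, 2), round(p_conditional, 2))
# 0.49 0.28 0.6
```

The only difference between the joint and conditional probabilities is the denominator: the grand total versus the total of the conditioning column.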

Page 6: Statistical inference: Probability and Distribution

Bayes’ Theorem

Bayes' theorem (starting from the definition of conditional probability)

$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$

Probability that the children use given that the parents also used:

$P(\text{children} = \text{use} \mid \text{parents} = \text{used}) = \dfrac{P(\text{children} = \text{use and parents} = \text{used})}{P(\text{parents} = \text{used})} = \dfrac{125/445}{210/445} = 0.60$

                          Parents
                          Used   Did not use   Total
Children   Uses            125        94         219
           Does not use     85       141         226
           Total           210       235         445

General Product Rule: P(A and B) = P(A | B) × P(B)

Page 7: Statistical inference: Probability and Distribution

Bayes' Theorem expanded

Probability that a woman in the general population has breast cancer
- P(breast cancer) = 0.017

Probability of a true positive from a mammogram
- P(positive | breast cancer) = 0.78
- i.e., sensitivity

Probability of a false positive from a mammogram
- P(positive | no breast cancer) = 0.10
- i.e., 1 - specificity

What is the probability that the patient has breast cancer given a positive mammogram?

$P(\text{cancer} \mid \text{positive}) = \dfrac{P(\text{positive} \mid \text{cancer})\,P(\text{cancer})}{P(\text{positive} \mid \text{cancer})\,P(\text{cancer}) + P(\text{positive} \mid \text{no cancer})\,P(\text{no cancer})}$

$= \dfrac{0.78 \times 0.017}{0.78 \times 0.017 + 0.10 \times 0.983}$

$= 0.119$

Bayes' theorem

$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{P(B \mid A)\,P(A)}{P(B)} = \dfrac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$
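The expanded form of Bayes' theorem maps onto a few lines of code. This is a minimal sketch, and the function and parameter names (`posterior`, `sensitivity`, `false_positive_rate`) are chosen here for illustration rather than taken from the slides.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A | B) from P(A), P(B | A) and P(B | not A), using Bayes' theorem."""
    true_positive = sensitivity * prior                    # P(B | A) P(A)
    false_positive = false_positive_rate * (1 - prior)     # P(B | A^c) P(A^c)
    return true_positive / (true_positive + false_positive)


# Mammogram example: P(cancer) = 0.017, P(positive | cancer) = 0.78,
# P(positive | no cancer) = 0.10
p_cancer_given_positive = posterior(0.017, 0.78, 0.10)
print(round(p_cancer_given_positive, 3))  # 0.119
```

Even with a fairly sensitive test, the posterior stays near 12% because the disease is rare, so false positives from the large healthy group outnumber true positives.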

Page 8: Statistical inference: Probability and Distribution

Probability Tree

What is the probability that the patient has breast cancer given a positive mammogram?

Cancer: P(cancer) = 0.017
- Positive: P(positive | cancer) = 0.78
  P(cancer and positive) = 0.017 × 0.78 = 0.01326
- Negative: P(negative | cancer) = 0.22

No Cancer: P(no cancer) = 0.983
- Positive: P(positive | no cancer) = 0.10
  P(no cancer and positive) = 0.983 × 0.10 = 0.0983
- Negative: P(negative | no cancer) = 0.90

$P(\text{cancer} \mid \text{positive}) = \dfrac{P(\text{cancer and positive})}{P(\text{positive})} = \dfrac{0.01326}{0.01326 + 0.0983} = 0.119$
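One way to read the tree is as a table of leaves, where each leaf is a first-level probability multiplied by a second-level conditional probability. The sketch below enumerates all four leaves; the dictionary names are illustrative assumptions.

```python
# First level of the tree: disease status.
prior = {"cancer": 0.017, "no cancer": 0.983}

# Second level: test result given disease status.
result_given = {
    "cancer": {"positive": 0.78, "negative": 0.22},
    "no cancer": {"positive": 0.10, "negative": 0.90},
}

# Each leaf is a joint probability: P(status and result) = P(status) * P(result | status).
leaves = {
    (status, result): prior[status] * p
    for status, branches in result_given.items()
    for result, p in branches.items()
}

# P(positive) adds up every leaf that ends in a positive result.
p_positive = sum(p for (_, result), p in leaves.items() if result == "positive")
p_cancer_given_positive = leaves[("cancer", "positive")] / p_positive

print(round(leaves[("cancer", "positive")], 5))     # 0.01326
print(round(leaves[("no cancer", "positive")], 4))  # 0.0983
print(round(p_cancer_given_positive, 3))            # 0.119
```

A quick sanity check on any probability tree is that the leaves sum to 1.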

Page 9: Statistical inference: Probability and Distribution

Expected Mean

Expected Mean (expected value)

$E[X] = \sum_x x \, p(x)$ (the sum of each value of x multiplied by its probability)

What is the expected value of a die roll?

$E[X] = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = 3.5$

Notation:
$\bar{x}$: sample mean
$\mu$: population mean

Page 10: Statistical inference: Probability and Distribution

Mean

$\text{Mean} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$

What is the mean number of dots across the faces of a die?

$\text{Mean} = \dfrac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$

Notation:
$\bar{x}$: sample mean
$\mu$: population mean

Page 11: Statistical inference: Probability and Distribution

Expected Variance

Expected Variance

$\mathrm{Var}(X) = E[(X - \mu)^2]$ (the expected squared difference between each value and the mean)

$= E[X^2] - (E[X])^2$

What is the variance of a die roll?

From the previous slide, the mean is $E[X] = 3.5$

$E[X^2] = 1^2 \times \frac{1}{6} + 2^2 \times \frac{1}{6} + 3^2 \times \frac{1}{6} + 4^2 \times \frac{1}{6} + 5^2 \times \frac{1}{6} + 6^2 \times \frac{1}{6} = 15.17$

$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 15.17 - 3.5^2 \approx 2.9$

Notation:
$s^2$: sample variance
$\sigma^2$: population variance
$s$: sample standard deviation
$\sigma$: population standard deviation
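A small sketch can confirm that the definition E[(X − μ)²] and the shortcut E[X²] − (E[X])² agree for a fair die. Variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# A fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in die.items())        # E[X]   = 7/2
e_x2 = sum(x ** 2 * p for x, p in die.items())   # E[X^2] = 91/6

# Definition: expected squared distance from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in die.items())

# Shortcut: E[X^2] - (E[X])^2.
var_shortcut = e_x2 - mean ** 2

print(float(e_x2))          # 15.166666666666666
print(var_definition)       # 35/12
print(float(var_shortcut))  # 2.9166666666666665
```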

Page 12: Statistical inference: Probability and Distribution

Population Variance

Population Variance

$\sigma^2 = \dfrac{1}{N} \sum (x_i - \mu)^2$

What is the variance of dots on die faces?

Given $\mu = 3.5$

$\sigma^2 = \dfrac{1}{6}\left[(1 - 3.5)^2 + (2 - 3.5)^2 + \dots + (6 - 3.5)^2\right] \approx 2.9$

Notation:
$s^2$: sample variance
$\sigma^2$: population variance
$s$: sample standard deviation
$\sigma$: population standard deviation

Page 13: Statistical inference: Probability and Distribution

Sample Variance

Sample Variance

$s^2 = \dfrac{1}{n - 1} \sum (x_i - \bar{x})^2$

Why n - 1?
- Deviations are measured from the sample mean, which sits closer to the sample's own values than the true population mean does. Dividing by n therefore underestimates the population variance on average. Dividing by n - 1 gives a slightly bigger variance that more closely approximates the population variance
- i.e., think of it as a "correction" used on samples

Notation:
$s^2$: sample variance
$\sigma^2$: population variance
$s$: sample standard deviation
$\sigma$: population standard deviation
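The effect of the n - 1 correction can be seen exactly in a tiny case. The sketch below assumes a setup chosen for illustration (it is not from the slides): every possible sample of two rolls of a fair die, with the two variance formulas averaged over all 36 samples. The `variance` helper and its `ddof` parameter are hypothetical names.

```python
from fractions import Fraction
from itertools import product


def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


faces = list(range(1, 7))
population_variance = variance(faces, ddof=0)  # divide by N: 35/12

# All 36 equally likely samples of size 2 (two rolls of the die).
samples = list(product(faces, repeat=2))

avg_divide_by_n = sum(variance(s, ddof=0) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(population_variance)      # 35/12
print(avg_divide_by_n)          # 35/24, too small on average
print(avg_divide_by_n_minus_1)  # 35/12, matches the population variance
```

Individual samples can still have a variance above or below the population value; the correction only removes the bias in the average.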

Page 14: Statistical inference: Probability and Distribution

Bernoulli Distribution

Where an individual trial only has two possible outcomes

Assuming a fair coin, what is the probability of it landing on heads (i.e., success)?

$P(\text{success}) = p(\text{heads})^1 \, p(\text{tails})^0 = 0.5$

Assuming an unfair coin (i.e., $p(\text{heads}) = 0.25$), what is the probability of it landing on tails (i.e., failure)?

$P(\text{failure}) = p(\text{heads})^0 \, p(\text{tails})^1 = 0.75$
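Both coin examples are instances of the same formula, p^x (1 − p)^(1 − x), with x = 1 for success and x = 0 for failure. A minimal sketch, with the function name `bernoulli_pmf` chosen for illustration:

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a single trial with success probability p; x is 1 or 0."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (failure) or 1 (success)")
    return p ** x * (1 - p) ** (1 - x)


print(bernoulli_pmf(1, 0.5))   # fair coin, heads: 0.5
print(bernoulli_pmf(0, 0.25))  # unfair coin, tails: 0.75
```

The exponents act as a switch: whichever outcome did not happen is raised to the power 0 and drops out of the product.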

Page 15: Statistical inference: Probability and Distribution

Binomial Distribution

Probability of k successes in n trials

$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} \, p^k (1 - p)^{n - k}$

where $\binom{n}{k} = \dfrac{n!}{k!\,(n - k)!}$

Given 7 trials, how many scenarios can have 2 successes?

$\binom{7}{2} = \dfrac{7!}{2!\,5!} = \dfrac{7 \times 6 \times 5!}{2 \times 1 \times 5!} = 21$

If you toss the unfair coin 7 times, what's the probability of 2 heads (i.e., successes)?

Given $P(\text{heads}) = 0.25$

$P(k = 2) = \binom{7}{2} \times 0.25^2 \times 0.75^5 = \dfrac{7 \times 6 \times 5!}{2 \times 1 \times 5!} \times 0.25^2 \times 0.75^5 = 0.311$
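The binomial formula translates directly into code. This sketch uses `math.comb` from the Python standard library (available from Python 3.8); the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


print(comb(7, 2))                          # 21 ways to place 2 successes in 7 trials
print(round(binomial_pmf(2, 7, 0.25), 3))  # 0.311
```

As a check, the probabilities for k = 0 through 7 should add up to 1, since exactly one of those counts must occur.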

Page 16: Statistical inference: Probability and Distribution

Normal Distribution

Unimodal (only one peak) and symmetric

68-95-99.7% rule
- About 68% of values fall within 1 SD of the mean
- About 95% of values fall within 2 SD of the mean
- About 99.7% of values fall within 3 SD of the mean

Represented as N(μ, σ)
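The three percentages in the rule can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8+). The helper name `share_within` is an assumption made for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of values lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

Because the rule is stated in standard deviations, the same shares hold for any normal distribution, whatever its mean and SD.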

Page 17: Statistical inference: Probability and Distribution

Normal Distribution

You want to compare two cousins and determine who fared better. Xiao Ming scored 1800 on his SAT and Muthu scored 24 on his ACT. Who did better?
- SAT scores ~ N(mean = 1500, SD = 300)
- ACT scores ~ N(mean = 21, SD = 6)

Xiao Ming: $\dfrac{1800 - 1500}{300} = 1$ SD above the mean

Muthu: $\dfrac{24 - 21}{6} = 0.5$ SD above the mean

Xiao Ming's score is further above the mean of his test, so he did better.

Page 18: Statistical inference: Probability and Distribution

Normal Distribution (Z scores)

Standardization with Z scores (normalization)

$Z = \dfrac{\text{observation} - \mu}{SD}$

Standardized (Z) score of a value is the number of standard deviations it falls above or below the mean

Z score of mean = 0
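Standardizing puts scores from different scales onto a common one. A minimal sketch, reusing the SAT and ACT figures from the cousin comparison; the function name `z_score` is an illustrative choice.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations an observation lies above (+) or below (-) the mean."""
    return (observation - mean) / sd


z_xiao_ming = z_score(1800, mean=1500, sd=300)  # SAT
z_muthu = z_score(24, mean=21, sd=6)            # ACT

print(z_xiao_ming, z_muthu)          # 1.0 0.5
print(z_score(21, mean=21, sd=6))    # 0.0, the mean always has Z = 0
```

Raw scores of 1800 and 24 cannot be compared directly, but Z scores of 1.0 and 0.5 can.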

Page 19: Statistical inference: Probability and Distribution

Normal Distribution

Suppose that your company's ad campaign receives daily ad clicks that are (approximately) normally distributed with mean = 1,020 and standard deviation = 50. What's the probability of getting more than 1,160 clicks in a day?

$Z = \dfrac{\text{observation} - \mu}{SD} = \dfrac{1{,}160 - 1{,}020}{50} = 2.8$

$P(Z > 2.8) = 1 - 0.9974 = 0.0026$
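The table lookup for P(Z ≤ 2.8) can be replaced by the cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

clicks = NormalDist(mu=1020, sigma=50)

z = (1160 - clicks.mean) / clicks.stdev
p_more_than_1160 = 1 - clicks.cdf(1160)  # upper tail: 1 - P(X <= 1160)

print(z)                           # 2.8
print(round(p_more_than_1160, 4))  # 0.0026
```

The `cdf` method gives the area to the left of a value, so the probability of exceeding it is one minus that area.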

Page 20: Statistical inference: Probability and Distribution

Normal Distribution

Your friend boasts that his ad is in the top 25% of the company's ads. What is the lowest number of ad clicks his ad could have received?
- Ad clicks ~ N(1020, 50)

The top 25% starts at the 75th percentile, which corresponds to Z = 0.67.

$Z = 0.67 = \dfrac{x - 1{,}020}{50}$

$x = 0.67 \times 50 + 1{,}020 = 1{,}053.5$

Page 21: Statistical inference: Probability and Distribution

Poisson Distribution

$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$
- $e$ = base of the natural log, 2.71828…
- $\lambda$ = mean number of successes in a given time interval

On average, 2.5 people show up at a bus stop every hour. What is the probability that 3 or fewer people show up over 4 hours?

Over 4 hours, $\lambda = 2.5 \times 4 = 10$

$P(X \le 3) = \dfrac{e^{-10} 10^0}{0!} + \dfrac{e^{-10} 10^1}{1!} + \dfrac{e^{-10} 10^2}{2!} + \dfrac{e^{-10} 10^3}{3!} = 0.010336$
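The bus stop sum can be evaluated directly from the Poisson formula with λ = 2.5 × 4 = 10. This is a minimal sketch using only the Python standard library; the function names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) when events occur at an average rate of lam per interval."""
    return exp(-lam) * lam ** x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x): add up the probabilities for 0, 1, ..., x."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))


lam = 2.5 * 4  # 2.5 arrivals per hour over 4 hours
p_three_or_fewer = poisson_cdf(3, lam)

print(round(p_three_or_fewer, 6))  # 0.010336
```

With 10 arrivals expected, seeing 3 or fewer is rare, at roughly a 1% chance.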

Page 22: Statistical inference: Probability and Distribution

Thank you for your attention!
Eugene Yan