
Chapter 7: Theoretical Probability Distributions

Date post: 09-Dec-2021

BSTT523: Pagano & Gauvreau, Chapter 7

Variable: a measured or categorized characteristic.
Random variable (r.v.) $X$: assumes values ($x$) by chance.
Discrete r.v.: can assume a finite or countably infinite number of values.
Continuous r.v.: can assume any value within an interval.

Outline:
1. Discrete Random Variables
2. Some Discrete Distributions
   Bernoulli distribution
   Binomial distribution
   Poisson distribution
3. Continuous Random Variables
4. The Normal/Gaussian Distribution

1. Discrete Random Variables

Definition: The probability distribution of a discrete r.v. $X$ is a table, graph, formula, or other device that specifies all possible values of $X$ and their respective probabilities.

Example 7.1 (table form): Birth order of children in the U.S.

x (birth order)   P(X = x)
1                 0.416
2                 0.330
3                 0.158
4                 0.058
5                 0.021
6                 0.009
7                 0.004
8+                0.004
Total             1.000

Discrete Probability Density Function (discrete PDF): $f(x) = P(X = x)$

Properties of the discrete PDF:
i. $0 \le f(x_i) \le 1$ for all $x_i$ (non-negative)
ii. $\sum_{\text{all } x_i} f(x_i) = 1$ (exhaustive)
iii. $P(X = x_i \cup X = x_j) = f(x_i) + f(x_j)$ for $x_i \ne x_j$ (additive)

Cumulative Distribution Function (CDF):
$F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i) = \sum_{x_i \le x} f(x_i)$

Note (with the possible values ordered $x_1 < x_2 < \cdots$):
i. PDF: $f(x_i) = P(X = x_i) = F(x_i) - F(x_{i-1})$
ii. $P(X < x_i) = F(x_{i-1})$
iii. $P(a < X \le b) = F(b) - F(a)$

Example 7.1: Birth Order

x (birth order)   PDF f(x) = P(X = x)   CDF F(x) = P(X ≤ x)
1                 0.416                 0.416
2                 0.330                 0.746
3                 0.158                 0.904
4                 0.058                 0.962
5                 0.021                 0.983
6                 0.009                 0.992
7                 0.004                 0.996
8+                0.004                 1.000

Q1. Probability that a child picked at random was the mother's 1st or 2nd child?
Q2. Probability that a child picked at random was of birth order less than 4?
Q3. Probability that a child picked at random was of birth order 5 or more?
Q4. Probability that a child picked at random was of birth order between 3 and 5?
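The four questions can be answered from the CDF column alone. The following is a minimal Python sketch, not part of the original notes; the dictionary `pdf`, the helper `cdf`, and the names `q1` to `q4` are illustrative, and Q4 is assumed to include both endpoints (orders 3, 4, and 5).

```python
# Birth-order PDF from Example 7.1; the key 8 stands for the "8+" category.
pdf = {1: 0.416, 2: 0.330, 3: 0.158, 4: 0.058,
       5: 0.021, 6: 0.009, 7: 0.004, 8: 0.004}


def cdf(x):
    """F(x) = P(X <= x): sum of f(x_i) over all x_i <= x."""
    return sum(p for xi, p in pdf.items() if xi <= x)


q1 = cdf(2)           # P(X = 1 or X = 2) = F(2)
q2 = cdf(3)           # P(X < 4) = F(3)
q3 = 1 - cdf(4)       # P(X >= 5) = 1 - F(4)
q4 = cdf(5) - cdf(2)  # P(3 <= X <= 5) = F(5) - F(2), endpoints assumed included

for name, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4)]:
    print(f"{name}: {value:.3f}")
```

Q2 and Q4 use the notes' identities $P(X < x_i) = F(x_{i-1})$ and $P(a < X \le b) = F(b) - F(a)$ directly.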

2. Some Discrete Distributions

Bernoulli Distribution

Bernoulli variable: a binary variable

$X = \begin{cases} 1, & \text{success} \\ 0, & \text{failure} \end{cases}$

Bernoulli trial: one performance of an experiment with a 0/1 outcome.

Denote $p = P(X = 1)$ and $q = P(X = 0) = 1 - p$.

The PDF of the Bernoulli distribution is

$f(x) = \begin{cases} p & \text{if } x = 1 \\ q & \text{if } x = 0 \end{cases} \;=\; p^x q^{1-x} \;=\; p^x (1-p)^{1-x}, \quad x = 0, 1$

The Bernoulli distribution has one parameter, $p$.

If $X$ follows a Bernoulli distribution, then
Mean: $\mu = E(X) = p$
Variance: $\sigma^2 = Var(X) = pq = p(1-p)$

Examples of Bernoulli variables:

Ex. 1: flip a coin. $X = \begin{cases} 1, & \text{heads} \\ 0, & \text{tails} \end{cases}$

Ex. 2: roll a die, interested in 3's. $X = \begin{cases} 1, & \text{die falls on 3} \\ 0, & \text{otherwise} \end{cases}$
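The Bernoulli mean and variance stated above follow directly from the PDF. The short derivation below is added for illustration and uses only the definitions of $E(X)$ and $Var(X)$ for a discrete r.v.:

```latex
\begin{align*}
E(X)   &= \sum_{x \in \{0,1\}} x\, f(x) = 0 \cdot q + 1 \cdot p = p \\
E(X^2) &= 0^2 \cdot q + 1^2 \cdot p = p \\
Var(X) &= E(X^2) - [E(X)]^2 = p - p^2 = p(1-p) = pq
\end{align*}
```

For Ex. 2, assuming a fair die (the notes do not state this), $p = 1/6$, so $E(X) = 1/6$ and $Var(X) = 5/36$.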

Binomial Distribution

Perform $n$ independent Bernoulli trials.

$X$ = number of successes (1's)
$p$ = probability of success in each trial, $q = 1 - p$
$X \sim BIN(n, p)$

Q: What is the PDF $f(x)$, $x = 0, 1, \ldots, n$, of $X \sim BIN(n, p)$? That is, what is the probability of $x$ successes in $n$ Bernoulli trials?

Q1. 5 Bernoulli trials, $X \sim BIN(5, p)$. P(result is 10010) = ?
Solution: $p \cdot q \cdot q \cdot p \cdot q = p^2 q^3$

Q2. Other results with 2 successes out of 5?

Number   Sequence
1        11000
2        10100
3        10010
4        10001
5        01100
6        01010
7        01001
8        00110
9        00101
10       00011

There are 10 ways to get 2 successes out of 5.
The probability of each sequence is $p^2 q^3$.
P(Sequence 1 or 2 or ... or 10) = $10 p^2 q^3$
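The count of 10 sequences can be checked by brute-force enumeration. This is a small Python sketch, not from the original notes; the value `p = 0.3` is an arbitrary illustrative choice, since the notes leave $p$ general.

```python
from itertools import product

n, x = 5, 2

# Enumerate all 2**n outcome sequences and keep those with exactly x successes.
sequences = ["".join(map(str, s))
             for s in product((1, 0), repeat=n) if sum(s) == x]
print(len(sequences), sequences)

p = 0.3  # illustrative value only
q = 1 - p

# Each sequence has probability p^(number of 1s) * q^(number of 0s).
each = [p ** s.count("1") * q ** s.count("0") for s in sequences]

# The sequences are mutually exclusive, so their probabilities add.
total = sum(each)
print(total, 10 * p**2 * q**3)
```

Because every qualifying sequence has the same probability $p^2 q^3$, the sum reduces to (number of sequences) times $p^2 q^3$, which is what the counting argument on the slide exploits.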

Definition: A combination of $n$ subjects taken $x$ at a time is the number of unordered subsets of size $x$ ("n choose x"):

${}_nC_x = \frac{n!}{x!\,(n-x)!}$

where $x! = x(x-1)(x-2)\cdots(2)(1)$, and $0! = 1$ by definition.

Example: "5 choose 2": how many subsets of 2 out of 5?

${}_5C_2 = \frac{5!}{2!\,(5-2)!} = \frac{5 \cdot 4}{2 \cdot 1} = 10$

Back to the binomial distribution question: $X \sim BIN(5, p)$; $f(2) = ?$ $f(5) = ?$

Ans: $f(2) = {}_5C_2\, p^2 q^3 = 10 p^2 q^3$
$f(5) = {}_5C_5\, p^5 q^0 = \frac{5!}{5!\,0!}\, p^5 q^0 = 1 \cdot p^5 \cdot 1 = p^5$

Binomial PDF $f(x)$: $X \sim BIN(n, p)$

P($x$ successes in $n$ Bernoulli trials):
$f(x) = {}_nC_x\, p^x q^{n-x}$ for $x = 0, 1, \ldots, n$; $f(x) = 0$ otherwise.

Number of successes x   Probability f(x)
0                       ${}_nC_0\, q^n$
1                       ${}_nC_1\, p\, q^{n-1}$
...                     ...
x                       ${}_nC_x\, p^x q^{n-x}$
...                     ...
n-1                     ${}_nC_{n-1}\, p^{n-1} q$
n                       ${}_nC_n\, p^n$
Total                   1

Important binomial distribution features:
Mean: $\mu = E(X) = np$
Variance: $\sigma^2 = Var(X) = npq$
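As a sanity check on the table and the two features above, the binomial PDF can be coded directly from the formula. This is a minimal sketch using only the Python standard library; `binom_pdf` is a name chosen here, and `n = 5`, `p = 0.3` are illustrative values not taken from the notes.

```python
from math import comb


def binom_pdf(x, n, p):
    """P(x successes in n Bernoulli trials) = nCx * p^x * q^(n-x)."""
    if not 0 <= x <= n:
        return 0.0  # f(x) = 0 outside x = 0, 1, ..., n
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.3  # illustrative values only
probs = [binom_pdf(x, n, p) for x in range(n + 1)]

total = sum(probs)                                  # should equal 1
mean = sum(x * f for x, f in enumerate(probs))      # should equal n*p
variance = sum((x - mean) ** 2 * f for x, f in enumerate(probs))  # n*p*q

print(total, mean, n * p, variance, n * p * (1 - p))
```

Computing the mean and variance from the definition, rather than from $np$ and $npq$, is what makes this a check and not a restatement.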

Example 7.2 Smoking in the U.S.: 29% are smokers, i.e. $p = .29$. Select a random sample of size 10.

Q1. What is P(4 smokers in the sample)?
$X$ = number of smokers out of $n = 10$; $X \sim BIN(10, .29)$

Solution 1. $f(4) = {}_{10}C_4\, (0.29)^4 (0.71)^6 = \frac{10!}{4!\,6!}\, (.00707)(.1281) = .1903$

Solution 2. Table A.1 (p. A1): binomial PDF for $p$ = 0.05 to 0.5, $n$ = 2 to 20.
$f(4)$: $n = 10$, $p \approx .30 \rightarrow f(4) \approx .2001$

Solution 3. SAS:
PROBBNML(p, n, m): CDF
PDF('BINOMIAL', x, p, n): PDF
CDF('BINOMIAL', x, p, n): CDF

Q2. P(6 or more smokers in the sample) = ?
$P(X \ge 6) = 1 - F(5) = 1 - (.9596) = .0404$
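For readers without SAS or the printed table, the same numbers can be reproduced in a few lines of Python. This is a sketch offered as an alternative, not the method used in the notes; `binom_pdf` and `binom_cdf` are helper names introduced here.

```python
from math import comb


def binom_pdf(x, n, p):
    """Binomial PDF: nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Binomial CDF: F(x) = f(0) + f(1) + ... + f(x)."""
    return sum(binom_pdf(k, n, p) for k in range(x + 1))


n, p = 10, 0.29

f4 = binom_pdf(4, n, p)           # Q1, exact p = .29
f4_table = binom_pdf(4, n, 0.30)  # Q1 as read from Table A.1 with p rounded to .30
p_ge_6 = 1 - binom_cdf(5, n, p)   # Q2: P(X >= 6) = 1 - F(5)

print(f"f(4) = {f4:.4f}, table approx = {f4_table:.4f}, P(X >= 6) = {p_ge_6:.4f}")
```

The gap between `f4` and `f4_table` shows the cost of rounding $p$ to the nearest tabulated value.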

Q3. Among the 10 individuals chosen, what is the expected number of smokers?
$E(X) = np = 10 \cdot (.29) = 2.9$

Variance and SD:
$Var(X) = npq = 10 \cdot (.29) \cdot (.71) = 2.059$
$SD = \sqrt{npq} = \sqrt{2.059} = 1.43$

Note: Using Table A.1, what if $p > 0.5$?
$f(x, n, p) = {}_nC_x\, p^x (1-p)^{n-x}$
$f(n-x, n, 1-p) = {}_nC_{n-x}\, (1-p)^{n-x} p^x$
${}_nC_x = \frac{n!}{x!\,(n-x)!} = \frac{n!}{(n-x)!\,x!} = {}_nC_{n-x}$
$\Rightarrow f(x, n, p) = f(n-x, n, 1-p)$

i.e. if $p > 0.5$, then treat the complement $X^C = n - X$ (the number of failures) as "success":
$P(X \le x)$ with $X \sim BIN(n, p)$ equals $P(X^C \ge n - x)$ with $X^C \sim BIN(n, 1-p)$.

Example 7.3 "What do you think about the problem of childhood obesity?" Poll in 2003: 55% of residents think it is "serious". Randomly select $n = 12$ residents.

Q1. P(8 people think it is "serious")?
$X \sim BIN(12, .55) \rightarrow f(8) = .1700$
Same as P(4 out of 12 do not think "serious"): $X \sim BIN(12, .45) \rightarrow f(4) = .1700$

Q2. P(5 or fewer think "serious") = ?
$P(X \le 5 \mid n = 12, p = .55) = P(X \ge 7 \mid n = 12, p = .45) = 1 - P(X \le 6 \mid n = 12, p = .45) = 1 - .7393 = .2607$

Q3. Among the sample of 12, what is the expected number of people who think childhood obesity is "serious"?
$E(X) = np = 12 \cdot .55 = 6.6$

Q4. What is the variance of the number who think childhood obesity is "serious"?
$Var(X) = npq = 12 \cdot (.55) \cdot (.45) = 2.97$
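The complement trick in Q1 and Q2 exists only because the printed table stops at $p = 0.5$; software can evaluate both sides directly. The sketch below, with helper names chosen here for illustration, checks that the two routes agree.

```python
from math import comb


def binom_pdf(x, n, p):
    """Binomial PDF: nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Binomial CDF: F(x) = f(0) + ... + f(x)."""
    return sum(binom_pdf(k, n, p) for k in range(x + 1))


n, p = 12, 0.55

# Q1: 8 "serious" out of 12 is the same event as 4 "not serious" out of 12.
f8 = binom_pdf(8, n, p)
f4_complement = binom_pdf(n - 8, n, 1 - p)

# Q2: P(X <= 5 | p = .55) computed directly and via the complement with p = .45.
direct = binom_cdf(5, n, p)
via_complement = 1 - binom_cdf(6, n, 1 - p)

print(f"{f8:.4f} {f4_complement:.4f} {direct:.4f} {via_complement:.4f}")
```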

Poisson Distribution

$X$ = number of event occurrences in a given interval of time, space, volume, etc., i.e. count data.

Probability that $x$ events will occur:
$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$

$X \sim POI(\lambda)$

Important Poisson features:
Mean: $E(X) = \lambda$
Variance: $Var(X) = \lambda$
When $\lambda$ is small, the distribution is right-skewed; as $\lambda$ increases ($\lambda \ge 10$), the distribution becomes approximately symmetric.

Example 7.4 Allergic reaction to anesthesia (Laake and Rottingen)
Occurrences of reaction follow a Poisson distribution, with about 12 incidents per year expected.

Q1. In the next year, what is the probability of seeing 3 incidents?
Solution: $X \sim POI(12)$
$f(3) = \frac{e^{-12}\, 12^3}{3!} = .00177$

Q2. What is the probability that at least 3 will have a reaction in the next year?
Solution 1:
$P(X \ge 3) = 1 - P(X \le 2) = 1 - F(2) = 1 - \{f(0) + f(1) + f(2)\}$
$= 1 - \left\{ \frac{e^{-12}\, 12^0}{0!} + \frac{e^{-12}\, 12^1}{1!} + \frac{e^{-12}\, 12^2}{2!} \right\}$
$= 1 - .00052225 = .99947775$
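The Poisson calculations in this example are easy to reproduce numerically. The following is a minimal Python sketch, not part of the original notes; `poisson_pdf` is a helper name introduced here.

```python
from math import exp, factorial


def poisson_pdf(x, lam):
    """Poisson PDF: e^(-lambda) * lambda^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


lam = 12  # expected incidents per year, from Example 7.4

f3 = poisson_pdf(3, lam)  # Q1: exactly 3 incidents

# Q2: P(X >= 3) = 1 - {f(0) + f(1) + f(2)}
p_at_least_3 = 1 - sum(poisson_pdf(k, lam) for k in range(3))

print(f"f(3) = {f3:.5f}, P(X >= 3) = {p_at_least_3:.7f}")
```

Using the complement keeps the sum to three terms; summing $f(3) + f(4) + \cdots$ directly would require truncating an infinite series.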

Solution 2: Table A.2 (p. A-6): Poisson PDF
$P(X \ge 3) = 1 - F(2) = 1 - (.0000 + .0001 + .0004) = .9995$

Solution 3: SAS:
POISSON(λ, x): CDF
PDF('POISSON', x, λ): PDF
CDF('POISSON', x, λ): CDF

3. Continuous Random Variables

A continuous $X$ can assume any value within its range. Within any interval, there are theoretically an infinite number of values.

Subareas of a histogram represent the frequency of occurrence of values within class intervals.
Total frequency of values between $a$ and $b$: add all subareas for the intervals from $a$ through $b$.
If the width of the class intervals is very small, then connecting the midpoints (creating a frequency polygon) gives a smooth curve.
If probability is shown on the y-axis and we have a smooth curve, the curve is the probability density function (PDF) $f(x)$.

$P(a < X \le b)$ = total area under $f(x)$ between $a$ and $b$, i.e. $\int_a^b f(t)\,dt$.

Cumulative distribution function (CDF) of $X$: $F(x) = \int_{-\infty}^{x} f(t)\,dt$

Note: Total area under $f(x)$ = 1, i.e. $\int_{-\infty}^{+\infty} f(t)\,dt = 1$, and $f(x) = \frac{d}{dx} F(x) = F'(x)$.

4. A special continuous distribution: the Normal or Gaussian

Normal PDF:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < +\infty$

$X \sim N(\mu, \sigma^2)$

Characteristics:
Distribution is symmetric around $\mu$
Mean = Median = Mode = $\mu$
Total area under the curve = 1, i.e. $\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1$
Area under the curve between $\mu - \sigma$ and $\mu + \sigma$ ≈ .68
Area under the curve between $\mu - 2\sigma$ and $\mu + 2\sigma$ ≈ .95
Area under the curve between $\mu - 3\sigma$ and $\mu + 3\sigma$ ≈ .997
$E(X) = \mu$: location parameter
$Var(X) = \sigma^2$: $\sigma$ is the scale parameter

Standard Normal Distribution:
$Z \sim N(0,1)$ has PDF $\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < +\infty$
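The three area statements hold for any $\mu$ and $\sigma$, which can be checked numerically. Below is a small Python sketch using the standard library's `statistics.NormalDist`; the values `mu = 10` and `sigma = 2` are arbitrary illustrative choices, and `normal_pdf` is a helper written here from the formula above.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal PDF written out from the formula in the notes."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)


mu, sigma = 10.0, 2.0  # arbitrary illustrative parameters
X = NormalDist(mu, sigma)

# Area within k standard deviations of the mean: F(mu + k*sigma) - F(mu - k*sigma).
areas = {k: X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")

# The hand-written formula should agree with the library's PDF.
print(normal_pdf(11.0, mu, sigma), X.pdf(11.0))
```

Changing `mu` or `sigma` leaves the three areas unchanged, which is the practical content of standardizing to $Z \sim N(0,1)$.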

Table A.3: Standard Normal Upper Tail Cumulative Probabilities

$P(Z \ge z_0) = 1 - \Phi(z_0)$, $z_0 \ge 0$
where $\Phi(z) = \int_{-\infty}^{z} \phi(t)\,dt$ is the CDF of $Z$.
For $z_0 \le 0$, by symmetry: $\Phi(z_0) = P(Z \le z_0) = P(Z \ge -z_0)$

Example 7.5 Given a variable that follows the standard normal distribution, i.e. $Z \sim N(0,1)$, what are $P(Z \ge 1)$ and $P(Z \le -1)$?
Solution: by Table A.3, $P(Z \ge 1) = 0.159$ and $P(Z \le -1) = P(Z \ge 1) = 0.159$

Example 7.6 Randomly pick a value $z$ from the standard normal distribution. P($z$ has a value between -2 and +2) = ?
Solution: Note that for a continuous distribution $P(X = x) = 0$, so including or excluding the endpoints does not change the probability.
$P(-2 \le Z \le +2) = P(-2 < Z < +2) = 1 - P(Z \ge 2) - P(Z \le -2) = 1 - 2 \cdot P(Z \ge 2) = 1 - 2 \cdot (.023) = 0.954$

How is the $N(0,1)$ distribution related to $N(\mu, \sigma^2)$?

If $X \sim N(\mu, \sigma^2)$ and $Z = \frac{X - \mu}{\sigma}$, then $Z \sim N(0,1)$.

Example 7.7 Systolic Blood Pressure (SBP) (p. 181 P&G)
$X$ = SBP for 18-74 year old males; $X \sim N(\mu, \sigma^2)$ with $\mu = 129$ mm Hg and $\sigma = 19.8$ mm Hg.

Find $x$, the cutoff for the upper 2.5% of the SBP distribution; i.e. find $x$ such that $P(X > x) = .025$.

Solution: By Table A.3 we know that $P(Z \ge 1.96) = .025$.
$\frac{x - \mu}{\sigma} = 1.96 \Rightarrow \frac{x - 129}{19.8} = 1.96 \Rightarrow x = (1.96)(19.8) + 129 = 167.8$

What proportion of men in this population have SBP > 150 mm Hg?
Solution: $P(X > 150) = P\left(\frac{X - \mu}{\sigma} > \frac{150 - 129}{19.8}\right) = P(Z > 1.06) = 0.145 \Rightarrow 14.5\%$
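Both parts of Example 7.7 are instances of standardizing and un-standardizing. A short Python sketch follows, assuming `statistics.NormalDist` as a stand-in for Table A.3; it is not the notes' own method.

```python
from statistics import NormalDist

mu, sigma = 129.0, 19.8  # SBP parameters from Example 7.7 (mm Hg)
X = NormalDist(mu, sigma)
Z = NormalDist()

# Cutoff for the upper 2.5%: find z with P(Z >= z) = .025, then x = mu + z * sigma.
z_cut = Z.inv_cdf(1 - 0.025)
x_cut = mu + z_cut * sigma

# Proportion with SBP > 150: standardize, then take the upper tail.
z_150 = (150 - mu) / sigma
p_over_150 = 1 - Z.cdf(z_150)

print(f"z = {z_cut:.2f}, cutoff = {x_cut:.1f} mm Hg")
print(f"z(150) = {z_150:.2f}, P(X > 150) = {p_over_150:.3f}")
```

The software result for $P(X > 150)$ differs slightly from the table value because the table rounds $z$ to 1.06 before the lookup.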

Example 7.8 Breath study (Diskin et al.)
$X$ = ammonia concentration in parts per billion (ppb); $\mu = 491$ ppb, $\sigma = 119$ ppb; i.e. $X \sim N(491, 119^2)$

$P(292 \le X \le 649) = ?$

Solution 1:
$P(292 \le X \le 649) = P\left(\frac{292 - 491}{119} \le \frac{X - \mu}{\sigma} \le \frac{649 - 491}{119}\right)$
$= P(-1.67 \le Z \le 1.33) = 1 - P(Z \le -1.67) - P(Z \ge 1.33) = 1 - .047 - .092 = .861$

Solution 2: SAS:
ProbNorm(x): N(0,1) CDF
PDF('NORMAL', x): N(0,1) PDF
PDF('NORMAL', x, μ, σ): N(μ, σ²) PDF (the arguments are the mean μ and the standard deviation σ)
CDF('NORMAL', x): N(0,1) CDF
CDF('NORMAL', x, μ, σ): N(μ, σ²) CDF (the arguments are the mean μ and the standard deviation σ)
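The interval probability in Example 7.8 can also be computed without standardizing by hand. The sketch below uses Python's `statistics.NormalDist` as an assumed alternative to the SAS functions listed above.

```python
from statistics import NormalDist

mu, sigma = 491.0, 119.0  # ammonia concentration parameters (ppb)
X = NormalDist(mu, sigma)

# Standardized endpoints, as in Solution 1.
z_low = (292 - mu) / sigma
z_high = (649 - mu) / sigma

# P(292 <= X <= 649) = F(649) - F(292); NormalDist takes the SD, not the variance.
p_interval = X.cdf(649) - X.cdf(292)

print(f"z_low = {z_low:.2f}, z_high = {z_high:.2f}, P = {p_interval:.3f}")
```

Like the SAS calls, `NormalDist` is parameterized by the standard deviation, so 119 is passed and not $119^2$.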

