Date post: 31-Jan-2022

v2020

Biomathematics 2

Probability, random variables. Continuous random variable. Normal, standard normal distribution.

Dr. Beáta Bugyi

associate professor

University of Pécs, Medical School

Department of Biophysics

2020


CONTINUOUS RANDOM VARIABLE

continuous: takes an uncountably infinite number of values; arises from measurement

Probability – discrete/continuous random variables

Consider a statistical experiment whose outcome corresponds to

A) a discrete random variable X in the range 0–10 with a finite number of equally likely outcomes (10)

Give the probability that the outcome is 6.

\[ P(X = 6) = \frac{1}{10} = 0.1 \]

B) a continuous random variable X in the range 0–10 (infinite number of outcomes)

Give the probability that the outcome is 6. Exactly 6, not 6.1, 6.01, …, 6.00000000001.

\[ P(X = 6) = \frac{1}{\infty} = 0 \]

For a continuous random variable, only intervals of values have a nonzero probability.
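The continuous case can be made concrete with a small sketch. It assumes, purely for illustration, that X is uniformly distributed on 0–10 (the handout does not specify the distribution), and the function name `interval_probability` is hypothetical. The probability of an interval around 6 shrinks to 0 as the interval narrows to the single point.

```python
def interval_probability(center, half_width, low=0.0, high=10.0):
    """P(center - half_width < X < center + half_width) for X uniform on [low, high].

    The uniform distribution is an assumption made for this illustration only.
    """
    left = max(low, center - half_width)
    right = min(high, center + half_width)
    return max(0.0, right - left) / (high - low)


# The interval around 6 gets narrower and narrower; its probability tends to 0.
for half_width in (1.0, 0.1, 0.001, 0.0):
    print(f"half width {half_width}: P = {interval_probability(6.0, half_width)}")
```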

NORMAL DISTRIBUTION

$N(\mu, \sigma)$, $\mu$ = mean, $\sigma$ = standard deviation

Probability density function (PDF)

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \]

Cumulative distribution function (CDF)

\[ F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right) dt \]

Graphical representation of the PDF and CDF of normal distributions.
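As a rough sketch, the PDF and CDF can be evaluated in Python with the standard library alone. The function names `normal_pdf` and `normal_cdf` are chosen here for illustration; the CDF integral has no elementary closed form, so the sketch uses the standard identity that expresses it through the error function `math.erf`.

```python
import math


def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma): exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma), written with the error function instead of the integral."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# N(60, 10): the density peaks at the mean and half of the area lies below it.
print(normal_pdf(60, 60, 10))
print(normal_cdf(60, 60, 10))
```

The standard library class `statistics.NormalDist` offers the same two quantities through its `pdf` and `cdf` methods.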

The normal distribution is defined by its mean ($\mu$) and standard deviation ($\sigma$).

The PDF has a characteristic bell shape.

The PDF is symmetric about the mean of the distribution.


The inflection points of the PDF lie one standard deviation from the mean, at $x = \mu \pm \sigma$.

The width (full width at half maximum) of the PDF is proportional to the standard deviation; the larger the standard deviation, the larger the width.

Probability is given by the area under the PDF (see examples below).
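The proportionality between the width and the standard deviation can be made explicit. The following short derivation starts from the PDF of the normal distribution and uses $h$, a symbol introduced here, for the half width at half maximum.

```latex
\begin{align*}
f(\mu \pm h) &= \tfrac{1}{2}\, f(\mu) \\
\exp\left(-\frac{h^2}{2\sigma^2}\right) &= \frac{1}{2} \\
h &= \sigma \sqrt{2 \ln 2} \\
\text{FWHM} = 2h &= 2\sqrt{2 \ln 2}\,\sigma \approx 2.355\,\sigma
\end{align*}
```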

Example 1

The test results of students in Subject 1 follow a normal distribution with a mean of 60% and a standard deviation of 10%: $N(\mu, \sigma) = N(60, 10)$. Represent the following probabilities graphically.

Q1.1: What is the probability that a student scores 60%? $P(X = 60) = ?$

Q1.2: What is the probability that a student scores less than 60%? $P(X < 60) = ?$

Q1.3: What is the probability that a student scores more than 60%? $P(X > 60) = ?$

Q1.4: What is the probability that a student scores less than 80%? $P(X < 80) = ?$

Q1.5: What is the probability that a student scores between 60% and 80%? $P(60 < X < 80) = ?$

Example 2

The test results of students in Subject 2 follow a normal distribution with a mean of 62% and a standard deviation of 8%: $N(\mu, \sigma) = N(62, 8)$.

Question:

How can we work with different normal distributions? Do we need the PDF of each and every normal distribution?

Answer:

Normal distributions can be standardized: infinitely many normal distributions map to one standardized distribution (the standard normal distribution).

How to standardize normal distributions?

$N(\mu, \sigma)$

z score:

\[ z = \frac{x - \mu}{\sigma} \]

z score: how many standard deviations ($\sigma$) a given value ($x$) lies from the mean ($\mu$)
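A minimal sketch of the standardization step follows; the function name `z_score` is chosen here for illustration. It shows that scores from the two subjects become directly comparable once they are expressed in standard deviations from their own means.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# 80% in Subject 1, N(60, 10), and 78% in Subject 2, N(62, 8),
# are both two standard deviations above their respective means.
print(z_score(80, 60, 10))
print(z_score(78, 62, 8))
```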

STANDARD NORMAL DISTRIBUTION

$SN(0, 1)$, $\mu = 0$, $\sigma = 1$

Probability density function (PDF)

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad \text{where } \mu = 0 \text{ and } \sigma = 1: \quad f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \]

Cumulative distribution function (CDF)

\[ F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt \]

Graphical representation of the PDF and CDF of the standard normal distribution.


Z table

summarizes the CDF of the standard normal distribution
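The entries of a Z table can be reproduced with a few lines of Python. This is only a sketch: it assumes a table printed to four decimals, and the function name `z_table_entry` is hypothetical. `statistics.NormalDist` is part of the Python standard library (3.8 and later).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_table_entry(z):
    """CDF of the standard normal distribution at z, rounded to 4 decimals."""
    return round(STANDARD_NORMAL.cdf(z), 4)


# A few entries of the kind used in the worked examples.
for z in (0.00, 0.37, 0.38, 2.00, -2.12, -2.13):
    print(f"z = {z:+.2f}  F(z) = {z_table_entry(z):.4f}")
```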

Example 1

The test results of students in Subject 1 follow a normal distribution with a mean of 60% and a standard deviation of 10%: $N(\mu, \sigma) = N(60, 10)$. Standardize the normal distribution. Give the probabilities by using the Z table.

Q1.1: What is the probability that a student scores 60%? $P(X = 60) = ?$

$P(X = 60) = 0$

Q1.2: What is the probability that a student scores less than 60%? $P(X < 60) = ?$

\[ z = \frac{x - \mu}{\sigma} = \frac{60 - 60}{10} = 0.00 \]

$P(X < 60) = 0.5 \rightarrow 50\,\%$

Q1.3: What is the probability that a student scores more than 60%? $P(X > 60) = ?$

$P(X > 60) + P(X < 60) = 1$

$P(X > 60) = 1 - P(X < 60) = 1 - 0.5 = 0.5 \rightarrow 50\,\%$

Q1.4: What is the probability that a student scores less than 80%? $P(X < 80) = ?$

\[ z = \frac{x - \mu}{\sigma} = \frac{80 - 60}{10} = 2.00 \]

$P(X < 80) = 0.9772 \rightarrow 97.72\,\%$

Q1.5: What is the probability that a student scores between 60% and 80%? $P(60 < X < 80) = ?$

$P(60 < X < 80) = P(X < 80) - P(X < 60) = 0.9772 - 0.5 = 0.4772 \rightarrow 47.72\,\%$

Example 2


The test results of students in Subject 2 follow a normal distribution with a mean of 62% and a standard deviation of 8%: $N(\mu, \sigma) = N(62, 8)$. Give the probabilities by using the Z table.

Q2.1: What is the probability that a student scores less than 65%? $P(X < 65) = ?$

\[ z = \frac{x - \mu}{\sigma} = \frac{65 - 62}{8} = +0.375 \]

If a value is not listed in the table, use the following approximation:

\[ +0.375 = \frac{0.37 + 0.38}{2} \]

\[ P(X < 65) = \frac{0.6443 + 0.6480}{2} = 0.6462 \rightarrow 64.62\,\% \]

Q2.2: What is the probability that a student scores less than 45%? $P(X < 45) = ?$

\[ z = \frac{x - \mu}{\sigma} = \frac{45 - 62}{8} = -2.125 \]

If a value is not listed in the table, use the following approximation:

\[ -2.125 = \frac{-2.12 + (-2.13)}{2} \]

\[ P(X < 45) = \frac{0.0170 + 0.0166}{2} = 0.0168 \rightarrow 1.68\,\% \]

Q2.3: What is the probability that a student scores between 45% and 65%? $P(45 < X < 65) = ?$

\[ P(45 < X < 65) = P(X < 65) - P(X < 45) = 0.6462 - 0.0168 = 0.6294 \rightarrow 62.94\,\% \]
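Averaging two neighbouring table entries is the midpoint case of linear interpolation. The sketch below generalizes it to any z between two table entries; this generalization is an addition, not part of the handout. The helper `table_value` is a hypothetical stand-in for reading a four-decimal printed Z table.

```python
import math
from statistics import NormalDist


def table_value(z):
    """Stand-in for a printed Z table: standard normal CDF rounded to 4 decimals."""
    return round(NormalDist().cdf(z), 4)


def interpolated_probability(z):
    """Linear interpolation between the two nearest two-decimal table entries."""
    lower = math.floor(z * 100) / 100   # nearest table entry at or below z
    upper = round(lower + 0.01, 2)      # next table entry above it
    weight = (z - lower) / 0.01         # 0 at lower, 1 at upper, 0.5 at the midpoint
    return table_value(lower) + weight * (table_value(upper) - table_value(lower))


p_65 = interpolated_probability((65 - 62) / 8)   # Q2.1, z = +0.375
p_45 = interpolated_probability((45 - 62) / 8)   # Q2.2, z = -2.125
p_between = p_65 - p_45                          # Q2.3
print(p_65, p_45, p_between)
```

For comparison, evaluating `NormalDist(62, 8).cdf(65)` directly gives nearly the same value as the interpolated table lookup, so the approximation loses very little accuracy here.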

Q2.4: What is the median of the students' scores? $P(X < x) = 0.5$, $x = ?$

$P(X < x) = 0.5 \rightarrow z = 0.00$

\[ z = \frac{x - \mu}{\sigma} \rightarrow 0.00 = \frac{x - 62}{8} \rightarrow x = 62 \]

Note: The mean of a data set following a normal distribution is equal to its median.

Q2.5: What is the first quartile of the students' scores? $P(X < x) = 0.25$, $x = ?$

$P(X < x) = 0.25 \rightarrow z = -0.675$

\[ z = \frac{x - \mu}{\sigma} \rightarrow -0.675 = \frac{x - 62}{8} \rightarrow x = 56.6 \]

Q2.6: What is the third quartile of the students' scores? $P(X < x) = 0.75$, $x = ?$

$P(X < x) = 0.75 \rightarrow z = 0.675$

\[ z = \frac{x - \mu}{\sigma} \rightarrow 0.675 = \frac{x - 62}{8} \rightarrow x = 67.4 \]
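Questions Q2.4 to Q2.6 read the Z table in reverse: from a probability to a z score, and then back to a score through $x = \mu + z\sigma$. A minimal sketch of this inverse lookup follows; it uses `NormalDist.inv_cdf` from the Python standard library, and the function name `score_at_probability` is chosen here for illustration.

```python
from statistics import NormalDist


def score_at_probability(p, mu, sigma):
    """Value x with P(X < x) = p for X ~ N(mu, sigma), computed as x = mu + z * sigma."""
    z = NormalDist().inv_cdf(p)   # inverse lookup in the standard normal CDF
    return mu + z * sigma


median = score_at_probability(0.50, 62, 8)
first_quartile = score_at_probability(0.25, 62, 8)
third_quartile = score_at_probability(0.75, 62, 8)
print(round(median, 1), round(first_quartile, 1), round(third_quartile, 1))
```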

Q2.7: Find what percentage of the data lies within mean ± 1 × standard deviation, mean ± 2 × standard deviation, and mean ± 3 × standard deviation.
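One way to approach Q2.7 numerically is to take the difference of two CDF values for each range. This is a sketch with a function name chosen here for illustration; the result does not depend on the particular mean and standard deviation, since standardization maps every such range to $-k < z < k$.

```python
from statistics import NormalDist


def probability_within(k, mu=62.0, sigma=8.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * probability_within(k):.2f} %")
```

The three values are approximately 68.27 %, 95.45 % and 99.73 %.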


IMPORTANCE OF NORMAL DISTRIBUTION

CENTRAL LIMIT THEOREM

Example 3

In a population of persons, let X = life expectancy of a person (in years). The distribution of X has a mean of 72 years and a standard deviation of 18.2 years.

$X$ = life expectancy of a person in a population (years)

$X = x_{\text{person 1}}, x_{\text{person 2}}, \ldots$

We choose samples from the population, each consisting of n persons, and by finding the average lifetime in each sample ($\bar{x}$, sample mean) we obtain the distribution of $\bar{X}$.

Sampling distribution of sample means: the distribution of the sample means calculated from all possible random samples of a specific size (n) taken from a population.

$\bar{X}$ = average life expectancy of persons in a sample (years)

$\bar{X} = \bar{x}_{\text{sample 1}}, \bar{x}_{\text{sample 2}}, \ldots$

Properties of the distribution of the sample means

\[ \mu_{\bar{X}} = \mu_X \]

\[ \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} \quad \text{(standard error of the mean, SEM)} \]

Characteristics of the distribution: Central limit theorem (CLT)

POPULATION | SAMPLE
$X = x$: life expectancy of a person in a population | $\bar{X} = \bar{x}$: average life expectancy of persons in a sample
normal distribution | normal distribution for any n
not normal/not known distribution | CLT: if n is large enough ($n \geq 30$), approximated by a normal distribution; the larger n, the better the approximation

http://onlinestatbook.com/stat_sim/sampling_dist/index.html
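The properties of the sampling distribution can also be explored with a small simulation. The sketch below assumes a hypothetical non-normal population: a uniform distribution chosen so that its mean is 72 and its standard deviation is 18.2. This choice is for illustration only and is not taken from the handout. With a fixed random seed, it draws 5000 samples of n = 40 and compares the mean and standard deviation of the sample means with $\mu_X$ and $\sigma_X / \sqrt{n}$.

```python
import math
import random
import statistics

POP_MEAN = 72.0
POP_SD = 18.2
# A uniform distribution on [a, b] has SD (b - a) / sqrt(12), so half width = SD * sqrt(3).
HALF_WIDTH = POP_SD * math.sqrt(3.0)


def draw_person(rng):
    """One value from the hypothetical (uniform, non-normal) population."""
    return rng.uniform(POP_MEAN - HALF_WIDTH, POP_MEAN + HALF_WIDTH)


def simulate_sample_means(n, num_samples, rng):
    """Mean of each of num_samples random samples of size n."""
    return [statistics.fmean(draw_person(rng) for _ in range(n))
            for _ in range(num_samples)]


rng = random.Random(0)  # fixed seed so the run is repeatable
means = simulate_sample_means(n=40, num_samples=5000, rng=rng)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_sem = POP_SD / math.sqrt(40)

print(f"mean of sample means: {mean_of_means:.2f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.2f} (predicted SEM {predicted_sem:.2f})")
```

A histogram of `means` would show the bell shape predicted by the CLT even though the population itself is flat.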

Q3.1: Consider that X has a normal distribution: $N_X(72, 18.2)$. What is the distribution of $\bar{X}$ if n = 10 or n = 40?

n = 10: normal; n = 40: normal

Q3.2: Consider that the distribution of X is not known/not normal. What is the distribution of $\bar{X}$ if n = 10 or n = 40?

n = 10: not known/not normal; n = 40: approximated by normal

Q3.3: What are the mean of $\bar{X}$ and the standard deviation of $\bar{X}$ (standard error of the mean) if n = 40?

\[ \mu_{\bar{X}} = \mu_X = 72 \]

\[ \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{18.2}{\sqrt{40}} = 2.88 \]

$N_{\bar{X}}(72, 2.88)$

Q3.4: Find $P(X < 70)$ and $P(\bar{X} < 70)$.

$P(X < 70)$: What is the probability that the life expectancy of a person in the population is less than 70 years?

$N_X(72, 18.2)$

\[ z = \frac{x - \mu}{\sigma} = \frac{70 - 72}{18.2} = -0.11 \]

$P(X < 70) = 0.4562 \rightarrow 45.62\,\%$

$P(\bar{X} < 70)$: What is the probability that the average life expectancy of persons in a sample is less than 70 years?

$N_{\bar{X}}(72, 2.88)$

\[ z = \frac{x - \mu}{\sigma} = \frac{\bar{x} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu}{\sigma_X / \sqrt{n}} = \frac{70 - 72}{2.88} = -0.69 \approx -0.7 \]

$P(\bar{X} < 70) = 0.2420 \rightarrow 24.2\,\%$

