
Continuous Random Variables

PUBH 7401: Fundamentals of Biostatistical Inference

Eric F. Lock
UMN Division of Biostatistics, SPH


09/27/2018


Types of Random Variables

Discrete Random Variable

The support of a discrete random variable either
1. constitutes a finite set, or
2. is an infinite sequence of values in which there is a first element, a second element, and so on (i.e., not an interval)

E.g., D = {0, 1/2, 1} or D = {0, 1, 2, 3, . . . }

Continuous Random Variable

A random variable is continuous if both of the following apply:
1. Its support consists either of all numbers in a single interval on the number line or of all numbers in a union of disjoint such intervals
2. No specific value of the variable has positive probability, that is, P(X = c) = 0 for any single value c

E.g., D = [0, 1]: all values between 0 and 1
E.g., D = (−∞, ∞): any real number, positive or negative


Failure of the pmf for Continuous RVs

- Why can’t we use the same principles that we used for discrete random variables?

- In particular, why can’t we assign a probability to each possible value x?

- If the support D is uncountable, no pmf can assign a positive probability p(x) > 0 to every x ∈ D and still satisfy $\sum_{x \in D} p(x) < \infty$ (let alone $= 1$): a sum of uncountably many positive numbers is infinite


Motivating Example: Heights

- Suppose we are interested in the heights in inches of all male students at the U

- Suppose we measured everyone’s height to 10 decimal places

- We could then provide the proportion of subjects at each recorded height (a pmf)

- Likely there would be only one (maybe two) people at each height

- This would not be a very informative summary measure


Density Histogram

[Figure: First 500 observations; x-axis: Height (inches); y-axis: Probability]


Frequency Histogram

- Instead, we could provide the number of students within certain height ranges (from which you could calculate the proportion within each range)

- The height of each bar is proportional to the probability of a height in that range

- The scale of the y-axis depends on the “bin size”, i.e., the width of each range

- If we halve the bin width, the bar heights decrease by a factor of 2, on average

- As the bin width gets smaller and smaller, the histogram converges to zero almost everywhere
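The halving effect can be checked with a short simulation. This is a sketch only: the heights are hypothetical draws from a normal distribution with mean 70 in and standard deviation 3 in (an assumption, not the data plotted in these slides), and `bin_counts` is an illustrative helper.

```python
import random

def bin_counts(data, lo, hi, width):
    """Count observations in consecutive bins [lo, lo + width), [lo + width, lo + 2*width), ... up to hi."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for x in data:
        k = int((x - lo) // width)  # index of the bin containing x
        if 0 <= k < n_bins:         # ignore values outside [lo, hi)
            counts[k] += 1
    return counts

# Hypothetical heights (inches): 500 draws from Normal(mean=70, sd=3), seeded for reproducibility.
rng = random.Random(7401)
heights = [rng.gauss(70, 3) for _ in range(500)]

for width in (2.0, 1.0, 0.5, 0.25):
    counts = bin_counts(heights, 50, 90, width)
    mean_height = sum(counts) / len(counts)  # average bar height over the fixed range [50, 90)
    print(f"bin width {width:>4}: tallest bar {max(counts):>3}, average bar {mean_height:.3f}")
```

Because the same observations are spread over twice as many bins, the average bar height over a fixed range halves exactly each time the width is halved; the tallest bar also tends to shrink, though less regularly.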


Frequency Histogram

[Figure: four frequency histograms of height; x-axis: height; y-axis: Frequency]


Density Histogram

- A more informative summary measure would be the density of points

- The solution is to make the area of each bar (rather than its height) proportional to the probability

- Area = length × height. Here length is the bin width, so height must be probability/(bin width)

- In statistics, density = probability/length of interval

- The smooth curve obtained in the limit as the bin width goes to zero (with ever more observations) is known as the probability density function (pdf)
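A sketch of the density scaling, assuming simulated heights from a Normal(70, 3) model (hypothetical, not the data in the figures): each bar's height is count/(n × bin width), so the bar areas sum to 1 at every bin width, and with a large sample the bars sit close to the pdf of the model that generated the data. `density_histogram` is an illustrative helper; `statistics.NormalDist` is from the Python standard library (3.8+).

```python
from statistics import NormalDist

def density_histogram(data, lo, hi, width):
    """Bar heights chosen so that area = width * height = proportion of observations in the bin."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for x in data:
        k = int((x - lo) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    n = len(data)
    return [c / (n * width) for c in counts]  # density = probability / length of interval

# Hypothetical data: 100,000 draws from Normal(70, 3); an assumption, not the lecture's heights.
model = NormalDist(70, 3)
heights = model.samples(100_000, seed=7401)

for width in (2.0, 1.0, 0.5):
    dens = density_histogram(heights, 50, 90, width)
    area = sum(d * width for d in dens)  # total area of all bars
    mids = [50 + (k + 0.5) * width for k in range(len(dens))]
    gap = max(abs(d - model.pdf(m)) for d, m in zip(dens, mids))
    print(f"width {width}: total area {area:.4f}, max gap to model pdf {gap:.4f}")
```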


Density Histogram

[Figure: four density histograms of height; x-axis: height; y-axis: Density]


Definition of Probability Density Function

Probability Density Function (pdf)

Let X be a continuous rv. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a < b,

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

- The probability that X takes on a value in the interval [a, b] is the area under the graph of the density function f(x) between a and b.

- One implication is that $P(X = c) = \int_c^c f(x)\,dx = 0$ for every single value c (more intuitively, P(c − ε ≤ X ≤ c + ε) shrinks to zero as ε → 0)


Heuristic for Probability Density

- Think of a pdf as a “smoothed” density histogram

- The density f(x) is NOT the probability P(X = x)

- The density does describe the relative likelihood of the values in the support of the rv

- If ε is small, f(x) × ε ≈ the probability of an observation in an interval of width ε around x, e.g. $P(x - \varepsilon/2 \le X \le x + \varepsilon/2) \approx f(x)\,\varepsilon$
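To see the approximation f(x) × ε at work, here is a small sketch that compares it with the exact interval probability. It assumes a Normal(70, 3) model chosen purely for illustration and uses `pdf` and `cdf` from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Assumed example distribution: Normal with mean 70 and sd 3 (a hypothetical heights model).
X = NormalDist(70, 3)
x = 72.0
for eps in (2.0, 0.5, 0.1, 0.01):
    exact = X.cdf(x + eps / 2) - X.cdf(x - eps / 2)  # P(x - eps/2 <= X <= x + eps/2)
    approx = X.pdf(x) * eps                          # density times interval width
    print(f"eps={eps:<5} exact={exact:.6f} approx={approx:.6f} ratio={approx / exact:.5f}")
```

The ratio approaches 1 as ε shrinks, while both the exact probability and f(x) × ε go to 0: the density itself is not a probability.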


Review of Integration

[Figure: Integration; the area under a curve f(x), plotted against x]

The definite integral of a function f(x) evaluated from a to b, which we denote as $\int_a^b f(x)\,dx$, gives the area under the curve.

a and b tell us over what interval to evaluate the area; dx tells us with respect to which variable to take the integral.


Review of Integration: Summation

- Think of integration in terms of summation (technically, the limit of a sum).

- Let a = x_1 < x_2 < . . . < x_{n+1} = b be a sequence of points from a to b

- $\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} f(x_i)(x_{i+1} - x_i)$, with the approximation becoming exact in the limit as the spacing between points goes to zero

- Think of the integral ∫ as a sum ∑ of the function f(x) evaluated over a sequence of equally spaced points between a and b, and of dx as the distance between successive points in the sequence
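The sum above can be computed directly. A minimal sketch (the helper name `left_riemann_sum` is ours), using f(x) = x² on [0, 1], whose exact area is 1/3:

```python
def left_riemann_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] by sum_{i=1}^{n} f(x_i)(x_{i+1} - x_i)
    using n + 1 equally spaced points x_1 = a, ..., x_{n+1} = b."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]  # x_1, ..., x_{n+1} (stored 0-based)
    return sum(f(xs[i]) * (xs[i + 1] - xs[i]) for i in range(n))

# The area under f(x) = x**2 on [0, 1] is exactly 1/3; the sum approaches it as n grows.
for n in (10, 100, 1000):
    print(n, left_riemann_sum(lambda x: x ** 2, 0.0, 1.0, n))
```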


Properties of the pdf

- f(x) ≥ 0 for all x ∈ (−∞, +∞)

- $\int_{-\infty}^{\infty} f(x)\,dx = 1$

- Is f(x) ≤ 1 for all x ∈ (−∞, +∞)?
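One way to probe the last question numerically, assuming the uniform density on [0, 1/2] as a test case (chosen here for illustration): its height is 2 on the whole support, yet its total area is still 1, because the constraint is on the area, not the height. `uniform_pdf` is an illustrative helper.

```python
def uniform_pdf(x, a=0.0, b=0.5):
    """Density of the Uniform(a, b) distribution: 1/(b - a) on [a, b], 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

# Midpoint-rule estimate of the total area over [-1, 1], an interval covering the support [0, 0.5].
n = 100_000
h = 2.0 / n
area = sum(uniform_pdf(-1.0 + (i + 0.5) * h) for i in range(n)) * h
print(uniform_pdf(0.25), round(area, 6))
```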


Wind turbine example: 4.4

Let X denote the vibratory stress (psi) on a wind turbine blade at a particular wind speed in a wind tunnel. As a model for the distribution of X we use the Rayleigh distribution with pdf

$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad x > 0$$

- Verify that f(x; θ) is a legitimate pdf

- Suppose θ = 100. What is the probability that X is at most 200? Less than 200?
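A numerical check of both questions, as a sketch: `rayleigh_pdf` and `midpoint_integral` are illustrative helpers (not from the source), the integral to +∞ is truncated at 20θ (the mass beyond it is e^{−200}), and the reference value 1 − e^{−2} follows from the antiderivative −exp{−x²/(2θ²)} of the pdf.

```python
import math

def rayleigh_pdf(x, theta):
    """Rayleigh density f(x; theta) = (x / theta^2) exp(-x^2 / (2 theta^2)) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return (x / theta ** 2) * math.exp(-x ** 2 / (2 * theta ** 2))

def midpoint_integral(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

theta = 100.0
# Total area should be 1 for a legitimate pdf (f is also nonnegative by construction).
total = midpoint_integral(lambda x: rayleigh_pdf(x, theta), 0.0, 20 * theta)
# P(X <= 200); P(X < 200) is the same number because P(X = 200) = 0.
p_200 = midpoint_integral(lambda x: rayleigh_pdf(x, theta), 0.0, 200.0)
print(total, p_200, 1 - math.exp(-2))
```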


Example #1: Rayleigh Dist. Density Plot

[Figure: Rayleigh Distribution; x-axis: Vibratory Stress (psi); y-axis: Density]


GPA example

The grade point averages (GPAs) of graduating seniors at a college are distributed as a continuous random variable X with pdf

$$f(x) = k\{1 - (x - 3)^2\}, \quad 2 \le x \le 4$$

(and f(x) = 0 otherwise). Find the value of k.
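One way to work the exercise: the pdf must integrate to 1 over its support [2, 4]. Substituting u = x − 3:

```latex
\int_2^4 k\{1 - (x - 3)^2\}\,dx
  = k \int_{-1}^{1} (1 - u^2)\,du
  = k \left[ u - \frac{u^3}{3} \right]_{-1}^{1}
  = k \left( 2 - \frac{2}{3} \right)
  = \frac{4k}{3} = 1
  \quad\Longrightarrow\quad k = \frac{3}{4}
```

With k = 3/4 > 0 the pdf is also nonnegative on [2, 4], since 1 − (x − 3)² ≥ 0 there.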


Definition of Cumulative Distribution Function

Cumulative Distribution Function: The cumulative distribution function F(x) for a continuous rv X is defined for every number x by

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$$


cdf Example #1

Find the cdf of the Rayleigh distribution with pdf given by

$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad x > 0$$
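One way to derive it, via the substitution u = y²/(2θ²), so that du = (y/θ²) dy:

```latex
F(x; \theta) = \int_0^x \frac{y}{\theta^2} e^{-y^2/(2\theta^2)}\,dy
  = \int_0^{x^2/(2\theta^2)} e^{-u}\,du
  = 1 - e^{-x^2/(2\theta^2)}, \qquad x > 0,
\qquad\text{so}\qquad
F(x; \theta) =
\begin{cases}
  0, & x \le 0 \\
  1 - e^{-x^2/(2\theta^2)}, & x > 0
\end{cases}
```

For θ = 100 this gives F(200) = 1 − e^{−2} ≈ 0.8647.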


Properties of the cdf

- For all x ∈ (−∞, +∞), 0 ≤ F(x) ≤ 1

- $F(-\infty) = \lim_{x \to -\infty} P(X \le x) = 0$

- $F(\infty) = \lim_{x \to \infty} P(X \le x) = 1$

- If x < y then F(x) ≤ F(y) (F is nondecreasing)


Using the cdf to Determine Probabilities Between Two Values

- Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any numbers a ≤ b, P(a ≤ X ≤ b) = F(b) − F(a).

- Because P(X = c) = 0, P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)

[Figure: three panels, each showing a density f(x) plotted against x]
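The identity P(a ≤ X ≤ b) = F(b) − F(a) can be checked numerically. This sketch assumes X is standard normal (chosen for illustration) and uses `pdf` and `cdf` from Python's `statistics.NormalDist`; the interval [a, b] = [−1, 2] is arbitrary.

```python
from statistics import NormalDist

X = NormalDist(0, 1)  # assumed example: standard normal
a, b = -1.0, 2.0

# Area under f between a and b via the midpoint rule
n = 100_000
h = (b - a) / n
area = h * sum(X.pdf(a + (i + 0.5) * h) for i in range(n))

via_cdf = X.cdf(b) - X.cdf(a)  # F(b) - F(a)
print(f"integral of f over [a, b]: {area:.8f}")
print(f"F(b) - F(a):               {via_cdf:.8f}")
```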


cdf Example #2

A family of pdfs that has been used to approximate the distribution of income, city population, and size of firms is the Pareto family. The family has two parameters, k and θ, both > 0, and the pdf is given by

$$f(x; k, \theta) = \frac{k\theta^k}{x^{k+1}}, \quad x \ge \theta$$

- Find the cdf.

- For θ = 1 and k = 2, find P(1 ≤ X ≤ 3)
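A worked version of this exercise: for x ≥ θ, integrating the pdf gives F(x) = ∫_θ^x kθ^k y^{−(k+1)} dy = 1 − (θ/x)^k, and F(x) = 0 for x < θ. The sketch below encodes that formula with exact fractions (`pareto_cdf` is our illustrative name) and evaluates P(1 ≤ X ≤ 3) = F(3) − F(1) for θ = 1, k = 2.

```python
from fractions import Fraction

def pareto_cdf(x, k, theta):
    """cdf of the Pareto(k, theta) distribution: 1 - (theta/x)^k for x >= theta, 0 otherwise."""
    x, theta = Fraction(x), Fraction(theta)
    if x < theta:
        return Fraction(0)
    return 1 - (theta / x) ** k

# P(1 <= X <= 3) = F(3) - F(1) with theta = 1, k = 2
prob = pareto_cdf(3, k=2, theta=1) - pareto_cdf(1, k=2, theta=1)
print(prob, float(prob))
```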


Obtaining the pdf from the cdf

- If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F′(x) exists, F′(x) = f(x)

- Let F(x) = 1 − exp(−λx) for x ≥ 0 (and F(x) = 0 for x < 0), with λ > 0. Find the density.
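Differentiating gives f(x) = F′(x) = λe^{−λx} for x > 0 (and 0 for x < 0; F has a corner at x = 0, where F′ does not exist). A quick numerical check with a central difference, using λ = 2 as an arbitrary illustrative value:

```python
import math

lam = 2.0  # illustrative rate parameter (assumption)

def F(x):
    """Exponential cdf: 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def f(x):
    """Candidate density obtained by differentiating F: lam * exp(-lam * x) for x > 0."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0

h = 1e-6
for x in (0.1, 0.5, 1.0, 3.0):
    numeric = (F(x + h) - F(x - h)) / (2 * h)  # central-difference estimate of F'(x)
    print(f"x={x}: F'(x) ~ {numeric:.6f}, lam*exp(-lam*x) = {f(x):.6f}")
```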


Obtaining the pdf from the cdf

Find the density corresponding to the cdf given by:

$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x^2/4 & \text{if } 0 \le x < 2 \\ 1 & \text{if } 2 \le x \end{cases}$$
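Differentiating each piece: F′(x) exists everywhere except at x = 2, where the left derivative is 1 and the right derivative is 0. The value of f at that single point can be set arbitrarily (e.g., 0) without changing any probability.

```latex
f(x) = F'(x) =
\begin{cases}
  0, & x < 0 \\
  \dfrac{x}{2}, & 0 \le x < 2 \\
  0, & x > 2
\end{cases}
\qquad\text{check: } \int_0^2 \frac{x}{2}\,dx = \left[\frac{x^2}{4}\right]_0^2 = 1
```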


Definition of Percentile

Think growth charts or standardized tests. We say that a student scored in the 85th percentile of the ACT if she did better than 85% of all other students on the test. A formal definition of percentile follows.

Percentile Definition: Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by

$$p = F\{\eta(p)\} = \int_{-\infty}^{\eta(p)} f(y)\,dy$$

Defining a percentile for a discrete distribution is problematic. Why?

The median is the 50th percentile, η(0.5).
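When no closed form is handy, the equation p = F{η(p)} can be solved numerically. A sketch using bisection (a standard root-finding method chosen here; `percentile` and `pareto_cdf_k2` are hypothetical helper names), checked against two cases with known answers: the Pareto cdf with θ = 1, k = 2, whose median is √2, and the standard normal via `statistics.NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist

def percentile(cdf, p, lo, hi, tol=1e-12):
    """Solve cdf(eta) = p for eta by bisection, assuming cdf is continuous and
    nondecreasing with cdf(lo) <= p <= cdf(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid  # eta(p) lies to the right of mid
        else:
            hi = mid  # eta(p) lies at or to the left of mid
    return (lo + hi) / 2

def pareto_cdf_k2(x):
    """Pareto cdf with theta = 1, k = 2: F(x) = 1 - 1/x^2 for x >= 1, else 0."""
    return 1 - 1 / x ** 2 if x >= 1 else 0.0

# Median of this Pareto distribution: solving 1 - 1/eta^2 = 0.5 gives eta = sqrt(2).
print(percentile(pareto_cdf_k2, 0.5, 1.0, 100.0), math.sqrt(2))

# 97.5th percentile of the standard normal, compared with the library's inverse cdf.
Z = NormalDist()
print(percentile(Z.cdf, 0.975, -10.0, 10.0), Z.inv_cdf(0.975))
```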


Percentile Example #1

Find the median of the Rayleigh distribution.

Rayleigh pdf:

$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad x > 0$$
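Using the Rayleigh cdf F(x; θ) = 1 − e^{−x²/(2θ²)} for x > 0 (obtained by integrating the pdf), set F equal to 1/2 and solve for η:

```latex
\frac{1}{2} = F\{\eta(0.5)\} = 1 - e^{-\eta^2/(2\theta^2)}
\;\Longrightarrow\;
\frac{\eta^2}{2\theta^2} = \ln 2
\;\Longrightarrow\;
\eta(0.5) = \theta\sqrt{2\ln 2} \approx 1.177\,\theta
```

For θ = 100, as in the wind turbine example, the median stress is about 117.7 psi.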


Definition of Expectation for Continuous RVs

- As with discrete random variables, expectation is a measure of the center of the distribution

- It is the long-run average of many observations from the distribution

Definition of Expectation: The expected or mean value of a continuous rv X with pdf f(x) is

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

For a continuous random variable, does the expectation have to be in the support?
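A numerical sketch of E(X) for the Rayleigh model with θ = 100 (our choice, matching the wind turbine example). The comparison value θ√(π/2) is the known Rayleigh mean; the helpers are illustrative and the upper limit 20θ stands in for +∞.

```python
import math

def rayleigh_pdf(x, theta):
    """Rayleigh density (x / theta^2) * exp(-x^2 / (2 theta^2)) for x > 0, else 0."""
    return (x / theta ** 2) * math.exp(-x ** 2 / (2 * theta ** 2)) if x > 0 else 0.0

def midpoint_integral(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

theta = 100.0
# E(X) = integral of x f(x) dx, truncated at 20 * theta.
mean = midpoint_integral(lambda x: x * rayleigh_pdf(x, theta), 0.0, 20 * theta)
print(mean, theta * math.sqrt(math.pi / 2))
```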


Connection to Definition with Discrete RVs

Recall that if X is discrete,

$$E(X) = \sum_{x \in D} x\,P(X = x)$$

Again, think of the integral as approximating a sum:

$$\int x f(x)\,dx \approx \sum_{i=1}^{n} x_i f(x_i)(x_{i+1} - x_i) \approx \sum_{i=1}^{n} x_i P(x_i \le X \le x_{i+1})$$
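The right-hand sum can be computed directly from a cdf. This sketch uses the exponential cdf F(x) = 1 − e^{−λx} with λ = 2 (an illustrative choice), whose mean is 1/λ = 0.5; the grid on [0, 20] and the helper name `discretized_mean` are assumptions.

```python
import math

LAM = 2.0  # illustrative rate (assumption); E(X) = 1 / LAM = 0.5

def expo_cdf(x):
    """Exponential cdf F(x) = 1 - exp(-LAM * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0

def discretized_mean(cdf, a, b, n):
    """sum_{i=1}^{n} x_i * P(x_i <= X <= x_{i+1}) on an equally spaced grid from a to b."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_i = a + i * dx
        total += x_i * (cdf(x_i + dx) - cdf(x_i))  # x_i times the probability of the i-th bin
    return total

for n in (100, 1_000, 100_000):
    print(n, discretized_mean(expo_cdf, 0.0, 20.0, n))  # approaches E(X) = 0.5 as n grows
```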


Definition of Variance and Other Functions for Continuous RVs

- For any function h(X) of the continuous rv X with pdf f(x), the expectation of h(X) is given by

$$E\{h(X)\} = \int_{-\infty}^{\infty} h(x) f(x)\,dx$$

- In particular, the variance of the continuous rv X, denoted by V(X), with mean µ and pdf f(x), is $V(X) = E\{(X - \mu)^2\} = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$

- As with discrete rvs, V(X) = E(X²) − {E(X)}²
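Both expressions for V(X) can be compared numerically. The sketch below uses the GPA density f(x) = (3/4){1 − (x − 3)²} on [2, 4] (taking k = 3/4, the value that makes it integrate to 1) and a generic `expect` helper (a hypothetical name) that integrates h(x)f(x) by the midpoint rule. For this density the mean is 3 and the variance is 1/5.

```python
def gpa_pdf(x):
    """GPA density with k = 3/4: 0.75 * (1 - (x - 3)^2) on [2, 4], 0 elsewhere."""
    return 0.75 * (1 - (x - 3) ** 2) if 2 <= x <= 4 else 0.0

def expect(h, pdf, a, b, n=100_000):
    """E{h(X)} = integral of h(x) f(x) dx over [a, b], approximated by the midpoint rule."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) * pdf(a + (i + 0.5) * step) for i in range(n))

mu = expect(lambda x: x, gpa_pdf, 2.0, 4.0)
var_definition = expect(lambda x: (x - mu) ** 2, gpa_pdf, 2.0, 4.0)   # E{(X - mu)^2}
var_shortcut = expect(lambda x: x ** 2, gpa_pdf, 2.0, 4.0) - mu ** 2  # E(X^2) - {E(X)}^2
print(mu, var_definition, var_shortcut)
```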


Expectation and Variance Example #1

Find the mean and variance of the Pareto distribution with the pdf below, θ = 1, and k = 3.

$$f(x; k, \theta) = \frac{k\theta^k}{x^{k+1}}, \quad x \ge \theta$$
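With θ = 1 and k = 3 the pdf is f(x) = 3/x⁴ for x ≥ 1, and both moments are finite:

```latex
\begin{aligned}
E(X)   &= \int_1^\infty x \cdot \frac{3}{x^4}\,dx   = 3\left[-\frac{1}{2x^2}\right]_1^\infty = \frac{3}{2},\\
E(X^2) &= \int_1^\infty x^2 \cdot \frac{3}{x^4}\,dx = 3\left[-\frac{1}{x}\right]_1^\infty = 3,\\
V(X)   &= E(X^2) - \{E(X)\}^2 = 3 - \frac{9}{4} = \frac{3}{4}.
\end{aligned}
```

More generally, the same integrals give E(X) = kθ/(k − 1) for k > 1 and E(X²) = kθ²/(k − 2) for k > 2; for smaller k the corresponding moment is infinite.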

