Continuous Random Variables
PUBH 7401: Fundamentals of Biostatistical Inference
Eric F. Lock
UMN Division of Biostatistics, SPH
09/27/2018
PUBH 7401: Fundamentals of Biostatistical Inference Continuous Random Variables
Types of Random Variables
Discrete Random Variable
The support of a discrete random variable either
1. constitutes a finite set, or
2. is an infinite sequence of values in which there is a first element, a second element, and so on (i.e., not an interval).
E.g., D = {0, 1/2, 1} or D = {0, 1, 2, 3, . . . }
Continuous Random Variable
A random variable is continuous if both of the following apply:
1. Its support consists of all numbers in a single interval on the number line (or in a disjoint union of such intervals).
2. No specific value of the variable has positive probability, that is, P(X = c) = 0 for any single value c.
E.g., D = [0, 1]: all values between 0 and 1
E.g., D = (−∞, ∞): any real number
Failure of the pmf for Continuous RVs
- Why can’t we use the same principles that we used for discrete random variables?
- In particular, why can’t we assign a probability to each possible value x?
- If the number of possible values of x is uncountable, there exists no pmf with p(x) > 0 for every x ∈ D such that $\sum_{x \in D} p(x) < \infty$, where D is the support
Motivating Example: Heights
- Suppose we are interested in the heights in inches of all male students at the U
- Suppose we measured everyone’s height and did so to 10 decimal places
- We could provide the proportion of subjects at each recorded height (pmf)
- Likely there would only be one (maybe two) people at each height
- This would not be a very informative summary measure
Density Histogram
[Figure: proportion of subjects at each recorded height, first 500 observations. x-axis: Height (inches); y-axis: Probability]
Frequency Histogram
- Instead we could provide the number of students within certain height ranges (from which you could calculate the proportion within each range)
- The height of each bar is proportional to the probability of a height in that range
- The scale of the y-axis changes based on the “bin sizes”, i.e., the width of each range
- If we divide each bin size in half, the height of the bars decreases by a factor of 2, on average
- As the bin size gets smaller and smaller, the histogram converges to zero almost everywhere
Frequency Histogram
[Figure: four frequency histograms of the height data. x-axis: height (60 to 80); y-axis: Frequency (0 to 8000)]
Density Histogram
- A more informative summary measure would be the density of points
- The solution is to make the area of each bar, rather than its height, proportional to the probability
- Area = length × height. Here length is the bin width, so height must be probability/(bin width).
- In statistics, density = probability/(length of interval)
- The smooth curve that is the limit of the density histogram as the bin size goes to zero is known as the probability density function
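The contrast between frequency and density histograms can be sketched numerically. The block below is only an illustration: the simulated heights (normal with mean 70 and sd 3), the bin range, and the helper name `histogram` are assumptions, not part of the course data. It shows that halving the bin width roughly halves the tallest count, while the tallest density stays about the same and the total area stays at 1.

```python
import random

random.seed(7401)

# Hypothetical data: 100,000 simulated heights in inches. The normal shape,
# mean 70 and sd 3 are assumptions for illustration, not values from the slides.
heights = [random.gauss(70.0, 3.0) for _ in range(100_000)]


def histogram(data, lo, hi, width):
    """Return (counts, densities) for equal-width bins covering [lo, hi)."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for x in data:
        if lo <= x < hi:
            # Guard against floating-point rounding at the upper edge
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    # density = probability / bin width, with probability estimated by count / n
    densities = [c / len(data) / width for c in counts]
    return counts, densities


for width in (2.0, 1.0, 0.5):
    counts, densities = histogram(heights, 50.0, 90.0, width)
    area = sum(d * width for d in densities)
    print(f"width={width}: tallest count={max(counts)}, "
          f"tallest density={max(densities):.3f}, total area={area:.3f}")
```

The density column is the quantity that settles toward a smooth curve as the bins shrink, whereas the counts keep shrinking.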
Density Histogram
[Figure: four density histograms of the height data. x-axis: height (60 to 80); y-axis: Density (0.00 to 0.15)]
Definition of Probability Density Function
Probability Density Function (pdf)
Let X be a continuous rv. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a < b
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
- The probability that X takes on a value in the interval [a, b] is the area under the graph of the density function f(x).
- An implication is that P(X = c) = 0 for any single value c (more colloquially, the probability of landing in a shrinking interval around c approaches zero)
Heuristic for Probability Density
- Think of a pdf as a “smoothed” density histogram
- The density is NOT the probability
- The density does describe the relative likelihood of the values in the support of the RV
- f(x) × ε ≈ the probability of an observation falling in an interval of length ε around the point x, if ε is small.
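The last heuristic can be checked numerically. The sketch below assumes a standard normal density as the example (the slides do not fix a distribution here) and uses an interval of length ε centered at x; the function names are illustrative.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


x = 1.0
for eps in (0.5, 0.1, 0.01):
    # Exact probability of the interval of length eps centered at x
    exact = normal_cdf(x + eps / 2.0) - normal_cdf(x - eps / 2.0)
    approx = normal_pdf(x) * eps
    print(f"eps={eps}: exact={exact:.6f}, f(x)*eps={approx:.6f}")
```

The two columns agree more closely as ε shrinks, which is the sense in which the density describes relative likelihood without being a probability itself.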
Review of Integration
[Figure: Integration. Plot of f(x) against x from −3 to 3; y-axis from 0.0 to 0.4]
The definite integral of a function f(x) evaluated from a to b, which we denote as $\int_a^b f(x)\,dx$, gives the area under the curve between a and b.
a and b tell us over what interval to evaluate the area; dx tells us with respect to which variable to take the integral.
Review of Integration: Summation
- Think of integration in terms of summation (technically, the limit of a sum).
- Let $x_1, x_2, \dots, x_{n+1}$ be an increasing sequence of points between a and b
- $\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} f(x_i)(x_{i+1} - x_i)$
- Think of the integral $\int$ as a sum $\sum$ of the function f(x) evaluated over a sequence of equally spaced points between a and b, and dx as the distance between successive points in the sequence
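The sum-as-integral idea can be sketched in a few lines. This is a minimal illustration, assuming a standard normal density and the interval [−1, 1] as the example; `riemann_sum` is a hypothetical helper name.

```python
import math


def riemann_sum(f, a, b, n):
    """Left Riemann sum of f over [a, b] using n equally spaced subintervals."""
    dx = (b - a) / n                      # distance between successive points
    return sum(f(a + i * dx) * dx for i in range(n))


def normal_pdf(x):
    """Standard normal density (assumed example function)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


# Reference value of P(-1 <= Z <= 1) for a standard normal, via the error function
exact = math.erf(1.0 / math.sqrt(2.0))

for n in (4, 40, 400):
    approx = riemann_sum(normal_pdf, -1.0, 1.0, n)
    print(f"n={n}: sum={approx:.6f}, error={abs(approx - exact):.2e}")
```

As n grows, dx shrinks and the sum approaches the area under the curve.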
Properties of the pdf
- f(x) ≥ 0 for all x ∈ (−∞, +∞)
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- Is f(x) ≤ 1 for all x ∈ (−∞, +∞)?
Wind turbine example: 4.4
Let X denote the vibratory stress (psi) on a wind turbine blade at a particular wind speed in a wind tunnel. As a model for the distribution of X we use the Rayleigh distribution with pdf
$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad \text{for } x > 0$$
- Verify that f(x; θ) is a legitimate pdf
- Suppose θ = 100. What is the probability that X is at most 200? Less than 200?
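One possible way to work this example, offered as a sketch rather than the official solution: the substitution u = x²/(2θ²) handles both parts.

```latex
% Substituting u = x^2 / (2\theta^2), so du = (x / \theta^2)\,dx:
\int_0^\infty \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}\,dx
  = \int_0^\infty e^{-u}\,du = 1,
\qquad f(x;\theta) \ge 0 \text{ for } x > 0.

% With \theta = 100:
P(X \le 200) = \int_0^{200} \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}\,dx
  = 1 - \exp\{-200^2/(2 \cdot 100^2)\} = 1 - e^{-2} \approx 0.865.

% X is continuous, so P(X = 200) = 0 and
P(X < 200) = P(X \le 200) \approx 0.865.
```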
Example #1: Rayleigh Dist. Density Plot
[Figure: Rayleigh Distribution density. x-axis: Vibratory Stress (psi), 0 to 500; y-axis: Density, 0.000 to 0.006]
GPA example
The grade point averages (GPAs) for graduating seniors at a college are distributed as a continuous random variable X with pdf
$$f(x) = k\{1 - (x - 3)^2\}, \quad 2 \le x \le 4$$
Find the value of k
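A sketch of one way to find k, using the requirement that the pdf integrates to 1 over its support:

```latex
\int_2^4 k\{1 - (x-3)^2\}\,dx
  = k\left[x - \frac{(x-3)^3}{3}\right]_2^4
  = k\left\{\left(4 - \tfrac{1}{3}\right) - \left(2 + \tfrac{1}{3}\right)\right\}
  = \frac{4k}{3} = 1
\quad\Longrightarrow\quad k = \frac{3}{4}.
```

With k = 3/4 the density is also nonnegative on 2 ≤ x ≤ 4, since (x − 3)² ≤ 1 there.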
Definition of Cumulative Distribution Function
Cumulative Distribution Function
The cumulative distribution function F(x) for a continuous rv X is defined for every number x by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$$
cdf Example #1
Find the cdf of the Rayleigh distribution with pdf given by
$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad \text{for } x > 0$$
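Integrating the pdf from 0 to x suggests the closed form F(x) = 1 − exp{−x²/(2θ²)} for x > 0 (and F(x) = 0 otherwise). The sketch below compares that candidate with a numerical integral of the pdf; the helper names, the choice θ = 100, and the midpoint rule are assumptions made for illustration.

```python
import math


def rayleigh_pdf(x, theta):
    """Rayleigh density from the slide; zero outside x > 0."""
    if x <= 0:
        return 0.0
    return (x / theta**2) * math.exp(-x**2 / (2.0 * theta**2))


def rayleigh_cdf(x, theta):
    """Candidate closed-form cdf: 1 - exp{-x^2 / (2 theta^2)} for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x**2 / (2.0 * theta**2))


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


theta = 100.0
for x in (50.0, 100.0, 200.0, 400.0):
    numeric = integrate(lambda y: rayleigh_pdf(y, theta), 0.0, x)
    print(f"x={x}: numeric={numeric:.6f}, closed form={rayleigh_cdf(x, theta):.6f}")
```

Agreement between the two columns is a check on the algebra, not a substitute for the derivation.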
Properties of the cdf
- For all x ∈ (−∞, +∞), 0 ≤ F(x) ≤ 1
- $F(-\infty) = \lim_{x \to -\infty} P(X \le x) = 0$
- $F(\infty) = \lim_{x \to \infty} P(X \le x) = 1$
- If x < y then F(x) ≤ F(y)
Using the cdf to Determine Probabilities Between Two Values
- Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any numbers a ≤ b, P(a ≤ X ≤ b) = F(b) − F(a).
- Because P(X = c) = 0, P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)
[Figure: three panels plotting f(x) against x from −3 to 3; y-axis from 0.0 to 0.4]
cdf Example #2
A family of pdfs that has been used to approximate the distribution of income, city population, and size of firms is the Pareto family. The family has two parameters, k and θ, both > 0, and the pdf is given by
$$f(x; k, \theta) = \frac{k\theta^k}{x^{k+1}}, \quad x \ge \theta$$
- Find the cdf.
- For θ = 1 and k = 2 find P(1 ≤ X ≤ 3)
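A sketch of one way to work both parts, integrating the pdf from the lower end of the support:

```latex
% For x \ge \theta:
F(x) = \int_\theta^x \frac{k\theta^k}{y^{k+1}}\,dy
     = \left[-\frac{\theta^k}{y^k}\right]_\theta^x
     = 1 - \left(\frac{\theta}{x}\right)^k,
\qquad F(x) = 0 \text{ for } x < \theta.

% With \theta = 1 and k = 2:
P(1 \le X \le 3) = F(3) - F(1) = \left(1 - \tfrac{1}{9}\right) - 0 = \tfrac{8}{9} \approx 0.889.
```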
Obtaining the pdf from the cdf
- If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F′(x) exists, F′(x) = f(x)
- Let F(x) = 1 − exp(−λx) for x ≥ 0 (and F(x) = 0 for x < 0). Find the density.
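A sketch of the differentiation, assuming λ > 0 and that F applies on x ≥ 0 with F(x) = 0 below zero (the slide does not state the domain):

```latex
f(x) = F'(x) = \frac{d}{dx}\{1 - \exp(-\lambda x)\} = \lambda \exp(-\lambda x), \qquad x > 0,
```

with f(x) = 0 for x < 0. Under these assumptions this is the density of the exponential distribution with rate λ.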
Obtaining the pdf from the cdf
Find the density of the cdf given by:
$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x^2/4 & \text{if } 0 \le x < 2 \\ 1 & \text{if } 2 \le x \end{cases}$$
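Differentiating each piece gives a candidate density, sketched below together with a check that it integrates to 1.

```latex
f(x) = F'(x) =
\begin{cases}
  x/2 & \text{if } 0 < x < 2 \\
  0   & \text{if } x < 0 \text{ or } x > 2
\end{cases}
\qquad
\int_0^2 \frac{x}{2}\,dx = \left[\frac{x^2}{4}\right]_0^2 = 1.
```

F is not differentiable at x = 2 (the slope is 1 from the left and 0 from the right), but the value assigned to f at a single point does not change any probability.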
Definition of Percentile
Think growth charts or standardized tests. We say that a student scored in the 85th percentile of the ACT if she did better than 85% of all other students on the test. The formal definition of a percentile is below.
Percentile Definition
Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by
$$p = F\{\eta(p)\} = \int_{-\infty}^{\eta(p)} f(y)\,dy$$
Defining a percentile for a discrete distribution is problematic. Why?
The median is the 50th percentile.
Percentile Example #1
Find the median of the Rayleigh distribution
Rayleigh pdf
$$f(x; \theta) = \frac{x}{\theta^2} \exp\{-x^2/(2\theta^2)\}, \quad \text{for } x > 0$$
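A sketch of one route to the median: integrate the pdf to get the cdf, then solve F{η(0.5)} = 0.5.

```latex
% Rayleigh cdf for x > 0, obtained by integrating the pdf from 0 to x:
F(x) = 1 - \exp\{-x^2/(2\theta^2)\}

% Set F\{\eta(0.5)\} = 0.5 and solve for \eta(0.5):
\exp\{-\eta(0.5)^2/(2\theta^2)\} = 0.5
\;\Longrightarrow\; \eta(0.5)^2 = 2\theta^2 \ln 2
\;\Longrightarrow\; \eta(0.5) = \theta\sqrt{2\ln 2} \approx 1.177\,\theta.
```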
Definition of Expectation for Continuous RVs
- As with discrete random variables, expectation is a measure of the center of the distribution
- It is the long-run average of many observations from the distribution
Definition of Expectation
The expected or mean value of a continuous rv X with pdf f(x) is
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
For a continuous random variable, does the expectation have to be in the support?
Connection to Definition with Discrete RVs
Recall if X is discrete
$$E(X) = \sum_{x \in D} x P(X = x)$$
Again, think of the integral as approximating a sum:
$$\int x f(x)\,dx \approx \sum_{i=1}^{n} x_i f(x_i)(x_{i+1} - x_i) \approx \sum_{i=1}^{n} x_i P(x_i \le X \le x_{i+1})$$
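Both sums can be computed directly. The sketch below uses the Rayleigh density from the wind turbine example with θ = 100; the closed-form cdf, the grid size, and the truncation of the support at 1000 are assumptions made for illustration. For reference, the Rayleigh mean is θ√(π/2), about 125.3 here.

```python
import math

theta = 100.0  # same value as the wind turbine example


def pdf(x):
    """Rayleigh density for x > 0."""
    return (x / theta**2) * math.exp(-x**2 / (2.0 * theta**2))


def cdf(x):
    """Rayleigh cdf for x > 0 (closed form obtained by integrating the pdf)."""
    return 1.0 - math.exp(-x**2 / (2.0 * theta**2))


# Grid x_1 < x_2 < ... < x_{n+1} on [0, 1000]; the upper limit is an assumed
# truncation point, beyond which the Rayleigh density is negligible.
n = 2000
upper = 1000.0
xs = [upper * i / n for i in range(n + 1)]

# Sum of x_i * f(x_i) * (x_{i+1} - x_i)
density_sum = sum(xs[i] * pdf(xs[i]) * (xs[i + 1] - xs[i]) for i in range(n))

# Sum of x_i * P(x_i <= X <= x_{i+1}), a discrete-style expectation
discrete_sum = sum(xs[i] * (cdf(xs[i + 1]) - cdf(xs[i])) for i in range(n))

print(f"density sum={density_sum:.3f}, discrete sum={discrete_sum:.3f}")
```

The second sum treats X as if it were rounded down to the grid, so it sits slightly below the first; the gap shrinks as the grid gets finer.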
Definition of Variance and Other Functions for Continuous RVs
- For any function h(X) of the continuous rv X with pdf f(x), the expectation of h(X) is given by
$$E\{h(X)\} = \int_{-\infty}^{\infty} h(x) f(x)\,dx$$
- In particular, the variance of the continuous rv X with mean µ and pdf f(x), denoted by V(X), is E{(X − µ)²}
- As with a discrete rv, V(X) = E(X²) − {E(X)}²
Expectation and Variance Example #1
Find the mean and variance of the Pareto distribution with pdf below and θ = 1 and k = 3
$$f(x; k, \theta) = \frac{k\theta^k}{x^{k+1}}, \quad x \ge \theta$$
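A sketch of the calculation, using E{h(X)} = ∫ h(x) f(x) dx with h(x) = x and h(x) = x²:

```latex
% With \theta = 1 and k = 3 the pdf is f(x) = 3/x^4 for x \ge 1.
E(X)   = \int_1^\infty x \cdot \frac{3}{x^4}\,dx = \left[-\frac{3}{2x^2}\right]_1^\infty = \frac{3}{2}
E(X^2) = \int_1^\infty x^2 \cdot \frac{3}{x^4}\,dx = \left[-\frac{3}{x}\right]_1^\infty = 3
V(X)   = E(X^2) - \{E(X)\}^2 = 3 - \frac{9}{4} = \frac{3}{4}
```

Both integrals converge because k = 3 > 2; for k ≤ 2 the second moment of the Pareto distribution is infinite.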