10/23/2017
Note Packet #14
Frequency Analysis & Probability Plots
CEE 3710
October 20, 2017
Frequency Analysis
• Process by which engineers estimate the magnitude of design
events (e.g., the 100‐year flood) or assess the risk associated with
various outcomes/events
• Uses sample data to hypothesize a probability model
and infer characteristics of the population of interest
• Works with any probability distribution
Motivation:
You need to design a levee to withstand the 100‐year flood (X0.99).
Given the sample data {x1, x2, …, xn} below, corresponding to the
magnitudes of n = 50 annual maximum flood flows, what is X̂0.99?
[Figure: Annual Maximum Discharge (cfs) versus Year, 1960 to 2010]
(1) Compute sample moments, descriptive statistics
(2) Select an appropriate model (probability distribution) of
annual maximum flood flows
Considerations:
‐ What does data look like? Is data skewed?
‐ Are variables strictly positive? Continuous or Discrete?
[Figure: Histogram; x-axis: Annual Maximum Discharge (cfs), y-axis: Frequency]
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 1503.9$ cfs
$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 821.0^2$ cfs$^2$
(3) Fit selected model to data using MOM to estimate distribution
parameters (Point estimates of parameters)
[Figure: Histogram; x-axis: Annual Maximum Discharge (cfs), y-axis: Frequency]
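$\mu_X = \int x \, f_X(x) \, dx = \bar{x} = 1503.9$ cfs
$\sigma_X^2 = \int (x - \mu_X)^2 f_X(x) \, dx = s_x^2 = 821.0^2$ cfs$^2$
If $X \sim$ Lognormal$(\mu_Y, \sigma_Y^2)$:
$f_X(x) = \frac{1}{x\sqrt{2\pi\sigma_Y^2}} \exp\left[-\frac{1}{2}\left(\frac{\ln(x) - \mu_Y}{\sigma_Y}\right)^2\right]$ for x > 0; 0 otherwise
$\sigma_Y = \left[\ln\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right)\right]^{1/2} = 0.577$
$\mu_Y = \ln[\mu_X] - \frac{1}{2}\sigma_Y^2 = 7.161$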
(4) Assess goodness‐of‐fit: How well does the model represent the data?
How good are our parameter estimates?
(How good is our estimate of the 100‐year event?)
Compute confidence intervals
Construct Quantile‐Quantile Plot
(5) Compute X̂0.99 (or other values of interest)
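As a rough sketch of step (5), assuming the lognormal model and the parameter estimates quoted in step (3) (μY = 7.161, σY = 0.577), the 100‐year event is the 0.99 quantile, X̂0.99 = exp(μY + z0.99 σY). The function name `lognormal_quantile` is illustrative and does not come from the notes.

```python
from math import exp
from statistics import NormalDist

def lognormal_quantile(p, mu_y, sigma_y):
    """Quantile x_p of a lognormal X, where Y = ln(X) ~ Normal(mu_y, sigma_y^2)."""
    z_p = NormalDist().inv_cdf(p)      # standard normal quantile z_p
    return exp(mu_y + z_p * sigma_y)   # back-transform from log space

# Parameter estimates as quoted in step (3) of the notes
mu_y, sigma_y = 7.161, 0.577

x_99 = lognormal_quantile(0.99, mu_y, sigma_y)
print(f"Estimated 100-year flood: {x_99:.0f} cfs")
```

The same function returns any other design event by changing p, for example p = 0.98 for the 50‐year flood.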
General Procedure:
1. Obtain a sample of size n, compute sample moments and
descriptive statistics
2. Hypothesize underlying probability density function (pdf) of
the population
3. Apply method of moments and compute parameters of the
assumed pdf (i.e. fit probability model to the data)
4. Assess fit of probability model by graphing the fitted
cumulative distribution function (cdf) relative to sample data
(empirical cdf, probability plot, or quantile‐quantile plot)
5. Use the fitted cdf to obtain percentiles (design events) or
probabilities associated with outcomes of interest
• Smooth line/curve corresponds to probability model (representation of population)
• Dots/points correspond to observed sample data
Empirical Cumulative Distribution Function (CDF)
• Representation of the cumulative distribution function based
on the relative magnitude of observations in a sample of size n
• Obtained by graphing the plotting positions versus the ranked
observations
• Plotting Position (pi): provides an estimate of the cumulative
probability associated with the observation of rank i (x(i))
pi = i/(n+1)
In other words, pi = P[X ≤ x(i)] and thus, x(i) represents an
empirical percentile (or quantile)
Example: Construct an empirical CDF for the following sample data:
{90, 105, 65, 135, 95, 115, 80, 73, 76, 88}
i x(i) pi
1 65 0.091
2 73 0.182
3 76 0.273
4 80 0.364
5 88 0.455
6 90 0.545
7 95 0.636
8 105 0.727
9 115 0.818
10 135 0.909
[Figure: Empirical CDF; pi versus x]
Note: Construction of the empirical CDF does not require consideration of the form of the underlying probability distribution for the random variable/population; however, we can assess the goodness‐of‐fit of a probability distribution by plotting the assumed/fitted cdf (model) on the same figure as the empirical cdf (observed).
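The empirical CDF table above can be reproduced with a few lines of code. This is a minimal sketch using the plotting position pi = i/(n+1) from the notes; the function name `empirical_cdf` is an illustrative choice.

```python
def empirical_cdf(sample):
    """Return (rank i, ranked observation x_(i), plotting position p_i = i/(n+1))."""
    n = len(sample)
    ranked = sorted(sample)  # rank 1 = smallest observation
    return [(i, x, i / (n + 1)) for i, x in enumerate(ranked, start=1)]

data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
rows = empirical_cdf(data)

for i, x, p in rows:
    print(f"{i:2d} {x:4d} {p:.3f}")
```

Plotting the third column against the second gives the empirical CDF; no distribution has been assumed at this point.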
Example: Use the method of moments to fit a normal distribution
to the data above, and then assess how well it represents the
data by plotting the fitted CDF relative to the empirical CDF.
i x(i) pi zpi x̂(i) = x̂pi
1 65 0.091 -1.335 63.8
2 73 0.182 -0.908 72.9
3 76 0.273 -0.605 79.4
4 80 0.364 -0.349 84.8
5 88 0.455 -0.114 89.8
6 90 0.545 0.114 94.6
7 95 0.636 0.349 99.6
8 105 0.727 0.605 105.0
9 115 0.818 0.908 111.5
10 135 0.909 1.335 120.6
[Figure: Empirical CDF vs. Fitted Normal CDF; pi versus x; series: Sample Data, Fitted Normal]
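The fitted normal quantiles in the table can be checked with the sketch below. It assumes the method‐of‐moments estimates for the normal distribution are simply the sample mean and the sample standard deviation (n − 1 denominator), so that x̂pi = x̄ + zpi·s. Variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev

data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
n = len(data)

# Method of moments for the normal distribution: mu = x-bar, sigma = s
x_bar = mean(data)
s = stdev(data)                 # sample standard deviation (n - 1 denominator)
fitted = NormalDist(x_bar, s)

normal_rows = []
for i, x in enumerate(sorted(data), start=1):
    p = i / (n + 1)                  # plotting position p_i
    z = NormalDist().inv_cdf(p)      # standard normal quantile z_pi
    x_hat = fitted.inv_cdf(p)        # fitted quantile, equal to x_bar + z * s
    normal_rows.append((i, x, p, z, x_hat))
    print(f"{i:2d} {x:4d} {p:.3f} {z:7.3f} {x_hat:6.1f}")
```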
Example: Use the method of moments to fit a lognormal
distribution to the data, and then assess how well it represents
the data by plotting the fitted CDF relative to the empirical CDF.
i x(i) pi zpi x̂(i) = x̂pi
1 65 0.091 -1.335 66.3
2 73 0.182 -0.908 73.1
3 76 0.273 -0.605 78.3
4 80 0.364 -0.349 83.0
5 88 0.455 -0.114 87.5
6 90 0.545 0.114 92.2
7 95 0.636 0.349 97.3
8 105 0.727 0.605 103.1
9 115 0.818 0.908 110.5
10 135 0.909 1.335 121.7
[Figure: Empirical CDF vs. Fitted Lognormal CDF; pi versus x; series: Sample Data, Fitted LN]
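A similar sketch for the lognormal fit, assuming the real‐space method‐of‐moments relations σY = [ln(1 + s²/x̄²)]^(1/2) and μY = ln(x̄) − σY²/2, with fitted quantiles x̂pi = exp(μY + zpi·σY). Variable names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev

data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
n = len(data)
x_bar, s = mean(data), stdev(data)

# Method of moments: match the real-space mean and variance of X
sigma_y = sqrt(log(1 + (s / x_bar) ** 2))
mu_y = log(x_bar) - 0.5 * sigma_y ** 2

lognormal_rows = []
for i, x in enumerate(sorted(data), start=1):
    p = i / (n + 1)                    # plotting position p_i
    z = NormalDist().inv_cdf(p)        # standard normal quantile z_pi
    x_hat = exp(mu_y + z * sigma_y)    # fitted lognormal quantile
    lognormal_rows.append((i, x, p, z, x_hat))
    print(f"{i:2d} {x:4d} {p:.3f} {z:7.3f} {x_hat:6.1f}")
```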
Example: Use the method of moments to fit a Gumbel distribution to the data, and then assess how well it represents the data by plotting the fitted CDF relative to the empirical CDF.
i x(i) pi x̂(i) = x̂pi
1 65 0.091 68.1
2 73 0.182 73.8
3 76 0.273 78.3
4 80 0.364 82.4
5 88 0.455 86.6
6 90 0.545 90.9
7 95 0.636 95.8
8 105 0.727 101.6
9 115 0.818 109.3
10 135 0.909 121.6
[Figure: Empirical CDF vs. Fitted Gumbel CDF; pi versus x; series: Sample Data, Fitted Gumbel]
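The notes do not write out the Gumbel estimators, so the sketch below assumes the usual method‐of‐moments relations for the Gumbel (extreme value type I, maxima) distribution: scale α = s√6/π and location ξ = x̄ − 0.5772α, with quantile function x̂p = ξ − α·ln(−ln p). Variable names are illustrative.

```python
from math import log, pi, sqrt
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
n = len(data)
x_bar, s = mean(data), stdev(data)

# Assumed method-of-moments estimators for the Gumbel distribution
alpha = s * sqrt(6) / pi              # scale parameter
xi = x_bar - EULER_GAMMA * alpha      # location parameter

gumbel_rows = []
for i, x in enumerate(sorted(data), start=1):
    p = i / (n + 1)                      # plotting position p_i
    x_hat = xi - alpha * log(-log(p))    # fitted Gumbel quantile
    gumbel_rows.append((i, x, p, x_hat))
    print(f"{i:2d} {x:4d} {p:.3f} {x_hat:6.1f}")
```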
Quantile‐Quantile (Q‐Q) Plots
• Constructed by plotting ranked observations ( x(i) ) against the
fitted percentiles, or quantiles ( x̂(i) )
Observed or Empirical Quantiles vs. Modeled or Fitted Quantiles
• Sample data should fall approximately on a straight line (1:1) if
the fitted distribution adequately describes the true population
Probability Plots
• Sample data is plotted so that the observations should fall
approximately on a straight line if a selected distribution describes
the true population
– however, unlike Q‐Q plots, assessment of the selected
distribution (model) does not depend on estimated parameters
• Can be created with special commercially available probability
papers for some distributions (normal, lognormal, Gumbel), or the
general technique developed here (easy with a spreadsheet)
• Constructed by plotting ranked observations ( x(i) ) against
standardized percentiles
Example: Reconsider the sample data above. Use a probability plot to assess how well the normal distribution fit using the method of moments represents the sample data.
i x(i) pi zpi
1 65 0.091 -1.335
2 73 0.182 -0.908
3 76 0.273 -0.605
4 80 0.364 -0.349
5 88 0.455 -0.114
6 90 0.545 0.114
7 95 0.636 0.349
8 105 0.727 0.605
9 115 0.818 0.908
10 135 0.909 1.335
Example: Reconsider the sample data above. Use a probability plot to assess how well the lognormal distribution fit using the method of moments represents the sample data.
i x(i) ln( x(i)) pi zpi
1 65 4.174 0.091 -1.335
2 73 4.290 0.182 -0.908
3 76 4.330 0.273 -0.605
4 80 4.382 0.364 -0.349
5 88 4.477 0.455 -0.114
6 90 4.500 0.545 0.114
7 95 4.554 0.636 0.349
8 105 4.654 0.727 0.605
9 115 4.745 0.818 0.908
10 135 4.905 0.909 1.335
Example: Reconsider the sample data above. Use a probability plot to assess how well the Gumbel distribution fit using the method of moments represents the sample data.
i x(i) pi -ln(-ln(pi))
1 65 0.091 -0.875
2 73 0.182 -0.533
3 76 0.273 -0.262
4 80 0.364 -0.012
5 88 0.455 0.238
6 90 0.545 0.501
7 95 0.636 0.794
8 105 0.727 1.144
9 115 0.818 1.606
10 135 0.909 2.351
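The plotting coordinates for the three probability plots can be sketched as follows. Each list holds (standardized percentile, ordinate) pairs: zpi versus x(i) for the normal, zpi versus ln(x(i)) for the lognormal, and −ln(−ln(pi)) versus x(i) for the Gumbel. The list names are illustrative; no distribution parameters are estimated here.

```python
from math import log
from statistics import NormalDist

data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
n = len(data)

normal_pts, lognormal_pts, gumbel_pts = [], [], []
for i, x in enumerate(sorted(data), start=1):
    p = i / (n + 1)                  # plotting position p_i
    z = NormalDist().inv_cdf(p)      # standard normal variate z_pi
    w = -log(-log(p))                # Gumbel reduced variate
    normal_pts.append((z, x))            # normal probability plot
    lognormal_pts.append((z, log(x)))    # lognormal probability plot
    gumbel_pts.append((w, x))            # Gumbel probability plot

for (z, x), (_, ln_x), (w, _) in zip(normal_pts, lognormal_pts, gumbel_pts):
    print(f"{x:4d} {ln_x:.3f} {z:7.3f} {w:7.3f}")
```

If the points in one of these lists fall close to a straight line, the corresponding distribution is a plausible model for the sample.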