
Choosing a Probability Distribution Water Resource Risk Analysis Davis, CA 2009.


Choosing a Probability Distribution

Water Resource Risk Analysis

Davis, CA

2009


Probability x Consequence

• Quantitative risk assessment requires you to use probability

• Sometimes you will estimate the probability of an event

• Sometimes you will use distributions to
– Describe data
– Model variability
– Represent our uncertainty

• What distribution do you use?


Probability: Language of Random Variables

• Constant

• Variables
– Some things vary predictably
– Some things vary unpredictably

• Random variables
– It can be something known but not known by us


Checklist for Choosing a Distribution From Some Data

1. Can you use your data?

2. Understand your variable

a) Source of data

b) Continuous/discrete

c) Bounded/unbounded

d) Meaningful parameters

e) Univariate/multivariate

f) 1st or 2nd order

3. Look at your data—plot it

4. Use theory

5. Calculate statistics

6. Use previous experience

7. Distribution fitting

8. Expert opinion

9. Sensitivity analysis


First!

• Do you have data?

• If so, do you need a distribution or can you just use your data?

• Answer depends on the question(s) you’re trying to answer as well as your data


Use Data

• If your data are representative of the population germane to your problem, use them

• One problem could be bounding data
– What are the true min & max?

• Any dataset can be converted into a
– Cumulative distribution function
– General density function


Fitting Empirical Distribution to Data

• If continuous & reasonably extensive

• May have to estimate minimum & maximum

• Rank data x(i) in ascending order

• Calculate the percentile for each value

• Use data and percentiles to create cumulative distribution function
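
The steps above can be sketched in Python using only the standard library. The sample values, the function name, and the plotting position i / (n + 1) are illustrative assumptions, not taken from the slides; other percentile conventions (such as i / n) are also in common use.

```python
def empirical_cdf(data):
    """Return (value, percentile) pairs for a ranked sample.

    Assumes the plotting position i / (n + 1), which keeps every
    percentile strictly between 0 and 1 so that a separately estimated
    minimum and maximum can anchor the ends of the curve.
    """
    ranked = sorted(data)  # rank data in ascending order
    n = len(ranked)
    return [(x, i / (n + 1)) for i, x in enumerate(ranked, start=1)]


# Hypothetical sample, for illustration only
sample = [12.1, 9.8, 15.3, 11.0, 10.4]
cdf_points = empirical_cdf(sample)
for value, pct in cdf_points:
    print(f"{value:6.1f}  {pct:.3f}")
```

The resulting pairs, together with an estimated minimum (percentile 0) and maximum (percentile 1), define a cumulative distribution that can be sampled by interpolation.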


When You Can’t Use Your Data

• Given the wide variety of distributions, it is not always easy to select the most appropriate one
– Results can be very sensitive to distribution choice

• Using the wrong assumption in a model can produce incorrect results

• Incorrect results can lead to poor decisions

• Poor decisions can lead to undesirable outcomes


Understand Your Data

• What is the source of the data?
– Experiments
– Observation
– Surveys
– Computer databases
– Literature searches
– Simulations
– Test case

The source of the data may affect your decision to use it or not.

Understand your variable


Type of Variable?

• Is your variable discrete or continuous?

• Do not overlook this!

– Discrete: the variable takes one of a set of identifiable values, each of which has a calculable probability of occurrence
  • Barges in a tow
  • Houses in floodplain
  • People at a meeting
  • Results of a diagnostic test
  • Casualties per year
  • Relocations and acquisitions

– Continuous: the variable can take any value within a defined range
  • Average number of barges per tow
  • Weight of an adult striped bass
  • Sensitivity or specificity of a diagnostic test
  • Transit time
  • Expected annual damages
  • Duration of a storm
  • Shoreline eroded
  • Sediment loads


What Values Are Possible?

• Is your variable bounded or unbounded?
– Bounded: value confined to lie between two determined values
– Unbounded: value theoretically extends from minus infinity to plus infinity
– Partially bounded: constrained at one end (truncated distributions)

• Use a distribution whose bounds match those of your variable
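
A minimal sketch of why the bounds matter, using only the Python standard library. The quantity, its mean and standard deviation, and the choice of a lognormal as the left-bounded alternative are all assumptions made for illustration.

```python
import math
import random

random.seed(9)
MEAN, SD, N = 2.0, 1.5, 20_000  # hypothetical non-negative quantity

# Unbounded candidate: a normal can produce impossible negative values.
normal_draws = [random.gauss(MEAN, SD) for _ in range(N)]
negative_share = sum(x < 0 for x in normal_draws) / N

# Left-bounded candidate: a lognormal matched to the same mean and sd.
sigma = math.sqrt(math.log(1 + (SD / MEAN) ** 2))
mu = math.log(MEAN) - sigma ** 2 / 2
lognormal_draws = [random.lognormvariate(mu, sigma) for _ in range(N)]

print(f"normal draws below zero:    {negative_share:.1%}")
print(f"lognormal minimum observed: {min(lognormal_draws):.3f}")
```

If the variable cannot be negative (a duration or a sediment load, say), the share of negative draws is a quick check on whether an unbounded distribution is acceptable.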


Continuous Distribution Examples

• Unbounded
– Normal
– t
– Logistic

• Left Bounded
– Chi-square
– Exponential
– Gamma
– Lognormal
– Weibull

• Bounded
– Beta
– Cumulative
– General/histogram
– Pert
– Uniform
– Triangle

[Figure: three example density plots; the intervals shown are -1.645 to 1.645 (90%), 0.051 to 2.996 (90%), and 0.000 to 1.000 (100%)]


Discrete Distribution Examples

• Unbounded
– None

• Left Bounded
– Poisson
– Negative binomial
– Geometric

• Bounded
– Binomial
– Hypergeometric
– Discrete
– Discrete Uniform

[Figure: example probability plots for discrete distributions; the 90% intervals shown are 5.00 to 15.00 and 1.00 to 4.00]


Are There Parameters?

• Does your variable have parameters that are meaningful?

– Parametric: shape is determined by the mathematics describing a conceptual probability model
  • Requires a greater knowledge of the underlying probability model

– Non-parametric: empirical distributions for which the mathematics is defined by the shape required
  • Intuitively easy to understand
  • Flexible and therefore useful


Choose Parametric Distribution If

• Theory supports choice

• Distribution proven accurate for modelling your specific variable (without theory)

• Distribution matches any observed data well

• Need distribution with tail extending beyond the observed minimum or maximum


Choose Non-Parametric Distribution If

• Theory is lacking

• There is no commonly used model

• Data are severely limited

• Knowledge is limited to general beliefs and some evidence


Parametric and Non-Parametric

• Parametric
– Normal
– Lognormal
– Exponential
– Poisson
– Binomial
– Gamma

• Non-parametric
– Uniform
– Pert
– Triangular
– Cumulative

[Figure: two example density plots; the 90% intervals shown are -1.645 to 1.645 and -1.71 to 1.71]


Is It Dependent on Other Variables?

• Univariate and multivariate distributions
– Univariate: describes a single parameter or variable that is not probabilistically linked to any other in the model
– Multivariate: describes several parameters that are probabilistically linked in some way

• Engineering relationships are often multivariate


Do You Know the Parameters?

• First or second order distribution
– First order: a probability distribution with precisely known parameters, e.g. N(100,10)
– Second order: a probability distribution with some uncertainty about its parameters, e.g. N(m,s)
  • Risknormal(risktriang(90,100,103),riskuniform(8,11))
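
The nested expression above can be mimicked with Python's standard library as a rough sketch. It assumes the three triangular arguments are minimum, most likely, and maximum, and that the second normal argument is a standard deviation; the function name and sample size are illustrative.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable


def sample_second_order(n):
    """Draw n values from a normal whose parameters are themselves uncertain."""
    draws = []
    for _ in range(n):
        # Outer step: sample the uncertain parameters.
        # random.triangular takes (low, high, mode), so the most likely
        # value 100 goes last.
        mean = random.triangular(90, 103, 100)
        sd = random.uniform(8, 11)
        # Inner step: sample the variable given those parameters.
        draws.append(random.gauss(mean, sd))
    return draws


first_order = [random.gauss(100, 10) for _ in range(20_000)]
second_order = sample_second_order(20_000)

print("first order  mean, sd:",
      round(statistics.mean(first_order), 2),
      round(statistics.stdev(first_order), 2))
print("second order mean, sd:",
      round(statistics.mean(second_order), 2),
      round(statistics.stdev(second_order), 2))
```

Each draw samples the parameters first and the variable second, which mirrors nesting one distribution function inside another.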


Continuing Checklist for Choosing a Distribution

3. Look at your data—plot it

4. Use theory

5. Calculate statistics

6. Use previous experience

7. Distribution fitting

8. Expert opinion

9. Sensitivity analysis


Plot--Old Faithful Eruptions

• What do your data look like?

• You could calculate the mean & SD and assume it's normal

• Beware, danger lurks

• Always plot your data
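
A small sketch of the danger, using only the Python standard library. The data are synthetic (two clusters with made-up parameters), not the Old Faithful record; the point is only that a mean and standard deviation can look reasonable while hiding the shape that a plot reveals.

```python
import random
import statistics

random.seed(7)

# Synthetic two-cluster sample standing in for real observations
# (hypothetical values, not the Old Faithful record).
data = ([random.gauss(2.0, 0.3) for _ in range(135)]
        + [random.gauss(4.4, 0.4) for _ in range(135)])

mean = statistics.mean(data)
sd = statistics.stdev(data)
print(f"mean = {mean:.2f}, sd = {sd:.2f}")


def text_histogram(values, bin_width=0.5):
    """Count values per bin and return {bin_start: count}, sorted by bin."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


hist = text_histogram(data)
for start, count in hist.items():
    print(f"{start:4.1f} | {'#' * (count // 3)}")
```

In this synthetic sample the mean lands between the two clusters, where few observations lie, so a normal distribution centered there would describe the data poorly.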


Which Distribution?

• Examine your plot

• Look for distinctive shapes of specific distributions
– Single peaks
– Symmetry
– Positive skew
– Negative values

• Gamma, Weibull, beta are useful and flexible forms


Theory-Based Choice

• Most compelling reason for choice

• Formal theory
– Central limit theorem

• Theoretical knowledge of the variable
– Behavior
– Math (range)

• Informal theory
– Sums normal, products lognormal
– Study specific
– Your best documented thoughts on subject
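
The "sums normal, products lognormal" rule of thumb can be checked with a short simulation. This is a sketch under assumed inputs: 30 independent factors per draw, each uniform on 0.5 to 1.5, with skewness used as a rough symmetry check. A lognormal variable is one whose logarithm is normal, so the log of the products should look roughly symmetric again.

```python
import math
import random
import statistics

random.seed(42)


def skewness(values):
    """Sample skewness: mean cubed deviation divided by sd cubed."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def factors(k=30):
    """k independent positive factors (uniform on 0.5 to 1.5, an assumption)."""
    return [random.uniform(0.5, 1.5) for _ in range(k)]


sums = [sum(factors()) for _ in range(5_000)]
products = [math.prod(factors()) for _ in range(5_000)]
log_products = [math.log(p) for p in products]

print("skew of sums:        ", round(skewness(sums), 2))
print("skew of products:    ", round(skewness(products), 2))
print("skew of log products:", round(skewness(log_products), 2))
```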


Calculate Statistics

• Summary statistics may provide clues

• Normal has low coefficient of variation and equal mean and median

• Exponential has positive skew and equal mean and standard deviation

• Consider outliers
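
A sketch of how these clues show up in practice, using simulated samples with made-up parameters (a normal with mean 50 and standard deviation 5, and an exponential with mean 50). The helper name `clues` is hypothetical.

```python
import random
import statistics

random.seed(3)


def clues(values):
    """Summary statistics that hint at a distribution family."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation
    }


# Simulated samples (hypothetical parameters chosen for illustration)
normal_like = [random.gauss(50, 5) for _ in range(10_000)]
exponential_like = [random.expovariate(1 / 50) for _ in range(10_000)]

normal_clues = clues(normal_like)
exponential_clues = clues(exponential_like)
for name, result in [("normal", normal_clues), ("exponential", exponential_clues)]:
    print(name, {k: round(v, 2) for k, v in result.items()})
```

For the normal sample the mean and median nearly coincide; for the exponential sample the mean and standard deviation nearly coincide (coefficient of variation near 1) and the median sits below the mean, which is the positive skew.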


Outliers

• Extreme observations can drastically influence a probability model

• No prescriptive method for addressing them

• If the observation is an error, remove it

• If not, what is the data point telling you?
– What about your world-view is inconsistent with this result?
– Should you reconsider your perspective?
– What possible explanations have you not yet considered?


Outliers (cont)

• Your explanation must be correct, not merely plausible
– Consensus is a poor measure of truth

• If you must keep it and can't explain it
– Use conventional practices and live with skewed consequences
– Choose methods less sensitive to such extreme observations (Gumbel, Weibull)
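
To see how much one extreme observation can move a fitted model, the sketch below compares summary statistics with and without a single outlier. The observations are hypothetical, and the comparison of mean and standard deviation against the median is an illustration of sensitivity, not a method from the slides.

```python
import statistics

# Hypothetical observations, for illustration only
base = [10.2, 11.5, 9.8, 10.9, 10.4, 11.1, 9.5, 10.7]
with_outlier = base + [48.0]


def summarize(values):
    """Mean, standard deviation, and median of a sample."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
    }


before = summarize(base)
after = summarize(with_outlier)
for key in before:
    print(f"{key:6s} before = {before[key]:6.2f}   after = {after[key]:6.2f}")
```

Any distribution fitted from the mean and standard deviation inherits the shift, while the median barely moves.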


Previous Experience

• Have you dealt with this issue successfully before?

• What did other analyses or risk assessments use?

• What does the literature reveal?


Goodness of Fit

• Provides statistical evidence to test hypothesis that your data could have come from a specific distribution

• H0: these data come from an “x” distribution

• A small test statistic and a large p-value mean you fail to reject H0

• It is another piece of evidence, not a determining factor


GOF Tests

• Chi-Square Test
– Most common; handles discrete & continuous data
– Data are divided into a number of cells, each cell with at least five observations
– Usually 50 observations or more

• Kolmogorov-Smirnov Test
– More suitable for small samples than Chi-Square
– Fits the middle of the distribution better than the tails

• Anderson-Darling Test
– Weights differences between theoretical and empirical distributions at their tails more heavily than at their midranges
– Desirable when a better fit at the extreme tails of the distribution is desired
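
The tail weighting in the Anderson-Darling test can be seen in its statistic, sketched here with the Python standard library. The data are simulated with hypothetical parameters, the function name is illustrative, and the hypothesized distribution is assumed to be fully specified in advance (critical values differ when parameters are estimated from the same data).

```python
import math
from statistics import NormalDist


def anderson_darling(data, cdf, eps=1e-12):
    """Anderson-Darling statistic A^2 for a fully specified hypothesized CDF."""
    ranked = sorted(data)
    n = len(ranked)
    # Clamp probabilities away from 0 and 1 so the logarithms stay finite.
    probs = [min(max(cdf(x), eps), 1 - eps) for x in ranked]
    total = 0.0
    for i in range(1, n + 1):
        # The log terms grow large in the tails, which is what gives
        # tail discrepancies extra weight.
        total += (2 * i - 1) * (math.log(probs[i - 1]) + math.log(1 - probs[n - i]))
    return -n - total / n


# Simulated data with hypothetical parameters, for illustration only
data = NormalDist(mu=25, sigma=5).samples(200, seed=2)

a2_matching = anderson_darling(data, NormalDist(25, 5).cdf)
a2_narrow = anderson_darling(data, NormalDist(25, 3).cdf)  # tails too thin
print(f"A^2 against N(25, 5): {a2_matching:.2f}")
print(f"A^2 against N(25, 3): {a2_narrow:.2f}")
```

A hypothesized distribution whose tails are too thin is penalized heavily, because observations it considers nearly impossible contribute large logarithms to the sum.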


Kolmogorov-Smirnov Statistic

• Blue = data

• Red = true/hypothetical

• Find biggest difference between the two

• K-S statistic is largest difference consistent with your
– n
– α

[Figure: cumulative distribution of the data overlaid on a fitted Normal(25.2290, 4.9645); x-axis 5 to 40, y-axis 0.0 to 1.0; 90% of the fitted distribution lies between 17.06 and 33.39]
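
The "biggest difference" can be computed directly, as in this standard-library sketch. The data are simulated rather than the slide's dataset; the normal parameters echo the fitted curve in the figure. The critical value uses the common large-sample approximation 1.36 / sqrt(n) for α = 0.05, which assumes the hypothesized distribution was specified in advance and is too generous when its parameters are estimated from the same data.

```python
import math
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Largest vertical gap between the empirical CDF and a hypothesized CDF."""
    ranked = sorted(data)
    n = len(ranked)
    d = 0.0
    for i, x in enumerate(ranked, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Simulated data; the parameters echo the fitted curve in the figure
hypothesized = NormalDist(mu=25.229, sigma=4.9645)
data = hypothesized.samples(200, seed=11)

d_stat = ks_statistic(data, hypothesized.cdf)
d_wrong = ks_statistic(data, NormalDist(mu=40, sigma=5).cdf)
# Large-sample approximation to the critical value at alpha = 0.05
d_crit = 1.36 / math.sqrt(len(data))
print(f"D = {d_stat:.4f}, D against N(40, 5) = {d_wrong:.4f}, critical value ~ {d_crit:.4f}")
```

A statistic below the critical value means the data are consistent with the hypothesized distribution at that n and α; a statistic above it is evidence against the fit.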


No Data Available

• Modelers must resort to judgment

• Knowledge of distributions is valuable in this situation


Defining Distributions w/ Expert Opinion

• Data never collected

• Data too expensive or impossible

• Past data irrelevant

• Opinion needed to fill holes in sparse data

• New area of inquiry, unique situation that never existed


What Experts Estimate

• The distribution itself
– Judgment about distribution of value in population
– E.g. population is normal

• Parameters of the distribution
– E.g. mean is x and standard deviation is y
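
When an expert supplies parameters, they are often a minimum, a most likely value, and a maximum. One common way to turn such a three-point estimate into a distribution is the Pert, built as a rescaled beta. The sketch below assumes the standard Pert shape weighting of 4 on the most likely value; the expert's numbers and the function name are hypothetical.

```python
import random
import statistics

random.seed(4)


def sample_pert(minimum, most_likely, maximum):
    """One draw from a Pert distribution, built as a rescaled beta."""
    span = maximum - minimum
    alpha = 1 + 4 * (most_likely - minimum) / span
    beta = 1 + 4 * (maximum - most_likely) / span
    return minimum + span * random.betavariate(alpha, beta)


# Hypothetical expert estimates: minimum, most likely, maximum
LOW, MODE, HIGH = 10.0, 25.0, 60.0
draws = [sample_pert(LOW, MODE, HIGH) for _ in range(20_000)]

print("sample mean:", round(statistics.mean(draws), 2))
print("Pert mean (min + 4 * mode + max) / 6:", round((LOW + 4 * MODE + HIGH) / 6, 2))
```

A triangular distribution built from the same three numbers is the simpler alternative; the Pert puts less weight on the extremes.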


Modeling Techniques

• Disaggregation (Reduction)

• Subjective Probability Elicitation

• PDF or CDF

• Parametric or Non-parametric distributions


Elicitation Techniques Needed

• Literature shows we do not assess subjective probabilities well

• In part due to heuristics we use
– Representativeness
– Availability
– Anchoring and adjustment

• There are methods to counteract our heuristics and to elicit our expert knowledge


Sensitivity Analysis

• Unsure which is the best distribution?

• Try several
– If no difference, you are free to use any one
– Significant differences mean doing more work
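
A sketch of "try several" with the Python standard library: three candidate input distributions are matched to the same mean and standard deviation and pushed through the same model. The input values, the threshold, and the `model` function are hypothetical; a tail-sensitive model is chosen on purpose so that differences between candidates are visible.

```python
import math
import random
import statistics

random.seed(5)
MEAN, SD, N = 100.0, 20.0, 50_000  # hypothetical input mean and sd

# Candidate input distributions matched to the same mean and sd
sigma = math.sqrt(math.log(1 + (SD / MEAN) ** 2))
mu = math.log(MEAN) - sigma ** 2 / 2
half_width = SD * math.sqrt(6)  # symmetric triangular with variance SD**2

candidates = {
    "normal": lambda: random.gauss(MEAN, SD),
    "lognormal": lambda: random.lognormvariate(mu, sigma),
    "triangular": lambda: random.triangular(
        MEAN - half_width, MEAN + half_width, MEAN),
}


def model(x):
    """Hypothetical consequence model: loss only above a threshold of 140."""
    return max(x - 140.0, 0.0)


results = {}
for name, draw in candidates.items():
    outputs = [model(draw()) for _ in range(N)]
    results[name] = statistics.mean(outputs)
    print(f"{name:10s} mean loss = {results[name]:.3f}")
```

If the outputs agree closely, the choice among the candidates does not matter for this model; if they differ by an amount that would change the decision, the distribution choice needs more work.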


Take Away Points

• Choosing the best distribution is where most new risk assessors feel least comfortable.

• Choice of distribution matters.

• Distributions come from data and expert opinion.

• Distribution fitting should never be the basis for distribution choice.


Charles Yoe, [email protected]

Questions?

