GG 313 Lecture 6 Probability Distributions
Page 1: GG 313 Lecture 6

GG 313 Lecture 6

Probability Distributions

Page 2: GG 313 Lecture 6

When we sample many phenomena, we find that the probability of occurrence of an event will be distributed in a way that is easily described by one of several well-known functions called “probability distributions”. We will discuss these functions and give examples of them.

UNIFORM DISTRIBUTION

The uniform distribution is simple: it is the probability distribution found when all probabilities are equal. For example, consider the probability of throwing an “x” using a 6-sided die: P(x), x = 1, 2, 3, 4, 5, 6. If the die is “fair”, then P(x) = 1/6 for x = 1, 2, 3, 4, 5, 6.

This is a discrete probability distribution, since x can only have integer or fixed values.

Page 3: GG 313 Lecture 6

P(x) is always ≥ 0 and always ≤ 1.

If we add up the probabilities for all x, the sum = 1:

$$\sum_{i=1}^{n} P(x_i) = 1$$
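As a quick check of the fair-die example, a minimal Matlab sketch (the variable names are just for illustration):

% fair 6-sided die: a uniform discrete distribution
x = 1:6;
px = ones(1,6)/6;   % P(x) = 1/6 for every face
sum(px)             % the probabilities sum to 1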

Page 4: GG 313 Lecture 6

In most cases, the probability distribution is not uniform, and some events are more likely than others.

For example, we may want to know the probability of hitting a target x times in n tries. We can’t get the solution unless we know the probability of hitting the target in one try: P(hit)=p. Once we know P(hit), we can calculate the probability distribution:

$$P = p^x q^{\,n-x},$$

where q = 1 − p. This is the probability of obtaining the x hits and n − x misses in one particular ORDER, which is not what we want. To count every possible arrangement of the x hits among the n tries, we multiply by the number of combinations of n things taken x at a time, nCx, as defined earlier:

Page 5: GG 313 Lecture 6

$$P(x) = \binom{n}{x} p^x q^{\,n-x} = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

Recall that:

$$\binom{n}{x} = \text{binomial coefficients} = \frac{n!}{x!\,(n-x)!}$$

This is known as the binomial probability distribution, used to predict the probability of success in x events out of n tries.

Using our example above, what is the probability of hitting a target x times in n tries if p=0.1 and n=10?

$$P(x) = \binom{10}{x} (0.1)^x\,(0.9)^{10-x} = \frac{10!}{x!\,(10-x)!}\,(0.1)^x\,(0.9)^{10-x}$$

Page 6: GG 313 Lecture 6

We can write an easy m-file for this:

% binomial distribution
n=10
p=0.1
xx=[0:n];
for ii=0:n;
    px(ii+1)=(factorial(n)/(factorial(xx(ii+1))*factorial(n-xx(ii+1))))*p^xx(ii+1)*(1-p)^(n-xx(ii+1));
end
plot(xx,px)
sum(px) % check to be sure that the sum = 1

Page 7: GG 313 Lecture 6

[Plot: binomial distribution for n = 10, p = 0.1.]

The probability of 1 hit is 0.38 with p=0.1
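This value can be verified directly from the binomial formula; a one-line Matlab check (nchoosek computes the binomial coefficient):

% probability of exactly 1 hit in n = 10 tries with p = 0.1
nchoosek(10,1)*0.1^1*0.9^9   % approximately 0.387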

Page 8: GG 313 Lecture 6

What if we change the probability of hitting the target in any one shot to 0.3? What is the most likely number of hits in 10 shots?

[Plot: binomial distribution for n = 10, p = 0.3.]
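A minimal Matlab sketch for finding the most likely number of hits when p = 0.3 (variable names are just for illustration):

% most likely number of hits in n = 10 shots with p = 0.3
n = 10; p = 0.3;
x = 0:n;
px = arrayfun(@(k) nchoosek(n,k)*p^k*(1-p)^(n-k), x);
[pmax, imax] = max(px);
x(imax)   % most likely number of hits: 3
pmax      % its probability, about 0.27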

Page 9: GG 313 Lecture 6

Continuous Probability Distributions

Continuous populations, such as the temperature of the atmosphere, depth of the ocean, concentration of pollutants, etc., can take on any value within their range. We may only sample them at particular values, but the underlying distribution is continuous.

Rather than the SUM of the distribution equaling 1.0, for continuous distributions the integral of the distribution (the area under the curve) over all possible values must equal 1.

$$\int_{-\infty}^{\infty} P(x)\,dx = 1$$

P(x) is called a probability density function, or PDF.

Page 10: GG 313 Lecture 6

Because they are continuous, the probability of any particular value being observed is zero. We discuss instead the probability of a value being between two limits, a − Δ and a + Δ:

$$P(a \pm \Delta) = \int_{a-\Delta}^{a+\Delta} P(x)\,dx$$

As Δ → 0, the probability also approaches zero.

We also define the cumulative probability distribution, giving the probability that an observation will have a value less than or equal to a. This distribution is bounded by 0 ≤ Pc(a) ≤ 1.

$$P_c(a) = \int_{-\infty}^{a} P(x)\,dx$$

As a → ∞, Pc(a) → 1.

Page 11: GG 313 Lecture 6

The normal distribution

Any continuous function with an area under the curve of 1 can be a probability distribution, but in reality some functions are far more common than others. The Normal Distribution, or Gaussian Distribution, is the most common and most valued. This is the classic “bell-shaped curve”. Its distribution is defined by:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

where μ and σ are the mean and standard deviation defined earlier.

Page 12: GG 313 Lecture 6

We can make a Matlab m-file to generate the normal distribution:

% Normal distribution
xsigma=1.;
xmean=7;
xx=[0:.1:15];
xx=(xx-xmean)/xsigma;               % convert to normal scores z = (x - mean)/sigma
px=(1/(xsigma*sqrt(2*pi)))*exp(-0.5*xx.^2);
xmax=max(px)
xsum=sum(px)/10.                    % approximate area under the curve (dx = 0.1); should be ~1
plot(xx,px)

Or, make an Excel spreadsheet plot:

Page 13: GG 313 Lecture 6

[Plot: normal distributions for sigma = 1, mean = 7; sigma = 1.5, mean = 8; sigma = 2.25, mean = 9; sigma = 3.375, mean = 10; sigma = 5.1, mean = 11; sigma = 7.6, mean = 12; sigma = 11.4, mean = 13.]

Note that distributions with smaller values of sigma are narrower and more sharply peaked.

Page 14: GG 313 Lecture 6

We can define a new variable that will normalize the distribution to σ = 1 and μ = 0:

$$z_i = \frac{x_i - \mu}{\sigma}$$

and the defining equation reduces to:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

The values on the z axis are now equivalent to standard deviations: z = ±1 corresponds to ±1 standard deviation, etc.

Page 15: GG 313 Lecture 6

This distribution is very handy. We expect 68.27% of our results to be within 1 standard deviation of the mean, 95.45% to be within 2 standard deviations, and 99.73% to be within 3 standard deviations. This is why we can feel reasonably confident about eliminating points that are more than 3 standard deviations away from the mean.
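These percentages can be checked numerically; a minimal Matlab sketch using the error function erf, which is introduced on the next page:

% fraction of a normal population within k standard deviations of the mean
k = [1 2 3];
100*erf(k./sqrt(2))   % approximately 68.27, 95.45, 99.73 percent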

Page 16: GG 313 Lecture 6

We also define the cumulative normal distribution:

$$P(z \le a) = \int_{-\infty}^{a} p(z)\,dz = \int_{-\infty}^{0} p(z)\,dz + \int_{0}^{a} p(z)\,dz = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_{0}^{a} e^{-z^2/2}\,dz$$

Let $u^2 = \frac{z^2}{2}$ and $dz = \sqrt{2}\,du$; then

$$P(z \le a) = \frac{1}{2} + \frac{\sqrt{2}}{\sqrt{2\pi}} \int_{0}^{a/\sqrt{2}} e^{-u^2}\,du = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \cdot \frac{\sqrt{\pi}}{2}\,\mathrm{erf}\!\left(\frac{a}{\sqrt{2}}\right) = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{a}{\sqrt{2}}\right)\right]$$

Erf(x) is called the error function.

For any value z,

$$P_c(z) = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{z}{\sqrt{2}}\right)\right]$$

Page 17: GG 313 Lecture 6

[Plot: cumulative normal distribution, Pc(z) versus z.]

Page 18: GG 313 Lecture 6

We easily see that the probability of a result being between a and b is:

$$P_c(a \le z \le b) = P_c(b) - P_c(a) = \frac{1}{2}\left[\mathrm{erf}\!\left(\frac{b}{\sqrt{2}}\right) - \mathrm{erf}\!\left(\frac{a}{\sqrt{2}}\right)\right]$$

Example: Estimates of the strength of olivine yield a normal distribution given by μ = 1.0×10^11 Nm and σ = 1.0×10^10 Nm. What is the probability that a sample estimate will be between 9.8×10^10 Nm and 1.1×10^11 Nm? First convert to normal scores:

$$z_i = \frac{x_i - \mu}{\sigma}$$

and calculate with the formula above.

DO THIS NOW, either in Excel or Matlab.
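One way to carry out the calculation, a minimal Matlab sketch using the erf formula above (variable names are just for illustration):

% probability that an olivine strength estimate falls between 9.8e10 and 1.1e11
mu = 1.0e11; sigma = 1.0e10;      % mean and standard deviation
a  = 9.8e10; b = 1.1e11;          % limits of interest
za = (a - mu)/sigma;              % normal score of lower limit: -0.2
zb = (b - mu)/sigma;              % normal score of upper limit:  1.0
P  = 0.5*(erf(zb/sqrt(2)) - erf(za/sqrt(2)))   % approximately 0.42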

Page 19: GG 313 Lecture 6

The normal distribution is a good approximation to the binomial distribution for large n (actually, when np > 5 and n(1 − p) > 5). The mean and standard deviation of the binomial distribution become:

$$\mu = np \qquad \text{and} \qquad \sigma = \sqrt{np(1-p)}$$

so that:

$$P_b(x) = \frac{1}{\sqrt{2\pi np(1-p)}}\, \exp\!\left\{\frac{-(x-np)^2}{2np(1-p)}\right\}$$

I had a difficult time getting this to work in Excel because the term −(x−np)² is evaluated as (−(x−np))². (Excel applies the unary minus before the exponent.)
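For comparison, a minimal Matlab sketch that evaluates both the exact binomial probabilities and this normal approximation (for the N = 20, p = 0.5 case plotted on the next page):

% binomial distribution vs. its normal approximation, n = 20, p = 0.5
n = 20; p = 0.5;
x = 0:n;
pb = arrayfun(@(k) nchoosek(n,k)*p^k*(1-p)^(n-k), x);            % exact binomial
pn = 1/sqrt(2*pi*n*p*(1-p)) * exp(-(x-n*p).^2/(2*n*p*(1-p)));    % normal approximation
plot(x,pb,'o',x,pn,'-')
legend('binomial','normal approximation')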

Page 20: GG 313 Lecture 6

[Plot: binomial distribution and its normal approximation for N = 20, p = 0.5, μ = np = 10, σ = √(np(1−p)) = 2.236.]

Page 21: GG 313 Lecture 6

Poisson distribution

One way to think of a Poisson distribution is that it is like a normal distribution that gets pushed close to zero but can’t go through zero. For large means, they are virtually the same. The Poisson distribution is a good approximation to the binomial distribution when the probability of a single event is small but n is large.

$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, 3, \ldots, n, \qquad \text{where } \lambda = np$$

is the rate of occurrence. This is used to evaluate the probability of rare events.

Page 22: GG 313 Lecture 6

[Plot: Poisson distributions for λ = 1, 2, 4, 8, 12, 16, 20.]

Note that the Poisson distribution approaches the normal distribution for large λ.

Page 23: GG 313 Lecture 6

Example: The number of floods in a 50-year period on a particular river has been shown to follow a Poisson distribution with λ = 2.2. That is, the expected number of floods in a 50-year period is a bit larger than 2.

What is the probability of having at least 1 flood in the next 50 years?

The probability of having NO floods (x = 0) is e^{−2.2}, or 0.11.

The probability of having at least 1 flood is (1-P(0))=0.89.
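A quick Matlab check of these numbers (a minimal sketch):

% probability of at least one flood in 50 years, lambda = 2.2
lambda = 2.2;
P0 = exp(-lambda)            % probability of no floods, about 0.11
Patleast1 = 1 - P0           % about 0.89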

Page 24: GG 313 Lecture 6

The Exponential Distribution

$$P(x) = \lambda\, e^{-\lambda x}$$

and:

$$P_c(x) = 1 - e^{-\lambda x}$$

% exponential distribution
lambda=.5    % lambda = rate of occurrence
xx=[0:1:20];
px=lambda*exp(-1*xx*lambda);
plot(xx,px)
hold
cum=1-exp(-1*xx*lambda)
plot(xx,cum)

Page 25: GG 313 Lecture 6

[Plot: exponential distribution, P(x) and cumulative Pc(x).]

Page 26: GG 313 Lecture 6

Example: The height of Pacific seamounts has approximately an exponential distribution with Pc(h) = 1 − e^{−h/340}, where h is in meters; this gives the probability of a height less than h meters. What’s the probability of a seamount with a height greater than 4 km?

Pc(4000) = 1 − e^{−4000/340}, which is approximately 0.99999, so the probability of a seamount higher than 4 km is about 0.00001. (Which I don’t believe….)
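A quick Matlab check of this estimate (a minimal sketch, assuming the Pc(h) given above):

% probability of a seamount higher than 4000 m,
% assuming Pc(h) = 1 - exp(-h/340) with h in meters
h = 4000;
Pgreater = exp(-h/340)       % about 8e-6, i.e. roughly 0.00001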

Page 27: GG 313 Lecture 6

Log-Normal Distribution

In some situations distributions are greatly skewed, as with grain-size distributions, or when errors are large and propagate as products rather than sums.

In such cases, taking the log of the distribution may result in a normal distribution. The statistics of the normal distribution can then be obtained and exponentiated to recover the actual values and their uncertainties.
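A minimal Matlab sketch of this procedure (the synthetic sample and variable names are assumptions for illustration only):

% log-normal sketch: take logs, compute normal statistics,
% then exponentiate to return to the original units
data = exp(randn(1,1000));        % synthetic right-skewed (log-normal) sample
logdata = log(data);
m = mean(logdata);                % mean of the log values
s = std(logdata);                 % standard deviation of the log values
central = exp(m)                  % central value (geometric mean) in original units
spread  = exp(s)                  % multiplicative one-sigma factor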

