Introduction to Statistical Data Analysis Lecture 3: Probability Distributions · 2016-08-18 ·...

Introduction · Uniform Distribution · Binomial Distribution · Hypergeometric Distribution · Poisson Distribution · Continuous Distributions

Introduction to Statistical Data Analysis
Lecture 3: Probability Distributions

James V. Lambers

Department of Mathematics
The University of Southern Mississippi





Random Variables · Discrete Probability Distributions · Rules for Discrete Distributions · Mean · Variance and Standard Deviation

Introduction

Now that we know how to compute probabilities of events, we can study the behavior of the probability across all possible outcomes of an experiment, that is, the distribution of the probability across the sample space.

Our understanding of the probability distribution will eventually allow us to make inferences from the data from which the distribution arises.






Random Variables

A random variable, usually denoted by a capital letter such as X, is an outcome of an experiment that has a numerical value.

The value itself is usually denoted by the lower-case version of the letter used to denote the variable itself; that is, a random variable X takes on numerical values that are denoted by x.

Random variables can be either continuous or discrete.

A continuous random variable can assume a value equal to any real number within some interval, whereas a discrete random variable can only assume selected numerical values, such as, for example, nonnegative integers. We will study random variables of both kinds.






Discrete Probability Distributions

A discrete probability distribution is a listing of all possible values of a discrete random variable, along with the probability of each value being assumed by the variable.






Example

Let X be a discrete random variable whose outcomes correspond to where one finishes in a race: first, second, third, etc.

If there are 10 runners in the race, then X can assume any integer value from 1 to 10.






The Distribution

The probability distribution might look like the following:

 x    P(X = x)
 1    0.1
 2    0.15
 3    0.23
 4    0.18
 5    0.15
 6    0.1
 7    0.04
 8    0.02
 9    0.02
10    0.01

Note that the notation P(X = x) is used to refer to the probability that the random variable X assumes the value x.






Rules for Discrete Distributions

A discrete probability distribution must follow these rules:

• Each outcome must be mutually exclusive of the others; that is, we cannot have X assume two values simultaneously as the result of an experiment.

• For each outcome x, we must have 0 ≤ P(X = x) ≤ 1.

• If the distribution has n possible outcomes x_1, x_2, ..., x_n, then we must have

\[ \sum_{i=1}^{n} P(X = x_i) = 1. \]
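
As a quick check, these rules can be tested numerically for the race distribution tabulated earlier. The following R sketch is only illustrative: the variable names (x, p, rule1, and so on) are chosen here and are not part of the lecture, and the first rule is checked only in the limited sense that no value is listed twice.

```r
# Finishing positions and their probabilities, copied from the race example.
x <- 1:10
p <- c(0.1, 0.15, 0.23, 0.18, 0.15, 0.1, 0.04, 0.02, 0.02, 0.01)

# Rule 1 (limited check): each value of x is listed exactly once.
rule1 <- !any(duplicated(x))

# Rule 2: every probability lies between 0 and 1.
rule2 <- all(p >= 0 & p <= 1)

# Rule 3: the probabilities sum to 1 (all.equal tolerates rounding error).
rule3 <- isTRUE(all.equal(sum(p), 1))

print(c(rule1, rule2, rule3))
```

Comparing with all.equal rather than == avoids a spurious failure caused by floating-point rounding in the sum.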






Mean

For a given probability distribution, it is very helpful to know the expected value of the variable, that is, the value it assumes on average over many repetitions of the experiment. (This need not coincide with the single most likely outcome.)

This can be obtained by computing a weighted mean of the outcomes, where the probabilities serve as the weights.

We therefore define the mean, or expected value, of the discrete random variable X by

\[ E[X] = \mu = \sum_{i=1}^{n} x_i P(X = x_i). \]






Example

Consider a raffle, in which each ticket costs $5.

There is one grand prize of $100, two first prizes of $50 each, and four second prizes of $25 each.

If 200 tickets are sold, then the probability of winning the grand prize is 1/200 = 0.005, while the probabilities of winning first prize and second prize are 2/200 = 0.01 and 4/200 = 0.02, respectively.

Then, the expected amount of winnings is

\[ E[X] = 100(0.005) + 50(0.01) + 25(0.02) + 0(0.965) = 1.5. \]






Interpretation

That is, a ticket holder can expect to win, on average, $1.50.

However, we must account for the cost of the ticket, which applies to all participants; therefore, the expected net winnings are $1.50 − $5 = −$3.50.

Since the expected amount is negative, the raffle is not fair to the ticket holders; if the expected value were zero, then the raffle would be considered a “fair game”.
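
The raffle computation can be reproduced in a few lines of R. This is a minimal sketch; the variable names (prize, tickets, prob, and so on) are introduced here for illustration only.

```r
# Prize amounts in dollars and the number of tickets that win each amount.
prize   <- c(100, 50, 25, 0)
tickets <- c(1, 2, 4, 193)          # 200 tickets in total
prob    <- tickets / sum(tickets)   # probability of each prize amount

# Expected winnings: the probability-weighted mean of the prize amounts.
expected_winnings <- sum(prize * prob)

# Expected net winnings after paying $5 for the ticket.
expected_net <- expected_winnings - 5

print(c(expected_winnings, expected_net))
```

Under these inputs the two printed values are 1.5 and -3.5, matching the figures in the text.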






Variance and Standard Deviation

Using the mean of X, we can then characterize the dispersion of the outcomes by defining the variance of X as follows:

\[ \sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 P(X = x_i). \]

The standard deviation σ is the square root of the variance.

An equivalent formula, in terms of expected values, is

\[ \sigma^2 = E[X^2] - E[X]^2. \]

Note that in the first term, the values of X are squared, and then they are multiplied by the probabilities and summed, whereas in the second term, the expected value is computed first, and then squared.
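
To see that the two variance formulas agree, one can evaluate both for the race distribution from the earlier example. The R sketch below assumes that distribution; the names mu, var_def, var_alt and sigma are chosen here for illustration.

```r
# Race example: finishing positions and their probabilities.
x <- 1:10
p <- c(0.1, 0.15, 0.23, 0.18, 0.15, 0.1, 0.04, 0.02, 0.02, 0.01)

# Mean: probability-weighted average of the outcomes.
mu <- sum(x * p)

# Variance from the definition: weighted squared deviations from the mean.
var_def <- sum((x - mu)^2 * p)

# Variance from the equivalent formula E[X^2] - E[X]^2.
var_alt <- sum(x^2 * p) - mu^2

# Standard deviation: square root of the variance.
sigma <- sqrt(var_def)

print(c(mu, var_def, var_alt, sigma))
```

The second formula is often more convenient by hand, since it avoids subtracting the mean from every outcome.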





Uniform Distribution

The discrete uniform distribution U{a, b} is the probability distribution for a random variable X with domain {a, a + 1, ..., b} in which each value in the domain of X is equally likely to be observed.

It follows that the probability mass function for this distribution is

\[ P(X = k) = \frac{1}{n}, \qquad n = b - a + 1, \qquad k \in \{a, a+1, \ldots, b\}. \]





Mean and Variance

Using the above definitions of the mean and variance of a discrete random variable, it can be shown that

\[ E[X] = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a+1)^2 - 1}{12}. \]

If a random variable X has the distribution U{a, b}, we write X ∼ U{a, b}.

We will use similar notation with other probability distributions, in order to indicate that a given random variable has a particular distribution.
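
The closed-form mean and variance can be checked against the general definitions for a specific case. The R sketch below uses a fair six-sided die, X ∼ U{1, 6}, which is an example chosen here rather than one taken from the lecture.

```r
# A fair die: X ~ U{1, 6} (illustrative choice of a and b).
a <- 1
b <- 6
k <- a:b
n <- b - a + 1
pmf <- rep(1 / n, n)   # every value has probability 1/n

# Mean and variance computed directly from the definitions.
mean_direct <- sum(k * pmf)
var_direct  <- sum((k - mean_direct)^2 * pmf)

# Mean and variance from the closed-form expressions.
mean_formula <- (a + b) / 2
var_formula  <- ((b - a + 1)^2 - 1) / 12

print(c(mean_direct, mean_formula, var_direct, var_formula))
```

Both routes give a mean of 3.5 and a variance of 35/12 for the die.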





Binomial Experiments · The Binomial Distribution · The Mean and Standard Deviation

Binomial Experiments

Suppose that an experiment is performed n times, and that each time it can have only two outcomes, which are classified as “success” and “failure”.

Each of these individual experiments is referred to as a trial.

Furthermore, suppose that each trial is independent of the others, and that the probability of a trial being successful is p, where 0 < p < 1 (and therefore, the probability of failure is q = 1 − p).

These trials are called Bernoulli trials.






Examples

Examples of Bernoulli trials are:

• Testing for defective parts, in which n is the number of parts to be checked, p is the probability that a part is not defective, and k is the number of parts that are not defective.

• Observing the number of correct responses on an exam, in which n is the total number of questions, p is the probability of getting the correct answer on a single question, and k is the number of correct responses.

• Counting the number of households with an internet connection, in which n is the number of households, p is the probability of a single household having an internet connection, and k is the number of households that have an internet connection.






The Binomial Distribution

The binomial distribution B(n, p) is the probability distribution for the discrete random variable X whose value is the number of successes, denoted by k, in n Bernoulli trials, with probability of success p for each trial.

Given a value for k, 0 ≤ k ≤ n, what is P(X = k), the probability that X is equal to k?

First, we note that because the trials are independent, the probability of success (or failure) in consecutive trials can be obtained simply by multiplying the probabilities of the outcomes of the individual trials.

It follows that the probability of k successes, followed by n − k failures, is

\[ p^k (1-p)^{n-k}. \]






Probability Mass Function

However, to determine the probability that any k of the n trials are successful, we have to consider all possible ways to choose k trials out of the n to be successful.

That is, we must multiply the above expression by nCk.

We conclude that the probability mass function for the binomial distribution is

\[ P(X = k) = {}_nC_k \, p^k (1-p)^{n-k} = \frac{n!}{k!(n-k)!} \, p^k (1-p)^{n-k}. \]

Using properties of the binomial coefficients, it can be verified that the sum of all of these probabilities, for k = 0, 1, 2, ..., n, is equal to 1.






Examples

[Figure: The binomial distribution, for various values of n and p.]






Behavior of the Distribution

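Note that the binomial distribution is symmetric if p = 0.5, in which case the probability mass function simplifies to P(X = k) = nCk 2^(−n).

Otherwise, the probability is concentrated at smaller values of k if p < 0.5, because there is a greater probability of more failures, and at larger values of k if p > 0.5, since there is a greater probability of more successes. In the usual terminology, which names the direction of the longer tail, the distribution is therefore skewed to the right for p < 0.5 and skewed to the left for p > 0.5.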






The Mean and Standard Deviation

Using the definition of expected value, and properties of binomial coefficients, it can be shown by direct computation, and a lot of algebraic manipulation, that if X is a discrete random variable with a binomial distribution corresponding to n trials and probability of success p, then

\[ E[X] = \mu = np. \]

It can also be shown that the standard deviation is given by

\[ \sigma = \sqrt{np(1-p)}. \]
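
Rather than carrying out the algebra, the formulas for the mean and standard deviation can be checked numerically from the probability mass function. The R sketch below uses n = 10 and p = 0.3, values chosen here purely for illustration, and builds the probabilities with the base R function choose.

```r
# Hypothetical parameters, chosen only for illustration.
n <- 10
p <- 0.3
k <- 0:n

# Binomial probability mass function, written out from its formula.
pmf <- choose(n, k) * p^k * (1 - p)^(n - k)

# The probabilities should sum to 1.
total <- sum(pmf)

# Mean and standard deviation from the general discrete definitions.
mu    <- sum(k * pmf)
sigma <- sqrt(sum((k - mu)^2 * pmf))

print(c(total, mu, n * p, sigma, sqrt(n * p * (1 - p))))
```

Changing n and p and rerunning the sketch gives a quick sanity check of the closed-form results for other cases.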






Binomial Distribution in R

In R, the function dbinom can be used to compute probabilities from a binomial distribution.

Its first argument is a value, or vector of values, of k (number of successes).

The second argument is n, the number of trials, and the third argument is p, the probability of success.

An example of its usage is:

> dbinom(c(0,1,2,3,4),4,0.5)

[1] 0.0625 0.2500 0.3750 0.2500 0.0625

The output lists P(X = k), for k = 0, 1, 2, 3, 4, with p = 0.5 and n = 4.





Hypergeometric Distribution

The binomial distribution involves sampling with replacement, because each trial is independent of the other trials.

By contrast, the hypergeometric distribution is based on sampling without replacement.

Suppose that n trials are to be performed, but the outcomes of these trials are drawn from a set of N + M outcomes, of which N are successes and M are failures.

The hypergeometric distribution describes the probability that k of the n trials are successes.





Example

A situation that would call for the hypergeometric distribution is the following:

Suppose that you have 100 lightbulbs, and you know that 10 of them are defective.

If you need 20 lightbulbs and you start taking them from the collection of 100, what is the probability that at most 2 of the chosen lightbulbs are defective?





Probability Mass Function

To compute the probability of k successes out of n trials, we need to count the number of ways to choose k objects (the successes) out of a set of N, and then choose n − k objects (the failures) out of a set of M.

This is divided by the number of ways to choose n objects from a set of N + M, to obtain the probability mass function

\[ P(X = k) = \frac{\binom{N}{k}\binom{M}{n-k}}{\binom{N+M}{n}}. \]





Mean and Variance

It can be shown that if X is a random variable that follows the hypergeometric distribution, and p = N/(N + M) is the probability of success in a single trial, then the mean and variance are given by

\[ E[X] = np, \qquad \mathrm{Var}(X) = np(1-p)\left(1 - \frac{n-1}{N+M-1}\right). \]

It is interesting to note that as N + M → ∞ in such a way that p remains fixed, the variance converges to that of the binomial distribution, which makes sense because as the number of outcomes increases, the distinction between sampling with replacement and sampling without replacement diminishes.





Example

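Continuing the previous example of sampling lightbulbs, we have N = 90 successes (working lightbulbs), M = 10 failures (defective lightbulbs) and n = 20 trials. The probability of at most two defective lightbulbs, or equivalently at least 18 successes, is

\[
\begin{aligned}
P(X \ge 18) &= P(X = 18) + P(X = 19) + P(X = 20) \\
&= \frac{\binom{90}{18}\binom{10}{2}}{\binom{100}{20}} + \frac{\binom{90}{19}\binom{10}{1}}{\binom{100}{20}} + \frac{\binom{90}{20}\binom{10}{0}}{\binom{100}{20}} \\
&= 0.318 + 0.268 + 0.095 \\
&= 0.681.
\end{aligned}
\]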
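
The lecture does not show R code for this distribution, but base R provides dhyper and phyper, which can reproduce the lightbulb computation. In the sketch below, note that R's argument names differ from the notation used above: in dhyper(x, m, n, k), x is the number of successes drawn, m and n are the numbers of successes and failures in the population, and k is the number of items drawn. The variable names p_terms, p_total and p_alt are chosen here.

```r
# P(X = 18), P(X = 19), P(X = 20): working bulbs among 20 drawn,
# from a population of 90 working and 10 defective bulbs.
p_terms <- dhyper(18:20, 90, 10, 20)
p_total <- sum(p_terms)

# Equivalent view: count defective bulbs instead, and ask for at most 2
# of the 10 defective bulbs among the 20 drawn.
p_alt <- phyper(2, 10, 90, 20)

print(round(p_terms, 3))
print(round(c(p_total, p_alt), 3))
```

Counting defectives with phyper gives the same probability in a single call, because "at most 2 defective" and "at least 18 working" describe the same event.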





Poisson Processes · The Poisson Distribution · Approximating the Binomial Distribution

Poisson Processes

In contrast to Bernoulli trials, in which an experiment consists of a fixed number of trials and the number of successful trials is counted, a Poisson process is an experiment that counts the number of occurrences of a certain outcome over a certain period of time, area, or other domain-defining quantity.

In addition, a Poisson process has these defining characteristics:

• The mean number of occurrences must be the same for each interval of measurement, and

• The number of occurrences within an interval must be independent of those in any other interval.






Examples of Poisson processes

• Car accidents within a particular area

• Requests for documents from a web server

• Calls received by a call center

• Customers entering a queue






The Poisson Distribution

Suppose a Poisson process has a mean of µ.

Then, the probability distribution of the process, denoted P(µ), is described by the probability mass function

\[ P(X = k) = \frac{\mu^k e^{-\mu}}{k!}, \qquad k = 0, 1, 2, \ldots \]

It can be shown that the variance is actually equal to the mean, and therefore the standard deviation is √µ.






Various Poisson Distributions

[Figure: The Poisson distribution P(µ), for various values of µ.]






Example

Suppose that a tire manufacturing plant determines that on average, 0.5 defective tires are produced per hour.

Then, if X is the random variable for the number of defective tires per hour, the probability that 1 defective tire is produced within the next hour is given by

\[ P(X = 1) = \frac{(0.5)^1 e^{-0.5}}{1!} = \frac{e^{-0.5}}{2} = 0.303. \]






Example, cont’d

To determine the probability that at most 3 defective tires will be produced during the next day, where a work day is defined to be 8 hours, we use the fact that the mean is the same for each interval of measurement to determine that on average, 0.5 × 8 = 4 defective tires will be produced per day.

Then, with X now counting the defective tires per day, the probability of at most 3 defective tires is given by

\[ \sum_{k=0}^{3} P(X = k) = \frac{4^0 e^{-4}}{0!} + \frac{4^1 e^{-4}}{1!} + \frac{4^2 e^{-4}}{2!} + \frac{4^3 e^{-4}}{3!} = 0.433. \]






Poisson Distribution in R

The R function dpois gives the probability for given values of k (first argument) with a specified mean µ (second argument).

To easily compute cumulative probabilities

\[ \sum_{k=0}^{n} P(X = k), \]

use the ppois function.

The first argument is n, the highest number of desired outcomes, and the second argument is the mean µ.






Poisson Distribution in R, cont’d

The following output gives two ways to perform the computation from the preceding example:

> sum(dpois(c(0,1,2,3),4))

[1] 0.4334701

> ppois(3,4)

[1] 0.4334701






Approximating the Binomial Distribution

The Poisson distribution can be used to approximate the binomial distribution.

This is useful because evaluating the Poisson probability mass function

\[ P(X = k) = \frac{\mu^k e^{-\mu}}{k!} \]

is easier than evaluating the binomial probability mass function

\[ P(X = k) = \frac{n!}{k!(n-k)!} \, p^k (1-p)^{n-k}. \]






Assumptions

The mean np of the binomial distribution can be substituted for the value of µ, the mean of the Poisson distribution.

This approximation works well provided that the number of trials n is at least 20, and the probability of success p is quite small, at most 0.05.

Otherwise, the Poisson distribution is not a good fit for the binomial distribution curve.






Assumptions are Important!

Note that in the left plot, the parameters n and p do not satisfy the conditions n ≥ 20, p ≤ 0.05, and therefore the two distributions do not agree very well. In the right plot, the parameters do satisfy the conditions (barely), and the fit is much better.

[Figure: Approximation of the binomial distribution (blue circles) by the Poisson distribution (red crosses) for n = 15, p = 0.1 (left plot) and n = 20, p = 0.05 (right plot).]
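
The quality of the approximation for the two parameter choices in the plots can also be compared numerically. The R sketch below uses the largest absolute difference between the two probability mass functions as a measure of fit; that measure, and the helper name max_abs_diff, are choices made here and do not come from the lecture.

```r
# Largest pointwise gap between B(n, p) and the Poisson distribution
# with the same mean, mu = n * p, over k = 0, 1, ..., n.
max_abs_diff <- function(n, p) {
  k <- 0:n
  max(abs(dbinom(k, n, p) - dpois(k, n * p)))
}

diff_left  <- max_abs_diff(15, 0.1)    # conditions not satisfied
diff_right <- max_abs_diff(20, 0.05)   # conditions (barely) satisfied

print(c(diff_left, diff_right))
```

The second value should come out smaller than the first, in line with the better fit described for the right plot.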





Continuous Uniform Distribution · Exponential Distribution · Normal Distribution

Continuous Probability Distribution

Recall that a continuous random variable is a random variable X whose domain is an interval D = [a, b], which is a subset of R, the set of real numbers.

A continuous probability distribution is described by a function f : D → [0, ∞) whose value at x ∈ D measures the density of probability near x, rather than the probability P(X = x), which is zero for every individual value of a continuous random variable. Probabilities are obtained by integrating f: for a ≤ c ≤ d ≤ b, P(c ≤ X ≤ d) is the integral of f over [c, d].

The function f(x) is the probability density function of X. By analogy with the requirement that the sum of all probabilities in a discrete probability distribution must equal one, a probability density function for a continuous random variable X must satisfy

\[ \int_a^b f(x)\,dx = 1, \]

where the interval [a, b] is the domain of X.






Mean and Variance

The mean, or expected value, of a continuous random variable X is defined by

\[ E[X] = \int_a^b x f(x)\,dx. \]

Then, we can define the variance in the same way as for a discrete random variable:

\[ \mathrm{Var}[X] = E[X^2] - E[X]^2, \qquad \text{where } E[X^2] = \int_a^b x^2 f(x)\,dx. \]






Continuous Uniform Distribution

The continuous uniform distribution U(a, b) is the probability distribution for a random variable X with domain [a, b] in which all subintervals of [a, b] of the same width are equally likely to be observed.

It follows that the probability density function for this distribution is

\[ f(x) = \frac{1}{b-a}, \qquad x \in [a, b]. \]

Using the above definitions of the mean and variance of a continuous random variable, it can be shown that

\[ E[X] = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}. \]
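
These formulas can be checked by numerical integration. The R sketch below uses the base R function integrate with a = 2 and b = 5, values chosen here for illustration; the names f, total, mean_num and var_num are likewise introduced only for this example.

```r
# Hypothetical endpoints, chosen only for illustration.
a <- 2
b <- 5

# Density of U(a, b) on [a, b]; rep() keeps the function vectorized,
# which integrate() requires.
f <- function(x) rep(1 / (b - a), length(x))

# Total probability, mean, and variance by numerical integration.
total    <- integrate(f, a, b)$value
mean_num <- integrate(function(x) x * f(x), a, b)$value
ex2      <- integrate(function(x) x^2 * f(x), a, b)$value
var_num  <- ex2 - mean_num^2

print(c(total, mean_num, (a + b) / 2, var_num, (b - a)^2 / 12))
```

For these endpoints the mean is 3.5 and the variance is 0.75, in agreement with the closed-form expressions.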






Continuous Uniform Distribution in R

The R function dunif evaluates the probability density function at x (first argument) for the uniform distribution on a specified interval [a, b] (second and third arguments). It simply returns 1/(b − a) if a ≤ x ≤ b, and 0 otherwise.

To easily obtain cumulative probabilities, use the punif function. The first argument is c, the largest desired outcome, so that punif(c,a,b) returns P(X ≤ c); the second and third arguments are the endpoints a and b, respectively, of the domain of X.

Finally, given a probability p, the function qunif(p,a,b) returns the value of x (that is, the quantile) such that P(X ≤ x) = p. It can easily be determined that x = p(b − a) + a.
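
A short example may help to fix the roles of the three functions. The sketch below uses the interval [0, 2], an arbitrary choice made here, and the variable names are illustrative.

```r
# Hypothetical interval [a, b] = [0, 2].
a <- 0
b <- 2

dens    <- dunif(0.3, a, b)   # density inside the interval: 1 / (b - a)
outside <- dunif(3, a, b)     # density outside the interval: 0
cum     <- punif(0.5, a, b)   # P(X <= 0.5) = (0.5 - a) / (b - a)
quant   <- qunif(0.25, a, b)  # quantile: 0.25 * (b - a) + a

print(c(dens, outside, cum, quant))
```

Here punif and qunif undo each other: the quantile for probability 0.25 is 0.5, and the cumulative probability at 0.5 is 0.25.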






Exponential Distribution

The exponential distribution Exp(λ) is a continuous distribution that describes the time between events in a Poisson process.

Its parameter λ is a positive real number that is called the rate parameter.

It refers to the number of events per unit of time that are expected to occur.






The Particulars

The probability density function for a continuous random variable X ∼ Exp(λ) is

\[ f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0, \]

and its mean and variance are

\[ E[X] = \frac{1}{\lambda}, \qquad \mathrm{Var}[X] = \frac{1}{\lambda^2}. \]

For example, suppose an operator at a call center receives, on average, two calls per hour. Then the time between calls, in hours, is a random variable X ∼ Exp(2), and its mean is E[X] = 1/2. That is, the operator can expect to receive a call every half hour, on average.






Example

Suppose that you are renting a car late at night, and there is only one customer service representative working at the counter. On average, he can assist 10 customers per hour, or one customer every 6 minutes.

This corresponds to a rate parameter of 1/6 customers per minute, so the time X, in minutes, needed to assist a customer satisfies X ∼ Exp(1/6).

If he just started helping a customer, and you are at the front of the line, what is the probability that you will get to the counter within the next 5 minutes?

That probability is

\[ P(X \le 5) = \int_0^5 \frac{1}{6} e^{-x/6}\,dx = 1 - e^{-5/6} = 0.57. \]

That is, you have a 57% chance of being waited on within 5 minutes.
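
The lecture does not list R functions for the exponential distribution, but base R provides pexp for its cumulative probabilities. The sketch below, with variable names chosen here, computes the waiting-time probability in three ways: from the closed form, with pexp, and by numerical integration of the density.

```r
# Rate parameter: 1/6 customers per minute.
rate <- 1 / 6

# Closed form: P(X <= 5) = 1 - exp(-rate * 5).
p_formula <- 1 - exp(-rate * 5)

# Built-in cumulative distribution function.
p_pexp <- pexp(5, rate)

# Numerical integration of the density rate * exp(-rate * x) over [0, 5].
p_int <- integrate(function(x) rate * exp(-rate * x), 0, 5)$value

print(round(c(p_formula, p_pexp, p_int), 2))
```

All three values round to 0.57, the figure quoted in the example.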






Normal Distribution

The normal distribution is a probability distribution that is followed by continuous random variables that can assume any real value.

A normal distribution has two parameters, its mean µ and its standard deviation σ; often, N(µ, σ) is used to refer to a specific normal distribution.

Its mean, median and mode are all the same, and equal to µ.






Characteristics

The distribution is “bell-shaped”, and is symmetric around the mean. In view of the essential properties of probability, the area under the entire bell-shaped normal distribution curve must be equal to 1.

The probability density is always strictly positive; it can never be zero, though it approaches zero for values of the variable that are far from the mean.

The probability density function is

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2/(2\sigma^2)}. \]






Normal Distribution in R

The probability density function can be evaluated in R using the function dnorm; for example, dnorm(1,0.5,2) evaluates the density at x = 1 for the normal distribution with mean µ = 0.5 and standard deviation σ = 2.

If the third argument is omitted, then σ is assumed to be 1; if the second argument is omitted as well, then µ is assumed to be 0.

This corresponds to the notion of the standard normal distribution, which has mean 0 and standard deviation 1.
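
To connect dnorm with the density formula, the formula can be coded directly and compared with the built-in function. In the R sketch below, normal_pdf is a helper written here for illustration; it is not an R built-in.

```r
# Normal density written out from its formula (illustrative helper).
normal_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

# Same point and parameters as the dnorm example in the text.
by_formula <- normal_pdf(1, 0.5, 2)
by_dnorm   <- dnorm(1, 0.5, 2)

# Defaults: the standard normal density at 0 is 1 / sqrt(2 * pi).
at_zero <- dnorm(0)

print(c(by_formula, by_dnorm, at_zero))
```

The first two values agree, and the third shows the peak height of the standard normal curve.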






The Standard Normal Distribution

[Figure: The standard normal distribution, with mean 0 and standard deviation 1.]






Calculating Probabilities

Suppose we wish to determine P(X ≤ x0), which happens to be the area of the region bounded by the normal distribution curve, the x-axis, and the vertical line x = x0.

As such, this probability would be given by

\[ P(X \le x_0) = \int_{-\infty}^{x_0} f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x_0} e^{-(x-\mu)^2/(2\sigma^2)}\,dx, \]

where f is the probability density function of the normal distribution, but this integral cannot be expressed in closed form using elementary functions.

It must instead be evaluated numerically, which is cumbersome.






More R Functions

In R, we can use the pnorm function; for example, pnorm(1) computes P(X ≤ 1) for the normal distribution with µ = 0 and σ = 1.

More generally, to compute the probability P(X ≤ x0), use pnorm(x0,m,s), where m is the mean and s is the standard deviation.

To find the quantile x0 such that P(X ≤ x0) = a, use x0=qnorm(a,m,s).

As with dnorm, the default values of m and s are 0 and 1, respectively.
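
The following R sketch shows pnorm and qnorm working as inverses of each other. The mean 100 and standard deviation 15 are hypothetical values chosen here for illustration.

```r
# Hypothetical normal distribution: mean 100, standard deviation 15.
m <- 100
s <- 15

# Probability of a value at most one standard deviation above the mean.
p <- pnorm(115, m, s)

# The quantile function recovers the original value from that probability.
x0 <- qnorm(p, m, s)

# With default arguments, qnorm works on the standard normal distribution;
# the 97.5% quantile is approximately 1.96.
q975 <- qnorm(0.975)

print(c(p, x0, q975))
```

Because 115 lies exactly one standard deviation above the mean, p is the same as pnorm(1).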






Tables and z-scores

Tables are often used to evaluate normal distribution probabilities.

Such tables use the standard normal distribution N(0, 1); therefore, if a different distribution is being used, a conversion to the standard distribution must be performed first.

This involves computing the z-score,

\[ z = \frac{x - \mu}{\sigma}. \]

If x is a value of the normal distribution N(µ, σ), then z is the corresponding value in N(0, 1); more precisely, it is the signed number of standard deviations by which x lies above (z > 0) or below (z < 0) µ.






Using Symmetry

We can now describe how to compute various probabilities using normal distribution tables. In the following, we assume that z0 is the z-score for x0.

• P(X ≤ x0): Obtain P(Z ≤ z0) from a standard normal distribution table, or by using pnorm.

• P(X > x0) = 1 − P(X ≤ x0), because the events X > x0 and X ≤ x0 are complementary. That is, they are mutually exclusive and exhaustive, so their probabilities must sum to 1.

• P(X ≤ µ − x0) = 1 − P(X ≤ µ + x0), by the symmetry of the normal distribution.

• P(X > µ − x0) = P(X ≤ µ + x0), again by symmetry.

• P(x1 ≤ X ≤ x2) = P(X ≤ x2) − P(X ≤ x1).
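
Each of the identities in this list can be confirmed with pnorm. The R sketch below uses hypothetical parameters (µ = 10, σ = 2) and writes d for the offset from the mean that appears in the two symmetry rules; all names and numbers are chosen here for illustration.

```r
# Hypothetical parameters and values.
mu <- 10
sigma <- 2
x0 <- 13
x1 <- 9
x2 <- 12
d <- 1.5

# z-score of x0; the probability is the same on either scale.
z0  <- (x0 - mu) / sigma
p_x <- pnorm(x0, mu, sigma)
p_z <- pnorm(z0)

# Complement rule: P(X > x0) = 1 - P(X <= x0).
p_upper <- pnorm(x0, mu, sigma, lower.tail = FALSE)

# Symmetry about the mean, for an offset d on either side of mu.
below_left  <- pnorm(mu - d, mu, sigma)   # P(X <= mu - d)
below_right <- pnorm(mu + d, mu, sigma)   # P(X <= mu + d)

# Probability of an interval.
p_between <- pnorm(x2, mu, sigma) - pnorm(x1, mu, sigma)

print(c(p_x, p_z, p_upper, below_left, 1 - below_right, p_between))
```

The lower.tail = FALSE argument of pnorm returns the upper-tail probability directly, which avoids the subtraction from 1.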






The Empirical Rule, Revisited

The empirical rule, introduced previously, can be used to estimate normal distribution probabilities.

While it is only approximately true for other bell-shaped, symmetric distributions, for a normal distribution the stated percentages are the actual probabilities, rounded.

In fact, the rule is derived from the behavior of the normal distribution.

Expressed in terms of probabilities, the empirical rule states that

P(−1 ≤ Z ≤ 1) ≈ 0.68,

P(−2 ≤ Z ≤ 2) ≈ 0.95,

P(−3 ≤ Z ≤ 3) ≈ 0.997.
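
These three probabilities can be computed directly with pnorm, which shows how close the rounded figures are. A minimal R sketch, with variable names chosen here:

```r
# Number of standard deviations on either side of the mean.
k <- 1:3

# P(-k <= Z <= k) for the standard normal distribution.
prob <- pnorm(k) - pnorm(-k)

print(round(prob, 4))
```

The printed values are 0.6827, 0.9545 and 0.9973.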






Approximating the Binomial Distribution

Like the Poisson distribution, the normal distribution can be used to approximate the binomial distribution, as long as the number of trials n and the probability of success p satisfy

np ≥ 5, n(1− p) ≥ 5.






Un-discretization

For computing probabilities, it is best to use the midpoints between consecutive discrete values of the number of successes.

For example, to approximate P(X ≤ 5), where X is a discrete random variable with a binomial distribution, one should work with a continuous random variable Y with a normal distribution

\[ N\left(np, \sqrt{np(1-p)}\right) \]

and compute P(Y ≤ 5.5), rather than P(Y ≤ 5), because the event X ≤ 5 corresponds to the values of Y that round to 5 or less.

This adjustment, known as a continuity correction, is due to the change from a discrete random variable to a continuous random variable.
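
The effect of the cutoff can be examined numerically by comparing the normal approximation at several cutoffs with the exact binomial probability. The R sketch below uses n = 30 and p = 0.25, and relies on pbinom, the cumulative counterpart of dbinom in base R, which the lecture does not cover; the variable names are chosen here.

```r
# Binomial parameters and the matching normal parameters.
n <- 30
p <- 0.25
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Exact binomial probability P(X <= 5).
exact <- pbinom(5, n, p)

# Normal approximation evaluated at three candidate cutoffs.
cutoffs <- c(4.5, 5, 5.5)
normal_approx <- pnorm(cutoffs, mu, sigma)
abs_error <- abs(normal_approx - exact)

print(data.frame(cutoffs, normal_approx, abs_error))
```

For these parameters the cutoff 5.5 gives the smallest error, since the event X ≤ 5 covers the whole bar centered at 5, which extends up to 5.5.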






Example

[Figure: Approximation of the binomial distribution with n = 30 and p = 0.25 (blue circles) by N(np, √(np(1 − p))) (red curve).]

