
01 Probability and Probability Distributions

Date post: 06-Jul-2018
Upload: rama-dulce
  • 8/17/2019 01 Probability and Probability Distributions


    PROBABILITY AND PROBABILITY DISTRIBUTIONS – REVIEW

    Topics Outline

    • Probability of Events

    • Probability Rules

    • Random Variables

    • Probability Distributions

    Probability of Events

    There are many interpretations of probability. The three most widely used approaches are:

    1. Classical method – based on the assumption of equally likely events.

    Example: Six-sided fair die.

    Each side has the same chance of turning up. Therefore, each has a probability 1/6.

    2. Empirical method – based on experimental or historical data.

    Example: Predicting the weather.

    A 30% chance of rain today means that it rained on 30% of all days with similar atmospheric conditions.

    3. Subjective method – based on judgment, experience, intuition.

    Example: Chris and Sally make an offer to purchase a house.

    Sally believes that the probability their offer will be accepted is 0.8.

    Chris, however, believes that the probability their offer will be accepted is 0.6.

    In the case of equally likely events, a convenient way of thinking about probabilities is:

    $P(A) = \dfrac{\text{number of outcomes in } A}{\text{number of all outcomes}}$
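
    The counting formula for equally likely outcomes can be sketched in a few lines of Python. This is only an illustration: the function name `classical_probability` and the event "an even number turns up" are assumptions introduced here, not part of the notes.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = (number of outcomes in A) / (number of all outcomes).

    Valid only when every outcome in sample_space is equally likely.
    """
    favorable = event & sample_space  # ignore anything outside the sample space
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}   # six-sided fair die
even = {2, 4, 6}           # hypothetical event: an even number turns up

p_one_side = classical_probability({3}, die)
p_even = classical_probability(even, die)
print(p_one_side, p_even)  # 1/6 1/2
```

    Exact fractions are used so that results such as 1/6 are not blurred by floating-point rounding.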

    Probability Rules

    No matter which method is used to assign probabilities to events, the following

    probability rules (axioms) must hold.

    Rule 1. 0 ≤ P(A) ≤ 1 for any event A

    That is, the probability of any event A is a number that lies between 0 and 1.

    Rule 2. P(all outcomes) = 1

    That is, the probability of “something happening” is 1.

    The complement A^c of A is the event consisting of all sample points that are not in A.

    That is, A^c is the event that A does not occur.

    Rule 3. Complement rule

    P(A^c) = 1 – P(A)

    That is, the probability of an event not occurring is 1 minus the probability that the event does occur.


    Rule 4. Addition rule 

    P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

    A ∪ B = union = “or” = either A or B or both

    A ∩ B = intersection = “and” = both A and B

    Two events A and B are mutually exclusive (disjoint) if they have no outcomes in common and so can never happen together.

    Addition rule for mutually exclusive events

    If A and B are mutually exclusive,

    P(A ∪ B) = P(A) + P(B)

    Addition rule for more than two mutually exclusive events:

    P(A1 ∪ A2 ∪ . . . ∪ An) = P(A1) + P(A2) + . . . + P(An)

    The conditional probability of the event A given that the event B has already occurred is given by

    P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0

    Rule 5. Multiplication rule

    P(A ∩ B) = P(A|B) P(B)

    Two events A and B are independent if knowledge of the occurrence of one has no influence on

    the probability of occurrence of the other, that is

    P(A|B) = P(A)

    Multiplication rule for independent events

    P(A ∩ B) = P(A)P(B)

    Mutually exclusive events are not independent! If two events with positive probabilities are mutually exclusive, then knowledge of the occurrence of one has influence on the probability of the other. (If you know that event B has occurred, you know that event A cannot have occurred.)

    How to check independence?

    Use any one of the following techniques:

    1. Check if A and B are mutually exclusive. If they are mutually exclusive, then they are not independent.

    2. Check if the multiplication rule for independent events P(A ∩ B) = P(A)P(B) holds. If it holds, then the events are independent.

    3. Use the definition of independent events: P(A|B) = P(A). If it is satisfied, then the events are independent.
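
    As a small illustration of the addition rule and of the multiplication-rule check for independence, here is a sketch on a fair die. The events `even`, `odd`, and `at_most_four` and the helper names are assumptions chosen for this example only.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def are_independent(a, b, sample_space):
    """Technique 2: check whether P(A and B) equals P(A) * P(B)."""
    return prob(a & b, sample_space) == prob(a, sample_space) * prob(b, sample_space)

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
odd = {1, 3, 5}
at_most_four = {1, 2, 3, 4}

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
union_direct = prob(even | at_most_four, die)
union_by_rule = (prob(even, die) + prob(at_most_four, die)
                 - prob(even & at_most_four, die))
print(union_direct, union_by_rule)                 # 5/6 5/6

print(are_independent(even, at_most_four, die))    # True:  1/3 == 1/2 * 2/3
print(are_independent(even, odd, die))             # False: mutually exclusive events
```

    The last line shows the warning in the text: `even` and `odd` are mutually exclusive, so their intersection has probability 0 while the product of their probabilities is 1/4.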


    Example 1

    Due to rising health insurance costs, 43 million people in the United States go without health insurance (Time, December 1, 2003). Sample data representative of the national health insurance coverage for individuals 18 years of age and older are shown here.

                     Health Insurance
    Age              Yes      No
    18 to 34         750      170
    35 and over      950      130

    a. Develop a joint probability table for these data.

    First we calculate the totals in the table above.

                     Health Insurance
    Age              Yes      No       Total
    18 to 34         750      170      920
    35 and over      950      130      1080
    Total            1700     300      2000

    Total sample size = 2000.

    Dividing each entry by 2000 provides the following joint probability table (or contingency table).

                     Health Insurance
    Age              Yes      No       Total
    18 to 34         0.375    0.085    0.46
    35 and over      0.475    0.065    0.54
    Total            0.850    0.150    1.00

    Marginal probabilities are displayed in the margins and represent the probability of one event.

    Joint probabilities are displayed in the interior cells and represent probabilities of intersections.

    Let A = 18 to 34 age group

    B = 35 and over age group

    Y = Insurance coverage

    N = No insurance coverage

    The marginal probabilities are:

    P(A) = 0.46    P(B) = 0.54    P(Y) = 0.85    P(N) = 0.15

    The joint probabilities are:

    P(A ∩ Y) = 0.375    P(A ∩ N) = 0.085

    P(B ∩ Y) = 0.475    P(B ∩ N) = 0.065
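
    The same joint probability table can be built programmatically. The sketch below divides each count from Example 1 by the sample size and accumulates the marginals; the dictionary layout and variable names are assumptions made for illustration.

```python
# Counts from Example 1: (age group, insured?) -> number of people
counts = {
    ("18 to 34", "Yes"): 750, ("18 to 34", "No"): 170,
    ("35 and over", "Yes"): 950, ("35 and over", "No"): 130,
}

total = sum(counts.values())  # 2000

# Joint probabilities: each cell divided by the total sample size
joint = {cell: n / total for cell, n in counts.items()}

# Marginal probabilities: sum the joint probabilities across a row or column
marginal_age = {}
marginal_insurance = {}
for (age, insured), p in joint.items():
    marginal_age[age] = marginal_age.get(age, 0.0) + p
    marginal_insurance[insured] = marginal_insurance.get(insured, 0.0) + p

for cell, p in joint.items():
    print(cell, round(p, 3))
print({k: round(v, 2) for k, v in marginal_age.items()})
print({k: round(v, 2) for k, v in marginal_insurance.items()})
```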


    b. What is the probability that a randomly selected individual does not have health insurance

    coverage?

    c. If the individual is between the ages of 18 and 34, what is the probability that the individual does not have health insurance coverage?

    d. If the individual does not have health insurance coverage, what is the probability that the individual is in the 18 to 34 age group?

    e. Are the events A and N independent?

    f. What does the probability information tell you about the health insurance coverage in the United

    States?


    Solution:

    b. What is the probability that a randomly selected individual does not have health insurance

    coverage?

    P(N) = 0.15

    c. If the individual is between the ages of 18 and 34, what is the probability that the individual does not have health insurance coverage?

    $P(N \mid A) = \dfrac{P(N \cap A)}{P(A)} = \dfrac{0.085}{0.46} = 0.1848$

    d. If the individual does not have health insurance coverage, what is the probability that the individual is in the 18 to 34 age group?

    $P(A \mid N) = \dfrac{P(A \cap N)}{P(N)} = \dfrac{0.085}{0.15} = 0.5667$

    Please note that the probabilities in (c) and (d) are different.

    e. Are the events A and N independent?

    P(A ∩ N) = 0.085

    P(A)P(N) = (0.46)(0.15) = 0.069

    Since 0.085 ≠ 0.069, A and N are not independent.

    Or, equivalently

    P(A|N) = 0.5667

    P(A) = 0.46

    Since 0.5667 ≠ 0.46, A and N are not independent.

    f. What does the probability information tell you about the health insurance coverage in the

    United States?

    Probability of no health insurance coverage is 0.15.

    A higher probability for no insurance exists for the younger population:

    0.1848 (or approximately 18.5%) versus 0.1204 (or approximately 12%).

    Of the no insurance group, more are in the 18 to 34 age group:

    0.5667, or approximately 57% are ages 18 to 34.
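
    The conditional probabilities and the independence check in this solution can be reproduced with a short sketch. The variable names are assumptions; the input numbers are the marginal and joint probabilities from the table.

```python
import math

# Probabilities taken from the joint probability table of Example 1
p_a = 0.46         # P(A): 18 to 34 age group
p_b = 0.54         # P(B): 35 and over age group
p_n = 0.15         # P(N): no insurance coverage
p_a_and_n = 0.085  # P(A and N)
p_b_and_n = 0.065  # P(B and N)

# Conditional probability: P(X | Y) = P(X and Y) / P(Y)
p_n_given_a = p_a_and_n / p_a   # part (c)
p_a_given_n = p_a_and_n / p_n   # part (d)
p_n_given_b = p_b_and_n / p_b   # comparison used in part (f)

# Independence check: does P(A and N) equal P(A) * P(N)?
independent = math.isclose(p_a_and_n, p_a * p_n)

print(round(p_n_given_a, 4))  # 0.1848
print(round(p_a_given_n, 4))  # 0.5667
print(round(p_n_given_b, 4))  # 0.1204
print(independent)            # False
```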


    Random Variables

    A random variable is a variable taking numerical values determined by the outcome of a

    random phenomenon.

    Example: Toss two coins and let X stand for the number of heads: 0, 1, or 2.

    Outcome   TT   HT   TH   HH
    X          0    1    1    2

    A random variable can be classified as being either discrete or continuous depending on the

    numerical values it assumes.

    Discrete random variables can take only a finite number, or a countable infinity, of values. For example, the random variable X defined above can assume only three values (0, 1, or 2), so it is discrete.

    The random variable

     X  = {# of customers at a gas station for one day}

    is a discrete random variable which can take an infinite sequence of values (0, 1, 2, and so on).

    Continuous random variables may take any numerical value in an interval or collection of intervals. For example,

     X  = {height of students in this class}

    is a continuous random variable.

    In general, continuous random variables represent measured data, such as height, weight, temperature, distance, or time, whereas discrete random variables represent count data, such as the number of defectives in a sample of 50 items or the number of cars arriving at a tollbooth during a one-day period.

    One way to determine whether a random variable is discrete or continuous is to think of the values of the random variable as points on a line segment. Choose two points representing values of the random variable. If the entire line segment between the two points also represents possible values for the random variable, then the random variable is continuous.


    Probability Distributions

    The probability distribution function  f(x) for a discrete random variable X  provides the probability

    for each value of the random variable. A given assignment of probabilities produces a valid

    probability distribution if it satisfies the following two rules:

    Rule 1. f(x) ≥ 0 for all x

    Rule 2. $\sum_{\text{all } x} f(x) = 1$

    To calculate the probabilities involving continuous random variables we use a special function

    called density function. The probability that the continuous random variable X  takes on a value

    in the interval [a, b] is equal to the area under the graph of the probability density function  f(x) over the interval [a, b]. The density function must satisfy the following two rules:

    Rule 1. f(x) ≥ 0 for all x. That is, it is always on or above the horizontal axis.

    Rule 2. $\int_{\text{all } x} f(x)\,dx = 1$

    That is, the total area under the graph of f(x) is equal to 1.

    Note the similarities between these conditions and those for a probability distribution function of a discrete random variable. However, there are important differences between the two kinds of probability functions. Note that for a continuous random variable:

    1. P(X = c) = 0 for all c

    That is, the probability of any given point equals zero.

    (Because P(X = c) is the area of the line segment over the point c, and the area of a line segment is zero.)

    2. It follows that in the continuous case

    P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)

    3. f(x) may exceed a value of 1. For example, a density that is constant on the interval [0, 0.5] must have height 2 there so that the total area equals 1.

    The calculation of the expected value or mean and variance for a continuous random variable is

    analogous to that for a discrete random variable. The difference is that instead of summations we use

    integrals:

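    Discrete random variable:

    $E(X) = \mu = \sum_{\text{all } x} x\, f(x)$

    $\mathrm{Var}(X) = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 f(x)$

    Continuous random variable:

    $E(X) = \mu = \int_{\text{all } x} x\, f(x)\,dx$

    $\mathrm{Var}(X) = \sigma^2 = \int_{\text{all } x} (x - \mu)^2 f(x)\,dx$

    The standard deviation in both cases is $\sigma = \sqrt{\sigma^2}$.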
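
    For the discrete case, the two validity rules and the formulas for the mean and variance translate directly into code. The sketch below uses the two-coin example (X = number of heads) and assumes fair coins, which the notes do not state explicitly; the function names are illustrative.

```python
import math

def is_valid_distribution(pmf):
    """Rule 1: every f(x) >= 0.  Rule 2: the f(x) values sum to 1."""
    return all(p >= 0 for p in pmf.values()) and math.isclose(sum(pmf.values()), 1.0)

def mean(pmf):
    """E(X) = sum over all x of x * f(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum over all x of (x - mu)^2 * f(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# X = number of heads in two tosses, assuming fair coins (TT, HT, TH, HH equally likely)
heads = {0: 0.25, 1: 0.5, 2: 0.25}

print(is_valid_distribution(heads))   # True
print(mean(heads))                    # 1.0
print(variance(heads))                # 0.5
print(math.sqrt(variance(heads)))     # standard deviation, about 0.707
```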


    Example 2

    A psychologist determined that the number of sessions required to obtain the trust of a new patient is either 1, 2, or 3. Let X be a random variable indicating the number of sessions required to gain the patient’s trust. The following probability function has been proposed.

    $f(x) = \dfrac{x}{6}$ for x = 1, 2, or 3

    (a) Is this probability function valid? Explain.

    x (values which X can take)    f(x) (associated probabilities)
    1                              1/6
    2                              2/6
    3                              3/6

    It is a valid probability distribution since

    1. f(x) ≥ 0 for all x 

    2. f (1) + f (2) + f (3) = 1/6 + 2/6 + 3/6 = 1

    (b) What is the probability that it takes exactly 2 sessions to gain the patient’s trust?

     f (2) = 2/6 = 0.333

    (c) What is the probability that it takes at least 2 sessions to gain the patient’s trust?

     f (2) + f (3) = 2/6 + 3/6 = 5/6 = 0.833

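    (d) What is the expected number of sessions required to obtain the trust of a new patient?

    $E(X) = \mu = \sum_{\text{all } x} x\, f(x) = (1)\left(\tfrac{1}{6}\right) + (2)\left(\tfrac{2}{6}\right) + (3)\left(\tfrac{3}{6}\right) = \tfrac{1}{6} + \tfrac{4}{6} + \tfrac{9}{6} = \tfrac{14}{6} = 2.333$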

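    (e) What is the variance of the number of sessions required to obtain the trust of a new patient?

    $\mathrm{Var}(X) = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 f(x) = \left(1 - \tfrac{14}{6}\right)^2 \tfrac{1}{6} + \left(2 - \tfrac{14}{6}\right)^2 \tfrac{2}{6} + \left(3 - \tfrac{14}{6}\right)^2 \tfrac{3}{6} = \tfrac{5}{9} = 0.556$

    (f) What is the standard deviation of the number of sessions required to obtain the trust of a new patient?

    $\sigma = \sqrt{\sigma^2} = \sqrt{\tfrac{5}{9}} = 0.745$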

    Using Excel:

    x    f(x)     x f(x)    x - µ     (x - µ)^2    (x - µ)^2 f(x)
    1    0.167    0.167     -1.333    1.778        0.296
    2    0.333    0.667     -0.333    0.111        0.037
    3    0.500    1.500     0.667     0.444        0.222

    µ = 2.333

    σ^2 = 0.556

    σ = 0.745
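
    As an alternative to the spreadsheet, the whole of Example 2 can be checked with exact fractions. This is a sketch, not the method used in the notes; the variable names are assumptions.

```python
from fractions import Fraction
import math

# Proposed probability function f(x) = x/6 for x = 1, 2, 3
f = {x: Fraction(x, 6) for x in (1, 2, 3)}

# (a) validity: non-negative values that sum to 1
valid = all(p >= 0 for p in f.values()) and sum(f.values()) == 1

p_exactly_2 = f[2]            # (b)
p_at_least_2 = f[2] + f[3]    # (c)

mu = sum(x * p for x, p in f.items())               # (d) expected value
var = sum((x - mu) ** 2 * p for x, p in f.items())  # (e) variance
sd = math.sqrt(float(var))                          # (f) standard deviation

print(valid)                      # True
print(p_exactly_2, p_at_least_2)  # 1/3 5/6
print(mu, var)                    # 7/3 5/9
print(round(sd, 3))               # 0.745
```

    Working in fractions shows that 2.333 and 0.556 in the table are rounded values of 7/3 and 5/9.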


    Binomial Distribution

    Suppose we have a random process with just two possible outcomes, for example:

    – tossing a coin (heads or tails)

    – football game (win or loss)

    – auto smog inspection (pass or fail)

    If the following properties are present we say the random process is a binomial experiment:

    1. The experiment consists of a sequence of n identical trials.

    2. The result of each trial may be either success or failure.

    3. At each trial, the probability of a success is equal to p, and the probability of a failure is equal to 1 – p

    4. The trials are independent.

    (That is, the outcome of one trial has no influence on later outcomes.)

    The binomial random variable is defined as

    X = number of successes in n trials

    The probability of having x successes in n trials is given by the binomial distribution function:

    $f(x) = \binom{n}{x}\, p^x (1 - p)^{n - x}$

     where

    n – number of trials

     x – number of successes; x = 0, 1, 2, ... ,n 

     p – probability of success

    Recall: The binomial coefficient “n choose x” is equal to

    $\binom{n}{x} = \dfrac{n!}{x!\,(n - x)!}$

    and represents the number of ways to choose x “successes” in a sequence of n observations.

    Factorial: n! = n(n – 1)(n – 2) · · · (2)(1), with 0! = 1

    Note that X is a discrete random variable that can assume any of the values 0, 1, 2, …, n.

    It can be shown that in the case of a binomial random variable, the general formulas for the expected value, variance, and standard deviation simplify to the following:

    $E(X) = \mu = np$

    $\mathrm{Var}(X) = \sigma^2 = np(1 - p)$

    $\sigma = \sqrt{np(1 - p)}$
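
    A minimal sketch of the binomial distribution function, assuming Python 3.8 or later for `math.comb`. The values n = 4 and p = 0.5 are hypothetical (four tosses of a fair coin), chosen so the results are easy to verify by hand; the code also compares the shortcut formulas np and np(1 – p) with the general sums.

```python
import math

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.5  # hypothetical: four tosses of a fair coin

pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
print(pmf)  # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}

# General formulas for a discrete random variable ...
mean_by_sum = sum(x * fx for x, fx in pmf.items())
var_by_sum = sum((x - mean_by_sum) ** 2 * fx for x, fx in pmf.items())

# ... agree with the binomial shortcuts
mean_shortcut = n * p
var_shortcut = n * p * (1 - p)
print(mean_by_sum, mean_shortcut)  # 2.0 2.0
print(var_by_sum, var_shortcut)    # 1.0 1.0
```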


    Example 3

    Suppose that the likelihood that someone who logs onto a particular site in a “shopping mall” on the

    World Wide Web will purchase an item is 0.2.

    If the site has 5 people accessing it in the next minute, what is the probability that

    (a) exactly 2 individuals will purchase an item?

    Let X = number of individuals who will purchase an item.

    $P(X = 2) = f(2) = \binom{5}{2}(0.2)^2(1 - 0.2)^{5-2} = \dfrac{5!}{2!\,(5-2)!}(0.2)^2(0.8)^3 = 0.2048$

    Or using Excel,

    P( X =2) = f (2) = BINOMDIST(2,5,0.2,FALSE) = 0.2048

    (b) at most 2 individuals will purchase an item?

    $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$

    $= \binom{5}{0}(0.2)^0(0.8)^5 + \binom{5}{1}(0.2)^1(0.8)^4 + \binom{5}{2}(0.2)^2(0.8)^3$

    = 0.32768 + 0.4096 + 0.2048 = 0.94208

    Using Excel,

    P(X ≤ 2) = BINOMDIST(2,5,0.2,TRUE) = 0.94208

    (c) more than 2 individuals will purchase an item?

    P(X > 2) represents the complement of the probability P(X ≤  2).

    Because all the probabilities in a probability distribution must sum to 1,

    $P(X > 2) = 1 - P(X \le 2) = 1 - 0.94208 = 0.05792$

    (d) On average, how many individuals will purchase an item?

    $E(X) = \mu = np = (5)(0.2) = 1$

    (e) What is the variance of the number of individuals who will purchase an item?

    $\sigma^2 = np(1 - p) = (5)(0.2)(1 - 0.2) = 0.8$

    (f) What is the standard deviation of the number of individuals who will purchase an item?

    $\sigma = \sqrt{\sigma^2} = \sqrt{0.8} = 0.8944$
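
    The calculations of Example 3 can also be reproduced outside Excel. In the sketch below, `binomdist` is a hypothetical helper written to mirror the argument order of the Excel call used above; it is not a library function.

```python
import math

def binomdist(x, n, p, cumulative):
    """Hypothetical stand-in for Excel's BINOMDIST(x, n, p, cumulative)."""
    def pmf(k):
        return math.comb(n, k) * p**k * (1 - p) ** (n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))  # P(X <= x)
    return pmf(x)                                  # P(X = x)

n, p = 5, 0.2

p_exactly_2 = binomdist(2, n, p, False)   # (a)
p_at_most_2 = binomdist(2, n, p, True)    # (b)
p_more_than_2 = 1 - p_at_most_2           # (c) complement rule

mean = n * p                              # (d)
variance = n * p * (1 - p)                # (e)
std_dev = math.sqrt(variance)             # (f)

print(round(p_exactly_2, 4))    # 0.2048
print(round(p_at_most_2, 5))    # 0.94208
print(round(p_more_than_2, 5))  # 0.05792
print(round(mean, 4), round(variance, 4), round(std_dev, 4))  # 1.0 0.8 0.8944
```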


    Uniform Distribution

    A continuous random variable X  is said to be uniformly distributed over the interval [a, b] if its

    probability density function is given by

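    $f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$

    The expected value, variance, and standard deviation are given by

    $E(X) = \mu = \dfrac{a + b}{2}$

    $\mathrm{Var}(X) = \sigma^2 = \dfrac{(b - a)^2}{12}$

    $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{(b - a)^2}{12}}$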

    Possible applications:

    1. It is used as a “first” model for a quantity that is felt to be randomly varying between a and b 

    but about which little else is known.

    2. It is essential in generating random values from all other distributions.


    Example 4 

    The time X  it takes to build a laser printer is thought to be uniformly distributed between 7 and 15 hours.

    (a) Plot the density curve of X .

    The density curve is a rectangle over the interval [7, 15], so its base is 8 (= 15 – 7). The height should be 1/8 because the area under the curve must be 1.

    (b) What are the chances that it will take more than 10 hours to build a printer?

    P(X > 10) = area of the rectangle with base 5 (= 15 – 10) and height 1/8

    = (5)(1/8) = 5/8 = 0.625, or 62.5%

    (c) Determine the probability that it will take between 12 and 14 hours to build a printer.

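    P(12 < X < 14) = area of the rectangle with base 2 (= 14 – 12) and height 1/8

    = (2)(1/8) = 2/8 = 0.25, or 25%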
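
    The rectangle-area reasoning used in this example can be sketched as a small function. The names `uniform_pdf` and `uniform_prob` are assumptions introduced for illustration.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) = base * height of the rectangle inside [a, b]."""
    lo, hi = max(lo, a), min(hi, b)   # clip the interval to [a, b]
    base = max(hi - lo, 0)
    return base * (1 / (b - a))

a, b = 7, 15   # build time is uniform between 7 and 15 hours

height = uniform_pdf(10, a, b)               # (a) height of the density curve
p_more_than_10 = uniform_prob(10, b, a, b)   # (b)
p_12_to_14 = uniform_prob(12, 14, a, b)      # (c)

mean = (a + b) / 2
variance = (b - a) ** 2 / 12

print(height)          # 0.125
print(p_more_than_10)  # 0.625
print(p_12_to_14)      # 0.25
print(mean, round(variance, 3))  # 11.0 5.333
```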


    Normal Distribution

    The most important probability distribution in the entire field of statistics is the normal or

    Gaussian distribution. It was discovered by De Moivre in 1733 and reintroduced by Gauss near the beginning of the 19th century. The density function of the normal distribution with mean µ and standard deviation σ is given by

    $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

    where π = 3.14159… and e = 2.71828… is the base of the natural logarithm.

    Note that:

    1. The normal curve is single-peaked, bell-shaped, and symmetric about the mean µ.

    2. µ can be any real number – positive, negative or 0.

    3. f(x) has a maximum at µ, and the maximum value of the density function is $f(\mu) = \dfrac{1}{\sigma\sqrt{2\pi}}$.

    4. σ is the distance from µ to the change-of-curvature (inflection) points on either side; σ determines how widely spread the distribution will be; larger σ implies a more dispersed curve.

    5. The normal curve approaches the horizontal axis asymptotically as we proceed in either

    direction from the mean.

    We abbreviate the normal distribution with mean µ and standard deviation σ as N(µ, σ).

    Once µ and σ are specified, the normal curve is completely determined.

    For example, if µ = 7 and σ = 5, then the ordinates

    $f(x) = \dfrac{1}{5\sqrt{2\pi}}\, e^{-\frac{(x - 7)^2}{50}}$

    can easily be computed for various values of x and the curve drawn.
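
    To make "computing the ordinates" concrete, here is a sketch that evaluates the normal density for µ = 7 and σ = 5 at a few points. The chosen x values are arbitrary, and the comparison with Python's `statistics.NormalDist` is included only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 7, 5

# Ordinates at a few arbitrary points, symmetric around the mean
ordinates = {x: normal_pdf(x, mu, sigma) for x in (-3, 2, 7, 12, 17)}
for x, fx in ordinates.items():
    print(x, round(fx, 4))

peak = normal_pdf(mu, mu, sigma)   # maximum value, 1 / (sigma * sqrt(2*pi))
print(round(peak, 4))              # 0.0798

# Cross-check against the standard library implementation
library_value = NormalDist(mu, sigma).pdf(12)
```

    The printed values are equal at x = 2 and x = 12 (and at x = –3 and x = 17), which reflects the symmetry of the curve about µ.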


    Standard Normal Distribution

    The standard normal distribution N(0,1) is a normal distribution with mean µ = 0 and standard deviation σ = 1. The density curve of the standard normal distribution is given by

    $f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$

    Any normal curve N(µ, σ) with mean µ and standard deviation σ can be converted to the standard normal curve N(0,1) using the formula

    $z = \dfrac{x - \mu}{\sigma}$

    This equation rescales any normal distribution axis from its true units (time, weight, dollars,

    barrels, and so forth) to the standard measure referred to as a  z-score.

    Thus, any observation x from a normally distributed density curve can be represented by a

    unique z-score. The z-score represents the number of standard deviations that a data value x is

    away from the distribution mean  µ  .

    Please study the formula in words:

    $z\text{-score} = \dfrac{\text{variable} - \text{mean}}{\text{standard deviation}}$
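
    The z-score formula and its inverse can be sketched as two one-line functions. The numbers used (mean 100, standard deviation 15, observation 130) are hypothetical and serve only to show the round trip.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean: (x - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

def from_z_score(z, mu, sigma):
    """Inverse transformation: x = mu + z * sigma."""
    return mu + z * sigma

# Hypothetical values for illustration only
mu, sigma = 100, 15
z = z_score(130, mu, sigma)
x_back = from_z_score(z, mu, sigma)

print(z)       # 2.0  (130 is two standard deviations above the mean)
print(x_back)  # 130.0
```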


    Calculating Normal Probabilities

    Normal probabilities can be calculated using a table of the standard normal distribution or

    software. We will be using a table and Excel. The table gives the area under the standard normal

    curve to the left of a value z.

    There are two types of calculations – forward and backward (or inverse) calculations. Forward

    calculations are used to find probabilities (areas under the curve) given a value of the normal variable.

    Backward calculations are used to find a value of the normal variable given the probability.

    Forward Calculations

    (Finding probabilities)

    Example 5

    An electrical firm manufactures light bulbs that have a length of life that is normally distributed

    with mean equal to 800 hours and a standard deviation of 40 hours.

    Find the probability that a given bulb burns:

    (a) in less than 700 hours

    (b) between 778 and 834 hours

    (c) after 850 hours

    (d) exactly 850 hours

    (e) What is the value of the density function at 850 hours?

    Solution: µ = 800, σ = 40. The formula for finding z-scores is $z = \dfrac{x - \mu}{\sigma}$.

    (a) $z = \dfrac{700 - 800}{40} = -2.5$

    The area left of (–2.5) is 0.0062

    The calculations if we use Excel instead of a table are:

    P( X  < 700) = NORMDIST(700,800,40,TRUE) = 0.00621

    The proportion of bulbs that burn in less than 700 hours is 0.0062, or 0.62%.

    (b) $z_{834} = \dfrac{834 - 800}{40} = 0.85$

    $z_{778} = \dfrac{778 - 800}{40} = -0.55$


    area for X between 778 and 834 = area for x left of 834 – area for x left of 778

    = area for z left of 0.85 – area for z left of (–0.55)

    = 0.8023 – 0.2912

    = 0.5111

    Calculations if Excel is used:

    P(778 < X < 834) = NORMDIST(834,800,40,TRUE) – NORMDIST(778,800,40,TRUE)

    (c) P(X > 850) = 1 – P(X ≤ 850)

    = 1 – NORMDIST(850,800,40,TRUE)

    = 1 – 0.8944

    = 0.1056

    The proportion of bulbs that burn after 850 hours is 0.1056, or 10.56%.

    (d) The answer is 0, since there is no area under a smooth curve and exactly over the point 850.

    (e) What is the value of the density function at 850 hours?

     f (850) = NORMDIST(850,800,40,FALSE) = 0.0046

    The value of the density function is 0.0046 and it represents the height of the distribution at

    the point 850.
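
    The forward calculations of Example 5 can also be carried out with Python's standard library instead of a table or Excel. This is a sketch using `statistics.NormalDist` (Python 3.8 or later); the variable names are assumptions. Software results can differ from table results in the fourth decimal place because the table rounds z-scores and areas.

```python
from statistics import NormalDist

life = NormalDist(mu=800, sigma=40)   # bulb life in hours

p_less_700 = life.cdf(700)                   # (a) area left of 700
p_778_to_834 = life.cdf(834) - life.cdf(778) # (b) area between 778 and 834
p_after_850 = 1 - life.cdf(850)              # (c) area right of 850
density_850 = life.pdf(850)                  # (e) height of the curve, not a probability

print(round(p_less_700, 4))    # 0.0062
print(round(p_778_to_834, 3))  # 0.511
print(round(p_after_850, 4))   # 0.1056
print(round(density_850, 4))   # 0.0046
```

    Part (d) needs no code: for a continuous random variable the probability of exactly 850 hours is 0.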


    Backward (inverse) calculations

    (Finding cut-off points)

    Example 5 (continued)

    (f) One percent of the bulbs will have a life expectancy of at most how many hours?

    (g) One percent of the bulbs will have a life expectancy of at least how many hours?

    Solution:

    (f) The point $z = \dfrac{x - 800}{40}$ cuts off 1%, or 0.01, in the lower tail of the standard normal distribution.

    Using the table backward, we find that the entry closest to 0.01 corresponds to z = –2.33.

    Substituting z in the above equation and solving for x we get

    $-2.33 = \dfrac{x - 800}{40}$

    (40)(–2.33) = x – 800

    x = 800 + (40)(–2.33) = 706.80 ≈ 707

    Note: To find x, we can also use the formula x = µ + zσ

    x = 800 + (–2.33)(40) = 706.80 ≈ 707

    Or, using Excel:

    x = NORMINV(0.01,800,40) = 706.95 ≈ 707

    Thus, 1% of the bulbs will have a life expectancy of at most 707 hours.


    (g) The point $z = \dfrac{x - 800}{40}$ cuts off 1%, or 0.01, in the upper tail of the standard normal distribution.

    This means that the area to the left of this point is 99%, or 0.99.

    Using the table backward, we find that the entry closest to 0.99 corresponds to z = 2.33.

    (Note: We could also find the z-value using the result from part (f) and the symmetry of the standard normal distribution.)

    Substituting z in the above equation and solving for x we get

    $2.33 = \dfrac{x - 800}{40}$

    (40)(2.33) = x – 800

    x = 800 + (40)(2.33) = 893.20 ≈ 893

    Or, using the formula x = µ + zσ

    x = 800 + (40)(2.33) = 893.20 ≈ 893

    Using Excel,

    x = NORMINV(0.99,800,40) = 893.05 ≈ 893

    Therefore, 1% of the bulbs will have a life expectancy of at least 893 hours.
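
    The backward calculations can be sketched with the inverse cumulative distribution function `inv_cdf` of Python's `statistics.NormalDist` (Python 3.8 or later). The variable names are assumptions. The results agree with the Excel values above and differ slightly from the table-based 706.80 and 893.20 because the table rounds z to ±2.33.

```python
from statistics import NormalDist

life = NormalDist(mu=800, sigma=40)   # bulb life in hours

# (f) cut-off with 1% of the area to its left
lower_cutoff = life.inv_cdf(0.01)

# (g) cut-off with 1% of the area to its right, i.e. 99% to its left
upper_cutoff = life.inv_cdf(0.99)

# The same cut-offs via x = mu + z * sigma, using the unrounded z-value
z_01 = NormalDist().inv_cdf(0.01)     # about -2.326
lower_via_z = 800 + z_01 * 40

print(round(lower_cutoff, 2))  # 706.95
print(round(upper_cutoff, 2))  # 893.05
print(round(z_01, 3))          # -2.326
```

    By symmetry, the two cut-offs lie the same distance from the mean of 800 hours.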

