Post on 08-Apr-2018
8/7/2019 Statistics 2[1]- Notes
Statistics 2
Binomial Distribution
A spinner is divided into four equal-sized sections marked 1, 2, 3, 4. If the spinner is spun 6 times, how
likely is it to land on 1 on exactly four occasions?
One possible sequence would be 1, 1, 1, 1, N, N, where N denotes a result other than 1.
The number of possible sequences is $\binom{6}{4}$ or $^6C_4 = 15$.
Each sequence has probability $0.25^4 \times 0.75^2$.
So the required probability is $15 \times 0.25^4 \times 0.75^2 = 0.0330$ (to 3 s.f.).
A binomial distribution arises when the following conditions are met:
- an experiment is repeated a fixed number (n) of times (i.e., there is a fixed number of trials);
- the outcomes from the trials are independent of one another;
- each trial has two possible outcomes (referred to as success and failure);
- the probability of a success (p) is constant.
If the above conditions are satisfied and X is the random variable for the number of successes, then X
has a binomial distribution. We write:
X ~ B(n, p)
where n = number of trials and p = probability of success.
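The spinner calculation above can be checked with a short sketch. It assumes the standard binomial formula P(X = x) = C(n, x) p^x (1 - p)^(n - x); the function name `binomial_pmf` is illustrative and not part of the notes.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ B(n, p)."""
    # comb(n, x) counts the sequences containing exactly x successes;
    # each such sequence has probability p^x * (1 - p)^(n - x).
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Spinner example: 6 spins, P(landing on 1) = 0.25, exactly four 1s.
sequences = comb(6, 4)
prob_four_ones = binomial_pmf(4, 6, 0.25)
print(sequences, round(prob_four_ones, 4))
```

Counting the sequences first and then multiplying by the probability of one sequence mirrors the reasoning used in the spinner example.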
Interpretation of certain phrases is critical, especially when dealing with discrete distributions
(binomial and Poisson), because cumulative tables give probabilities of the form P(X ≤ x).
Phrase           Means     To use tables
Greater than 5   X > 5     1 - P(X ≤ 5)
At least 7       X ≥ 7     1 - P(X ≤ 6)
Fewer than 10    X < 10    P(X ≤ 9)
No more than 3   X ≤ 3     P(X ≤ 3)
At most 8        X ≤ 8     P(X ≤ 8)
Exactly 4        X = 4     P(X ≤ 4) - P(X ≤ 3)
Example :
a) P(X= 3)
Using tables
b) P(X> 1)
Example:
The probability that a baby is born a boy is 0.51. A midwife delivers 10 babies. Find:
a) The probability that exactly 4 are male;
b) The probability that at least 8 are male.
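Let X = number of boys among the 10 babies. Then X ~ B(10, 0.51).
a) $P(X = 4) = \binom{10}{4} \times 0.51^4 \times 0.49^6 = 0.1966$
b) $P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10) = 0.0621$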
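The phrase translations used with cumulative tables ("exactly 4", "at least 8") can be sketched in code for the baby example. This is a minimal illustration, assuming X ~ B(10, 0.51); the helper names are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x), the quantity listed in cumulative binomial tables."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n, p = 10, 0.51
# "Exactly 4" means P(X <= 4) - P(X <= 3).
exactly_4 = binomial_cdf(4, n, p) - binomial_cdf(3, n, p)
# "At least 8" means 1 - P(X <= 7).
at_least_8 = 1 - binomial_cdf(7, n, p)
print(round(exactly_4, 4), round(at_least_8, 4))
```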
It can be shown that if X ~ B(n, p), then E[X] = np and Var[X] = np(1 - p).
Poisson Distribution
A random variable X which counts the number of times an event occurs in a given unit of space or time
will have a Poisson distribution if:
- the events occur independently of each other and at random;
- the events occur at a constant rate;
- the events occur singly (one at a time).
The notation used to indicate that a random variable X has a Poisson distribution is X ~ Po(λ).
The distribution is fully specified by a single parameter λ.
If X ~ Po(λ) then
$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$ for x = 0, 1, 2, ...
Example:
SupposeX~ Po( ). Find .
Example:
On average a call centre receives 1.75 phone calls per minute.
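a) Assuming a Poisson distribution, find the probability that the number of phone calls received in a randomly chosen minute is:
(i) Exactly 4;
(ii) No more than 2.
Let X = number of phone calls received in 1 minute.
Then X ~ Po(1.75).
(i) $P(X = 4) = \dfrac{e^{-1.75} \times 1.75^4}{4!} = 0.0679$
(ii) $P(X \le 2) = e^{-1.75}\left(1 + 1.75 + \dfrac{1.75^2}{2!}\right) = 0.7440$
b) Find the probability that 6 phone calls are received in a 4 minute period.
Let Y = number of phone calls received in 4 minutes.
The number of calls in 4 minutes will be on average 4 × 1.75 = 7.
So Y ~ Po(7).
$P(Y = 6) = \dfrac{e^{-7} \times 7^6}{6!} = 0.1490$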
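The call-centre probabilities can be reproduced with a small sketch. It assumes the standard Poisson formula P(X = x) = e^(-λ) λ^x / x!; the function names are illustrative only.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


rate_per_minute = 1.75
p_exactly_4 = poisson_pmf(4, rate_per_minute)        # part a)(i)
p_at_most_2 = poisson_cdf(2, rate_per_minute)        # part a)(ii)
# Part b): the mean scales with the length of the interval.
p_six_in_four_minutes = poisson_pmf(6, 4 * rate_per_minute)
print(round(p_exactly_4, 4), round(p_at_most_2, 4), round(p_six_in_four_minutes, 4))
```

Scaling the rate by the length of the interval relies on the assumption, stated in the conditions above, that events occur at a constant rate.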
Approximating a Binomial by a Poisson
If X ~ B(n, p), then X can reasonably be approximated by a Poisson distribution with mean np if:
- n is large;
- p is small.
Two frequently used rules of thumb are:
- n > 50 and np < 5;
- n > 50 and p < 0.1.
Example:
A drug manufacturer has found 2% of the patients taking a particular drug will experience a particular
side effect.
A hospital consultant prescribes the drug to 150 of her patients.
Using a suitable approximation calculate the probability that:
a) None of her patients suffer from the side effect.
b) No more than 5 suffer from the side effect.
Let X represent the number of patients experiencing the side effect.
The exact distribution of X is X ~ B(150, 0.02).
Since n is large and p is small, X ≈ Po(150 × 0.02).
So X ≈ Po(3).
a) P(X = 0) = 0.0498 (tables)
b) P(X ≤ 5) = 0.9161 (tables)
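To see how close the approximation is, the sketch below compares the exact B(150, 0.02) probabilities with the Po(3) values for the drug example. The helper names are illustrative, and the comparison is an addition for checking purposes, not part of the original working.

```python
from math import comb, exp, factorial


def binomial_cdf(x, n, p):
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))


n, p = 150, 0.02
lam = n * p  # Poisson mean np = 3

exact_none, approx_none = binomial_cdf(0, n, p), poisson_cdf(0, lam)
exact_at_most_5, approx_at_most_5 = binomial_cdf(5, n, p), poisson_cdf(5, lam)
print(round(exact_none, 4), round(approx_none, 4))
print(round(exact_at_most_5, 4), round(approx_at_most_5, 4))
```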
Continuous random variables
A probability density function (p.d.f.) is a curve that models the shape of the distribution corresponding
to a continuous random variable.
If f(x) is the p.d.f. corresponding to a continuous random variable X and if f(x) is defined for a ≤ x ≤ b,
then the following properties must hold:
1. The total area under a p.d.f. is 1:
$\int_a^b f(x)\,dx = 1$
2. The graph of the p.d.f. never dips below the x-axis:
$f(x) \ge 0$ for a ≤ x ≤ b
3. Probabilities correspond to the area under the curve:
$P(c \le X \le d) = \int_c^d f(x)\,dx$
Mode
Suppose that a random variable X is defined by the probability density function f(x) for a ≤ x ≤ b.
The mode of X is the value of x that produces the largest value for f(x) in the interval a ≤ x ≤ b.
A sketch of the probability density function can be very helpful when determining the mode.
Example:
A random variableXhas p.d.f. , where
Find the mode.
The mode can be found using differentiation:
Differentiation could
be used to find the
mode here.
To find the turning point we solve $f'(x) = 0$.
or
if the point is maximum.
So the mode is
Cumulative distribution functions
The c.d.f. is found by integrating the p.d.f.
Example:
A random variable X has a p.d.f. f(x), where f(x) = (1 + x³)/6 for 0 ≤ x ≤ 2 and f(x) = 0 otherwise.
Find the c.d.f. and find P(X < 1).
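$$F(x) = \begin{cases} 0 & x < 0 \\ \frac{1}{24}x^4 + \frac{1}{6}x & 0 \le x \le 2 \\ 1 & x > 2 \end{cases}$$
$P(X < 1) = F(1) = \frac{1}{24} + \frac{1}{6} = \frac{5}{24}$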
Median and Quartiles
The median of a random variable X is defined to be the value m such that
F(m) = 0.5,
where F is the cumulative distribution function of X.
Likewise the lower quartile is the solution to the equation
F(x) = 0.25
and the upper quartile is the solution to
F(x) = 0.75.
Example :
A random variable X is defined by the cumulative distribution function:
$$F(x) = \begin{cases} 0 & x < 2 \\ \frac{1}{24}(x^2 + x - 6) & 2 \le x \le 5 \\ 1 & x > 5 \end{cases}$$
a) Calculate and sketch the probability density function.
b) Find the median value.
c) Work out
The p.d.f. is found by differentiating the c.d.f.:
$f(x) = F'(x) = \frac{1}{24}(2x + 1)$ for 2 ≤ x ≤ 5, and f(x) = 0 otherwise.
[Sketch of f(x)]
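Median
The median m satisfies F(m) = 0.5, so $\frac{1}{24}(m^2 + m - 6) = 0.5$.
Therefore $m^2 + m - 18 = 0$, giving
m = 3.772 or m = -4.772.
m must be 3.772 since it lies in the interval [2,5].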
-
=
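The median and quartiles can also be found numerically. The sketch below assumes the c.d.f. of this example is F(x) = (x² + x - 6)/24 on [2, 5], a reading of the damaged formula that satisfies F(2) = 0 and F(5) = 1; the bisection helper `quantile` is illustrative.

```python
def cdf(x: float) -> float:
    """Assumed c.d.f.: F(x) = (x^2 + x - 6)/24 on [2, 5]."""
    if x < 2:
        return 0.0
    if x > 5:
        return 1.0
    return (x * x + x - 6) / 24


def quantile(prob: float, lo: float = 2.0, hi: float = 5.0, tol: float = 1e-9) -> float:
    """Solve F(x) = prob by bisection; valid because F is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


median = quantile(0.5)
lower_quartile = quantile(0.25)
upper_quartile = quantile(0.75)
print(round(lower_quartile, 3), round(median, 3), round(upper_quartile, 3))
```

Bisection avoids having to decide which root of the quadratic lies in the interval, since the search is confined to [2, 5] from the start.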
Expectation
If X is a continuous random variable defined by the probability density function f(x) over the domain
a ≤ x ≤ b, then the mean or expectation of X is given by
$E[X] = \int_a^b x f(x)\,dx$
Note: If the p.d.f. is symmetrical, then the expected value of X will be the value corresponding to the line
of symmetry.
Example :
A random variableXis defined by the probability density function
Calculate the E[X] and E[1/X]
Variance
If X is a continuous random variable defined by the probability density function f(x) over the domain
a ≤ x ≤ b, then the variance of X is given by
$\mathrm{Var}[X] = E[X^2] - (E[X])^2$
where $E[X^2] = \int_a^b x^2 f(x)\,dx$
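The integrals for the expectation and variance can be approximated numerically. The sketch below uses a hypothetical p.d.f., f(x) = (2x + 1)/24 on [2, 5], chosen only for illustration, together with a simple midpoint rule; none of the names come from the notes.

```python
def integrate(func, a, b, steps=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width


def pdf(x):
    """Hypothetical p.d.f. used for illustration: f(x) = (2x + 1)/24 on [2, 5]."""
    return (2 * x + 1) / 24


a, b = 2.0, 5.0
total_area = integrate(pdf, a, b)                       # should be 1
mean = integrate(lambda x: x * pdf(x), a, b)            # E[X]
mean_square = integrate(lambda x: x * x * pdf(x), a, b) # E[X^2]
variance = mean_square - mean**2                        # Var[X] = E[X^2] - (E[X])^2
print(round(total_area, 6), round(mean, 4), round(variance, 4))
```

Checking that the total area is 1 before computing the moments is a quick guard against a mistyped density.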
Example :
A continuous random variable Yhas a probability density function where
Calculate the value of Var[Y].
Sketch of
The p.d.f. is symmetrical. Therefore .
Examination-style question :
The mass, X kg, of luggage taken on board an aircraft by a passenger can be modelled by the probability
density function
a) Sketch the probability density function and find the value of k.
b) Verify that the median weight of luggage is about 20.586 kg.
c) Find the mean and variance of X.
To find k we use the fact that the total area under the p.d.f. is 1.
To verify that the median is about 20.586, we need to check that F(20.586) ≈ 0.5.
Therefore Var[X] = 428.5714 - 20² = 28.6 (to 3 s.f.)
Continuous Uniform Distribution
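A random variable X is said to have a continuous uniform distribution (or rectangular distribution) over
the interval [a,b] if its probability density function has the form:
$$f(x) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
The graph of the p.d.f. is a rectangle of height 1/(b - a) over the interval [a,b].
If X has a continuous uniform distribution over the interval [a,b], then
$E[X] = \frac{a+b}{2}$ and $\mathrm{Var}[X] = \frac{(b-a)^2}{12}$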
Example :
A random variable Y has a continuous uniform distribution in the interval [2,8]. Find .
Examination-style question:
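A random variable X is given by the probability density function f(x), where
f(x) = 1/10 for 5 ≤ x ≤ 15 and f(x) = 0 otherwise.
Find:
a) E[X] and Var[X]
b)
X has a uniform distribution over the interval (5,15), so
E[X] = (5 + 15)/2 = 10 and Var[X] = (15 - 5)²/12 = 8.33 (to 3 s.f.).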
The p.d.f. for X is shown on the diagram below.
The probability we require is shaded.
So,
If X has a uniform distribution over the interval (a,b) then the cumulative distribution function of X is:
$$F(x) = P(X \le x) = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$$
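The uniform c.d.f. translates directly into code. The sketch below applies it to a uniform distribution over (5, 15); the interval from 7 to 12 used for the probability is a hypothetical choice for illustration, since the probability asked for in the question did not survive in these notes.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """F(x) = P(X <= x) for X uniform on (a, b)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def uniform_mean(a: float, b: float) -> float:
    return (a + b) / 2


def uniform_variance(a: float, b: float) -> float:
    return (b - a) ** 2 / 12


a, b = 5, 15
mean = uniform_mean(a, b)
variance = uniform_variance(a, b)
# Hypothetical probability P(7 <= X <= 12), found as F(12) - F(7).
prob = uniform_cdf(12, a, b) - uniform_cdf(7, a, b)
print(mean, round(variance, 2), prob)
```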
Approximating a binomial using a normal
Calculating probabilities using the binomial distribution can be cumbersome if the number of trials (n) is
large.
Consider this example:
10% of people in the United Kingdom are left handed.
A school has 1200 students. Find the probability that more than 140 of them are left handed.
Let the number of left-handed people in the school be X.
Then X ~ B[1200, 0.1].
The required probability is
P(X > 140) = P(X = 141) + P(X = 142) + ... + P(X = 1200)
As no tables exist for this distribution, calculating this probability by hand would be a mammoth task.
A further problem arises if you attempt to work out one of these probabilities, for example P(X = 141):
$P(X = 141) = {}^{1200}C_{141} \times 0.1^{141} \times 0.9^{1059}$
Calculators cannot calculate the value of this coefficient because it is too large!
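Software with arbitrary-precision integers does not share the calculator's limitation. The sketch below, an illustration rather than part of the original working, computes the coefficient exactly and sums the tail in log space so that the very small powers do not underflow.

```python
from math import comb, exp, log

n, p = 1200, 0.1

# Python integers have arbitrary precision, so the coefficient is exact.
coefficient = comb(n, 141)
digits = len(str(coefficient))


def log_binomial_pmf(k: int) -> float:
    """log P(X = k) for X ~ B(n, p), computed with logarithms for stability."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)


p_141 = exp(log_binomial_pmf(141))
# P(X > 140) = P(X = 141) + ... + P(X = 1200); negligible terms evaluate to 0.0.
exact_tail = sum(exp(log_binomial_pmf(k)) for k in range(141, n + 1))
print(digits, p_141, exact_tail)
```

The exact tail gives a reference value against which the normal approximation can be compared.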
One way forward is to approximate the binomial distribution using a normal distribution.
If X ~ B(n, p) where n is large and p is not too close to 0 or 1, then X can be reasonably approximated using a normal
distribution:
X ≈ N[np, npq]
where q = 1 - p.
There is a widely used rule of thumb that can be applied to tell you when the approximation will be
reasonable: np > 5 and nq > 5.
Continuity Correction
A continuity correction must be applied when approximating a discrete distribution (such as the binomial) by
a continuous distribution (such as the normal).
Exact distribution: B(n,p) Approximate distribution: N[np, npq]
Introductory example:
10% of people in the United Kingdom are left handed. A school has 1200 students. Find the probability
that more than 140 of them are left handed.
Let the number of left-handed people in the school be X.
Then X ~ B[1200, 0.1].
Since np = 120 > 5 and nq = 1080 > 5, the rule of thumb is satisfied and we can approximate the distribution using a normal distribution:
X ≈ N[120, 108]
So P(X > 140) ≈ P(X ≥ 140.5) (using continuity correction)
Standardize: $z = \frac{140.5 - 120}{\sqrt{108}} = 1.973$
Therefore P(X ≥ 140.5) = P(Z ≥ 1.973)
= 1 - P(Z ≤ 1.973) = 1 - 0.9758
= 0.0242
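The left-handed example can be reproduced with Python's standard library. This is a minimal sketch using `statistics.NormalDist` (available from Python 3.8); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1200, 0.1
q = 1 - p
mean, variance = n * p, n * p * q          # np = 120, npq = 108

approximation = NormalDist(mu=mean, sigma=sqrt(variance))

# Continuity correction: P(X > 140) for the discrete X becomes P(X >= 140.5).
z = (140.5 - mean) / sqrt(variance)
p_more_than_140 = 1 - approximation.cdf(140.5)
print(round(z, 3), round(p_more_than_140, 4))
```

The small difference from the table-based answer comes from rounding z to three decimal places before looking it up.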
Examination-style question:
A sweet manufacturer makes sweets in 5 colours. 25% of the sweets it produces are red.
The company sells its sweets in tubes and in bags. There are 10 sweets in a tube and 28 sweets in a bag.
It can be assumed that the sweets are of random colours.
a) Find the probability that there are more than 4 red sweets in a tube.
b) Using a suitable approximation, find the probability that a bag of sweets contains between 5 and
12 red sweets (inclusive).
Let the number of red sweets in a tube be X.
Then the exact distribution for X is X ~ B[10, 0.25].
P(X > 4) = 1 - P(X ≤ 4)
= 1 - 0.9219
= 0.0781
Let the number of red sweets in a bag be Y.
Then the exact distribution for Y is Y ~ B[28, 0.25].
The distribution can be approximated by a normal since np = 7 and nq = 21 (both greater than 5):
Y ≈ N[7, 5.25]
P(5 ≤ Y ≤ 12) ≈ P(4.5 ≤ Y ≤ 12.5) (using continuity correction)
Standardize: $z_1 = \frac{4.5 - 7}{\sqrt{5.25}} = -1.091$ and $z_2 = \frac{12.5 - 7}{\sqrt{5.25}} = 2.400$
P(-1.091 ≤ Z ≤ 2.400)
= P(Z ≤ 2.400) - P(Z ≤ -1.091)
= P(Z ≤ 2.400) - (1 - P(Z ≤ 1.091))
= 0.9918 - (1 - 0.8623) = 0.8541
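Both parts of the sweets question can be checked in code. The sketch below computes the tube probability exactly and compares the normal approximation for the bag with the exact binomial value; the exact comparison is an addition for illustration and the helper names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(x, n, p):
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# a) Tube: X ~ B(10, 0.25), P(X > 4) = 1 - P(X <= 4).
p_tube = 1 - binomial_cdf(4, 10, 0.25)

# b) Bag: Y ~ B(28, 0.25), approximated by N(7, 5.25) with continuity correction.
bag = NormalDist(mu=7, sigma=sqrt(5.25))
p_bag_approx = bag.cdf(12.5) - bag.cdf(4.5)
p_bag_exact = binomial_cdf(12, 28, 0.25) - binomial_cdf(4, 28, 0.25)
print(round(p_tube, 4), round(p_bag_approx, 4), round(p_bag_exact, 4))
```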
Approximating the Poisson using a normal
If X ~ Po(λ) and λ is large, then X is approximately normally distributed: X ≈ N[λ, λ].
Recall that the mean and variance of a Poisson distribution are equal.
There is a widely used rule of thumb that can be applied to tell you when the approximation will be
reasonable:
Note: A continuity correction is required because we are approximating a discrete distribution using a
continuous one.
Examination-style question:
An electrical retailer has estimated that he sells a mean number of 5 digital radios each week.
a) Assuming that the number of digital radios sold in any week can be modelled by a Poisson
distribution, find the probability that the retailer sells fewer than 2 digital radios in a randomly
chosen week.
b) Use a suitable approximation to decide how many digital radios he should have in stock in order for him
to be at least 90% certain of being able to meet the demand for radios over the next 5 weeks.
Let X represent the number of digital radios sold in a week.
So X ~ Po(5).
P(X < 2) = P(X ≤ 1)
= 0.0404
Let Y represent the number of digital radios sold in a period of 5 weeks.
Then Y ~ Po(25), which can be approximated by Y ≈ N[25, 25].
A Poisson can be approximated
reasonably well by a normal
distribution provided .
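We need the smallest y such that P(Y ≤ y) ≥ 0.9.
P(Y ≤ y) becomes P(Y ≤ y + 0.5) under the normal approximation (using continuity correction).
So, standardizing with mean 25 and standard deviation √25 = 5:
$\frac{y + 0.5 - 25}{5} \ge 1.282$ (the upper 10% point of the standard normal distribution is 1.282), giving y ≥ 30.91.
So the retailer would need to keep 31 digital radios in stock.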
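The stock calculation can be sketched with the inverse normal c.d.f. from `statistics.NormalDist`. The code assumes the normal approximation N(25, 25) to the five-week Poisson total; the variable names are illustrative.

```python
from math import ceil, exp, sqrt
from statistics import NormalDist

weekly_rate = 5

# a) P(X < 2) = P(X <= 1) for X ~ Po(5): e^-5 * (1 + 5).
p_fewer_than_2 = exp(-weekly_rate) * (1 + weekly_rate)

# b) Five-week demand Y ~ Po(25), approximated by N(25, 25).
lam = 5 * weekly_rate
z = NormalDist().inv_cdf(0.9)               # upper 10% point, about 1.282
# Solve (y + 0.5 - lam) / sqrt(lam) >= z for y (continuity correction included).
threshold = lam + z * sqrt(lam) - 0.5
stock = ceil(threshold)
print(round(p_fewer_than_2, 4), round(z, 3), round(threshold, 2), stock)
```

Rounding up with `ceil` reflects that stock must be a whole number and must meet or exceed the threshold.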
Populations and samples
Population: the set of all individuals or objects that we wish to study.
Census: an investigation in which information is obtained from every member of the population.
Sampling frame: a list of all members of the population.
Sampling unit: an individual member of a population.
Sample: a selection of individual members or items from a population.
Statistic: a quantity calculated solely from the observations in a sample.
Example:
A head teacher is interested in finding out how long her sixth form students spend in part-time
employment per week.
Population: the set of all sixth form students in her school.
Sampling frame: the registers of sixth form tutor groups.
Carrying out a census of the entire population is usually not feasible or sensible.
Advantages of taking a census are:
- Every single member of the population is used
- Unbiased
- Gives an accurate answer
Disadvantages of taking a census are the cost in:
- Money
- Time
- Resources
Instead of surveying the whole population, information can be obtained from a sample.
The sampling process should be undertaken carefully to ensure that the sample is representative of the
entire population.
Bias can occur if one section of the population is over- or under-represented.
A simple random sample of size n consists of the observations X1, X2, ..., Xn from a population, where the Xi
- are independent random variables;
- have the same distribution as the population.
Example :
A large bag of coins contains 1p, 2p and 5p coins in the ratio 2:1:3.
a) Find the mean, μ, and the variance, σ², for the population of coins.
b) A random sample of 3 coins is taken from this population. List all the possible outcomes.
Let X be the value of the coin chosen.
Distribution of the population:
x          1     2     5
P(X = x)   1/3   1/6   1/2
μ = 1(1/3) + 2(1/6) + 5(1/2) = 19/6
σ² = 1²(1/3) + 2²(1/6) + 5²(1/2) - (19/6)² = 27/2 - 361/36 = 125/36
A sample is random if every member of the population has the same probability of being chosen.
The possible outcomes and the sample mean of each:
Outcomes                                          Mean
(1,1,1)                                           1
(1,1,2) (1,2,1) (2,1,1)                           4/3
(2,2,1) (2,1,2) (1,2,2)                           5/3
(2,2,2)                                           2
(1,1,5) (1,5,1) (5,1,1)                           7/3
(5,5,1) (5,1,5) (1,5,5)                           11/3
(5,5,5)                                           5
(2,2,5) (2,5,2) (5,2,2)                           3
(5,5,2) (5,2,5) (2,5,5)                           4
(1,2,5) (1,5,2) (2,1,5) (2,5,1) (5,1,2) (5,2,1)   8/3
Working out the sample mean, e.g. for (1,1,2): (1 + 1 + 2)/3 = 4/3.
The sampling distribution of the mean is:
Mean          1     4/3   5/3   2      7/3  8/3  3     11/3  4    5
Probability   1/27  1/18  1/36  1/216  1/6  1/6  1/24  1/4   1/8  1/8
For example, P(mean = 4/3) = 3 × (1/3)(1/3)(1/6) = 1/18: multiply by 3 since the outcome can occur in 3 different orders.
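The sampling distribution can be generated by enumerating every ordered sample of three coins. The sketch below assumes the coin probabilities 1/3, 1/6 and 1/2 implied by the ratio 2:1:3 and uses exact fractions; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population distribution implied by the ratio 2:1:3 for 1p, 2p and 5p coins.
population = {1: Fraction(2, 6), 2: Fraction(1, 6), 5: Fraction(3, 6)}

mu = sum(value * prob for value, prob in population.items())
sigma_sq = sum(value**2 * prob for value, prob in population.items()) - mu**2

# Enumerate all ordered samples of size 3 and accumulate P(sample mean).
sampling_distribution = defaultdict(Fraction)
for sample in product(population, repeat=3):
    prob = Fraction(1)
    for coin in sample:
        prob *= population[coin]      # independent draws multiply
    sampling_distribution[Fraction(sum(sample), 3)] += prob

for mean in sorted(sampling_distribution):
    print(mean, sampling_distribution[mean])
```

Enumerating ordered samples handles the "multiply by the number of orderings" step automatically, because each ordering is visited separately.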
Hypothesis Testing
Null Hypothesis (H0) is the hypothesis we assume to be correct unless the evidence shows otherwise.
Alternative Hypothesis (H1) is the hypothesis we accept if the evidence leads us to reject H0.
Steps required to answer Hypothesis Test questions in an examination are:
Step 1: Write out H0 and H1 in mathematical terms.
Step 2: State the significance level. If none is mentioned in the question, it is usual
to choose 5%.
Step 3: State the distribution, assuming the null hypothesis to be true.
Step 4: Calculate the probability (under H0) of obtaining results as extreme as those
collected.
Step 5: Compare the probability with the significance level and draw conclusions:
can H0 be rejected or not? Interpret your results in context.
Hypothesis Testing for the Binomial Distribution
Lower One Tail Test
Example:
Is a normal six-sided die fair when 1 six is thrown in 24 throws?
Test at the 5% level of significance.
Let X be the random variable "the number of 6s thrown in 24 throws".
Therefore X ~ B[24, 1/6]
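H0: p = 1/6
H1: p < 1/6
Reject H0 if: P(X ≤ 1) ≤ 0.05
P(X ≤ 1) = (5/6)^24 + 24 × (1/6) × (5/6)^23 = 0.0730 > 0.05
Accept H0: there is insufficient evidence to suggest that the die is unfair.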
Upper One Tail Test
Example:
In Luigi's restaurant, on average 1 in 10 people order a bottle of Chardonnay. Out of a sample of 50, 11
chose Chardonnay. Has the drink become more popular?
Test at the 1% level of significance.
Let X be the random variable "the number of people ordering a bottle of Chardonnay in a sample of 50".
X ~ B[50, 0.1]
H0: p = 0.1
H1: p > 0.1
Reject H0 if: P(X ≥ 11) ≤ 0.01
P(X ≥ 11) = 1 - P(X ≤ 10)
= 1 - 0.9906 (Using tables)
= 0.0094 < 0.01
Reject H0: there is evidence to suggest that the proportion of people ordering Chardonnay has increased at the
1% level of significance.
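The five steps of an upper one tail test can be sketched in code for the Chardonnay example. The probability is computed directly rather than read from tables, and the variable names are illustrative.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# Steps 1 to 3: H0: p = 0.1, H1: p > 0.1, 1% level, X ~ B(50, 0.1) under H0.
n, p0, observed, alpha = 50, 0.1, 11, 0.01

# Step 4: probability of a result at least as extreme as the one observed.
p_value = 1 - binomial_cdf(observed - 1, n, p0)   # P(X >= 11)

# Step 5: compare with the significance level.
reject_h0 = p_value <= alpha
print(round(p_value, 4), reject_h0)
```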
Critical Values Method
Example 1:
A manufacturer claims that 2 out of 5 people prefer Soapy Suds washing powder over any other brand.
For a sample of 25 people, only 4 people are found to prefer Soapy Suds. Is the manufacturer's claim
justified?
Test at the 5% level of significance.
Let X be the random variable "the number of people who prefer Soapy Suds".
X ~ B[25, 0.4]
H0: p = 0.4
H1: p < 0.4
Reject H0 if: P(X ≤ xc) ≤ 0.05; xc = critical value
From tables: xc = 5
Since x = 4 < critical value:
Reject H0: there is evidence to suggest that the manufacturer's claim is false and the proportion is less than 2 in 5 at the
5% level of significance.
Example 2:
A particular drug has a 1 in 4 chance of curing a certain disease. A new drug is developed to cure the
disease. How many people would need to be cured in a sample of 20 if the new drug was to be deemed
more successful at curing the disease than the old drug to obtain a significant result at the 5% level?
Let X be the random variable "the number of people who are cured by the new drug".
X ~ B[20, 0.25]
H0: p = 0.25
H1: p > 0.25
Reject H0 if: P(X ≥ xc) ≤ 0.05; xc = critical value
1 - P(X ≤ xc - 1) ≤ 0.05
P(X ≤ xc - 1) ≥ 0.95
From tables: P(X ≤ 7) = 0.8982 and P(X ≤ 8) = 0.9591, so
xc - 1 ≥ 8
xc ≥ 9
So 9 or more people are required to be cured to obtain significant evidence that the new drug is better
at curing the disease.
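Critical values can be found by searching the cumulative probabilities directly. The sketch below uses exact binomial probabilities rather than four-figure tables and applies the search to the two examples in this section; the function names are hypothetical.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); an empty sum gives 0 when x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def lower_critical_value(n, p0, alpha):
    """Largest c with P(X <= c) <= alpha, or None if no such c exists."""
    critical = None
    for c in range(n + 1):
        if binomial_cdf(c, n, p0) <= alpha:
            critical = c
        else:
            break
    return critical


def upper_critical_value(n, p0, alpha):
    """Smallest c with P(X >= c) <= alpha, or None if no such c exists."""
    for c in range(n + 1):
        if 1 - binomial_cdf(c - 1, n, p0) <= alpha:
            return c
    return None


soapy_suds = lower_critical_value(25, 0.4, 0.05)   # Example 1
new_drug = upper_critical_value(20, 0.25, 0.05)    # Example 2
print(soapy_suds, new_drug)
```

For the new drug example the search stops at the first value whose upper tail probability, 1 - P(X ≤ c - 1), is at most 5%.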
Two Tail Test
Example:
A person suggests that the proportion, p, of red cars on a road is 0.3. In a random sample of 15 cars it is
desired to test the null hypothesis p = 0.3 against p ≠ 0.3 at a nominal significance level of 10%.
Determine the appropriate acceptance region and the corresponding actual significance level.
Let X be the random variable "the number of red cars in a sample of 15".
X ~ B[15, 0.3]
H0: p = 0.3
H1: p ≠ 0.3
5% level of significance for each tail.
Reject H0 if: P(X ≤ xl) ≤ 0.05; xl = lower critical value
From tables: xl = 1
Reject H0 if: P(X ≥ xu) ≤ 0.05; xu = upper critical value
1 - P(X ≤ xu - 1) ≤ 0.05
P(X ≤ xu - 1) ≥ 0.95
From tables: xu - 1 = 7
Therefore xu = 8
H0 rejection region: x ≤ 1 or x ≥ 8, so the acceptance region is 2 ≤ x ≤ 7.
Actual significance level: P(X ≤ 1) + P(X ≥ 8)
= 0.0353 + 0.05 = 0.0853 = 8.53%
Hypothesis Testing for the Poisson Distribution
Lower One Tail Test
Example:
The number of car accidents along a certain stretch of road occurred at an average rate of 5 per week.
After the introduction of speed cameras the number of accidents in one week is 2. Assuming that the
number of accidents can be modelled by a Poisson distribution, test at the 5% nominal significance level
whether there has been a reduction in the number of accidents.
Let X be the random variable "the number of accidents in a week".
X ~ Po[5]
H0: λ = 5
H1: λ < 5
Reject H0 if: P(X ≤ xl) ≤ 0.05
From tables: xl = 1
Since x = 2 > lower critical value:
Accept H0: there is insufficient evidence to support the claim that the number of accidents has reduced at
the nominal 5% significance level.
Upper One Tail Test
Example:
A shop sells a particular make of radio at a rate of 4 per week on average. The shop places an advert in
the local paper in the hope of raising sales. In the week that the advert was placed the number of sales
was 10. Is there significant evidence that the sales have increased? Test at the 5% nominal level of
significance.
Let X be the random variable "the number of radios sold per week".
X ~ Po[4]
H0: λ = 4
H1: λ > 4
Reject H0 if: P(X ≥ xu) ≤ 0.05
1 - P(X ≤ xu - 1) ≤ 0.05
P(X ≤ xu - 1) ≥ 0.95
From tables:
xu - 1 = 8
xu = 9
x = 10 > upper critical value.
Reject H0: there is evidence to suggest that the number of radios sold has increased at the 5% level of
significance.
Two Tail Test
Example:
A machine produces glass sheets. The number of bubbles seen per square metre in the glass sheet
follows a Poisson distribution with mean 3. Find the lower and upper critical values for a nominal 10%
significance level test for the mean not equal to 3 and the actual significance level of the test.
Let X be the random variable "the number of bubbles per m²".
X ~ Po[3]
H0: λ = 3
H1: λ ≠ 3
5% level of significance for each tail.
Reject H0 if: P(X ≤ xl) ≤ 0.05; xl = lower critical value
From tables: xl = 0
Reject H0 if: P(X ≥ xu) ≤ 0.05; xu = upper critical value
1 - P(X ≤ xu - 1) ≤ 0.05
P(X ≤ xu - 1) ≥ 0.95
From tables: xu - 1 = 6
xu = 7
Actual significance level: P(X ≤ 0) + P(X ≥ 7)
= 0.0498 + (1 - 0.9665) = 0.0833 = 8.33%
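The Poisson critical values in this section can be reproduced by searching the cumulative probabilities. This is a minimal sketch with hypothetical function names; it computes probabilities directly instead of reading them from tables.

```python
from math import exp, factorial


def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam); an empty sum gives 0 when x < 0."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))


def lower_critical_value(lam, alpha):
    """Largest c with P(X <= c) <= alpha, or None if even P(X = 0) exceeds alpha."""
    critical, c = None, 0
    while poisson_cdf(c, lam) <= alpha:
        critical = c
        c += 1
    return critical


def upper_critical_value(lam, alpha):
    """Smallest c with P(X >= c) = 1 - P(X <= c - 1) <= alpha."""
    c = 0
    while 1 - poisson_cdf(c - 1, lam) > alpha:
        c += 1
    return c


# Two tail test for the glass sheets: nominal 10%, so 5% in each tail.
lam, tail = 3, 0.05
lower = lower_critical_value(lam, tail)
upper = upper_critical_value(lam, tail)
actual_level = poisson_cdf(lower, lam) + (1 - poisson_cdf(upper - 1, lam))
print(lower, upper, round(actual_level, 4))
```

The same two functions give the one tail critical values used earlier, for example `lower_critical_value(5, 0.05)` for the accidents example and `upper_critical_value(4, 0.05)` for the radios example.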