Chapter 4 Statistics. 4.1 – What is Statistics? Definition 4.1.1 Data are observed values of...

Date post: 29-Dec-2015
Upload: hubert-carpenter
Page 1: Chapter 4 Statistics. 4.1 – What is Statistics? Definition 4.1.1 Data are observed values of random variables. The field of statistics is a collection.

Chapter 4 Statistics

4.1 – What is Statistics?

Definition 4.1.1 Data are observed values of random variables. The field of statistics is a collection of methods for estimating distributions and parameters of random variables through the collection and analysis of data.

4.1 – What is Statistics?

Definition 4.1.2 The population is the set of all objects of interest in a statistical study. A sample is a subset of the population.

Definition 4.1.3 Data are information that has been collected. The field of statistics is a collection of methods for drawing conclusions about a population by collecting and analyzing data from a sample.

Types of Data

Definition 4.1.4 A parameter is a number calculated using information from every member of a population. A statistic is calculated using information from a sample.

Definition 4.1.5 Quantitative data consist of numbers. Qualitative data are nonnumeric information that can be separated into different categories.

Types of Data

Definition 4.1.6 Discrete data are observed values of a discrete random variable. They are numbers that have a finite or countable set of values. Continuous data are observed values of a continuous random variable. They are numbers that can take any value within some range.

Levels of Measurement

Definition 4.1.7

– Data are at the nominal level of measurement if they consist of only names, labels, or categories. They cannot be ordered (such as smallest to largest) in a meaningful way.

– Data are at the ordinal level of measurement if they can be ordered in a meaningful way, but differences between data values cannot be calculated or are meaningless.

– Data are at the interval level of measurement if they can be ordered in a meaningful way and differences between data values are meaningful.

– Data are at the ratio level of measurement if they are at the interval level, ratios of data values are meaningful, and there is a meaningful zero starting point.

Types of Studies

Definition 4.1.8

– In an observational study, data are obtained in such a way that the members of the sample are not changed, modified, or altered in any way.

– In an experiment, something is done to the members of the sample and the resulting effects are recorded. The “something” that is done is called a treatment.

Types of Observational Studies

Definition 4.1.9

– In a cross-sectional study, data are collected at one specific point in time.

– In a retrospective study, data are collected from studies done in the past.

– In a prospective study, data are collected by observing a sample for some time into the future.

Blocks

Definition 4.1.10 A block is a subset of the population with a similar characteristic. Different blocks of a population have different characteristics that may affect the variable of interest differently. A randomized block design is a type of experiment where:

1. The population is divided into blocks.

2. Members from each block are randomly chosen to receive the treatment.

Sampling Techniques

Definition 4.1.11

– A convenience sample is a sample that is very easy to get.

– A voluntary response sample is obtained when members of the sample decide whether to participate or not.

– A systematic sample is obtained by arranging the population in some order, then selecting a starting point, and then selecting every kth member (such as every 20th).

Sampling Techniques

– A cluster sample is obtained by dividing the population into subsets (or clusters) where the members of each cluster have a common characteristic, then randomly choosing some of the clusters, and surveying every member of the chosen clusters.

– A stratified sample is obtained by dividing the population into subsets and then randomly choosing some members from each of the subsets.

– A multistage sample is obtained by successively applying a variety of sampling techniques. At each stage the sample becomes smaller, and at the last stage, a cluster sample is chosen.

Random Samples

Definition 4.1.12

– A random sample is chosen in a way such that every individual member of the population has the same probability of being chosen.

– A simple random sample of size n is chosen in a way such that every group of size n has the same probability of being chosen.

4.2 – Summarizing Data

Example 4.2.3 Shown below are the waiting times of 30 customers at a supermarket check-out stand

Relative frequency distribution

Histograms

The “shape” of a relative frequency histogram is an approximation of the graph of the p.d.f. (or p.m.f.) of the underlying random variable.

Summary Statistics

Definition 4.2.1 Let {x1, x2,…, xn} be a set of quantitative data collected from a sample of the population.

1. mean of the data: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

2. variance of the data: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

3. standard deviation of the data: $s = \sqrt{s^2}$

4. range of the data: (max value) – (min value)
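The four summary statistics in Definition 4.2.1 can be sketched in a few lines of Python. This is only an illustration: the function name `summary_statistics` and the sample values are made up for the example, and the variance uses the n − 1 divisor of the definition.

```python
import math

def summary_statistics(data):
    """Return (mean, variance, standard deviation, range) of a sample."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance: squared deviations from the mean, divided by n - 1.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    std_dev = math.sqrt(variance)
    data_range = max(data) - min(data)
    return mean, variance, std_dev, data_range

# Hypothetical data, not taken from the text.
mean, variance, std_dev, data_range = summary_statistics([2.0, 4.0, 4.0, 6.0])
print(mean, variance, std_dev, data_range)
```

Python's standard `statistics` module offers `mean`, `variance`, and `stdev` with the same n − 1 convention, so the hand-written version is mainly useful for seeing the formulas at work.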

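Example 4.2.4

$$\bar{x} = \frac{1}{30}(0.0 + 0.0 + \cdots + 5.1 + 7.3) = 2 \text{ min}$$

$$s^2 = \frac{1}{30-1}\left[(0.0-2)^2 + (0.0-2)^2 + \cdots + (5.1-2)^2 + (7.3-2)^2\right] = 2.946 \text{ min}^2$$

$$s = \sqrt{2.946} \approx 1.72 \text{ min}$$

Range: $7.3 - 0.0 = 7.3$ min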

Percentiles

Definition 4.2.2 Let p be a number between 0 and 1. The (100p)th percentile of a set of quantitative data is a number, denoted πp, that is greater than (100p)% of the data values.

– The 25th, 50th, and 75th percentiles are called the first, second and third quartiles and are denoted p1 = π0.25, p2 = π0.50, and p3 = π0.75, respectively.

– The 50th percentile is also called the median of the data and is denoted m = p2.

– The mode of the data is the data value that occurs most frequently.

– The 5-number summary of a set of data consists of the minimum value, p1, p2, p3, and the maximum value.

Calculating Percentiles

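1. Arrange the data in increasing order: $x_1 \le x_2 \le \cdots \le x_n$

2. Calculate $L = p \cdot n$

3. If $L$ is not an integer, then round it up to the next larger integer and $\pi_p = x_L$

4. If $L$ is an integer, then $\pi_p = \frac{1}{2}(x_L + x_{L+1})$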
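A minimal Python sketch of this percentile rule follows. It assumes the locator is L = p·n, the rule used in Example 4.2.5 (L = 0.25(30) = 7.5 and L = 0.5(30) = 15), and that 0 < p < 1. The function name and the sample data are illustrative only.

```python
import math

def percentile(data, p):
    """(100p)th percentile using the locator L = p * n, for 0 < p < 1."""
    xs = sorted(data)                 # step 1: increasing order
    n = len(xs)
    L = round(p * n, 9)               # step 2; rounding guards against float noise
    if L != int(L):
        k = math.ceil(L)              # step 3: round up, take x_k (1-based index)
        return xs[k - 1]
    k = int(L)                        # step 4: average x_L and x_(L+1)
    return (xs[k - 1] + xs[k]) / 2

# Hypothetical data, not taken from the text.
data = [3, 1, 4, 1, 5, 9, 2, 6]
print(percentile(data, 0.25), percentile(data, 0.5), percentile(data, 0.3))
```

Other textbooks and software packages use slightly different interpolation rules, so results from library functions may not match this rule exactly.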

Example 4.2.5

• Calculate the first quartile, p1 = π0.25

$$L = 0.25(30) = 7.5 \;\Rightarrow\; p_1 = x_8 = 0.5$$

• Calculate the median m = p2 = π0.5

$$L = 0.5(30) = 15 \;\Rightarrow\; m = p_2 = \tfrac{1}{2}(x_{15} + x_{16}) = \tfrac{1}{2}(1.7 + 1.9) = 1.8$$

Example 4.2.5

• 5-number summary: 0, 0.5, 1.8, 2.9, 7.3

• Box Plot

4.4 – Sampling Distributions

Definition 4.4.1 A random variable $\hat{\Theta}$ whose values are used to estimate the value of a parameter $\theta$ is called an estimator of $\theta$. A value of $\hat{\Theta}$, $\hat{\theta}$, is called an estimate of $\theta$. An estimator $\hat{\Theta}$ is called an unbiased estimator of $\theta$ if

$$E(\hat{\Theta}) = \theta$$

If this equation is not true, then $\hat{\Theta}$ is called a biased estimator.

Sample Proportion

Suppose we want to know the proportion p of a population who support a particular political candidate

– p is a parameter

We survey 735 voters and find 383 that support the candidate

– The sample proportion is $\hat{p} = 383/735 \approx 0.521$

– This is an estimate of p

Sample Proportion

Let $X$ denote the number who support the candidate in a sample of size $n$

– Define the random variable $\hat{P} = X/n$

– Called the “sample proportion”

– $\hat{p}$ is an observed value of $\hat{P}$

– $\hat{p}$ is an estimate of p

– $\hat{P}$ is an estimator of p
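As a short check that the sample proportion is unbiased in the sense of Definition 4.4.1, suppose (an assumption made here, matching the setting of Theorem 4.4.1) that the number of supporters $X$ in a sample of size $n$ is b(n, p), so that $E(X) = np$ and $\mathrm{Var}(X) = np(1-p)$. Writing $\hat{P} = X/n$ for the sample proportion:

$$E(\hat{P}) = E\!\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{np}{n} = p$$

$$\mathrm{Var}(\hat{P}) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$

The first line shows that $\hat{P}$ is an unbiased estimator of p. The second gives the variance that appears in the normal approximation to the distribution of the sample proportion.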

Sampling Distribution of the Proportion

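Theorem 4.4.1 Let $X$ be b(n, p). Then as $n \to \infty$ the distribution of the sample proportion

$$\hat{P} = \frac{X}{n} \text{ approaches } N\!\left(p, \frac{p(1-p)}{n}\right)$$

– Meaning: $\hat{P}$ is approximately $N\!\left(p, \frac{p(1-p)}{n}\right)$ for n “large enough”

– “Large enough”: $np \ge 5$ and $n(1-p) \ge 5$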

Example 4.4.3

By examining the spending habits of one particular consumer, a credit card company observes that during the course of normal transactions 37% of the charges exceed $150. Out of 50 charges made in one particular month, 27 exceeded $150. Does it appear that these charges were made in the course of normal transactions?

Example 4.4.3

Sample prop. that exceed $150: $\hat{p} = 27/50 = 0.54$

– Is this unusually large?

– Assume normal transactions: $\hat{P}$ is approximately

$$N(0.37,\ 0.37 \cdot 0.63 / 50) = N(0.37,\ 0.004662)$$

$$P(\hat{P} \ge 0.54) = P\!\left(Z \ge \frac{0.54 - 0.37}{\sqrt{0.004662}}\right) = P(Z \ge 2.49) = 0.0064$$

– This probability is small (< 0.05)

• Reject the assumption
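The tail probability in Example 4.4.3 can be reproduced with Python's standard library. This is a sketch under the same normal approximation; the function name `upper_tail_probability` is illustrative, and `statistics.NormalDist` supplies the standard normal c.d.f.

```python
import math
from statistics import NormalDist

def upper_tail_probability(successes, n, p):
    """Approximate P(P_hat >= successes/n) when P_hat ~ N(p, p(1-p)/n)."""
    p_hat = successes / n
    variance = p * (1 - p) / n
    z = (p_hat - p) / math.sqrt(variance)
    # Upper-tail area of the standard normal distribution beyond z.
    return z, 1 - NormalDist().cdf(z)

# Figures from Example 4.4.3: 27 of 50 charges exceed $150, usual rate 37%.
z, prob = upper_tail_probability(27, 50, 0.37)
print(round(z, 2), round(prob, 4))
```

Because the code does not round z before looking up the probability, the result can differ from a table-based answer in the last digit.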

Sample Mean

Suppose we want to know the mean IQ score of all college students in the US, $\mu$

– Estimate it with a sample mean

– Let $X_i$ denote the IQ of a randomly selected student

– $\bar{x}$ is an observed value of the sample mean $\bar{X}$

– $\bar{x}$ is an estimate of $\mu$

– $\bar{X}$ is an estimator of $\mu$

Sampling Distribution of the Mean

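• By the Central Limit Theorem

$$\bar{X} \text{ is approximately } N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

– where $\mu$ and $\sigma^2$ are the mean and variance of the population and $n$ is the sample size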

4.5 – Confidence Intervals for a Proportion

Definition 4.5.1 Let Z be $N(0, 1)$ and p be a number between 0 and 0.5. A critical z-value is a positive number $z_p$ such that

$$P(Z \le z_p) = 1 - p$$

Critical Values

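Let $\alpha$ be between 0 and 1. Then $\alpha/2$ is between 0 and 0.5, so that the critical z-value $z_{\alpha/2}$ is a positive number such that

$$P(Z \le z_{\alpha/2}) = 1 - \alpha/2, \qquad P(Z > z_{\alpha/2}) = \alpha/2$$

$$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$$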

Confidence Interval

Definition 4.5.2 Let $0 < \alpha < 1$ and let x be a number of successes in n observed trials of a Bernoulli experiment with unknown probability of a success p. Define $\hat{p} = x/n$ and let $z_{\alpha/2}$ be a critical z-value. The interval

$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$

is called a 100(1 − α)% confidence interval estimate for p.

Confidence Interval

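Margin of error: $E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Standard error of the proportion: $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Significance level: $\alpha$

Confidence level: $100(1-\alpha)\%$

Different forms: $(\hat{p} - E,\ \hat{p} + E)$, $\hat{p} - E < p < \hat{p} + E$, or $\hat{p} \pm E$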

Requirements

1. The sample must be random.

2. The conditions for a binomial distribution must be satisfied (at least approximately).

3. There are at least 5 successes and at least 5 failures observed in the n trials.

Example 4.5.2

Suppose 383 out of 735 surveyed voters support a particular political candidate. Calculate a 95% confidence interval estimate for the proportion of all voters who support the candidate.

1. Define the population proportion being estimated: p = The proportion of all voters who support the candidate

2. Calculate the sample proportion: $\hat{p} = \frac{383}{735} = 0.521$

Example 4.5.2

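3. Find the critical value: $z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$

4. Calculate the margin of error

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96\sqrt{\frac{0.521(1-0.521)}{735}} = 0.0361$$

5. Calculate the confidence interval

$$\hat{p} - E < p < \hat{p} + E \;\Rightarrow\; 0.521 - 0.0361 < p < 0.521 + 0.0361 \;\Rightarrow\; 0.485 < p < 0.557$$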
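The five steps of Example 4.5.2 can be sketched as a small Python function. The name `proportion_confidence_interval` and its `confidence` parameter are illustrative choices; the critical value comes from `statistics.NormalDist().inv_cdf`, which gives about 1.96 at the 95% level.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """Confidence interval for a proportion from x successes in n trials."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)            # critical value z_{alpha/2}
    p_hat = x / n                                      # sample proportion
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error E
    return p_hat - margin, p_hat + margin

# Figures from Example 4.5.2: 383 of 735 surveyed voters.
low, high = proportion_confidence_interval(383, 735)
print(round(low, 3), round(high, 3))
```

The function does not check the requirement of at least 5 successes and 5 failures; a caller would need to verify that separately.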

Example 4.5.2

Correct interpretation

– We are 95% confident that the value of p is between 0.485 and 0.557.

Meaning

– If we were to survey many different samples of voters and calculate the corresponding 95% confidence interval using the statistics from each sample, then about 95% of the intervals would contain the true value of p.

4.6 – Confidence Intervals for a Mean

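Definition 4.6.1 Let $\bar{x}$ be the mean of a sample of size n taken from a population with known variance $\sigma^2$ and unknown mean μ. The interval

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

is called a 100(1 − α)% confidence interval estimate for μ.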

Z-Interval

Requirements

1. The sample is random.

2. The population variance is known.

3. The population is normally distributed or n > 30.

Margin of error: $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

3 different forms: $(\bar{x} - E,\ \bar{x} + E)$, $\bar{x} - E < \mu < \bar{x} + E$, or $\bar{x} \pm E$

T-Interval

Definition 4.6.2 Let $\bar{x}$ be the mean and $s^2$ be the variance of a sample of size n taken from a population with unknown variance $\sigma^2$ and mean μ, and let $t_{\alpha/2}(n-1)$ be a critical Student-t value with $n - 1$ degrees of freedom. The interval

$$\left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$

is called a 100(1 − α)% confidence interval estimate for μ when $\sigma^2$ is unknown.

T-Interval

Requirements

1. The sample is random.

2. The population is normally distributed or n > 30.

Margin of error: $E = t_{\alpha/2}\frac{s}{\sqrt{n}}$

Standard error of the mean: $\frac{s}{\sqrt{n}}$

Which Type of Interval?

Suggestions

1. If n > 30 or $\sigma^2$ is known, then use a Z-interval.

2. If $\sigma^2$ is unknown, and the population is normally distributed (at least approximately), then use a T-interval.

3. If n ≤ 30, $\sigma^2$ is unknown, and the population is not normally distributed, then see Chapter 7.

Example 4.6.3

A random sample of 15 “1-pound” packages of shredded cheddar cheese has a mean weight of $\bar{x} = 1.05$ lb. and standard deviation of s = 0.02 lb. Calculate a 99% confidence interval estimate for the mean weight of all such packages.

1. Define the population mean being estimated: μ = The mean weight of all “1-pound” packages of shredded cheddar cheese.

Example 4.6.3

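2. Find the critical value: α = 0.01 and n = 15

$$t_{\alpha/2}(n-1) = t_{0.005}(15 - 1) = t_{0.005}(14) = 2.977$$

3. Calculate the margin of error:

$$E = t_{\alpha/2}\frac{s}{\sqrt{n}} = 2.977 \cdot \frac{0.02}{\sqrt{15}} = 0.0154$$

4. Calculate the confidence interval:

$$\bar{x} - E < \mu < \bar{x} + E \;\Rightarrow\; 1.05 - 0.0154 < \mu < 1.05 + 0.0154 \;\Rightarrow\; 1.0346 < \mu < 1.0654$$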
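The arithmetic of Example 4.6.3 can be checked with a short Python sketch. Python's standard library has no Student-t quantile function, so the critical value is passed in as a parameter (here the table value 2.977 used in the example); the function name `t_interval` is illustrative.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """T-interval for a mean, given a critical t-value looked up separately."""
    margin = t_crit * s / math.sqrt(n)   # margin of error E = t * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Figures from Example 4.6.3: n = 15, x_bar = 1.05 lb, s = 0.02 lb, t = 2.977.
low, high = t_interval(1.05, 0.02, 15, 2.977)
print(round(low, 4), round(high, 4))
```

With a statistics package installed, the critical value could be computed instead of looked up; that step is left out here to keep the sketch dependency-free.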

4.7 – Confidence Intervals for a Variance

Definition 4.7.1 Let $s^2$ be the variance of a sample of size n taken from a normally distributed population with unknown variance $\sigma^2$ and let

$$a = \chi^2_{1-\alpha/2}(n-1) \quad \text{and} \quad b = \chi^2_{\alpha/2}(n-1)$$

be critical values. The interval

$$\left(\frac{(n-1)s^2}{b},\ \ \frac{(n-1)s^2}{a}\right)$$

is a 100(1 − α)% confidence interval estimate for $\sigma^2$.

Confidence Intervals for a Variance

Requirements

1. The sample is random.

2. The population is normally distributed.

Alternate form: $\frac{(n-1)s^2}{b} < \sigma^2 < \frac{(n-1)s^2}{a}$

Example 4.7.2

The proportion of butterfat in 20 batches of butter was measured. The resulting data have a sample variance of $s^2 = 0.001102$. Construct a 95% confidence interval estimate of the variance in the proportion of butterfat of all batches.

1. Define the population variance being estimated:

$\sigma^2$ = The variance in the proportion of butterfat of all batches of butter

Example 4.7.2

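2. Find the critical values: α = 0.05 and n = 20

$$a = \chi^2_{1-0.05/2}(19) = \chi^2_{0.975}(19) = 8.907$$

$$b = \chi^2_{0.05/2}(19) = \chi^2_{0.025}(19) = 32.852$$

3. Calculate the confidence interval:

$$\frac{(n-1)s^2}{b} < \sigma^2 < \frac{(n-1)s^2}{a} \;\Rightarrow\; \frac{(19)0.001102}{32.852} < \sigma^2 < \frac{(19)0.001102}{8.907} \;\Rightarrow\; 0.000637 < \sigma^2 < 0.00235$$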
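The interval of Example 4.7.2 can be reproduced with a few lines of Python. As a simplifying assumption, the chi-square critical values a and b are supplied by the caller (here the table values 8.907 and 32.852 from the example), since the standard library does not provide chi-square quantiles. The function name `variance_interval` is illustrative.

```python
def variance_interval(s2, n, a, b):
    """Confidence interval for a variance.

    a and b are the lower and upper chi-square critical values with
    n - 1 degrees of freedom, looked up separately.
    """
    lower = (n - 1) * s2 / b   # dividing by the larger critical value gives the lower bound
    upper = (n - 1) * s2 / a
    return lower, upper

# Figures from Example 4.7.2: n = 20, s^2 = 0.001102.
low, high = variance_interval(0.001102, 20, a=8.907, b=32.852)
print(round(low, 6), round(high, 5))
```

Taking square roots of both endpoints would give a corresponding interval for the standard deviation.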

4.8 – Confidence Intervals for Differences

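Definition 4.8.1 Consider two populations with respective proportions $p_1$ and $p_2$. Let

– $n_1$ and $n_2$ be the sample sizes

– $\hat{p}_1$ and $\hat{p}_2$ be the sample proportions

Then

$$(\hat{p}_1 - \hat{p}_2) - \hat{\sigma}_p\, z_{\alpha/2} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + \hat{\sigma}_p\, z_{\alpha/2}$$

$$\text{where } \hat{\sigma}_p = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

is a 100(1 − α)% confidence interval estimate for $p_1 - p_2$.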

2-Proportion Z-Interval

Requirements

1. Both samples are random and independent.

2. Each sample contains at least 5 successes and 5 failures.

Margin of error: $E = \hat{\sigma}_p\, z_{\alpha/2}$

2-Sample T-Interval

If two populations are (approximately) normally distributed and their variances are unknown, then an approximate 100(1 − α)% confidence interval for the difference of their means using data from two independent samples of the respective populations is

$$(\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E$$

Equal Variances

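$$E = s_p\, t_{\alpha/2}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

• $s_p$ - pooled standard deviation

• $t_{\alpha/2}$ - critical t-value with $n_1 + n_2 - 2$ degrees of freedom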

Non-equal Variances

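$$E = t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $t_{\alpha/2}$ is a critical t-value with r degrees of freedom where

$$r = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

If r is not an integer, then round it down to the nearest whole number.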
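The two margin-of-error formulas for the 2-sample T-interval, and the degrees of freedom r for the non-equal-variances case, can be sketched in Python as follows. The function names and the numeric inputs are hypothetical, chosen only for illustration; as in the hand calculations, the critical t-value is assumed to be looked up separately and passed in.

```python
import math

def pooled_margin(s1, n1, s2, n2, t_crit):
    """Margin of error assuming equal variances (pooled standard deviation)."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return sp * t_crit * math.sqrt(1 / n1 + 1 / n2)

def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom r for non-equal variances, rounded down."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    r = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return math.floor(r)

def unequal_variance_margin(s1, n1, s2, n2, t_crit):
    """Margin of error for non-equal variances."""
    return t_crit * math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical sample standard deviations and sizes, not from the text.
print(unequal_variance_df(1.0, 10, 2.0, 15))
```

In practice one would compute r first, look up the critical t-value with r degrees of freedom, and then call `unequal_variance_margin` with that value.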

Requirements

1. Both samples are random and independent.

2. Both populations are normally distributed or both sample sizes are greater than 30.

4.9 – Sample Size

Sample size for estimating a population proportion

$$n = \frac{z_{\alpha/2}^2\, \hat{p}(1 - \hat{p})}{E^2}$$

– $\hat{p}$ - an estimate of the population proportion

– E - desired margin of error

Mean

Sample size for estimating a population mean

$$n = \frac{z_{\alpha/2}^2\, \sigma^2}{E^2}$$

– $\sigma^2$ - an estimate of the population variance

– E - desired margin of error
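Both sample-size formulas can be sketched in Python. Two assumptions are made here that the slides do not state: the result is rounded up to the next whole number (a common convention, so the margin of error is not exceeded), and the critical value is computed with `statistics.NormalDist`. Function names and the example inputs are illustrative.

```python
import math
from statistics import NormalDist

def critical_z(confidence):
    """Critical value z_{alpha/2} for the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def sample_size_for_proportion(p_hat, E, confidence=0.95):
    """n = z^2 * p_hat * (1 - p_hat) / E^2, rounded up (assumed convention)."""
    z = critical_z(confidence)
    return math.ceil(z**2 * p_hat * (1 - p_hat) / E**2)

def sample_size_for_mean(sigma2, E, confidence=0.95):
    """n = z^2 * sigma^2 / E^2, rounded up (assumed convention)."""
    z = critical_z(confidence)
    return math.ceil(z**2 * sigma2 / E**2)

# Hypothetical inputs: p_hat = 0.5 with E = 0.03; variance 225 with E = 2.
print(sample_size_for_proportion(0.5, 0.03), sample_size_for_mean(225, 2))
```

Using p̂ = 0.5 gives the largest possible value of p̂(1 − p̂), which makes it a conservative choice when no prior estimate of the proportion is available.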

4.10 – Assessing Normality

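Constructing a Normal Quantile Plot

1. Arrange the data values in increasing order: $x_1 \le x_2 \le \cdots \le x_n$

2. For each $k = 1, \ldots, n$, define $p_k = \frac{k}{n+1}$

3. Calculate $z_k = \Phi^{-1}(p_k)$ for each $k$ where $\Phi$ is the standard normal c.d.f.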
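The coordinates for a normal quantile plot can be computed with a short Python sketch. It assumes the plotting positions p_k = k/(n + 1) and pairs each ordered data value x_k with z_k, the standard normal quantile of p_k, obtained from `statistics.NormalDist().inv_cdf`. The function name and the data are illustrative.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return the points (x_k, z_k) for a normal quantile plot."""
    xs = sorted(data)                       # step 1: increasing order
    n = len(xs)
    std_normal = NormalDist()               # standard normal, mean 0 and sd 1
    points = []
    for k, x in enumerate(xs, start=1):
        p_k = k / (n + 1)                   # step 2: plotting position
        z_k = std_normal.inv_cdf(p_k)       # step 3: inverse of the normal c.d.f.
        points.append((x, z_k))
    return points

# Hypothetical data, not taken from the text.
for x, z in normal_quantile_points([3.0, 1.0, 2.0]):
    print(x, round(z, 4))
```

The resulting pairs can be passed to any plotting tool; a roughly linear pattern is what the method looks for.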

Normal Quantile Plot

1. Plot the points $(x_k, z_k)$, pairing each ordered data value with its normal quantile.

2. If the points form a straight-line pattern, then conclude that the population appears to be normal. If the points do not form a straight line, or exhibit some other type of non-linear pattern, then conclude that the population is not normal.

Example 4.10.2

The second row of the table below gives the average daily temperatures in the month of November for the city of Lincoln, NE for nine different years (data collected by Brandon Metcalf, 2009). Determine if the population of all such temperatures is normally distributed.

Example 4.10.2

Roughly a straight line

– Population is normal

Straight Line

1. Calculate the sample mean and standard deviation of the data, $\bar{x}$ and $s$.

2. For each k, calculate the following quantity:

$$y_k = \frac{x_k - \bar{x}}{s}$$

3. Plot the points $(x_k, y_k)$ on the quantile plot and connect them with a straight line.

Fuzzy Central Limit Theorem

If the population is influenced by many small, random, unrelated effects, then the population may be normally distributed.

